1 Introduction

Although quantum mechanics has been extensively verified experimentally, it still faces challenges in answering many fundamental questions. For instance, is the probability amplitude, or wavefunction, just a mathematical tool, or is it associated with an ontic physical property? What is the meaning of wavefunction collapse during measurement? Does quantum entanglement imply a non-local causal connection among entangled systems? The last question has been a source of contention in understanding the EPR thought experiment [2] and the Bell inequality [3]. These questions motivate the next level of reformulation of quantum mechanics. With the advancement of quantum information and quantum computing [4, 5] in recent decades, physicists have been searching for new foundational principles from the information perspective [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]. Reformulating quantum mechanics based on information principles appears promising, and we briefly review here some of the results relevant to this paper.

Zeilinger [7, 8] suggests that a foundational principle for quantum mechanics is that an elementary system carries 1 bit of information. Such a principle brings novel insight into entanglement for, say, a bipartite system: if the 2 bits of information are exhausted in specifying joint properties of the two subsystems, then nothing can be specified for the individual subsystems. However, the question of whether entanglement is due to a non-local causal effect remains unanswered. Another result that has gained considerable popularity is the interpretation of the role of the wavefunction in quantum mechanics. In information-based interpretations of quantum mechanics, such as Relational Quantum Mechanics [6] and QBism [11], the wavefunction in the Schrödinger equation is just a mathematical tool to hold the state of knowledge about the quantum system. There is no ontological reality associated with the wavefunction itself. This view can resolve certain paradoxes such as the EPR experiment [38].

At the level of mathematical formulation, a number of theories have been proposed to derive the Schrödinger equation from information-based principles. These reformulations fall into two categories. The first category is based on purely information-theoretic principles. A recent example is provided by Höhn [30, 31], where a concrete quantum theory for a single qubit and for N qubits is successfully constructed from elementary rules on an observer’s information acquisition. The limitation of such a reconstruction is that the connection to classical mechanics is not clearly shown. It only shows that a unitary time evolution operator governs the Schrödinger equation; the concrete form of the Hamiltonian in the Schrödinger equation cannot be derived. The second category starts from classical mechanics and adds information-based variables to the reformulation. Reginatto first showed that by adding a term related to Fisher information to the least action principle, the Schrödinger equation can be obtained [37]. Later, the Fisher information term was derived from a postulate of an exact uncertainty relation [43]. Various approaches based on entropy extremization have also been proposed to derive quantum mechanics. Entropic dynamics [34, 35] attempts to extract quantum mechanics as an application of the methods of inference by maximizing Shannon entropy. Another variational approach based on relative entropy is constructed to recover stochastic mechanics, which in turn can lead to the Schrödinger equation [50]. The limitation of the entropy extremization approaches in [34, 35] and [50] is their dependency on stochastic mechanics as the underlying physical model [47], which suffers from concerns about hidden variables such as the osmotic velocity and from the difficulty of explaining the non-local behavior of multi-particle systems [48].

We are more interested in the second category of reformulation because of its advantage of providing a clear connection between classical mechanics and quantum mechanics. This allows one to understand where quantumness originates from an information perspective. The purpose of the present work is to continue such reformulations but at a more fundamental level in order to avoid the limitations described above. At the center of our investigation effort is the extended least action principle. We assume a quantum system experiences vacuum fluctuations constantly. If we want to apply the least action principle, the challenge is how to calculate the additional action due to the vacuum fluctuations besides the action for a classical trajectory. To solve the problem, we further assume that a quantum system must manifest a minimal amount of action effort that is determined by the Planck constant in order to be observable. The challenge is then converted into finding the proper information metrics to measure the observable information due to vacuum fluctuation. As the main contribution of this paper, a novel method is introduced to calculate this information metric, which enables the extension of the least action principle for a quantum system. The detailed physical motivations of the extended least action principle and its underlying assumptions are described in Sect. 2.

From the extended least action principle, a series of results are obtained. First, by recursively applying the extended least action principle in an infinitesimal time interval and then in an accumulated time interval, the uncertainty relation and the Schrödinger equation are recovered. Although similar results have been obtained in other research works [34, 35, 37, 43, 44], what is novel here is the simplicity and cleanness. No arbitrary constants or Lagrange multipliers are introduced, and no additional postulates are needed. The same method can be applied in the momentum representation to obtain the Schrödinger equation in the momentum representation for a free particle. Imposing a no-preferred-representation assumption results in the transformation theory between the position and momentum representations. Second, we will show that variation of the information metrics for vacuum fluctuations gives the Bohm quantum potential. The vacuum fluctuations are assumed to be local, so that for a bipartite system the vacuum fluctuations of the two subsystems are independent of each other. However, the corresponding information metrics, and consequently the Bohm quantum potential, for the two subsystems are inseparable in general. This suggests that the inseparability of the Bohm quantum potential does not necessarily justify a non-local underlying mechanism. Third, we will demonstrate that the extended least action principle can be a mathematical tool to produce new results that have not been reported in other research literature. By quantifying the information metrics for vacuum fluctuations using more general definitions of relative entropy, such as the Rényi or Tsallis divergence, we obtain a generalized Schrödinger equation. The applicability of the generalized Schrödinger equation needs further investigation, but the equation is legitimate from the information-theoretic perspective.

Extending the least action principle in classical mechanics to quantum mechanics not only shows clearly how classical mechanics becomes quantum mechanics, but also opens up a new mathematical toolbox. Indeed, the quantum scalar field theory can be obtained as well from the least observability principle [55].

The rest of the article is organized as follows. First we describe in detail how the extended least action principle is constructed and what the underlying assumptions are. Then we show how the basic quantum theory is recovered. This is followed by the derivation of a generalized Schrödinger equation not reported in earlier research literature. Next, we analyze the locality of vacuum fluctuations and its implications for the Bohm quantum potential. We give a detailed analysis of why quantum entanglement is non-causal using the formulations developed here. We then conclude the article after comprehensive discussions and comparisons with previous relevant research works.

2 The Extended Least Action Principle

The first assumption to make here is that a quantum system constantly experiences vacuum fluctuations. It is not our intention here to investigate the origin, or establish a physical model, of such vacuum fluctuations. Instead, we make a minimal number of assumptions on the underlying physical model, only enough so that we can apply the variation principle based on the degree of observability. The advantage of this approach is that it avoids keeping track of physical details that are irrelevant for predicting future measurement results. It also avoids the potential need to introduce hidden variables such as the osmotic velocity in stochastic mechanics. The vacuum fluctuation is assumed to be local. This means that for a composite system, the fluctuations of the subsystems are independent of one another. The vacuum fluctuation is also assumed to be completely random, such that the mean of the fluctuations is zero but the variance is non-zero. We state the assumption as follows:

Assumption 1 – A quantum system experiences vacuum fluctuations constantly. The fluctuations are completely random, and local.

Now consider a particle with mass m moving from position A to B. The motion of the particle is a combination of two independent components: the classical trajectory due to the external potential, and the random vacuum fluctuations around any given position along the classical path. Due to the vacuum fluctuations, there is no definite trajectory. How can we construct a principle, based on information-related metrics, from which the laws of dynamics for this physical scenario can be derived?

In classical mechanics, the path trajectory follows the laws of dynamics derived through the least action principle. Thus, it is natural to consider recasting the least action principle in terms of information-related metrics such that it can be extended to derive quantum mechanics. The action for the classical trajectory can be calculated as usual. The challenge here is to calculate the additional action due to vacuum fluctuations, since the physical details of the vacuum fluctuations are unknown. We wish to find another way to calculate this additional action. The second assumption introduced next will help this attempt. We assume that the physical object must exhibit a minimal amount of action during its dynamical motion in order to be observable or distinguishable (relative to a reference frame), and this amount of action effort is determined by the Planck constant \(\hbar\). As such, the Planck constant is a discrete unit of action for measuring the observable information. Making use of this understanding of the Planck constant inversely provides us with a new way to calculate the additional action due to vacuum fluctuations. That is, even though we do not know the physical details of vacuum fluctuations, the vacuum fluctuations manifest themselves via a discrete action unit determined by the Planck constant as an observable information unit. If we are able to define an information metric that quantifies the amount of observable information manifested by vacuum fluctuations, we can then multiply the metric by the Planck constant to obtain the action associated with vacuum fluctuations.

The existence of the constant \(\hbar\) and its interpretation cannot be deduced from classical mechanics, but has to be a fundamental assumption itself, or be derived from another fundamental postulate. The existence of the Planck constant implies a fundamental physical limitation that is not recognized in classical mechanics. Indeed, Rovelli has pointed out [6] that his postulate on limited information for a quantum system implies the existence of Planck constant. This implies that the Planck constant plays a role to connect physical variables to certain information metrics. But it is unclear how \(\hbar\) is used to measure the amount of information in the subsequent reconstruction effort of quantum theory in [6]. In this paper, instead of introducing a postulate of limited information for a quantum system, we assume there is a non-zero discrete action unit to measure the degree of observability exhibited from a path trajectory with action effort S, and this unit is called the Planck constant \(\hbar\). Conceptually, our assumption is more intuitive. What we assume is that there is a lower limit to the amount of action effort that a system needs to exhibit in order to be observable or distinguishable, and such a unit of action effort is defined by the Planck constant. Formally, the assumption can be stated as,

Assumption 2 – There is a lower limit to the amount of action that a physical system needs to exhibit in order to be observable. This basic discrete unit of action effort is given by \(\hbar /2\), where \(\hbar\) is the Planck constant.

The word exhibit implies that the observability is uncovered by the movement of the physical system itself, instead of an actual measurement.

With Assumption 2, the challenge of calculating the additional action due to vacuum fluctuations is converted into defining a proper new information metric \(I_f\), which measures the additional distinguishable, hence observable, information exhibited due to vacuum fluctuations. Even though we do not know the physical details of vacuum fluctuations (except that, as Assumption 1 states, these vacuum fluctuations are completely random and local), the problem becomes less challenging since there are information-theoretic tools available. The first step is to assign a transition probability distribution due to vacuum fluctuations for an infinitesimal time step at each position along the classical trajectory. The distinguishability can then be defined as the information distance between the transition probability distribution and a uniform probability distribution. The uniform probability distribution is chosen here as the reference to reflect the complete randomness of vacuum fluctuations. In information theory, the common information metric to measure the information distance between two probability distributions is relative entropy. Relative entropy is more fundamental than Shannon entropy, since the latter is just a special case of relative entropy when the reference probability distribution is a uniform distribution. But there is a more important reason to use relative entropy. As shown in a later section, when we consider the dynamics of the system for an accumulated time period, we assume the initial position is unknown but is given by a probability distribution. This probability distribution can be defined along the positions of the classical trajectory without vacuum fluctuations, or with vacuum fluctuations. The information distance between the two probability distributions gives the additional distinguishability due to vacuum fluctuations. It is again measured by a relative entropy. Thus, relative entropy is a powerful tool allowing us to extract meaningful information about the dynamic effects of vacuum fluctuations. The concrete form of \(I_f\) will be defined later as a functional of the Kullback–Leibler divergence \(D_{KL}\), \(I_f:=f(D_{KL})\), where \(D_{KL}\) measures the information distances of different probability distributions caused by vacuum fluctuations. Thus, the total action due to both the classical trajectory and vacuum fluctuations is

$$\begin{aligned} S_t = S_c + \frac{\hbar}{2}f(D_{KL}). \end{aligned}$$
(1)

where \(S_c\) is the classical action.  Quantum theory can be derived through a variation approach to minimize such a functional quantity, \(\delta S_t=0\). When \(\hbar \to 0\), \(S_t=S_c\). Minimizing \(S_t\) is then equivalent to minimizing \(S_c\), resulting in the dynamics laws of classical mechanics. However, in quantum mechanics, \(\hbar \ne 0\), the contribution from \(I_f\) must be included when minimizing the total action. We can see \(I_f\) is where the quantum behaviors of a system come from. These ideas can be condensed as the

Extended Principle of Least Action – The law of physical dynamics for a quantum system tends to exhibit as little as possible the action functional defined in (1).

Alternatively, we can interpret the extended least action principle more from an information perspective by rewriting (1) as 

$$I_t =\frac{2}{\hbar} S_c + I_f,$$
(1A)

where \(I_t=2S_t/\hbar\). Denote \(I_p=2S_c/\hbar\), which measures the amount of \(S_c\) using the discrete unit \(\hbar/2\). \(I_p\) is not a conventional information metric but can be considered as carrying meaningful physical information. To see this connection, recall that the classical action is defined as an integral of the Lagrangian over a period of time along the path trajectory of a classical object. There are two aspects to understanding the action functional. In classical mechanics, the path trajectory can be traced, measured, or observed. Given two fixed end points, the longer the path trajectory, the larger the value of the action. A larger action indicates (1) that the system exhibits more dynamic effort; and (2) that it is easier to trace the path and distinguish the object from the background reference frame, or in other words, that more physical information is available for potential observation. Thus, the action \(S_c\) not only quantifies the dynamic effort of the system, but is also associated with the detectability, or observability, of the physical object during the dynamics along the path. In classical mechanics, we focus on the first aspect via the least action principle, and derive the law of dynamics from minimizing the action effort. The second aspect is not useful since we cannot quantify the intuition that \(S_c\) is associated with the observability of the physical object. One reason is that there is no natural unit of action to convert \(S_c\) into an information-related metric. The introduction of the Planck constant in Assumption 2 helps to quantify this intuition. We call \(I_p\) the observability of the classical trajectory. Similarly, \(I_f\) measures the distinguishable information of the probability distributions with and without vacuum fluctuations. Thus, \(I_t\) is the total observable information. With the expression above for \(I_t\), the extended least action principle can be restated as

Principle of Least Observability – The law of physical dynamics for a quantum system tends to exhibit as little as possible the observability defined in (1A).

Mathematically, there is no difference between (1) and (1A) when applying the variation principle to derive the laws of dynamics. The form of \(S_t\) in terms of action is more accessible to the physics community. However, the form of \(I_t\) in terms of observability seems conceptually more generic. We will leave the exact interpretations of the principle aside and use the two interpretations interchangeably in this paper. The key point to remember is that the Planck constant connects the physical action to metrics related to observable information in either interpretation.

Independent from the least observability principle, we need another assumption similar to the no preference of reference frame postulate in special relativity. The observable information of the physical dynamics can be expressed in different representations. Loosely speaking, a representation is characterized by a set of variables with their values acting like coordinates to describe the properties of the system [41]. For instance, the position representation uses position variables to describe the physical properties of the system. Similarly, the momentum representation uses momentum variables to describe the physical properties of the system. We assume that the total observable information extracted in a representation is a complete description of the dynamics of the system. The physical laws derived in other representations do not offer additional power of predictions for future measurement results. Consequently, the physical laws for the dynamics of the system derived from different representations must be equivalent. As shown later, from the same least observability principle, we can derive the Schrödinger equation independently in both position and momentum representations. But we demand the results must be equivalent. In summary, we have

Assumption 3 – There is no preferred representation for the law of physics derived in each representation.

Assumption 3 will lead to the transformation formulation between the position and momentum representations.

With the least observability principle and the underlying assumptions explained, we now proceed to describe the results from applying this principle.

3 Basic Quantum Formulation

3.1 Dynamics of Vacuum fluctuations and The Uncertainty Relation

First we consider the dynamics of a system over an infinitesimal time interval \(\Delta t\). Suppose we choose a reference frame such that the dynamics of the system under study is only due to the random vacuum fluctuations. That is, if we ignore vacuum fluctuations, the system is at rest relative to such a reference frame. This also means the external potential is neglected for the time being. Define the probability for the system to transition from a 3-dimensional space position \({\textbf{x}}\) to another position \({\textbf{x}}+{\textbf{w}}\), where \({\textbf{w}}=\Delta {\textbf{x}}\) is the displacement in 3-dimensional space due to fluctuations, as \(\wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})d^3{\textbf{w}}\). The expectation value of the classical action is \(S_c=\int \wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})Ld^3{\textbf{w}}dt\). Since we only consider the vacuum fluctuations, the Lagrangian L only contains the kinetic energy, \(L=\frac{1}{2}m{\textbf{v}}\cdot {\textbf{v}}\). For an infinitesimal time interval \(\Delta t\), one can approximate the velocity as \({\textbf{v}}={\textbf{w}}/\Delta t\). This gives

$$\begin{aligned} S_c=\frac{m}{2\Delta t}\int ^{+\infty }_{-\infty } \wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}}){\textbf{w}}\cdot {\textbf{w}}d^3{\textbf{w}}. \end{aligned}$$
(2)

The information metric \(I_f\) is supposed to capture the additional revelation of information due to vacuum fluctuations. Thus, it is naturally defined as a relative entropy, or more specifically, the Kullback–Leibler divergence, which measures the information distance between \(\wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})\) and some prior probability distribution. Since the vacuum fluctuations are completely random, it is intuitive to assume a prior distribution with maximal ignorance [35, 42]. That is, the prior probability distribution is a uniform distribution \(\mu\):

$$\begin{aligned} I_f&:= D_{KL}(\wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}}) || \mu ) \\&= \int \wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})\ln [\wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})/\mu ]d^3{\textbf{w}}. \end{aligned}$$

Combined with (2), the total amount of information defined in (1A) is

$$\begin{aligned} I =&\frac{m}{\hbar \Delta t}\int \wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}}){\textbf{w}}\cdot {\textbf{w}}d^3{\textbf{w}} \\&+ \int \wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})ln[\wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})/\mu ]d^3{\textbf{w}}. \end{aligned}$$

Taking the variation \(\delta I = 0\) with respect to \(\wp\) gives

$$\begin{aligned} \delta I = \int (\frac{m}{\hbar \Delta t}{\textbf{w}}\cdot {\textbf{w}}+ln\frac{\wp }{\mu } +1)\delta \wp d^3{\textbf{w}} = 0. \end{aligned}$$
(3)

Since \(\delta \wp\) is arbitrary, one must have

$$\begin{aligned} \frac{m}{\hbar \Delta t}{\textbf{w}}\cdot {\textbf{w}}+ln\frac{\wp }{\mu } +1=0. \end{aligned}$$

The solution for \(\wp\) is

$$\begin{aligned} \wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}}) = \mu e^{-\frac{m}{\hbar \Delta t}{\textbf{w}}\cdot {\textbf{w}} - 1} = \frac{1}{Z}e^{-\frac{m}{\hbar \Delta t}{\textbf{w}}\cdot {\textbf{w}}}, \end{aligned}$$
(4)

where Z is a normalization factor that absorbs factor \(\mu e^{-1}\). Equation (4) shows that the transition probability density is a Gaussian distribution. The variance \(\langle w_i^2\rangle = \hbar \Delta t/2m\), where \(i\in \{1, 2, 3\}\) denotes the spatial index. Recalling that \(w_i/\Delta t = v_i\) is the approximation of velocity due to the vacuum fluctuations, we denote \(p_i^f=mv_i=mw_i/\Delta t\). Since \(\langle p_i^f\rangle \propto \langle w_i\rangle = 0\), then \(\langle (p_i+p_i^f)^2-p_i^2\rangle = \langle (p_i^f)^2\rangle\), and \(p_i^f\) can be considered as the fluctuations of momentum on top of the classical momentum. That is, \(\Delta p_i = p_i^f= mw_i/\Delta t\). Rearranging \(\langle w_i^2\rangle = \hbar \Delta t/2\,m=\langle (\Delta x_i)^2\rangle\) gives

$$\begin{aligned} \langle \Delta x_i\Delta p_i\rangle = \frac{\hbar }{2}. \end{aligned}$$
(5)

This relation was first proposed by Hall and Reginatto as an exact uncertainty relation [43, 44], where it is postulated with mathematical arguments. Here we derive it from the variation principle of minimizing the amount of information due to vacuum fluctuations. Squaring both sides of (5) and applying the Cauchy–Schwarz inequality gives

$$\begin{aligned} \frac{\hbar ^2}{4}&=\langle \Delta x_i\Delta p_i\rangle ^2 = (\int \wp \Delta x_i\Delta p_i d^3{\textbf{w}})^2 \\&\le \int \wp (\Delta x_i)^2d^3{\textbf{w}} \int \wp (\Delta p_i)^2d^3{\textbf{w}} \\&= \langle (\Delta x_i)^2\rangle \langle (\Delta p_i)^2\rangle . \end{aligned}$$

Taking the square root of both sides results in

$$\begin{aligned} \sqrt{\langle (\Delta x_i)^2\rangle }\sqrt{\langle (\Delta p_i)^2\rangle } \ge \hbar /2. \end{aligned}$$
(6)
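The short-time variation of this subsection can be checked numerically. The sketch below is only an illustration under stated assumptions: it works in one dimension, restricts the trial densities to zero-mean Gaussians of variance v (rather than varying over arbitrary \(\wp\)), sets \(\mu = 1\), and uses the illustrative units \(m=\hbar=1\), \(\Delta t=10^{-3}\). The function names are hypothetical. For such a trial density the kinetic term is \(mv/\hbar\Delta t\) and the Kullback–Leibler term is minus the Gaussian differential entropy, so the minimizing variance should reproduce \(\hbar\Delta t/2m\), and (5) then follows from \(\Delta p_i = mw_i/\Delta t\).

```python
import math

def observability(v, m, hbar, dt, mu=1.0):
    """1D analogue of I for a zero-mean Gaussian trial density of variance v.

    Kinetic term: (m / (hbar * dt)) * <w^2>.
    KL term to the uniform reference mu: minus the Gaussian differential
    entropy, minus ln(mu). Choosing mu = 1 is an assumption of this sketch.
    """
    kinetic = m * v / (hbar * dt)
    kl_to_uniform = -0.5 * math.log(2.0 * math.pi * math.e * v) - math.log(mu)
    return kinetic + kl_to_uniform

def minimizing_variance(m, hbar, dt, lo, hi, iters=200):
    """Golden-section search for the variance that minimizes observability."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if observability(c, m, hbar, dt) < observability(d, m, hbar, dt):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
    return 0.5 * (a + b)

# Illustrative units (assumption): m = hbar = 1, dt = 1e-3.
m, hbar, dt = 1.0, 1.0, 1e-3
var_w = minimizing_variance(m, hbar, dt, lo=1e-6, hi=1e-2)

# With Delta x = w and Delta p = m w / dt, both moments follow from <w^2>.
mean_dx_dp = m * var_w / dt                                 # <Delta x Delta p>
rms_product = math.sqrt(var_w * (m / dt) ** 2 * var_w)      # rms(dx) * rms(dp)

print(var_w, hbar * dt / (2.0 * m))
print(mean_dx_dp, rms_product, hbar / 2.0)
```

In this restricted setting the two sides of the Cauchy–Schwarz step coincide, because \(\Delta p_i\) is proportional to \(\Delta x_i\); the sketch therefore only illustrates the equality case of (6).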

3.2 Derivation of The Schrödinger Equation

We now turn to the dynamics for a cumulative period from \(t_A\rightarrow t_B\). Suppose a typical reference frame is chosen such that if the vacuum fluctuations are ignored, the system moves along a classical path trajectory. External potential is considered here with such a reference frame. In classical mechanics, the equation of motion is described by the Hamilton-Jacobi equation,

$$\begin{aligned} \frac{\partial S}{\partial t }+ \frac{1}{2m}\nabla S\cdot \nabla S + V = 0. \end{aligned}$$
(7)

Suppose the initial condition is unknown, and define \(\rho ({\textbf{x}}, t)\) as the probability density for finding a particle in a given volume of the configuration space. The probability density must satisfy the normalization condition \(\int \rho ({\textbf{x}}, t) d^3{\textbf{x}} = 1\), and the continuity equation

$$\begin{aligned} \frac{\partial \rho ({\textbf{x}}, t)}{\partial t }+ \frac{1}{m}\nabla \cdot (\rho ({\textbf{x}}, t)\nabla S) = 0. \end{aligned}$$

The pair \((S, \rho )\) completely determines the motion of the classical ensemble. As pointed out by Hall and Reginatto [43, 44], the Hamilton-Jacobi equation, and the continuity equation, can be derived from classical action

$$\begin{aligned} S_c = \int \rho \{ \frac{\partial S}{\partial t} + \frac{1}{2m}\nabla S\cdot \nabla S + V\} d^3{\textbf{x}}dt \end{aligned}$$
(8)

through fixed point variation with respect to \(\rho\) and S, respectively. Appendix A gives a more rigorous proof of (8) using the extended canonical transformation method. Note that \(S_c\) and S are different physical variables. As shown in Appendix A, \(S_c\) can be considered as the ensemble average of the classical action, while S is a variable introduced in a canonical transformation that satisfies \({\textbf{p}}=\nabla S\). The degree of observability for the motion of this ensemble between the two fixed points is \(I_p = 2S_c/\hbar\) according to Assumption 2.

To define the information metric for the vacuum fluctuations, \(I_f\), we slice the time duration \(t_A\rightarrow t_B\) into N short time steps \(t_0=t_A, \ldots , t_j, \ldots , t_{N-1}=t_B\), where each step is an infinitesimal period \(\Delta t\). In an infinitesimal time period at time \(t_j\), the particle not only moves according to the Hamilton-Jacobi equation but also experiences random fluctuations. The probability density \(\rho ({\textbf{x}}, t_j)\) alone is insufficient to encode all the observable information. Instead, we need to consider \(\rho ({\textbf{x}}+{\textbf{w}}, t_j)\) for all possible \({\textbf{w}}\). Such additional revelation of distinguishability is due to the vacuum fluctuations on top of the classical trajectory. The proper measure of this distinction is the information distance between \(\rho ({\textbf{x}}, t_j)\) and \(\rho ({\textbf{x}}+{\textbf{w}}, t_j)\). A natural choice of such an information measure is \(D_{KL}(\rho ({\textbf{x}}, t_j) || \rho ({\textbf{x}}+{\textbf{w}}, t_j))\). We then take the average of \(D_{KL}\) over \({\textbf{w}}\). Denoting by \(\langle \cdot \rangle _w\) the expectation value over \({\textbf{w}}\), and summing this quantity over each infinitesimal time interval, leads to the definition

$$\begin{aligned} I_f&:= \sum _{j=0}^{N-1}\langle D_{KL}(\rho ({\textbf{x}}, t_j) || \rho ({\textbf{x}}+{\textbf{w}}, t_j))\rangle _w \end{aligned}$$
(9)
$$\begin{aligned}&=\sum _{j=0}^{N-1}\int d^3{\textbf{w}} d^3 {\textbf{x}}\wp ({\textbf{x}}+{\textbf{w}}| {\textbf{x}})\rho ({\textbf{x}}, t_j)ln \frac{\rho ({\textbf{x}}, t_j)}{\rho ({\textbf{x}}+{\textbf{w}}, t_j)}. \end{aligned}$$
(10)

Notice that \(\wp ({\textbf{x}}+{\textbf{w}}| {\textbf{x}})\) is a Gaussian distribution given in (4). When \(\Delta t\) is small, only small \({\textbf{w}}\) will contribute to \(I_f\). As shown in Appendix B, when \(\Delta t\rightarrow 0\), \(I_f\) turns out to be

$$\begin{aligned} I_f = \int d^3{\textbf{x}}dt \frac{\hbar }{4m}\frac{1}{\rho }\nabla \rho \cdot \nabla \rho . \end{aligned}$$
(11)
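The reduction of (10) to (11) can be illustrated for a single time step in one dimension. The sketch below is an assumption-laden check rather than the Appendix B derivation: it takes \(\rho\) to be a Gaussian of standard deviation s (an illustrative choice not made in the text), uses the units \(m=\hbar=1\) with \(\Delta t=10^{-3}\), and compares the \(w\)-averaged Kullback–Leibler divergence against the one-step value \((\hbar \Delta t/4m)\int (\rho ')^2/\rho \,dx\). All function names are hypothetical.

```python
import math

def log_rho(x, s=1.0):
    """Log of an illustrative 1D Gaussian density with standard deviation s."""
    return -x * x / (2.0 * s * s) - 0.5 * math.log(2.0 * math.pi * s * s)

def uniform_grid(lo, hi, n):
    """Return n equally spaced points on [lo, hi] and the grid spacing."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)], step

def averaged_kl(m, hbar, dt, s=1.0):
    """<D_KL(rho(x) || rho(x+w))>_w for one time step, by direct summation."""
    sigma_w = math.sqrt(hbar * dt / (2.0 * m))       # width of the Gaussian (4)
    xs, hx = uniform_grid(-10.0 * s, 10.0 * s, 2001)
    ws, _ = uniform_grid(-8.0 * sigma_w, 8.0 * sigma_w, 201)
    log_r = [log_rho(x, s) for x in xs]
    r = [math.exp(v) for v in log_r]
    weights = [math.exp(-m * w * w / (hbar * dt)) for w in ws]
    z = sum(weights)                                  # discrete normalization
    total = 0.0
    for w, weight in zip(ws, weights):
        kl = sum(ri * (li - log_rho(x + w, s))
                 for x, ri, li in zip(xs, r, log_r)) * hx
        total += (weight / z) * kl
    return total

def fisher_term(m, hbar, dt, s=1.0, eps=1e-5):
    """(hbar dt / 4m) * integral of (rho')^2 / rho, via central differences."""
    xs, hx = uniform_grid(-10.0 * s, 10.0 * s, 2001)
    acc = 0.0
    for x in xs:
        dlog = (log_rho(x + eps, s) - log_rho(x - eps, s)) / (2.0 * eps)
        acc += math.exp(log_rho(x, s)) * dlog * dlog * hx   # rho * (ln rho)'^2
    return hbar * dt / (4.0 * m) * acc

# Illustrative units (assumption): m = hbar = 1, dt = 1e-3, s = 1.
m, hbar, dt = 1.0, 1.0, 1e-3
kl_avg = averaged_kl(m, hbar, dt)
fisher = fisher_term(m, hbar, dt)
print(kl_avg, fisher)
```

For a Gaussian \(\rho\) the two quantities agree at any \(\Delta t\), since the divergence between two equal-width Gaussians is exactly quadratic in the shift; for a general density the agreement is expected only to leading order in \(\Delta t\).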

Eq. (11) contains the term related to Fisher information for the probability density [39]. Some literature directly adds Fisher information in the variation method as a postulate to derive the Schrödinger equation [37]. But (11) bears much more physical significance than Fisher information. First, it shows that \(I_f\) is proportional to \(\hbar\). This is not trivial because it avoids introducing additional arbitrary constants for the subsequent derivation of the Schrödinger equation. More importantly, defining \(I_f\) using the relative entropy opens up new results that cannot be obtained if \(I_f\) is defined using Fisher information, because there are other generic forms of relative entropy such as Rényi divergence or Tsallis divergence. As will be seen later, by replacing the Kullback–Leibler divergence with Rényi divergence, one will obtain a generalized Schrödinger equation. Other authors also derive (11) using mathematical arguments [43, 44], while our approach is based on intuitive information metrics. With (11), the total degree of observability is

$$\begin{aligned} I_t = \int \{\frac{2}{\hbar }\rho [ \frac{\partial S}{\partial t} + \frac{1}{2m}\nabla S\cdot \nabla S + V] + \frac{\hbar }{4m}\frac{1}{\rho }\nabla \rho \cdot \nabla \rho \} d^3{\textbf{x}}dt. \end{aligned}$$
(12)

Variation of \(I_t\) with respect to S gives the continuity equation, while variation with respect to \(\rho\) leads to

$$\begin{aligned} \frac{\partial S}{\partial t} + \frac{1}{2m}\nabla S\cdot \nabla S + V - \frac{\hbar ^2}{2m}\frac{\nabla ^2\sqrt{\rho }}{\sqrt{\rho }} = 0, \end{aligned}$$
(13)

The last term is the Bohm quantum potential [45]. The Bohm potential is considered responsible for the non-locality phenomenon in quantum mechanics [46]. Historically, its origin has been mysterious. Here we show that it originates from the information metric related to relative entropy, \(I_f\). The physical implications of this result will be discussed later. Defining a complex function \(\Psi =\sqrt{\rho }e^{iS/\hbar }\), the continuity equation and the extended Hamilton-Jacobi equation (13) can be combined into a single differential equation,

$$\begin{aligned} i\hbar \frac{\partial \Psi }{\partial t} = [-\frac{\hbar ^2}{2m}\nabla ^2 + V]\Psi , \end{aligned}$$
(14)

which is the Schrödinger Equation.
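As a sanity check of (13), one can verify that a known stationary solution of (14) makes its left-hand side vanish. The sketch below uses the one-dimensional harmonic oscillator ground state, which is an illustrative example chosen here and not one discussed in the text: \(V=m\omega ^2x^2/2\), \(\sqrt{\rho }\propto e^{-m\omega x^2/2\hbar }\), and \(S=-Et\) with \(E=\hbar \omega /2\). The Bohm term is evaluated by finite differences, and the function names are hypothetical.

```python
import math

def sqrt_rho(x, m, omega, hbar):
    """Unnormalized amplitude of the oscillator ground state."""
    return math.exp(-m * omega * x * x / (2.0 * hbar))

def quantum_potential(x, m, omega, hbar, eps=1e-4):
    """Bohm term -(hbar^2 / 2m) (sqrt(rho))'' / sqrt(rho), central differences."""
    a0 = sqrt_rho(x, m, omega, hbar)
    a_plus = sqrt_rho(x + eps, m, omega, hbar)
    a_minus = sqrt_rho(x - eps, m, omega, hbar)
    second = (a_plus - 2.0 * a0 + a_minus) / (eps * eps)
    return -hbar * hbar / (2.0 * m) * second / a0

def residual_eq13(x, m=1.0, omega=1.0, hbar=1.0):
    """Left-hand side of (13) for S = -E t with E = hbar * omega / 2."""
    dS_dt = -0.5 * hbar * omega      # S has no x dependence, so grad S = 0
    potential = 0.5 * m * omega * omega * x * x
    return dS_dt + potential + quantum_potential(x, m, omega, hbar)

# Sample points are arbitrary; units m = omega = hbar = 1 are an assumption.
residuals = [residual_eq13(x) for x in (-1.5, -0.5, 0.0, 0.7, 2.0)]
print(residuals)
```

The residuals are zero up to finite-difference error, which reflects the fact that the position-dependent part of the Bohm term cancels the external potential for this state, leaving only the constant \(\hbar \omega /2\).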

In summary, by recursively applying the same least observability principle in two steps, we recover the uncertainty relation and the Schrödinger equation. The first step covers a short time period and yields the transition probability density due to vacuum fluctuations; the second step covers a cumulative time period and yields the dynamical law for \(\rho\) and S. The applicability of the same variation principle shows the consistency and simplicity of the theory, although the form of the Lagrangian is different in each step. In the first step, the Lagrangian only contains the kinetic energy \(L=m{\textbf{v}}\cdot {\textbf{v}}/2\), which is in the form of \(L=\dot{{\textbf{x}}}\cdot {\textbf{p}} - H\) where H is the classical Hamiltonian. In the second step, we use a different form of classical Lagrangian \(L^\prime = \partial S/\partial t + H\). As shown in Appendix A, L and \(L^\prime\) are related through an extended canonical transformation. The choice of Lagrangian L or \(L^\prime\) does not affect the form of Lagrange’s equations. Here we choose \(L^\prime = \partial S/\partial t + H\) as the classical Lagrangian in the second step in order to use the pair of variables \((\rho , S)\) in the subsequent variation procedure.

To demonstrate the simplicity of the least observability principle, in Appendix C we apply the principle to derive the Schrödinger equation in an external electromagnetic field. The interesting point in this example is that the external electromagnetic field has no influence on the vacuum fluctuations. This reconfirms that the information metric \(I_f\) is independent of the external potential.

3.3 Transformation Between Position and Momentum Representations

The classical action \(S_c\) and information metric \(I_f\) in (1) have so far been defined in the position representation, i.e., using position x as the variable. However, other observable quantities can also serve as representation variables, and momentum is one of them. We can find the proper expressions for \(S_c\) and \(I_f\) in the momentum representation, and follow the same variation principle to derive the quantum theory. By Assumption 3, one would expect the law of dynamics in the momentum representation to be equivalent to that in the position representation derived earlier. First, consider the effect of fluctuations in a short time step \(\Delta t\). The vacuum fluctuations occur not only in position space, but also in momentum space. Denote the transition probability density for the vacuum fluctuations as \(\tilde{\wp }({\textbf{p}}+\mathbf {\omega }|{\textbf{p}})\) where \(\mathbf {\omega }=\Delta {\textbf{p}}\) is due to the momentum fluctuations. The classical Lagrangian without considering external potential is \(L=({\textbf{p}}+\mathbf {\omega })\cdot ({\textbf{p}}+\mathbf {\omega })/2m\), and the average classical action is

$$\begin{aligned} S_c=\frac{\Delta t}{2m}\int \tilde{\wp }({\textbf{p}}+\mathbf {\omega }|{\textbf{p}})({\textbf{p}}+\mathbf {\omega })\cdot ({\textbf{p}}+\mathbf {\omega }) d^3{\tilde{w}}. \end{aligned}$$

Since \(\langle \mathbf {\omega } \rangle =0\), the only term that contributes to the variation with respect to \(\tilde{\wp }\) is the one with \(\langle \mathbf {\omega }\cdot \mathbf {\omega } \rangle\). Similar to the definition of \(I_f\) in the position representation, here we define \(I_f=:D_{KL}(\tilde{\wp }({\textbf{p}}+\mathbf {\omega }|{\textbf{p}})||{\tilde{\mu }})\) where \({\tilde{\mu }}\) is a uniform probability density in the momentum space. Plugging all these expressions into (1) and letting \(\delta S_t=0\) with respect to \(\tilde{\wp }\), one obtains

$$\begin{aligned} \tilde{\wp }({\textbf{p}}+\mathbf {\omega }|{\textbf{p}}) = \frac{1}{Z'}e^{-\frac{\Delta t}{m\hbar }\mathbf {\omega }\cdot \mathbf {\omega }}, \end{aligned}$$

where \(Z'\) is the normalization factor. The variance is \(\langle \omega _i^2 \rangle =\langle (\Delta p_i)^2\rangle = m\hbar /2\Delta t\), where i is the spatial index. This is also a Gaussian distribution, but with a significant difference from (4) in the position representation: when \(\Delta t\rightarrow 0\), \(\langle (\Delta p_i)^2\rangle \rightarrow \infty\) while \(\langle (\Delta x_i)^2\rangle \rightarrow 0\). This implies that when \(\Delta t\rightarrow 0\), the Gaussian distribution \(\tilde{\wp }\) becomes a uniform distribution. Noting that \(\Delta p_i\Delta t=m\Delta x_i\), rearranging \(\langle (\Delta p_i)^2\rangle = m\hbar /2\Delta t\) gives the same uncertainty relation as in (5).
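The variance and the resulting uncertainty product can be checked numerically. The sketch below integrates the Gaussian weight \(e^{-\Delta t\,\omega ^2/m\hbar }\) for one momentum component with a simple Riemann sum and then uses the relation \(\Delta p_i\Delta t=m\Delta x_i\) quoted above. The numerical values of \(\hbar\), m and \(\Delta t\), as well as the function name, are arbitrary choices for illustration.

```python
import math

HBAR = 1.0   # natural units, chosen only for illustration
MASS = 2.0
DT = 0.05    # short time step

def momentum_fluctuation_variance(dt, n=20001, span=12.0):
    """Variance of one component of omega under the weight exp(-dt*omega^2/(m*hbar))."""
    scale = math.sqrt(MASS * HBAR / dt)           # natural width of the Gaussian weight
    step = 2.0 * span * scale / (n - 1)
    norm = 0.0
    second_moment = 0.0
    for k in range(n):
        w = -span * scale + k * step
        weight = math.exp(-dt * w * w / (MASS * HBAR))
        norm += weight
        second_moment += w * w * weight
    return second_moment / norm

var_p = momentum_fluctuation_variance(DT)
var_x = var_p * DT**2 / MASS**2                   # uses Delta p_i * Delta t = m * Delta x_i
print("variance of Delta p:", var_p, "expected:", MASS * HBAR / (2 * DT))
print("Delta x * Delta p  :", math.sqrt(var_x * var_p), "expected:", HBAR / 2)
```

Calling the function with a smaller time step returns a larger variance, which is the broadening toward a uniform distribution described in the text.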

For illustration purposes, we only derive the momentum representation of the Schrödinger equation for a free particle. Let \(\varrho ({\textbf{p}}, t)\) be the probability density in the momentum representation. The classical action is

$$\begin{aligned} S_c = \int \varrho ({\textbf{p}}, t)\{\frac{\partial S}{\partial t} + \frac{{\textbf{p}}\cdot {\textbf{p}}}{2m}\}d^3{\textbf{p}}dt. \end{aligned}$$

\(I_f\) is defined similarly to (9) as

$$\begin{aligned} I_f =: \sum _{j=0}^{N-1}\langle D_{KL}(\varrho ({\textbf{p}}, t_j) || \varrho ({\textbf{p}}+\mathbf {\omega }, t_j))\rangle _{{\tilde{w}}}. \end{aligned}$$
(15)

However, when \(\Delta t\rightarrow 0\), \(\tilde{\wp }({\textbf{p}}+\mathbf {\omega }|{\textbf{p}})\) becomes a uniform distribution and \(I_f \rightarrow \infty\) independently of \(\varrho\), as shown in Appendix E. This implies that \(I_f\) does not contribute when taking the variation with respect to \(\varrho\). Thus,

$$\begin{aligned} \delta I_t = \delta \int \varrho ({\textbf{p}}, t)\{\frac{2}{\hbar }\frac{\partial S}{\partial t} + \frac{{\textbf{p}}\cdot {\textbf{p}}}{m\hbar }\}d^3{\textbf{p}}dt. \end{aligned}$$
(16)

Variation with respect to \(\varrho\) gives

$$\begin{aligned} \frac{\partial (S/\hbar )}{\partial t} + \frac{{\textbf{p}}\cdot {\textbf{p}}}{2m\hbar } = 0, \end{aligned}$$

and variation with respect to S gives \(\partial \varrho /\partial t = 0\). Defining \(\psi =\sqrt{\varrho }e^{i(S/\hbar )}\), the two differential equations are combined into a single differential equation,

$$\begin{aligned} i\hbar \frac{\partial \psi }{\partial t} = \frac{{\textbf{p}}\cdot {\textbf{p}}}{2m}\psi , \end{aligned}$$
(17)

which is the Schrödinger equation for a free particle in the momentum representation. Recall that in the position representation, the Schrödinger equation for a free particle is \(i\hbar \partial \Psi /\partial t = [-(\hbar ^2/2m)\nabla ^2]\Psi\). The two equations are derived independently from the variation of dynamics information defined in (1). Assumption 3 demands that the two equations must be equivalent. To meet this requirement, one sufficient condition is that the two wavefunctions are related through the transformation

$$\begin{aligned} \Psi ({\textbf{x}}, t) = (\frac{1}{\sqrt{2\pi \hbar }})^3\int e^{i{\textbf{p}}\cdot {\textbf{x}}/\hbar }\psi ({\textbf{p}}, t)d^3{\textbf{p}}. \end{aligned}$$
(18)

This transformation justifies the introduction of operator \({\hat{p}}_i=:-i\hbar \partial /\partial x_i\) to represent momentum in the position representation, because using (18), one can verify that the expectation value of momentum \(\langle \psi ({\textbf{p}}, t)|p_i|\psi ({\textbf{p}}, t)\rangle\) can be computed as \(\langle \Psi ({\textbf{x}}, t) |{\hat{p}}_i|\Psi ({\textbf{x}}, t)\rangle\). Introduction of the momentum operator \({\hat{p}}_i=:-i\hbar \partial /\partial x_i\) leads to the commutation relation \([{\hat{x}}_i, {\hat{p}}_i]=i\hbar\).
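The verification mentioned above can be sketched in a few lines. This is a standard Fourier-transform argument written out for convenience, assuming the wavefunctions are normalizable and using the identity \(\int d^3{\textbf{x}}\, e^{i({\textbf{p}}-{\textbf{p}}')\cdot {\textbf{x}}/\hbar } = (2\pi \hbar )^3\delta ^3({\textbf{p}}-{\textbf{p}}')\); the primed integration variable \({\textbf{p}}'\) is introduced here only for this purpose.

$$\begin{aligned} \langle \Psi |{\hat{p}}_i|\Psi \rangle&= \int d^3{\textbf{x}}\, \Psi ^*({\textbf{x}}, t)(-i\hbar \frac{\partial }{\partial x_i})\Psi ({\textbf{x}}, t) \\&= \frac{1}{(2\pi \hbar )^3}\int d^3{\textbf{p}}\, d^3{\textbf{p}}'\, \psi ^*({\textbf{p}}', t)\, p_i\, \psi ({\textbf{p}}, t)\int d^3{\textbf{x}}\, e^{i({\textbf{p}}-{\textbf{p}}')\cdot {\textbf{x}}/\hbar } \\&= \int d^3{\textbf{p}}\, p_i\, |\psi ({\textbf{p}}, t)|^2. \end{aligned}$$

The second line follows from inserting (18) for both \(\Psi\) and \(\Psi ^*\), since \(-i\hbar \partial /\partial x_i\) acting on \(e^{i{\textbf{p}}\cdot {\textbf{x}}/\hbar }\) simply returns the factor \(p_i\).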

Suppose in the momentum representation there is a different action unit \(\hbar _p \ne \hbar\). Repeating the same variation procedure gives a Schrödinger equation for a free particle

$$\begin{aligned} i\hbar _p\frac{\partial \psi }{\partial t} = \frac{{\textbf{p}}\cdot {\textbf{p}}}{2m}\psi . \end{aligned}$$

To satisfy Assumption 3, the transformation function (18) needs to be modified as

$$\begin{aligned} \Psi ({\textbf{x}}, t) = (\frac{1}{\sqrt{2\pi \hbar }})^3\int e^{i{\textbf{p}}\cdot {\textbf{x}}/(\sqrt{\beta }\hbar )}\psi ({\textbf{p}}, t)d^3{\textbf{p}}, \end{aligned}$$

where \(\beta = \hbar _p/\hbar\). Consequently, \([{\hat{x}}_i, {\hat{p}}_i]=i\hbar \sqrt{\beta }\). It is clear that the assumption of having a different constant \(\hbar _p \ne \hbar\) in momentum representation is incompatible with the well established Dirac commutation relation \([{\hat{x}}_i, {\hat{p}}_i]=i\hbar\). By accepting \([{\hat{x}}_i, {\hat{p}}_i]=i\hbar\), one must reject \(\hbar _p \ne \hbar\).
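The factor \(\sqrt{\beta }\) can be traced with a short consistency check. As a sketch, suppose the exponent of the transformation carries an unknown action scale \(\kappa\) (a symbol introduced here only for illustration), so that \(\Psi ({\textbf{x}}, t) \propto \int e^{i{\textbf{p}}\cdot {\textbf{x}}/\kappa }\psi ({\textbf{p}}, t)d^3{\textbf{p}}\). Applying each side of the free-particle Schrödinger equation in the position representation, and using the momentum-representation equation with \(\hbar _p\) for the time derivative of \(\psi\), gives

$$\begin{aligned} i\hbar \frac{\partial \Psi }{\partial t}&\propto \int e^{i{\textbf{p}}\cdot {\textbf{x}}/\kappa }\, \frac{\hbar }{\hbar _p}\frac{{\textbf{p}}\cdot {\textbf{p}}}{2m}\psi ({\textbf{p}}, t)d^3{\textbf{p}}, \\ -\frac{\hbar ^2}{2m}\nabla ^2\Psi&\propto \int e^{i{\textbf{p}}\cdot {\textbf{x}}/\kappa }\, \frac{\hbar ^2}{\kappa ^2}\frac{{\textbf{p}}\cdot {\textbf{p}}}{2m}\psi ({\textbf{p}}, t)d^3{\textbf{p}}. \end{aligned}$$

The two expressions agree for arbitrary \(\psi\) only if \(\kappa ^2 = \hbar \hbar _p = \beta \hbar ^2\), i.e., \(\kappa = \sqrt{\beta }\hbar\). The momentum operator in the position representation then becomes \(-i\sqrt{\beta }\hbar \partial /\partial x_i\), which is where the modified commutator comes from.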

Deriving the Schrödinger equation, from the least observability principle, in the momentum representation with an external potential \(V({\textbf{x}})\ne 0\) is a much more complicated task. However, the theory for a free particle is sufficient to demonstrate why the Planck constant must be the same in both position and momentum representations.

4 The Generalized Schrödinger Equation

The term \(I_f\) is supposed to capture the additional distinguishability exhibited by the vacuum fluctuations, and is defined in (9) as the summation of the expectation values of Kullback–Leibler divergence between \(\rho ({\textbf{x}},t)\) and \(\rho ({\textbf{x}}+{\textbf{w}},t)\). However, there are more generic definitions of relative entropy, such as the Rényi divergence [51, 53]. From an information theoretic point of view, there is no reason to exclude alternative definitions of relative entropy. Suppose we define \(I_f\) based on Rényi divergence,

$$\begin{aligned} I_f^{\alpha }&=: \sum _{j=0}^{N-1}\langle D_R^{\alpha }(\rho ({\textbf{x}}, t_j) || \rho ({\textbf{x}}+{\textbf{w}}, t_j))\rangle _w \end{aligned}$$
(19)
$$\begin{aligned}&=\sum _{j=0}^{N-1}\int d^3{\textbf{w}} \wp ({\textbf{w}})\frac{1}{\alpha -1}\ln (\int d^3{\textbf{x}} \frac{\rho ^{\alpha }({\textbf{x}}, t_j)}{\rho ^{\alpha -1}({\textbf{x}}+{\textbf{w}}, t_j)}). \end{aligned}$$
(20)

Parameter \(\alpha \in (0,1)\cup (1, \infty )\) is called the order of Rényi divergence. When \(\alpha \rightarrow 1\), \(I_f^{\alpha }\) converges to \(I_f\) as defined in (9). In Appendix D, we show that using \(I_f^{\alpha }\) and following the same variation principle, we arrive at a similar extended Hamilton-Jacobi equation as (13),

$$\begin{aligned} \frac{\partial S}{\partial t} + \frac{1}{2m}\nabla S\cdot \nabla S + V - \frac{\alpha \hbar ^2}{2m}\frac{\nabla ^2\sqrt{\rho }}{\sqrt{\rho }} = 0, \end{aligned}$$
(21)

with an additional coefficient \(\alpha\) appearing in the Bohm quantum potential term. Defining \(\Psi ^\prime =\sqrt{\rho }e^{iS/(\sqrt{\alpha }\hbar )}\), the continuity equation and the extended Hamilton-Jacobi equation (21) can be combined into an equation similar to the Schrödinger equation (see Appendix D),

$$\begin{aligned} i\sqrt{\alpha }\hbar \frac{\partial \Psi ^\prime }{\partial t} = [-\frac{\alpha \hbar ^2}{2m}\nabla ^2 + V]\Psi ^\prime . \end{aligned}$$
(22)

When \(\alpha =1\), the regular Schrödinger equation is recovered as expected. Equation (22) gives a family of linear equations, one for each order of Rényi divergence.

Interestingly, if we define \(\hbar _{\alpha }= \sqrt{\alpha }\hbar\), then \(\Psi ^\prime =\sqrt{\rho }e^{iS/\hbar _{\alpha }}\), and (22) takes the same form as the regular Schrödinger equation with \(\hbar\) replaced by \(\hbar _{\alpha }\). It is as if there is an intrinsic relation between the order of Rényi divergence and the Planck constant. This remains to be investigated further. On the other hand, if the wavefunction is defined as usual without the factor \(\sqrt{\alpha }\), \(\Psi ^\prime =\sqrt{\rho }e^{iS/\hbar }\), it will result in a nonlinear Schrödinger equation. This implies that the linearity of the Schrödinger equation depends on how the wavefunction is defined from the pair of real variables \((\rho , S)\).
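The nonlinear form can be made explicit with a short sketch; the rearrangement below is written out here for illustration and is not quoted from Appendix D. With the usual definition \(\Psi =\sqrt{\rho }e^{iS/\hbar }\), Eq. (21) can be read as the extended Hamilton-Jacobi equation (13) with V replaced by an effective potential \(V + (1-\alpha )(\hbar ^2/2m)\nabla ^2\sqrt{\rho }/\sqrt{\rho }\). Since the continuity equation is unchanged, combining the two equations in the same way as for (14) gives

$$\begin{aligned} i\hbar \frac{\partial \Psi }{\partial t} = [-\frac{\hbar ^2}{2m}\nabla ^2 + V + (1-\alpha )\frac{\hbar ^2}{2m}\frac{\nabla ^2|\Psi |}{|\Psi |}]\Psi , \end{aligned}$$

where \(|\Psi |=\sqrt{\rho }\). The extra term depends on \(\Psi\) itself, so the equation is nonlinear unless \(\alpha =1\).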

We also want to point out that \(I_f^{\alpha }\) can be defined using Tsallis divergence [52, 54] as well, instead of using the Rényi divergence,

$$\begin{aligned} \begin{aligned} I_f^{\alpha }&=: \sum _{j=0}^{N-1}\langle D_T^{\alpha }(\rho ({\textbf{x}}, t_j) || \rho ({\textbf{x}}+{\textbf{w}}, t_j))\rangle _w \\&=\sum _{j=0}^{N-1}\int d^3{\textbf{w}}\wp ({\textbf{w}})\frac{1}{\alpha -1}\{\int d^3{\textbf{x}}\frac{\rho ({\textbf{x}}, t_j)^{\alpha }}{\rho ({\textbf{x}}+{\textbf{w}}, t_j)^{\alpha -1}} -1\}. \end{aligned} \end{aligned}$$
(23)

When \(\Delta t\rightarrow 0\), it can be shown that the \(I_f^\alpha\) defined above converges to the same form as (D3). Hence it results in the same generalized Schrödinger equation (22).
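The common small-displacement behavior of the three divergences can be illustrated numerically. The sketch below uses a one-dimensional Gaussian density and a small shift w, both chosen only for illustration, and compares the Kullback–Leibler, Rényi and Tsallis divergences between \(\rho (x)\) and \(\rho (x+w)\). For small w, the Rényi and Tsallis values are both close to \(\alpha\) times the Kullback–Leibler value, which is consistent with the coefficient \(\alpha\) appearing in (21). All function names are hypothetical.

```python
import math

SIGMA = 1.0                      # width of the illustrative Gaussian density
STEP = 0.001
GRID = [-12.0 + STEP * k for k in range(24001)]

def gaussian(x):
    """Normalized 1D Gaussian density centered at zero."""
    return math.exp(-x * x / (2 * SIGMA**2)) / math.sqrt(2 * math.pi * SIGMA**2)

def kl_divergence(shift):
    """D_KL(rho(x) || rho(x + w)) by a Riemann sum over GRID."""
    return STEP * math.fsum(
        gaussian(x) * math.log(gaussian(x) / gaussian(x + shift)) for x in GRID)

def alpha_integral(shift, alpha):
    """Integral of rho(x)^alpha / rho(x + w)^(alpha - 1) over x."""
    return STEP * math.fsum(
        gaussian(x) ** alpha / gaussian(x + shift) ** (alpha - 1) for x in GRID)

def renyi_divergence(shift, alpha):
    return math.log(alpha_integral(shift, alpha)) / (alpha - 1)

def tsallis_divergence(shift, alpha):
    return (alpha_integral(shift, alpha) - 1) / (alpha - 1)

w = 0.01
kl = kl_divergence(w)
for alpha in (0.5, 2.0, 3.0):
    print(alpha, renyi_divergence(w, alpha) / kl, tsallis_divergence(w, alpha) / kl)
```

Each printed ratio should be close to the corresponding \(\alpha\). The Tsallis ratio deviates slightly more because the logarithm in the Rényi definition is replaced by its first-order expansion.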

5 Locality of Vacuum Fluctuations: Insights on Entanglement

Now we apply the least observability principle to a bipartite system. The ensemble average of classical action for the bipartite system is given by

$$\begin{aligned} \begin{aligned} S_c =&\int \rho ({\textbf{x}}_a, {\textbf{x}}_b, t)\{ \frac{\partial S}{\partial t} + \frac{1}{2m_a}\nabla _a S\cdot \nabla _a S \\&+ \frac{1}{2m_b}\nabla _b S\cdot \nabla _b S + V\} d^3{\textbf{x}}_ad^3{\textbf{x}}_bdt. \end{aligned} \end{aligned}$$
(24)

In addition, we need to consider the information metric \(I_f\) for the bipartite system due to fluctuations. One of the key points in Assumption 1 is the locality of the vacuum fluctuations. The fluctuations experienced by particle A are completely independent of the fluctuations experienced by particle B. Formally, the locality of vacuum fluctuations can be defined by the separability of the joint transition probability of the bipartite system,

$$\begin{aligned} \begin{aligned} \wp ({\textbf{x}}_a&+{\textbf{w}}_a, {\textbf{x}}_b+{\textbf{w}}_b, t_j|{\textbf{x}}_a,{\textbf{x}}_b, t_j) = \\&\wp _a({\textbf{x}}_a+{\textbf{w}}_a, t_j|{\textbf{x}}_a, t_j)\wp _b({\textbf{x}}_b+{\textbf{w}}_b, t_j|{\textbf{x}}_b, t_j). \end{aligned} \end{aligned}$$
(25)

Extend the definition of \(I_f\) in (9) to the bipartite system:

$$\begin{aligned} I_f =: \sum _{j=0}^{N-1}\langle D_{KL}(\rho ({\textbf{x}}_a,{\textbf{x}}_b, t_j) || \rho ({\textbf{x}}_a+{\textbf{w}}_a, {\textbf{x}}_b+{\textbf{w}}_b, t_j))\rangle _w. \end{aligned}$$
(26)

Using (25), we show in Appendix F that when \(\Delta t \rightarrow 0\),

$$\begin{aligned} I_f = \int d^3{\textbf{x}}_ad^3{\textbf{x}}_bdt\{\frac{\hbar }{4m_a}\frac{\nabla _a\rho \cdot \nabla _a\rho }{\rho } + \frac{\hbar }{4m_b}\frac{\nabla _b\rho \cdot \nabla _b\rho }{\rho }\}. \end{aligned}$$
(27)

Variation of \(I_f\) with respect to \(\rho\) gives the Bohm quantum potential for the bipartite system, as shown in (F6) of Appendix F,

$$\begin{aligned} Q = - \frac{\hbar ^2}{2m_a}\frac{\nabla _a^2\sqrt{\rho }}{\sqrt{\rho }} - \frac{\hbar ^2}{2m_b}\frac{\nabla _b^2\sqrt{\rho }}{\sqrt{\rho }}. \end{aligned}$$
(28)

The interesting finding here is that even though the vacuum fluctuations for the two subsystems are independent from each other, \(I_f\) and the Bohm potential are inseparable in general. The inseparability depends on the inseparability of the initial condition \(\rho ({\textbf{x}}_a, {\textbf{x}}_b, 0)\). This suggests that there is no need for a non-local mechanism underlying the inseparability of the Bohm quantum potential.
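The dependence of the inseparability on the joint density can be illustrated with a toy example. The sketch below evaluates (28) by finite differences for two particles in one dimension each, with a correlated Gaussian joint density whose correlation coefficient c is a free parameter. The Gaussian form, the units, and the function names are assumptions made only for illustration. The quantity `separability_defect` vanishes whenever Q can be written as a sum of a function of \(x_a\) and a function of \(x_b\).

```python
import math

HBAR = 1.0   # natural units (assumption for illustration)
M_A = 1.0
M_B = 1.0

def sqrt_rho(xa, xb, c):
    """Square root of an unnormalized correlated Gaussian joint density; c is the correlation."""
    a = 1.0 / (2.0 * (1.0 - c * c))
    return math.exp(-0.5 * a * (xa * xa + xb * xb - 2.0 * c * xa * xb))

def bohm_potential(xa, xb, c, h=1e-3):
    """Bipartite Bohm potential of Eq. (28), using central differences in xa and xb."""
    r0 = sqrt_rho(xa, xb, c)
    d2a = (sqrt_rho(xa + h, xb, c) - 2 * r0 + sqrt_rho(xa - h, xb, c)) / (h * h)
    d2b = (sqrt_rho(xa, xb + h, c) - 2 * r0 + sqrt_rho(xa, xb - h, c)) / (h * h)
    return -(HBAR**2 / (2 * M_A)) * d2a / r0 - (HBAR**2 / (2 * M_B)) * d2b / r0

def separability_defect(c, p=(0.3, -0.4), q=(1.1, 0.8)):
    """Q(x1,y1) + Q(x2,y2) - Q(x1,y2) - Q(x2,y1); zero if Q = Q_a(xa) + Q_b(xb)."""
    (x1, y1), (x2, y2) = p, q
    return (bohm_potential(x1, y1, c) + bohm_potential(x2, y2, c)
            - bohm_potential(x1, y2, c) - bohm_potential(x2, y1, c))

print("product density (c = 0)     :", separability_defect(0.0))
print("correlated density (c = 0.6):", separability_defect(0.6))
```

For c = 0 the joint density factorizes and the defect is zero up to numerical error, while for nonzero c it is not. The Laplacians in (28) act locally on each particle's coordinate in both cases; only the joint density differs.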

The Schrödinger equation of the bipartite system is derived in Appendix F as

$$\begin{aligned} i\hbar \frac{\partial \Psi }{\partial t} = [-\frac{\hbar ^2}{2m_a}\nabla _a^2 -\frac{\hbar ^2}{2m_b}\nabla _b^2 + V]\Psi , \end{aligned}$$
(29)

where \(\Psi ({\textbf{x}}_a, {\textbf{x}}_b, t) = \sqrt{\rho ({\textbf{x}}_a, {\textbf{x}}_b, t)}e^{iS({\textbf{x}}_a, {\textbf{x}}_b, t)/\hbar }\). Suppose there is no interaction between the two subsystems after \(t=0\) but the initial joint probability density at \(t=0\) is inseparable. Then \(\Psi ({\textbf{x}}_a, {\textbf{x}}_b, t)\) is an entangled state for \(t>0\). Such an entangled state can be maintained and manifested even though the two non-interacting subsystems move away from each other. Similar to the inseparability of the Bohm potential, the inseparable correlation is maintained through \(I_f\), but the underlying vacuum fluctuations are local for the two subsystems. This suggests that an inseparable correlation can be propagated through a mechanism that is local. The implication of locality of vacuum fluctuations on entanglement deserves further analysis and discussion.

6 Discussion and conclusions

6.1 Implications of Assumption 2

The interpretation of the Planck constant as the discrete action unit for the degree of observability reflects a fundamental physical limit. That is, there is a lower limit to the action effort needed to exhibit observable information about the dynamical behavior of a physical system. A smaller action effort will not be observable: information exhibited by an action unit smaller than \(\hbar /2\) is indistinguishable and unobservable. In other words, the Planck constant determines the resolution (in terms of action) of the observable information for the dynamical behavior of a physical system. Historically, the Planck constant was first introduced to show that the energy of radiation from a black body is discrete. One can consider the discrete energy unit as the smallest unit to be distinguished, or detected, in the black body radiation phenomenon. Here, we just interpret the Planck constant from an information acquisition point of view. Interestingly, the postulate in special relativity that the speed of light in vacuum is constant in all inertial reference frames reflects another limit for information propagation. As pointed out by Landau [40], the constant speed of light is actually a consequence of a fundamental physical limit, namely that there is a maximum speed for any interaction between two systems. The speed limit also implies that propagation of physical information is not instantaneous, because information is propagated through physical media such as light. Thus, the Planck constant and the speed of light each manifest a physical limit from an information processing point of view, but from a different angle.

As mentioned in the introduction section, the definition of \(I_p=2S_c/\hbar\) in Assumption 2 should not be associated with the phase of the probability amplitude for a trajectory of a particle in Feynman’s path integral [1]. Fundamentally, the path integral theory does not interpret the Planck constant as the quantum of action effort to exhibit observable information. The difference of the factor 2 is purely mathematical, since in the path integral \(S/\hbar\) is associated with the probability amplitude, whereas our formulation deals with the probability density, which is the modulus square of the probability amplitude. Nevertheless, both the path integral and our formulation based on the least observability principle give the same Schrödinger equation. This is because both theories start with the contribution of the classical path, then add the additional contributions due to vacuum fluctuations, but in different ways. In the path integral, the summation of \(e^{iS/\hbar }\) from all possible paths for the probability amplitude effectively collects the contributions due to vacuum fluctuations. On the other hand, in the least observability principle, the effect of vacuum fluctuations is manifested through the summation of the Kullback–Leibler divergence as defined in (9).

6.2 Alternative Formulations of the Least Observability Principle

Alternatively, we can interpret the least observability principle based on Eq. (1) as minimizing \(I_f\) with the constraint of \(S_c\) being a constant, and \(\hbar /2\) simply being a Lagrange multiplier for such a constraint. Mathematically, this is an equivalent formulation. In that case, Assumption 2 is not needed. Instead, it is replaced by the assumption that the average action \(S_c\) is a constant with respect to variations over \(\rho\) and S. Which assumption to use depends on which choice is more physically intuitive. We believe that the least observability principle based on Assumption 2, where the Planck constant defines the discrete unit of action effort to exhibit observable information, gives a more intuitive physical meaning to the mathematical formalism.
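The multiplier reading can be illustrated on a discretized toy version of the short-time step. In the sketch below, the displacement w takes values on a finite grid, the reference density is uniform, and the kinetic action of a displacement is taken as \(mw^2/2\Delta t\); the grid, the units, and all names are assumptions made for illustration. The weighted sum of the average action (with weight \(2/\hbar\)) and the Kullback–Leibler divergence is minimized by an exponential, Gibbs-like distribution, and random perturbations of that distribution never lower the objective.

```python
import math
import random

HBAR, MASS, DT = 1.0, 1.0, 0.1                     # illustrative values only

ws = [-3.0 + 0.05 * k for k in range(121)]         # grid of displacements w
mu = [1.0 / len(ws)] * len(ws)                     # uniform reference distribution
action = [MASS * w * w / (2 * DT) for w in ws]     # kinetic action of each displacement

def objective(p):
    """(2/hbar) * <action> + D_KL(p || mu) for a discrete distribution p."""
    total = 0.0
    for pi, ai, mi in zip(p, action, mu):
        if pi > 0.0:
            total += pi * (2.0 * ai / HBAR + math.log(pi / mi))
    return total

def gibbs_solution():
    """Stationary distribution p_i proportional to mu_i * exp(-2 * action_i / hbar)."""
    weights = [mi * math.exp(-2.0 * ai / HBAR) for ai, mi in zip(action, mu)]
    z = sum(weights)
    return [wt / z for wt in weights]

def perturbed(p, rng, scale=0.2):
    """Multiply each weight by a random positive factor, then renormalize."""
    q = [pi * math.exp(scale * rng.uniform(-1.0, 1.0)) for pi in p]
    z = sum(q)
    return [qi / z for qi in q]

p_star = gibbs_solution()
f_star = objective(p_star)
rng = random.Random(0)
worst = min(objective(perturbed(p_star, rng)) for _ in range(100))
variance = sum(pi * w * w for pi, w in zip(p_star, ws))
print("objective at Gibbs solution :", f_star)
print("lowest perturbed objective  :", worst)
print("variance of w:", variance, "expected:", HBAR * DT / (2 * MASS))
```

Changing the weight \(2/\hbar\) changes the average action of the minimizer, which is how a fixed value of \(S_c\) would select the multiplier in the constrained formulation.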

6.3 Comparisons with Relevant Research Works

In the original paper on Relational Quantum Mechanics (RQM) [6], Rovelli proposes two postulates from the information perspective. The first postulate, that there is a maximum amount of relevant information that can be extracted from a system, is in the same spirit as Assumption 2. Rovelli has pointed out that his first postulate implies the existence of the Planck constant. But the reconstruction effort of quantum theory in [6] does not define the meaning of information or how \(\hbar\) is used to compute the amount of information. Here we reverse the logic of the argument in [6]. We make explicit mathematical connections between \(\hbar\) and the degree of observability in (1), leading to the least observability principle to reconstruct quantum mechanics. Conceptually, we make clearer the connection between the Planck constant and the discreteness of action effort to exhibit observable information. The second postulate in [6], that it is always possible to acquire new information about a system, is motivated by the need to explain complementarity in quantum theory [30, 31]. This postulate appears quite counterintuitive. It is not needed in our theory in terms of explaining complementarity. Instead, we assume there is no preferred representation for physical laws, which is more intuitive. The no preferred representation assumption allows us to derive the transformation formulation between position and momentum representations, and consequently the commutation relation \([{\hat{x}}_i, {\hat{p}}_i]=i\hbar\). Other authors have proposed postulates similar to the no preferred representation assumption, such as no preferred measurement [33] and no preferred reference frame [32], but in very different contexts.

The entropic dynamics approach to quantum mechanics [34, 35] bears some similarity with the theory presented in this work. For instance, the formulations are carried out with two steps, an infinitesimal time step and a cumulative time period. It also aims to derive the physical dynamics by extremizing entropy. However, the entropic dynamics approach relies on another postulate on energy conservation to complete the derivation of the Schrödinger equation. The theory presented in this paper has the advantage of simplicity since it recursively applies the same least observability principle in both an infinitesimal time step and a cumulative time period. The entropic dynamics approach also requires several seemingly arbitrary constants in the formulation, while we only need the Planck constant \(\hbar\) and its meaning is clearly given in Assumption 2.

The derivation of the Schrödinger equation in Sect. 3.2 starts from (8), which is due to Hall and Reginatto [43, 44]. Mathematically, we arrive at the same extended Hamilton-Jacobi equation (13) as that in [43, 44]. However, the underlying physical foundation is very different. Hall and Reginatto assume an exact uncertainty relation (5), while in our theory (5) is derived from the least observability principle in an infinitesimal time step. We clearly show the information origin of the Bohm potential, while Hall and Reginatto derive it by assuming random fluctuations in momentum space and the exact uncertainty relation. We also use the general definition of relative entropy for the information metric \(I_f\) and obtain the generalized Schrödinger equation, which is not possible using the methods presented in [43, 44].

6.4 Limitations

Assumption 1 makes minimal assumptions on the vacuum fluctuations, but does not provide a more concrete physical model for the vacuum fluctuations. The underlying physics for the vacuum fluctuations is expected to be complex but crucial for a deeper understanding of quantum mechanics. It is beyond the scope of this paper. The intention here is to minimize the assumptions that are needed to derive the basic formulation of quantum mechanics, so that future research can just focus on these assumptions.

Another limitation is that the Schrödinger equation in the momentum representation is only derived for a free particle. In the case that the external potential exists, the derivation will be complicated. We will leave it for future research. Thus, Assumption 3 is only applied in the case of a free particle. It remains to be confirmed if it is applicable for generic cases with external potential. However, for the purpose of demonstrating why the Planck constant must be the same in both position and momentum representation, we only need a special case of a free particle.

6.5 Conclusions

We propose an extended least action principle, or least observability principle, to demonstrate how classical mechanics becomes quantum mechanics from the information perspective. The least observability principle extends the least action principle by factoring in two assumptions. Assumption 2 states that the Planck constant defines the lower limit to the amount of action that a physical system needs to exhibit in order to be observable. Classical mechanics corresponds to a physical theory when such a lower limit of action effort is approximated as zero. The existence of the Planck constant allows us to quantify the physical intuition that the action effort is also associated with the observability of the system dynamics. New information metrics for the additional degree of distinguishability exhibited from vacuum fluctuations are introduced. These metrics are defined in terms of relative entropy to measure the information distances of different probability distributions caused by local vacuum fluctuations. To derive quantum theory, the least observability principle seeks to minimize the degree of observability from both the classical trajectory and the vacuum fluctuations. Nature appears to behave in the least distinguishable way possible for future observation. This principle allows us to elegantly derive the uncertainty relation between position and momentum, and the Schrödinger equations in both position and momentum representations. Adding the no preferred representation assumption, we obtain the transformation formulation between position and momentum representations. The Planck constant must be the same in different representations in order to be compatible with the Dirac commutation relation between position and momentum.

The information metric \(I_f\) is responsible for the origin of the Bohm quantum potential. The Bohm potential is widely considered as non-local for a bipartite system. We have shown that such non-locality just reflects the inseparability of the information metrics \(I_f\) for the bipartite system. Interestingly, the inseparability of \(I_f\) is preserved and manifested through a local mechanism - the vacuum fluctuations. Thus, even though the Bohm potential is inseparable for a bipartite system, there is no non-local causal relation between the two subsystems.

Utilizing Rényi divergence in the least observability principle leads to a generalized Schrödinger equation (22) that depends on the order of Rényi divergence. Given the extensive experimental confirmations of the normal Schrödinger equation, it is inconceivable that one will find physical scenarios for which the generalized Schrödinger equation with \(\alpha \ne 1\) is applicable. However, the generalized Schrödinger equation is legitimate from an information perspective. It confirms that the least observability principle can produce new results.

Extending the least action principle in classical mechanics to the least observability principle in quantum mechanics not only illustrates clearly how classical mechanics becomes quantum mechanics, but also opens up a new mathematical toolbox. It can be applied to field theory to obtain the Schrödinger functional equation for a massive scalar field [55]. We expect other advanced quantum formulations, such as the Pauli equation for an electron with spin, can be obtained from it. Lastly, the principle brings in interesting implications on the interpretation aspects of quantum mechanics, including new insights on quantum entanglement, which will be reported separately. The perception on the speed of light and Planck constant from information acquisition is also intriguing. That is, the speed of light is the upper limit of propagating information, while the Planck constant is considered as the lower limit of action effort to exhibit observable information of a physical system.