1 Introduction

Advances in quantum information and quantum computing [1, 2] in recent decades have inspired active research into new foundational principles for quantum mechanics from an information perspective [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]. Reformulating quantum mechanics based on information principles can bring new conceptual insights to unresolved challenges in current quantum theory. For instance, is the probability amplitude, or wavefunction, just a mathematical tool, or is it associated with an ontic physical property? Does quantum entanglement imply a non-local causal connection among entangled objects? With this motivation, an extended least action principle was recently proposed to derive the formalism of non-relativistic quantum mechanics [36]. The principle can be understood as extending the least action principle of classical mechanics by minimizing proper information measures. This is achieved by factoring in two assumptions. First, there is a lower limit to the amount of action a physical system needs to exhibit in order to be observable. Such a discrete action unit is defined by the Planck constant. It serves as a basic unit to measure the observable information from the action a physical system exhibits during its motion. Second, there are vacuum fluctuations that are completely random. New information metrics are introduced to measure the additional distinguishability, or observable information, due to these random fluctuations, which is then converted to an additional amount of action using the first assumption. Applying the variational principle to minimize the total action recovers the basic formulation of non-relativistic quantum mechanics. In addition, a family of generalized Schrödinger equations for the wave functional is obtained by defining the information metrics for vacuum fluctuations using generic relative entropy definitions.

The goal of this paper is to apply the same principle to relativistic quantum field theory. Specifically, we apply the extended least action principle to derive the quantum field theory of massive scalar fields. Notably, we find that the only adjustment needed is to replace the assumption of random vacuum fluctuations in the non-relativistic setting with random field fluctuations in the relativistic setting. By recursively applying the extended least action principle, we are able to derive the transition probability density of the field fluctuations, the uncertainty relation, and most importantly, the Schrödinger equation of the wave functional for the scalar fields. The Schrödinger equation of the wave functional is the fundamental equation of quantum scalar field theory in the Schrödinger picture, and it is typically introduced as a postulate. Here we derive it from first principles. As in the non-relativistic quantum formalism, by relaxing the definition of the information metrics using generic relative entropy, we obtain a family of generalized Schrödinger equations. The application of such generalized Schrödinger equations needs further investigation, but the result shows the flexibility of the mathematical framework.

The Schrödinger picture offers several advantages compared to the standard Fock space description of scalar fields [37]. In particular, the Schrödinger wave functional gives an intrinsic description of the vacuum without reference to the spectrum of excited states, which is an inherent problem for the Fock space of states in curved spacetime [37]. It is also argued that the Schrödinger picture in field theory is the most natural representation from the viewpoint of canonical quantum gravity, where spacetime is usually decomposed into a spatial manifold evolving in time [39]. Having Schrödinger formulations in both non-relativistic quantum theory and relativistic quantum field theory allows us to understand the differences and similarities between the two theories. It may provide hints on applying certain concepts from one theory to the other. For instance, calculating information metrics such as the entanglement entropy of a quantum field is challenging [40]. In non-relativistic quantum mechanics, such a quantity for entangled systems is typically calculated with the help of the wave function. With the availability of the Schrödinger wave functional, one may find a similar method to calculate the entanglement entropy for a scalar field.

Extending the least action principle of classical mechanics to derive quantum theories not only shows clearly how classical mechanics becomes quantum mechanics, but also offers a powerful mathematical framework. As shown in this paper, the principle and mathematical framework allow us to derive the Schrödinger equation for the wave functional of the scalar field in a way very similar to that in the non-relativistic setting. Although the derivation is currently carried out in Minkowski spacetime, it should not be difficult to extend it to a curved spacetime. The extended least action principle also has interesting implications for the interpretation of quantum theory, which will be discussed in a separate report.

The rest of the article is organized as follows. First, we briefly overview the least action principle for the classical scalar field, since it is the starting point of the quantum formulation. Second, we review the underlying assumptions for the extended least action principle and what should be adjusted to apply the principle in the case of scalar fields. In Section 4 we apply the principle recursively to analyze the dynamics of field fluctuations, then derive the uncertainty relation and the Schrödinger equation for the wave functional. The Schrödinger equation is generalized in Section 5. We then conclude the article after comprehensive discussions and comparisons to previous relevant research works.

2 Classical Theory for Massive Scalar Fields

This section briefly reviews the classical theory of scalar fields, the canonical transformation, and the Hamilton-Jacobi equation. Consider a massive scalar field configuration \(\phi \). Here we denote the coordinates for a four-dimensional spacetime point x either by \(x=(x^{(0)}, x^{(i)})\) where \(i=\{1, 2, 3\}\), or by \(x=(t, \textbf{x})\) where \(\textbf{x}\) is a spatial point. The field component at a spacetime point x is denoted as \(\phi _x=\phi (x)\). The Lagrangian density for a massive scalar field is given by

$$\begin{aligned} \mathcal {L}= & {} \frac{1}{2}[\partial _{\mu }\phi (x)]^2 - \frac{1}{2}m^2[\phi (x)]^2\nonumber \\= & {} \frac{1}{2}[\dot{\phi }(x)]^2 - \frac{1}{2}([\nabla \phi (x)]^2+m^2[\phi (x)]^2). \end{aligned}$$
(1)

where \(\mu =\{0, 1, 2, 3\}\) and the Einstein summation convention is assumed. The first term \(\frac{1}{2}[\dot{\phi }(x)]^2\) resembles the kinetic energy density in Newtonian mechanics, while the second term is the potential energy density, denoted as \(V(\phi (x))\). The corresponding action functional is

$$\begin{aligned} A = \int d^4x\mathcal {L}. \end{aligned}$$
(2)

The momentum conjugate to the field is defined by

$$\begin{aligned} \pi (x) = \frac{\partial \mathcal {L}}{\partial (\partial _0\phi )}=\partial _0\phi (x) = \dot{\phi }(x). \end{aligned}$$
(3)

Applying the least action principle to minimize the action functional A, one obtains the Euler-Lagrange equation

$$\begin{aligned} \partial _{\mu }\partial ^{\mu }\phi + m^2\phi = 0, \end{aligned}$$
(4)

which is the Klein-Gordon equation for the massive scalar field.
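As a quick numerical illustration of the Klein-Gordon equation, the sketch below checks by finite differences that a plane wave with the dispersion relation \(E^2 = k^2 + m^2\) has a vanishing residual. It is only a sanity check under simplifying assumptions that are not part of the text: one spatial dimension, units with \(c=1\), the metric signature implied by (1), and hypothetical parameter values.

```python
import math

def kg_residual(m, k, t, x, h=1e-3):
    """Finite-difference residual of (d_t^2 - d_x^2 + m^2) phi for a plane wave.

    Assumes one spatial dimension, units with c = 1, and the trial solution
    phi(t, x) = cos(k x - E t) with E = sqrt(k^2 + m^2).
    """
    energy = math.sqrt(k * k + m * m)

    def phi(tt, xx):
        return math.cos(k * xx - energy * tt)

    # Central second differences in time and in space.
    d2t = (phi(t + h, x) - 2.0 * phi(t, x) + phi(t - h, x)) / (h * h)
    d2x = (phi(t, x + h) - 2.0 * phi(t, x) + phi(t, x - h)) / (h * h)
    return d2t - d2x + m * m * phi(t, x)

# Hypothetical sample point and parameters; the residual should be close to zero.
residual = kg_residual(m=1.5, k=0.7, t=0.3, x=-1.2)
```

The residual is limited by the truncation error of the central differences, so it is small but not exactly zero.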

Variables \((\phi , \pi )\) form a pair of canonical variables, and the corresponding Hamiltonian is constructed by a Legendre transform of the Lagrangian [37]

$$\begin{aligned} H[\phi ,\pi ]= & {} \int d^3\textbf{x}\{\pi (x)\dot{\phi }(x) - \mathcal {L}\}\nonumber \\= & {} \int d^3\textbf{x} \{\frac{1}{2}[\dot{\phi }(x)]^2 + V\}. \end{aligned}$$
(5)

Next we want to apply the canonical transformation technique in field theory. To do this, we need to choose a foliation of the spacetime into a succession of spacelike hypersurfaces. Here we only consider the Minkowski spacetime, and it is natural to choose these to be the hypersurfaces \(\Sigma _{t}\) of fixed t. The field configuration \(\phi \) for \(\Sigma _{t}\) can be understood as a vector with infinitely many components, one for each spatial point on the Cauchy hypersurface \(\Sigma _t\) at time instance t, denoted as \(\phi _{t,\textbf{x}}=\phi (t,\textbf{x})\). For simplicity of notation, we will still write \(\phi (t,\textbf{x})= \phi (x)\) for the rest of this paper, but \(\phi (x)\) should be understood as the field component \(\phi _{\textbf{x}}\) at each spatial point of the hypersurface \(\Sigma _{t}\) at time instance t. In Appendix A, we show that by an extended canonical transformation, the action functional of the field can be written as

$$\begin{aligned} A_c = \int dt \{\frac{\partial S}{\partial t} + H[\phi , \pi ]\}, \end{aligned}$$
(6)

where \(S[\phi , t]\) is a generating functional that satisfies the identity \(\pi (x) = \delta S / \delta \phi (x)\). A special solution to the least action principle for the above action functional is \(\partial S/\partial t + H = 0\). Substituting H from (5), we have

$$\begin{aligned} \frac{\partial S}{\partial t} + \int d^3\textbf{x} \{\frac{1}{2}[\dot{\phi }(x)]^2 + V(\phi (x))\} = 0. \end{aligned}$$
(7)

Since \(\dot{\phi }(x) = \pi (x) = \delta S / \delta \phi (x)\), the above equation can be rewritten as

$$\begin{aligned} \frac{\partial S}{\partial t} + \int d^3\textbf{x} \{\frac{1}{2}(\frac{\delta S}{\delta \phi (x)})^2 + V(\phi (x))\} = 0. \end{aligned}$$
(8)

This is the Hamilton-Jacobi equation for the scalar field that governs the evolution of the functional S between the spacelike hypersurfaces. It is equivalent to the Klein-Gordon equation (4).

As also shown in Appendix A, if the scalar field configuration \(\phi \) follows a probability distribution with probability density \(\rho [\phi ,t]\) on the hypersurface \(\Sigma _t\), the average value of the action functional is

$$\begin{aligned} S_c = \int \mathcal {D}\phi dt \, \rho \{\frac{\partial S}{\partial t} + \int d^3\textbf{x} [\frac{1}{2}(\frac{\delta S}{\delta \phi (x)})^2 + V(\phi (x))]\}. \end{aligned}$$
(9)

Note that \(S_c\) and S are different functionals: \(S_c\) can be considered the ensemble average of the classical action functional, while S is a generating functional introduced in an extended canonical transformation that satisfies \(\pi (x) = \delta S / \delta \phi (x)\). Now we consider the generalized canonical pair \((\rho , S)\), and apply the least action principle to the action functional defined in (9). Variation of \(S_c\) over \(\rho \) leads to (8), and variation of \(S_c\) over S gives

$$\begin{aligned} \frac{\partial \rho }{\partial t} + \int \frac{\delta }{\delta \phi (x)}(\rho \frac{\delta S}{\delta \phi (x)}) d^3\textbf{x} = 0, \end{aligned}$$
(10)

which is the continuity equation for the probability density. Both (8) and (10) determine the dynamics of the classical scalar field ensemble, and they are obtained by applying the least action principle based on the action functional \(S_c\) defined in (9).

3 The Extended Least Action Principle

Ref. [36] shows that the least action principle in classical mechanics can be extended to derive quantum formulation by factoring in the following two assumptions.

Assumption 1 – A quantum system experiences vacuum fluctuations constantly. The fluctuations are local and completely random.

Assumption 2 – There is a lower limit to the amount of action that a physical system needs to exhibit in order to be observable. This basic discrete unit of action effort is given by \(\hbar /2\) where \(\hbar \) is the Planck constant.

The first assumption is generally accepted in mainstream quantum mechanics; vacuum fluctuations are responsible for the intrinsic randomness of the dynamics of a quantum object. Locality of vacuum fluctuations is assumed, and it implies that for a composite system, the fluctuations of the subsystems are independent of one another.

The justification of the second assumption is explained in detail in Section II of Ref. [36]. Historically, the Planck constant was first introduced to show that the energy of radiation from a black body is discrete. One can consider the discrete energy unit as the smallest unit to be distinguished, or detected, in the black body radiation phenomenon. In general, it is understood that the Planck constant is associated with the discreteness of certain observables in quantum mechanics. Here, we interpret the Planck constant from an information measure point of view. Essentially, what we assume is that there is a lower limit to the amount of action that the physical system needs to exhibit in order to be observable or distinguishable in potential observation, and such a unit of action is defined by the Planck constant.

Using this understanding of the Planck constant in reverse provides a new way to calculate the additional action due to vacuum fluctuations. That is, even though we do not know the physical details of vacuum fluctuations, they manifest themselves via a discrete action unit determined by the Planck constant as an observable information unit. If we are able to define an information metric that quantifies the amount of observable information manifested by vacuum fluctuations, we can then multiply the metric by the Planck constant to obtain the action associated with vacuum fluctuations. The challenge of calculating the additional action due to vacuum fluctuations is thus converted into that of defining a proper new information metric \(I_f\), which measures the additional distinguishable, hence observable, information exhibited due to vacuum fluctuations. Even though we do not know the physical details of vacuum fluctuations (except that, as Assumption 1 states, they are completely random and local), the problem becomes less challenging since information-theoretic tools are available. The first step is to assign a transition probability distribution due to vacuum fluctuations for an infinitesimal time step at each position along the classical trajectory. The distinguishability of vacuum fluctuations can then be defined as the information distance between the transition probability distribution and a uniform probability distribution. The uniform probability distribution is chosen as the reference to reflect the complete randomness of vacuum fluctuations. In information theory, the common metric for the information distance between two probability distributions is relative entropy. Relative entropy is more fundamental than Shannon entropy, since the latter is just a special case of relative entropy when the reference probability distribution is uniform.
But there is a more important reason to use relative entropy. As shown in later sections, when we consider the dynamics of the system over an accumulated time period, we assume the initial position is unknown but is described by a probability distribution. This probability distribution can be defined along the classical trajectory either without or with vacuum fluctuations. The information distance between the two probability distributions gives the additional distinguishability due to vacuum fluctuations. It is again measured by a relative entropy. Thus, relative entropy is a powerful tool that allows us to extract meaningful information about the dynamical effects of vacuum fluctuations. The concrete form of \(I_f\) will be defined later as a functional of the Kullback-Leibler divergence \(D_{KL}\), \(I_f:=f(D_{KL})\), where \(D_{KL}\) measures the information distances between different probability distributions caused by vacuum fluctuations. Thus, the total action from the classical path and vacuum fluctuations is

$$\begin{aligned} S_t = S_c + \frac{\hbar }{2}I_f, \end{aligned}$$
(11)

where \(S_c\) is the classical action. Non-relativistic quantum theory can be derived through a variation approach to minimize such a functional quantity [36], \(\delta S_t=0\). When \(\hbar \rightarrow 0\), \(S_t=S_c\). Minimizing \(S_t\) is then equivalent to minimizing \(S_c\), resulting in Newton’s laws in classical mechanics. However, in quantum mechanics, \(\hbar \ne 0\), the contribution from \(I_f\) must be included when minimizing the total action. We can see \(I_f\) is where the quantum behavior of a system comes from. These ideas can be condensed as

Extended Principle of Least Action – The law of physical dynamics for a quantum system tends to exhibit as little as possible the action functional defined in (11).

Now we want to apply this principle to the scalar field and derive the quantum scalar field theory. Assumption 1 needs to be slightly modified, since in the field theory, one does not deal with a physical object. Instead, we are dealing with the field configuration. Assumption 1 is restated as

Assumption 1a – There are constant fluctuations in the field configurations. The fluctuations are completely random, and local.

It is not our intention here to investigate the origin, or establish a physical model, of such field fluctuations. Instead, we make a minimal number of assumptions on the underlying physical model, only enough so that we can apply the variation principle based on minimizing the total action.

Assumption 2 is unchanged for quantum field theory. The action of the classical scalar field \(S_c\) is given by (2), or (9). Similarly, the metric that measures the additional distinguishable information exhibited due to field fluctuations is defined as a functional of the Kullback-Leibler divergence \(D_{KL}\), \(I_f:=f(D_{KL})\), where \(D_{KL}\) measures the information distances between different probability distributions caused by field fluctuations. Thus, the total action due to both classical field dynamics and field fluctuations is given by the same equation as (11). Quantum field theory can be derived through a variational method to minimize such a functional quantity, \(\delta S_t=0\).

Alternatively, we can interpret the extended least action principle more from an information perspective by rewriting (11) as

$$\begin{aligned} I_t =\frac{2}{\hbar } S_c + I_f, \end{aligned}$$
(12)

where \(I_t=2S_t/\hbar \). Denote \(I_p=2S_c/\hbar \), which measures the amount of \(S_c\) in the discrete unit \(\hbar /2\). \(I_p\) is not a conventional information metric but can be considered as carrying meaningful physical information about the observability of the classical field. More discussion on the meaning of observability is provided later in Section 6. Similarly, \(I_f\) measures the distinguishable information of the probability distributions with and without field fluctuations. Thus, \(I_t\) is the total observable information. With (12), the extended least action principle can be restated as follows (see Footnote 1).

Principle of Least Observability – The law of physical dynamics for a quantum field tends to exhibit as little as possible the observable information defined in (12).

Mathematically, there is no difference between (11) and (12) when applying the variational principle to derive the laws of field dynamics. The form of (11) in terms of actions appears more familiar to the physics community. However, the form of (12) in terms of observability seems conceptually more generic. We will leave the exact interpretation of the principle aside and use the two interpretations interchangeably in this paper. The key point to remember is that the Planck constant connects the physical action to metrics related to observable information in either interpretation.

Next we will show that by applying the variational principle to minimize the action functional defined in (11), we can obtain the uncertainty relation and the Schrödinger equation of the wave functional for the scalar field, which are the basic formulation of the quantum scalar field.

4 Quantum Theory for Massive Scalar Fields

4.1 Field Fluctuations and Uncertainty Relation

First we consider the field fluctuations in an equal-time hypersurface for an infinitesimal time interval \(\Delta t\). During the time step \(t\rightarrow t+\Delta t\) in the hypersurface \(\Sigma _t\), the field configuration fluctuates randomly, \(\phi \rightarrow \phi + \omega \), where \(\omega =\Delta \phi \) is the change of field configuration due to random fluctuations. Define the probability for the field configuration to transition from \(\phi \) to \(\phi + \omega \) as \(p[\phi + \omega |\phi ]\mathcal {D}\omega \). The expectation value of the classical action over all possible field fluctuations is \(S_c=\int p[\phi + \omega |\phi ]\mathcal {L}d^3\textbf{x}\mathcal {D}\omega dt\) where \(\mathcal {L}\) is given by (1) for a scalar field. For an infinitesimal time interval \(\Delta t\), one can approximate \(\dot{\phi }=\Delta \phi /\Delta t=\omega /\Delta t\). The classical action for the infinitesimal time interval \(\Delta t\) is approximately given by

$$\begin{aligned} S_c=\int p[\phi + \omega |\phi ]\mathcal {D}\omega \int _{\Sigma _t}\{\frac{[\omega (x)]^2}{2\Delta t}+V(\phi (x))\Delta t\}d^3\textbf{x}. \end{aligned}$$
(13)

The information metric \(I_f\) is supposed to capture the additional revelation of information due to field fluctuations in the hypersurface \(\Sigma _t\). Thus, it is naturally defined as a relative entropy, or more specifically, the Kullback-Leibler divergence, to measure the information distance between \(p[\phi + \omega |\phi ]\) and some prior probability distribution. Since the field fluctuations are completely random, it is intuitive to assume a prior distribution with maximal ignorance [33, 45]. That is, the prior probability distribution is a uniform distribution \(\sigma \):

$$\begin{aligned} I_f&:= D_{KL}(p[\phi + \omega |\phi ]|| \sigma ) \\&= \int p[\phi + \omega |\phi ]\ln [p[\phi + \omega |\phi ]/\sigma ]\mathcal {D}\omega . \end{aligned}$$

Combined with (13), the total action functional defined in (11) is

$$\begin{aligned} S_t =&\int p[\phi + \omega |\phi ]\mathcal {D}\omega \int (\frac{[\omega (x)]^2}{2\Delta t} + V(\phi (x)) \Delta t)d^3\textbf{x} \\&+ \frac{\hbar }{2}\int p[\phi + \omega |\phi ]\ln [p[\phi + \omega |\phi ]/\sigma ]\mathcal {D}\omega . \end{aligned}$$

Taking the variation \(\delta S_t = 0\) with respect to p gives

$$\begin{aligned} \delta S_t = \frac{\hbar }{2}\int \{ \int (\frac{[\omega (x)]^2}{\hbar \Delta t} + \frac{2V \Delta t}{\hbar })d^3\textbf{x}+\ln \frac{p}{\sigma } +1\}\delta p \mathcal {D}\omega = 0. \end{aligned}$$
(14)

Since \(\delta p\) is arbitrary, one must have

$$\begin{aligned} \int ([\omega (x)]^2 + 2V (\Delta t)^2)d^3\textbf{x}+\hbar \Delta t(\ln \frac{p}{\sigma } +1)=0. \end{aligned}$$

When \(\Delta t\) is infinitesimally small, we can ignore the higher order term with \((\Delta t)^2\), and obtain the solution for p as

$$\begin{aligned} p[\phi + \omega |\phi ]= & {} \sigma e^{-\frac{1}{\hbar \Delta t}\int [\omega (x)]^2 d^3\textbf{x} - 1}\nonumber \\= & {} \frac{1}{Z}e^{-\frac{1}{\hbar \Delta t}\int [\omega (x)]^2 d^3\textbf{x}}, \end{aligned}$$
(15)

where Z is a normalization factor that absorbs the factor \(\sigma e^{-1}\). Equation (15) shows that the transition probability density is a Gaussian-like distribution. It is independent of \(\phi \) and can be simply denoted as \(p[\omega ]\). Since \(p[\omega ]\) is even in \(\omega \), the expectation value of \(\omega (x)\) is

$$\begin{aligned} \langle \omega (x)\rangle = \int p[\omega ] \omega (x) \mathcal {D}\omega = 0. \end{aligned}$$
(16)

We also want to evaluate the expectation value of the product of field fluctuations at two spatial points in the hypersurface \(\Sigma _t\), \(x=(t, \textbf{x})\) and \(x'=(t, \mathbf {x'})\),

$$\begin{aligned} \langle \omega (x)\omega (x')\rangle = \int p[\omega ] \omega (x) \omega (x')\mathcal {D}\omega . \end{aligned}$$
(17)

In Appendix B, we verify that

$$\begin{aligned} \langle \omega (x)\omega (x')\rangle = \frac{\hbar \Delta t}{2}\delta (\textbf{x}-\mathbf {x'}). \end{aligned}$$
(18)
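The content of (15) and (18) can be illustrated on a lattice. The sketch below is a Monte Carlo check under assumptions that are not part of the derivation: space is replaced by a finite set of sites, each with a hypothetical cell volume `cell`, so that \(\int [\omega (x)]^2 d^3\textbf{x}\) becomes `cell * sum(w_i**2)` and \(\delta (\textbf{x}-\mathbf {x'})\) becomes `delta_ij / cell`. The weight in (15) then factorizes into independent Gaussians with variance \(\hbar \Delta t/(2\,\texttt{cell})\) per site.

```python
import math
import random

def sample_fluctuation(n_sites, cell, hbar, dt, rng):
    """Draw one lattice fluctuation omega_i from the discretized form of (15).

    With the integral replaced by cell * sum_i omega_i^2, the weight factorizes
    into independent Gaussians of variance hbar * dt / (2 * cell) per site.
    """
    sigma = math.sqrt(hbar * dt / (2.0 * cell))
    return [rng.gauss(0.0, sigma) for _ in range(n_sites)]

def estimate_correlations(n_samples=20000, n_sites=4, cell=0.5,
                          hbar=1.0, dt=0.01, seed=0):
    """Sample averages of omega_0 * omega_0 and omega_0 * omega_1."""
    rng = random.Random(seed)
    same = 0.0   # accumulates omega_0 * omega_0 (same site)
    cross = 0.0  # accumulates omega_0 * omega_1 (different sites)
    for _ in range(n_samples):
        w = sample_fluctuation(n_sites, cell, hbar, dt, rng)
        same += w[0] * w[0]
        cross += w[0] * w[1]
    return same / n_samples, cross / n_samples

same_site, neighbour = estimate_correlations()

# Lattice version of (18): (hbar * dt / 2) * delta_ij / cell,
# evaluated for the default parameters used above.
expected_same = 1.0 * 0.01 / (2.0 * 0.5)
```

Up to sampling noise, `same_site` should agree with `expected_same` and `neighbour` should be close to zero, which is the lattice counterpart of the \(\delta \)-function in (18).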

Recall that \(\omega = \Delta \phi \), and \(\pi =\dot{\phi }=\Delta \phi /\Delta t=\omega /\Delta t\). Since \(\langle \omega \rangle =0\), one has \(\langle \pi \rangle =\langle \omega \rangle /\Delta t = 0\) as well. Thus \(\Delta \pi = \pi - \langle \pi \rangle = \pi = \omega /\Delta t\), and dividing (18) by \(\Delta t\) gives

$$\begin{aligned} \langle \Delta \phi (x)\Delta \pi (x')\rangle = \frac{\hbar }{2}\delta (\textbf{x}-\mathbf {x'}). \end{aligned}$$
(19)

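Applying the Cauchy-Schwarz inequality, and reading \(\langle \Delta \phi (x)\rangle \) and \(\langle \Delta \pi (x')\rangle \) as the root-mean-square deviations, we get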

$$\begin{aligned} \langle \Delta \phi (x)\rangle \langle \Delta \pi (x')\rangle \ge \langle \Delta \phi (x)\Delta \pi (x')\rangle =\frac{\hbar }{2}\delta (\textbf{x}-\mathbf {x'}). \end{aligned}$$
(20)

However, an inequality with a \(\delta \)-function on its right-hand side, as in (20), is not well defined pointwise. Instead, we introduce a pair of positive spatial test functions \(f(\textbf{x}), g(\textbf{x}): \mathbb {R}^3\rightarrow \mathbb {R}^+\), and define

$$\begin{aligned} \langle \omega (f)\omega (g)\rangle = \int p[\omega ] \{\int _{\Sigma _t}\omega (x) f(\textbf{x}) \omega (x')g(\mathbf {x'})d\textbf{x}d\mathbf {x'}\}\mathcal {D}\omega . \end{aligned}$$
(21)

Repeating calculations similar to those from (18) to (20), we obtain

$$\begin{aligned} \langle \Delta \phi (f)\rangle \langle \Delta \pi (g)\rangle \ge \frac{\hbar }{2}\langle f | g \rangle , \end{aligned}$$
(22)

where \(\langle f | g \rangle = \int _{\Sigma _t}f(\textbf{x})g(\textbf{x})d\textbf{x}\). This is the uncertainty relation between the field variable \(\phi \) and its conjugate momentum variable \(\pi \) for the scalar fields.
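The way the test functions tame the \(\delta \)-function can be seen in a small lattice calculation. The sketch below is illustrative only and rests on assumptions not made in the text: one spatial dimension, a uniform grid with hypothetical spacing `cell`, the lattice covariance \((\hbar \Delta t/2)\,\delta _{ij}/\texttt{cell}\) as the discretized form of (18), and two Gaussian test functions chosen arbitrarily. It evaluates the double sum corresponding to (21), divides by \(\Delta t\) to convert one factor of \(\omega \) into \(\Delta \pi \), and compares the result with \(\frac{\hbar }{2}\langle f | g \rangle \).

```python
import math

def smeared_correlator(f, g, cell, hbar, dt):
    """Double lattice sum for <omega(f) omega(g)> using the lattice form of (18)."""
    total = 0.0
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            # Lattice delta function: delta_ij / cell.
            cov = (hbar * dt / 2.0) * (1.0 / cell if i == j else 0.0)
            # Each spatial integral contributes one factor of the cell size.
            total += fi * gj * cov * cell * cell
    return total

def overlap(f, g, cell):
    """Lattice version of <f|g>, the integral of f(x) g(x) over space."""
    return sum(fi * gi for fi, gi in zip(f, g)) * cell

cell, hbar, dt = 0.1, 1.0, 0.01
xs = [cell * (i - 50) for i in range(101)]       # grid on [-5, 5]
f = [math.exp(-x * x) for x in xs]               # positive test function
g = [math.exp(-(x - 0.5) ** 2) for x in xs]      # shifted positive test function

# <Delta phi(f) Delta pi(g)> = <omega(f) omega(g)> / dt, since Delta pi = omega / dt.
lhs = smeared_correlator(f, g, cell, hbar, dt) / dt
rhs = 0.5 * hbar * overlap(f, g, cell)
```

Here `lhs` and `rhs` agree up to floating-point rounding, and both are finite, in contrast to the pointwise expression in (20).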

4.2 Derivation of The Schrödinger Equation for the Wave Functional

We now turn to the field dynamics for a period of time from \(t_A\rightarrow t_B\). As described earlier, the spacetime during the time duration \(t_A\rightarrow t_B\) is sliced into a succession of N Cauchy hypersurfaces \(\Sigma _{t_i}\), where \(t_i \in \{t_0=t_A, \ldots , t_i, \ldots , t_{N-1}=t_B\}\), and each time step is an infinitesimal period \(\Delta t\). The field configuration for each \(\Sigma _{t_i}\) is denoted as \(\phi (t_i)\), which has an infinite number of components, labeled as \(\phi _{\textbf{x}}(t_i)=\phi (\textbf{x}, t_i)\), one for each spatial point in \(\Sigma _{t_i}\). Without the random field fluctuations, the dynamics of the field configuration is governed by the Hamilton-Jacobi equation (8). Furthermore, we consider an ensemble of field configurations for hypersurface \(\Sigma _{t_i}\) that follow a probability density \(\rho _{t_i}[\phi ] = \rho [\phi , t_i]\) (see Footnote 2), which obeys the continuity equation (10). As shown in Section 2, both the Hamilton-Jacobi equation and the continuity equation can be derived through variation of the classical action functional \(S_c\), as defined in (9), with respect to \(\rho \) and S, respectively.

To apply the extended least action principle, we first compute the action from the dynamics of the classical field ensemble as defined in (9). Next we need to define the information metric for the field fluctuations, \(I_f\). For each new field configuration \(\phi +\omega \) due to the field fluctuations, there is a new probability density \(\rho [\phi +\omega , t_i]\). We need a proper metric to measure the additional revelation of observable information due to the field fluctuations on top of the classical field dynamics. The proper measure of this distinction is the information distance between \(\rho [\phi , t_i]\) and \(\rho [\phi +\omega , t_i]\). A natural choice of such an information measure is the relative entropy \(D_{KL}(\rho [\phi , t_i] || \rho [\phi +\omega , t_i])\). Moreover, we need to consider the contributions from all possible \(\omega \). Thus, we take the expectation value of \(D_{KL}\) over \(\omega \), denoted as \(\langle \cdot \rangle _{\omega }\). Then the contribution of distinguishable information due to field fluctuations for hypersurface \(\Sigma _{t_i}\) is \(\langle D_{KL}(\rho [\phi , t_i] || \rho [\phi +\omega , t_i])\rangle _{\omega }\). Finally, we sum up the contributions from all hypersurfaces, leading to the definition of the information metric

$$\begin{aligned} I_f:= & {} \sum _{i=0}^{N-1}\langle D_{KL}(\rho [\phi , t_i] || \rho [\phi +\omega , t_i])\rangle _{\omega }\end{aligned}$$
(23)
$$\begin{aligned}= & {} \sum _{i=0}^{N-1}\int \mathcal {D}\omega p[\omega ] \int \mathcal {D}\phi \rho [\phi , t_i]ln \frac{\rho [\phi , t_i]}{\rho [\phi +\omega , t_i]}. \end{aligned}$$
(24)

Notice that \(p[\omega ]\) is a Gaussian-like distribution given in (15). When \(\Delta t\) is small, only small fluctuations \(\omega \) will contribute to \(I_f\). As shown in Appendix C, when \(\Delta t\rightarrow 0\), \(I_f\) turns out to be

$$\begin{aligned} I_f = \frac{\hbar }{4}\int \frac{1}{\rho [\phi , t]}(\frac{\delta \rho [\phi , t]}{\delta \phi (x)})^2 d^3\textbf{x} \mathcal {D}\phi dt . \end{aligned}$$
(25)
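The small-fluctuation limit behind (25) can be checked numerically for a single field mode. The sketch below is not the Appendix C derivation; it is a one-variable illustration under stated assumptions: the field is reduced to one real variable \(\phi \), the density is a hypothetical non-Gaussian example \(\rho \propto e^{-U(\phi )}\) with \(U=\phi ^4/4+\phi ^2/2\), the fluctuation variance is taken as \(\hbar \Delta t/2\) as suggested by (18), and the Gaussian average over \(\omega \) is done with a three-point Gauss-Hermite rule. It compares the averaged Kullback-Leibler divergence for one time step with \(\frac{\hbar \Delta t}{4}\int \frac{1}{\rho }(\frac{d\rho }{d\phi })^2 d\phi \).

```python
import math

def neg_log_density(phi):
    """Hypothetical single-mode example: rho(phi) is proportional to exp(-U(phi))."""
    return 0.25 * phi ** 4 + 0.5 * phi ** 2

def neg_log_density_prime(phi):
    return phi ** 3 + phi

def integrate(values, step):
    """Rectangle-rule sum; adequate because the integrands decay rapidly."""
    return sum(values) * step

step = 0.01
grid = [step * (i - 600) for i in range(1201)]            # phi in [-6, 6]
weights = [math.exp(-neg_log_density(p)) for p in grid]
norm = integrate(weights, step)
rho = [w / norm for w in weights]

def kl_shift(omega):
    """D_KL(rho[phi] || rho[phi + omega]) for a rigid shift omega."""
    # ln(rho(phi) / rho(phi + omega)) = U(phi + omega) - U(phi); the norm cancels.
    return integrate(
        [r * (neg_log_density(p + omega) - neg_log_density(p))
         for r, p in zip(rho, grid)],
        step,
    )

hbar, dt = 1.0, 0.01
variance = hbar * dt / 2.0        # <omega^2> for one mode, as suggested by (18)

# Three-point Gauss-Hermite rule for a zero-mean Gaussian with this variance:
# nodes 0 and +/- sqrt(3 * variance), weights 2/3 and 1/6.
node = math.sqrt(3.0 * variance)
mean_kl = (kl_shift(node) + kl_shift(-node)) / 6.0 + (2.0 / 3.0) * kl_shift(0.0)

# Single-mode analogue of the integrand of (25): (1/rho)(d rho/d phi)^2 = rho * U'^2.
fisher = integrate(
    [r * neg_log_density_prime(p) ** 2 for r, p in zip(rho, grid)], step
)
predicted = hbar * dt / 4.0 * fisher
```

For these parameters `mean_kl` and `predicted` agree to within about one percent; the remaining difference is of higher order in \(\Delta t\) and shrinks as `dt` is reduced.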

Equation (25) is analogous to the Fisher information for the probability density [36, 44] in non-relativistic quantum mechanics. Some works directly add such a Fisher information term to the variational method as a postulate in order to derive the Schrödinger equation [41, 43]. But (25) bears more physical significance than Fisher information. First, it shows that \(I_f\) is proportional to \(\hbar \). This is not trivial because it avoids introducing additional arbitrary constants in the subsequent derivation of the Schrödinger equation. More importantly, defining \(I_f\) using relative entropy opens up new results that cannot be obtained if \(I_f\) is defined using Fisher information, because there are other generic forms of relative entropy such as the Rényi divergence or the Tsallis divergence. As will be seen later, by replacing the Kullback-Leibler divergence with the Rényi divergence, one obtains a family of generalized Schrödinger equations.

Together with (9), (25), and (11), the total action functional is

$$\begin{aligned} S_t= & {} \int \rho \{\frac{\partial S}{\partial t} + \int [\frac{1}{2}(\frac{\delta S}{\delta \phi (x)})^2 + V(\phi (x))\nonumber \\{} & {} + \frac{\hbar ^2}{8}(\frac{1}{\rho }\frac{\delta \rho }{\delta \phi (x)})^2 ]d^3\textbf{x}\}\mathcal {D}\phi dt. \end{aligned}$$
(26)

Variation of \(S_t\) with respect to S gives the same continuity equation (10), while variation with respect to \(\rho \) leads to (see Appendix C)

$$\begin{aligned} \frac{\partial S}{\partial t} =- \int \{\frac{1}{2}(\frac{\delta S}{\delta \phi (x)})^2 + V(\phi (x)) - \frac{\hbar ^2}{2R}\frac{\delta ^2 R}{\delta \phi ^2(x)} \}d^3\textbf{x}, \end{aligned}$$
(27)

where \(R[\phi , t]=\sqrt{\rho [\phi , t]}\). The last term on the R.H.S. of (27) is the scalar field equivalent of the Bohm quantum potential [49]. In non-relativistic quantum mechanics, the Bohm potential is considered responsible for the non-locality phenomenon in quantum mechanics [50]. Its origin is mysterious. Here we show that it originates from the information metric related to relative entropy, \(I_f\).

Defining a complex functional \(\Psi [\phi ,t]=R[\phi , t]e^{iS[\phi , t]/\hbar }\), the continuity equation and the extended Hamilton-Jacobi equation (27) can be combined into a single functional derivative equation (see Appendix C),

$$\begin{aligned} i\hbar \frac{\partial \Psi [\phi , t]}{\partial t} = \{\int [-\frac{\hbar ^2}{2}\frac{\delta ^2}{\delta \phi ^2(x)} + V(\phi (x))]d^3\textbf{x}\}\Psi [\phi , t]. \end{aligned}$$
(28)

This is the Schrödinger equation for the wave functional \(\Psi [\phi ,t]\) with Hamiltonian operator

$$\begin{aligned} \hat{\mathcal {H}} = -\frac{\hbar ^2}{2}\frac{\delta ^2}{\delta \phi ^2(x)} + V(\phi (x)). \end{aligned}$$
(29)

It governs the evolution of wave functional \(\Psi [\phi ,t]\) between hypersurfaces \(\Sigma _t\). The potential density in (28), for the massive scalar field, is given in (1) as \(V(\phi (x))=\frac{1}{2}([\nabla \phi (x)]^2+m^2[\phi (x)]^2)\). But it can be generalized to be

$$\begin{aligned} V(\phi (x))=\frac{1}{2}[\nabla \phi (x)]^2+\frac{m^2}{2}[\phi (x)]^2+ \lambda [\phi (x)]^3 + \lambda '[\phi (x)]^4+ \ldots \end{aligned}$$
(30)

where m is the mass and the coefficients \(\lambda \), \(\lambda '\), \(\ldots \) are coupling constants. Once the Schrödinger equation for the wave functional \(\Psi [\phi ,t]\) is obtained, other standard results follow, such as the solutions for the wave functional and the energies of the ground state and excited states [37].

In summary, by recursively applying the same extended least action principle in two steps, we recover the uncertainty relation and the Schrödinger representation of the standard relativistic quantum theory of scalar fields [37, 38]. In the first step, we analyze the dynamics of field fluctuations in a hypersurface \(\Sigma _t\) over a short time interval \(\Delta t\), and obtain the transition probability density due to field fluctuations. In the second step, we apply the principle over a cumulative time period to obtain the dynamical laws that govern the evolution of \(\rho \) and S between the hypersurfaces. The applicability of the same principle in both steps shows the consistency and simplicity of the theory, although the form of the Lagrangian density differs between the two steps. In the first step, the Lagrangian density \(\mathcal {L}\) is given by (1), while in the second step, we use a different form of Lagrangian density \(\mathcal {L}^\prime = \rho (\partial S/\partial t + H)\). As shown in Appendix A, \(\mathcal {L}\) and \(\mathcal {L}^\prime \) are related through an extended canonical transformation. The choice of Lagrangian \(\mathcal {L}\) or \(\mathcal {L}^\prime \) does not affect the variation outcome, that is, the form of Legendre’s equations. We choose \(\mathcal {L}^\prime \) as the Lagrangian density in the second step in order to use the pair of functionals \((\rho , S)\) in the subsequent variation procedure.

It is important to point out that the derivation of (28) depends on a particular foliation of the Minkowski spacetime. The theoretical framework presented here therefore treats the time parameter differently from the spatial coordinates, and it is not obvious whether the theory is Lorentz invariant. The issue is extensively studied in [42, 43], and the answer is that the theory is still fully relativistic. This is because, using the resulting Hamiltonian operator \(\hat{\mathcal {H}}\) given by (29) and (30), one can identify the generators of translations and rotations for both time-like and space-like directions, and these generators satisfy the Poincaré algebra [43]. Although the theory singles out a particular time parameter through the foliation of spacetime, the Poincaré algebra guarantees that the resulting dynamical evolution is fully relativistic, since satisfying this algebra guarantees that one can construct a Poincaré covariant stress-energy tensor for the dynamical variables (see Footnote 3).

5 The Generalized Schrödinger Equation for the Wave Functional

As mentioned earlier, by relaxing the definition of the information metrics \(I_f\), one can generalize the Schrödinger equation for the wave functional. The term \(I_f\) is supposed to capture the additional distinguishability exhibited by the field fluctuations, and is defined in (23) as the sum of the expectation values of the Kullback-Leibler divergence between \(\rho [\phi ,t]\) and \(\rho [\phi +\omega ,t]\). However, there are more generic definitions of relative entropy, such as the Rényi divergence [51, 53]. From an information theoretic point of view, it is legitimate to consider alternative definitions of relative entropy. Suppose we define \(I_f\) based on the Rényi divergence,

$$\begin{aligned} I_f^{\alpha }:= & {} \sum _{i=0}^{N-1}\langle D^{\alpha }_R(\rho [\phi , t_i] || \rho [\phi +\omega , t_i])\rangle _{\omega }\end{aligned}$$
(31)
$$\begin{aligned}= & {} \sum _{i=0}^{N-1}\int \mathcal {D}\omega \, p[\omega ] \frac{1}{\alpha -1}\ln \left( \int \mathcal {D}\phi \frac{\rho ^{\alpha }[\phi , t_i]}{\rho ^{\alpha -1}[\phi +\omega , t_i]}\right) . \end{aligned}$$
(32)

The parameter \(\alpha \in (0,1)\cup (1, \infty )\) is called the order of the Rényi divergence. When \(\alpha \rightarrow 1\), \(I_f^{\alpha }\) converges to \(I_f\) as defined in (23). In Appendix D, we show that using \(I_f^{\alpha }\) and following the same variation principle, we arrive at an extended Hamilton-Jacobi equation similar to (27),

$$\begin{aligned} \frac{\partial S}{\partial t} =- \int \{\frac{1}{2}(\frac{\delta S}{\delta \phi (x)})^2 + V(\phi (x)) - \frac{\alpha \hbar ^2}{2R}\frac{\delta ^2 R}{\delta \phi ^2(x)} \}d^3\textbf{x}, \end{aligned}$$
(33)

with an additional coefficient \(\alpha \) appearing in the Bohm quantum potential term. Defining a complex functional \(\Psi _{\alpha }[\phi ,t]=R[\phi , t]e^{iS[\phi , t]/(\sqrt{\alpha }\hbar )}\), the continuity equation and the extended Hamilton-Jacobi equation (33) can be combined into an equation similar to the Schrödinger equation (see Appendix D),

$$\begin{aligned} i\sqrt{\alpha }\hbar \frac{\partial \Psi _{\alpha }[\phi ,t]}{\partial t} = \{\int [-\frac{\alpha \hbar ^2}{2}\frac{\delta ^2}{\delta \phi ^2(x)} + V(\phi (x))]d^3\textbf{x}\}\Psi _{\alpha }[\phi , t]. \end{aligned}$$
(34)

When \(\alpha =1\), the regular Schrödinger equation for the wave functional (28) is recovered, as expected. Equation (34) gives a family of linear equations, one for each order of the Rényi divergence.
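The limit \(\alpha \rightarrow 1\) stated after (32) can be illustrated with a finite, discrete stand-in for the functional integral. This is a minimal sketch, not part of the derivation: the two probability vectors are hypothetical, and the function names are introduced here only for illustration. It evaluates the discrete analogue of the integrand in (32) and compares it with the Kullback-Leibler divergence as \(\alpha \) approaches 1.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_i p_i ln(p_i / q_i) for discrete p, q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha: ln(sum_i p_i^alpha / q_i^(alpha-1)) / (alpha-1)."""
    if alpha <= 0.0 or alpha == 1.0:
        raise ValueError("alpha must lie in (0, 1) or (1, infinity)")
    overlap = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(overlap) / (alpha - 1.0)


# Hypothetical distributions, chosen only for illustration.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print("KL divergence:", kl_divergence(p, q))
for alpha in (0.5, 0.9, 0.999, 1.001, 1.1, 2.0):
    print("alpha =", alpha, "Renyi divergence:", renyi_divergence(p, q, alpha))
```

The printed Rényi values should approach the Kullback-Leibler value from both sides as \(\alpha \) nears 1, and increase with \(\alpha \).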

As observed in Appendix D, if we define \(\hbar _{\alpha }= \sqrt{\alpha }\hbar \), then \(\Psi _{\alpha }[\phi ,t]=R[\phi , t]e^{iS[\phi , t]/\hbar _{\alpha }}\), and (34) takes the same form as the regular Schrödinger equation (28) but with \(\hbar \) replaced by \(\hbar _{\alpha }\). It is as if there is an intrinsic relation between the order of the Rényi divergence \(\alpha \) and the Planck constant \(\hbar \). This remains to be investigated further. On the other hand, if the wave functional is defined as usual without the factor \(\sqrt{\alpha }\), \(\Psi [\phi ,t]=R[\phi , t]e^{iS[\phi , t]/\hbar }\), the result is a nonlinear Schrödinger equation for the wave functional. This implies that the linearity of the Schrödinger equation depends on how the wave functional is defined from the pair of real functionals \((\rho , S)\).
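The role of \(\hbar _{\alpha }\) can be made explicit with a short worked calculation. The following is a sketch rather than a replacement for Appendix D, and it assumes that the continuity equation (10) has the standard form \(\partial \rho /\partial t + \int \frac{\delta }{\delta \phi (x)}(\rho \, \delta S/\delta \phi (x))d^3\textbf{x}=0\). For \(\Psi _{\alpha }=Re^{iS/\hbar _{\alpha }}\),

$$\begin{aligned} \frac{i\hbar _{\alpha }}{\Psi _{\alpha }}\frac{\partial \Psi _{\alpha }}{\partial t}= & {} \frac{i\hbar _{\alpha }}{R}\frac{\partial R}{\partial t} - \frac{\partial S}{\partial t}, \\ -\frac{\hbar _{\alpha }^2}{2\Psi _{\alpha }}\frac{\delta ^2 \Psi _{\alpha }}{\delta \phi ^2(x)}= & {} -\frac{\hbar _{\alpha }^2}{2R}\frac{\delta ^2 R}{\delta \phi ^2(x)} + \frac{1}{2}\left( \frac{\delta S}{\delta \phi (x)}\right) ^2 - i\hbar _{\alpha }\left[ \frac{1}{R}\frac{\delta R}{\delta \phi (x)}\frac{\delta S}{\delta \phi (x)} + \frac{1}{2}\frac{\delta ^2 S}{\delta \phi ^2(x)}\right] . \end{aligned}$$

Inserting both expressions into (34), the real part reproduces (33) because \(\hbar _{\alpha }^2=\alpha \hbar ^2\), and the imaginary part, multiplied by \(2R^2/\hbar _{\alpha }\), gives the continuity equation in the form assumed above. Repeating the same steps with \(\Psi =Re^{iS/\hbar }\), the imaginary part is unchanged, but the real part no longer matches (33), leaving a residual term:

$$\begin{aligned} i\hbar \frac{\partial \Psi }{\partial t} = \left\{ \int \left[ -\frac{\hbar ^2}{2}\frac{\delta ^2}{\delta \phi ^2(x)} + V(\phi (x))\right] d^3\textbf{x}\right\} \Psi + \frac{(1-\alpha )\hbar ^2}{2}\left\{ \int \frac{1}{|\Psi |}\frac{\delta ^2 |\Psi |}{\delta \phi ^2(x)}d^3\textbf{x}\right\} \Psi . \end{aligned}$$

The last term depends on \(|\Psi |\), so the equation is nonlinear unless \(\alpha =1\).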

We also want to point out that \(I_f^{\alpha }\) can be defined using Tsallis divergence [52, 54] as well, instead of using the Rényi divergence,

$$\begin{aligned} I_f^{\alpha }:= & {} \sum _{i=0}^{N-1}\langle D^{\alpha }_T(\rho [\phi , t_i] || \rho [\phi +\omega , t_i])\rangle _{\omega }\nonumber \\= & {} \sum _{i=0}^{N-1}\int \mathcal {D}\omega p[\omega ] \frac{1}{\alpha -1}\{\int \mathcal {D}\phi \frac{\rho ^{\alpha }[\phi , t_i]}{\rho ^{\alpha -1}[\phi +\omega , t_i]} -1\}. \end{aligned}$$
(35)

When \(\Delta t\rightarrow 0\), it can be shown that the \(I_f^\alpha \) defined above converges to the same form as (D4). Hence it results in the same generalized Schrödinger equation (34).
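The agreement of the two definitions for small fluctuations can be illustrated in a one-variable analogue. The sketch below is hedged accordingly: it assumes that typical fluctuations \(\omega \) become small as \(\Delta t\rightarrow 0\), it replaces \(\rho [\phi ]\) by a hypothetical Gaussian density of width \(\sigma \), and the function names are invented for this example. For such a density the Rényi divergence between \(\rho (x)\) and \(\rho (x+\omega )\) has the known closed form \(\alpha \omega ^2/(2\sigma ^2)\), which already shows a factor \(\alpha \) relative to the Kullback-Leibler value \(\omega ^2/(2\sigma ^2)\).

```python
import math


def log_gaussian(x, sigma):
    """Logarithm of a zero-mean Gaussian density of width sigma."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def overlap(alpha, omega, sigma=1.0, half_width=12.0, dx=0.001):
    """Riemann sum for the integral of rho(x)^alpha / rho(x + omega)^(alpha - 1)."""
    steps = int(round(2.0 * half_width / dx))
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * dx
        # Work with logarithms so that the tails do not underflow.
        log_term = (alpha * log_gaussian(x, sigma)
                    + (1.0 - alpha) * log_gaussian(x + omega, sigma))
        total += math.exp(log_term) * dx
    return total


def renyi(alpha, omega, sigma=1.0):
    """One-variable analogue of the integrand in (32)."""
    return math.log(overlap(alpha, omega, sigma)) / (alpha - 1.0)


def tsallis(alpha, omega, sigma=1.0):
    """One-variable analogue of the integrand in (35)."""
    return (overlap(alpha, omega, sigma) - 1.0) / (alpha - 1.0)


ALPHA = 2.0  # hypothetical order, chosen only for illustration
for omega in (0.4, 0.2, 0.1):
    closed_form = ALPHA * omega ** 2 / 2.0  # alpha * omega^2 / (2 sigma^2), sigma = 1
    print(omega, renyi(ALPHA, omega), tsallis(ALPHA, omega), closed_form)
```

As \(\omega \) decreases, the Tsallis and Rényi values should approach each other, with a difference of order \(\omega ^4\), which is the one-variable counterpart of the statement that both definitions lead to the same generalized equation.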

6 Discussion and Conclusions

6.1 Alternative Formulation of the Extended Least Action Principle

We mention in Section 3 that the extended least action principle can be restated as the principle of least observability by interpreting \(I_p=2S_c/\hbar \) as the observable information of the classical field. \(I_p\) is not a conventional information metric, but it can be considered as carrying meaningful physical information. To see this connection, recall that the classical action is defined as an integral of the Lagrangian over spacetime. There are two aspects to understanding the action functional. A larger value of the action indicates (1) that the system exhibits more dynamic effort, and (2) that physical variables in the field are easier to detect, or in other words, that more physical information is available for potential observation. Thus, the action \(S_c\) not only quantifies the dynamic effort of the field, but is also associated with the detectability, or observability, of the field during its dynamics. In classical mechanics, we focus on the first aspect via the least action principle, and derive the laws of dynamics by minimizing the action effort. The second aspect is not useful there, since we cannot quantify the intuition that S is associated with the observability of the physical object. One reason is that there is no natural unit of action to convert S into an information-related metric. The introduction of the Planck constant in Assumption 2 helps to quantify this intuition.

Alternatively, we can interpret the least observability principle based on (12) as minimizing \(I_f\) with the constraint of \(S_c\) being a constant, and \(\hbar /2\) simply being a Lagrange multiplier for such a constraint. Again, mathematically, it is an equivalent formulation. In that case, Assumption 2 is not needed. Instead, it is replaced by the assumption that the classical action functional \(S_c\) is a constant with respect to variations of \(\rho \) and S. But such an assumption needs sound justification. Which assumption to use depends on which choice is more physically intuitive. We believe that the least observability principle based on Assumption 2, where the Planck constant defines the discrete unit of action effort needed to exhibit observable information, gives a more intuitive physical meaning to the mathematical formulation, without the need for a physical model of the field fluctuations.

6.2 Comparisons with Relevant Research Works

The Schrödinger equation for the wave functional of scalar fields is typically introduced as a postulate [37, 38] instead of being derived from a first principle. An impressive attempt to derive it from the entropic dynamics approach can be found in Ref. [41, 43]. The entropic dynamics approach bears some similarity to the theory presented in this work. For instance, its formulation is carried out in two steps, an infinitesimal time step and a cumulative time period. It also aims to derive the physical dynamics by extremizing an information quantity such as the relative entropy. However, the entropic dynamics approach relies on another postulate, on energy conservation, to complete the derivation of the Schrödinger equation. The theory presented in this paper, on the other hand, has the advantage of simplicity, since it recursively applies the same least observability principle in both the infinitesimal time step and the cumulative time period. The entropic dynamics approach also requires several seemingly arbitrary constants in its formulation, while we only need the Planck constant \(\hbar \), whose meaning is clearly given in Assumption 2. We show that the Bohm potential term in (27) originates from the information metrics of field fluctuations \(I_f\), while [41,42,43] justify it from an information geometry perspective. The advantages of our approach are twofold. First, it is conceptually clearer to define \(I_f\) as the expectation value of the relative entropy between different probability distributions due to field fluctuations, which gives \(I_f\) a clear physical meaning. Second, we show that by using a general definition of relative entropy for \(I_f\) we obtain the generalized Schrödinger equation, and it is unclear how to obtain this result using the information geometry justification. Despite the differences between the present work and the entropic dynamics approach, it is encouraging to note the common interests. In particular, the results in [42, 43] can be useful if we want to extend the present work to scalar fields in curved spacetime.

The derivation of the Schrödinger equation in Section 4.2 starts from (9), which is inspired by its non-relativistic version initially proposed by Hall and Reginatto [46, 47]. Ref. [36] gives a rigorous justification of the non-relativistic version of (9) using the canonical transformation method. In Appendix A, we extend the canonical transformation method to scalar fields and prove (9). Hall and Reginatto [46, 47] only show the formulation in the non-relativistic setting. Even in the non-relativistic formulation, Hall and Reginatto assume a so-called exact uncertainty relation, while in our theory the exact uncertainty relation is derived from the same least observability principle in an infinitesimal time step.

6.3 Limitations and Future Research

Assumption 1a makes minimal assumptions on the field fluctuations, but does not provide a more concrete physical model for the field fluctuations. The underlying physics for the field fluctuations is expected to be complex but crucial for a deeper understanding of quantum field theory. It is beyond the scope of this paper. The intention here is to minimize the assumptions that are needed to derive the Schrödinger equation for the wave functional, so that future research can just focus on justifying these assumptions.

As shown in the appendix, the infinite-dimensional integration over the field variable \(\phi (\textbf{x})\) is approximated as an N-dimensional integral, after which we take the limit \(N\rightarrow \infty \). This essentially assumes a uniform Lebesgue measure. It has been argued that a probability integration measure is needed to ensure consistency between the Fock representation and the Schrödinger representation [39]. A more rigorous mathematical treatment of infinite-dimensional integration is desirable. We also assume that the probability density \(\rho [\phi ]\) and its first-order functional derivative approach zero when \(|\phi |\rightarrow \infty \). These assumptions are intuitive and give the correct results, but it is valuable to seek stronger justifications.

The formulation presented in this paper is based on flat Minkowski spacetime. We expect that it is possible to extend the formulation to curved spacetimes and derive the Schrödinger equation for curved spacetime. Furthermore, it would be interesting to investigate whether the least observability principle can be applied to non-scalar fields such as fermion matter fields, whose equation of motion is the Dirac equation.

6.4 Conclusions

The extended least action principle, or least observability principle, which was initially proposed to derive non-relativistic quantum theory [36], is applied here to scalar field theory. We successfully obtain the Schrödinger equation for the wave functional of the scalar field using the mathematical framework based on the principle. The Schrödinger equation for the wave functional is the fundamental equation of quantum scalar field theory in the Schrödinger picture, and it is typically introduced as a postulate. Here we derive it from a first principle. The Schrödinger equation enables one to calculate other standard results for scalar fields, such as the solutions for the wave functional and the energies of the ground state and excited states [37, 38].

The least observability principle illustrates how classical field theory becomes quantum field theory from the information perspective. This is captured in the two assumptions stated in Section 3. Assumption 2 points out that the Planck constant defines the discrete unit of action that a field configuration needs to exhibit in its dynamics in order to be observable. Classical field theory corresponds to a theory in which such a lower limit of discrete action effort is approximated as zero. Assumption 1a demands new metrics to measure the additional observable information exhibited by field fluctuations, which is then converted to additional action using Assumption 2. These new information metrics are defined in terms of relative entropy to measure the information distances between different probability distributions caused by field fluctuations. To derive quantum theory, the extended least action principle seeks to minimize the total action from both classical field dynamics and additional field fluctuations. Nature appears to behave in a most economical fashion and exhibits as little observable information as possible. Furthermore, defining the information metrics \(I_f\) using the Rényi divergence in the extended least action principle leads to a generalized Schrödinger equation (34) that depends on the order of the Rényi divergence. At this point it is hard to conceive of physical scenarios for which the generalized Schrödinger equation for the wave functional with \(\alpha \ne 1\) is applicable. However, the generalized Schrödinger equation is legitimate from an information perspective. It confirms that the mathematical framework based on the extended least action principle can produce new results.

Ref. [36] and this paper show that the extended least action principle can be applied to derive both non-relativistic quantum mechanics and relativistic quantum scalar field theory, demonstrating the versatility of the framework based on the principle. Extending the present work to scalar fields in curved spacetime is highly feasible. It is also reasonable to speculate that the principle can be applied to obtain the quantum theory for non-scalar fields such as fermion matter fields, though this can be much more challenging since the structure of the Lagrangian density for non-scalar fields is complicated.

Lastly, the extended least action principle also has interesting implications for the interpretation of quantum mechanics, including new insights on quantum entanglement, which will be reported separately.