1 Introduction

Compared with the microscopic structure of sub-atomic particles (e.g., the ‘core’ of an electron), much more is physically known about the microscopic structure of dislocations within a (nonlinearly) elastic crystal, about their mutual interactions, and about their interactions with applied loads, both from direct experimental observation and from lattice statics, molecular dynamics, and density functional theory calculations. Owing to this knowledge, physically well-justified and transparent mathematical models can be posited for these phenomena, with the possibility of systematic refinement to include more detail when mathematical study and comparison with experiment show it to be necessary. There is a long and distinguished history of the study of dislocations in elasticity in the classical setting (see, e.g., [1–5]), in the continuously distributed setting (e.g., [6–8], [9, including second-order effects] and [10]), and of the connections of some kinematic aspects of dislocations to non-Riemannian geometry [7, 11, 12]. In addition, techniques are available [13–16] and [10] for developing well-set, classical thermomechanical theories of continuous media comprising different types of materials that exhibit strongly nonlinear behavior and satisfy the relevant invariances and material symmetries. These ideas and techniques have been synthesized and extended to produce the theory/model of dislocation mechanics stated in [17] and reviewed in [18]. The theory requires only the specification of an energy density function \(\psi (W)\), where \(W\) is the inverse elastic distortion field (not necessarily a gradient), and of a dislocation velocity field, the function \(V_{s}(\alpha , W, \rho )\) in (3). Guided by the requirement that it be proportional to its derived thermodynamic driving force, this velocity is a specified function of the thermodynamically derived Cauchy stress tensor

$$T_{ij} = - \rho W_{ki} \psi '_{kj} $$

and the dislocation density tensor \(\alpha _{ij}\), admitting a scalar or matrix of material constants representing dislocation mobility. Here,

$$\psi '_{ij} = \partial _{W_{ij}} \psi , $$

and it suffices to use a rectangular Cartesian coordinate system and tensor components with respect to its basis in this Section. The time variable is represented by the symbol \(t\) and is not used as an index.
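The index placement in \(T_{ij} = - \rho W_{ki} \psi '_{kj}\) (contraction over the first index of both \(W\) and \(\psi '\)) is easy to get wrong in code. The following Python sketch evaluates it for \(3 \times 3\) arrays. The energy \(\psi (W) = \frac{\mu}{2} |W - I|^{2}\) is a hypothetical stand-in chosen only to make the contraction concrete; it is not frame-indifferent, which is why the resulting \(T\) need not be symmetric.

```python
# Sketch: T_ij = -rho * sum_k W_ki * psi'_kj, i.e. T = -rho * W^T psi'(W).
# HYPOTHETICAL energy for illustration only: psi(W) = (mu/2) * |W - I|^2,
# so psi'_ij = mu * (W_ij - delta_ij). Not a physically admissible energy.


def identity(n=3):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def dpsi_dW(W, mu):
    """Derivative psi'_ij = d psi / d W_ij of the hypothetical energy."""
    n = len(W)
    I = identity(n)
    return [[mu * (W[i][j] - I[i][j]) for j in range(n)] for i in range(n)]


def cauchy_stress(W, rho, mu):
    """Contract over the FIRST index of both W and psi': T_ij = -rho W_ki psi'_kj."""
    n = len(W)
    dpsi = dpsi_dW(W, mu)
    return [[-rho * sum(W[k][i] * dpsi[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


W = [[1.0, 0.01, 0.0],
     [0.0, 0.98, 0.0],
     [0.0, 0.0, 1.0]]
T = cauchy_stress(W, rho=1.0, mu=2.0)
print(T[0][1], T[1][0])  # unequal: this toy energy is not frame-indifferent
```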

For prescribed static dislocation fields, the framework has been shown to compute the stress and energy fields of such distributions in bodies of arbitrary geometry and general elastic symmetries [19, 20]. Similarly, for a prescribed dislocation velocity field, it has been shown to compute the evolution of the dislocation field [20]. In the fully coupled case, when restricted to dislocation motion within a planar layer in a 3-d body [21] and within a ‘small deformation’ ansatz, the evolution has also been shown to predict nonsingular dislocation cores, dislocation annihilation, dissociation, and stress-mediated interaction.

The phenomenon of macroscopic plasticity of crystalline materials corresponds to the collective dynamical behavior of a very large number of dislocation curves in an elastic body under generally time-dependent loads. While experimental observations and practical applications of plasticity abound, it is fair to say that there is no fundamental theory that arises as a coarse-graining of nonlinear dislocation dynamics as described above (or by any other model). Plasticity shows fascinating dynamical changes as a function of initial conditions and tamely evolving driving loads, with no established fundamental theory for understanding them. To name only a few: yielding; Stage I, II, III, and IV behaviors as a function of applied load, temperature, and initial crystal orientation; and the formation of intricate patterned dislocation microstructures such as cells and sub-grain boundaries. (The phenomenology is even richer, with rapidly driven situations also being of theoretical and practical interest.) It is in this context that we would like to use a path integral implementation of the dynamics represented by (3) to evaluate how much of the reality of macroscopic plasticity can be understood through the combination of the model and the technique. The rough expectation is to be able to interpret drastic changes of overall behavior observed in reality as statistical phase transitions, as understood in effective/statistical field theory.

A first step in this program is to define an action functional for the system (3) which, in the first instance, does not emanate from a variational principle; it is this objective that is tackled in Sects. 2, 3, and 4, refining the work in [18] following the ideas in [22]. Section 5 develops the variational principle accounting for a specified set of initial and boundary conditions for a nonlinear system of second-order PDE expressed as a first-order system.

Variational principles for ‘small deformation,’ static dislocation mechanics and internal stress problems by the method of ‘eigenstrains’ are presented in [23, 24], and there is a modern literature involving rigorous analysis, reviewed in detail in [19]. Our work involves finite-deformation, nonlinear dislocation dynamics including inertia.

2 The Essential Idea: An Optimization Problem for an Algebraic System of Equations

I thank Vladimir Sverak for insisting on the ‘simplest,’ transparent explanation of the ideas in [18, 22]. This brief Section is a result of that effort.

Consider a generally nonlinear system of algebraic equations in the variables \(x \in \mathbb{R}^{n}\) given by

$$ A_{\alpha }(x) = 0, $$
(1)

where \(A: \mathbb{R}^{n} \to \mathbb{R}^{N}\) is a given function (a simple example would be \(A_{\alpha }(x) = \bar{A}_{\alpha i} \, x^{i} - b_{\alpha}\), \(\alpha = 1 \ldots N\), \(i = 1 \ldots n\), where \(\bar{A}\) is a constant matrix, not necessarily symmetric (when \(n = N\)), and \(b\) is a constant vector). We allow for all possibilities \(0 < n \lesseqqgtr N > 0\), i.e., the number \(n\) of unknowns may be smaller than, equal to, or larger than the number \(N\) of equations.

The goal is to construct an objective function whose critical points define a solution \(x^{*} \in \mathbb{R}^{n}\) of the system (1), i.e., \(A_{\alpha }(x^{*}) = 0\), whenever such a solution exists.

For this, consider first the auxiliary function

$$ \widehat{S}_{H}(x,z) = z^{\alpha }A_{\alpha }(x) + H(x) $$

(where \(H\) belongs to a class of scalar-valued functions to be defined shortly) and define

$$ S_{H}(z) = z^{\alpha }A_{\alpha}(x_{H} (z)) + H(x_{H}(z)) $$

with the requirement that the system of equations

$$ z^{\alpha }\frac{\partial A_{\alpha}}{\partial x^{i}}(x) + \frac{\partial H}{\partial x^{i}}(x) = 0 $$
(2)

be solvable for the function \(x = x_{H}(z)\) through the choice of \(H\), and any function \(H\) that facilitates such a solution qualifies for the proposed scheme.

In other words, given a specific \(H\), it should be possible to define a function \(x_{H}(z)\) that satisfies

$$ z^{\alpha }\partial _{x^{i}} A_{\alpha }(x_{H}(z)) + \partial _{x^{i}} H (x_{H}(z)) = 0 \quad \forall z \in \mathbb{R}^{N} $$

(the domain of the function \(x_{H}\) may accommodate more intricacies, but for now we stick to the simplest possibility). Note that (2) is a set of \(n\) equations in \(n\) unknowns regardless of \(N\) (\(z\) for this argument is a parameter).
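Whether (2) can be solved for \(x_{H}(z)\) depends on the choice of \(H\). The Python sketch below illustrates this for a scalar problem with the hypothetical choices \(A(x) = x^{3} + x - 2\) and \(H(x) = x^{4}/4\), for which \(x_{H}(z)\) exists and is unique for every \(z\). (With \(H(x) = x^{2}/2\) one would instead get \(z = -x/(3x^{2} + 1)\), which is not monotone in \(x\), so \(x_{H}\) would be neither globally defined nor unique.) The sketch also evaluates \(S_{H}\) and, anticipating the identity \(\partial_{z} S_{H} = A(x_{H}(z))\) derived next in the text, compares a finite-difference derivative of \(S_{H}\) with \(A(x_{H}(z))\) and locates the critical point \(z_{0} = -1/4\), where \(x_{H}(z_{0}) = 1\).

```python
# Scalar (n = N = 1) illustration of x_H(z) and S_H(z) from Sect. 2.
# HYPOTHETICAL choices, not taken from the text:
#   A(x) = x**3 + x - 2, whose only real root is x = 1,
#   H(x) = x**4 / 4.
# Equation (2) reads z*(3x^2 + 1) + x^3 = 0, i.e. x^3/(3x^2 + 1) = -z. The left-hand
# side is strictly increasing and unbounded in x, so x_H(z) exists and is unique for
# every real z; we compute it by bisection.


def A(x):
    return x**3 + x - 2.0


def H(x):
    return 0.25 * x**4


def bisect_increasing(g, lo=-1.0, hi=1.0, iters=200):
    """Root of a strictly increasing function g; the bracket is widened as needed."""
    while g(lo) > 0.0:
        lo *= 2.0
    while g(hi) < 0.0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def x_H(z):
    """Solve z*A'(x) + H'(x) = 0, i.e. equation (2), for x."""
    return bisect_increasing(lambda x: x**3 / (3.0 * x**2 + 1.0) + z)


def S_H(z):
    x = x_H(z)
    return z * A(x) + H(x)


def dS_H_fd(z, h=1e-6):
    """Central finite-difference approximation of dS_H/dz."""
    return (S_H(z + h) - S_H(z - h)) / (2.0 * h)


# Critical point of S_H: dS_H/dz = A(x_H(z)) = 0. Here A(x_H(z)) decreases in z,
# so -A(x_H(z)) is increasing and bisection applies again.
z0 = bisect_increasing(lambda z: -A(x_H(z)))
print(z0, x_H(z0))                    # z0 close to -1/4, x_H(z0) close to 1
for z in (-0.5, -0.25, 0.4):
    print(z, dS_H_fd(z), A(x_H(z)))   # the last two columns should agree
```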

Assuming this is possible, we have

$$ \frac{\partial S_{H}}{\partial z^{\beta}} (z) = A_{\beta}(x_{H}(z)) + \left ( z^{\alpha }\frac{\partial A_{\alpha}}{\partial x^{i}}(x_{H}(z)) + \frac{\partial H}{\partial x^{i}}(x_{H}(z)) \right ) \frac{\partial x^{i}_{H}}{\partial z^{\beta}}(z) = A_{\beta}(x_{H}(z)), $$

using (2). Thus,

  • if \(z_{0}\) is a critical point of the objective function \(S_{H}\) satisfying \(\partial _{z^{\beta}} S_{H}(z_{0}) = 0\), then the system \(A_{\alpha}(x) = 0\) has a solution defined by \(x_{H}(z_{0})\);

  • if the system \(A_{\alpha}(x) = 0\) has a unique solution, say \(y\), and if \(z^{H}_{0}\) is any critical point of \(S_{H}\), then \(x_{H}\left (z^{H}_{0} \right ) = y\), for all admissible \(H\).

  • If \(A_{\alpha}(x) = 0\) has non-unique solutions, but \(\partial _{z^{\beta}} S_{H}(z) = 0\) (\(N\) equations in \(N\) unknowns) has a unique solution for a specific choice of the function \(z \mapsto x_{H}(z)\) related to a choice of \(H\), then such a choice of \(H\) may be considered a selection criterion for imparting uniqueness to the problem \(A_{\alpha}(x) = 0\).

  • Finally, to see the difference of this approach with the Least-Squares (LS) Method, we note that the optimality condition for the objective \(A_{\alpha}(x) A_{\alpha}(x)\) is \(A_{\alpha}(x) \partial_{x^{i}} A_{\alpha}(x) = 0\), and this does not imply \(A_{\alpha}(x) = 0\).

    For a linear system \(\bar{A} x = b\), the LS governing equations are given by

    $$\bar{A}^{T} \bar{A} z = \bar{A}^{T} b, $$

    with LS solution defined as \(z\) even when the original problem \(\bar{A} x = b\) does not have a solution (i.e., when \(b\) is not in the column space of \(\bar{A}\)). The LS problem always has a solution, of course. In contrast, in the present duality-based approach with quadratic \(H(x) = \frac{1}{2} x^{T} x\), (2) gives \(x_{H}(z) = -\bar{A}^{T} z\), and after the harmless change of sign \(z \to -z\) the governing equation is

    $$\bar{A}\bar{A}^{T} z = b $$

    with solution to \(\bar{A} x = b\) given by \(x = \bar{A}^{T} z\), and the problem has a solution only when \(\bar{A} x = b\) has a solution, since the column spaces of the matrices \(\bar{A}\) and \(\bar{A}\bar{A}^{T}\) are identical.

    As a practical matter, the latter approach appears to have, at least in principle, advantages for solving large, consistent, underdetermined systems as the size of the matrix \(\bar{A}\bar{A}^{T}\) is much smaller than that of \(\bar{A}^{T} \bar{A}\) in this situation, with due consideration paid to conditioning-related robustness issues (cf. [30, 32], [31, pp. 299-300]).
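A minimal Python sketch of the comparison above, assuming a tiny dense, consistent, underdetermined system (\(N = 2\) equations, \(n = 4\) unknowns) and a hand-written Gaussian elimination in place of any library solver. It follows the text's equations \(\bar{A}\bar{A}^{T} z = b\) and \(x = \bar{A}^{T} z\), and prints the sizes of the dual matrix \(\bar{A}\bar{A}^{T}\) (2 × 2) and of the normal-equation matrix \(\bar{A}^{T}\bar{A}\) (4 × 4, and singular here since its rank is 2). Because \(x = \bar{A}^{T} z\) lies in the row space of \(\bar{A}\), it is also the minimum-norm solution; this is a standard linear-algebra fact rather than a claim of the text, and the code checks it through orthogonality to a null vector of \(\bar{A}\).

```python
# Duality-based solve of a consistent, underdetermined linear system Abar x = b
# (N = 2 equations, n = 4 unknowns), following the text's equations
#   (Abar Abar^T) z = b,   x = Abar^T z.
# ASSUMPTION: a tiny dense example with hand-written Gaussian elimination.


def transpose(P):
    return [list(col) for col in zip(*P)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def matvec(P, v):
    return [sum(pij * vj for pij, vj in zip(row, v)) for row in P]


def gauss_solve(M, rhs):
    """Gaussian elimination with partial pivoting for a square, nonsingular M."""
    n = len(M)
    a = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def dual_solve(Abar, b):
    """Solve the N x N system (Abar Abar^T) z = b, then return x = Abar^T z."""
    At = transpose(Abar)
    z = gauss_solve(matmul(Abar, At), b)
    return matvec(At, z)


Abar = [[1.0, 2.0, 0.0, 1.0],
        [0.0, 1.0, 1.0, -1.0]]
b = [3.0, 1.0]
x = dual_solve(Abar, b)
null_vec = [-2.0, 1.0, -1.0, 0.0]                 # Abar @ null_vec = 0
print(x, matvec(Abar, x))                         # residual check: Abar x should equal b
print(sum(xi * ni for xi, ni in zip(x, null_vec)))  # ~0: x is orthogonal to the null space
print(len(matmul(Abar, transpose(Abar))),         # 2 x 2 dual matrix
      len(matmul(transpose(Abar), Abar)))         # 4 x 4 (singular) normal-equation matrix
```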

3 A Class of Variational Principles for Nonlinear Dislocation Mechanics

We implement the idea of Sect. 2 to define a family of actions for the nonlinear partial differential equations of dislocation mechanics given by

$$ \begin{aligned} 0 & = e_{jrs} \partial _{r} W_{is} + \alpha _{ij}, \\ 0 & = \partial _{t} W_{ij} + \partial _{j} (W_{ik} v_{k}) - v_{k} e_{rkj} \alpha _{ir} - e_{jrs} \alpha _{ir} V_{s}(\alpha , W, \rho )\\ & = \partial _{t} W_{ij} + v_{k} \partial _{k} W_{ij} + W_{ik} \partial _{j} v_{k} - e_{jrs} \alpha _{ir} V_{s}(\alpha ,W, \rho ), \\ 0 & = \partial _{t} \rho + \partial _{k} (\rho v_{k}), \\ 0 & = \partial _{t} (\rho v_{i}) + \partial _{j} (\rho v_{i} v_{j}) + \partial _{j} (\rho W_{ki} \psi '_{kj}). \end{aligned} $$
(3)

The physical basis of this system of equations is explained in [18]. Briefly, the first equation is the equation of elastic incompatibility. The second reflects the compatibility between the velocity gradient, the rate of the change of the elastic distortion and the rate of permanent deformation produced by the motion of dislocations. The third equation is balance of mass, and the fourth, the balance of linear momentum. Setting \(\alpha = 0\) in the system above gives the equations of nonlinear elasticity written in an Eulerian setting.
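The equality of the two forms of the second equation in (3) relies on the first equation (which expresses \(\alpha\) in terms of \(W\)). As a check on the index conventions, a short derivation using the identity \(e_{rkj}e_{rmn} = \delta_{km}\delta_{jn} - \delta_{kn}\delta_{jm}\) runs as follows:

$$ \begin{aligned} \alpha_{ir} &= -e_{rmn}\,\partial_{m} W_{in} \quad \text{(first equation of (3))}, \\ v_{k}\,e_{rkj}\,\alpha_{ir} &= -v_{k}\,(\delta_{km}\delta_{jn} - \delta_{kn}\delta_{jm})\,\partial_{m} W_{in} = v_{k}\,\partial_{j} W_{ik} - v_{k}\,\partial_{k} W_{ij}, \\ \partial_{j}(W_{ik} v_{k}) - v_{k}\,e_{rkj}\,\alpha_{ir} &= W_{ik}\,\partial_{j} v_{k} + v_{k}\,\partial_{j} W_{ik} - v_{k}\,\partial_{j} W_{ik} + v_{k}\,\partial_{k} W_{ij} = v_{k}\,\partial_{k} W_{ij} + W_{ik}\,\partial_{j} v_{k}. \end{aligned} $$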

First define the functional

$$\begin{aligned} &\widehat{S}_{H}[A,W,\theta ,\rho ,\lambda ,v,B,\alpha ] \\ &\quad = \int _{[0,T] \times \Omega} \, dt d^{3}x - W_{ij} \partial _{t} A_{ij} - W_{ik} v_{k} \partial _{j} A_{ij} - A_{ij} v_{k} e_{rkj} \alpha _{ir} - A_{ij} e_{jrs} \alpha _{ir} V_{s}(\alpha , W, \rho ) \\ &\qquad - \rho \partial _{t} \theta - \rho v_{k} \partial _{k} \theta - \rho v_{i} \partial _{t} \lambda _{i} - \rho v_{i} v_{j} \partial _{j} \lambda _{i} - \rho W_{ki} \psi '_{kj} \partial _{j} \lambda _{i} \\ &\qquad - e_{jrs} W_{is} \partial _{r} B_{ij} + B_{ij} \alpha _{ij} + H(W, \rho , v, \alpha ), \end{aligned}$$

which is obtained by converting (3) to scalar form by taking inner products with the ‘dual’ fields

$$ D = (A, \theta , \lambda , B), $$

integrating by parts on the space-time domain assuming the dual fields vanish on the boundary of the domain, and adding the potential \(H\). Now define

$$ U := (W, \rho , v, \alpha ) \quad \text{and} \quad \mathcal{D}:= ( \partial _{t} A, \nabla A, A, \partial _{t} \theta , \nabla \theta , \partial _{t} \lambda , \nabla \lambda , \nabla B, B) $$

(note ‘\(\mathcal{D}\neq D\)’) and require that there exists a function

$$ U_{H}(\mathcal{D}) = (W_{H}(\mathcal{D}),\rho _{H}(\mathcal{D}),v_{H}( \mathcal{D}), \alpha _{H}(\mathcal{D})) $$
(4)

such that for the functional \(S_{H}[A, \theta , \lambda , B]\) of the dual fields defined as

$$\begin{aligned} &\int _{[0,T]\times \Omega} \, dt d^{3}x \ \mathcal{L}_{H}(\mathcal{D}, U_{H}(\mathcal{D})) = S_{H} [A, \theta , \lambda , B] \\ & \quad := \widehat{S}_{H} [A, W_{H}(\mathcal{D}), \theta , \rho _{H}(\mathcal{D}), \lambda , v_{H}( \mathcal{D}), B, \alpha _{H}(\mathcal{D})], \end{aligned}$$
(5)

the first variation is given by (we suppress the subscript H on the elements of \(U_{H}\) for notational simplicity)

$$ \begin{aligned} \delta S_{H} &= \int _{[0,T]\times \Omega} \, dt d^{3}x - W_{ij}( \mathcal{D}) \partial _{t} \delta A_{ij} - W_{ik} (\mathcal{D}) v_{k}( \mathcal{D}) \partial _{j} \delta A_{ij} - \delta A_{ij} v_{k} ( \mathcal{D}) e_{rkj} \alpha _{ir} (\mathcal{D}) \\ & \quad - \delta A_{ij} e_{jrs} \alpha _{ir} ( \mathcal{D}) V_{s} (\alpha (\mathcal{D}), W(\mathcal{D}), \rho ( \mathcal{D})) \\ & - \rho (\mathcal{D}) \partial _{t} \delta \theta - \rho ( \mathcal{D}) v_{k} (\mathcal{D}) \partial _{k} \delta \theta \\ & - \rho (\mathcal{D}) v_{i}(\mathcal{D}) \partial _{t} \delta \lambda _{i} - \rho (\mathcal{D}) v_{i}(\mathcal{D}) v_{j}( \mathcal{D}) \partial _{j} \delta \lambda _{i} - \rho (\mathcal{D}) W_{ki}( \mathcal{D}) \psi '_{kj}(W(\mathcal{D})) \partial _{j} \delta \lambda _{i} \\ & - e_{jrs} W_{is} (\mathcal{D}) \partial _{r} \delta B_{ij} + \delta B_{ij} \alpha _{ij}(\mathcal{D}), \end{aligned} $$
(6)

a condition that is satisfied if the system

$$\begin{aligned} \frac{\partial \mathcal{L}_{H}}{\partial W_{lp}}& = - \partial _{t} A_{lp} - v_{p} \partial _{j} A_{lj} - A_{ij} e_{jrs} \alpha _{ir} \frac{\partial V_{s}}{\partial W_{lp}} (\alpha , W, \rho ) - e_{jrp} \partial _{r} B_{lj} \\ &- \rho \left ( \psi '_{lj}(W) \partial _{j} \lambda _{p} \right . \left . + \ W_{ki} \psi ''_{kjlp} \partial _{j} \lambda _{i} \right ) + \frac{\partial H}{\partial W_{lp}}(W,\rho , v, \alpha ) = 0, \\ \frac{\partial \mathcal{L}_{H}}{\partial \rho}& = - A_{ij} e_{jrs} \alpha _{ir} \frac{\partial V_{s}}{\partial \rho} (\alpha , W, \rho ) - \partial _{t} \theta - v_{k} \partial _{k} \theta - v_{i} \partial _{t} \lambda _{i} - v_{i} v_{j} \partial _{j} \lambda _{i} - W_{ki} \psi '_{kj} \partial _{j} \lambda _{i} \\ & + \frac{\partial H}{\partial \rho} (W, \rho , v, \alpha ) = 0, \\ \frac{\partial \mathcal{L}_{H}}{\partial v_{p}}& = - W_{ip} \partial _{j} A_{ij} - A_{ij} e_{rpj} \alpha _{ir} - \rho \partial _{p} \theta - \rho \partial _{t} \lambda _{p} - \rho v_{j} \partial _{j} \lambda _{p} - \rho v_{i} \partial _{p} \lambda _{i} \\ & + \frac{\partial H}{\partial v_{p}} (W, \rho , v, \alpha ) = 0, \\ \frac{\partial \mathcal{L}_{H}}{\partial \alpha _{lp}}& = - A_{lj} v_{k} e_{pkj} - A_{lj} e_{jps} V_{s}(\alpha , W, \rho ) - A_{ij} e_{jrs} \alpha _{ir} \frac{\partial V_{s}}{\partial \alpha _{lp}} (\alpha , W, \rho ) + B_{lp} \\ & + \frac{\partial H}{\partial \alpha _{lp}}(W,\rho , v, \alpha ) = 0, \end{aligned}$$
(7)

can be solved in the form of

$$ (W,\rho , v, \alpha ) = U_{H}(\mathcal{D}). $$

This is so, since solving (7) defines \(U_{H}(\mathcal{D})\) that ensures \(\frac{\partial \mathcal{L}_{H}}{\partial U} (\mathcal{D}, U_{H}( \mathcal{D})) = 0\) which then implies

$$ \frac{\partial \mathcal{L}_{H}}{\partial U} (\mathcal{D}, U_{H}( \mathcal{D})) \cdot \frac{\partial U_{H}}{\partial \mathcal{D}} ( \mathcal{D}) \cdot \delta \mathcal{D}= 0 \quad \text{for all} \ \mathcal{D}. $$

Note that (6) then is simply

$$ \delta S_{H} = \int _{[0,T]\times \Omega} \, dt d^{3}x\ \dfrac{\partial \mathcal{L}_{H}}{\partial \mathcal{D}} \cdot \delta \mathcal{D}, $$

and requiring

$$ \delta S_{H} = 0 \ \text{for all variations}\ \delta D \ \text{that vanish on the boundary of} \ \Omega \times [0,T] $$

shows that the Euler-Lagrange equations of the functional \(S_{H}\) defined in (5) are the equations of (3) with the substitution

$$ (W,\rho ,v, \alpha ) = U_{H}(\mathcal{D}). $$

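This is so because \(\mathcal{L}_{H}\) is necessarily linear in its first argument \(\mathcal{D}\) (see the definition of \(\widehat{S}_{H}\) and (5)), so that \(\partial \mathcal{L}_{H}/\partial \mathcal{D}\) depends on \(\mathcal{D}\) only through \(U_{H}(\mathcal{D})\).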

To summarize, the primal equations (3) of dislocation mechanics are the Euler-Lagrange equations of any of the dual functionals, written in terms of particular specific combinations (mappings) of the dual fields for each choice of the function \(H\), each specific mapping defining the primal fields. Thus one may think of the primal fields as “gauge invariant” observable combinations of the dual fields (“gauge fields”) satisfying one specific set of equations (the primal system). While this is not how gauge fields appear in traditional gauge theories of physics, it is interesting that a completely different starting point and approach raise somewhat similar invariance structures that may be interpreted as symmetries.

As for the plausibility of being able to solve the algebraic system (7) for a given \(\mathcal{D}\), consider \(H\) to be separately quadratic in each of its arguments, say \(U_{A}\), with coefficients of large magnitude, e.g., \(H = \frac{1}{2} \alpha _{W} W_{ij} W_{ij} + \cdots \), with \(1 \ll |\alpha _{W}|\). Then, assuming the solutions of the Euler-Lagrange equations are bounded in some appropriate sense, (7) can indeed be solved to define \(U_{H}(\mathcal{D})\), and one must then verify that the solutions of the Euler-Lagrange equations (using this function) indeed satisfy the assumed bounds. To ensure that this latter condition is satisfied, one has a large class of \(H\) functions to work with; at any rate, this is a delicate question of analysis, including how the requirement can be relaxed if necessary, in the context of solving the dual variational problem (and not necessarily its Euler-Lagrange equations).
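The plausibility argument above can be illustrated, under strong simplifying assumptions, on a pointwise toy problem. The Python sketch below is not the system (7): it keeps scalar stand-ins \(w\) and \(v\) for \(W\) and \(v\), a hypothetical energy \(\psi (w) = w^{4}/4\), \(H = \frac{c}{2}(w^{2} + v^{2})\), and only a few of the terms of \(\widehat{S}_{H}\). Rewriting stationarity as \(U = G(\mathcal{D}, U)/c\) shows why a large \(|c|\) helps: for bounded data the map is a contraction, so the inversion defining \(U_{H}(\mathcal{D})\) succeeds by fixed-point iteration; for small \(|c|\) no such guarantee holds.

```python
# Toy, pointwise analogue of solving (7) for the primal variables given dual data.
# HYPOTHETICAL ingredients (not from the text): scalar stand-ins w ~ W and v ~ v,
# dual data d = (At, Ax, lt, lx) ~ (d_t A, d_x A, d_t lambda, d_x lambda) at one
# space-time point, energy psi(w) = w**4 / 4, and H = (c/2) * (w**2 + v**2).
# The density mimics a few terms of the functional in Sect. 3:
#   L = -w*At - w*v*Ax - rho*v*lt - rho*v**2*lx - rho*w*psi'(w)*lx + H,
# with w*psi'(w) = w**4. Stationarity dL/dw = dL/dv = 0 is written as U = G(U)/c
# and solved by fixed-point iteration, which contracts when |c| is large
# relative to the data.


def stationarity_residual(w, v, d, rho, c):
    At, Ax, lt, lx = d
    r_w = -At - v * Ax - 4.0 * rho * w**3 * lx + c * w    # dL/dw
    r_v = -w * Ax - rho * lt - 2.0 * rho * v * lx + c * v  # dL/dv
    return r_w, r_v


def solve_primal(d, rho, c, tol=1e-13, max_iter=500):
    """Fixed-point iteration (w, v) <- G(w, v) / c; returns (w, v, iterations)."""
    At, Ax, lt, lx = d
    w, v = 0.0, 0.0
    for it in range(1, max_iter + 1):
        w_new = (At + v * Ax + 4.0 * rho * w**3 * lx) / c
        v_new = (w * Ax + rho * lt + 2.0 * rho * v * lx) / c
        if abs(w_new - w) + abs(v_new - v) < tol:
            return w_new, v_new, it
        w, v = w_new, v_new
    raise RuntimeError("no convergence; try a larger |c|")


d = (1.0, 0.5, -0.3, 0.2)
w, v, its = solve_primal(d, rho=1.0, c=20.0)
print(w, v, its, stationarity_residual(w, v, d, 1.0, 20.0))
```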

We end this section with the following remarks:

  • Our system (3) does not involve multi-valued fields or non-simply connected domains for defining dislocation dynamics, but is fully capable of representing the topological charge of dislocation lines with its ingredients.

  • Based on the explorations of stress-coupled dislocation motion presented in [21, 25], the ‘primal’ system requires a ‘core-energy’ in the form of the dependence of the energy function \(\psi \) on the dislocation density \(\alpha \) as well. This results in the dislocation velocity depending on the \(\text{curl}\, \alpha \). Such a dependence is accommodated within our ‘action-generating’ scheme by adding an extra variable and equation to the system (3) of the form \(e_{jrs} \partial _{r} \alpha _{is} = \beta _{ij}\) and writing the dislocation velocity as \(V_{s} = V_{s}(\alpha , W, \rho , \beta )\). This would have the effect of increasing the number of fields in the dual problem as well.

    It is an interesting question whether the precise definition of a formally ‘small’ core energy contribution, with a small parameter representing microscopic physics, can make a difference in developing an accurate model for the prediction of macroscopic behavior, and whether such a device should be allowed in the class of admitted models. Physically, in the context of dislocation dynamics, there appears to be no reason to exclude the possibility that such effects are important; in fact, admitting them allows more precise physics to be incorporated in the description of gross macroscopic behavior (which is, admittedly, a double-edged sword in the context of coarse-graining). Some evidence supporting such an expectation is also provided by the mathematically rigorous study of the inviscid Burgers equation, ‘regularized’ by a small viscous effect in one case and by dispersion in another [26–28].

    Based on the above observation, one advantage of the ‘dual’ formalism proposed herein may be that, when the microscopic physics to be added is not even qualitatively understood with certainty, working with a regularization on the dual side may be guided solely by the aim of producing a ‘good’ dual extremal, i.e., one with guaranteed existence in an appropriate function space. Doing so appears to require no modification to the physics of the primal problem, and the limit of dual solutions as the regularization parameter vanishes may then be studied.

  • In the context of an action functional that simply has the given system of PDEs as its Euler-Lagrange equations, the proposed scheme delivers, at least formally and under the stated requirements, what is needed. However, if the action functional is to be used in a path integral, dual fields \(D\) other than extremals matter as well. In this sense it is reasonable to demand that the added potential \(H\) in \(\mathcal{L}_{H}\) be subject to further invariance requirements, which may obstruct the inversion process required to define the function \(U_{H}(\mathcal{D})\). In case such a restriction is so severe as not to allow the definition of even a single ‘change of variables’ (\(U_{H}(\mathcal{D})\), through the choice of some \(H\)), one can retain both the fields \(W\) and \(A\) and still obtain a relevant action functional, as shown in [18].

4 Linear Dislocation Mechanics

We illustrate the proposed technique with a very closely related one (using a Legendre transform, cf. [22, 29]) in the simplified setting of linear dislocation mechanics with a prescribed dislocation velocity field \(V\) in space-time along with the ansatz

$$\begin{aligned} U_{ij} & := \delta _{ij} - W_{ij} \\ T_{ij} & := C_{ijkl} U_{kl}, \end{aligned}$$

ignoring all nonlinearities in (3) and assuming the mass density field \(\rho \) to be a specified field. The ansatz is justified for small elastic distortions \(U\) about the ground state (cf. [18]). We note that \(C_{ijkl}\) is necessarily symmetric in \((k,l)\) and in \((i,j)\), so that it is not invertible on the space of all second-order tensors (and hence the stress depends only on the elastic strain, the symmetric part of \(U\)). With these assumptions, the system (3) may be expressed as

$$ \begin{aligned} 0 & = \partial _{t} (\rho v_{i}) - \partial _{j} (C_{ijkl} U_{kl}), \\ 0 & = \partial _{j} v_{i} - \partial _{t} U_{ij} - e_{jrs} \alpha _{ir} V_{s}, \\ 0 & = e_{jrs} \partial _{r} U_{is} - \alpha _{ij}. \end{aligned} $$
(8)
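The passage from (3) to (8) can be sketched as follows, under our reading of the stated ansatz: substitute \(W = I - U\), drop terms quadratic in the small quantities \(U\), \(\nabla U\), \(\alpha \), and \(v\) (keeping \(\alpha V\), which is linear in \(\alpha \) since \(V\) is prescribed), and replace \(T_{ij} = -\rho W_{ki} \psi '_{kj}\) by \(C_{ijkl} U_{kl}\). The mass balance is not needed because \(\rho \) is prescribed.

$$ \begin{aligned} (3)_{1}: &\quad 0 = e_{jrs}\,\partial_{r} W_{is} + \alpha_{ij} = -e_{jrs}\,\partial_{r} U_{is} + \alpha_{ij}, \\ (3)_{2}: &\quad 0 = \partial_{t} W_{ij} + v_{k}\,\partial_{k} W_{ij} + W_{ik}\,\partial_{j} v_{k} - e_{jrs}\,\alpha_{ir} V_{s} \approx -\partial_{t} U_{ij} + \partial_{j} v_{i} - e_{jrs}\,\alpha_{ir} V_{s}, \\ (3)_{4}: &\quad 0 = \partial_{t}(\rho v_{i}) + \partial_{j}(\rho v_{i} v_{j}) - \partial_{j} T_{ij} \approx \partial_{t}(\rho v_{i}) - \partial_{j}(C_{ijkl}\,U_{kl}). \end{aligned} $$

Here \(v_{k}\partial_{k}W_{ij} = -v_{k}\partial_{k}U_{ij}\), \((W_{ik} - \delta_{ik})\partial_{j}v_{k} = -U_{ik}\partial_{j}v_{k}\), and \(\partial_{j}(\rho v_{i}v_{j})\) are the dropped quadratic terms; up to sign and ordering, the three results are the equations of (8).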

Taking inner products of these equations with the dual fields \(D = (\lambda , A, B)\) that vanish on the boundary, and utilizing an arbitrary function \(M(v, U, \alpha )\) that is strictly convex in its list of arguments, we define the functional

$$\begin{aligned} \widehat{S}[A,U,B, \alpha , \lambda , v] = \int _{[0,T]\times \Omega} \, dt d^{3}x & \ v_{i} (- \partial _{j} A_{ij} - \rho \partial _{t} \lambda _{i}) \\ & \ + U_{ij} ( \partial _{t} A_{ij} - e_{srj} \partial _{r} B_{is} + C_{ijkl} \partial _{l} \lambda _{k}) \\ & \ + \alpha _{ir} (- A_{ij} e_{jrs} V_{s} - B_{ir}) - M(v, U, \alpha ). \end{aligned}$$

Defining

$$ p := (-\partial _{j} A_{ij} - \rho \partial _{t} \lambda _{i}, \partial _{t} A_{ij} - e_{srj} \partial _{r} B_{is} + C_{ijkl} \partial _{l} \lambda _{k}, - A_{ij} e_{jrs} V_{s} - B_{ir}); \quad Q := (v, U, \alpha ), $$

and \(M^{*}(p)\) the Legendre transform of \(M(Q)\) given by

$$ \begin{aligned} (v_{M}(p), U_{M}(p), \alpha _{M}(p)) =: Q_{M}(p) &= (\partial _{Q} M)^{-1}(p) \\ M^{*}(p) & = Q_{M}(p) \cdot p - M( Q_{M}(p)) \\ \partial _{p} M^{*}(p) & = Q_{M}(p) \end{aligned} $$
(9)

(well-defined because of the strict convexity of \(M(Q)\)), we define the dual action, \(S_{M}[D]\),

$$ \widehat{S}[A,U_{M}(p), B, \alpha _{M}(p), \lambda , v_{M}(p)] =: S_{M}[D] = \int _{[0,T]\times \Omega} \, dt d^{3}x \ M^{*}(p) $$

whose first variation is given by (after an integration by parts)

$$\begin{aligned} \delta S_{M} & = \int _{[0,T]\times \Omega} dt d^{3}x \ Q_{M}(p)\, \delta p \\ & = \int _{[0,T]\times \Omega} dt d^{3}x \ \delta \lambda _{i} \left ( \partial _{t}(\rho v_{i}(p)) - \partial _{j} (C_{ijkl} U_{kl}(p) ) \right ) \\ & \quad + \ \delta A_{ij} \left (\partial _{j} (v_{i}(p)) - \partial _{t} (U_{ij}(p)) - e_{jrs} \alpha _{ir}(p) V_{s} \right ) \\ & \quad + \ \delta B_{is} ( e_{srj} \partial _{r} (U_{ij}(p)) - \alpha _{is}(p) ), \end{aligned}$$

(where we have dropped the subscript M on the dual-to-primal mapping fields for notational convenience). Thus, the dual Euler-Lagrange equations are the system (8) expressed in terms of the dual fields through the mapping codified in (9)\(_{2}\), regardless of the convex potential \(M\) chosen to define the dual functional \(S_{M}\).
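The relations in (9) can be checked numerically for a non-quadratic potential. The Python sketch below uses a hypothetical separable, strictly convex \(M\), standing in for \(M(v, U, \alpha )\) with \(Q\) flattened to a list, inverts \(\partial _{Q} M\) componentwise by Newton's method, and compares a finite-difference gradient of \(M^{*}\) with \(Q_{M}(p)\).

```python
# Numerical check of the Legendre-transform relations (9) for a HYPOTHETICAL
# separable, strictly convex potential standing in for M(v, U, alpha), with Q
# flattened to a list:  M(Q) = sum_i (a_i Q_i^2 / 2 + Q_i^4 / 4),  a_i > 0.
# Its gradient dM/dQ_i = a_i Q_i + Q_i^3 is strictly increasing, hence invertible.


def M(Q, a):
    return sum(ai * q * q / 2.0 + q**4 / 4.0 for q, ai in zip(Q, a))


def Q_M(p, a):
    """(dM/dQ)^{-1}(p), computed componentwise by Newton's method from Q_i = p_i / a_i."""
    Q = []
    for pi, ai in zip(p, a):
        q = pi / ai
        for _ in range(100):
            step = (ai * q + q**3 - pi) / (ai + 3.0 * q * q)
            q -= step
            if abs(step) <= 1e-15 * max(1.0, abs(q)):
                break
        Q.append(q)
    return Q


def M_star(p, a):
    """M*(p) = Q_M(p) . p - M(Q_M(p))."""
    Q = Q_M(p, a)
    return sum(qi * pi for qi, pi in zip(Q, p)) - M(Q, a)


def grad_fd(f, p, h=1e-6):
    """Central finite-difference gradient of a scalar function of a list."""
    g = []
    for i in range(len(p)):
        pp, pm = list(p), list(p)
        pp[i] += h
        pm[i] -= h
        g.append((f(pp) - f(pm)) / (2.0 * h))
    return g


a = [1.0, 2.0, 0.5]
p = [0.7, -1.3, 2.0]
print(Q_M(p, a))
print(grad_fd(lambda s: M_star(s, a), p))   # should match Q_M(p, a): dM*/dp = Q_M(p)
```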

This exercise exposes an interesting fact in a simple setting. Clearly, for \(M\) to be strictly convex in \(U\), it cannot be invariant, as it has to depend on the skew-symmetric part of \(U\), and rotational invariance (invariance under superposed rigid deformations in the linear setting) precludes such a dependence. However, the use of such a potential in the dual theory does not in any way obstruct the definition of correct physics as embodied in the Euler-Lagrange equations solved.
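A tiny numerical illustration of this point, with hypothetical numbers: a potential that depends on \(U\) only through its symmetric part (here \(\frac{1}{2}|\mathrm{sym}\, U|^{2}\), convex but not strictly convex) assigns the same gradient to two distortions that differ by a skew-symmetric (infinitesimal rotation) part, so its gradient cannot be inverted as (9) requires, whereas the non-invariant \(\frac{1}{2}|U|^{2}\) separates them.

```python
# Illustration in the linear setting, with HYPOTHETICAL numbers.
# M_sym(U) = |sym U|^2 / 2 has gradient sym U: it cannot "see" the skew part of U,
# so U -> dM/dU is not injective and cannot be inverted as (9) requires.
# By contrast M_full(U) = |U|^2 / 2 (strictly convex, not invariant) has gradient U.


def sym(U):
    n = len(U)
    return [[0.5 * (U[i][j] + U[j][i]) for j in range(n)] for i in range(n)]


def grad_M_sym(U):
    return sym(U)


def grad_M_full(U):
    return [row[:] for row in U]


def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


U1 = [[0.10, 0.20, 0.00],
      [0.20, -0.10, 0.00],
      [0.00, 0.00, 0.05]]
S = [[0.0, 0.3, 0.0],        # skew-symmetric: an infinitesimal rotation
     [-0.3, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
U2 = add(U1, S)

print(max_abs_diff(grad_M_sym(U1), grad_M_sym(U2)))    # ~0: same gradient, different U
print(max_abs_diff(grad_M_full(U1), grad_M_full(U2)))  # about 0.3: gradients differ
```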

5 Dual Variational Principle for a Primal Problem with Initial and Boundary Conditions

Consider the system of PDEs:

$$ \begin{aligned} \partial _{t} u_{I} &= \mathbb{A}_{IJ}\, u_{J} + \mathbb{B}_{IJk}\, \partial _{k} u_{J} + \mathbb{C}_{IJkl} \, \partial _{l} B_{Jk} + \mathfrak{f}_{I}(u, B, C) + \partial _{k} ( \mathfrak{A}_{Ik}(u, B, C)), \\ \partial _{i} u_{I} & = B_{Ii}, \\ \partial _{j} B_{Ii} & = C_{Iij}, \end{aligned} $$
(10)

where \(\mathbb{A}\), \(\mathbb{B}\), \(\mathbb{C}\) are arrays of real constants, \(\mathfrak{f}_{I}\) and \(\mathfrak{A}_{Ij}\) are, for each \(I\), \(j\), given, real-valued, smooth functions of the arguments shown, uppercase Latin indices span 1 to \(n\), lowercase Latin indices representing spatial dimensions span 1 to \(d\) with \(1 \leq d \leq 3\), and \(t\) is time.

The functions \(\mathfrak{f}\), \(\mathfrak{A}\) do not contain any terms linear in the array \((u, B, C)\).
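To make the scope of (10) concrete, here are three familiar scalar examples in one space dimension (\(n = d = 1\), so that \(u_{1} = u\) and \(B_{11} = \partial _{x} u\)), given as our own illustrations of how the arrays are filled in; the third equation of (10) then merely defines \(C_{111} = \partial _{x} B_{11}\), and all coefficients not listed are zero:

$$ \begin{aligned} &\text{heat equation } \partial_{t} u = \kappa\,\partial_{xx} u: && \mathbb{C}_{1111} = \kappa; \\ &\text{first-order wave equation } \partial_{t} u + c\,\partial_{x} u = 0: && \mathbb{B}_{111} = -c; \\ &\text{viscous Burgers equation } \partial_{t} u + \partial_{x}(u^{2}/2) = \nu\,\partial_{xx} u: && \mathbb{C}_{1111} = \nu, \quad \mathfrak{A}_{11}(u,B,C) = -u^{2}/2. \end{aligned} $$

In the last case \(\mathfrak{A}_{11}\) is quadratic in \(u\), consistent with the requirement that \(\mathfrak{f}\) and \(\mathfrak{A}\) contain no terms linear in \((u, B, C)\).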

Let the initial and boundary conditions for (10) be

$$ \begin{aligned} & u_{I}(x, 0) = \overline{u}^{(i)}_{I}(x), \quad x \in \Omega , \\ & \mathfrak{A}_{Ik}(u(x,t),B(x,t),C(x,t)) \, n_{k}(x) \\ & \quad + \left ( \mathbb{B}_{IJk} u_{J}(x,t) + \mathbb{C}_{IJik} B_{Ji}(x,t) \right ) n_{k}(x) = \overline{\tau}_{I}(x,t), \quad (x,t) \in \partial \Omega _{\tau}(x) \times (0,T], \\ & u_{I}(x,t) = \overline{u}^{(b)}_{I} (x,t), \quad (x,t) \in \partial \Omega _{u}(x) \times (0,T], \\ & \partial _{i} u_{I} (x, t) = B_{Ii}(x,t) = \overline{B}_{Ii} (x,t), \quad (x,t) \in \partial \Omega _{\nabla u}(x) \times (0,T], \end{aligned} $$
(11)

where \(n\) represents the outward unit normal to the boundary of the domain \(\partial \Omega \), the functions with overhead bars are prescribed, and the subsets \(\partial \Omega _{\tau}\), \(\partial \Omega _{u}\), \(\partial \Omega _{ \nabla u}\) of the spatial boundary \(\partial \Omega \) can very well be empty for a specific problem.

The initial and boundary conditions (11) for the primal system (10) are simply a set of conditions that encompass the commonly encountered ones for up to second-order systems of partial differential equations; the present work does not deal with the question of well-posedness of the system (10) with this specified set of initial and boundary conditions. For instance, it may very well be that in a specific problem, well-posedness requires \(\partial _{i} u_{I} = B_{Ii}\) not to be specified on any part of the boundary of \(\Omega \). In that case \(\partial \Omega _{\nabla u}\) can be chosen to be the empty set, and then, as suggested by (12) below, the dual field \(\rho \) needs to be prescribed on the entirety of \(\partial \Omega \) for all times.

Our scheme [22] then suggests defining

$$ \begin{aligned} \widehat{S}[u,B,C,\lambda ,\gamma ,\rho ] = & \int _{\Omega \times (0,T)} d^{3}x dt \, - \partial _{t} \lambda _{I} u_{I} - \lambda _{I} \mathfrak{f}_{I}(u, B, C) + \mathfrak{A}_{Ik}(u, B, C) \partial _{k} \lambda _{I} \\ & - u_{I} \mathbb{A}_{JI} \lambda _{J} + u_{I} \, \mathbb{B}_{JIk} \, \partial _{k} \lambda _{J} + B_{Ii}\, \mathbb{C}_{JIil}\, \partial _{l} \lambda _{J}- u_{I} \partial _{i} \gamma _{Ii} \\ & - \gamma _{Ii} B_{Ii} - B_{Ii} \partial _{j} \rho _{Iij} - \rho _{Iij} C_{Iij} + H(u, B, C, x,t). \end{aligned} $$

where we have allowed for \(H\) to depend on \((x,t)\) as well. This generality is practically useful, especially when \(H\) plays the role of a selection criterion for non-unique solutions of the primal problem.

Next define the arrays \(U\), \(P\), \(L\), \(F\) and the function \(M\):

$$\begin{aligned} & U := (u_{I}, B_{Ii}, C_{Iij}), \quad L := (\lambda _{I}, \ - \partial _{k} \lambda _{I}) \\ & P := (\partial _{t} \lambda _{I} + \partial _{i} \gamma _{Ii} + \mathbb{A}_{JI} \lambda _{J} - \mathbb{B}_{JIk} \partial _{k} \lambda _{J}, \\ & \gamma _{Ii} + \partial _{j} \rho _{Iij} - \mathbb{C}_{JIil} \, \partial _{l} \lambda _{J}, \quad \rho _{Iij}), \quad F := (\mathfrak{f}_{I}, \mathfrak{A}_{Ij}), \\ & M(U, L) := H(U,x,t) - L \cdot F(U) \end{aligned}$$

(noting that \(F\) is indeed a function of only \(U\)).

Following [22], we now ask that \(M\) be such that \(\exists \ U_{H}(P, L,x,t)\) satisfying \(\partial _{U} M (U_{H}(P, L,x,t), L,x,t) = P\), \(\forall P\) and in terms of this define

$$ M^{*}(P,L,x,t) := U_{H}(P,L,x,t) \cdot P - M(U_{H}(P,L,x,t), L,x,t). $$

We then replace \(U\) in \(\widehat{S}\) by \(U_{H}(P,L)\) to define the dual functional

$$\begin{aligned} S[\lambda , \gamma , \rho ]& := \int _{\Omega \times (0,T)} d^{3}x dt \, - P \cdot U_{H}(P,L,x,t) + M(U_{H}(P,L,x,t), L,x,t),\\ & = - \int _{\Omega \times (0,T)} d^{3}x dt \, M^{*}(P,L,x,t) \end{aligned}$$

whose first variation is given by

$$ \delta S = \int _{\Omega \times (0,T)} d^{3}x dt \, - \partial _{P} M^{*} \cdot \delta P - \partial _{L} M^{*} \cdot \delta L. $$

Now, a calculation detailed in [22, Sect. 6], shows that

$$ \partial _{P} M^{*}(P, L,x,t) = U_{H}(P,L,x,t), \quad \partial _{L} M^{*}(P,L,x,t) = F(U_{H}(P,L,x,t)). $$
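The relations \(\partial _{P} M^{*} = U_{H}\) and \(\partial _{L} M^{*} = F(U_{H})\) can be checked numerically in the simplest possible instance. The Python sketch below assumes a single unknown \(u\) (so \(U = u\)), keeps only the first entry of \(L\) (a single multiplier \(\lambda \), so the \(\mathfrak{A}\) part of \(F\) is absent), and makes the hypothetical choices \(F(u) = u^{3}\) (no linear term) and \(H(u) = C u^{2}/2\) with a large constant \(C\); \(U_{H}(P, L)\) is obtained by Newton's method and the derivatives of \(M^{*}\) by central differences.

```python
# HYPOTHETICAL scalar instance of the construction in Sect. 5 (not from the text):
# U = u, L = lam (only the f-part of F is kept), F(u) = u**3 (no linear term),
# H(u) = C * u**2 / 2, so M(u, lam) = H(u) - lam * F(u) and
# dM/du = C*u - 3*lam*u**2, which is invertible near u = P / C when |lam*u| << C.

C = 10.0  # large coefficient in the hypothetical H


def F(u):
    return u**3


def M(u, lam):
    return 0.5 * C * u * u - lam * F(u)


def U_H(P, lam):
    """Solve dM/du (u, lam) = P for u by Newton's method, starting from P / C."""
    u = P / C
    for _ in range(100):
        step = (C * u - 3.0 * lam * u * u - P) / (C - 6.0 * lam * u)
        u -= step
        if abs(step) < 1e-15:
            break
    return u


def M_star(P, lam):
    """M*(P, L) = U_H . P - M(U_H, L)."""
    u = U_H(P, lam)
    return u * P - M(u, lam)


h = 1e-6
P, lam = 2.0, 0.1
u = U_H(P, lam)
dMs_dP = (M_star(P + h, lam) - M_star(P - h, lam)) / (2 * h)
dMs_dlam = (M_star(P, lam + h) - M_star(P, lam - h)) / (2 * h)
print(dMs_dP, u)        # dM*/dP   should match U_H
print(dMs_dlam, F(u))   # dM*/dlam should match F(U_H)
```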

Then, using the notation \(U_{H}(P,L,x,t) = \left ( u^{H}_{I}(P, L,x,t), B^{H}_{Ii}(P,L,x,t), C^{H}_{Iij}(P,L,x,t) \right )\), \(I = 1 \) to \(n\); \(i, j = 1 \) to \(d\),

$$ \begin{aligned} \delta S & = \int _{\Omega \times (0,T)} d^{3}x dt \, - U_{H}(P,L,x,t) \cdot \delta P - F(U_{H}(P,L,x,t)) \cdot \delta L \\ & = \int _{\Omega \times (0,T)} d^{3}x dt \, - u^{H}_{I}(P,L,x,t) ( \partial _{t} \delta \lambda _{I} + \partial _{i} \delta \gamma _{Ii} + \mathbb{A}_{JI} \delta \lambda _{J} - \mathbb{B}_{JIk} \partial _{k} \delta \lambda _{J} ) \\ & \quad - B^{H}_{Ii}(P,L,x,t) (\delta \gamma _{Ii} + \partial _{j} \delta \rho _{Iij} - \mathbb{C}_{JIil} \, \partial _{l} \delta \lambda _{J}) - C_{Iij}^{H}(P,L,x,t) \delta \rho _{Iij} \\ & \quad - \mathfrak{f}_{I}(U_{H}(P,L,x,t)) \delta \lambda _{I} + \mathfrak{A}_{Ik}(U_{H}(P,L,x,t)) \partial _{k} \delta \lambda _{I}, \end{aligned} $$

and collecting terms,

$$\begin{aligned} \delta S & = \int _{\Omega }d^{3}x \, \delta \lambda _{I}(x, 0) u_{I}^{H}(P(x, 0), L(x,0), x, 0) \\ & \quad- \int _{\Omega }d^{3}x \, \delta \lambda _{I}(x, T) u_{I}^{H}(P(x, T), L(x,T), x, T) \\ & \quad + \int _{\Omega \times (0,T)} d^{3}xdt \ \delta \lambda _{I} \big( \partial _{t} u^{H}_{I}(P,L,x,t) - \mathfrak{f}_{I}(U_{H}(P,L,x,t)) \\ & \quad- \partial _{k} \left ( \mathfrak{A}_{Ik}(U_{H}(P,L,x,t))\right ) \big) \\ & \quad + \int _{\Omega \times (0,T)} d^{3}xdt \ \delta \lambda _{I} \big( - \mathbb{A}_{IJ} u^{H}_{J} (P,L,x,t) - \mathbb{B}_{IJk} \partial _{k} \left (u^{H}_{J}(P,L,x,t)\right ) \\ & \quad- \mathbb{C}_{IJil} \partial _{l}\left ( B^{H}_{Ji}(P,L,x,t) \right ) \big) \\ & \quad + \int _{0}^{T} dt \int _{\partial \Omega} da \ \delta \lambda _{I} \big( \mathfrak{A}_{Ik}(U_{H}(P,L,x,t)) + u^{H}_{J}(P,L,x,t) \mathbb{B}_{IJk} \\ & \quad+ \mathbb{C}_{IJik} B^{H}_{Ji}(P,L,x,t) \big)\, n_{k} \\ & \quad -\int _{0}^{T} dt \int _{\partial \Omega} da \ \delta \gamma _{Ii} \,u_{I}^{H}(P,L,x,t) \,n _{i} \\ & \quad + \int _{\Omega \times (0,T)} d^{3}x dt \, \delta \gamma _{Ii} \left ( \partial _{i} u_{I}^{H}(P,L,x,t) - B_{Ii}^{H}(P,L,x,t) \right ) \\ & \quad -\int _{0}^{T} dt \int _{\partial \Omega} da \ \delta \rho _{Iij} B^{H}_{Ii}(P,L,x,t) \, n_{j} \\ & \quad + \int _{\Omega \times (0,T)} d^{3}x dt \, \delta \rho _{Iij} \left ( \partial _{j} B^{H}_{Ii}(P,L,x,t) - C^{H}_{Iij}(P,L,x,t)\right ). \end{aligned}$$

From the above calculation we read off the modification required to \(S\) to account for the specified initial and boundary conditions:

$$\begin{aligned} &S_{ibvp}[\lambda , \gamma , \rho ] := - \int _{\Omega \times (0,T)} d^{3}x dt\, M^{*}(P(x,t),L(x,t),x,t) \\ & \quad - \int _{\Omega }d^{3}x \, \lambda _{I}(x, 0) \, \overline{u}^{(i)}_{I} (x) - \int _{0}^{T} dt \int _{\partial \Omega _{\tau}} da \,\lambda _{I}(x,t) \,\overline{\tau}_{I}(x,t) \\ & \quad + \int _{0}^{T} dt \int _{\partial \Omega _{u}} da \, \gamma _{Ii} (x,t) \, \overline{u}^{(b)}_{I} (x,t)\, n_{i}(x) \\ &\quad + \int _{0}^{T} dt \int _{\partial \Omega _{\nabla u}} da \, \rho _{Iij}(x,t) \overline{B}_{Ii}(x,t) \, n_{j}(x) \\ &\text{with} \quad \lambda _{I}(x, T) = \text{`arbitrarily' prescribed and } \delta \lambda _{I}(x,T) = 0, x \in \Omega , \\ &\phantom{with}\quad \lambda _{I}(x, t) = \text{`arbitrarily' prescribed and } \delta \lambda _{I}(x,t) = 0, (x,t) \in \partial \Omega \backslash \partial \Omega _{\tau }\times (0, T], \\ &\phantom{with} \quad \gamma _{Ii}(x, t) = \text{`arbitrarily' prescribed and } \delta \gamma _{Ii}(x,t) = 0, (x,t) \in \partial \Omega \backslash \partial \Omega _{u} \times (0, T], \\ &\phantom{with} \quad \rho _{Iij}(x, t) = \text{`arbitrarily' prescribed and } \delta \rho _{Iij}(x,t) = 0, (x,t) \in \partial \Omega \backslash \partial \Omega _{\nabla u} \times (0, T], \end{aligned}$$
(12)

(with prescriptions chosen to avoid discontinuities at ‘space-time corners’ of boundary of the space-time domain \(\Omega \times (0,T)\)).

We end by noting that in ongoing work these ideas have been used to successfully formulate and compute approximate solutions (with minimal error) of the heat equation and the first-order wave equation in bounded domains, in one space dimension and time. These parabolic and hyperbolic equations are solved by a common methodology based on computing weak solutions to degenerate elliptic boundary value problems in a space-time domain, involving oblique natural boundary conditions. Uniqueness of solutions to the corresponding dual problems with the initial-boundary conditions developed following the above ideas is also shown.

As already noted in the Introduction, the primary application envisioned for this variational principle is in enabling a Feynman path-integral-based statistical analysis of dislocation dynamics. A second major application is in designing an efficient and robust numerical scheme for the system (3), which is a system of Hamilton-Jacobi equations. It also has the potential of providing a pathway to the rigorous analysis of the system in the hands of bona fide experts in PDE and variational calculus.