The purpose of this paper is to solve an inverse problem associated with Yang–Mills theories in Minkowski space \({\mathbb {R}}^{1+3}\). The objective is the recovery of the gauge field A on a causal domain where waves can propagate and return, given data on a small observation set inside the domain.

The starting point of Yang–Mills theories is a compact Lie group G with Lie algebra \({\mathfrak {g}}\). Without loss of generality, we shall think of G as a matrix Lie group and hence \({\mathfrak {g}}\) will be a matrix Lie algebra. We assume also that G is connected and endowed with a bi-invariant metric, or equivalently, an inner product on \({\mathfrak {g}}\) invariant under the adjoint action.

In their most general formulation, Yang–Mills theories take place in the adjoint bundle of a principal bundle with structure group G over space-time. Since our region of interest in space-time will be a contractible set \(M\subset {\mathbb {R}}^{1+3}\), we might as well assume from the start that we are working with the trivial adjoint bundle \(M\times {\mathfrak {g}}\). The main object of the theory is a gauge field A, also known as Yang–Mills potential. In geometric language this is simply a connection \(A\in C^{\infty }(M;T^*M\otimes {\mathfrak {g}})=\Omega ^{1}(M;{\mathfrak {g}})\), that is, a smooth \({\mathfrak {g}}\)-valued 1-form. In general, we denote the set of \({\mathfrak {g}}\)-valued forms of degree k by \(\Omega ^{k} = \Omega ^{k}(M;{\mathfrak {g}})\).

There is a natural pairing \([\cdot , \cdot ]: \Omega ^{p}\otimes \Omega ^{q}\rightarrow \Omega ^{p+q}\) given in our situation as

$$\begin{aligned}{}[\omega ,\eta ]=\omega \wedge \eta -(-1)^{pq}\eta \wedge \omega , \end{aligned}$$

where the wedge product of \({\mathfrak {g}}\)-valued forms is understood using matrix multiplication in \({\mathfrak {g}}\). Using the pairing we define a covariant derivative

$$\begin{aligned} d_{A}:\Omega ^{k}(M;{\mathfrak {g}})\rightarrow \Omega ^{k+1}(M;{\mathfrak {g}}), \qquad d_{A}\omega =d\omega +[A,\omega ]. \end{aligned}$$

Given a gauge field A, we can associate to it its field strength, or curvature. This is defined as

$$\begin{aligned} F_{A}:=dA+\frac{1}{2}[A,A]=dA+A\wedge A \in \Omega ^{2}(M;{\mathfrak {g}}) \end{aligned}$$

and it always satisfies the Bianchi identity \(d_{A}F_{A}=0\). Moreover, \(d_A^2 \omega = [F_A, \omega ]\) for any \(\omega \in \Omega ^k\).
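
As a quick sanity check of the two expressions for the curvature, here is a sketch in local coordinates. The component notation \(A = A_\alpha dx^\alpha \) and \(F_{\alpha \beta }\) is introduced only for this illustration, and \([A_\alpha , A_\beta ]\) denotes the matrix commutator. Since \(p = q = 1\), the pairing gives \([A,A] = 2 A\wedge A\), and

$$\begin{aligned} A\wedge A&= A_\alpha A_\beta \, dx^\alpha \wedge dx^\beta = \frac{1}{2}[A_\alpha , A_\beta ]\, dx^\alpha \wedge dx^\beta , \\ F_A&= \frac{1}{2}F_{\alpha \beta }\, dx^\alpha \wedge dx^\beta , \qquad F_{\alpha \beta } = \partial _\alpha A_\beta - \partial _\beta A_\alpha + [A_\alpha , A_\beta ]. \end{aligned}$$

In particular, when \({\mathfrak {g}}\) is abelian the commutator vanishes and \(F_A = dA\).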

Yang–Mills equations

The Yang–Mills equations arise as the Euler–Lagrange equations for the Yang–Mills action functional which we now recall. The inner product in \({\mathfrak {g}}\) naturally induces a pairing \(\langle \cdot ,\cdot \rangle _{\text {Ad}}\)

$$\begin{aligned} \Omega ^{p}(M,{\mathfrak {g}})\times \Omega ^{q}(M,{\mathfrak {g}})\rightarrow \Omega ^{p+q}(M). \end{aligned}$$

If \(\star \) denotes the Hodge star operator of the Minkowski metric, the Yang–Mills functional is given by

$$\begin{aligned} S_{{{\,\mathrm{YM}\,}}}(A):=\frac{1}{2}\int _{M}\langle F_{A},\star F_{A}\rangle _{\text {Ad}}. \end{aligned}$$

If G is a subgroup of the unitary group, we may take as the adjoint invariant inner product \(-\text {trace}(XY)\), where X, Y are matrices in \({\mathfrak {g}}\), and thus \(S_{{{\,\mathrm{YM}\,}}}(A)\) may also be written as a constant multiple of

$$\begin{aligned} \int _{M}\text {trace}((F_{A})_{\alpha \beta }F_{A}^{\alpha \beta })\,d\text {vol}, \end{aligned}$$

as is frequently found in the physics literature. From this functional one easily derives the Yang–Mills equations:

$$\begin{aligned} d_{A}^*F_{A}=0, \end{aligned} \tag{1}$$

where \(d_{A}^*\) is the formal adjoint of \(d_A\) and given by

$$\begin{aligned} d_{A}^*:\Omega ^{k}(M;{\mathfrak {g}})\rightarrow \Omega ^{k-1}(M;{\mathfrak {g}}), \qquad d_{A}^*=\star d_{A}\star . \end{aligned}$$

(In general, for a Lorentzian space-time of dimension m, the formal adjoint acting on k-forms has the expression \(d_{A}^*=(-1)^{m+km}\star d_{A}\star \).)

The Yang–Mills equations are gauge invariant in the sense that if two connections A and B are gauge equivalent and if A satisfies (1) then also B satisfies \(d_{B}^*F_{B}=0\). The connections A and B being gauge equivalent means that there is a section \({\mathbf {U}}\in C^\infty (M; G)\) such that

$$\begin{aligned} B = {\mathbf {U}}^{-1} d {\mathbf {U}}+ {\mathbf {U}}^{-1} A {\mathbf {U}}. \end{aligned} \tag{2}$$

This property can be easily deduced from the fact that the action \(S_{{{\,\mathrm{YM}\,}}}\) is gauge invariant.

Main result

We will consider an inverse problem for the Yang–Mills equations in the causal diamond

$$\begin{aligned} {\mathbb {D}} = \{ (t,x) \in {\mathbb {R}}^{1+3} : |x| \le t + 1,\ |x| \le 1 - t \}. \end{aligned}$$

For a fixed \(0< \epsilon _0 < 1\), the data will be given on the subset

$$\begin{aligned} \mho = \{(t,x) : (t,x)\text { is in the interior of }{\mathbb {D}} \text { and }|x| < \epsilon _0 \}. \end{aligned}$$
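
For orientation, the two inequalities defining \({\mathbb {D}}\) can be combined into one. This is an elementary reformulation added for illustration and is not used in the sequel:

$$\begin{aligned} {\mathbb {D}} = \{ (t,x) \in {\mathbb {R}}^{1+3} : |x| \le 1 - |t| \}, \qquad \mho = \{ (t,x) \in {\mathbb {R}}^{1+3} : |x| < \min (\epsilon _0, 1 - |t|) \}. \end{aligned}$$

In other words, \({\mathbb {D}}\) is the intersection of the causal future of the point \((-1,0)\) with the causal past of the point \((1,0)\), and \(\mho \) is a thin neighbourhood of the time axis inside it.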

We say that \(A \in \Omega ^{1}({\mathbb {D}};{\mathfrak {g}})\) is a background connection if it satisfies the Yang–Mills equations (1) in \({\mathbb {D}}\). Due to the gauge invariance, the determination of a background connection on \({\mathbb {D}}\) is considered only up to the action of the following pointed gauge group

$$\begin{aligned} G^0({\mathbb {D}},p) = \{{\mathbf {U}}\in C^\infty ({\mathbb {D}}; G) : {\mathbf {U}}(p) = {{\,\mathrm{id}\,}}\}, \end{aligned}$$

where \(p = (-1, 0) \in {{\overline{\mho }}}\). The reason for considering the pointed gauge group instead of the full gauge group

$$\begin{aligned} G({\mathbb {D}}) = C^\infty ({\mathbb {D}}; G), \end{aligned}$$

is technical in nature as we shall explain below, see the discussion after Lemma 6. Both gauge groups are clearly related by \(G({\mathbb {D}})/G^{0}({\mathbb {D}},p)=G\).
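
The relation between the two gauge groups can be made concrete as follows. This is a short sketch in which the constant \(g\) and the factor \({\mathbf {U}}_0\) are notation introduced here. Evaluation at p, \({\mathbf {U}}\mapsto {\mathbf {U}}(p)\), is a surjective group homomorphism \(G({\mathbb {D}}) \rightarrow G\) whose kernel is \(G^0({\mathbb {D}},p)\), and every gauge factors as

$$\begin{aligned} {\mathbf {U}}= g\, {\mathbf {U}}_0, \qquad g = {\mathbf {U}}(p) \in G, \qquad {\mathbf {U}}_0 = g^{-1} {\mathbf {U}}\in G^0({\mathbb {D}},p). \end{aligned}$$

For a constant gauge \({\mathbf {U}}\equiv g\) we have \(d{\mathbf {U}}= 0\), so (2) reduces to \(B = g^{-1} A g\). Passing to the pointed gauge group therefore amounts to excluding these constant conjugations.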

For \(A, B \in C^k({\mathbb {D}};T^*{\mathbb {D}}\otimes {\mathfrak {g}})\), with \(k \in {\mathbb {N}}\), we say that \(A \sim B\) in \({\mathbb {D}}\) if there is \({\mathbf {U}}\in G^0({\mathbb {D}},p)\) such that (2) holds in \({\mathbb {D}}\). Moreover, we write

$$\begin{aligned} \partial ^- {\mathbb {D}} = \{ (t,x) \in {\mathbb {D}} : |x| = t + 1 \} \end{aligned}$$

and say that \(A \sim B\) near \(\partial ^- {\mathbb {D}}\) if there are \({\mathbf {U}}\in G^0({\mathbb {D}},p)\) and a neighbourhood \({\mathcal {U}} \subset {\mathbb {D}}\) of \(\partial ^- {\mathbb {D}}\) such that (2) holds in \({\mathcal {U}} \cap {\mathbb {D}}\). The sets \({\mathbb {D}}\), \(\mho \) and \(\partial ^- {\mathbb {D}}\) are visualized in Figure 1.

Fig. 1. The set \(\mho \) (in blue) inside the diamond \({\mathbb {D}}\) in the \(1+2\) dimensional case. The part \(\partial ^- {\mathbb {D}}\) of the boundary of \({\mathbb {D}}\) is shaded in yellow. The point p is drawn in red.

We let A be a background connection, and consider the data set

$$\begin{aligned} {\mathcal {D}}_A = \{ V|_\mho :\ &V \in C^3({\mathbb {D}}; T^* {\mathbb {D}} \otimes {\mathfrak {g}})\text { satisfies }d_{V}^*F_{V}=0\text { in } {\mathbb {D}} \setminus \mho \\&\text {and }V \sim A\text { near }\partial ^- {\mathbb {D}} \}. \end{aligned}$$

Let us remark that we could consider the source-to-solution map given in Proposition 4 instead of the more abstract data set \({\mathcal {D}}_A\). We prefer to formulate our main result using \({\mathcal {D}}_A\) since the definition of the source-to-solution map is technical, requiring suitable gauge fixing among other things. In fact, it is precisely in the proof of Proposition 4 that the pointed gauge group is needed. Nevertheless, intuitively, it is helpful to think of the data set as that produced by an observer creating sources J supported in \(\mho \) and observing solutions V to \(d_{V}^*F_{V}=J\) in \(\mho \).

The data set \({\mathcal {D}}_A\) could also be reformulated in terms of the pairs \((J, V|_\mho )\) satisfying \(d_{V}^*F_{V}=J\), with J supported in \(\mho \). This formulation, while being somewhat redundant as \(J = d_{V}^*F_{V}\) can be computed given \(V|_\mho \), suggests viewing \({\mathcal {D}}_A\) informally as the graph of the map taking J to \(V|_\mho \). However, we reiterate that defining such a map requires care. In addition to gauge fixing, we need to take into account the compatibility condition \(d_{V}^*J=0\) that every source must satisfy, see Lemma 2. Our abstract formulation of the data set \({\mathcal {D}}_{A}\) bypasses these problems while incorporating the natural gauge invariance of the theory.

We are now ready to formulate our main result.

Theorem 1

Suppose that \(A, B \in \Omega ^{1}({\mathbb {D}};{\mathfrak {g}})\) solve (1) in \({\mathbb {D}}\). Then \({\mathcal {D}}_A = {\mathcal {D}}_B\) if and only if \(A \sim B\) in \({\mathbb {D}}\).

Clearly if \(A \sim B\) in \({\mathbb {D}}\) then \({\mathcal {D}}_A = {\mathcal {D}}_B\). The non-trivial content of the theorem is the opposite implication. It follows from Proposition 10 in Appendix B that if A and B are as in the theorem, then \(A \sim B\) in \({\mathbb {D}}\) if and only if \(A \sim B\) near \(\partial ^- {\mathbb {D}}\).

Outline of the proof of Theorem 1

The objective is to reduce the proof of the theorem to an inversion result for a broken non-abelian light ray transform as in [7]. The broken light ray transform that arises in this paper is that related to the adjoint representation given the natural habitat of the Yang–Mills theories. In [7] we studied the broken light ray transform associated with the fundamental representation, so our first task is to relate the two.

To go from the data set \({\mathcal {D}}_{A}\) to the broken non-abelian light ray transform we follow the template laid out in [7], where a considerably simpler wave equation with cubic non-linearity was studied. The first step is then to process the abstract data set and convert it into a manageable source-to-solution map, and this already brings the question of gauge fixing to the forefront. The construction of the source-to-solution map uses two types of gauges: the temporal gauge and the relative Lorenz gauge. The temporal gauge is easy to implement as it involves solving a linear matrix ODE to make the time component of a Yang–Mills potential A vanish, that is, \(A_0=0\). This gauge is particularly suited to proving uniqueness results, cf. Proposition 2 below.

It is important to remark that uniqueness does really depend on the shape of the set where the connections satisfy the Yang–Mills equations. The causal diamond \({\mathbb {D}}\) has the special feature that perturbations cannot propagate in it through the top boundary \(|x|=1-t\), whereas the bottom boundary is under control due to the assumed gauge equivalence near \(\partial ^- {\mathbb {D}}\). In particular, even if a background connection A satisfies the Yang–Mills equations on a larger set than \({\mathbb {D}}\), we do not expect to be able to recover it outside \({\mathbb {D}}\) given data on \(\mho \). Moreover, it does not appear to be possible to prove Theorem 1 using presently known unique continuation results, as discussed in more detail below.

A connection V is said to be in relative Lorenz gauge with respect to the background A if \(d_{A}^* V = d_{A}^* A\). The advantage of this gauge is that if A satisfies Yang–Mills \(d_{A}^*F_{A}=0\), and \(d_{V}^*F_{V}=J\), then the difference \(W=V-A\) satisfies a semilinear wave equation where the leading part is given by the connection wave operator \(\Box _{A}=d_{A}d^{*}_{A}+d_{A}^*d_{A}\), cf. (23). This is very helpful for solving the forward problem and for the microlocal analysis used to extract information from the source-to-solution map.
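
The role of the gauge condition can be seen from the following short computation, a sketch using only the definitions from the Introduction. Since \(d_{A}^*\) is linear in its argument, the relative Lorenz gauge says that \(d_{A}^* W = d_{A}^* V - d_{A}^* A = 0\), and since \([A,W]=[W,A]\) for 1-forms,

$$\begin{aligned} F_{V} = F_{A+W} = F_{A} + d_{A} W + \frac{1}{2}[W,W], \qquad \Box _{A} W = d_{A}d_{A}^* W + d_{A}^*d_{A} W = d_{A}^*d_{A} W. \end{aligned}$$

Thus, in this gauge, the second order part \(d_{A}^*d_{A}W\) arising from \(d_{A}^* F_{V}\) coincides with \(\Box _{A} W\), which is a wave operator, whereas \(d_{A}^*d_{A}\) alone is not.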

Following [7], the idea is to consider the non-linear interaction of three singular waves produced by sources which are conormal distributions. We carefully track the principal symbol produced by the non-linear interaction and extract from that the non-abelian broken light ray transform. This requires a delicate calculation unlike anything in the previous literature, in which the structure of the Lie algebra \({\mathfrak {g}}\) comes into consideration. This is the technical core of the proof, and perhaps one of the most innovative aspects of the paper. After this computation, contained in Section 8.2, there is one further hurdle to overcome: to use the source-to-solution map we must revert back to the temporal gauge and check that no information is lost in the process.

Discussion and comparison with previous literature

It is tempting to think that a result like Theorem 1 can be obtained from a unique continuation principle. It must be stressed that unique continuation for linear wave equations with time-dependent coefficients is simply false as there are counterexamples [1]. Although the difference of two solutions to the Yang–Mills equations in the Lorenz gauge satisfies a linear wave equation (with coefficients depending on both the solutions), due to unique continuation failing, our inverse problem is not “immediately solvable” and hence a different approach is needed. We mention that an inverse problem for Yang–Mills connections on a Riemannian manifold was studied in [6]. The proofs there are based on unique continuation for elliptic systems, however, the elliptic case is very different from the hyperbolic one.

This paper sits firmly within the program, initiated in [7], that is motivated by the Yang–Mills–Higgs system. In addition to the Yang–Mills potential A, a Higgs field \(\Phi \in C^{\infty }(M,{\mathfrak {g}})\) is present in this system. The equations for the pair of fields \((A,\Phi )\) are given by

$$\begin{aligned}&d_{A}^*F_{A}+[\Phi ,d_{A}\Phi ]=0; \end{aligned} \tag{4}$$
$$\begin{aligned}&d_{A}^*d_{A}\Phi +V'(|\Phi |^{2})\Phi =0, \end{aligned} \tag{5}$$

where \(V'\) is the derivative of a smooth function \(V: [0,\infty )\rightarrow {\mathbb {R}}\). More generally, we can consider these equations when \(\Phi \) is a section of an associated bundle determined by a given representation of G. The focus of [7] was the recovery of A via the second equation (5), when V is assumed to be a quadratic potential (the most popular choice in Yang–Mills–Higgs theories): this turns (5) into a wave equation with a cubic non-linearity. The present paper focuses on the first equation (4); more precisely on the pure Yang–Mills case where \(\Phi =0\). There are two substantial differences between [7] and the present paper. First, when A is fixed, the second equation (5) is no longer gauge invariant, and hence the construction of the source-to-solution map in [7] does not require gauge fixing. Second, the quadratic potential V leads to a particularly simple non-linear structure in [7], and the resulting analysis of principal symbols is much more straightforward than in the present paper.

As already mentioned above, we consider the non-linear interactions of three singular waves. Interaction of singular waves has been studied outside the context of inverse problems. In particular, the wave front set of a triple cross-derivative has been studied in the case of the \(1+2\)-dimensional Minkowski space by Rauch and Reed [39]. The references [3, 24, 34, 35, 40] have results of similar nature. The use of non-linear interactions in the context of inverse problems was initiated in [29], where the wave front set resulting from the interaction of four singular waves was studied. The same approach was used for the Einstein equations in [28], and subsequently in [32, 46], in some ways the closest previous results to ours. For a review of this approach, see [30]. We observed in our above mentioned work [7] that it is sufficient to consider interactions of three singular waves, simplifying the analysis. Three-fold interactions are used in the present paper.

Non-linearities allow solving inverse problems that are open for the corresponding linearized equations. In particular, the inverse problem for the linearized Yang–Mills equation, see e.g. (32) below (where some lower order terms are discarded), is open. The only known results are in the case \(G=U(1)\), see [12, 41], and these results impose convexity assumptions not satisfied by the geometric setting of Figure 1. The same is true for the recovery of zeroth order terms, solved with and without convexity assumptions for certain scalar linear [43] and non-linear wave equations [14], respectively.

We mention that non-linear interactions have also been used to recover non-linear terms for scalar wave equations [33], scalar elliptic equations [13, 31], and scalar real principal type equations [38]. In these four works, non-linear terms do not contain any derivatives, contrary to the Einstein and Yang–Mills equations. Non-linear interactions involving derivatives have also been studied in the context of scalar wave equations [47] and elastodynamics [10]. In addition, inverse problems have been studied for various non-linear equations using methods originally developed in the context of linear elliptic equations. In particular, the method of complex geometrical optics originating from [45], and importantly extended by [27, 37], was first applied to an inverse coefficient determination problem for a non-linear parabolic equation [21] and subsequently to several other inverse problems [2, 5, 22, 23, 25, 42, 44].

There are numerous analogies between the problem studied here and that of the Einstein equations considered in [28]. For starters, both problems have gauges: in the Einstein case the gauge group is the diffeomorphism group. The role of the relative Lorenz gauge is played by wave coordinates and one could also say that the Fermi coordinates used in [28] are the analogue of the temporal gauge. Both problems have a compatibility condition for the sources: the Einstein tensor has zero divergence and Yang–Mills has \(d_{A}^*d_{A}^*F_{A}=0\).

However, there are important differences and we want to stress those, since they are essential in resolving the inverse problem in the different contexts. After suitable gauge fixing and linearization, both the Einstein and Yang–Mills equations reduce to a linear wave equation. The unknown Lorentzian metric appears in the leading order terms of the equation in the former case while the background gauge field A features at the subprincipal level in the latter case. The Lorentzian metric affects the Lagrangian geometry of the parametrix for the wave equation but the effect of A is visible only in the principal symbol of the parametrix. Thus the need for a symbol calculation in the present paper that takes into consideration the structure of the Lie algebra \({\mathfrak {g}}\). Finally, the two inverse problems reduce to very different purely geometric problems. In our case, we read the broken non-abelian light ray transform from certain principal symbols, whereas in the Einstein case, the so-called light observation sets are obtained by analysing the wave front sets of suitable solutions, see [17, 29] for the corresponding geometric problem.

Outline of the paper

Section 2 introduces parallel transport in both the principal and the adjoint representation and reduces Theorem 1 to inversion of the broken non-abelian light ray transform via [7, Proposition 2] in the case that G has finite centre. Section 3 discusses the Yang–Mills equations with a source. Section 4 introduces the relative Lorenz gauge and the temporal gauge, thus setting the scene for the source-to-solution map. The latter is discussed in Section 5 where the important Proposition 4 is proved. Section 6 computes the equations for the triple cross-derivative when three sources are introduced. Section 7 supplies the necessary tools from microlocal analysis needed to compute the symbol of the triple interaction and the latter is computed in Section 8. Section 9 proves a result about the structure of Lie algebras with trivial centre, and completes the proof of Theorem 1 in the case that G has finite centre. The final Section 10 contains the proof of Theorem 1 in the general case.

There are three appendices, the first of which derives explicit formulas in coordinates, for example, for \(d_A^* F_A\). The second appendix discusses the direct problem for the Yang–Mills equations, and the last one gives an elementary alternative to the result in Section 9 in the case that \({\mathfrak {g}}= {{\,\mathrm{{\mathfrak {su}}}\,}}(n)\) with \(n \ge 2\).

Parallel Transport

We will explain in Section 10 how the case of an arbitrary compact, connected Lie group G can be reduced to the case that G has finite centre, that is, the set

$$\begin{aligned} Z(G) = \{z \in G : zh = hz\text { for all }h \in G\} \end{aligned}$$

is finite. In this case, the proof of Theorem 1 will ultimately boil down to inversion of a non-abelian broken light ray transform. This transform is the composition of two parallel transports, and we begin by defining the parallel transport used in the paper.

For the moment we may let (M, g) be any Lorentzian manifold, and G any compact matrix Lie group with Lie algebra \({\mathfrak {g}}\). However, we will work with trivial bundles for simplicity. Let \(A\in \Omega ^{1}(M;{\mathfrak {g}})\) be a connection and let us first define the parallel transport on the principal bundle \(M \times G\) with respect to A: the parallel transport \({\mathbf {U}}_\gamma ^A\) along a curve \(\gamma :[0,T]\rightarrow M\) is given by \({\mathbf {U}}_\gamma ^A = U(T)\) where U is the solution of the ordinary differential equation

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{U}}+\left\langle A,{\dot{\gamma }}(t) \right\rangle U=0, &{} t \in [0,T], \\ U(0)= {{\,\mathrm{id}\,}}. \end{array}\right. } \end{aligned} \tag{6}$$

Here \(\left\langle \cdot ,\cdot \right\rangle \) is the pairing between covectors and vectors.

In general, if \({\mathbb {V}}\) is a vector space and \(\rho : G\rightarrow {{\,\mathrm{GL}\,}}({\mathbb {V}})\) is a linear representation, the parallel transport on the associated vector bundle \(M\times {\mathbb {V}}\) is defined by \({\mathbf {P}}_\gamma ^{A,\rho } = \rho ({\mathbf {U}}_\gamma ^A)\). Two representations will be of importance to us. First, when \(G \subset {{\,\mathrm{GL}\,}}({\mathbb {C}}^n)\) and \({\mathbb {V}} = {\mathbb {C}}^n\) we have the representation given by \(\rho ={{\,\mathrm{id}\,}}\). In other words, \({\mathbf {P}}_\gamma ^{A,{{\,\mathrm{id}\,}}} v = {\mathbf {U}}_\gamma ^A v\) for \(v \in {\mathbb {V}}\). We call this the principal representation.

Second, when \({\mathbb {V}}={\mathfrak {g}}\) we have the adjoint representation \(\rho = {{\,\mathrm{Ad}\,}}\) where \({{\,\mathrm{Ad}\,}}(h)\), \(h \in G\), is typically written \({{\,\mathrm{Ad}\,}}_h\) and defined by \({{\,\mathrm{Ad}\,}}_h b = h b h^{-1}\) for \(b \in {\mathfrak {g}}\). We have

$$\begin{aligned} {\mathbf {P}}_{\gamma }^{A,{{\,\mathrm{Ad}\,}}} b = {{\,\mathrm{Ad}\,}}_{{\mathbf {U}}_\gamma ^A} b = {\mathbf {U}}_\gamma ^A b ({\mathbf {U}}_\gamma ^A)^{-1}, \quad b \in {\mathfrak {g}}. \end{aligned}$$

It is straightforward to verify that \(W(t) = U(t) b U^{-1}(t)\) solves

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{W}}+ [\left\langle A,{\dot{\gamma }}(t) \right\rangle , W] =0, &{} t \in [0,T], \\ W(0)= b, \end{array}\right. } \end{aligned}$$

where U is the solution of (6).
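
For completeness, the verification can be sketched as follows, writing \(a(t) = \left\langle A,{\dot{\gamma }}(t) \right\rangle \) (a shorthand introduced only here). From \({\dot{U}} = -aU\) we get \(\frac{d}{dt}(U^{-1}) = -U^{-1}{\dot{U}}U^{-1} = U^{-1}a\), and therefore

$$\begin{aligned} {\dot{W}} = {\dot{U}} b U^{-1} + U b \frac{d}{dt}(U^{-1}) = -a\, U b U^{-1} + U b U^{-1} a = -[a, W], \qquad W(0) = U(0)\, b\, U(0)^{-1} = b. \end{aligned}$$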

When M is a convex subset of Minkowski space \({\mathbb {R}}^{1 + 3}\) and \(x, y \in M\), there is a unique geodesic \(\gamma \) from x to y, up to reparametrization. The parallel transport \({\mathbf {U}}_\gamma ^A\) does not depend on the parametrization of \(\gamma \), and we write simply \({\mathbf {P}}_{y \leftarrow x}^{A, \rho } = {\mathbf {P}}_\gamma ^{A, \rho }\) in this case.

We are now ready to define the non-abelian broken light ray transforms used in the proof of Theorem 1. We write

$$\begin{aligned} {\mathbb {L}}&= \{(x,y) \in {{\mathbb {D}}}^2 : \text {there is a lightlike geodesic joining}\, x\text { and }y\}, \nonumber \\ {\mathbb {S}}^+(\mho )&= \{(x,y,z) \in {{\mathbb {D}}}^3 : (x,y), (y,z) \in {\mathbb {L}},\ x< y < z,\ x,z \in \mho ,\ y \notin \mho \}, \end{aligned}$$

where \(x<y\) means that there is a future pointing causal curve from x to y. (For \((x,y) \in {\mathbb {L}}\), we have \(x<y\) if and only if the time coordinate of \(y-x\) is strictly positive.) Define

$$\begin{aligned} {\mathbf {S}}^{A,\rho }_{z \leftarrow y \leftarrow x}&= {\mathbf {P}}^{A,\rho }_{z \leftarrow y} {\mathbf {P}}^{A,\rho }_{y \leftarrow x}, \quad (x,y,z) \in {\mathbb {S}}^+(\mho ). \end{aligned}$$

We will reduce the transform \({\mathbf {S}}^{A,{{\,\mathrm{Ad}\,}}}_{z \leftarrow y \leftarrow x}\) to \({\mathbf {S}}^{A,{{\,\mathrm{id}\,}}}_{z \leftarrow y \leftarrow x}\) as follows:

Lemma 1

Suppose that a compact, connected matrix Lie group G has finite centre and let \(A, B \in \Omega ^{1}({\mathbb {D}};{\mathfrak {g}})\). If \({\mathbf {S}}^{A,{{\,\mathrm{Ad}\,}}}_{z \leftarrow y \leftarrow x} = {\mathbf {S}}^{B,{{\,\mathrm{Ad}\,}}}_{z \leftarrow y \leftarrow x}\) for all \((x,y,z) \in {\mathbb {S}}^+(\mho )\) then \({\mathbf {S}}^{A,{{\,\mathrm{id}\,}}}_{z \leftarrow y \leftarrow x} = {\mathbf {S}}^{B,{{\,\mathrm{id}\,}}}_{z \leftarrow y \leftarrow x}\) for all \((x,y,z) \in {\mathbb {S}}^+(\mho )\).


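Proof. Let \((x,y,z) \in {\mathbb {S}}^+(\mho )\) and \(b \in {\mathfrak {g}}\). By assumption, conjugation of b by \({\mathbf {U}}^A_{z \leftarrow y} {\mathbf {U}}^A_{y \leftarrow x}\) coincides with conjugation of b by \({\mathbf {U}}^B_{z \leftarrow y} {\mathbf {U}}^B_{y \leftarrow x}\). Hence \({\mathbf {u}}b = b {\mathbf {u}}\) where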

$$\begin{aligned} {\mathbf {u}}=({\mathbf {U}}^B_{z \leftarrow y} {\mathbf {U}}^B_{y \leftarrow x})^{-1} {\mathbf {U}}^A_{z \leftarrow y} {\mathbf {U}}^A_{y \leftarrow x}= {\mathbf {U}}^B_{x \leftarrow y} {\mathbf {U}}^B_{y \leftarrow z} {\mathbf {U}}^A_{z \leftarrow y} {\mathbf {U}}^A_{y \leftarrow x}. \end{aligned}$$

As this holds for all \(b \in {\mathfrak {g}}\) we see that \({\mathbf {u}}\) is in the centre Z(G). For the convenience of the reader we recall the proof of this well-known fact. Let \(h \in G\). As G is connected, there is a path \(H : [0,1] \rightarrow G\) satisfying \(H(0) = {{\,\mathrm{id}\,}}\) and \(H(1) = h\). Define the path \(F(t) = {\mathbf {u}}H(t) {\mathbf {u}}^{-1} H^{-1}(t)\) in G. Then \(F(0) = {{\,\mathrm{id}\,}}\) and

$$\begin{aligned} {\dot{F}} = {\mathbf {u}}{\dot{H}} {\mathbf {u}}^{-1} H^{-1} - {\mathbf {u}}H {\mathbf {u}}^{-1} H^{-1} {\dot{H}} H^{-1} = {\mathbf {u}}H H^{-1} {\dot{H}} {\mathbf {u}}^{-1} H^{-1} - {\mathbf {u}}H {\mathbf {u}}^{-1} H^{-1} {\dot{H}} H^{-1} = 0, \end{aligned}$$

where we used the fact that \(b = H^{-1} {\dot{H}} \in {\mathfrak {g}}\) commutes with \({\mathbf {u}}^{-1}\). We conclude that \({\mathbf {u}}h {\mathbf {u}}^{-1} h^{-1} = F(1) = {{\,\mathrm{id}\,}}\).

Now \({\mathbf {u}}\in Z(G)\) depends continuously on x, y and z, and \({\mathbf {u}}\rightarrow {{\,\mathrm{id}\,}}\) when \(y \rightarrow x\) and \(z \rightarrow x\). As Z(G) is finite, we have \({\mathbf {u}}= {{\,\mathrm{id}\,}}\), and therefore

$$\begin{aligned} {\mathbf {U}}^A_{z \leftarrow y} {\mathbf {U}}^A_{y \leftarrow x} = {\mathbf {U}}^B_{z \leftarrow y} {\mathbf {U}}^B_{y \leftarrow x}. \end{aligned}$$

\(\square \)
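
The role of the finite centre hypothesis can be illustrated by the abelian case. This example is not part of the argument above and assumes \(G = \mathrm {U}(1)\) with \({\mathfrak {g}}= i{\mathbb {R}}\). Then the equation (6) is scalar and

$$\begin{aligned} {\mathbf {U}}_\gamma ^A = \exp \left( -\int _0^T \left\langle A,{\dot{\gamma }}(t) \right\rangle dt \right) , \qquad {{\,\mathrm{Ad}\,}}_{{\mathbf {U}}_\gamma ^A} b = {\mathbf {U}}_\gamma ^A\, b\, ({\mathbf {U}}_\gamma ^A)^{-1} = b. \end{aligned}$$

Hence \({\mathbf {S}}^{A,{{\,\mathrm{Ad}\,}}}_{z \leftarrow y \leftarrow x}\) is the identity for every connection A and carries no information, whereas \({\mathbf {S}}^{A,{{\,\mathrm{id}\,}}}_{z \leftarrow y \leftarrow x}\) depends on A in general. Here \(Z(G) = G\) is infinite, so Lemma 1 does not apply.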

We have previously inverted the transform \({\mathbf {S}}^{A,{{\,\mathrm{id}\,}}}_{z \leftarrow y \leftarrow x}\) in the case of the unitary group \(G = \mathrm {U}(n)\), see Proposition 2 of [7], where a slightly different choice of \(\mho \) and \({\mathbb {D}}\) is used. However, the proof works for any matrix Lie group, and also for the present choice of \(\mho \) and \({\mathbb {D}}\). Moreover, the gauge \({\mathbf {u}}\) defined in Lemma 3 of [7] is smooth up to \(\partial {\mathbb {D}}\) whenever the two connections A and B are smooth up to \(\partial {\mathbb {D}}\).

Until treating the case of an arbitrary compact, connected Lie group in Section 10, we will focus on proving:

Proposition 1

Suppose that G has finite centre. If A and B are as in Theorem 1 and if \({\mathcal {D}}_A = {\mathcal {D}}_B\), then there are \({{\tilde{A}}} \sim A\) and \({{\tilde{B}}} \sim B\) in \({\mathbb {D}}\) such that \({\mathbf {S}}^{{{\tilde{A}}},{{\,\mathrm{Ad}\,}}}_{z \leftarrow y \leftarrow x} = {\mathbf {S}}^{{{\tilde{B}}},{{\,\mathrm{Ad}\,}}}_{z \leftarrow y \leftarrow x}\) for all \((x,y,z) \in {\mathbb {S}}^+(\mho )\).

Under the additional assumption that G has finite centre, Theorem 1 follows then from Proposition 1, Lemma 1 and the proof of Proposition 2 in [7].

Yang–Mills Equations with a Source

In this section we let (M, g) be any oriented Lorentzian manifold, and consider the Yang–Mills equations with a source

$$\begin{aligned} d^*_V F_V = J \end{aligned} \tag{9}$$

on M. Here the source J cannot be arbitrarily chosen but must obey the compatibility condition

$$\begin{aligned} d^*_V J = 0 \end{aligned} \tag{10}$$

due to the following well-known lemma. We give a proof for the convenience of the reader.

Lemma 2

Let \(V \in C^3(M;T^*M\otimes {\mathfrak {g}})\). Then \(d^*_V d^*_V F_V = 0\), and the Yang–Mills equations with a source (9) imply the compatibility condition (10).


Proof. Since \(d_{V}^*= \pm \star d_{V}\star \) we see that given any \(\omega \in \Omega ^{k}(M;{\mathfrak {g}})\) we have

$$\begin{aligned} (d_{V}^*)^{2}\omega = \pm \star d_{V}\star \star d_{V}\star \omega =\pm \star d_{V}^2\star \omega =\pm \star [F_{V},\star \omega ]. \end{aligned}$$

So it is enough to prove that \([F_{V},\star F_{V}]=0\). But this is a purely algebraic fact that holds for any \(\omega \in \Omega ^{2}(M;{\mathfrak {g}})\), that is,

$$\begin{aligned}{}[\omega ,\star \omega ]=0, \quad \omega \in \Omega ^{2}(M;{\mathfrak {g}}). \end{aligned} \tag{11}$$

This is equivalent to

$$\begin{aligned} \omega \wedge \star \omega -\star \omega \wedge \omega =0. \end{aligned}$$

To check this, write \(\omega =\omega _{ij}dx^{i}\wedge dx^{j}\) and note that

$$\begin{aligned} dx^{i}\wedge dx^{j}\wedge \star (dx^{k}\wedge dx^{l})\ne 0 \end{aligned}$$

if and only if \(i=k\), \(j=l\), \(i \ne j\) and \(k \ne l\). Thus

$$\begin{aligned} \omega \wedge \star \omega =(\omega _{ij})^{2}dx^{i}\wedge dx^{j}\wedge \star (dx^{i}\wedge dx^{j}) \end{aligned}$$

and since

$$\begin{aligned}dx^{i}\wedge dx^{j}\wedge \star (dx^{i}\wedge dx^{j})=\star (dx^{i}\wedge dx^{j})\wedge dx^{i}\wedge dx^{j}\end{aligned}$$

\(\star \omega \wedge \omega \) has the same expression and (11) holds. \(\quad \square \)
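
For orientation, consider the abelian case. This illustration assumes \(G = \mathrm {U}(1)\), \(M \subset {\mathbb {R}}^{1+3}\) with Cartesian coordinates and signature \((-+++)\), and is stated up to an overall sign depending on conventions. Here \([V,\cdot ]\) vanishes, so \(d_V^* = d^*\), and for \(J = J_\alpha dx^\alpha \) the compatibility condition reads

$$\begin{aligned} d^* J = \pm \partial ^\alpha J_\alpha = \pm \left( -\partial _0 J_0 + \sum _{a=1}^{3} \partial _a J_a \right) = 0, \end{aligned}$$

which is the familiar continuity equation expressing conservation of charge in electromagnetism.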

The next lemma, proven again for convenience, implies that the source in (9) changes to \({\mathbf {U}}^{-1} J {\mathbf {U}}\) when a gauge transformation \({\mathbf {U}}\in C^\infty (M,G)\) acts on V. We use the shorthand notation \(B = {\mathbf {U}}\cdot A\) for (2).

Lemma 3

\(B = {\mathbf {U}}\cdot A\) implies

$$\begin{aligned} d_{B}^*F_{B}={\mathbf {U}}^{-1}d_{A}^*F_{A}{\mathbf {U}}. \end{aligned}$$


Proof. By assumption

$$\begin{aligned}B={\mathbf {U}}^{-1}d{\mathbf {U}}+{\mathbf {U}}^{-1}A{\mathbf {U}}.\end{aligned}$$

A direct calculation from the definitions shows that

$$\begin{aligned} d_{B}\omega ={\mathbf {U}}^{-1}d_{A}({\mathbf {U}}\omega {\mathbf {U}}^{-1}){\mathbf {U}}, \quad \omega \in \Omega ^p. \end{aligned} \tag{13}$$

Using \(d^*_{A}=\star d_{A}\star \) and (13) we see that

$$\begin{aligned}d_{B}^*F_{B}={\mathbf {U}}^{-1}d_{A}^*F_{A}{\mathbf {U}}\end{aligned}$$

since \(F_{B}={\mathbf {U}}^{-1}F_{A}{\mathbf {U}}\). \(\quad \square \)
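
The transformation rule for the curvature used in the last step can be checked directly. The following sketch uses \(F_B = dB + B\wedge B\), the identity \(d({\mathbf {U}}^{-1}) = -{\mathbf {U}}^{-1} d{\mathbf {U}}\, {\mathbf {U}}^{-1}\), and the sign rule for differentiating a product involving the 1-form A:

$$\begin{aligned} dB&= -{\mathbf {U}}^{-1} d{\mathbf {U}}\wedge {\mathbf {U}}^{-1} d{\mathbf {U}}- {\mathbf {U}}^{-1} d{\mathbf {U}}\wedge {\mathbf {U}}^{-1} A {\mathbf {U}}+ {\mathbf {U}}^{-1} dA\, {\mathbf {U}}- {\mathbf {U}}^{-1} A \wedge d{\mathbf {U}}, \\ B\wedge B&= {\mathbf {U}}^{-1} d{\mathbf {U}}\wedge {\mathbf {U}}^{-1} d{\mathbf {U}}+ {\mathbf {U}}^{-1} d{\mathbf {U}}\wedge {\mathbf {U}}^{-1} A {\mathbf {U}}+ {\mathbf {U}}^{-1} A \wedge d{\mathbf {U}}+ {\mathbf {U}}^{-1} A\wedge A\, {\mathbf {U}}. \end{aligned}$$

Adding the two lines, all terms containing \(d{\mathbf {U}}\) cancel and \(F_B = {\mathbf {U}}^{-1}(dA + A\wedge A){\mathbf {U}}= {\mathbf {U}}^{-1} F_A {\mathbf {U}}\).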

Gauge Fixing

Gauge fixing is a mathematical procedure for coping with redundant degrees of freedom in field variables. Our work uses two gauges, namely the temporal gauge and the relative Lorenz gauge. While these are typical gauge choices, we will give below a self-contained presentation of certain, perhaps less commonly used, properties of these gauges.

Temporal gauge

In this section we write \((x^0, x^1, x^2, x^3) = (t,x) \in {\mathbb {R}}^{1+3}\) for the Cartesian coordinates. The signature convention \((-+++)\) is chosen for the Minkowski metric. A connection \(A \in \Omega ^1(M;{\mathfrak {g}})\), with \(M \subset {\mathbb {R}}^{1+3}\), is said to be in the temporal gauge if \(A_0 = 0\) where \(A = A_\alpha dx^\alpha \).

For a connection \(V \in \Omega ^1({\mathbb {D}}; {\mathfrak {g}})\) we define a connection \({\mathscr {T}}(V)\) in temporal gauge by

$$\begin{aligned} {\mathscr {T}}(V) = {\mathbf {U}}\cdot V, \quad \text {where} \quad {\left\{ \begin{array}{ll} \partial _t {\mathbf {U}}= -V_0 {\mathbf {U}}, \\ {\mathbf {U}}|_{t = \psi (x)} = {{\,\mathrm{id}\,}}, \end{array}\right. } \end{aligned}$$

and \(\psi (x) = |x|-1\). Observe that \(\{(t,x) \in {\mathbb {D}} : t = \psi (x)\} = \partial ^- {\mathbb {D}}\) and \({\mathbf {U}}\in G^0({\mathbb {D}},p)\). Therefore \({\mathscr {T}}(V) \sim V\) in \({\mathbb {D}}\).
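
As a quick check, added here for illustration, the ordinary differential equation in (14) is exactly what makes the time component of \({\mathbf {U}}\cdot V\) vanish. Using the formula \({\mathbf {U}}\cdot V = {\mathbf {U}}^{-1}d{\mathbf {U}}+ {\mathbf {U}}^{-1}V{\mathbf {U}}\) from the proof of Lemma 3,

$$\begin{aligned} ({\mathscr {T}}(V))_0 = {\mathbf {U}}^{-1}\partial _t {\mathbf {U}}+ {\mathbf {U}}^{-1} V_0 {\mathbf {U}}= {\mathbf {U}}^{-1}(-V_0 {\mathbf {U}}) + {\mathbf {U}}^{-1} V_0 {\mathbf {U}}= 0. \end{aligned}$$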

We shall prove the following uniqueness result:

Proposition 2

Let \(A, B \in C^3({\mathbb {D}};T^*{\mathbb {D}}\otimes {\mathfrak {g}})\) solve the Yang–Mills equations (1) in the set \({\mathbb {D}} \setminus \mho \). Suppose that \(d_A^* F_A = d_B^* F_B\) in \(\mho \) and that there is \({\mathbf {U}}\in C^\infty ({\mathbb {D}}; G)\) such that \(A = {\mathbf {U}}\cdot B\) near \(\partial ^- {\mathbb {D}}\) and that \({\mathbf {U}}= {{\,\mathrm{id}\,}}\) in \(\mho \) near \(\partial ^- {\mathbb {D}}\). Suppose, furthermore, that both A and B are in the temporal gauge. Then \({\mathbf {U}}\) does not depend on t, and \(A = {\mathbf {U}}\cdot B\) in \({\mathbb {D}}\).

Reduced equations

We follow a reduction given in [9]. Suppose that a connection \(A \in \Omega ^1(M;{\mathfrak {g}})\) is in temporal gauge and write \(d_A^* F_A = J\). For the convenience of the reader, we give a proof of the following formula in Lemma 12 of Appendix A,

$$\begin{aligned} d_A^* F_A&= \left( \partial _\beta (\partial ^\alpha A_\alpha ) -\partial ^\alpha \partial _\alpha A_\beta - [\partial ^\alpha A_\alpha , A_\beta ]\right. \\&\qquad \left. - 2 [A^\alpha , \partial _\alpha A_\beta ] + [A^\alpha , \partial _\beta A_\alpha ] - [A^\alpha , [A_\alpha , A_\beta ]]\right) dx^\beta . \end{aligned}$$

Here, and throughout the paper, indices are raised and lowered by using the Minkowski metric. Taking \(\beta = 0\) we get the constraint equation

$$\begin{aligned} \partial _0 (\partial ^a A_a) + [A^a, \partial _0 A_a] = J_0, \end{aligned}$$

with \(a=1,2,3\), and taking \(\beta = j = 1,2,3\) we get

$$\begin{aligned} \partial _j(\partial ^a A_a) -\partial ^\alpha \partial _\alpha A_j + {{\tilde{N}}}_j(A, \partial _x A) = J_j. \end{aligned}$$

Here \(\partial _x A = (\partial _1 A, \partial _2 A, \partial _3 A)\) and \({{\tilde{N}}}_j\) contains the terms that are of order one and zero,

$$\begin{aligned} {{\tilde{N}}}_j(A, \partial _x A) = -[\partial ^a A_a, A_j] - 2 [A^a, \partial _a A_j] + [A^a, \partial _j A_a] - [A^a, [A_a, A_j]]. \end{aligned}$$

In the remainder of this section, we will use systematically Greek letters for indices over 0, 1, 2, 3 and Latin letters for 1, 2, 3.
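
The passage from the general formula to the constraint equation (15) can be made explicit. The following term-by-term check is ours and uses only \(A_0 = 0\), hence also \(A^0 = 0\), in the component \(\beta = 0\):

$$\begin{aligned} \partial _0 (\partial ^\alpha A_\alpha )&= \partial _0 (\partial ^a A_a),&\partial ^\alpha \partial _\alpha A_0&= 0,&[\partial ^\alpha A_\alpha , A_0]&= 0, \\ [A^\alpha , \partial _\alpha A_0]&= 0,&[A^\alpha , \partial _0 A_\alpha ]&= [A^a, \partial _0 A_a],&[A^\alpha , [A_\alpha , A_0]]&= 0. \end{aligned}$$

Only the first and the fifth terms survive, which gives (15). For \(\beta = j\) the same reduction of sums over \(\alpha \) to sums over a applies to every term except \(\partial ^\alpha \partial _\alpha A_j\), which keeps its time derivatives, and this gives (16).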

We differentiate (15) using \(\partial _j\) and (16) using \(\partial _0\), to obtain

$$\begin{aligned}&\partial _j \partial _0 (\partial ^a A_a) = -[\partial _j A^a, \partial _0 A_a] - [A^a, \partial _j \partial _0 A_a] + \partial _j J_0 \\&\partial _j \partial _0 (\partial ^a A_a) -\partial ^\alpha \partial _\alpha \partial _0 A_j + \partial _0 {{\tilde{N}}}_j(A, \partial _x A) = \partial _0 J_j. \end{aligned}$$

Substituting the first equation into the second one gives

$$\begin{aligned} \Box \partial _t A_j + N_j(A, \partial _x A, \partial _t A, \partial _x \partial _t A) = \partial _t J_j - \partial _j J_0, \end{aligned}$$

where we have written

$$\begin{aligned} \Box = -\partial ^\alpha \partial _\alpha = \partial _t^2 - \partial _{x_1}^2 - \partial _{x_2}^2 - \partial _{x_3}^2, \end{aligned}$$


$$\begin{aligned} N_j(A, \partial _x A, \partial _t A, \partial _x \partial _t A) = -[\partial _j A^a, \partial _0 A_a] - [A^a, \partial _j \partial _0 A_a] + \partial _0 {{\tilde{N}}}_j(A, \partial _x A). \end{aligned}$$

We call (17) the reduced Yang–Mills equations.


Observe that for bilinear and trilinear forms b and m,

$$\begin{aligned} b(A,A) - b({{\tilde{A}}}, {{\tilde{A}}})&= b(A-{{\tilde{A}}}, A) + b({{\tilde{A}}}, A-{{\tilde{A}}}), \\ m(A,A,A) - m({{\tilde{A}}}, {{\tilde{A}}}, {{\tilde{A}}})&= m(A-{{\tilde{A}}}, A, A) + m({{\tilde{A}}}, A - {{\tilde{A}}}, A) + m({{\tilde{A}}}, {{\tilde{A}}}, A - {{\tilde{A}}}). \end{aligned}$$

Hence if A and \({{\tilde{A}}}\) satisfy (17) with the same J, then the difference \(A-{{\tilde{A}}}\) satisfies a linear equation of the form

$$\begin{aligned} \Box \partial _t (A-{{\tilde{A}}}) + X_1 \partial _t (A - {{\tilde{A}}}) + X_2 (A - {{\tilde{A}}}) = 0 \end{aligned}$$

where \(X_j\), \(j=1,2\), are first order differential operators in the \(x^1, x^2\) and \(x^3\) variables, with coefficients that depend on A and \({{\tilde{A}}}\), and hence also on the \(x^0\) variable. Writing \(u = A-{{\tilde{A}}}\), \(Y_1 = -1\) and \(Y_2 = 0\), the system (19) is equivalent to (65), with \(f_1 = 0\) and \(f_2 = 0\), studied in Appendix B.
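
To illustrate the bilinear identity, consider the single term \(b(A,A) = [A^a, \partial _a A_j]\) appearing in \({{\tilde{N}}}_j\). This is only a sketch for one representative term, not the full computation. Writing \(u = A - {{\tilde{A}}}\), we have

$$\begin{aligned} {[}A^a, \partial _a A_j] - [{{\tilde{A}}}^a, \partial _a {{\tilde{A}}}_j]&= [u^a, \partial _a A_j] + [{{\tilde{A}}}^a, \partial _a u_j], \\ \partial _0 \left( [u^a, \partial _a A_j] + [{{\tilde{A}}}^a, \partial _a u_j] \right)&= [\partial _t u^a, \partial _a A_j] + [{{\tilde{A}}}^a, \partial _a \partial _t u_j] + [u^a, \partial _t \partial _a A_j] + [\partial _t {{\tilde{A}}}^a, \partial _a u_j]. \end{aligned}$$

The first two terms on the right-hand side of the second line are of the form \(X_1 \partial _t u\) and the last two of the form \(X_2 u\), with \(X_1\) and \(X_2\) of order at most one in the spatial variables and with coefficients depending on A and \({{\tilde{A}}}\).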

Proof of Proposition 2

\(A_0 = 0 = B_0\) implies that \({\mathbf {U}}^{-1} \partial _t {\mathbf {U}}= 0\), that is, \(\partial _t {\mathbf {U}}= 0\). Due to its time-independence, \({\mathbf {U}}\) is well-defined and smooth in the whole of \({\mathbb {D}}\) and \({\mathbf {U}}= {{\,\mathrm{id}\,}}\) in \(\mho \). We define \({{\tilde{A}}} = {\mathbf {U}}\cdot B\) and proceed to show that \(A = {{\tilde{A}}}\) in \({\mathbb {D}}\).

As \({{\tilde{A}}}\) is gauge equivalent to B, the Yang–Mills equations \(d_{{{\tilde{A}}}}^* F_{{{\tilde{A}}}} = 0\) hold in \({\mathbb {D}} \setminus \mho \). As \({\mathbf {U}}= {{\,\mathrm{id}\,}}\) in \(\mho \), we have \({{\tilde{A}}} = B\) in \(\mho \). Therefore \(d_{{{\tilde{A}}}}^* F_{{{\tilde{A}}}} = d_{B}^* F_{B} = d_{A}^* F_{A}\) in \(\mho \). As \({\mathbf {U}}\) does not depend on t, we see that \({{\tilde{A}}}_0 = 0\). Hence A and \({{\tilde{A}}}\) are two solutions to the reduced Yang–Mills equations (17), with the same J, and the difference \(A - {{\tilde{A}}}\) satisfies (19). As they also coincide near \(\partial ^- {\mathbb {D}}\), Lemma 14 in Appendix B implies that \(A={{\tilde{A}}}\) in \({\mathbb {D}}\).
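
The first step of the proof can be written out as follows; the expansion is ours. Taking the time component of \(A = {\mathbf {U}}\cdot B = {\mathbf {U}}^{-1}d{\mathbf {U}}+ {\mathbf {U}}^{-1}B{\mathbf {U}}\) near \(\partial ^- {\mathbb {D}}\) gives

$$\begin{aligned} A_0 = {\mathbf {U}}^{-1}\partial _t {\mathbf {U}}+ {\mathbf {U}}^{-1} B_0 {\mathbf {U}}, \end{aligned}$$

so that \(A_0 = 0 = B_0\) forces \({\mathbf {U}}^{-1}\partial _t {\mathbf {U}}= 0\). Conversely, for a time-independent \({\mathbf {U}}\) the same formula applied to \({{\tilde{A}}} = {\mathbf {U}}\cdot B\) gives \({{\tilde{A}}}_0 = {\mathbf {U}}^{-1} B_0 {\mathbf {U}}= 0\), which is the temporal gauge property of \({{\tilde{A}}}\) used above.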

Relative Lorenz gauge

For a moment we may let \((M,g)\) be any oriented Lorentzian manifold of even dimension. Consider two connections A and V on M solving the Yang–Mills equations without a source (1) and with a source (9), respectively. That is, \(d_A^* F_A = 0\) and \(d_V^* F_V = J\). We will rewrite the latter equation in terms of the difference \(W = V - A\).

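Directly from the definition of curvature, \(F_V = dV + \frac{1}{2}[V,V]\), and from \(V = A + W\), we have

$$\begin{aligned} F_V = dA + dW + \frac{1}{2}[A,A] + [A,W] + \frac{1}{2}[W,W], \end{aligned}$$

and thus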

$$\begin{aligned} F_V = F_A + d_A W + [W, W]/2. \end{aligned}$$

Since \(d_A^*= \star d_A \star \) it follows that \(d_V^*= d_A^*+ \star [W, \star \cdot ]\). Combining this with (20) and \(d_A^* F_A = 0\), we see that \(d_V^* F_V = J\) is equivalent with

$$\begin{aligned} d_A^*d_A W + \star [W, \star F_A] + {\mathcal {N}}(W) = J, \end{aligned}$$

where the non-linear part reads

$$\begin{aligned} {\mathcal {N}}(W) = \frac{1}{2} d_A^*[W, W] + \star [W, \star d_A W] + \frac{1}{2} \star [W, \star [W, W]]. \end{aligned}$$
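
The derivation of (21) and (22) can be spelled out in two steps. The expansion below is ours and uses only \(d_V = d_A + [W, \cdot \,]\), the relation \(d_A^* = \star d_A \star \), and (20):

$$\begin{aligned} d_V^* \omega&= \star \left( d \star \omega + [A + W, \star \omega ] \right) = d_A^* \omega + \star [W, \star \omega ], \\ d_V^* F_V&= \left( d_A^* + \star [W, \star \, \cdot \, ] \right) \left( F_A + d_A W + \tfrac{1}{2} [W,W] \right) \\&= d_A^* F_A + d_A^* d_A W + \star [W, \star F_A] + \tfrac{1}{2} d_A^* [W,W] + \star [W, \star d_A W] + \tfrac{1}{2} \star [W, \star [W,W]]. \end{aligned}$$

The first term vanishes by assumption, the next two form the linear part of (21), and the last three are \({\mathcal {N}}(W)\).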

We say that \(V \in \Omega ^1(M;{\mathfrak {g}})\) is in the Lorenz gauge relative to a background connection \(A \in \Omega ^1(M;{\mathfrak {g}})\) if \(d_A^*V=d_A^*A\). In this case (21) is equivalent with

$$\begin{aligned} \Box _A W + \star [W, \star F_A] + {\mathcal {N}}(W) = J, \end{aligned}$$

where \(\Box _A = d_A d^*_A + d^*_A d_A\) is the connection wave operator.

The semilinear wave equation (23), together with suitable initial conditions, is solvable when the source J is small and smooth enough, see, for example, (the proof of) Theorem 6 in [26]. However, its solution W solves the actual Yang–Mills equations (21) if and only if \(d_A d_A^*W=0\). Recall also that if W solves (21), or equivalently (9), then J satisfies the compatibility condition (10). We will therefore study the system combining (10) and (23). Observe that (10) is equivalent with

$$\begin{aligned} \partial _t J_0 + [A_0, J_0] + [W_0, J_0] = \partial ^j J_j + [A^j, J_j] + [W^j, J_j], \end{aligned}$$

where \(j=1,2,3\). This can be viewed as an ordinary differential equation for \(J_0\).
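
To see where (24) comes from, one may use the coordinate expression of the divergence of a 1-form in Minkowski space. We assume here that \(d_V^* J = \pm (\partial ^\alpha J_\alpha + [V^\alpha , J_\alpha ])\), where the overall sign depends on the convention for \(\star \) and is immaterial for an equation with zero right-hand side. With \(V = A + W\) and the signature \((-+++)\), so that \(\partial ^0 = -\partial _t\) and \(V^0 = -V_0\),

$$\begin{aligned} 0 = \partial ^\alpha J_\alpha + [V^\alpha , J_\alpha ] = -\partial _t J_0 - [A_0 + W_0, J_0] + \partial ^j J_j + [A^j + W^j, J_j], \end{aligned}$$

which is (24) after rearranging. For given W and \(J_j\), \(j=1,2,3\), this is a linear first order equation in t for \(J_0\) at each fixed spatial point.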

We begin with a uniqueness result that is similar to Proposition 2. For \(r > 0\) and \(x \in {\mathbb {R}}^{1+3}\) we define the rescaled and translated diamond

$$\begin{aligned} {\mathbb {D}}(x,r) = \{ry + x : y \in {\mathbb {D}} \}. \end{aligned}$$

Lemma 4

Let \(r > 0\) and \(x \in {\mathbb {R}}^{1+3}\) and write \(\tilde{{\mathbb {D}}} = {\mathbb {D}}(x,r)\). Let \(A \in \Omega ^1(\tilde{{\mathbb {D}}},{\mathfrak {g}})\) and suppose that \(W_{(\ell )}, J_{(\ell )} \in C^2(\tilde{{\mathbb {D}}};T^*\tilde{{\mathbb {D}}}\otimes {\mathfrak {g}})\) solve

$$\begin{aligned} {\left\{ \begin{array}{ll} \Box _A W + \star [W, \star F_A] + {\mathcal {N}}(W) = J, \\ d_A^* J + \star [W, \star J] = 0, \end{array}\right. } \end{aligned}$$

in \(\tilde{{\mathbb {D}}}\) for \(\ell =1,2\). Suppose, furthermore, that \(W_{(\ell )}, J_{(\ell )}\), \(\ell =1,2\), vanish near \(\partial ^- \tilde{{\mathbb {D}}}\) and that the spatial parts of \(J_{(1)}\) and \(J_{(2)}\) coincide on \(\tilde{{\mathbb {D}}}\), that is, \(J_{(1),j} = J_{(2),j}\) for \(j=1,2,3\). Then \(W_{(1)} = W_{(2)}\) and \(J_{(1)} = J_{(2)}\) in \(\tilde{{\mathbb {D}}}\).


Pseudolinearization analogous to that in Section 4.1.2 shows that the difference \((W_{(1)} - W_{(2)}, J_{(1)} - J_{(2)})\) solves a system of the form (65) in Appendix B with \(f_1 = 0\) and \(f_2 = 0\). The coefficients of this system depend on \(W_{(\ell )}, J_{(\ell )}\) and they satisfy the assumptions of Lemma 14 in Appendix B. Lemma 14 is formulated for \({\mathbb {D}}\) rather than for \(\tilde{{\mathbb {D}}}\); however, the form of the system (65) is invariant under a rescaling and translation. Therefore Lemma 14 holds also for \(\tilde{{\mathbb {D}}}\) and we conclude by applying it. \(\square \)

We will now turn to existence of solutions to the Yang–Mills equations. It is convenient to work in the cylinder \(M = (-2,2) \times {\mathbb {R}}^3\) containing the diamond \({\mathbb {D}}\), rather than in \({\mathbb {D}}\). Let us consider again the system combining (10) and (23),

$$\begin{aligned} {\left\{ \begin{array}{ll} \Box _A W + \star [W, \star F_A] + {\mathcal {N}}(W) = J, &{} t \ge -1, \\ d_A^* J + \star [W, \star J] = 0, &{} t \ge -1, \\ W = 0,\ J = 0, &{} t \le -1. \end{array}\right. } \end{aligned}$$

Lemma 5

Let \(A \in \Omega ^1(M; {\mathfrak {g}})\) and suppose that \(W,J \in C^3(M;T^*M\otimes {\mathfrak {g}})\) solve (25). Suppose moreover that A solves (1) in \({\mathbb {D}}\) and that \({{\,\mathrm{supp}\,}}(J_j)\), \(j=1,2,3\), is contained in the interior of \({\mathbb {D}}\). Then W solves (21) in \({\mathbb {D}}\), with J on the right-hand side.


The equations (21) and (23) differ by the term \(d_A d_A^* W\) on the left-hand side. Hence it is enough to verify that \(H = 0\) in \({\mathbb {D}}\) where \(H = d^*_A W\). We write \(V = W + A\). As A solves (1) in \({\mathbb {D}}\), \(d_V^* F_V\) coincides with the left-hand side of (21) in \({\mathbb {D}}\), and the first equation in (25), in other words (23), implies that \(d^*_V F_V + d_A H = J\) in \({\mathbb {D}}\). Applying \(d^*_V\) to this equation, and using Lemma 2 and the second equation in (25), we obtain \(d_V^* d_A H = 0\) in \({\mathbb {D}}\). This is a linear wave equation for H. We will show below that W vanishes near \(\partial ^- {\mathbb {D}}\). Hence also H vanishes near \(\partial ^- {\mathbb {D}}\), and as it satisfies the linear wave equation, it vanishes in the whole of \({\mathbb {D}}\). This type of finite speed of propagation result is of course standard, and it follows also from Lemma 14 in Appendix B.
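
The step leading to the equation for H can be displayed as follows. This is our expansion, using the compatibility condition \(d_V^* d_V^* F_V = 0\) of Lemma 2 and the identity \(d_V^* = d_A^* + \star [W, \star \cdot ]\):

$$\begin{aligned} d_V^* d_A H = d_V^* J - d_V^* d_V^* F_V = d_A^* J + \star [W, \star J] - 0 = 0, \end{aligned}$$

where the last equality is the second equation in (25). Moreover, \(d_V^* d_A H = d_A^* d_A H + \star [W, \star d_A H]\), so the principal part is that of \(d_A^* d_A\) acting on the \({\mathfrak {g}}\)-valued function H, and the remaining term is of first order in H with coefficients depending on W.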

Let us now show that W vanishes near \(\partial ^- {\mathbb {D}}\). There is \(r \in (0,1)\) such that \({{\,\mathrm{supp}\,}}(J_j) \subset {\mathbb {D}}(0,r)\) for \(j=1,2,3\). Let \(\tilde{{\mathbb {D}}}\) in Lemma 4 satisfy \(\tilde{{\mathbb {D}}} \cap {\mathbb {D}}(0,r) = \emptyset \) and \(\partial ^- \tilde{{\mathbb {D}}} \subset \{t < -1\}\). Lemma 4 implies that \(W = 0\) in \(\tilde{{\mathbb {D}}}\) by comparison with the trivial solution. By varying \(\tilde{{\mathbb {D}}}\) we see that W vanishes in \(\{t \le 0\} \setminus {\mathbb {D}}(0,r)\), and also near \(\partial {\mathbb {D}} \cap \{t=0\}\). In particular, W vanishes near \(\partial ^- {\mathbb {D}}\). \(\quad \square \)

Remark 1

As the second equation in (25) is equivalent with the ordinary differential equation (24), we see that if \({{\,\mathrm{supp}\,}}(J_j) \subset (0,T) \times K\), \(j=1,2,3\), for some \(K \subset {\mathbb {R}}^3\), then also \({{\,\mathrm{supp}\,}}(J_0) \subset (0,T) \times K\) for a solution of (25).

We prove the following result in Appendix B.

Proposition 3

Suppose that \(A \in \Omega ^1(M; {\mathfrak {g}})\) is bounded, together with all its derivatives, and let \(k \ge 4\). Then there is a neighbourhood \({\mathcal {H}}\) of the zero function in \(H^{k+2}(M;{\mathfrak {g}})\) such that for all \(J_j \in {\mathcal {H}}\), \(j=1,2,3\), there is a unique solution

$$\begin{aligned} W \in H^{k+1}(M; T^*M \otimes {\mathfrak {g}}), \quad J_0 \in H^{k+1}(M; {\mathfrak {g}}) \end{aligned}$$

of (25) with \(J=J_0 dx^0 + \dots + J_3 dx^3\). Moreover, the map \((J_1, J_2, J_3) \mapsto (W,J_0)\) is smooth from \({\mathcal {H}}^3\) to \(H^{k+1}(M; T^*M \otimes {\mathfrak {g}} \oplus {\mathfrak {g}})\).

Source-to-Solution Map

We begin with a lemma that will be used only once and that highlights the difference between the pointed gauge group \(G^0({\mathbb {D}}, p)\) and the full gauge group \(G({\mathbb {D}})\).

Lemma 6

Suppose that \({{\tilde{A}}} \sim A\) near \(\partial ^- {\mathbb {D}}\) and consider the modified data set

$$\begin{aligned} \tilde{{\mathcal {D}}}_A = \{ V' \in {\mathcal {D}}_A :\ V' = {\tilde{A}}\text { in }\mho \text { near }\partial ^- {\mathbb {D}} \}. \end{aligned}$$

Let \(V' \in \tilde{{\mathcal {D}}}_A\). Then there are \({\mathbf {U}}\in G^0({\mathbb {D}}, p)\) and \(V \in C^3({\mathbb {D}}; T^* {\mathbb {D}} \otimes {\mathfrak {g}})\) such that \(V' = V|_\mho \), \(V = {\mathbf {U}}\cdot {\tilde{A}}\) near \(\partial ^- {\mathbb {D}}\), and \({\mathbf {U}}= {{\,\mathrm{id}\,}}\) in \(\mho \) near \(\partial ^- {\mathbb {D}}\).


It follows immediately from the definitions of the sets \({\mathcal {D}}_A\) and \(\tilde{{\mathcal {D}}}_A\) that there are \({\mathbf {U}}\in G^0({\mathbb {D}}, p)\) and \(V \in C^3({\mathbb {D}}; T^* {\mathbb {D}} \otimes {\mathfrak {g}})\) such that \(V' = V|_\mho \), \(V = {\mathbf {U}}\cdot {\tilde{A}}\) near \(\partial ^- {\mathbb {D}}\), and \(V = {{\tilde{A}}}\) in \(\mho \) near \(\partial ^- {\mathbb {D}}\). Then \({\mathbf {U}}\) satisfies

$$\begin{aligned} {\mathbf {U}}\cdot {{\tilde{A}}} = {{\tilde{A}}} \end{aligned}$$

in \(\mho \) near \(\partial ^- {\mathbb {D}}\). As (26) is equivalent with the differential equation \(d{\mathbf {U}}= {\mathbf {U}}{\tilde{A}} - {\tilde{A}} {\mathbf {U}}\), which is linear in \({\mathbf {U}}\) and is solved by the constant map \({{\,\mathrm{id}\,}}\), and \({\mathbf {U}}(p) = {{\,\mathrm{id}\,}}\), it follows that \({\mathbf {U}}= {{\,\mathrm{id}\,}}\) in \(\mho \) near \(\partial ^- {\mathbb {D}}\). \(\quad \square \)

If we used gauge equivalence with respect to \(G({\mathbb {D}})\) in the definition of \({\mathcal {D}}_A\), then (26) would still hold in a neighbourhood \({\mathcal {U}} \subset {{\overline{\mho }}}\) of \(\partial ^- {\mathbb {D}} \cap {{\overline{\mho }}}\). However, this simply says that \({\mathbf {U}}|_{{\mathcal {U}}}\) is in the stabilizer subgroup \(\{{\mathbf {U}}\in C^\infty ({\mathcal {U}}; G) : {\mathbf {U}}\cdot {{\tilde{A}}} = {{\tilde{A}}}\}\) with respect to \({{\tilde{A}}}|_{{\mathcal {U}}}\). In general, the stabilizer subgroup may be non-trivial.

Recall that the temporal gauge version \({\mathscr {T}}(V)\) of a connection V is defined by (14). Recall, furthermore, that the system (25) of Yang–Mills equations in relative Lorenz gauge with the compatibility condition is posed on \(M = (-2,2) \times {\mathbb {R}}^3\).

Proposition 4

Suppose that \(A \in \Omega ^{1}({\mathbb {D}};{\mathfrak {g}})\) satisfies (1) in \({\mathbb {D}}\). Then there is a connection \({\tilde{A}} \in \Omega ^{1}({\mathbb {D}}; {\mathfrak {g}})\) such that \({\tilde{A}} \sim A\) in \({\mathbb {D}}\), \({\tilde{A}}|_\mho \) is in temporal gauge, and the following holds: for all \(x \in \mho \) there are a neighbourhood \(\mho _0 \subset \mho \) of x and a neighbourhood \({\mathcal {H}}\) of the zero function in \(H_0^7(\mho _0;{\mathfrak {g}})\) such that \({\mathcal {D}}_A\) determines \({\tilde{A}}|_\mho \) and the source-to-solution map

$$\begin{aligned} L(J_1,J_2,J_3) = {\mathscr {T}}(V)|_\mho , \quad J_j \in {\mathcal {H}},\ j=1,2,3, \end{aligned}$$

where \(V = W + {\tilde{A}}\) and \((W,J_0)\) is the solution of (25) with \(J=J_0 dx^0 + \dots + J_3 dx^3\) and with A replaced by an arbitrary smooth, compactly supported extension of \({\tilde{A}}\) to M.


Let \({\tilde{A}}' \in {\mathcal {D}}_A\) be in the temporal gauge and satisfy \(d_{{\tilde{A}}'}^*F_{{\tilde{A}}'}=0\) in \(\mho \). Such \({\tilde{A}}'\) exists, for example, \({\tilde{A}}' = {\mathscr {T}}(A)|_\mho \) is a possible choice. There is \({\tilde{A}}\) such that \({\tilde{A}}' = {\tilde{A}}|_\mho \), \(d_{{\tilde{A}}}^*F_{{\tilde{A}}}=0\) in \({\mathbb {D}}\) and \({\tilde{A}} \sim A\) near \(\partial ^- {\mathbb {D}}\). Proposition 10 in Appendix B implies that \({\tilde{A}} \sim A\) in \({\mathbb {D}}\). Choose a smooth, compactly supported extension of \({{\tilde{A}}}\) in M, still denoted by \({{\tilde{A}}}\).

For \(x \in \mho \) we choose \(\epsilon > 0\) small enough so that \({\mathbb {D}}(x,\epsilon ) \subset \mho \) and let \(\mho _0\) be the interior of \({\mathbb {D}}(x,\epsilon )\). Let \(t_0\) be the time coordinate of x. Let \(J_j \in H^7_0(\mho _0; {\mathfrak {g}})\), \(j=1,2,3\), be small, and consider the solution \((W,J_0)\) of the system (25) with \(A = {\tilde{A}}\) in \((-1,t_0) \times {\mathbb {R}}^3\). This solution vanishes outside \(\mho _0\) and near \(\partial ^- {\mathbb {D}}(x,\epsilon )\), and it does not depend on \({\tilde{A}}\) away from \(\mho _0\). The vanishing of \((W,J_0)\) outside \(\mho _0\) and near \(\partial ^- {\mathbb {D}}(x,\epsilon )\) is shown similarly to the vanishing of W near \(\partial ^- {\mathbb {D}}\) in the proof of Lemma 5, and we omit this argument. To see that \((W,J_0)\) does not depend on \({\tilde{A}}\) away from \(\mho _0\), we consider two solutions to (25) with different backgrounds A in \((-1,t_0 + \epsilon ) \times {\mathbb {R}}^3\). Both the backgrounds are assumed to coincide with \({{\tilde{A}}}\) in \(\mho _0\). As both the solutions vanish near \(\partial ^- {\mathbb {D}}(x,\epsilon )\), Lemma 4 implies that they are identical in \({\mathbb {D}}(x,\epsilon )\).

Extending \((W,J_0)\) by zero we get a solution in the set \(\mho _- = \mho \cap \{t < t_0\}\). To summarize, the solution \((W,J_0)\) in \(\mho _-\) is determined by \({\tilde{A}}'\) and our choice of \(J_j\), \(j=1,2,3\). Defining a connection \({{\hat{V}}} = {{\hat{V}}}(J_1, J_2, J_3)\) on \(\mho _-\) by \({{\hat{V}}} = W + {\tilde{A}}\) we have \(d_{{{\hat{V}}}}^* F_{{{\hat{V}}}} = J\) in \(\mho _-\) where \(J = J_0 dx^0 + \dots +J_3 dx^3\). We write \(\mho _+ = \mho \cap \{t > t_0\}\), and consider the set

$$\begin{aligned} {\mathcal {L}} = {\mathcal {L}}(J_1,J_2,J_3) = \{ {\mathscr {T}}(V')&: V' \in \tilde{{\mathcal {D}}}_A, V' = {{\hat{V}}}\text { in } \mho _-, \\&\text {and the spatial part of }\,d_{V'}^* F_{V'}\text { vanishes in }\mho _+ \}. \end{aligned}$$

Here \({\mathscr {T}}\) is defined by (14) with \(|x| < \epsilon _0\), cf. (3). No confusion should arise from our use of \({\mathscr {T}}\) for temporal gauge both in \(\mho \) and in \({\mathbb {D}}\) since \({\mathscr {T}}(V|_\mho ) = {\mathscr {T}}(V)|_\mho \) for a connection V on \({\mathbb {D}}\).

As \({{\hat{V}}}\) is determined by \({\mathcal {D}}_A\) (and the choice of \({{\tilde{A}}}'\)), also \({\mathcal {L}}\) is determined by \({\mathcal {D}}_A\). Moreover, \({\mathscr {T}}(V)|_\mho \in {\mathcal {L}}\) where \(V = W + {\tilde{A}}\) and \((W,J_0)\) is the solution of (25) in M with \(J_j\), \(j=1,2,3\), as above and \(A={\tilde{A}}\). The solution \((W,J_0)\) in M is an extension of the solution \((W,J_0)\) in \((0,t_0) \times {\mathbb {R}}^3\), which justifies our reuse of symbols. Observe that Proposition 3, together with the Sobolev embedding theorem, guarantees that \(W \in C^3({\mathbb {D}}; T^* {\mathbb {D}} \otimes {\mathfrak {g}})\), and that Remark 1 guarantees that \({{\,\mathrm{supp}\,}}(J_0) \subset \mho \).
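
The regularity count behind the last observation can be spelled out. We assume here the standard Sobolev embedding \(H^s \subset C^m\) for \(s > m + n/2\) on the n-dimensional domain in question. Sources \(J_j \in H_0^7\) correspond to \(k + 2 = 7\) in Proposition 3, so \(k = 5 \ge 4\) and \(W \in H^{k+1} = H^6\). With \(n = 4\) and \(m = 3\),

$$\begin{aligned} s = k + 1 = 6 > 3 + \tfrac{4}{2} = 5, \end{aligned}$$

and hence \(W \in C^3\). For the borderline value \(k = 4\) one would only have \(s = 5\), for which the strict inequality fails.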

To conclude the proof, it remains to show that \({\mathcal {L}}\) consists of a single element. Suppose that \(W', {{\tilde{W}}}' \in {\mathcal {L}}\). By Lemma 6 there are connections V, \({{\tilde{V}}}\) and gauges \({\mathbf {u}}\), \({{\tilde{{\mathbf {u}}}}}\) satisfying \(W' = {\mathscr {T}}(V)|_\mho \), \({{\tilde{W}}}' = {\mathscr {T}}({{\tilde{V}}})|_\mho \), \(d_{V}^*F_{V}=0 = d_{{{\tilde{V}}}}^*F_{{{\tilde{V}}}}\) in \({\mathbb {D}} \setminus \mho \), \(V = {\mathbf {u}}\cdot {\tilde{A}}\) and \({{\tilde{V}}} = {{\tilde{{\mathbf {u}}}}} \cdot {{\tilde{A}}}\) near \(\partial ^- {\mathbb {D}}\), and \({\mathbf {u}}= {{\,\mathrm{id}\,}}= {{\tilde{{\mathbf {u}}}}}\) in \(\mho \) near \(\partial ^- {\mathbb {D}}\). We define

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t {\mathbf {U}}= -V_0 {\mathbf {U}}, \\ {\mathbf {U}}|_{t = \psi (x)} = {{\,\mathrm{id}\,}}, \end{array}\right. } \quad {\left\{ \begin{array}{ll} \partial _t {{\tilde{{\mathbf {U}}}}} = -{{\tilde{V}}}_0 {{\tilde{{\mathbf {U}}}}}, \\ {{\tilde{{\mathbf {U}}}}}|_{t = \psi (x)} = {{\,\mathrm{id}\,}}, \end{array}\right. } \end{aligned}$$

and set \(W = {\mathbf {U}}\cdot V\) and \({{\tilde{W}}} = {{\tilde{{\mathbf {U}}}}} \cdot {{\tilde{V}}}\). Then \(W_0 = 0 = {{\tilde{W}}}_0\) in \({\mathbb {D}}\). Moreover, it follows from the definition of \({\mathscr {T}}\) that \(W' = W|_{\mho }\) and \({{\tilde{W}}}' = {{\tilde{W}}}|_{\mho }\).

There holds \(V = {{\tilde{A}}} = {{\tilde{V}}}\) in \(\mho \) near \(\partial ^- {\mathbb {D}}\). This implies \({\mathbf {U}}= {{\tilde{{\mathbf {U}}}}}\) and \(W = {{\tilde{W}}}\) in \(\mho \) near \(\partial ^- {\mathbb {D}}\). Writing \({\mathbf {U}}_- = {\mathbf {U}}{\mathbf {u}}{{\tilde{{\mathbf {u}}}}}^{-1}{{\tilde{{\mathbf {U}}}}}^{-1}\), we have that \(W = {\mathbf {U}}_- \cdot {{\tilde{W}}}\) near \(\partial ^- {\mathbb {D}}\) and \({\mathbf {U}}_-= {{\,\mathrm{id}\,}}\) in \(\mho \) near \(\partial ^- {\mathbb {D}}\).

In fact, as \(V = {{\hat{V}}} = {{\tilde{V}}}\) in \(\mho _-\), we have \({\mathbf {U}}= {{\tilde{{\mathbf {U}}}}}\) and \(W = {{\tilde{W}}}\) in \(\mho _-\). Hence also \(d_{W}^* F_W = d_{{{\tilde{W}}}}^* F_{{{\tilde{W}}}}\) in \(\mho _-\). The spatial parts of \(d_{V}^* F_V\) and \(d_{{{\tilde{V}}}}^* F_{{{\tilde{V}}}}\) vanish in \(\mho _+\). As gauge transformations act componentwise on \(d_{W}^* F_W\), see (12), also the spatial parts of \(d_{W}^* F_W\) and \(d_{{{\tilde{W}}}}^* F_{{{\tilde{W}}}}\) vanish in \(\mho _+\). Writing \(J_0\) for the temporal part of \(d_{W}^* F_W\), the compatibility condition \(d_W^* d_{W}^* F_W = 0\), see Lemma 2, together with \(W_0 = 0\), implies that \(\partial _t J_0 = 0\) in \(\mho _+\). The same holds for \({{\tilde{J}}}_0\), the temporal part of \(d_{{{\tilde{W}}}}^* F_{{{\tilde{W}}}}\). But \(J_0 ={{\tilde{J}}}_0\) on \(\mho \cap \{t=t_0\}\), and hence \(J_0 ={{\tilde{J}}}_0\) in \(\mho _+\). To summarize \(d_{W}^* F_W = d_{{{\tilde{W}}}}^* F_{{{\tilde{W}}}}\) in \(\mho \). Proposition 2 implies that \(W = {{\tilde{W}}}\) in \(\mho \). In other words \(W' = {{\tilde{W}}}'\) and this is the only element in \({\mathcal {L}}\). \(\quad \square \)
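
The claim \(\partial _t J_0 = 0\) in \(\mho _+\) can be read off from the coordinate form (24) of the compatibility condition. In the sketch below, which is ours, the temporal gauge connection W plays the role of the full connection, so that no separate background term appears, and \(J = d_W^* F_W\):

$$\begin{aligned} \partial _t J_0 + [W_0, J_0] = \partial ^j J_j + [W^j, J_j]. \end{aligned}$$

In \(\mho _+\) the spatial components \(J_j\), \(j=1,2,3\), vanish, so the right-hand side is zero, and \(W_0 = 0\) removes the commutator on the left-hand side. Hence \(\partial _t J_0 = 0\) there, and \(J_0\) is determined in \(\mho _+\) by its values on \(\mho \cap \{t = t_0\}\).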

Linearization of the Yang–Mills Equations in Lorenz Gauge

Let us study multiple-fold linearizations of (23). Consider a three-parameter family

$$\begin{aligned} (W,J) = (W(\epsilon ), J(\epsilon )), \quad \epsilon = (\epsilon _{(1)}, \epsilon _{(2)}, \epsilon _{(3)}), \end{aligned}$$

of solutions to (23), vanishing for \(t \le 0\), where \(\epsilon \) is in a neighbourhood of the origin in \({\mathbb {R}}^3\). Assume that the source term is linear in the sense that \(J = \sum _{k = 1}^3 \epsilon _{(k)} J_{(k)}\) for some \(J_{(k)} \in \Omega ^1({\mathbb {R}}^{1+3}; {\mathfrak {g}})\). Writing

$$\begin{aligned} Y_{(k)} = \frac{\partial W}{\partial \epsilon _{(k)}}\bigg |_{\epsilon = 0}, \quad Y_{(kl)} = \frac{\partial ^2 W}{\partial \epsilon _{(k)}\partial \epsilon _{(l)}}\bigg |_{\epsilon = 0}, \quad Y_{(123)} = \frac{\partial ^3 W}{\partial \epsilon _{(1)}\partial \epsilon _{(2)}\partial \epsilon _{(3)}}\bigg |_{\epsilon = 0}, \end{aligned}$$

and differentiating (23) in \(\epsilon \), we obtain the following system of linear wave equations

$$\begin{aligned} {\left\{ \begin{array}{ll} \Box _A Y_{(k)} + \star [Y_{(k)}, \star F_A] = J_{(k)}, &{} t \ge 0, \\ \Box _A Y_{(kl)} + \star [Y_{(kl)}, \star F_A] + N(2) = 0, &{} t \ge 0, \\ \Box _A Y_{(123)} + \star [Y_{(123)}, \star F_A] + N(3) = 0, &{} t \ge 0, \\ Y_{(k)} = Y_{(kl)} = Y_{(123)} = 0, &{} t \le 0, \end{array}\right. } \end{aligned}$$

where the nonlinear terms read

$$\begin{aligned} N(2)&= \frac{1}{2}d_A^*[Y_{(k)}, Y_{(l)}] + \frac{1}{2}d_A^*[Y_{(l)}, Y_{(k)}] + \star [Y_{(k)}, \star d_A Y_{(l)} ] + \star [Y_{(l)}, \star d_A Y_{(k)} ], \end{aligned}$$

and, writing \(S_3\) for the set of permutations on \(\{1,2,3\}\),

$$\begin{aligned} N(3)&= \frac{1}{2} \sum _{\pi \in S_3} \bigg ( \frac{1}{2}d_A^*[Y_{(\pi (1)\pi (2))}, Y_{(\pi (3))}] + \frac{1}{2}d_A^*[Y_{(\pi (1))}, Y_{(\pi (2)\pi (3))}] \\ {}&\qquad \quad + \star [Y_{(\pi (1)\pi (2))}, \star d_A Y_{(\pi (3))} ] + \star [Y_{(\pi (1))}, \star d_A Y_{(\pi (2)\pi (3))} ] \\&\qquad \quad + 2 \star [Y_{(\pi (1))}, \star [Y_{(\pi (2))}, Y_{(\pi (3))}]]\bigg ). \end{aligned}$$
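
The structure of the first two equations in (28) can be checked by a short computation. We assume here that \(W(0) = 0\), as is implicit in the formulas above, and write \(b(X,Z) = \frac{1}{2} d_A^*[X,Z] + \star [X, \star d_A Z]\) for the quadratic part of \({\mathcal {N}}\), so that \({\mathcal {N}}(W) = b(W,W) + \frac{1}{2} \star [W, \star [W,W]]\). Then

$$\begin{aligned} \frac{\partial }{\partial \epsilon _{(k)}} b(W,W) \bigg |_{\epsilon = 0}&= b(Y_{(k)}, W(0)) + b(W(0), Y_{(k)}) = 0, \\ \frac{\partial ^2}{\partial \epsilon _{(k)} \partial \epsilon _{(l)}} b(W,W) \bigg |_{\epsilon = 0}&= b(Y_{(k)}, Y_{(l)}) + b(Y_{(l)}, Y_{(k)}) = N(2). \end{aligned}$$

The cubic term of \({\mathcal {N}}\) does not contribute at first or second order, since each of its derivatives of order at most two retains a factor \(W(0) = 0\). This is why no non-linear term appears in the equation for \(Y_{(k)}\).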

Now we continue the calculation in Cartesian coordinates in Minkowski space \({\mathbb {R}}^{1+3}\), and use the formulas

$$\begin{aligned} d_A^*[X, Z]&= [d_A^* X, Z]-[X, d_A^* Z] \nonumber \\&\qquad + [\partial ^\alpha X_\beta + [A^\alpha , X_\beta ], Z_\alpha ] dx^\beta - [X_\alpha , \partial ^\alpha Z_\beta + [A^\alpha , Z_\beta ]] dx^\beta , \end{aligned}$$
$$\begin{aligned} \star [X, \star d_A Z]&= -[X^\alpha , \partial _\alpha Z_\beta + [A_\alpha , Z_\beta ]] dx^\beta +[X^\alpha , \partial _\beta Z_\alpha + [A_\beta , Z_\alpha ]] dx^\beta , \end{aligned}$$
$$\begin{aligned} \star [X, \star [Y, Z]]&=- [X^\alpha , [Y_\alpha , Z_\beta ]] dx^\beta + [X^\alpha , [Y_\beta , Z_\alpha ]] dx^\beta . \end{aligned}$$

These formulas are derived in Appendix A. Using (29)–(31) and the Lorenz gauge condition \(d_A^* W = 0\), we rewrite the first three equations in (28), modulo lower order terms, as follows

$$\begin{aligned} \Box _A Y_{(k)}&= J_{(k)}, \end{aligned}$$
$$\begin{aligned} \Box _A Y_{(kl)}&= {\tilde{N}}(2), \end{aligned}$$
$$\begin{aligned} \Box _A Y_{(123)}&= {\tilde{N}}(3), \end{aligned}$$

where the components of the right-hand sides of the last two equations read

$$\begin{aligned} {\tilde{N}}_\beta (2)&= 2 [Y_{(k)}^\alpha , \partial _\alpha Y_{(l), \beta }] - [Y_{(k)}^\alpha , \partial _\beta Y_{(l), \alpha }] + 2 [Y_{(l)}^\alpha , \partial _\alpha Y_{(k), \beta }] - [Y_{(l)}^\alpha , \partial _\beta Y_{(k), \alpha }],\\ {\tilde{N}}_\beta (3)&= \frac{1}{2}\sum _{\pi \in S_3} \bigg ( 2[Y_{(\pi (1)\pi (2))}^\alpha , \partial _\alpha (Y_{(\pi (3)), \beta })] - [Y_{(\pi (1)\pi (2))}^\alpha , \partial _\beta (Y_{(\pi (3)), \alpha }) ] \\&\qquad \quad + 2[Y_{(\pi (1))}^\alpha , \partial _\alpha (Y_{(\pi (2)\pi (3)), \beta })] - [Y_{(\pi (1))}^\alpha , \partial _\beta (Y_{(\pi (2)\pi (3)), \alpha }) ] \\&\qquad \quad + 4 [Y_{(\pi (1))}^\alpha , [Y_{(\pi (2)), \alpha }, Y_{(\pi (3)), \beta }] ] \bigg ). \end{aligned}$$
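
As a consistency check, the expression for \({\tilde{N}}(2)\) can be recovered from \(N(2)\) as follows. The sketch is ours. It uses \(d_A^* Y_{(k)} = 0\), obtained by differentiating the gauge condition, and drops all terms containing A without derivatives of \(Y_{(k)}\) or \(Y_{(l)}\); we write \(\equiv \) for equality modulo such terms. From (29) and (30), and the antisymmetry of the commutator of components,

$$\begin{aligned} \tfrac{1}{2} d_A^*[Y_{(k)}, Y_{(l)}] + \tfrac{1}{2} d_A^*[Y_{(l)}, Y_{(k)}]&\equiv \left( - [Y_{(l)}^\alpha , \partial _\alpha Y_{(k), \beta }] - [Y_{(k)}^\alpha , \partial _\alpha Y_{(l), \beta }] \right) dx^\beta , \\ \star [Y_{(k)}, \star d_A Y_{(l)}]&\equiv \left( - [Y_{(k)}^\alpha , \partial _\alpha Y_{(l), \beta }] + [Y_{(k)}^\alpha , \partial _\beta Y_{(l), \alpha }] \right) dx^\beta . \end{aligned}$$

Adding the first line, the second line, and the second line with k and l interchanged gives \(N(2) \equiv -{\tilde{N}}(2)\), in agreement with the second equation of (28) rewritten as \(\Box _A Y_{(kl)} = {\tilde{N}}(2)\).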

Preliminaries on Microlocal Analysis

Distributions associated to conormal bundles and two Lagrangians

The advantage of working in the relative Lorenz gauge is that the Yang–Mills equations reduce to a cubic nonlinear wave equation with the linear part given by the connection wave operator \(\Box _A\), modulo zeroth order terms. The parametrix for \(\Box _A\) is a distribution associated to an intersecting pair of Lagrangians (shortly an IPL distribution), in the sense of [36], and we use the product calculus of conormal distributions to study the non-linear part.

The proof of Proposition 1 in the next section relies solely on symbolic computations, and we recall here only that conormal and IPL distributions have principal symbols and that the corresponding symbol maps are isomorphisms, modulo lower order terms in a suitable sense. We will not recall the definitions of these classes of distributions, which are somewhat technical; instead we refer the reader to [7] for a review of the theory that we use and that was originally developed in [11, 18, 36]. Even the precise definition of spaces of symbols is not important for our present purposes, since we will consider only symbols that are positively homogeneous in the fibre variable.

Recall that a pseudodifferential operator A on a manifold X with a homogeneous principal symbol a is said to be elliptic at \((x,\xi ) \in T^*X \setminus 0\) if \(a(x,\xi ) \ne 0\). The wavefront set \({{\,\mathrm{WF}\,}}(u) \subset T^*X \setminus 0\) of a distribution u on X is the complement of its regular set, where the regular set consists of those points \((x,\xi ) \in T^*X \setminus 0\) for which there is a zeroth order pseudodifferential operator A that is elliptic at \((x,\xi )\) and that satisfies \(Au \in C^\infty (X)\). We denote by \({{\,\mathrm{singsupp}\,}}(u)\) the projection of \({{\,\mathrm{WF}\,}}(u)\) on X, and by \({{\,\mathrm{WF}\,}}(A)\) the essential support of A, that is, the projection of \({{\,\mathrm{WF}\,}}({\mathscr {A}}) \subset (T^*X \setminus 0)^2\) on the first factor \(T^*X\setminus 0\) where \({\mathscr {A}}\) is the Schwartz kernel of A. Moreover, we say that A is a microlocal cutoff near \((x,\xi ) \in T^*X \setminus 0\) if A is elliptic at \((x,\xi )\) and \({{\,\mathrm{WF}\,}}(A)\) is contained in a small neighbourhood of \(\{(x, \lambda \xi ) : \lambda > 0\}\).

Let E be a complex smooth vector bundle over X and \(\Omega ^{1/2}\) the half density bundle. A conormal distribution \(u \in I^m(N^*Y; E \otimes \Omega ^{1/2})\) of order \(m \in {\mathbb {R}}\) is a compactly supported distribution taking values in the tensor bundle \(E \otimes \Omega ^{1/2}\) with \(\text {WF}(u)\) contained in the conormal bundle \(N^*Y\) of a submanifold Y of X. In addition, u is required to have a certain local structure on Y, see (2.4.1) in [18], the precise form of which is not important for our purposes. What is important is that the principal symbol \(\sigma [u]\) of u is a smooth section of \(E \otimes \Omega ^{1/2}\), invariantly defined on \(N^*Y \setminus 0\), and that the principal symbol map \(u \mapsto \sigma [u]\) gives the short exact sequence,

$$\begin{aligned} 0 \rightarrow I^{m-1}(N^*Y; E \otimes \Omega ^{1/2}) \hookrightarrow I^m(N^*Y; E \otimes \Omega ^{1/2}) \nonumber \\ \xrightarrow {\;\sigma \;} S^{m + n/4}/S^{m + n/4 - 1}(N^*Y; E \otimes \Omega ^{1/2}) \rightarrow 0, \end{aligned}$$

see [18, Theorem 2.4.2] and [19, Theorem 18.2.11]. Here n is the dimension of X and \(S^{m}(N^*Y; E \otimes \Omega ^{1/2})\), with \(m \in {\mathbb {R}}\), is the space of symbols, see [19, Definition 18.2.10]. For our purposes it suffices to note that positively homogeneous sections of degree m are in this space, and that if \(\Omega ^{1/2}\) is trivialized by choosing a nowhere vanishing positively homogeneous section \(\mu \) of degree r, then \(\sigma [u]\) is positively homogeneous of degree \(m + r\) if

$$\begin{aligned} (\mu ^{-1} \sigma [u])(x, \lambda \xi ) = \lambda ^m (\mu ^{-1} \sigma [u]) (x, \xi ), \qquad \text{ for } \text{ any } \lambda > 0 \text{ and } (x,\xi ) \in N^* Y \setminus 0. \end{aligned}$$

Since the half density is involved here, the given homogeneity looks a little different from the classical definition in [19, p.67].
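
A standard example, included here only as an illustration and not taken from the references above, is a cut-off delta distribution on a hyperplane. Write \(x = (x^1, x'') \in {\mathbb {R}}^n\), let \(Y = \{x^1 = 0\}\) and let \(\chi \in C_c^\infty ({\mathbb {R}}^{n-1})\). Then

$$\begin{aligned} u(x) = \chi (x'') \delta (x^1) = \frac{1}{2\pi } \int _{{\mathbb {R}}} e^{i x^1 \xi _1} \chi (x'') \, d\xi _1, \qquad {{\,\mathrm{WF}\,}}(u) = \{ (0, x''; \xi _1, 0) : x'' \in {{\,\mathrm{supp}\,}}(\chi ),\ \xi _1 \ne 0 \} \subset N^*Y \setminus 0. \end{aligned}$$

Here \({{\,\mathrm{singsupp}\,}}(u) = \{0\} \times {{\,\mathrm{supp}\,}}(\chi ) \subset Y\), and the amplitude \(\chi (x'')\) in the oscillatory integral does not depend on \(\xi _1\), so it is positively homogeneous of degree zero in the fibre variable before the half density factor is taken into account.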

More generally, a Lagrangian distribution \(u \in I^{m}(\Lambda ; E \otimes \Omega ^{1/2})\) is a compactly supported distribution with \(\text {WF}(u)\) contained in a conical Lagrangian submanifold \(\Lambda \) of \(T^*X \setminus 0\), and certain local structure, see (3.2.14) in [18]. Its principal symbol is invariantly defined on \(\Lambda \) as a smooth section of the bundle \(E \otimes \Omega ^{1/2} \otimes L\), where L is the Maslov bundle over \(\Lambda \). Analogously to (35) the principal symbol map gives an isomorphism

$$\begin{aligned} I^m(\Lambda ; E \otimes \Omega ^{1/2}) \rightarrow S^{m + n/4}(\Lambda ; E \otimes \Omega ^{1/2} \otimes L) \end{aligned}$$

modulo lower order terms, see [18, Theorem 3.2.5]. We write also

$$\begin{aligned} I(\Lambda ; E) = \bigcup _{m \in {\mathbb {R}}} I^{m}(\Lambda ; E \otimes \Omega ^{1/2}). \end{aligned}$$

The notion of Lagrangian distributions is insufficient to completely describe the fundamental solution of wave equations, as two Lagrangian manifolds are needed in order to describe the propagating singularities and the singularities at the source. An IPL distribution \(u \in I^{m}(\Lambda _0, \Lambda _1; E \otimes \Omega ^{1/2})\) is a compactly supported distribution with \(\text {WF}(u)\) contained in \(\Lambda _0\cup \Lambda _1\), where \((\Lambda _0, \Lambda _1)\) is a cleanly intersecting pair of conical Lagrangian submanifolds of \(T^*X \setminus 0\), and with a certain local structure on \(\Lambda _0 \cup \Lambda _1\), see [36]. Here \(\Lambda _1\) is a manifold with boundary, while \(\Lambda _0\) is a manifold without boundary, and by cleanly intersecting, we mean

$$\begin{aligned} \Lambda _0 \cap \Lambda _1 = \partial \Lambda _1, \quad T_\lambda (\Lambda _0) \cap T_\lambda (\Lambda _1) = T_\lambda (\partial \Lambda _1). \end{aligned}$$

Again what we really need in the present paper is the symbol map for such distributions. In this case the symbol map is an isomorphism, modulo lower order terms, from \(I^{m}(\Lambda _0, \Lambda _1; E \otimes \Omega ^{1/2})\) to the space

$$\begin{aligned} \left\{ (a^{(1)}, a^{(0)}) \,\bigg | \begin{array}{l} a^{(0)} \in S^{m-1/2+n/4}(\Lambda _0 \setminus \partial \Lambda _1; E \otimes \Omega ^{1/2} \otimes L),\\ a^{(1)} \in S^{m+n/4}(\Lambda _1; E\otimes \Omega ^{1/2} \otimes L),\\ a^{(1)}|_{\partial \Lambda _1} = {\mathscr {R}} a^{(0)},\\ ha^{(0)} \text{ is } \text{ smooth } \text{ up } \text{ to } \partial \Lambda _1 \text{ if } h \text{ vanishes } \text{ on } \partial \Lambda _1. \end{array} \right\} . \end{aligned}$$

We remark that \({\mathscr {R}}\) maps the \(E\otimes \Omega ^{1/2} \otimes L\)-valued symbols over \(\Lambda _0\) to the \(E\otimes \Omega ^{1/2} \otimes L\)-valued symbols over \(\Lambda _1\) and acts as a multiplication by a scalar on E.

If \((x,\xi ) \in \Lambda _j \setminus \partial \Lambda _1\) for \(j=0\) or \(j=1\), then there is a microlocal cutoff \(\chi \) near \((x,\xi )\) such that \(\chi u \in I(\Lambda _j; E)\) for all \(u \in I^{m}(\Lambda _0, \Lambda _1; E \otimes \Omega ^{1/2})\). The only place where we need the full picture of IPL distributions, instead of the above microlocal reduction to Lagrangian distributions, is equation (39) giving an initial condition on \(\partial \Lambda _1\) for a transport equation on \(\Lambda _1\). Moreover, apart from (39), we can also avoid the use of Lagrangian distributions in favour of conormal distributions, since all the Lagrangian manifolds \(\Lambda _0\) and \(\Lambda _1\) considered below will be conormal bundles away from \(\partial \Lambda _1\).

The principal symbol \(\sigma [\Box _A]\) and the subprincipal symbol \(\sigma _{\text {sub}}[\Box _A]\) read

$$\begin{aligned} \sigma [\Box _A](x, \xi ) = \xi ^\alpha \xi _\alpha , \quad \sigma _{\text {sub}}[\Box _A](x, \xi ) = 2 \imath ^{-1} [\xi ^\alpha A_\alpha , \cdot ]. \end{aligned}$$

We denote by \(\Phi _s\), \(s \in {\mathbb {R}}\), the flow of the Hamilton vector field \(H_{\sigma [\Box _A]}\) of \(\sigma [\Box _A]\), and define for a subset \({\mathscr {B}}\) of the characteristic set \(\Sigma \) of \(\Box _A\) the future flowout of \({\mathscr {B}}\) by

$$\begin{aligned} \{ (y,\eta ) \in \Sigma ;\ (y,\eta ) = \Phi _s(x,\xi ),\ s \in {\mathbb {R}},\ (x,\xi ) \in {\mathscr {B}},\ y \ge x \}. \end{aligned}$$

As \(\Box _A\) is of real principal type, one can use the theory by Hörmander and Duistermaat [11] to understand its parametrix. A completely symbolic parametrix construction, based on IPL distributions, was given by Melrose and Uhlmann [36], and the following adaptation of their construction to the vector valued case can be found in [7]:

Proposition 5

Let \(\Lambda _0\) be a conormal bundle such that \(H_{\sigma [\Box _A]}\) is nowhere tangent to \(\Lambda _0\). Denote by \(\Lambda _1\) the future flowout of \(\Lambda _0 \cap \Sigma \). Consider the wave equation

$$\begin{aligned} \left\{ \begin{array}{ll} \Box _A u = f,&{} \text{ in } {\mathbb {R}}^{1 + 3}\\ u|_{t<0} = 0,&{} \end{array} \right. \end{aligned}$$

where \(f \in I(\Lambda _0; E)\) and \(E = T^*{\mathbb {R}}^{1+3} \otimes {\mathfrak {g}}\). Then \(u \in \bigcup _{m \in {\mathbb {R}}} I^{m}(\Lambda _0, \Lambda _1; E \otimes \Omega ^{1/2})\) and the corresponding principal symbols satisfy

$$\begin{aligned} ({\mathscr {L}}_{H_{\sigma [\Box _A]}} + \imath \sigma _{\text {sub}}[\Box _A]) \sigma [u] = 0&\text{ on } \Lambda _1 \setminus \Lambda _0, \end{aligned}$$
$$\begin{aligned} \sigma [u] = {\mathscr {R}}((\sigma [\Box _A])^{-1} \sigma [f])&\text{ on } \Lambda _1 \cap \Lambda _0. \end{aligned}$$

Here \({\mathscr {L}}_{H_{\sigma [\Box _A]}}\) denotes the Lie derivative with respect to \(H_{\sigma [\Box _A]}\).

We will compute symbols related to the non-linear terms by using the following result, implicitly contained in [15] and explicitly formulated for example in [7].

Proposition 6

Let \(K_{(1)}\) and \(K_{(2)}\) be two transversal submanifolds of X, let

$$\begin{aligned} (x,\xi ) \in N^*(K_{(1)} \cap K_{(2)}) \setminus (N^*K_{(1)} \cup N^*K_{(2)}), \end{aligned}$$

and let \(u_{(j)} \in I(N^* K_{(j)}; E)\), \(j=1,2\). If \(\chi \) is a microlocal cutoff near \((x,\xi )\) and \(\mu \) is a nowhere vanishing half density on X, then writing \(u_{(1)} u_{(2)} = \mu (\mu ^{-1}u_{(1)}) (\mu ^{-1} u_{(2)})\), there holds \(\chi (u_{(1)}u_{(2)}) \in I(N^*(K_{(1)} \cap K_{(2)}); E)\) and

$$\begin{aligned} \sigma [\chi (u_{(1)}u_{(2)})](x, \xi ) = \mu ^{-1}(x)\sigma [\chi ](x, \xi ) \sigma [u_{(1)}](x, \xi _{(1)}) \sigma [u_{(2)}](x, \xi _{(2)}), \end{aligned}$$

where \(\xi = \xi _{(1)} + \xi _{(2)}\) with \(\xi _{(1)} \in N^*K_{(1)}\) and \(\xi _{(2)} \in N^*K_{(2)}\).

Parallel transport for the principal symbol

As in [7], the transport equation (38) can be understood as a parallel transport equation as in Section 2,

$$\begin{aligned} \partial _s {{\hat{u}}}_\alpha + [\left\langle A, {{\dot{\gamma }}} \right\rangle , {{\hat{u}}}_\alpha ] = 0, \quad {{\hat{u}}}_\alpha (s) = e^{\varrho (s)}(\mu ^{-1}\sigma [u_\alpha ])(\varvec{\beta }(s)),\quad u = u_\alpha dx^\alpha . \end{aligned}$$

Here \(\mu \) is a nowhere vanishing half density on \(\Lambda _1 \setminus \Lambda _0\), \(\varvec{\beta }(s) = (\gamma (s), {{\dot{\gamma }}}^*(s))\), with \({{\dot{\gamma }}}^* = {{\dot{\gamma }}}_\alpha dx^\alpha \), is the bicharacteristic curve emanating from \(\varvec{\beta }(0) \in \Lambda _0 \cap \Lambda _1\), and

$$\begin{aligned} \varrho (s) = \int _{0}^s (\mu ^{-1}{\mathscr {L}}_{H_{\sigma [\Box _A]}} \mu )(\varvec{\beta }(r)) dr. \end{aligned}$$

Comparing with (7), we see that the 1-form components \({{\hat{u}}}_\alpha \) satisfy the parallel transport equation on \(M \times {\mathfrak {g}}\) corresponding to the adjoint representation of G. In particular, if \(x,y \in {\mathbb {L}}\) and the singular support of f does not intersect the line segment from x to y, then

$$\begin{aligned} e^{\varrho (s)}(\mu ^{-1}\sigma [u_\alpha ])(y, \xi ) = {\mathbf {P}}_{y \leftarrow x}^{A,{{\,\mathrm{Ad}\,}}} \left( (\mu ^{-1}\sigma [u_\alpha ])(x, \xi )\right) , \end{aligned}$$

where \(\xi \) is the covector corresponding to the direction of the line segment, and \(\varvec{\beta }\) in (41) satisfies \(\varvec{\beta }(0) = (x,\xi )\) and \(\varvec{\beta }(s) = (y,\xi )\).

We will also need the fact that positive homogeneity is preserved in (42) in the sense of the following proposition, where we have fixed a nowhere vanishing half density \(\mu \) of degree 1/2 on \(\Lambda _1 \setminus \Lambda _0\).

Proposition 7

Let \(u \in I(\Lambda _0, \Lambda _1; T^*{\mathbb {R}}^{1+3} \otimes {\mathfrak {g}} \otimes \Omega ^{1/2})\) be an IPL distribution solving (37) whose symbol \(\sigma [u]\) is positively homogeneous of degree \(q + 1/2\) on \(\Lambda _1 \setminus \Lambda _0\). Suppose that \(\Lambda _1 \setminus \Lambda _0 = N^*K \setminus 0\) for some \(K \subset {\mathbb {R}}^{1+3}\). Then for any \((y, \xi ) \in N^*K \setminus 0\) with \((y, \xi ) = \Phi _s(x, \xi )\) for some \(s \in {\mathbb {R}}\), we have

$$\begin{aligned} e^{\varrho (s)}(\mu ^{-1}\sigma [u])(y, \pm \lambda \xi ) = \lambda ^q {\mathbf {P}}_{y \leftarrow x}^{A,{{\,\mathrm{Ad}\,}}} ((\mu ^{-1}\sigma [u])(x, \pm \xi ) ), \quad \text{ for } \text{ any } \lambda > 0. \end{aligned}$$

Recall that \(\Phi _s\) is the flow of the Hamilton vector field \(H_{\sigma [\Box _A]}\). For the proof, the reader is referred to our work [7, Proposition 1].

Proof of Proposition 1

We follow the construction in [7]. However, the analysis in the present paper is more involved, both because the non-linearity in the Yang–Mills equations is more complicated than the simple cubic non-linearity considered in [7] and because of the gauge invariance of the Yang–Mills equations. We will focus on the new features of the proof and refer to [7] for technical details that are unchanged.

In order to apply the microlocal machinery in Section 7 we need to consider the Yang–Mills equations on the tensor product bundle \(T^* {\mathbb {R}}^{1+3} \otimes {\mathfrak {g}}\otimes \Omega ^{1/2}\). This is achieved by choosing a nowhere vanishing half density \(\mu \) on \({\mathbb {R}}^{1+3}\) and by considering the conjugated operator \(\mu ^{-1} P(\mu W)\) instead of \(P(W) = \Box _A W + \star [W, \star F_A] + {\mathcal {N}}(W)\), cf. (23). In fact, we choose \(\mu \) so that \(\mu =1\) identically in the Cartesian coordinates, and to simplify the notation, we omit writing \(\mu \) in what follows. However, we warn the reader that additional determinant factors appear in other coordinates. These can be included in the factors \({{\tilde{\alpha }}}_{(k)}\) in (51), and \(\alpha _{(k)}\), \(\alpha _{(kl)}\) and \(\alpha \) in (53).

Recall that \({\mathbb {S}}^+(\mho )\) is defined by (8). Let \((x_{(1)},y,z) \in {\mathbb {S}}^+(\mho )\) and consider the line segments \(\gamma _{y \leftarrow x_{(1)}}\) and \(\gamma _{z \leftarrow y}\) from \(x_{(1)}\) to y and from y to z, respectively. We write

$$\begin{aligned} \eta = {{\dot{\gamma }}}_{z \leftarrow y}^*(0), \quad \xi _{(1)} = {{\dot{\gamma }}}_{y \leftarrow x_{(1)}}^*(\ell ), \end{aligned}$$

where \(\ell \in {\mathbb {R}}\) satisfies \(\gamma _{y \leftarrow x_{(1)}}(\ell ) = y\) and \(\cdot ^* : T_y {\mathbb {R}}^{1+3} \rightarrow T_y^* {\mathbb {R}}^{1+3}\) denotes the tangent-cotangent isomorphism given by the Minkowski metric. After rescaling \(\eta \) and \(\xi _{(1)}\), and after a rotation in \({\mathbb {R}}^3\), we may assume that

$$\begin{aligned} \eta = (1, -a(r), r, 0), \quad \xi _{(1)} = (1,1,0,0), \end{aligned}$$

where \(a(r) = \sqrt{1-r^2}\) and \(r \in (-1,1)\). Then we let \(s > 0\) be small and set

$$\begin{aligned} \xi _{(2)} = (1,a(s),s,0), \quad \xi _{(3)} = (1,a(s),-s,0). \end{aligned}$$

The rationale behind this choice of \(\xi _{(k)}\), \(k=2,3\), is that now \(\eta \) can be written as the linear combination

$$\begin{aligned} \eta = \kappa _{(1)} \xi _{(1)} + \kappa _{(2)} \xi _{(2)} + \kappa _{(3)} \xi _{(3)}, \end{aligned}$$

where the scalars \(\kappa _{(k)}\) are given explicitly by

$$\begin{aligned} \kappa _{(1)} = 1 - \frac{1 + a(r)}{1 - a(s)}, \quad \kappa _{(2)} = \frac{1 + a(r)}{2(1 - a(s))} + \frac{1}{2} \frac{r}{s}, \quad \kappa _{(3)} = \frac{1 + a(r)}{2(1 - a(s))} - \frac{1}{2} \frac{r}{s}. \end{aligned}$$
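The decomposition (46) can be sanity-checked numerically. The following is a minimal sketch, not part of the original argument: the function names are ours, and the covectors are simply the tuples from (44) and (45). It checks that the stated scalars reproduce \(\eta \) and that all four covectors are lightlike for the quadratic form \(-\xi _0^2 + \xi _1^2 + \xi _2^2 + \xi _3^2\).

```python
import math


def a(t):
    """a(t) = sqrt(1 - t^2), as in (44)-(45)."""
    return math.sqrt(1.0 - t * t)


def covectors(r, s):
    """Return eta and (xi_1, xi_2, xi_3) as 4-tuples of components."""
    eta = (1.0, -a(r), r, 0.0)
    xi1 = (1.0, 1.0, 0.0, 0.0)
    xi2 = (1.0, a(s), s, 0.0)
    xi3 = (1.0, a(s), -s, 0.0)
    return eta, (xi1, xi2, xi3)


def kappas(r, s):
    """Closed-form scalars kappa_(1), kappa_(2), kappa_(3) from (46)."""
    big = (1.0 + a(r)) / (1.0 - a(s))
    return (1.0 - big, 0.5 * big + 0.5 * r / s, 0.5 * big - 0.5 * r / s)


def minkowski_norm(xi):
    """Quadratic form -xi_0^2 + xi_1^2 + xi_2^2 + xi_3^2."""
    return -xi[0] ** 2 + xi[1] ** 2 + xi[2] ** 2 + xi[3] ** 2


def decomposition_error(r, s):
    """Largest componentwise gap between eta and sum_k kappa_(k) xi_(k)."""
    eta, xis = covectors(r, s)
    ks = kappas(r, s)
    combo = [sum(k * xi[i] for k, xi in zip(ks, xis)) for i in range(4)]
    return max(abs(c - e) for c, e in zip(combo, eta))
```

For small \(s > 0\) the same functions also show the sign pattern used later for the cutoffs: `kappas(r, s)[0]` is negative while the other two entries are positive.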

Writing \(\gamma (\cdot ; x, \xi )\) for the geodesic on \({\mathbb {R}}^{1+3}\) with the initial conditions \(\gamma (0; x, \xi ) = x\) and \({{\dot{\gamma }}}^*(0;x, \xi ) = \xi \), we define

$$\begin{aligned} x_{(k)} = \gamma (-\ell ;y,\xi _{(k)}), \quad k=2,3. \end{aligned}$$

Then \(x_{(2)}, x_{(3)} \in \mho \) for small enough \(s > 0\).

It turns out that in the coordinates satisfying (44)–(45) it is enough to use sources with all but the \(dx^2\) component vanishing. Let \(b_{(k)} \in {\mathfrak {g}}\) and set

$$\begin{aligned} J_{(k),2} = J_{(k),2}(s) = b_{(k)} \chi _{(k)} \delta _{x_{(k)}}, \quad k=1,2,3, \end{aligned}$$

where \(\delta _{x_{(k)}}\) is the Dirac delta distribution at \(x_{(k)}\) and \(\chi _{(k)}\) is a microlocal cutoff near \((x_{(k)}, \pm \xi _{(k)})\). Here the sign is chosen to be that of \(\kappa _{(k)}\), that is, − for \(k=1\) and \(+\) for \(k=2,3\). Moreover, \(\chi _{(k)}\) is chosen so that

  • (\(\chi \)1) the principal symbol \(\sigma [\chi _{(k)}]\) is positively homogeneous of degree q;

  • (\(\chi \)2) \({{\,\mathrm{supp}\,}}(J_{(k),2}) \subset \mho _{(k)}\) where \(\mho _{(k)} \subset \mho \) is a neighbourhood of \(x_{(k)}\), and for all \(k \ne l\) it holds that \(x_{(l)} \notin {\mathcal {J}}^+(\mho _{(k)})\) where

    $$\begin{aligned} {\mathcal {J}}^+(\mho _{(k)}) = \{y \in {\mathbb {R}}^{1+3} : x < y\text { or }x = y\text { for some }x \in \mho _{(k)} \}; \end{aligned}$$
  • (\(\chi \)3) \({{\hat{\mho }}}_{(k)} \cap \Gamma _{(l)} = \emptyset \) for all \(k \ne l\) where

    $$\begin{aligned} {{\hat{\mho }}}_{(k)}&= \{(t,x') \in {\mathbb {R}}^{1+3}: ({{\tilde{t}}}, x') \in \mho _{(k)} \text { for some }{{\tilde{t}}} \in {\mathbb {R}}\}, \\ \Gamma _{(k)}&= \{\gamma ({{\tilde{t}}}; x_{(k)}, \xi ) : {{\tilde{t}}} \in {\mathbb {R}},\ (x_{(k)}, \xi ) \in {{\,\mathrm{WF}\,}}(\chi _{(k)})\}. \end{aligned}$$

The degree \(q \in {\mathbb {R}}\) is chosen negative enough so that \(J_{(k),2} \in H_0^7(\mho ; {\mathfrak {g}})\). The geometric setting is shown in Figure 2.

Fig. 2

Three line segments (in black) along the lightlike geodesics \(\gamma _{y \leftarrow x_{(k)}}\) from \(x_{(k)}\) (in red) to y (in blue), \(k=1,2,3\), in the hyperplane \(x^3=0\). Coordinates are chosen so that (44)–(45) hold and that \(x_{(1)}\) is at the origin. All three points \(x_{(k)}\) are in the plane \(x^0=0\), and there exist neighbourhoods \(\mho _{(k)}\) of \(x_{(k)}\) so that (\(\chi \)2) holds. The set \({{\hat{\mho }}}_{(k)}\) is a small neighbourhood of the dashed red line through \(x_{(k)}\) (in particular, \({{\hat{\mho }}}_{(1)}\) is a neighbourhood of the \(x^0\)-axis), and \(\Gamma _{(k)}\) is a small neighbourhood of the black line through \(x_{(k)}\), for small \(\mho _{(k)}\) and \({{\,\mathrm{WF}\,}}(\chi _{(k)})\), hence (\(\chi \)3) holds.

Proposition 8

Let \(x_{(1)},y,z\) and \(\eta \), as well as \(b_{(k)}\) and \(J_{(k),2}(s)\), with \(k=1,2,3\) and small \(s>0\), be as above, and define for \(\epsilon _{(k)} \in {\mathbb {R}}\), \(k=1,2,3\),

$$\begin{aligned} J_{2}(\epsilon ,s) = \epsilon _{(1)} J_{(1),2}(s) + \epsilon _{(2)} J_{(2),2}(s) + \epsilon _{(3)} J_{(3),2}(s), \quad \epsilon = (\epsilon _{(1)}, \epsilon _{(2)}, \epsilon _{(3)}). \end{aligned}$$

Let \({{\tilde{A}}}\) and L be as in Proposition 4. Suppose that \(r \ne 0\) in (44) and that \(b_{(2)} = b_{(3)}\). Then for any \(s_0>0\), the following point values of symbols

$$\begin{aligned} \sigma [\partial _{\epsilon _{(1)}}\partial _{\epsilon _{(2)}}\partial _{\epsilon _{(3)}}L(0, J_{2}(0, s), 0)](z,\eta ), \quad s \in (0,s_0), \end{aligned}$$

determine \({\mathbf {S}}^{{{\tilde{A}}},{{\,\mathrm{Ad}\,}}}_{z \leftarrow y \leftarrow x_{(1)}}[b_{(2)}, [b_{(1)}, b_{(2)}]]\).

As \((x_{(1)},y,z) \in {\mathbb {S}}^+(\mho )\) and \(b_{(1)}, b_{(2)} \in {\mathfrak {g}}\) can be chosen arbitrarily apart from the constraint \(r \ne 0\), Proposition 1 follows from Propositions 4 and 8 together with Proposition 9 in Section 9 below. Here the case \(r=0\) follows by continuity.

For the convenience of readers who do not wish to enter into the theory of Lie algebras, we have included an elementary alternative to Proposition 9 in the case \({\mathfrak {g}}= {{\,\mathrm{{\mathfrak {su}}}\,}}(n)\), with \(n \ge 2\), see Lemma 16 in Appendix C. This special case is interesting in view of the \({{\,\mathrm{SU}\,}}(3) \times {{\,\mathrm{SU}\,}}(2) \times \mathrm {U}(1)\) gauge group of the standard model.

We will proceed to give a proof of Proposition 8 in Sections 8.1–8.3.

Microlocal reduction from (25) to (23)

Let \(J_{(k),2}\), \(k=1,2,3\), be as in (47), and write \(J_2 = J_2(\epsilon ,s)\) for the function defined by (48). To simplify the notation, we write \(J_j = J_{(k),j} = 0\) for \(k=1,2,3\) and \(j=1,3\), and, for the remainder of this section, somewhat abusively \(A = {{\tilde{A}}}\) where \({\tilde{A}}\) is as in Proposition 4. Then we denote by

$$\begin{aligned} (W,J_0) = (W(\epsilon ),J_0(\epsilon )), \quad \epsilon = (\epsilon _{(1)}, \epsilon _{(2)}, \epsilon _{(3)}), \end{aligned}$$

the solution of (25) with \(J_j\), \(j=1,2,3\), as above and \(\epsilon \) near the origin of \({\mathbb {R}}^3\). The derivatives of W with respect to \(\epsilon \) are denoted by \(Y_{(k)}\), \(Y_{(kl)}\) and \(Y_{(123)}\) as in (27), and we write also

$$\begin{aligned} \rho _{(k)} = \frac{\partial J_0}{\partial \epsilon _{(k)}} \bigg |_{\epsilon = 0}, \quad \rho _{(kl)} = \frac{\partial ^2 J_0}{\partial \epsilon _{(k)}\partial \epsilon _{(l)}} \bigg |_{\epsilon = 0}, \quad \rho _{(123)} = \frac{\partial ^3 J_0}{\partial \epsilon _{(1)}\partial \epsilon _{(2)}\partial \epsilon _{(3)}} \bigg |_{\epsilon = 0}. \end{aligned}$$

For notational convenience, we translate the origin in (25) so that the initial conditions are given at \(t=0\) rather than at \(t=-1\).

Recall that the second equation in (25) is equivalent to (24). Differentiating (24) with respect to \(\epsilon _{(k)}\) for \(k=1, 2, 3\) gives

$$\begin{aligned} \partial _t \rho _{(k)} + [A_0, \rho _{(k)}] = \partial ^j J_{(k),j} + [A^j , J_{(k),j}]. \end{aligned}$$


Writing

$$\begin{aligned} \xi = (\tau , \xi ') = (\xi _0, \xi _1, \xi _2, \xi _3) \in T_x^* {\mathbb {R}}^{1+3}, \quad x = (t,x') = (x^0,x^1,x^2,x^3)\in {\mathbb {R}}^{1+3}, \end{aligned}$$

the operator \(\partial _t\) is elliptic away from its characteristic set \(\{\tau = 0\} \subset T^* {\mathbb {R}}^{1+3}\). The wave front set of the right-hand side of (50) is contained in a small neighbourhood of \(\{(x_{(k)}, \lambda \xi _{(k)}) : \lambda \ne 0\}\), and therefore it is disjoint from \(\{\tau = 0\}\). It follows that \(\rho _{(k)} \in I(N^*\{x_{(k)}\}; {\mathfrak {g}})\) since the right-hand side of (50) is in this class. Recalling the form of \(\xi _{(k)}\), \(k=1,2,3\), see (44) and (45), symbol evaluation gives

$$\begin{aligned} \sigma [\rho _{(1)}](x_{(1)}, -\xi _{(1)}) = 0, \quad \sigma [\rho _{(k)}](x_{(k)}, \xi _{(k)}) = (-1)^k s \sigma [J_{(k),2}](x_{(k)}, \xi _{(k)}), \quad k=2,3. \end{aligned}$$

Hence \(Y_{(k)}\) solves (32) with \(J_{(k)}\) satisfying

$$\begin{aligned} J_{(k)} \in I(N^*\{x_{(k)}\}; T^* {\mathbb {R}}^{1+3} \otimes {\mathfrak {g}}), \quad \sigma [J_{(k)}](x_{(k)}, \pm \xi _{(k)}) = {{\tilde{\alpha }}}_{(k)} b_{(k)} \omega _{(k)}, \end{aligned}$$

where the sign is that of \(\kappa _{(k)}\), \({{\tilde{\alpha }}}_{(k)} = \sigma [\chi _{(k)}](x_{(k)}, \pm \xi _{(k)}) \ne 0\), \(b_{(k)}\) is as in (47), and

$$\begin{aligned} \omega _{(1)} = dx^2, \quad \omega _{(k)} = (-1)^k s dx^0 + dx^2, \quad k=2,3. \end{aligned}$$

It follows that away from \(x_{(k)}\),

$$\begin{aligned} Y_{(k)} \in I(N^* K_{(k)}; T^* {\mathbb {R}}^{1+3} \otimes {\mathfrak {g}}), \end{aligned}$$

where \(N^*K_{(k)}\) is the bicharacteristic flowout emanating from \((x_{(k)}, \xi _{(k)})\). In other words, writing \(x_{(k)} = (t_{(k)}, x_{(k)}')\),

$$\begin{aligned} K_{(k)} = \left\{ (t_{(k)} + s, x_{(k)}' + s \theta ) \in {\mathbb {R}}^{1+3} : |\theta | = 1, s > 0 \right\} . \end{aligned}$$

Moreover, \({{\,\mathrm{singsupp}\,}}(Y_{(k)}) \subset \Gamma _{(k)}\).

The second derivative of (24) in \(\epsilon \) for distinct \(k, l =1, 2, 3\) reads

$$\begin{aligned} \partial _t \rho _{(kl)} + [A_0, \rho _{(kl)}] = - [Y_{(k),0}, \rho _{(l)}] - [Y_{(l),0}, \rho _{(k)}] + [Y^j_{(l)}, J_{(k),j}] + [Y^j_{(k)}, J_{(l),j}]. \end{aligned}$$

As \({{\,\mathrm{supp}\,}}(J_{(k),j}) \subset \mho _{(k)}\) by (\(\chi \)2), it follows from (50) and \(J_0 = 0\) for \(t \le 0\) that \({{\,\mathrm{supp}\,}}(\rho _{(k)}) \subset {{\hat{\mho }}}_{(k)}\). We see that \(Y_{(k)}\) is smooth in the support of \(\rho _{(l)}\) for distinct k and l, since \({{\hat{\mho }}}_{(k)} \cap \Gamma _{(l)} = \emptyset \) by (\(\chi \)3). Moreover, \(Y_{(k)}\) solves (32) with vanishing initial conditions and with the source satisfying \({{\,\mathrm{supp}\,}}(J_{(k)}) \subset {{\hat{\mho }}}_{(k)} \subset {\mathcal {J}}^+(\mho _{(k)})\), whence \({{\,\mathrm{supp}\,}}(Y_{(k)}) \subset {\mathcal {J}}^+(\mho _{(k)})\) due to finite speed of propagation (as discussed in the proof of Lemma 5, finite speed of propagation follows from Lemma 14 in Appendix B). As \({{\,\mathrm{singsupp}\,}}(\rho _{(l)}) = \{x_{(l)}\}\), it follows from (\(\chi \)2) that \(\rho _{(l)}\) is smooth in the support of \(Y_{(k)}\) for distinct k and l. Analogously, \(Y_{(k)}\) is smooth in \({{\,\mathrm{supp}\,}}(J_{(l)})\) and \(J_{(l)}\) is smooth in \({{\,\mathrm{supp}\,}}(Y_{(k)})\) for \(k \ne l\). Therefore the right-hand side of (52) is smooth, and so is \(\rho _{(kl)}\). This in turn implies that \(Y_{(kl)}\) satisfies (33) modulo smooth terms.

The third derivative of (24) in \(\epsilon \) can be written as

$$\begin{aligned} \partial _t \rho _{(123)} + [A_0, \rho _{(123)}]&= \frac{1}{2} \sum _{\pi \in S_3} \bigg (-[ Y_{(\pi (1)\pi (2)), 0}, \rho _{(\pi (3))}] - [ Y_{(\pi (1)), 0}, \rho _{(\pi (2)\pi (3))}] \\&\qquad + [ Y^j_{(\pi (1)\pi (2))}, J_{(\pi (3)),j}] \bigg ). \end{aligned}$$

It follows from [20, Th. 8.2.10] that, for distinct k and l, any \((x,\xi ) \in {{\,\mathrm{WF}\,}}(Y_{(k)}Y_{(l)})\) with lightlike \(\xi \) satisfies \((x,\xi ) \in {{\,\mathrm{WF}\,}}(Y_{(j)})\) for \(j=k\) or \(j=l\). Then (33) implies that

$$\begin{aligned} {{\,\mathrm{singsupp}\,}}(Y_{(kl)}) \subset {{\,\mathrm{singsupp}\,}}(Y_{(k)}) \cup {{\,\mathrm{singsupp}\,}}(Y_{(l)}). \end{aligned}$$

Similarly to the above, we see also that \({{\,\mathrm{supp}\,}}(Y_{(kl)}) \subset {\mathcal {J}}^+(\mho _{(k)}) \cup {\mathcal {J}}^+(\mho _{(l)})\) and \({{\,\mathrm{supp}\,}}(\rho _{(kl)}) \subset {{\hat{\mho }}}_{(k)} \cup {{\hat{\mho }}}_{(l)}\) for \(k \ne l\). As above, this implies that \(\rho _{(123)}\) is smooth, and that \(Y_{(123)}\) satisfies (34) modulo smooth terms.

Principal symbols of interacting waves

The linearized equation (33) has source \({\tilde{N}}(2)\) that consists of products of solutions \(Y_{(k)}\), \(k=1,2,3\), to the linear wave equation (32). These products can be viewed as the interactions of waves \(Y_{(k)}\) and \(Y_{(l)}\). Then the solution \(Y_{(kl)}\) to (33) describes the linear waves emanating from the source of such interacting waves \(Y_{(k)}\) and \(Y_{(l)}\). Analogously the solution \(Y_{(123)}\) to (34) describes waves emanating from interaction of \(Y_{(1)}\), \(Y_{(2)}\) and \(Y_{(3)}\).

As \(\xi _{(k)}\), \(k=1,2,3\), are linearly independent, the submanifolds \(K_{(k)}\), \(k=1,2,3\), intersect transversally at y, and we may compute the principal symbols \(\sigma [Y_{(123)}](y,\eta )\) using the product formula (40). This requires using the direct sum decomposition

$$\begin{aligned} \eta = \eta _{(1)} + \eta _{(2)} + \eta _{(3)} \in N^*_y K_{(1)} \oplus N^*_y K_{(2)} \oplus N^*_y K_{(3)}, \end{aligned}$$

where \(\eta _{(k)} = \kappa _{(k)} \xi _{(k)}\) and the scalars \(\kappa _{(k)}\) are given by (46). We will omit below the details related to the choices of the microlocal cutoff when applying (40). The same choices as in [7] can be used, see (54) there and its proof.

By (43) the incoming principal symbols satisfy

$$\begin{aligned} \sigma [Y_{(k)}](y, \eta _{(k)}) = \alpha _{(k)} |\kappa _{(k)}|^{q-1} {\mathbf {P}}_{y \leftarrow x_{(k)}}^{A,{{\,\mathrm{Ad}\,}}} b_{(k)} \omega _{(k)}, \end{aligned}$$

where the scalar factors \(\alpha _{(k)}\) converge in \({\mathbb {C}}\setminus 0\) as \(s \rightarrow 0\). The factors \(\alpha _{(k)}\) are independent from A, and their precise form is not important for our purposes. We refer to [7] for more detail on how to compute these factors. Let us point out, however, that typically \(\alpha _{(k)} \ne {{\tilde{\alpha }}}_{(k)}\), with \({{\tilde{\alpha }}}_{(k)}\) as in (51), due to a contribution from \({\mathscr {R}}\) and \(\sigma [\Box _A]^{-1}\) in (39).

We use the shorthand notations

$$\begin{aligned} {{\hat{Y}}}_{(j)}&= (\alpha _{(j)})^{-1} |\kappa _{(j)}|^{1-q} \sigma [Y_{(j)}](y, \eta _{(j)}), \nonumber \\ {{\hat{Y}}}_{(kl)}&= -\imath (\alpha _{(kl)})^{-1} |\kappa _{(k)}\kappa _{(l)}|^{1-q} \sigma [Y_{(kl)}](y, \eta _{(kl)}), \nonumber \\ {{\hat{Y}}}_{(123)}&= -\alpha ^{-1} |\kappa _{(1)}\kappa _{(2)}\kappa _{(3)}|^{1-q} \sigma [Y_{(123)}](y, \eta ), \end{aligned}$$

where \(\eta _{(kl)} = \eta _{(k)} + \eta _{(l)}\), \(\alpha _{(kl)} = \alpha _{(k)}\alpha _{(l)}\), and \(\alpha = \iota \alpha _{(1)}\alpha _{(2)}\alpha _{(3)}\). The constant \(\iota \in {\mathbb {C}} \setminus 0\) comes from (39) and is independent from A. Then

$$\begin{aligned} {\hat{Y}}_{(k l),\beta }= & {} p^{-1}(y, \eta _{(k l)}) \left( 2 \eta _{(l), \alpha } [{\hat{Y}}_{(k)}^\alpha , {\hat{Y}}_{(l), \beta } ] - \eta _{(l), \beta } [{\hat{Y}}_{(k)}^\alpha , ({\hat{Y}}_{(l), \alpha }) ] \right. \\&\left. +\, 2 \eta _{(k), \alpha } [{\hat{Y}}_{(l)}^\alpha , {\hat{Y}}_{(k), \beta } ] - \eta _{(k), \beta } [{\hat{Y}}_{(l)}^\alpha , {\hat{Y}}_{(k), \alpha } ] \right) , \end{aligned}$$

where \(p(y, \xi ) = - \xi _0^2 + \xi _1^2 + \xi _2^2 + \xi _3^2\). Writing

$$\begin{aligned} {{\hat{Y}}}_{(kl),\beta } = c_{(kl),\beta } p^{-1}(y, \eta _{(k l)}) [{{\tilde{b}}}_{(k)}, {{\tilde{b}}}_{(l)}], \quad {{\tilde{b}}}_{(j)} = {\mathbf {P}}_{y \leftarrow x_{(j)}}^{A,{{\,\mathrm{Ad}\,}}} b_{(j)}, \end{aligned}$$

we have

$$\begin{aligned} c_{(12),0}&=\kappa _{(1)}+2 \kappa _{(2)} s^2-\kappa _{(2)},\quad c_{(12),1}=\kappa _{(1)}-a(s) \kappa _{(2)},\quad c_{(12),2}=2 \kappa _{(1)} s+\kappa _{(2)} s,\\ c_{(13),0}&=\kappa _{(1)}+2 \kappa _{(3)} s^2-\kappa _{(3)},\quad c_{(13),1}=\kappa _{(1)}-a(s) \kappa _{(3)},\quad c_{(13),2}=-2 \kappa _{(1)} s-\kappa _{(3)} s, \end{aligned}$$


and

$$\begin{aligned} c_{(23),0}&=-3 \kappa _{(2)} s^2+\kappa _{(2)}+3 \kappa _{(3)} s^2-\kappa _{(3)},\\ c_{(23),1}&=a (s) \kappa _{(2)} s^2+a (s) \kappa _{(2)}-a (s) \kappa _{(3)} s^2-a (s) \kappa _{(3)},\\ c_{(23),2}&=\kappa _{(2)} s^3-3 \kappa _{(2)} s+\kappa _{(3)} s^3-3 \kappa _{(3)} s. \end{aligned}$$


Moreover,

$$\begin{aligned} p(y, \eta _{(2 3)}) = 2(a(r) + a(s)) (\kappa _{(1)}-1), \quad p(y, \eta _{(1 k)}) = 2(a(r) + a(s)) \kappa _{(k)}, \quad k = 2,3. \end{aligned}$$

For our purposes, it is enough to compute the leading order terms with respect to s, in the limit \(s \rightarrow 0\), of the first two 1-form components of \({\hat{Y}}_{(1 2 3)}\). The cubic terms

$$\begin{aligned}{}[{\hat{Y}}_{(\pi (1))}^\alpha , [{\hat{Y}}_{(\pi (2)), \alpha }, {\hat{Y}}_{(\pi (3)), \beta }] ], \quad \beta =0,1, \end{aligned}$$

are of order s. Indeed, if \(\beta =1\) then the last factor vanishes, and if \(\beta =0\) then the last factor is of order s. Hence for \(\beta =0,1\),

$$\begin{aligned} {\hat{Y}}_{(1 2 3),\beta }= & {} \frac{1}{2} \sum _{\pi \in S_3} \left( 2 \eta _{(\pi (3)), \alpha } [{\hat{Y}}_{(\pi (1)\pi (2))}^\alpha , {\hat{Y}}_{(\pi (3)), \beta }] - \eta _{(\pi (3)), \beta } [{\hat{Y}}_{(\pi (1)\pi (2))}^\alpha , {\hat{Y}}_{(\pi (3)), \alpha } ] \right. \\&\left. +\, 2 \eta _{(\pi (2)\pi (3)), \alpha } [{\hat{Y}}_{(\pi (1))}^\alpha , {\hat{Y}}_{(\pi (2)\pi (3)), \beta }] - \eta _{(\pi (2)\pi (3)), \beta } [{\hat{Y}}_{(\pi (1))}^\alpha , {\hat{Y}}_{(\pi (2)\pi (3)), \alpha } ] \right) \\&+{\mathcal {O}}(s). \end{aligned}$$

It is in principle straightforward to express \({\hat{Y}}_{(1 2 3),\beta }\) in terms of \({{\tilde{b}}}_{(j)}\), analogously to (54). We do not reproduce the details of this long computation here. However, we have verified the expression (55) below using a computer algebra system, and our code is available online [8]. There holds

$$\begin{aligned}&{\hat{Y}}_{(1 2 3),0} = {\hat{Y}}_{(1 2 3),1} = -6 s^{-1} [{{\tilde{b}}}_{(1)}, [{{\tilde{b}}}_{(2)}, {{\tilde{b}}}_{(3)}]] \nonumber \\&\quad +\left( 6 s^{-1} + \frac{3 r}{1+a(r)}\right) [{{\tilde{b}}}_{(2)}, [{{\tilde{b}}}_{(1)}, {{\tilde{b}}}_{(3)}]] \nonumber \\&\quad +\left( -6 s^{-1} + \frac{3 r}{1+a(r)}\right) [{{\tilde{b}}}_{(3)}, [{{\tilde{b}}}_{(1)}, {{\tilde{b}}}_{(2)}]] + {\mathcal {O}}(s). \end{aligned}$$

The terms of order \(s^{-1}\) cancel out due to the Jacobi identity. Hence

$$\begin{aligned} \lim _{s \rightarrow 0} {\hat{Y}}_{(1 2 3),\beta } = \frac{3 r}{1+a(r)} \lim _{s \rightarrow 0}\left( [{{\tilde{b}}}_{(2)}, [{{\tilde{b}}}_{(1)}, {{\tilde{b}}}_{(3)}]] + [{{\tilde{b}}}_{(3)}, [{{\tilde{b}}}_{(1)}, {{\tilde{b}}}_{(2)}]] \right) , \quad \beta = 0,1. \end{aligned}$$
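The cancellation of the \(s^{-1}\) terms can be illustrated in a concrete Lie algebra. The following sketch is only an illustration under a simplifying assumption: it uses \({\mathbb {R}}^3\) with the cross product as the bracket, which is a model of \({\mathfrak {so}}(3) \cong {{\,\mathrm{{\mathfrak {su}}}\,}}(2)\), and the function names are ours. It evaluates the coefficient of \(s^{-1}\) in (55) and the remaining s-independent part.

```python
import math


def a(t):
    return math.sqrt(1.0 - t * t)


def bracket(u, v):
    """Cross product on R^3, used here as a model Lie bracket."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def combo(coeffs, b1, b2, b3):
    """coeffs[0] [b1,[b2,b3]] + coeffs[1] [b2,[b1,b3]] + coeffs[2] [b3,[b1,b2]]."""
    t1 = bracket(b1, bracket(b2, b3))
    t2 = bracket(b2, bracket(b1, b3))
    t3 = bracket(b3, bracket(b1, b2))
    return tuple(coeffs[0] * x + coeffs[1] * y + coeffs[2] * z
                 for x, y, z in zip(t1, t2, t3))


def singular_part(b1, b2, b3):
    """Coefficient of 1/s in (55); vanishes by the Jacobi identity."""
    return combo((-6.0, 6.0, -6.0), b1, b2, b3)


def regular_part(r, b1, b2, b3):
    """The s-independent part of (55)."""
    c = 3.0 * r / (1.0 + a(r))
    return combo((0.0, c, c), b1, b2, b3)
```

With `b3 = b2`, `regular_part(r, b1, b2, b2)` reduces to \(6r/(1+a(r))\) times `bracket(b2, bracket(b1, b2))`, which is the normalization used in the next display.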

Taking \(b_{(3)} = b_{(2)}\) yields

$$\begin{aligned} \frac{1+a(r)}{6 r} \lim _{s \rightarrow 0} {\hat{Y}}_{(1 2 3),\beta } = \lim _{s \rightarrow 0} [{{\tilde{b}}}_{(2)}, [{{\tilde{b}}}_{(1)}, {{\tilde{b}}}_{(2)}]] = {\mathbf {P}}_{y \leftarrow x_{(1)}}^{A,{{\,\mathrm{Ad}\,}}} [b_{(2)}, [b_{(1)}, b_{(2)}]], \quad \beta = 0,1, \end{aligned}$$

where we used the following simple consequence of the Jacobi identity

$$\begin{aligned}{}[{\mathbf {P}}_{y \leftarrow x}^{A, {{\,\mathrm{Ad}\,}}} b_{(1)}, {\mathbf {P}}_{y \leftarrow x}^{A, {{\,\mathrm{Ad}\,}}} b_{(2)}] = {\mathbf {P}}_{y \leftarrow x}^{A, {{\,\mathrm{Ad}\,}}} [b_{(1)}, b_{(2)}], \quad b_{(1)}, b_{(2)} \in {\mathfrak {g}},\ x,y \in {\mathbb {R}}^{1+3}. \end{aligned}$$

Indeed, let \(W_j\), \(j=1,2\), be the solutions of (7) with \(V=V_j\). Then the Jacobi identity implies

$$\begin{aligned} \partial _t [W_1, W_2]&= -[[\left\langle A, {{\dot{\gamma }}} \right\rangle , W_1], W_2] - [W_1, [\left\langle A, {{\dot{\gamma }}} \right\rangle , W_2]] \\&= [W_2, [\left\langle A, {{\dot{\gamma }}} \right\rangle , W_1]] + [W_1, [W_2, \left\langle A, {{\dot{\gamma }}} \right\rangle ]] = -[\left\langle A, {{\dot{\gamma }}} \right\rangle , [W_1, W_2]]. \end{aligned}$$

Thus \([W_1, W_2]\) solves (7) with \(V = [V_1, V_2]\) and (56) follows.
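The identity (56) can also be observed numerically. The sketch below is a hedged illustration, not the construction used in the paper: it again models the Lie algebra by \({\mathbb {R}}^3\) with the cross product, replaces \(\left\langle A, {{\dot{\gamma }}} \right\rangle \) along the curve by an arbitrary hypothetical coefficient function `coef`, and integrates \(\partial _t W = -[\left\langle A, {{\dot{\gamma }}} \right\rangle , W]\) with a classical fourth-order Runge–Kutta scheme.

```python
import math


def bracket(u, v):
    """Cross product on R^3, used here as a model Lie bracket."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def axpy(u, v, c):
    """Return u + c * v componentwise."""
    return tuple(x + c * y for x, y in zip(u, v))


def transport(coef, v, t_end=1.0, steps=2000):
    """Integrate W' = -[coef(t), W], W(0) = v, by classical RK4."""
    def rhs(t, w):
        return tuple(-x for x in bracket(coef(t), w))

    h = t_end / steps
    w, t = tuple(v), 0.0
    for _ in range(steps):
        k1 = rhs(t, w)
        k2 = rhs(t + h / 2, axpy(w, k1, h / 2))
        k3 = rhs(t + h / 2, axpy(w, k2, h / 2))
        k4 = rhs(t + h, axpy(w, k3, h))
        w = tuple(wi + h / 6 * (p1 + 2 * p2 + 2 * p3 + p4)
                  for wi, p1, p2, p3, p4 in zip(w, k1, k2, k3, k4))
        t += h
    return w


def bracket_defect(coef, v1, v2):
    """Gap between [P v1, P v2] and P [v1, v2], P being the transport map."""
    lhs = bracket(transport(coef, v1), transport(coef, v2))
    rhs = transport(coef, bracket(v1, v2))
    return max(abs(x - y) for x, y in zip(lhs, rhs))


def example_coef(t):
    """Hypothetical stand-in for <A, gamma-dot> along the curve."""
    return (math.cos(t), t, 0.5)
```

Up to the discretization error of the integrator, `bracket_defect(example_coef, v1, v2)` should be negligible for any initial values `v1`, `v2`, in line with the statement that \([W_1, W_2]\) solves (7) with \(V = [V_1, V_2]\).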

We apply (43) to obtain

$$\begin{aligned} \alpha _{(0)}^{-1} \lim _{s \rightarrow 0} \left( c \sigma [Y_{(1 2 3),\beta }](z,\eta ) \right) = {\mathbf {P}}_{z \leftarrow y}^{A,{{\,\mathrm{Ad}\,}}} {\mathbf {P}}_{y \leftarrow x_{(1)}}^{A,{{\,\mathrm{Ad}\,}}} [b_{(2)}, [b_{(1)}, b_{(2)}]], \quad \beta = 0,1, \end{aligned}$$

where \(c = c(s) = - (1+a(r))(6r\alpha )^{-1} |\kappa _{(1)}\kappa _{(2)}\kappa _{(3)}|^{1-q}\) and \(\alpha _{(0)} \in {\mathbb {C}}\setminus 0\) is independent from A.

Principal symbol in temporal gauge

To finish the proof of Proposition 8, we show that for \(\beta =1,2,3\),

$$\begin{aligned} \sigma [\partial _{\epsilon _{(1)}}\partial _{\epsilon _{(2)}}\partial _{\epsilon _{(3)}}L_\beta (0, J_{2}(0, s), 0)](z,\eta ) = -\frac{\eta _\beta }{\eta _0} \sigma [Y_{(1 2 3),0}](z,\eta ) + \sigma [Y_{(1 2 3),\beta }](z,\eta ). \end{aligned}$$

Indeed, Proposition 8 follows from (57) and (58) with \(\beta = 1\).

Recall that \(L(0, J_{2}(\epsilon , s), 0)\) is defined by \({\mathscr {T}}(V)|_\mho \) where \(V = W + A\) and W is as in (49). To simplify the notation, we write

$$\begin{aligned} V_{(k)} = \frac{\partial V}{\partial \epsilon _{(k)}}\bigg |_{\epsilon = 0}, \quad V_{(kl)} = \frac{\partial ^2 V}{\partial \epsilon _{(k)}\partial \epsilon _{(l)}}\bigg |_{\epsilon = 0}, \quad V_{(123)} = \frac{\partial ^3 V}{\partial \epsilon _{(1)}\partial \epsilon _{(2)}\partial \epsilon _{(3)}}\bigg |_{\epsilon = 0}. \end{aligned}$$

As A is smooth, \(\sigma [V_{(123)}](z,\eta ) = \sigma [Y_{(1 2 3)}](z,\eta )\). It remains to study how the principal symbol \(\sigma [V_{(123)}]\) transforms when passing to the temporal gauge via \({\mathscr {T}}\).

Let \({\mathbf {U}}= {\mathbf {U}}(\epsilon )\) be as in (14) with \(V = V(\epsilon )\), and write

$$\begin{aligned} U_{(k)} = \frac{\partial {\mathbf {U}}}{\partial \epsilon _{(k)}}\bigg |_{\epsilon = 0}, \quad U_{(kl)} = \frac{\partial ^2 {\mathbf {U}}}{\partial \epsilon _{(k)}\partial \epsilon _{(l)}}\bigg |_{\epsilon = 0}, \quad U_{(123)} = \frac{\partial ^3 {\mathbf {U}}}{\partial \epsilon _{(1)}\partial \epsilon _{(2)}\partial \epsilon _{(3)}}\bigg |_{\epsilon = 0}. \end{aligned}$$

Recall that we are using the notation \(A = {{\tilde{A}}}\) where \({{\tilde{A}}}\) is as in Proposition 4. In particular, \(A|_\mho \) is in temporal gauge. This, together with \(V|_{\epsilon =0} = A\), implies that \({\mathbf {U}}|_{\epsilon = 0} = {{\,\mathrm{id}\,}}\) in \(\mho \).

We will consider V and \({\mathbf {U}}\) near the point \(z \in \mho \). Recall that \(Y_{(k)}\) is singular only in \(\Gamma _{(k)}\) and that \(Y_{(kl)}\) is singular only in \(\Gamma _{(k)} \cup \Gamma _{(l)}\). Therefore \(V_{(k)}\) and \(V_{(kl)}\) are smooth near z. Moreover, as \({{\,\mathrm{WF}\,}}(V_{(k)})\) and \({{\,\mathrm{WF}\,}}(V_{(kl)})\) are disjoint from the characteristic set \(\{\tau = 0\}\) of \(\partial _t\), the ordinary differential equation in (14) implies that also \(U_{(k)}\) and \(U_{(kl)}\) are smooth near z.


We write

$$\begin{aligned} T = \frac{\partial ^3 {\mathscr {T}}(V)}{\partial \epsilon _{(1)}\partial \epsilon _{(2)}\partial \epsilon _{(3)}}\bigg |_{\epsilon = 0}, \end{aligned}$$

and note that differentiating (14) in \(\epsilon _{(1)}\), \(\epsilon _{(2)}\) and \(\epsilon _{(3)}\) at \(\epsilon = 0\) yields that

$$\begin{aligned} T= & {} d U_{(123)} + U^{-1}_{(123)} A + A U_{(123)} + V_{(123)} \\&\quad +\frac{1}{2} \sum _{\pi \in S_3} \bigg ( U^{-1}_{(\pi (1) \pi (2))} V_{(\pi (3))} + U^{-1}_{(\pi (1))} V_{(\pi (2) \pi (3))} + V_{(\pi (1) \pi (2))} U_{(\pi (3))} + V_{(\pi (1))} U_{(\pi (2) \pi (3))} \\&\quad + \,U^{-1}_{(\pi (1) \pi (2))} d U_{(\pi (3))} + U^{-1}_{(\pi (1))} d U_{(\pi (2) \pi (3))} + U^{-1}_{(\pi (1) \pi (2))} A U_{(\pi (3))} + U^{-1}_{(\pi (1))} A U_{(\pi (2) \pi (3))} \bigg ), \end{aligned}$$

where \(U_{(123)}\) solves

$$\begin{aligned} \partial _t U_{(123)} = - V_{(123),0} - \frac{1}{2} \sum _{\pi \in S_3} \bigg (V_{(\pi (1)\pi (2)),0} U_{(\pi (3))} + V_{(\pi (1)),0} U_{(\pi (2)\pi (3))}\bigg ). \end{aligned}$$

In addition, \({\mathbf {U}}^{-1} {\mathbf {U}}= {{\,\mathrm{id}\,}}\) implies

$$\begin{aligned} U^{-1}_{(123)} + \frac{1}{2} \sum _{\pi \in S_3} \bigg (U^{-1}_{(\pi (1)\pi (2))} U_{(\pi (3))} + U^{-1}_{(\pi (1))} U_{(\pi (2)\pi (3))} \bigg ) + U_{(123)} = 0. \end{aligned}$$

Therefore, modulo smooth terms, near z there holds

$$\begin{aligned} T = dU_{(123)} - U_{(123)} A + A U_{(123)} + V_{(123)}, \quad \partial _t U_{(123)} = -V_{(123),0}. \end{aligned}$$
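The reduction to (59) can be spelled out as follows. This is only a sketch of the bookkeeping: it uses that \(A\), \(V_{(k)}\), \(V_{(kl)}\), \(U_{(k)}\) and \(U_{(kl)}\) are smooth near z, and that, since \({\mathbf {U}}|_{\epsilon = 0} = {{\,\mathrm{id}\,}}\), the derivatives of \({\mathbf {U}}^{-1}\) of order at most two are polynomial in \(U_{(k)}\) and \(U_{(kl)}\). Writing \(\equiv \) for equality modulo functions that are smooth near z,

$$\begin{aligned} U^{-1}_{(123)}&= -U_{(123)} - \frac{1}{2} \sum _{\pi \in S_3} \bigg (U^{-1}_{(\pi (1)\pi (2))} U_{(\pi (3))} + U^{-1}_{(\pi (1))} U_{(\pi (2)\pi (3))} \bigg ) \equiv -U_{(123)}, \\ U^{-1}_{(123)} A + A U_{(123)}&\equiv -U_{(123)} A + A U_{(123)}. \end{aligned}$$

Every term under the sums over \(S_3\) in the expressions for T and \(\partial _t U_{(123)}\) is a product of factors that are smooth near z, so these sums drop out as well.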

Near z it holds that \(V_{(123)}\) is a conormal distribution associated to the future flowout of \(N^* (K_{(1)} \cap K_{(2)} \cap K_{(3)}) \cap \Sigma \), cf. (36). We refer to Appendix C of [7] for a precise description of this flowout. As the flowout is contained in the characteristic set \(\Sigma \) of \(\Box _A\), it is disjoint from the characteristic set \(\{\tau = 0\}\) of \(\partial _t\). The second equation in (59) implies that \(U_{(123)}\) is a conormal distribution associated to the same flowout near z.

We write \({{\hat{X}}} = \sigma [X](z,\eta )\) where \(X = T, V_{(123)}, U_{(123)}\). Then taking principal symbols in (59) gives for \(\beta = 0,1,2,3\),

$$\begin{aligned} {\hat{T}}_\beta = i \eta _\beta {\hat{U}}_{(123)} + {\hat{V}}_{(123),\beta }, \quad i \eta _0 {\hat{U}}_{(123)} = -{\hat{V}}_{(123),0}. \end{aligned}$$

Solving for \({\hat{U}}_{(123)}\) in the second equation and substituting in the first one yields (58). This finishes the proof of Proposition 8, and hence also Proposition 1 is proven.
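For the reader's convenience, the elimination in the last step can be written out. Division by \(\eta _0\) is permitted since the flowout is disjoint from \(\{\tau = 0\}\), so \(\eta _0 \ne 0\):

$$\begin{aligned} {\hat{U}}_{(123)}&= -\frac{1}{i \eta _0} {\hat{V}}_{(123),0}, \\ {\hat{T}}_\beta&= i \eta _\beta \bigg ( -\frac{1}{i \eta _0} {\hat{V}}_{(123),0} \bigg ) + {\hat{V}}_{(123),\beta } = -\frac{\eta _\beta }{\eta _0} {\hat{V}}_{(123),0} + {\hat{V}}_{(123),\beta }. \end{aligned}$$

Combined with \(\sigma [V_{(123)}] = \sigma [Y_{(1 2 3)}]\) this is (58). As a consistency check, for \(\beta = 0\) the right-hand side vanishes, as it should for a field in temporal gauge.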

Lie Algebras with Trivial Centre

The material that follows is quite classical and can be found in many textbooks on Lie algebras. We start by defining notations and recalling basic results following mainly the exposition from [16, Chapter 7].

Let \({\mathfrak {g}}\) be the Lie algebra of a compact connected Lie group of matrices G and let \({{\mathfrak {g}}}_{{\mathbb {C}}}\) be its complexification. An element \(Z\in {{\mathfrak {g}}}_{{\mathbb {C}}}\) can be uniquely written as \(Z=X+iY\) for \(X,Y\in {\mathfrak {g}}\), and we define \(Z^*=-X+iY\). Note that \(Z^*\) is the usual conjugate transpose of Z in the case \({\mathfrak {g}}= {\mathfrak {u}}(n)\). There is an inner product on \({{\mathfrak {g}}}_{{\mathbb {C}}}\) that is real-valued on \({\mathfrak {g}}\) and that satisfies, see [16, Proposition 7.4],

$$\begin{aligned} \langle \text {ad}_{Z}(X),Y\rangle =\langle X, \text {ad}_{Z^*}(Y)\rangle , \quad X,Y,Z \in {{\mathfrak {g}}}_{{\mathbb {C}}}. \end{aligned}$$

If \({\mathfrak {t}}\) is a maximal commutative subalgebra of \({\mathfrak {g}}\), then

$$\begin{aligned} {\mathfrak {h}}={\mathfrak {t}}+i{\mathfrak {t}}\end{aligned}$$

is a Cartan subalgebra of \({{\mathfrak {g}}}_{{\mathbb {C}}}\) and its dimension is called the rank of \({{\mathfrak {g}}}_{{\mathbb {C}}}\). The roots of \({{\mathfrak {g}}}_{{\mathbb {C}}}\) relative to \({\mathfrak {h}}\) are those elements \(\alpha \in {\mathfrak {h}}\) such that there is \(0\ne X\in {{\mathfrak {g}}}_{{\mathbb {C}}}\) so that

$$\begin{aligned}{}[H,X]=\langle \alpha ,H\rangle X, \quad \text {for all }H \in {\mathfrak {h}}, \end{aligned}$$

where we use the convention that the inner product is linear in the second variable (and anti-linear in the first one). We let \(\Delta \) be the collection of roots. By [16, Proposition 7.15], each root \(\alpha \) belongs to \(i{\mathfrak {t}}\), and we can decompose \({{\mathfrak {g}}}_{{\mathbb {C}}}\) as a direct sum

$$\begin{aligned} {{\mathfrak {g}}}_{{\mathbb {C}}}={\mathfrak {h}}\oplus \bigoplus _{\alpha \in \Delta }{\mathfrak {g}}_{\alpha } \end{aligned}$$

where \({\mathfrak {g}}_{\alpha }\) contains the eigenvectors associated to \(\alpha \), that is, the vectors X satisfying (60). Moreover, see [16, Proposition 7.18, Theorems 7.19 and 7.23],

  (1) each \({\mathfrak {g}}_{\alpha }\) is 1-dimensional;

  (2) if \(X\in {\mathfrak {g}}_{\alpha }\) with \(\alpha \in \Delta \), then \(X^*\in {\mathfrak {g}}_{-\alpha }\);

  (3) if \({{\mathfrak {g}}}_{{\mathbb {C}}}\) has trivial center, the roots span \({\mathfrak {h}}\).

We can in fact pick linearly independent elements \(X_{\alpha }\in {\mathfrak {g}}_{\alpha }\), \(Y_{\alpha }=X^*_{\alpha }\in {\mathfrak {g}}_{-\alpha }\) and \(H_{\alpha }\in {\mathfrak {h}}\) such that \(H_{\alpha }\) is a multiple of \(\alpha \) and such that \([X_{\alpha },Y_{\alpha }]=H_{\alpha }\), \([H_{\alpha },X_{\alpha }]=2X_{\alpha }\) and \([H_{\alpha },Y_{\alpha }]=-2Y_{\alpha }\). This generates an \(\mathfrak {sl}(2,{\mathbb {C}})\)-subalgebra inside \({{\mathfrak {g}}}_{{\mathbb {C}}}\) and implies that the elements

$$\begin{aligned} E^{\alpha }_{1}:=\frac{i}{2}H_{\alpha };\,\,\;E^{\alpha }_{2}:=\frac{i}{2}(X_{\alpha }+Y_{\alpha });\,\,\;E^{\alpha }_{3}:=\frac{1}{2}(Y_{\alpha }-X_{\alpha }) \end{aligned}$$

belong to \({\mathfrak {g}}\) and span a Lie subalgebra isomorphic to \({\mathfrak {su}}(2)\), see [16, Corollary 7.20]. Note that the set \(\{E^{\alpha }_{1}, E^{\alpha }_{2}, E^{\alpha }_{3}\}_{\alpha \in \Delta }\) spans \({\mathfrak {g}}\) over the reals if \({\mathfrak {g}}\) has trivial centre. The commutation relations of Pauli matrices imply that \({\mathfrak {su}}(2)\) is spanned by the nested commutators \([X, [X, Y]]\) with \(X, Y \in {{\,\mathrm{{\mathfrak {su}}}\,}}(2)\). Hence the discussion above immediately implies:

Proposition 9

Let \({\mathfrak {g}}\) be the Lie algebra of a compact connected Lie group of matrices. Assume that \({\mathfrak {g}}\) has trivial centre. Then \({\mathfrak {g}}\) is the linear span of \([X, [X, Y]]\) for \(X,Y\in {\mathfrak {g}}\).
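The \({\mathfrak {su}}(2)\) step behind Proposition 9 can be checked by hand. The following sketch fixes a root \(\alpha \) and writes \(E_1 = \frac{i}{2}H_{\alpha }\), \(E_2 = \frac{i}{2}(X_{\alpha }+Y_{\alpha })\), \(E_3 = \frac{1}{2}(Y_{\alpha }-X_{\alpha })\). This normalisation is an assumption of the sketch, chosen so that \(E_j^* = -E_j\) for \(j = 1,2,3\). Using only \([X_{\alpha },Y_{\alpha }]=H_{\alpha }\), \([H_{\alpha },X_{\alpha }]=2X_{\alpha }\) and \([H_{\alpha },Y_{\alpha }]=-2Y_{\alpha }\),

$$\begin{aligned}{}[E_1, E_2]&= -\frac{1}{4} [H_{\alpha }, X_{\alpha } + Y_{\alpha }] = \frac{1}{2}(Y_{\alpha } - X_{\alpha }) = E_3, \\ [E_2, E_3]&= \frac{i}{4} [X_{\alpha } + Y_{\alpha }, Y_{\alpha } - X_{\alpha }] = \frac{i}{2} H_{\alpha } = E_1, \\ [E_3, E_1]&= \frac{i}{4} [Y_{\alpha } - X_{\alpha }, H_{\alpha }] = \frac{i}{2}(X_{\alpha } + Y_{\alpha }) = E_2, \end{aligned}$$

and therefore

$$\begin{aligned}{}[E_1, [E_1, E_2]] = -E_2, \quad [E_2, [E_2, E_3]] = -E_3, \quad [E_3, [E_3, E_1]] = -E_1. \end{aligned}$$

Each \(E_j\) is thus, up to sign, a nested commutator of the required form, and taking real linear combinations over \(\alpha \in \Delta \) gives the span of \({\mathfrak {g}}\).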

The Case of General Lie Group

Suppose now G is any compact connected Lie group. In what follows it is convenient to express some previous notions in slightly more abstract form. Let \(\omega \in \Omega ^{1}(G,{\mathfrak {g}})\) be the (left) Maurer-Cartan 1-form of G. Given \({\mathbf {U}}\in G^{0}({\mathbb {D}},p)\) we express the gauge equivalence between \(A,B\in \Omega ^{1}(M,{\mathfrak {g}})\) as

$$\begin{aligned} {\mathbf {U}}^*\omega +\text {Ad}_{{\mathbf {U}}^{-1}}(A)=B, \end{aligned}$$

where \(\text {Ad}:G\rightarrow GL({\mathfrak {g}})\) is the usual adjoint representation. For matrix Lie groups, \(\omega =g^{-1}dg\) and \(\text {Ad}_{g}(a)=gag^{-1}\) for \(a\in {\mathfrak {g}}\), and we recover the expression (2) for the gauge equivalence between A and B that we have used so far.
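In the matrix case the two terms in (61) can be written out explicitly. The display below is a sketch that assumes (2) has the usual form \(B = {\mathbf {U}}^{-1} d{\mathbf {U}}+ {\mathbf {U}}^{-1} A {\mathbf {U}}\):

$$\begin{aligned} {\mathbf {U}}^*\omega = {\mathbf {U}}^{-1} d{\mathbf {U}}, \qquad \text {Ad}_{{\mathbf {U}}^{-1}}(A) = {\mathbf {U}}^{-1} A {\mathbf {U}}, \qquad \text {so that} \quad {\mathbf {U}}^*\omega +\text {Ad}_{{\mathbf {U}}^{-1}}(A) = {\mathbf {U}}^{-1} d{\mathbf {U}}+ {\mathbf {U}}^{-1} A {\mathbf {U}}. \end{aligned}$$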

Suppose now that \(p:{\widetilde{G}}\rightarrow G\) is a covering of G. Then p is a Lie group homomorphism and \(p^*\omega _{G}=\omega _{{\widetilde{G}}}\). Given \({\mathbf {U}}\in G^{0}({\mathbb {D}},p)\), there is a unique \({\widetilde{{\mathbf {U}}}}\in {\widetilde{G}}^{0}({\mathbb {D}},p)\) such that \(p\circ {\widetilde{{\mathbf {U}}}}={\mathbf {U}}\). This is because the domain of \({\mathbf {U}}\) is simply connected and we are fixing the value of \({\widetilde{{\mathbf {U}}}}\) at p to be the identity. We deduce that (61) holds if and only if the following equation holds

$$\begin{aligned} {\widetilde{{\mathbf {U}}}}^*\omega _{{\widetilde{G}}}+\text {Ad}_{{\widetilde{{\mathbf {U}}}}^{-1}}(A)=B. \end{aligned}$$

In other words, A and B are gauge equivalent via a gauge in \(G^{0}({\mathbb {D}},p)\) if and only if they are gauge equivalent via a gauge in \({\widetilde{G}}^{0}({\mathbb {D}},p)\). The same observation applies for gauges defined near \(\partial ^{-}{\mathbb {D}}\). One very useful consequence is that the data set \({\mathcal {D}}_{A}\) does not really depend on the group G as long as it has Lie algebra \({\mathfrak {g}}\).
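The equivalence of (61) and (62) amounts to the following short computation, sketched here under the identification of the Lie algebras of \({\widetilde{G}}\) and G via the differential of the covering map at the identity. Since \({\mathbf {U}}= p\circ {\widetilde{{\mathbf {U}}}}\) and the covering map is a homomorphism,

$$\begin{aligned} {\mathbf {U}}^*\omega _{G} = (p\circ {\widetilde{{\mathbf {U}}}})^*\omega _{G} = {\widetilde{{\mathbf {U}}}}^* p^*\omega _{G} = {\widetilde{{\mathbf {U}}}}^*\omega _{{\widetilde{G}}}, \qquad \text {Ad}_{{\mathbf {U}}^{-1}}(A) = \text {Ad}_{p({\widetilde{{\mathbf {U}}}}^{-1})}(A) = \text {Ad}_{{\widetilde{{\mathbf {U}}}}^{-1}}(A). \end{aligned}$$

The second identity uses that the adjoint representation of \({\widetilde{G}}\) factors through the covering map under the above identification.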

We are going to use this setup as follows. Every compact connected Lie group G admits a finite cover of the form \({\mathbb {T}}^r\times G_{1}\), where \({\mathbb {T}}^r\) is an r-torus and \(G_{1}\) is a compact Lie group with finite centre [4, Theorem 8.1, p. 233]. At the level of the Lie algebra this corresponds to an orthogonal splitting \({\mathfrak {g}}={\mathfrak {z}}\oplus {\mathfrak {g}}_{1}\), where \({\mathfrak {g}}_{1}\) is the Lie algebra of \(G_{1}\) and has trivial centre. Given \(A\in \Omega ^{1}(M,{\mathfrak {g}})\) we split uniquely

$$\begin{aligned} A=A_{Z}+A_{1}\in {\mathfrak {z}}\oplus {\mathfrak {g}}_{1}. \end{aligned}$$

Now we claim:

Lemma 7

Let \(A,B\in \Omega ^{1}(M,{\mathfrak {g}})\). Then \({\mathcal {D}}_{A}={\mathcal {D}}_{B}\) iff \({\mathcal {D}}_{A_{Z}}={\mathcal {D}}_{B_{Z}}\) and \({\mathcal {D}}_{A_{1}}={\mathcal {D}}_{B_{1}}\).


Proof

Using that elements in the centre \({\mathfrak {z}}\) commute with everything, a quick calculation shows that given \(V\in C^{3}({\mathbb {D}};T^*{\mathbb {D}}\otimes {\mathfrak {g}})\) with \(V=V_{Z}+V_{1}\) we can write the curvature of V as

$$\begin{aligned} F_{V}=F_{V_{1}}+dV_{Z} \end{aligned}$$

since \(d_{V}=d_{V_{1}}\). Hence

$$\begin{aligned} d^{*}_{V}F_{V}=d_{V_{1}}^{*}(F_{V_{1}}+dV_{Z})=d^{*}_{V_{1}}F_{V_{1}}+d^{*}_{V_{1}}dV_{Z}. \end{aligned}$$

Again using commutativity, \(d_{V_{1}}^*dV_{Z}=d^*dV_{Z}\) since \(dV_{Z}\) is also in the centre. Hence

$$\begin{aligned} d_{V}^*F_{V}=d^*dV_{Z}+d^{*}_{V_{1}}F_{V_{1}}\in {\mathfrak {z}}\oplus {\mathfrak {g}}_{1}. \end{aligned}$$

This implies that \(d_{V}^*F_{V}=0\) in \({\mathbb {D}}\setminus \mho \) iff \(d^*dV_{Z}=d^{*}_{V_{1}}F_{V_{1}}=0\) in \({\mathbb {D}}\setminus \mho \) and the lemma follows. \(\quad \square \)
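The quick calculation in the proof can be sketched as follows, assuming the curvature convention \(F_{V}=dV+\frac{1}{2}[V,V]\) (which for matrix-valued 1-forms equals \(dV + V\wedge V\)). Since \([\omega ,\eta ]=[\eta ,\omega ]\) for 1-forms and \([V_{Z},\cdot ]=0\),

$$\begin{aligned} d_{V}\omega&= d\omega +[V_{Z}+V_{1},\omega ] = d\omega +[V_{1},\omega ] = d_{V_{1}}\omega , \\ F_{V}&= dV_{Z}+dV_{1}+\frac{1}{2}[V_{Z},V_{Z}]+[V_{Z},V_{1}]+\frac{1}{2}[V_{1},V_{1}] = dV_{Z}+F_{V_{1}}. \end{aligned}$$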

We can deal with the abelian component \(A_{Z}\) directly by unique continuation.

Lemma 8

If \({\mathcal {D}}_{A_{Z}}={\mathcal {D}}_{B_{Z}}\), then there is \(u\in C^{\infty }({\mathbb {D}};{\mathbb {T}}^{r})\) with \(u(p)=\text { id}\) such that

$$\begin{aligned} B_{Z}=A_{Z}+u^{-1}du. \end{aligned}$$


Proof

It suffices to prove the claim for \(r=1\), i.e. in the case of the circle \(S^{1}\). To avoid cluttering the notation we drop the subscript \(``Z''\) during the proof. If the group is abelian, the Yang–Mills equations reduce to the Maxwell equation \(d^*F_{A}=0\), where \(F_{A}=dA\). Since \(dF_{A}=0\), the curvature satisfies \(\Box F_{A}=0\), where \(\Box =d^*d+dd^*\). The gauges \(u\in C^{\infty }({\mathbb {D}};S^{1})\) all have the form \(u=e^{i\phi }\) for \(\phi \) a real-valued function since \({\mathbb {D}}\) is simply connected.

Since \(A\in {\mathcal {D}}_{A}={\mathcal {D}}_{B}\), there is V with \(d^*F_{V}=0\) in \({\mathbb {D}}\setminus \mho \), \(V\sim B\) near \(\partial ^- {\mathbb {D}}\) and \(A|_{\mho }=V|_{\mho }\). Thus \(d^*F_{V}=0\) in \({\mathbb {D}}\). It follows that \(\Box (F_{A}-F_{V})=0\) in \({\mathbb {D}}\) and \(F_{A}=F_{V}\) in \(\mho \) and by Holmgren’s unique continuation principle, \(F_{A}=F_{V}\) in \({\mathbb {D}}\), i.e. \(d(A-V)=0\). Since \({\mathbb {D}}\) is simply connected, A and V are gauge equivalent in \({\mathbb {D}}\). But since \(V\sim B\) near \(\partial ^- {\mathbb {D}}\), it follows that A and B are gauge equivalent near \(\partial ^- {\mathbb {D}}\). Proposition 10 implies now that A and B are gauge equivalent in the whole \({\mathbb {D}}\). \(\quad \square \)
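Two steps of the proof can be made explicit. The sketch below identifies the Lie algebra of \(S^{1}\) with \(i{\mathbb {R}}\), which is an assumption on notation rather than something stated above. First, wherever \(d^*F_{A}=0\),

$$\begin{aligned} \Box F_{A} = d^*dF_{A}+dd^*F_{A} = d^*(ddA)+d(d^*F_{A}) = 0. \end{aligned}$$

Second, \(d(V-A)=0\) on the simply connected set \({\mathbb {D}}\) gives a real-valued \(\phi \) with \(\phi (p)=0\) and \(V-A=i\,d\phi \). Setting \(u=e^{i\phi }\),

$$\begin{aligned} u^{-1}du = e^{-i\phi }\, i e^{i\phi }\, d\phi = i\,d\phi , \qquad V = A+u^{-1}du, \qquad u(p)=1, \end{aligned}$$

which is a gauge equivalence of the form appearing in the statement of the lemma.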

We are now ready to prove our main result.

Proof of Theorem 1

We consider the finite cover \({\mathbb {T}}^r\times G_{1}\) of G as above. By Lemma 7 we know that \({\mathcal {D}}_{A_{Z}}={\mathcal {D}}_{B_{Z}}\) and \({\mathcal {D}}_{A_{1}}={\mathcal {D}}_{B_{1}}\). Let u be the gauge from Lemma 8. We have already proven Theorem 1 in the case that \(G = G_{1}\), since it has finite centre. Thus there is \({\mathbf {U}}\in G^{0}_{1}({\mathbb {D}},p)\) so that \(A_{1}\) and \(B_{1}\) are gauge equivalent via \({\mathbf {U}}\). Finally, \(p\circ (u,{\mathbf {U}})\in G^{0}({\mathbb {D}},p)\) gives a gauge equivalence between A and B as desired. \(\quad \square \)