1 Introduction

Classical integrable field theories in two dimensions are characterised by the existence of a Lax connection which is on-shell flat and depends meromorphically on an auxiliary complex curve, typically the Riemann sphere. Determining whether a given field theory is integrable or not is usually a very difficult problem since there is no systematic way of constructing a suitable Lax connection, if one exists.

Over the past couple of years, however, two closely related general frameworks have emerged for explaining the algebraic and geometric origins of Lax connections in \(2\hbox {d}\) integrable field theories. From a Hamiltonian perspective, an origin was proposed in [30], and further developed in [16], based on Gaudin models associated with untwisted affine Kac–Moody algebras and the representation theory of such algebras. From a Lagrangian perspective, an origin was proposed by Costello and Yamazaki [14], following earlier work on integrable spin chains in [10,11,12,13, 32], based on the introduction of surface defects in 4d Chern–Simons theory. A much older connection between Lagrangians for (hierarchies of) integrable field theories in 2d and field theories of Wess–Zumino–Witten type was pioneered in [25].

In the Hamiltonian formulation of integrable field theories, there is an important dichotomy between so-called ‘ultralocal’ and ‘non-ultralocal’ theories. This distinction is based on the classical r-matrix formalism [28, 29], specifically on whether or not the r-matrix of the given integrable field theory is skew-symmetric [23, 24, 26].

The affine Gaudin model perspective on integrable field theories was specifically developed in [30] to address the problem of quantisation of non-ultralocal theories. Note that a related approach was used in [35] to treat ultralocal field theories as Gaudin models associated with affine Kac–Moody algebras. On the other hand, it was demonstrated on examples in [14] that both ultralocal and non-ultralocal field theories can be described from the perspective of \(4\hbox {d}\) Chern–Simons theory. In the non-ultralocal case, further examples were shown in [17] to fit within this framework, and more recently a very general family of new non-ultralocal integrable field theories was also constructed using this approach in [21] following [3]. By performing a Hamiltonian analysis of \(4\hbox {d}\) Chern–Simons theory, it was shown in [31] that in the case of non-ultralocal field theories this framework is, in fact, intimately related to that of affine Gaudin models. By contrast, a Hamiltonian analysis of the class of ultralocal theories from the perspective of \(4\hbox {d}\) Chern–Simons theory has so far not been considered. The main purpose of this paper is to initiate such a study.

In parallel, very recently, an independent line of research emerged in [7] where the classical r-matrix structure was derived in the context of covariant Hamiltonian field theory. In that setting, a covariant Poisson bracket replaces the standard Poisson bracket and the r-matrix determines the Poisson algebra satisfied by the whole Lax connection (a 1-form) and not just by its spatial component, called the Lax matrix.

Such results have been established successfully for ultralocal theories with rational r-matrix (nonlinear Schrödinger and modified Korteweg–de Vries) and trigonometric r-matrix (sine-Gordon). However, the generalisation to non-ultralocal theories has resisted all attempts so far. In particular, the famous example of the principal chiral model, which is intrinsically non-ultralocal, does not seem to be easily amenable to this covariant formalism. Nevertheless, a certain reduction of the principal chiral model dynamics can be reproduced by an ultralocal integrable field theory [19], for which an action was obtained in [2]. This model can be seen as a special case of a large class of models with Lax pairs of Zakharov–Shabat type which derive from an action first introduced by Zakharov and Mikhailov [33]. Our observation is that this general class of models admits an ultralocal r-matrix structure of rational type and is therefore suited for a covariant Hamiltonian treatment.

The main goal of the present work is to begin exploring the covariant Hamiltonian structure of certain ultralocal integrable field theories which can be obtained from the \(4\hbox {d}\) Chern–Simons perspective, using the Zakharov–Mikhailov class of models as our guiding example. The covariant approach to integrable field theories initiated in [7] is in contrast with the long tradition of analysing the standard Hamiltonian formulation of integrable field theories and may offer new insights when it comes to their (covariant) quantisation. An interesting by-product of our approach is the interpretation of the flatness condition of the Lax connection as a covariant Hamilton equation associated with a covariant Hamiltonian which we derive from the Zakharov–Mikhailov action.

In Sect. 2, we show that the Zakharov–Mikhailov action of [33] can be derived from \(4\hbox {d}\) Chern–Simons theory. Since, in our case, the meromorphic 1-form \(\omega \) appearing in the \(4\hbox {d}\) Chern–Simons action is \(\omega = \mathrm{d}z\), it has a double pole at infinity so we follow a similar approach to [3] by first regularising the action of \(4\hbox {d}\) Chern–Simons theory. We then couple minimally the \(4\hbox {d}\) gauge field A to a collection of Lie group valued fields \(\{ \phi _m \}_{m=1}^{N_1}\) and \(\{ \psi _n \}_{n=1}^{N_2}\) localised along surface defects.

In Sect. 3, we derive the covariant Poisson algebra satisfied by the Lax connection of the Zakharov–Mikhailov class of models. We also present the covariant Hamiltonian of the theory and derive a remarkable formula connecting it to the Lax connection. This formula represents the covariant analogue of the well-known formula relating a traditional Hamiltonian H with the Lax matrix L which we write schematically as “\(H= {{\,\mathrm{Tr}\,}}L^2\)”. We show that the flatness condition of the Lax connection takes the form of a covariant Hamilton equation, thus giving it a new interpretation in this context. The results of this section rely heavily on the variational bicomplex formalism as presented in [18, Chap. 19] and on ideas developed for instance in [20]. For a detailed account geared specifically towards the implementation of these ideas in 2d integrable field theories, we refer the reader to [7].

2 Zakharov–Mikhailov action from 4d Chern–Simons

Using the same notation as in [33], we let \(N_1, N_2 \in {\mathbb {Z}}_{\ge 1}\) and fix subsets \(\{ a_m \}_{m=1}^{N_1}\) and \(\{ b_n \}_{n=1}^{N_2}\) of \({\mathbb {C}}P^1\), which we take to be disjoint as in [33], namely \(a_m \ne b_n\) for all \(m = 1, \ldots , N_1\) and \(n = 1, \ldots , N_2\). We parametrise the plane \(\Sigma :={\mathbb {R}}^2\) with “light-cone” coordinates \(\eta \) and \(\xi \).

We shall work with the general linear group \(GL_N\) with Lie algebra \(\mathfrak {gl}_N\) of \(N \times N\) matrices, following [33], but we expect our results to hold more generally for any semisimple Lie algebra. We denote the trace by \({{\,\mathrm{Tr}\,}}{:}\,\mathfrak {gl}_N \rightarrow {\mathbb {C}}\).

Let \(X :=\Sigma \times {\mathbb {C}}P^1\). We shall use the notation

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\bigg ( \sum _A \mathsf {u}_A \mathrm{d}x^A \wedge \sum _B \mathsf {v}_B \mathrm{d}x^B \bigg ) :=\sum _{A,B} {{\,\mathrm{Tr}\,}}(\mathsf {u}_A \mathsf {v}_B) \mathrm{d}x^A \wedge \mathrm{d}x^B \end{aligned}$$

for \(\mathfrak {gl}_N\)-valued p- and q-forms on X, where \(p, q = 0, \ldots , 4\), \(\mathsf {u}_A, \mathsf {v}_B \in \mathfrak {gl}_N\) and A, B are multi-indices with \(|A| = p\) and \(|B| = q\) so that \(\{ \mathrm{d}x^A \}\) and \(\{ \mathrm{d}x^B \}\) denote bases for the spaces \(\varOmega ^p(X)\) and \(\varOmega ^q(X)\), respectively.

2.1 Regularised 4d Chern–Simons action

Since the 2d integrable field theory we want to describe is ultralocal, we consider the meromorphic 1-form \(\omega = \mathrm{d}z\). The Lagrangian of the corresponding 4d Chern–Simons theory is given by

$$\begin{aligned} L_{\mathrm{CS}} :=\frac{\mathrm{i}}{4 \pi } \mathrm{d}z \wedge \mathrm {CS}(A), \end{aligned}$$
(2.1)

where \(\mathrm {CS}(A) :={{\,\mathrm{Tr}\,}}(A \wedge \mathrm{d}A + \tfrac{2}{3} A \wedge A \wedge A )\) denotes the Chern–Simons 3-form and A is a \(\mathfrak {gl}_N\)-valued 1-form on X which we can decompose as

$$\begin{aligned} A = A_\xi \mathrm{d}\xi + A_\eta \mathrm{d}\eta + A_{{\bar{z}}} \mathrm{d}{\bar{z}}. \end{aligned}$$
(2.2)

Note that there is no need to include a \(\mathrm{d}z\)-component since this would drop out from the Lagrangian (2.1). The components \(A_\xi \), \(A_\eta \) and \(A_{{\bar{z}}}\) are taken to be smooth functions away from the set of marked points \(\{ a_m \}_{m=1}^{N_1}\) and \(\{ b_n \}_{n=1}^{N_2}\), but it will be important for later to allow \(A_\xi \) and \(A_\eta \) to be singular at those points. Specifically, we will assume that these components can be written locally as \(A_\xi = (z - a_m)^{-1} B_{m, \xi }\) near \(a_m\) for \(m = 1, \ldots , N_1\) and as \(A_\eta = (z - b_n)^{-1} B_{n, \eta }\) near \(b_n\) for \(n = 1, \ldots , N_2\), where \(B_{m, \xi }\) and \(B_{n, \eta }\) are smooth functions on X. One easily checks that, despite the presence of these singularities, the Lagrangian (2.1) remains locally integrable near \(\Sigma \times \{ a_m \}\) for \(m = 1, \ldots , N_1\) and near \(\Sigma \times \{ b_n \}\) for \(n = 1, \ldots , N_2\).

However, since the 1-form \(\mathrm{d}z\) has a double pole at \(z = \infty \), the 4-form \(\mathrm{d}z \wedge \mathrm {CS}(A)\) is not locally integrable near \(\Sigma \times \{ \infty \}\). For this reason, we need to regularise the action which we do following [3]. First, note that

$$\begin{aligned} \mathrm{d}\mathrm {CS}(A)&= {{\,\mathrm{Tr}\,}}(\mathrm{d} A \wedge \mathrm{d}A + \tfrac{2}{3} \mathrm{d}A \wedge A \wedge A - \tfrac{2}{3} A \wedge \mathrm{d}A \wedge A + \tfrac{2}{3} A \wedge A \wedge \mathrm{d}A )\\&= {{\,\mathrm{Tr}\,}}(F(A) \wedge F(A) ) \end{aligned}$$

where \(F(A) :=\mathrm{d}A + A \wedge A \in \varOmega ^2(\Sigma \times {\mathbb {C}}P^1, \mathfrak {gl}_N)\) is the curvature of A. Here, we have used the fact that \({{\,\mathrm{Tr}\,}}(A \wedge A \wedge A \wedge A) = 0\) for any 1-form \(A \in \varOmega ^1(X, \mathfrak {gl}_N)\) by the cyclicity of the trace.
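As a brief sketch of the computation above, the only input is the graded cyclicity of the trace, \(\mathrm{Tr}(\alpha \wedge \beta ) = (-1)^{pq}\, \mathrm{Tr}(\beta \wedge \alpha )\) for a matrix-valued p-form \(\alpha \) and q-form \(\beta \), which follows from the notation introduced at the start of this section. Moving a single factor of A around the trace then gives

$$\begin{aligned} \mathrm{Tr}(A \wedge A \wedge A \wedge A)&= (-1)^{3}\, \mathrm{Tr}(A \wedge A \wedge A \wedge A) = 0,\\ \mathrm{Tr}(A \wedge A \wedge \mathrm{d}A)&= \mathrm{Tr}(\mathrm{d}A \wedge A \wedge A), \qquad \mathrm{Tr}(A \wedge \mathrm{d}A \wedge A) = - \mathrm{Tr}(\mathrm{d}A \wedge A \wedge A), \end{aligned}$$

so that the three cubic terms in \(\mathrm{d}\mathrm {CS}(A)\) add up to \(2\, \mathrm{Tr}(\mathrm{d}A \wedge A \wedge A)\). This matches the expansion

$$\begin{aligned} \mathrm{Tr}(F(A) \wedge F(A)) = \mathrm{Tr}(\mathrm{d}A \wedge \mathrm{d}A) + 2\, \mathrm{Tr}(\mathrm{d}A \wedge A \wedge A) + \mathrm{Tr}(A \wedge A \wedge A \wedge A), \end{aligned}$$

in which the last term vanishes.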

We can now rewrite the Lagrangian (2.1) of 4d Chern–Simons theory as

$$\begin{aligned} L_{\mathrm{CS}} = \frac{\mathrm{i}}{4 \pi } d \big ( z \, \mathrm {CS}(A) \big ) - \frac{\mathrm{i}}{4 \pi } z \, {{\,\mathrm{Tr}\,}}(F(A) \wedge F(A) ), \end{aligned}$$

where the first term is exact but has a double pole at infinity while the second term only has a simple pole and is therefore locally integrable near \(\Sigma \times \{ \infty \}\). We therefore define the regularised action of \(4\hbox {d}\) Chern–Simons theory by dropping the exact term above and keeping only the second term, namely we set

$$\begin{aligned} S_{4d}(A) :=- \frac{\mathrm{i}}{4 \pi } \int _X z {{\,\mathrm{Tr}\,}}( F(A) \wedge F(A) ). \end{aligned}$$
(2.3)

Note that we can continue to assume that A has no \(\mathrm{d}z\)-component, namely it can be expressed as in (2.2), since (2.3) is invariant under local transformations

$$\begin{aligned} A \mapsto A + \chi \mathrm{d}z \end{aligned}$$
(2.4)

for any \(\chi \in C^\infty (X, \mathfrak {gl}_N)\). Indeed, under such a transformation the curvature F(A) transforms as \(F(A) \mapsto F(A) + (\mathrm{d}\chi + [A, \chi ]) \wedge \mathrm{d}z\) from which it follows that

$$\begin{aligned} z {{\,\mathrm{Tr}\,}}(F(A) \wedge F(A))&\longmapsto z {{\,\mathrm{Tr}\,}}(F(A) \wedge F(A)) + 2 z {{\,\mathrm{Tr}\,}}\big ( F(A) \wedge (\mathrm{d}\chi + [A, \chi ]) \big ) \wedge \mathrm{d}z\\&\quad = z {{\,\mathrm{Tr}\,}}(F(A) \wedge F(A)) + 2 \, d \big ( z \mathrm{d}z \wedge {{\,\mathrm{Tr}\,}}(F(A) \chi ) \big ) \end{aligned}$$

where in the second line we have used the fact that \(\mathrm{d}F(A) = F(A) \wedge A - A \wedge F(A)\).
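For the reader's convenience, the two steps above can be sketched as follows, using only the definitions already given. Since \(\mathrm{d}z \wedge \mathrm{d}z = 0\), the shift (2.4) changes the curvature by

$$\begin{aligned} F(A + \chi \mathrm{d}z) - F(A) = \mathrm{d}\chi \wedge \mathrm{d}z + A \wedge \chi \mathrm{d}z + \chi \mathrm{d}z \wedge A = (\mathrm{d}\chi + [A, \chi ]) \wedge \mathrm{d}z, \end{aligned}$$

and the term quadratic in \(\chi \) drops out of \(\mathrm{Tr}(F \wedge F)\) for the same reason. For the exactness of the remaining term, note that \(\mathrm{d}(z\, \mathrm{d}z) = 0\), so that

$$\begin{aligned} \mathrm{d}\big ( z\, \mathrm{d}z \wedge \mathrm{Tr}(F(A) \chi ) \big )&= - z\, \mathrm{d}z \wedge \mathrm{Tr}\big ( \mathrm{d}F(A)\, \chi + F(A) \wedge \mathrm{d}\chi \big )\\&= - z\, \mathrm{d}z \wedge \mathrm{Tr}\big ( F(A) \wedge (\mathrm{d}\chi + [A, \chi ]) \big ) = z\, \mathrm{Tr}\big ( F(A) \wedge (\mathrm{d}\chi + [A, \chi ]) \big ) \wedge \mathrm{d}z, \end{aligned}$$

where the last equality moves \(\mathrm{d}z\) past a 3-form.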

The action (2.3) is also invariant under gauge transformations

$$\begin{aligned} A \longmapsto ^g A :=- dg g^{-1} + g A g^{-1} \end{aligned}$$
(2.5)

for any \(g \in C^\infty (X, GL_N)\). Indeed, the transformation of the curvature F(A) under a gauge transformation (2.5) is given by conjugation \(F(^g A) = g F(A) g^{-1}\); hence, the result follows by the invariance of the trace.
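The conjugation property can be checked directly. As a sketch, write \(\theta :=\mathrm{d}g\, g^{-1}\) (a shorthand introduced only for this check), so that \(\mathrm{d}\theta = \theta \wedge \theta \) and \(^g A = - \theta + g A g^{-1}\). Then

$$\begin{aligned} \mathrm{d}(^g A)&= - \theta \wedge \theta + \theta \wedge g A g^{-1} + g\, \mathrm{d}A\, g^{-1} + g A g^{-1} \wedge \theta ,\\ {}^g A \wedge {}^g A&= \theta \wedge \theta - \theta \wedge g A g^{-1} - g A g^{-1} \wedge \theta + g\, A \wedge A\, g^{-1}, \end{aligned}$$

and adding the two lines gives \(F(^g A) = g\, (\mathrm{d}A + A \wedge A)\, g^{-1} = g F(A) g^{-1}\).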

2.2 Adding surface defects

We would like to modify the action (2.3) by adding to it terms which couple the \(4\hbox {d}\) bulk field A to a collection of \(2\hbox {d}\) fields localised on the surface defects \(\Sigma \times \{ a_m \}\) and \(\Sigma \times \{ b_n \}\) for \(m=1, \ldots , N_1\) and \(n = 1, \ldots , N_2\). We shall make use of the embedding \(\iota _x{:}\,\Sigma \times \{ x \} \hookrightarrow X\) for \(x \in \{ a_m \}_{m=1}^{N_1} \cup \{ b_n \}_{n=1}^{N_2}\).

Specifically, to each marked point \(a_m\) for \(m = 1, \ldots , N_1\) we associate a Lie group valued field \(\phi _m \in C^\infty (\Sigma , GL_N)\) which we think of as living on the surface defect \(\Sigma \times \{ a_m \}\). Likewise, to each of the marked points \(b_n\) for \(n = 1, \ldots , N_2\) we associate a Lie group valued field \(\psi _n \in C^\infty (\Sigma , GL_N)\), living on the surface defect \(\Sigma \times \{ b_n \}\). Let us also fix constant non-dynamical elements \(U^{(0)}_m, V^{(0)}_n\) in \(\mathfrak {gl}_N\) for \(m = 1, \ldots , N_1\) and \(n = 1, \ldots , N_2\). Note that the \(2\hbox {d}\) fields \(\phi _m\) and \(\psi _n\) are effectively valued in a quotient of \(GL_N\) by the stabilisers of \(U^{(0)}_m\) and \(V^{(0)}_n\), respectively.

Following the discussion of order defects in [14], we now couple the \(4\hbox {d}\) gauge field A to the collection of \(2\hbox {d}\) fields \(\phi _m\) and \(\psi _n\) on the surface defects by replacing the regularised \(4\hbox {d}\) Chern–Simons action (2.3) with

$$\begin{aligned} S\big ( A, \{ \phi _m \}_{m=1}^{N_1}, \{ \psi _n \}_{n=1}^{N_2} \big ) :=S_{\mathrm{4d}}(A) + S_{\mathrm{defect}}\big ( A, \{ \phi _m \}_{m=1}^{N_1}, \{ \psi _n \}_{n=1}^{N_2} \big ) \end{aligned}$$
(2.6)

where we define

$$\begin{aligned} S_{\mathrm{defect}}\big ( A, \{ \phi _m \}_{m=1}^{N_1}, \{ \psi _n \}_{n=1}^{N_2} \big )&:=- \sum _{m=1}^{N_1} \int _{\Sigma \times \{ a_m \}} {{\,\mathrm{Tr}\,}}\big ( \phi _m^{-1} (d_\Sigma + \iota _{a_m}^*A) \phi _m U^{(0)}_m \big ) \wedge \mathrm{d}\xi \nonumber \\&\quad -\,\sum _{n=1}^{N_2} \int _{\Sigma \times \{ b_n \}} {{\,\mathrm{Tr}\,}}\big ( \psi _n^{-1} (d_\Sigma + \iota _{b_n}^*A) \psi _n V^{(0)}_n \big ) \wedge \mathrm{d}\eta . \end{aligned}$$
(2.7)

Here, \(d_\Sigma \) denotes the de Rham differential on \(\Sigma \).

To maintain the gauge invariance of the action under (2.5) after introducing the surface defects, we need to let the \(2\hbox {d}\) fields transform as

$$\begin{aligned} \phi _m \longmapsto g \phi _m, \quad \psi _n \longmapsto g \psi _n. \end{aligned}$$
(2.8)

It is straightforward to check that the extended action (2.6) is then gauge invariant since the expressions \(\phi _m^{-1} (d_\Sigma + \iota _{a_m}^*A) \phi _m\) and \(\psi _n^{-1} (d_\Sigma + \iota _{b_n}^*A) \psi _n\) are themselves gauge invariant.
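Explicitly, under (2.5) and (2.8), and with g understood as restricted to the relevant defect, one finds

$$\begin{aligned} (g \phi _m)^{-1} \big ( d_\Sigma - d_\Sigma g\, g^{-1} + g\, \iota _{a_m}^*A\, g^{-1} \big ) (g \phi _m)&= \phi _m^{-1} g^{-1} \big ( d_\Sigma g\, \phi _m + g\, d_\Sigma \phi _m - d_\Sigma g\, \phi _m + g\, \iota _{a_m}^*A\, \phi _m \big )\\&= \phi _m^{-1} (d_\Sigma + \iota _{a_m}^*A) \phi _m, \end{aligned}$$

and the same computation applies to \(\psi _n\) with \(a_m\) replaced by \(b_n\).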

2.3 Bulk equations of motion

Consider the variation \(A \mapsto A + \epsilon a\) of the action (2.6), for some arbitrary \(a = a_\eta \mathrm{d}\eta + a_\xi \mathrm{d}\xi + a_{{\bar{z}}} \mathrm{d}{\bar{z}} \in \varOmega ^1_c(X, \mathfrak {gl}_N)\) of compact support. This reads

$$\begin{aligned}&\delta _a S\big ( A, \{ \phi _m \}_{m=1}^{N_1}, \{ \psi _n \}_{n=1}^{N_2} \big ) :=\frac{\mathrm{d}}{\mathrm{d}\epsilon }\bigg |_{\epsilon = 0} S\big ( A + \epsilon a, \{ \phi _m \}_{m=1}^{N_1}, \{ \psi _n \}_{n=1}^{N_2} \big ) \\&\quad = \frac{\mathrm{i}}{2 \pi } \int _X \mathrm{d}z \wedge {{\,\mathrm{Tr}\,}}(a \wedge F(A) ) - \sum _{m=1}^{N_1} \int _{\Sigma \times \{ a_m \}} {{\,\mathrm{Tr}\,}}(a_\eta U_m) \mathrm{d}\eta \wedge \mathrm{d}\xi \\&\qquad \,\, + \sum _{n=1}^{N_2} \int _{\Sigma \times \{ b_n \}} {{\,\mathrm{Tr}\,}}(a_\xi V_n) \mathrm{d}\eta \wedge \mathrm{d}\xi , \end{aligned}$$

where we introduced \(U_m :=\phi _m U^{(0)}_m \phi _m^{-1}\) for all \(m = 1, \ldots , N_1\) and \(V_n :=\psi _n V^{(0)}_n \psi _n^{-1}\) for all \(n = 1, \ldots , N_2\). As we will see below, the 2d action obtained from our 4d action with defects effectively gives equations of motion for \(U_m\) and \(V_n\) which are valued in the Lie algebra \(\mathfrak {gl}_N\). Without any particular model in mind, it is a matter of taste at this stage whether one wants to interpret the fields of the theory as being those Lie algebra elements or the group elements \(\phi _m\) and \(\psi _n\). In the former interpretation, the phase space is thus the (co)adjoint orbit through \(U^{(0)}_m\) and \(V^{(0)}_n\).

In the first term on the right hand side above, we have dropped a boundary term which vanishes since \(a \in \varOmega ^1_c(X, \mathfrak {gl}_N)\) is of compact support. The curvature F(A) is given in components by [recall that (2.4) ensures that we can take A with no \(\mathrm{d}z\)-component]

$$\begin{aligned} F(A)&= \big ( \partial _\eta A_\xi - \partial _\xi A_\eta + [A_\eta , A_\xi ] \big ) \mathrm{d}\eta \wedge \mathrm{d}\xi \\&\quad +\,\big ( \partial _{{\bar{z}}} A_\xi - \partial _\xi A_{{\bar{z}}} + [A_{{\bar{z}}}, A_\xi ] \big ) \mathrm{d}{\bar{z}} \wedge \mathrm{d}\xi \\&\quad +\, \big ( \partial _{{\bar{z}}} A_\eta - \partial _\eta A_{{\bar{z}}} + [A_{{\bar{z}}}, A_\eta ] \big ) \mathrm{d}{\bar{z}} \wedge \mathrm{d}\eta +\mathrm{d}z\wedge \partial _z A. \end{aligned}$$

Note that the last term does not contribute to the equation of motion. The \(A_{{\bar{z}}}\) equation of motion is then given by

$$\begin{aligned} \partial _\eta A_\xi - \partial _\xi A_\eta + [A_\eta , A_\xi ] = 0, \end{aligned}$$
(2.9)

which will become the zero curvature equation for the Lax connection. On the other hand, the \(A_\eta \) and \(A_\xi \) equations of motion, respectively, read

$$\begin{aligned} \partial _{{\bar{z}}} A_\xi - \partial _\xi A_{{\bar{z}}} + [A_{{\bar{z}}}, A_\xi ]&= 2 \pi \mathrm{i}\sum _{m=1}^{N_1} U_m \delta (z - a_m), \end{aligned}$$
(2.10a)
$$\begin{aligned} \partial _{{\bar{z}}} A_\eta - \partial _\eta A_{{\bar{z}}} + [A_{{\bar{z}}}, A_\eta ]&= 2 \pi \mathrm{i}\sum _{n=1}^{N_2} V_n \delta (z - b_n) \end{aligned}$$
(2.10b)

where the \(\delta \)-functions, satisfying the property

$$\begin{aligned} \int _{{\mathbb {C}}P^1} f(\xi , \eta , z) \delta (z- x) \mathrm{d}z \wedge \mathrm{d}{\bar{z}} = f(\xi , \eta , x) \end{aligned}$$
(2.11)

for any \(x \in {\mathbb {C}}\) and any smooth function f on X, come from the fact that the surface defect terms are localised at \(z = a_m\) or \(z = b_n\).
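As a sketch of how (2.9) and (2.10) follow from the variation above, one can collect the coefficient of each component of a. We assume here that X is oriented by \(\mathrm{d}z \wedge \mathrm{d}{\bar{z}} \wedge \mathrm{d}\eta \wedge \mathrm{d}\xi \), which is the convention suggested by (2.11), and we write \(F_{\eta \xi }\), \(F_{{\bar{z}} \xi }\) and \(F_{{\bar{z}} \eta }\) for the coefficients of \(\mathrm{d}\eta \wedge \mathrm{d}\xi \), \(\mathrm{d}{\bar{z}} \wedge \mathrm{d}\xi \) and \(\mathrm{d}{\bar{z}} \wedge \mathrm{d}\eta \) in F(A); this shorthand is ours. Reordering the differentials gives

$$\begin{aligned} \mathrm{d}z \wedge \mathrm{Tr}(a \wedge F(A)) = \mathrm{Tr}\big ( a_{{\bar{z}}} F_{\eta \xi } - a_\eta F_{{\bar{z}} \xi } + a_\xi F_{{\bar{z}} \eta } \big )\, \mathrm{d}z \wedge \mathrm{d}{\bar{z}} \wedge \mathrm{d}\eta \wedge \mathrm{d}\xi . \end{aligned}$$

Writing the defect integrals as integrals over X by means of (2.11), the variation becomes

$$\begin{aligned} \delta _a S = \int _X \mathrm{Tr}\bigg ( \frac{\mathrm{i}}{2 \pi } a_{{\bar{z}}} F_{\eta \xi } - a_\eta \Big ( \frac{\mathrm{i}}{2 \pi } F_{{\bar{z}} \xi } + \sum _{m=1}^{N_1} U_m \delta (z - a_m) \Big ) + a_\xi \Big ( \frac{\mathrm{i}}{2 \pi } F_{{\bar{z}} \eta } + \sum _{n=1}^{N_2} V_n \delta (z - b_n) \Big ) \bigg )\, \mathrm{d}z \wedge \mathrm{d}{\bar{z}} \wedge \mathrm{d}\eta \wedge \mathrm{d}\xi . \end{aligned}$$

Requiring this to vanish for all a, and using \(- 2 \pi / \mathrm{i}= 2 \pi \mathrm{i}\), reproduces (2.9), (2.10a) and (2.10b).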

2.4 Lax connection

Given the resemblance of (2.9) with the zero curvature equation satisfied by the Lax connection, we would like to turn A into the Lax connection itself. There are two obvious issues with this.

The first issue is that A has an additional \(\mathrm{d}{\bar{z}}\)-component compared to the Lax connection \({\mathcal {L}}= {\mathcal {L}}_\eta \mathrm{d}\eta + {\mathcal {L}}_\xi \mathrm{d}\xi \). We can eliminate this problem by focusing on field configurations with no \(\mathrm{d}{\bar{z}}\)-component. This will break some of the gauge invariance since we must now impose that (2.5) does not re-create any \(\mathrm{d}{\bar{z}}\)-component in the gauge field. In other words, we impose that

$$\begin{aligned} A_{{\bar{z}}} = 0, \quad {{\bar{\partial }}} g g^{-1} = 0. \end{aligned}$$
(2.12)

An obvious way to ensure the latter condition is to take \(g \in C^\infty (\Sigma , GL_N)\), i.e. g no longer depends on \({\mathbb {C}}P^1\). These residual gauge transformations will correspond to gauge transformations in the 2d theory.

The second issue is that \(A = A_\eta \mathrm{d}\eta + A_\xi \mathrm{d}\xi \) depends smoothly on \({\mathbb {C}}P^1\), with singularities at the marked points \(a_m\) and \(b_n\) of the form described in Sect. 2.1, whereas a Lax connection is meromorphic on \({\mathbb {C}}P^1\). This issue is resolved by restricting to gauge fields which satisfy the equations of motion (2.10). Having fixed \(A_{{\bar{z}}} = 0\), these now reduce to

$$\begin{aligned} \partial _{{\bar{z}}} A_\xi = 2 \pi \mathrm{i}\sum _{m=1}^{N_1} U_m \delta (z - a_m), \quad \partial _{{\bar{z}}} A_\eta = 2 \pi \mathrm{i}\sum _{n=1}^{N_2} V_n \delta (z - b_n). \end{aligned}$$

Using the identity \(\partial _{{\bar{z}}} z^{-1} = - 2 \pi \mathrm{i}\delta (z)\) we deduce that a solution of the above is

$$\begin{aligned} A_\xi = {\mathcal {L}}_\xi :=- U_0 - \sum _{m=1}^{N_1} \frac{U_m}{z - a_m}, \quad A_\eta = {\mathcal {L}}_\eta :=- V_0 - \sum _{n=1}^{N_2} \frac{V_n}{z - b_n}. \end{aligned}$$
(2.13)

These expressions coincide with those for the \(\mathrm{d}\xi \) and \(\mathrm{d}\eta \)-components U and V of the Lax connection from [33, (2) & (6)].
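As a quick check of (2.13), recall that \(U_0\) and \(U_m\) depend only on \(\xi \) and \(\eta \), so that the shifted identity \(\partial _{{\bar{z}}} (z - a_m)^{-1} = - 2 \pi \mathrm{i}\, \delta (z - a_m)\) gives

$$\begin{aligned} \partial _{{\bar{z}}} {\mathcal {L}}_\xi = - \sum _{m=1}^{N_1} U_m\, \partial _{{\bar{z}}} \frac{1}{z - a_m} = 2 \pi \mathrm{i}\sum _{m=1}^{N_1} U_m \delta (z - a_m), \end{aligned}$$

and likewise \(\partial _{{\bar{z}}} {\mathcal {L}}_\eta = 2 \pi \mathrm{i}\sum _{n=1}^{N_2} V_n \delta (z - b_n)\). The terms \(U_0\) and \(V_0\) are annihilated by \(\partial _{{\bar{z}}}\), so they are not fixed by these equations.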

Note that if we have \(U_0 = \partial _\xi h h^{-1} \) and \(V_0 = \partial _\eta h h^{-1}\) for some \(h \in C^\infty (\Sigma , GL_N)\), cf. [33, (5)], then we can set them both to zero in (2.13) using a gauge transformation with \(g = h^{-1}\). This would have the effect of fixing the residual gauge symmetry down to the global transformations and the Lax connection (2.13) would then have no constant term, i.e. it would take the form

$$\begin{aligned} A_\xi = - \sum _{m=1}^{N_1} \frac{U_m}{z - a_m}, \quad A_\eta = - \sum _{n=1}^{N_2} \frac{V_n}{z - b_n}. \end{aligned}$$

We will, however, keep the residual gauge symmetry for the remainder of this section, which will become the gauge symmetry in the 2d action.
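To see how the constant terms are removed, here is a short sketch for the \(\mathrm{d}\xi \)-component, assuming \(U_0 = \partial _\xi h h^{-1}\). Applying (2.5) with \(g = h^{-1}\) to \(A_\xi \) in (2.13) gives

$$\begin{aligned} - \partial _\xi (h^{-1})\, h + h^{-1} A_\xi h = h^{-1} \partial _\xi h - h^{-1} U_0 h - \sum _{m=1}^{N_1} \frac{h^{-1} U_m h}{z - a_m} = - \sum _{m=1}^{N_1} \frac{h^{-1} U_m h}{z - a_m}, \end{aligned}$$

where the first two terms cancel because \(h^{-1} U_0 h = h^{-1} \partial _\xi h\). The conjugated residues \(h^{-1} U_m h\) correspond to the transformed fields \(h^{-1} \phi _m\) of (2.8), and are again denoted \(U_m\) in the gauge-fixed expression. The \(\mathrm{d}\eta \)-component is treated in the same way.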

2.5 Defect equations of motion

We may also consider the variation of the action (2.6) with respect to the 2d defect fields \(\phi _m\) and \(\psi _n\). Consider the variation \(\phi _m \mapsto e^{\epsilon \alpha _m} \phi _m\) for arbitrary \(\alpha _m \in C^\infty (\Sigma , \mathfrak {gl}_N)\) with \(m = 1, \ldots , N_1\) and \(\psi _n \mapsto e^{\epsilon \beta _n} \psi _n\) for arbitrary \(\beta _n \in C^\infty (\Sigma , \mathfrak {gl}_N)\) with \(n = 1, \ldots , N_2\) in the action (2.6). This gives

$$\begin{aligned} \delta _{(\alpha _m, \beta _n)} S\big ( A, \{ \phi _m \}_{m=1}^{N_1}, \{ \psi _n \}_{n=1}^{N_2} \big )&:=\frac{\mathrm{d}}{\mathrm{d}\epsilon } \bigg |_{\epsilon = 0} S\big ( A, \{ e^{\epsilon \alpha _m} \phi _m \}_{m=1}^{N_1}, \{ e^{\epsilon \beta _n} \psi _n \}_{n=1}^{N_2} \big )\\&= \sum _{m=1}^{N_1} \int _{\Sigma \times \{ a_m \}} {{\,\mathrm{Tr}\,}}\big ( \alpha _m \big ( d_\Sigma U_m - [U_m, \iota _{a_m}^*A] \big ) \big ) \wedge \mathrm{d}\xi \\&\quad +\,\sum _{n=1}^{N_2} \int _{\Sigma \times \{ b_n \}} {{\,\mathrm{Tr}\,}}\big ( \beta _n \big ( d_\Sigma V_n - [V_n, \iota _{b_n}^*A] \big ) \big ) \wedge \mathrm{d}\eta . \end{aligned}$$

Taking into account the solution (2.13), this then leads to the equations of motion

$$\begin{aligned} \partial _\eta U_m = - \bigg [ U_m, V_0 + \sum _{n=1}^{N_2} \frac{V_n}{a_m - b_n} \bigg ], \quad \partial _\xi V_n = - \bigg [ V_n, U_0 + \sum _{m=1}^{N_1} \frac{U_m}{b_n - a_m} \bigg ] \end{aligned}$$
(2.14)

for \(m = 1, \ldots , N_1\) and \(n = 1, \ldots , N_2\). These coincide with [33, (4)] (noting that there is a sign mistake in [33, (4)]). Of course, Eqs. (2.14) are nothing but the residues of the zero curvature equation (2.9) at \(a_m\) and \(b_n\), respectively, taking into account the solution (2.13).
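The residue statement can be made explicit as follows; this is only a sketch based on (2.9) and (2.13). The commutator term in (2.9) contains the cross terms \([V_n, U_m] / \big ( (z - b_n)(z - a_m) \big )\), which we split using the partial fraction identity

$$\begin{aligned} \frac{1}{(z - a_m)(z - b_n)} = \frac{1}{a_m - b_n} \bigg ( \frac{1}{z - a_m} - \frac{1}{z - b_n} \bigg ). \end{aligned}$$

Taking the residues of \(\partial _\eta {\mathcal {L}}_\xi - \partial _\xi {\mathcal {L}}_\eta + [{\mathcal {L}}_\eta , {\mathcal {L}}_\xi ] = 0\) at \(z = a_m\) and \(z = b_n\) then gives

$$\begin{aligned} - \partial _\eta U_m + \bigg [ V_0 + \sum _{n=1}^{N_2} \frac{V_n}{a_m - b_n}, U_m \bigg ] = 0, \qquad \partial _\xi V_n + \bigg [ V_n, U_0 + \sum _{m=1}^{N_1} \frac{U_m}{b_n - a_m} \bigg ] = 0, \end{aligned}$$

which are the two equations in (2.14). The part of (2.9) that is regular in z gives the additional condition \(\partial _\xi V_0 - \partial _\eta U_0 + [V_0, U_0] = 0\) on the constant terms.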

2.6 The Zakharov–Mikhailov action

We now substitute the solution (2.13) for A, which we write as \({\mathcal {L}}= {\mathcal {L}}_\eta \mathrm{d}\eta + {\mathcal {L}}_\xi \mathrm{d}\xi \) since it corresponds to the Lax connection, into the action (2.6). The 4d Chern–Simons action term becomes

$$\begin{aligned} S_{4d}({\mathcal {L}}) = - \frac{\mathrm{i}}{4 \pi } \int _X z \, {{\,\mathrm{Tr}\,}}( F({\mathcal {L}}) \wedge F({\mathcal {L}}) ) = - \frac{\mathrm{i}}{2 \pi } \int _X z \, {{\,\mathrm{Tr}\,}}(\partial {\mathcal {L}}\wedge {{\bar{\partial }}} {\mathcal {L}}) \end{aligned}$$
(2.15)

where in the second equality we used the fact that \({\mathcal {L}}\) only has components along \(\mathrm{d}\eta \) and \(\mathrm{d}\xi \) which implies, in particular, that \({{\,\mathrm{Tr}\,}}(d {\mathcal {L}}\wedge {\mathcal {L}}\wedge {\mathcal {L}}) = 0\). Using the explicit form (2.13) of \({\mathcal {L}}\), we find

$$\begin{aligned} {{\,\mathrm{Tr}\,}}( \partial {\mathcal {L}}\wedge {{\bar{\partial }}} {\mathcal {L}})&= 2 \pi \mathrm{i}\sum _{m=1}^{N_1} \sum _{n=1}^{N_2} \frac{{{\,\mathrm{Tr}\,}}(U_m V_n) \big ( \delta (z - b_n) - \delta (z - a_m) \big )}{(a_m - b_n)^2} \mathrm{d}z \wedge \mathrm{d}{\bar{z}} \wedge \mathrm{d}\eta \wedge \mathrm{d}\xi . \end{aligned}$$

Substituting this into (2.15) and performing the integral over \(\mathrm{d}z \wedge \mathrm{d}{\bar{z}}\) by using the property (2.11) of the \(\delta \)-function, we find

$$\begin{aligned} S_{4d}({\mathcal {L}})&= - \sum _{m=1}^{N_1} \sum _{n=1}^{N_2} \int _\Sigma \frac{{{\,\mathrm{Tr}\,}}(U_m V_n)}{a_m - b_n} \mathrm{d}\eta \wedge \mathrm{d}\xi . \end{aligned}$$
(2.16)
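In more detail, the prefactors combine as \(- \tfrac{\mathrm{i}}{2 \pi } \cdot 2 \pi \mathrm{i}= 1\), and the \(\delta \)-functions evaluate the factor z at \(b_n\) and \(a_m\), respectively, so that

$$\begin{aligned} S_{4d}({\mathcal {L}}) = \sum _{m=1}^{N_1} \sum _{n=1}^{N_2} \int _\Sigma \frac{\mathrm{Tr}(U_m V_n)\, (b_n - a_m)}{(a_m - b_n)^2}\, \mathrm{d}\eta \wedge \mathrm{d}\xi = - \sum _{m=1}^{N_1} \sum _{n=1}^{N_2} \int _\Sigma \frac{\mathrm{Tr}(U_m V_n)}{a_m - b_n}\, \mathrm{d}\eta \wedge \mathrm{d}\xi . \end{aligned}$$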

On the other hand, substituting the solution (2.13) for A into the two surface defect contributions to the action, namely (2.7), we obtain

$$\begin{aligned}&S_{\mathrm{defect}}\big ( A, \{ \phi _m \}_{m=1}^{N_1}, \{ \psi _n \}_{n=1}^{N_2} \big ) = - \int _\Sigma {{\,\mathrm{Tr}\,}}\bigg ( \sum _{m=1}^{N_1} \phi _m^{-1} (\partial _\eta - V_0) \phi _m U^{(0)}_m \nonumber \\&\quad -\,\sum _{n=1}^{N_2} \psi _n^{-1} (\partial _\xi - U_0) \psi _n V^{(0)}_n - 2 \sum _{m=1}^{N_1} \sum _{n=1}^{N_2} \frac{U_m V_n}{a_m - b_n} \bigg ) \mathrm{d}\eta \wedge \mathrm{d}\xi . \end{aligned}$$
(2.17)
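The origin of the factor of 2 in the last term can be sketched as follows. On the defect at \(a_m\) only the \(\mathrm{d}\eta \)-component of A survives the wedge product with \(\mathrm{d}\xi \), and on the defect at \(b_n\) only the \(\mathrm{d}\xi \)-component survives the wedge product with \(\mathrm{d}\eta \). Evaluating (2.13) at these points gives

$$\begin{aligned} {\mathcal {L}}_\eta \big |_{z = a_m} = - V_0 - \sum _{n=1}^{N_2} \frac{V_n}{a_m - b_n}, \qquad {\mathcal {L}}_\xi \big |_{z = b_n} = - U_0 + \sum _{m=1}^{N_1} \frac{U_m}{a_m - b_n}. \end{aligned}$$

Using \(\mathrm{Tr}\big ( \phi _m^{-1} V_n \phi _m U^{(0)}_m \big ) = \mathrm{Tr}(U_m V_n)\) and \(\mathrm{d}\xi \wedge \mathrm{d}\eta = - \mathrm{d}\eta \wedge \mathrm{d}\xi \), each of the two sums in (2.7) then contributes the same interaction term,

$$\begin{aligned} + \sum _{m=1}^{N_1} \sum _{n=1}^{N_2} \int _\Sigma \frac{\mathrm{Tr}(U_m V_n)}{a_m - b_n}\, \mathrm{d}\eta \wedge \mathrm{d}\xi , \end{aligned}$$

to \(S_{\mathrm{defect}}\).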

Combining together (2.16) and (2.17), and recalling that \(U_m = \phi _m U^{(0)}_m \phi _m^{-1}\) and \(V_n = \psi _n V^{(0)}_n \psi _n^{-1}\), we arrive at the following \(2\hbox {d}\) action

$$\begin{aligned}&S_{\mathrm{2d}}\big ( \{ \phi _m \}_{m=1}^{N_1}, \{ \psi _n \}_{n=1}^{N_2} \big ) \nonumber \\&\quad = - \int _\Sigma {{\,\mathrm{Tr}\,}}\bigg ( \sum _{m=1}^{N_1} \phi _m^{-1} (\partial _\eta - V_0) \phi _m U^{(0)}_m - \sum _{n=1}^{N_2} \psi _n^{-1} (\partial _\xi - U_0) \psi _n V^{(0)}_n \nonumber \\&\qquad -\,\sum _{m=1}^{N_1} \sum _{n=1}^{N_2} \frac{\phi _m U^{(0)}_m \phi _m^{-1} \psi _n V^{(0)}_n \psi _n^{-1}}{a_m - b_n} \bigg ) \mathrm{d}\eta \wedge \mathrm{d}\xi . \end{aligned}$$
(2.18)

This coincides with the Zakharov–Mikhailov action [33, (10)] up to an overall sign.

2.7 Example

The simplest non-trivial example of the Zakharov–Mikhailov action is obtained by taking \(N_1 = N_2 = 1\). In this case we only have two fields \(\phi _1\) and \(\psi _1\) which we denote simply as \(\phi \) and \(\psi \). Moving to a gauge where \(V_0 = U_0 = 0\), as described in Sect. 2.4, and choosing \(U^{(0)} = - \Lambda \) and \(V^{(0)} = \Lambda \) for some fixed constant matrix \(\Lambda \), the action (2.18) takes the simple form

$$\begin{aligned} S_{\mathrm{2d}}(\phi , \psi ) = \int _\Sigma {{\,\mathrm{Tr}\,}}\bigg ( \phi ^{-1} \partial _\eta \phi \Lambda + \psi ^{-1} \partial _\xi \psi \Lambda - \frac{1}{2\nu } \phi \Lambda \phi ^{-1} \psi \Lambda \psi ^{-1} \bigg ) \mathrm{d}\eta \wedge \mathrm{d}\xi , \end{aligned}$$

where we have introduced the coupling \(2\nu :=a_1 - b_1\).

This action coincides with that of the so-called linear chiral model constructed in [2, (3.20)]. The latter can be seen as a generalisation to an arbitrary Lie algebra (here written only for \(\mathfrak {gl}_N\)) of the model proposed by Faddeev and Reshetikhin in [19] as an ultralocal reduction of the SU(2) principal chiral model. More precisely, the Faddeev–Reshetikhin model is defined by replacing the non-ultralocal Poisson bracket of the SU(2) principal chiral model by an ultralocal one. However, the latter is degenerate, and therefore, the Faddeev–Reshetikhin model can only reproduce a reduction of the original principal chiral model dynamics, in which the Casimirs of the ultralocal Poisson bracket have been set to constants. In the next section, we will derive the covariant Poisson algebra of the Lax connection (2.13) which in the present two-point case generalises the ultralocal algebra for the Lax matrix of the linear chiral model found in [2, (3.5)].

3 Covariant Poisson bracket and r-matrix for the \(2\hbox {d}\) theory

In this section, we will rely heavily on the calculus in the variational bicomplex as presented in [18]. Informally, we introduce two differentials: d is the “horizontal” differential, and acts as the usual exterior differential, while \(\delta \) is the “vertical” differential that acts only with respect to the fields. We consider (p, q)-differential forms that have a vertical degree p and a horizontal degree q. For instance, \({\mathcal {L}}= {\mathcal {L}}_\eta \mathrm{d}\eta + {\mathcal {L}}_\xi \mathrm{d}\xi \) is a (0, 1)-form, or a horizontal 1-form, and \(\varOmega ^{(1)}\) in (3.2) below is a (1, 1)-form. For details on how this is used in deriving the r-matrix structure of the covariant Poisson bracket of the Lax connection of a 2d integrable field theory, or more generally an integrable hierarchy, we refer the reader to [7, 9].

We proceed in four steps: to begin with, we derive the multisymplectic form of the theory by considering the variation of its Lagrangian volume form, as established in [18, (19.5.2)] and then used in [7]. We can then define the covariant Poisson bracket of certain horizontal forms, called Hamiltonian, using the multisymplectic form. We then show that the Lax form associated with the Zakharov–Mikhailov theory is Hamiltonian and compute its covariant Poisson bracket structure à la Sklyanin [28, 29], thus exhibiting its r-matrix structure. Finally, we construct the covariant Hamiltonian for the 2d theory, which is the covariant analogue of the usual Hamiltonian obtained by performing the Legendre transformation with respect to both independent variables, and we interpret the zero-curvature equations as covariant Hamilton equations.

3.1 The multisymplectic form

Our starting point is the Lagrangian volume form associated with (2.18), where from now on we shall drop the inessential overall minus sign compared to [33, (10)]. Moreover, throughout this section we shall work in the gauge where \(U_0 = V_0 = 0\), so we start from

$$\begin{aligned} L_{\mathrm{ZM}} :={{\,\mathrm{Tr}\,}}\bigg ( \sum _{m=1}^{N_1} \phi _m^{-1} \partial _\eta \phi _m U^{(0)}_m - \sum _{n=1}^{N_2} \psi _n^{-1} \partial _\xi \psi _n V^{(0)}_n - \sum _{m=1}^{N_1} \sum _{n=1}^{N_2} \frac{U_m V_n}{a_m - b_n} \bigg ) \mathrm{d}\eta \wedge \mathrm{d}\xi , \end{aligned}$$

where we recall the notations \(U_m = \phi _m U^{(0)}_m \phi _m^{-1}\) and \(V_n=\psi _nV^{(0)}_n \psi _n^{-1}\). In particular, we have \(\delta U_m = [\delta \phi _m \phi _m^{-1}, U_m]\) and \(\delta V_n = [\delta \psi _n \psi _n^{-1}, V_n]\). We also note the identities

$$\begin{aligned}&\delta \big ( {{\,\mathrm{Tr}\,}}\big ( \phi _m^{-1} \partial _\eta \phi _m U^{(0)}_m \big ) \mathrm{d}\eta \wedge \mathrm{d}\xi \big )\\&\quad = {{\,\mathrm{Tr}\,}}\big ( \! - \partial _\eta U_m \delta \phi _m \phi _m^{-1} \big ) \wedge \mathrm{d}\eta \wedge \mathrm{d}\xi - \mathrm{d}{{\,\mathrm{Tr}\,}}\big ( \phi _m^{-1} \delta \phi _m U^{(0)}_m \wedge \mathrm{d}\xi \big ),\\&\qquad - \, \delta \big ( {{\,\mathrm{Tr}\,}}\big ( \psi _n^{-1} \partial _\xi \psi _n V^{(0)}_n \big ) \mathrm{d}\eta \wedge \mathrm{d}\xi \big )\\&\quad = {{\,\mathrm{Tr}\,}}\big ( \partial _\xi V_n \delta \psi _n \psi _n^{-1} \big ) \wedge \mathrm{d}\eta \wedge \mathrm{d}\xi - \mathrm{d}{{\,\mathrm{Tr}\,}}\big ( \psi _n^{-1} \delta \psi _n V^{(0)}_n \wedge \mathrm{d}\eta \big ). \end{aligned}$$

To show these we need, in particular, to use the fact that \(\delta \mathrm{d} = - \mathrm{d} \delta \) along with the cyclicity of the trace. Combining the above we then find

$$\begin{aligned} \delta L_{\mathrm{ZM}}&= {{\,\mathrm{Tr}\,}}\Bigg ( \!\! - \sum _{m=1}^{N_1} \bigg ( \partial _\eta U_m + \sum _{n=1}^{N_2} \frac{[U_m,V_n]}{a_m - b_n} \bigg ) \delta \phi _m \phi _m^{-1} \nonumber \\&\qquad +\,\sum _{n=1}^{N_2} \bigg ( \partial _\xi V_n + \sum _{m=1}^{N_1} \frac{[V_n,U_m]}{b_n - a_m} \bigg ) \delta \psi _n \psi _n^{-1} \Bigg ) \mathrm{d}\eta \wedge \mathrm{d}\xi \nonumber \\&\qquad -\,\mathrm{d}{{\,\mathrm{Tr}\,}}\bigg (\sum _{m=1}^{N_1} \phi _m^{-1} \delta \phi _m U^{(0)}_m \wedge \mathrm{d}\xi +\sum _{n=1}^{N_2} \psi _n^{-1} \delta \psi _n V^{(0)}_n \wedge \mathrm{d}\eta \bigg ). \end{aligned}$$
(3.1)

As expected, the first term reproduces the Euler–Lagrange equations in the form (2.14), recalling that we are working in the gauge where \(U_0 = V_0 = 0\). On the other hand, the last term on the right-hand side of (3.1) allows us to identify the form

$$\begin{aligned} {{\,\mathrm{\varOmega ^{(1)}}\,}}=\sum _{m=1}^{N_1} {{\,\mathrm{Tr}\,}}(\phi _m^{-1} \delta \phi _m U^{(0)}_m) \wedge \mathrm{d}\xi +\sum _{n=1}^{N_2} {{\,\mathrm{Tr}\,}}(\psi _n^{-1} \delta \psi _n V^{(0)}_n) \wedge \mathrm{d}\eta , \end{aligned}$$
(3.2)

which in turn yields the multisymplectic form \(\varOmega :=\delta {{\,\mathrm{\varOmega ^{(1)}}\,}}\) of the model as

$$\begin{aligned} \varOmega =-\sum _{m=1}^{N_1} {{\,\mathrm{Tr}\,}}\big ( \phi _m^{-1} \delta \phi _m \wedge \phi _m^{-1} \delta \phi _m U^{(0)}_m \big ) \wedge \mathrm{d}\xi -\sum _{n=1}^{N_2} {{\,\mathrm{Tr}\,}}\big ( \psi _n^{-1} \delta \psi _n\wedge \psi _n^{-1} \delta \psi _n V^{(0)}_n \big ) \wedge \mathrm{d}\eta . \end{aligned}$$
(3.3)

The multisymplectic form \(\varOmega = \omega _{(\xi )} \wedge \mathrm{d}\xi + \omega _{(\eta )} \wedge \mathrm{d}\eta \) provides the covariant symplectic structure of a field theory. Its coefficients \(\omega _{(\xi )}\) and \(\omega _{(\eta )}\) contain the pull-back to the group of the Kostant–Kirillov forms for the orbits through \(U^{(0)}_m\) and \(V^{(0)}_n\), respectively.
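
For completeness, the step from (3.2) to (3.3) can be spelled out as follows. This is only a sketch, under the assumptions (implicit above) that \(\delta \) acts as a graded derivation with \(\delta ^2 = 0\), that \(U^{(0)}_m\) is non-dynamical and that \(\delta \, \mathrm{d}\xi = 0\). Using \(\delta \phi _m^{-1} = - \phi _m^{-1} \delta \phi _m \, \phi _m^{-1}\), we have

$$\begin{aligned} \delta \big ( \phi _m^{-1} \delta \phi _m \big )&= \delta \phi _m^{-1} \wedge \delta \phi _m = - \phi _m^{-1} \delta \phi _m \wedge \phi _m^{-1} \delta \phi _m ,\\ \delta \, {{\,\mathrm{Tr}\,}}\big ( \phi _m^{-1} \delta \phi _m U^{(0)}_m \big ) \wedge \mathrm{d}\xi&= - {{\,\mathrm{Tr}\,}}\big ( \phi _m^{-1} \delta \phi _m \wedge \phi _m^{-1} \delta \phi _m U^{(0)}_m \big ) \wedge \mathrm{d}\xi , \end{aligned}$$

and the terms involving \(\psi _n\) and \(V^{(0)}_n\) are treated in exactly the same way.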

3.2 Covariant Poisson bracket of Hamiltonian 1-forms

We are now ready to define the covariant Poisson bracket \(\{\! | ~, ~| \! \}\) between certain horizontal forms called Hamiltonian. Specifically, a horizontal form F is Hamiltonian if there exists a vector field \(X_F\) such that

$$\begin{aligned} \delta F=X_F\lrcorner \varOmega , \end{aligned}$$
(3.4)

where \(~\lrcorner ~\) denotes the interior product of a vector field with a form. Let F and G be two Hamiltonian forms. We define their covariant Poisson bracket as follows

$$\begin{aligned} \{\! | F, G| \! \} :=(-1)^q X_F\lrcorner \delta G=(-1)^qX_F\lrcorner X_G\lrcorner \varOmega , \end{aligned}$$
(3.5)

where q is the horizontal degree of F. Notice that the vector field \(X_F\) in (3.4) will generally not be unique since \(\varOmega \) may have a non-trivial kernel. Nevertheless, the covariant Poisson bracket (3.5) is seen to be independent of the choice of vector fields \(X_F\) and \(X_G\) for both of the Hamiltonian forms F and G. We remark that the covariant Poisson bracket is non-trivial and well defined when the Hamiltonian forms F and G are either both horizontal 1-forms, or one is a horizontal 1-form and the other one is a 0-form (i.e. a function).
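
The independence of (3.5) from the choice of Hamiltonian vector fields can be made explicit. As a short sketch, suppose that \(X_F\) is replaced by \(X'_F = X_F + K\), where K is any vector field in the kernel of \(\varOmega \), i.e. \(K \lrcorner \varOmega = 0\) (the symbols K and \(X'_F\) are introduced here only for illustration). Since interior products anticommute, we find

$$\begin{aligned} (-1)^q X'_F\lrcorner X_G\lrcorner \varOmega = (-1)^q X_F\lrcorner X_G\lrcorner \varOmega - (-1)^q X_G\lrcorner K\lrcorner \varOmega = (-1)^q X_F\lrcorner X_G\lrcorner \varOmega . \end{aligned}$$

Likewise, shifting \(X_G\) by an element of the kernel leaves \(X_G\lrcorner \varOmega \), and hence the bracket, unchanged.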

Our objective is to compute the covariant Poisson bracket à la Sklyanin for the Lax connection \({\mathcal {L}}= {\mathcal {L}}_\eta \mathrm{d}\eta + {\mathcal {L}}_\xi \mathrm{d}\xi \) corresponding to the solution (2.13) for A, in the gauge where \(U_0=V_0 = 0\). Specifically, let \(E_{ij}\) be the canonical basis for \(\mathfrak {gl}_N\) and write the Lax connection in this basis as

$$\begin{aligned} {\mathcal {L}}(z) = \sum _{i,j=1}^N{\mathcal {L}}_{ij}(z) \,E_{ij}, \end{aligned}$$

where from now on we shall show the explicit dependence on the spectral parameter. To compute the covariant Poisson brackets between any two components of the Lax connection, we first need to show that these are Hamiltonian 1-forms.

For this, we shall need the following useful identities. If M is any \(GL_N\)-valued field with components \(M_{ij}\), \(i,j=1,\dots ,N\) and C is any non-dynamical matrix (meaning \(\delta C=0\)), then we have

$$\begin{aligned}&\sum _{k=1}^N M_{ik}\frac{\partial }{\partial M_{jk}}\lrcorner {{\,\mathrm{Tr}\,}}\left( M^{-1}\delta M\wedge M^{-1}\delta M C\right) =\delta (MCM^{-1})_{ij}, \end{aligned}$$
(3.6a)
$$\begin{aligned}&\sum _{\beta =1}^N M_{i\beta }\frac{\partial }{\partial M_{j\beta }}\lrcorner \delta (MCM^{-1})_{kl}=\delta _{jk}(MCM^{-1})_{il}-\delta _{il}(MCM^{-1})_{kj}. \end{aligned}$$
(3.6b)

In particular, we can use these with \(M=\phi _m\), \(C=U_m^{(0)}\) and \(M=\psi _n\), \(C=V_n^{(0)}\). Then, a direct calculation shows that

$$\begin{aligned} X_{ij}(z) = \sum _{m=1}^{N_1}\sum _{\beta =1}^{N} \frac{\phi _{m,i\beta }}{z - a_m} \frac{\partial }{\partial \phi _{m,j\beta }} + \sum _{n=1}^{N_2}\sum _{\beta =1}^{N} \frac{\psi _{n,i\beta }}{z - b_n} \frac{\partial }{\partial \psi _{n,j\beta }}, \end{aligned}$$
(3.7)

satisfies \(\delta {\mathcal {L}}_{ij}(z)=X_{ij}(z)\lrcorner \varOmega \). Therefore, all the components \({\mathcal {L}}_{ij}(z)\) for \(i,j=1,\dots ,N\) of the Lax connection are Hamiltonian 1-forms, as required.
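
As a sanity check, the identities (3.6a) and (3.6b) underlying this computation can be tested on explicit rational matrices. The following Python sketch is not part of the derivation itself: the helper names (`two_form`, `dA`, `check_identities`) and the sample matrices are illustrative assumptions. It uses the fact that contracting the vector field in (3.6) with \(\delta M\) gives \(E_{ji}M\), and it evaluates the remaining 1-form in (3.6a) on an arbitrary tangent vector Y.

```python
from fractions import Fraction
from itertools import product

N = 3  # matrix size for this toy check (an assumption)

def mat(rows):
    return [[Fraction(x) for x in row] for row in rows]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

def sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(N)] for i in range(N)]

def trace(A):
    return sum(A[i][i] for i in range(N))

def unit(a, b):
    # elementary matrix E_{ab}, with 0-based indices
    return [[Fraction(int((r, c) == (a, b))) for c in range(N)] for r in range(N)]

def inverse(A):
    # Gauss-Jordan elimination over the rationals (A is assumed invertible)
    aug = [list(A[r]) + [Fraction(int(r == c)) for c in range(N)] for r in range(N)]
    for col in range(N):
        pivot = next(r for r in range(col, N) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(N):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[N:] for row in aug]

M = mat([[2, 1, 0], [1, 3, 1], [0, 1, 4]])    # sample invertible field value
C = mat([[1, 2, 0], [0, -1, 3], [5, 0, 2]])   # sample non-dynamical matrix
Y = mat([[1, 0, 2], [-1, 1, 0], [3, 2, 1]])   # arbitrary tangent vector at M
Minv = inverse(M)
A = mul(mul(M, C), Minv)                      # A = M C M^{-1}

def dA(T):
    # differential of M -> M C M^{-1}, applied to the tangent vector T
    return sub(mul(mul(T, C), Minv), mul(mul(A, T), Minv))

def two_form(S, T):
    # Tr(M^{-1} dM ^ M^{-1} dM C) evaluated on the pair (S, T)
    s, t = mul(Minv, S), mul(Minv, T)
    return trace(mul(mul(s, t), C)) - trace(mul(mul(t, s), C))

def check_identities():
    for i, j in product(range(N), repeat=2):
        X = mul(unit(j, i), M)                 # the vector field sends dM to E_{ji} M
        if two_form(X, Y) != dA(Y)[i][j]:      # (3.6a), evaluated on Y
            return False
        dAX = dA(X)
        for k, l in product(range(N), repeat=2):
            rhs = (j == k) * A[i][l] - (i == l) * A[k][j]
            if dAX[k][l] != rhs:               # (3.6b), component by component
                return False
    return True

print(check_identities())
```

Exact rational arithmetic is used so that both sides can be compared with strict equality rather than up to a numerical tolerance.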

We shall write the covariant Poisson bracket of the Lax connection \({\mathcal {L}}\) using the standard tensorial notation \({\mathcal {L}}_1 :={\mathcal {L}}\otimes \varvec{1}\) and \({\mathcal {L}}_2 :=\varvec{1}\otimes {\mathcal {L}}\) as

$$\begin{aligned} \{\! | {\mathcal {L}}_1(z), {\mathcal {L}}_2(w)| \! \} :=\sum _{i,j,k,l=1}^N \{\! | {\mathcal {L}}_{ij}(z), {\mathcal {L}}_{kl}(w)| \! \}E_{ij}\otimes E_{kl}. \end{aligned}$$
(3.8)

3.3 The r-matrix structure

We now turn to the computation of the components on the right-hand side of (3.8). We have

$$\begin{aligned} \{\! | {\mathcal {L}}_{ij}(z), {\mathcal {L}}_{kl}(w)| \! \}&= -X_{ij}(z)\lrcorner \delta {\mathcal {L}}_{kl}(w)\\&= \sum _{m=1}^{N_1}\frac{\delta _{jk}(U_m)_{il}-\delta _{il}(U_m)_{kj}}{(z-a_m)(w-a_m)} \mathrm{d}\xi + \sum _{n=1}^{N_2}\frac{\delta _{jk}(V_n)_{il}-\delta _{il}(V_n)_{kj}}{(z-b_n)(w-b_n)} \mathrm{d}\eta . \end{aligned}$$

Introducing the permutation operator \(P_{12} :=\sum _{i,j=1}^NE_{ij}\otimes E_{ji}\) with the property

$$\begin{aligned} \sum _{i,j,k,l=1}^N\left( \delta _{jk}M_{il}-\delta _{il}M_{kj}\right) E_{ij}\otimes E_{kl}=[M_1,P_{12}]=-[M_2,P_{12}], \end{aligned}$$

for any \(\mathfrak {gl}_N\)-valued field M with components \(M_{ij}\) for \(i,j = 1, \ldots , N\), and noting that for any distinct \(z, w, a \in {\mathbb {C}}\) we have the identity

$$\begin{aligned} \frac{1}{(z-a)(w-a)}=\frac{1}{w-z}\left( \frac{1}{z-a}-\frac{1}{w-a}\right) , \end{aligned}$$
(3.9)

we may rewrite the covariant Poisson bracket (3.8) as

$$\begin{aligned} \{\! | {\mathcal {L}}_1(z), {\mathcal {L}}_2(w)| \! \}&=\sum _{m=1}^{N_1}\frac{[(U_m)_{1},P_{12}]}{(z-a_m)(w-a_m)} \mathrm{d}\xi +\sum _{n=1}^{N_2}\frac{[(V_n)_{1},P_{12}]}{(z-b_n)(w-b_n)} \mathrm{d}\eta \\&= \bigg [\frac{P_{12}}{z-w},{\mathcal {L}}_1(z)+{\mathcal {L}}_2(w)\bigg ]. \end{aligned}$$

In other words, we have the announced result that the Lax connection satisfies the following Poisson algebra

$$\begin{aligned} \{\! | {\mathcal {L}}_1(z), {\mathcal {L}}_2(w)| \! \}=\big [r_{12}(z-w),{\mathcal {L}}_1(z)+{\mathcal {L}}_2(w)\big ], \end{aligned}$$

with respect to the covariant Poisson bracket \(\{\! | ~, ~| \! \}\), where \(r_{12}(z) :=P_{12}/z\) is the rational r-matrix. The fact that we have been working with the Lie algebra \(\mathfrak {gl}_N\) was convenient for writing the \(GL_N\)-valued fields \(\phi _m\) and \(\psi _n\) in components. However, the above derivation can be adapted to hold more generally for any semisimple Lie algebra, working in a basis of the latter.
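
The algebraic steps of this subsection can be checked on a small example. The Python sketch below is an illustration only, with sample poles, residues and spectral parameters chosen arbitrarily (all of them assumptions). It assumes, consistently with the brackets computed above, that the \(\mathrm{d}\xi \)-component of \({\mathcal {L}}(z)\) is \(\sum _m U_m/(z-a_m)\), and it tests the property of \(P_{12}\) together with the final r-matrix form for that component, using Kronecker products for the tensorial notation.

```python
from fractions import Fraction

N = 2  # size of gl_N used in this toy check (an assumption)

def mat(rows):
    return [[Fraction(x) for x in row] for row in rows]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * x for x in row] for row in A]

def mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def comm(A, B):
    return add(mul(A, B), scale(-1, mul(B, A)))

def kron(A, B):
    # Kronecker product; the pair (i, k) is stored at index i * len(B) + k
    m = len(B)
    size = len(A) * m
    return [[A[r // m][c // m] * B[r % m][c % m] for c in range(size)] for r in range(size)]

ID = mat([[1, 0], [0, 1]])
# P_12 = sum_{i,j} E_ij (x) E_ji has a 1 at row (i, j), column (j, i)
P = [[Fraction(int(r // N == c % N and r % N == c // N)) for c in range(N * N)]
     for r in range(N * N)]

a = [Fraction(1), Fraction(-2)]                              # sample poles a_m
U = [mat([[1, 2], [3, 4]]), mat([[0, 1], [-1, 2]])]          # sample residues U_m
z, w = Fraction(3), Fraction(5)                              # sample spectral parameters

def lax(x):
    # d(xi)-component of the Lax connection at spectral parameter x
    total = mat([[0, 0], [0, 0]])
    for a_m, U_m in zip(a, U):
        total = add(total, scale(1 / (x - a_m), U_m))
    return total

def component_form(M):
    # sum over i,j,k,l of (delta_jk M_il - delta_il M_kj) E_ij (x) E_kl
    T = [[Fraction(0)] * (N * N) for _ in range(N * N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                for l in range(N):
                    T[i * N + k][j * N + l] = (j == k) * M[i][l] - (i == l) * M[k][j]
    return T

lhs = [[Fraction(0)] * (N * N) for _ in range(N * N)]
for a_m, U_m in zip(a, U):
    lhs = add(lhs, scale(1 / ((z - a_m) * (w - a_m)), comm(kron(U_m, ID), P)))
rhs = comm(scale(1 / (z - w), P), add(kron(lax(z), ID), kron(ID, lax(w))))

print(lhs == rhs)
```

The \(\mathrm{d}\eta \)-component can be checked in the same way by replacing the pairs \((a_m, U_m)\) with \((b_n, V_n)\).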

3.4 The covariant Hamiltonian

Following [18, Lemma 19.5.9], the covariant Hamiltonian related to \(L_{\mathrm{ZM}}\) is found to be equal to

$$\begin{aligned} H_{\mathrm{ZM}}&:=-L_{\mathrm{ZM}} +\sum _{m=1}^{N_1}{{\,\mathrm{Tr}\,}}\big ( \phi _m^{-1}\partial _\eta \phi _m U_m^{(0)} \big )\, \mathrm{d}\eta \wedge \mathrm{d}\xi - \sum _{n=1}^{N_2}{{\,\mathrm{Tr}\,}}\big ( \psi _n^{-1}\partial _\xi \psi _n V_n^{(0)} \big )\, \mathrm{d}\eta \wedge \mathrm{d}\xi \\&= \sum _{m=1}^{N_1} \sum _{n=1}^{N_2} {{\,\mathrm{Tr}\,}}\frac{U_m V_n}{a_m - b_n} \, \mathrm{d}\eta \wedge \mathrm{d}\xi . \end{aligned}$$

This can be reexpressed directly in terms of the Lax connection as

$$\begin{aligned} H_{\mathrm{ZM}}=\sum _{m=1}^{N_1} \sum _{n=1}^{N_2} {{\,\mathrm{res}\,}}_{z=a_m} {{\,\mathrm{res}\,}}_{w=b_n}{{\,\mathrm{Tr}\,}}\frac{{\mathcal {L}}(z)\wedge {\mathcal {L}}(w)}{z-w}. \end{aligned}$$
(3.10)

This is a rather remarkable formula extending to the present covariant context the familiar formula “\(H={{\,\mathrm{Tr}\,}}L^2\)” for extracting a Hamiltonian from a Lax matrix in many finite-dimensional integrable systems, such as the Gaudin model or Calogero–Moser system. In fact, formula (3.10) is very reminiscent of the expression for the Hamiltonian in non-ultralocal integrable field theories described by Gaudin models associated with affine Kac–Moody algebras [16, 30].

3.5 Flatness of the Lax connection as a covariant Hamilton equation

It was shown for the first time in [7] for certain \(2\hbox {d}\) integrable field theories (nonlinear Schrödinger, sine-Gordon, modified Korteweg–de Vries) that the zero-curvature equation (2.9) is a covariant Hamilton equation for the Lax connection \({\mathcal {L}}\) associated with the density of the covariant Hamiltonian. By this we mean that, if we define the “covariant flow” of \({\mathcal {L}}\) by

$$\begin{aligned} \mathrm{d}{\mathcal {L}}(z)=\{\! | h_{\mathrm{ZM}}, {\mathcal {L}}(z)| \! \}\mathrm{d}\eta \wedge \mathrm{d}\xi ,~~\text {where}~~H_{\mathrm{ZM}}=h_{\mathrm{ZM}}\mathrm{d}\eta \wedge \mathrm{d}\xi , \end{aligned}$$

in analogy to what one would do in the traditional Hamiltonian formalism, then since we have

$$\begin{aligned} \{\! | h_{\mathrm{ZM}}, {\mathcal {L}}(z)| \! \}\mathrm{d}\eta \wedge \mathrm{d}\xi =-{\mathcal {L}}(z)\wedge {\mathcal {L}}(z), \end{aligned}$$
(3.11)

we can conclude that \(\mathrm{d}{\mathcal {L}}(z)+{\mathcal {L}}(z)\wedge {\mathcal {L}}(z)=0\). The main steps in the derivation of the crucial equality (3.11) are as follows. First, we have by definition

$$\begin{aligned} \{\! | h_{\mathrm{ZM}}, {\mathcal {L}}(z)| \! \}=\sum _{i,j=1}^N X_{ij}(z)\lrcorner \delta h_{\mathrm{ZM}} E_{ij}. \end{aligned}$$
(3.12)

Second, we find

$$\begin{aligned} X_{ij}(z)\lrcorner \delta h_{\mathrm{ZM}}&= \bigg (\sum _{m=1}^{N_1}\sum _{\beta =1}^{N} \frac{\phi _{m,i\beta }}{z - a_m} \frac{\partial }{\partial \phi _{m,j\beta }} + \sum _{n=1}^{N_2}\sum _{\beta =1}^{N} \frac{\psi _{n,i\beta }}{z - b_n} \frac{\partial }{\partial \psi _{n,j\beta }}\bigg )\\&\quad \lrcorner \bigg (\sum _{p=1}^{N_1}\sum _{q=1}^{N_2}\sum _{k,l=1}^N\frac{(\delta U_p)_{kl}(V_q)_{lk}+(U_p)_{lk}(\delta V_q)_{kl}}{a_p-b_q} \bigg )\\&= \sum _{m=1}^{N_1}\sum _{q=1}^{N_2}\sum _{k,l=1}^N\frac{\left( \delta _{jk}(U_m)_{il}-\delta _{il}(U_m)_{kj}\right) (V_q)_{lk}}{(z-a_m)(a_m-b_q)}\\&\quad +\,\sum _{n=1}^{N_2}\sum _{p=1}^{N_1}\sum _{k,l=1}^N\frac{(U_p)_{lk}\left( \delta _{jk}(V_n)_{il}-\delta _{il}(V_n)_{kj}\right) }{(z-b_n)(a_p-b_n)} \\&= \sum _{m=1}^{N_1}\sum _{n=1}^{N_2}\frac{\left( [U_m,V_n] \right) _{ij}}{(z-a_m)(z-b_n)}, \end{aligned}$$

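where we have used the identity (3.6b) in the second equality and, after performing the sums over k and l, the identity (3.9) with \((z, w, a)\) replaced by \((a_m, b_n, z)\) in the last equality. Substituting the above into (3.12), we obtain (3.11).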
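
The index contractions and the partial fraction step in the last equality can be verified on explicit data. The following Python sketch is an illustration under stated assumptions: the poles, the matrices standing in for \(U_m\) and \(V_n\), the value of z and the function names `contracted` and `closed_form` are all arbitrary choices made here. It compares the penultimate expression above, summed index by index, with the closed form \(\sum _{m,n} ([U_m,V_n])_{ij}/((z-a_m)(z-b_n))\).

```python
from fractions import Fraction
from itertools import product

N = 2  # size of gl_N used in this toy check (an assumption)

def mat(rows):
    return [[Fraction(x) for x in row] for row in rows]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

def comm(A, B):
    AB, BA = mul(A, B), mul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(N)] for i in range(N)]

def delta(p, q):
    return 1 if p == q else 0

a = [Fraction(1), Fraction(-2)]                        # sample poles a_m
b = [Fraction(3), Fraction(1, 2)]                      # sample poles b_n, distinct from the a_m
U = [mat([[1, 2], [3, 4]]), mat([[0, 1], [-1, 2]])]    # sample matrices U_m
V = [mat([[2, 0], [1, 1]]), mat([[1, -1], [4, 3]])]    # sample matrices V_n
z = Fraction(5)                                        # sample spectral parameter

def contracted(i, j):
    # penultimate expression: explicit sums over k, l and over the poles
    total = Fraction(0)
    for m, q in product(range(len(a)), range(len(b))):
        for k, l in product(range(N), repeat=2):
            total += ((delta(j, k) * U[m][i][l] - delta(i, l) * U[m][k][j]) * V[q][l][k]
                      / ((z - a[m]) * (a[m] - b[q])))
    for n, p in product(range(len(b)), range(len(a))):
        for k, l in product(range(N), repeat=2):
            total += (U[p][l][k] * (delta(j, k) * V[n][i][l] - delta(i, l) * V[n][k][j])
                      / ((z - b[n]) * (a[p] - b[n])))
    return total

def closed_form(i, j):
    # final expression: commutators divided by (z - a_m)(z - b_n)
    return sum(comm(U[m], V[n])[i][j] / ((z - a[m]) * (z - b[n]))
               for m in range(len(a)) for n in range(len(b)))

print(all(contracted(i, j) == closed_form(i, j) for i in range(N) for j in range(N)))
```

Since the arithmetic is exact, agreement of the two functions for all i, j confirms the last equality for this choice of data, although it is of course not a proof.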

4 Conclusion and outlook

In this paper, we derived the Zakharov–Mikhailov \(2\hbox {d}\) action from the \(4\hbox {d}\) Chern–Simons action in the presence of certain surface defects. At the \(2\hbox {d}\) level, the covariant Poisson algebra of the Lax connection was shown to possess a classical r-matrix structure of rational type, thereby recasting the pioneering results of Sklyanin [28, 29] into a covariant Hamiltonian context. So far, this had only been shown for the sine-Gordon model [7] and the entire AKNS hierarchy [8, 9]. There are a number of tantalising questions and possible further directions following this work.

Some of the models (e.g. deformed Gross–Neveu models) considered in the series of papers [1, 4,5,6] seem to be cousins of the models of Zakharov–Mikhailov type studied here. It would be natural to expect that the covariant Poisson algebra of the Lax connection also holds for these models. Whether this could be achieved by relating them to the present Zakharov–Mikhailov construction is an interesting problem.

The extension of the covariant Poisson algebra structure to an entire hierarchy, as obtained in [9], is based on the notions of Hamiltonian multiform and multi-time Poisson bracket introduced in [8]. In turn, these are based on the idea of Lagrangian multiforms [22] which provide a generalised variational principle that is able to capture the integrability properties of classical field theories. The Zakharov–Mikhailov action was analysed from this point of view and embedded into a Lagrangian multiform in [27]. It is an intriguing problem to understand how such a multiform could effectively arise from a higher dimensional theory, in parallel to the present situation where a single \(2\hbox {d}\) Lagrangian is derived from a \(4\hbox {d}\) one.

The Poisson algebra of the Lax matrix of a non-ultralocal \(2\hbox {d}\) integrable field theory was derived in [31] by performing a Hamiltonian analysis of the \(4\hbox {d}\) Chern–Simons action for a general 1-form \(\omega \). It would be interesting to similarly derive the covariant Poisson algebra of the Lax connection in the present ultralocal setting for which \(\omega = \mathrm{d}z\). This would involve performing a covariant Hamiltonian analysis of the \(4\hbox {d}\) action (2.6) in order to rederive the covariant Poisson bracket obtained in Sect. 3 directly from the \(4\hbox {d}\) Chern–Simons theory.

We showed that the gauge transformations in the Zakharov–Mikhailov action arose as special types of gauge transformations in the \(4\hbox {d}\) Chern–Simons theory for which the gauge transformation parameter \(g \in C^\infty (\Sigma , G)\) is independent of the spectral parameter. This is the crudest way of ensuring (2.12) but we believe that a more appropriate condition would be to require that g is (sectionally) holomorphic in order to make a connection with the theory of dressing transformations [34]. In other words, it would be interesting to understand if dressing transformations in the \(2\hbox {d}\) integrable field theory can also be understood as arising from gauge transformations in \(4\hbox {d}\) Chern–Simons theory by allowing \(g \in C^\infty (X, G)\) to depend also on \({\mathbb {C}}P^1\) as long as the pole structure of the Lax connection (2.13) remains unchanged under such gauge transformations.

In the ultralocal setting considered in the present paper, the regularised \(4\hbox {d}\) Chern–Simons action is easily seen to be gauge invariant. Therefore, any defect terms added to the action, as in (2.6), should be gauge invariant themselves. By contrast, in the non-ultralocal setting one needs to impose boundary conditions on the bulk field A at the disorder defects, which are located at the poles of \(\omega \) [14]. This is necessary in order to ensure that the action is gauge invariant [3]. Alternatively, the gauge invariance can be ensured by introducing new fields living on the surface defects, called the edge modes, and coupling these to the bulk field A [3]. It would be very interesting to explore the possibility of combining these two approaches by adding further gauge invariant defect terms to the \(4\hbox {d}\) Chern–Simons action with edge modes. This would have the interesting effect of coupling, in the sense of [15, 16], ultralocal integrable field theories to a non-ultralocal one.