1 Introduction

1.1 Context and background

1.1.1 Integrability in the Hamiltonian framework

A profound discovery in the modern theory of integrable systems was that the special partial differential equations originally treated in the seminal works [GGKM, ZS], using what is now known as the Inverse Scattering Method, were also infinite dimensional Hamiltonian systems [G] for which an analog of the Liouville theorem for finite dimensional Hamiltonian systems could be established [ZF, ZMan]. This allows one, in particular, to see such systems as Hamiltonian field theories. The developments based on these early examples led to the beautiful theory of the classical r-matrix which captures the special Hamiltonian features of these models [Dr1, STS]. An important condition usually required of the r-matrix is that it satisfies the classical Yang–Baxter equation (CYBE)

$$\begin{aligned} \big [ r_{12}(\lambda , \mu ), r_{13}(\lambda , \nu ) \big ] + \big [ r_{12}(\lambda , \mu ),r_{23}(\mu , \nu ) \big ] -\big [r_{13}(\lambda , \nu ), r_{32}(\nu , \mu ) \big ]=0. \end{aligned}$$
(1.1)

It ensures that a certain Poisson bracket defined using r satisfies the Jacobi identity. Another important question is whether r is skew-symmetric, i.e. whether or not it satisfies

$$\begin{aligned} r_{12}(\lambda ,\mu )=-r_{21}(\mu ,\lambda ). \end{aligned}$$

This has deep mathematical and physical implications. If the r-matrix is skew-symmetric, the associated field theories are called ultralocal while they are non-ultralocal otherwise. In the present work, we restrict our attention to the ultralocal case.
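As a concrete illustration of (1.1) and of the skew-symmetry condition, the following sketch checks both for the rational r-matrix \(r_{12}(\lambda ,\mu )=\frac{P_{12}}{\mu -\lambda }\) that appears later in this introduction. It assumes \(\mathfrak {g}=\mathfrak {gl}_2\), realises \(P_{ab}\) as the operator swapping two tensor factors of \(\mathbb {C}^2\otimes \mathbb {C}^2\otimes \mathbb {C}^2\), and evaluates everything at arbitrarily chosen rational values of the spectral parameters. The helper names (`perm`, `r`, `cybe_lhs`) are hypothetical and introduced only for this example.

```python
from fractions import Fraction
from itertools import product

N = 2            # assumption: g = gl_2 acting on C^2
DIM = N ** 3     # dimension of C^2 (x) C^2 (x) C^2


def perm(a, b):
    """Matrix of P_ab, the operator swapping tensor factors a and b (0-based)."""
    m = [[Fraction(0)] * DIM for _ in range(DIM)]
    for t in product(range(N), repeat=3):
        s = list(t)
        s[a], s[b] = s[b], s[a]
        row = (s[0] * N + s[1]) * N + s[2]
        col = (t[0] * N + t[1]) * N + t[2]
        m[row][col] = Fraction(1)
    return m


def scale(c, m):
    return [[c * x for x in row] for row in m]


def add(m1, m2):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def mul(m1, m2):
    return [[sum(m1[i][k] * m2[k][j] for k in range(DIM)) for j in range(DIM)]
            for i in range(DIM)]


def comm(m1, m2):
    """Matrix commutator [m1, m2]."""
    return add(mul(m1, m2), scale(-1, mul(m2, m1)))


def r(a, b, x, y):
    """Rational r-matrix r_ab(x, y) = P_ab / (y - x) on the triple tensor product."""
    return scale(Fraction(1) / (Fraction(y) - Fraction(x)), perm(a, b))


def cybe_lhs(lam, mu, nu):
    """Left-hand side of (1.1) evaluated on the rational r-matrix."""
    r12, r13 = r(0, 1, lam, mu), r(0, 2, lam, nu)
    r23, r32 = r(1, 2, mu, nu), r(2, 1, nu, mu)
    return add(add(comm(r12, r13), comm(r12, r23)), scale(-1, comm(r13, r32)))


def is_zero(m):
    return all(x == 0 for row in m for x in row)


# Arbitrary pairwise distinct sample values for the spectral parameters.
lam, mu, nu = Fraction(1, 3), Fraction(2), Fraction(-5, 7)
print(is_zero(cybe_lhs(lam, mu, nu)))                   # CYBE holds
print(r(0, 1, lam, mu) == scale(-1, r(1, 0, mu, lam)))  # skew-symmetry holds
```

Exact rational arithmetic is used so that the vanishing is not obscured by rounding. A check at sample points is of course only a sanity test, not a proof.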

A characteristic feature of integrable field theories is that their equations of motion come in hierarchies. Specifically, any given integrable Hamiltonian field theory has infinitely many conserved charges which can, themselves, be used as Hamiltonians to define flows with respect to the Poisson bracket. Because all the conserved charges Poisson commute amongst themselves, it is possible to impose all these flows simultaneously on the fields of the theory and thus treat the latter as depending on infinitely many times. The collection of equations of motion thus obtained is referred to as an integrable hierarchy. Schematically, for a scalar field theory with field u, there would be a countable number of conserved charges \(H_j\), labelled by integers \(j\ge 1\) say, in involution with respect to a given Poisson bracket, namely

$$\begin{aligned} \{H_i,H_j\}=0 \end{aligned}$$

for every \(i,j\ge 1\). The hierarchy would then consist of all the equations

$$\begin{aligned} \partial _{t_j}u=\{H_j,u\}, \end{aligned}$$
(1.2)

where we have introduced an infinite number of times \(t_j\) for \(j \ge 1\). Among all the conserved charges \(H_j\), one of them can be taken to be the Hamiltonian of the integrable field theory one started with. Studying the hierarchy as a whole can reveal much more structure and properties of the initial model. This is of course not a new idea but here we depart from the established point of view in that we want to exploit this idea in a Lagrangian setting.
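The consistency of imposing all the flows (1.2) simultaneously can be made explicit with a short computation. The following is a sketch, assuming that (1.2) extends to any functional F of u as \(\partial _{t_j}F=\{H_j,F\}\), and using only the Jacobi identity and the involution property:

$$\begin{aligned} \partial _{t_i}\partial _{t_j}u-\partial _{t_j}\partial _{t_i}u=\{H_i,\{H_j,u\}\}-\{H_j,\{H_i,u\}\}=\{\{H_i,H_j\},u\}=0. \end{aligned}$$

The flows therefore commute, which is what allows u to be regarded as a function of all the times \(t_j\) at once.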

1.1.2 Integrability in the Lagrangian framework

When turning to the Lagrangian setting, one is immediately faced with the following question: how should integrable hierarchies be captured in the Lagrangian formalism? This question found an answer relatively recently in the theory of Lagrangian multiforms which was introduced in the seminal paper [LN] and has rapidly developed in various directions. More recently, several works cast the original idea into the context of continuous integrable field theories, see [SV, V, SNC, PV, SNC2, CS1, CS2, CS3] for examples of two-dimensional field theories (e.g. Korteweg–de Vries, sine-Gordon and nonlinear Schrödinger) and [SNC2, SNC3] for a three-dimensional example (Kadomtsev–Petviashvili). For a two-dimensional field theory, the central object is a differential two-form

$$\begin{aligned} \mathscr {L}[u]=\sum _{i,j}\mathscr {L}_{ij}[u]dt_i\wedge dt_j \end{aligned}$$
(1.3)

on an infinite-dimensional space \(\mathbb {R}^\infty \) parametrised by the infinite collection of times \(t_i\) of the hierarchy. The coefficients \(\mathscr {L}_{ij}[u]\) are Lagrangians depending on the fields of the theory, which are collectively denoted by u here for simplicity (even though we are no longer restricting to the case of a single scalar field). For each Lagrangian coefficient \(\mathscr {L}_{ij}[u]\) we can consider the associated action \(S_{ij}[u] = \int _{\mathbb {R}^2} \mathscr {L}_{ij}[u] dt_i \wedge dt_j\). Using the differential two-form (1.3) we can succinctly rewrite all these actions as \(S_{ij}[u] = \int _{\sigma _{ij}} \mathscr {L}[u]\), where the integral here is over the two dimensional plane \(\sigma _{ij} \simeq \mathbb {R}^2\) spanned by the coordinates \(t_i\) and \(t_j\) in \(\mathbb {R}^\infty \). At this point, of course, there is no reason for the field theories described by the actions \(S_{ij}[u]\) to belong to the same integrable hierarchy, let alone to produce equations of motion that are integrable! The key new ingredient is to impose a generalised variational principle on the more general action

$$\begin{aligned} S[u,\sigma ] = \int _\sigma \mathscr {L}[u], \end{aligned}$$
(1.4)

which now also depends on an arbitrary choice of two-dimensional smooth surface \(\sigma \) in \(\mathbb {R}^\infty \). Note, in particular, that \(S_{ij}[u] = S[u, \sigma _{ij}]\). The generalised variational principle which ties all these theories together is a least action principle for \(S[u,\sigma ]\) simultaneously for all smooth surfaces \(\sigma \). This results in what are called the multiform Euler–Lagrange (EL) equations. These were first derived in [SV] for the two-form case that we consider in this paper. It can be shown [SV, SNC] that they can be written compactly as

$$\begin{aligned} \delta d \mathscr {L}=0, \end{aligned}$$
(1.5)

where d is the usual exterior derivative and \(\delta \) denotes the variational derivative. In the Lagrangian multiform theory, the above generalised variational principle is complemented by another requirement: on critical points, one also requires that the action be stationary with respect to arbitrary local variations of \(\sigma \). This gives us the important closure relation on the equations of motion, i.e. on shell

$$\begin{aligned} d \mathscr {L}=0. \end{aligned}$$
(1.6)
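In components, the closure relation takes a familiar form. As a sketch, assuming that the coefficients are taken to be antisymmetric, \(\mathscr {L}_{ij}=-\mathscr {L}_{ji}\), only the totally antisymmetric part of \(\partial _{t_k}\mathscr {L}_{ij}\) contributes to \(d\mathscr {L}\), so that

$$\begin{aligned} d\mathscr {L}=\sum _{i,j,k}\partial _{t_k}\mathscr {L}_{ij}\,dt_k\wedge dt_i\wedge dt_j=0 \quad \Longleftrightarrow \quad \partial _{t_k}\mathscr {L}_{ij}+\partial _{t_i}\mathscr {L}_{jk}+\partial _{t_j}\mathscr {L}_{ki}=0 \quad \text {for all } i,j,k, \end{aligned}$$

where the equalities are understood to hold on shell.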

Intuitively, requiring criticality of the action for an arbitrary surface is the new feature that encodes variationally the commutativity of the flows known to be a signature of integrability in the Hamiltonian world. Roughly speaking, the connection with (1.2) is that the Lagrangian coefficients \(\mathscr {L}_{1j}\) correspond by a Legendre transform to the Hamiltonians \(H_j\), with the understanding that the time \(t_1\) plays some preferred role (the “space” variable) and the \(t_j\), \(j\ge 2\) are all the higher times of the hierarchy. The interpretation of all the other Lagrangian coefficients \(\mathscr {L}_{ij}\) is best obtained by casting the hierarchy as a collection of compatible zero curvature equations involving Lax matrices \(V_j(\lambda )\), namely

$$\begin{aligned} \partial _{t_j}V_i(\lambda )-\partial _{t_i}V_j(\lambda )+\left[ V_i(\lambda ), V_j(\lambda )\right] =0 \end{aligned}$$
(1.7)

for \(i, j \ge 1\). It is known that all these equations are in fact Hamiltonian, see e.g. [AC], and the case \(i=1\) corresponds to (1.2). One of the main points of the present work is that they are also variational with Lagrangian \(\mathscr {L}_{ij}\). It is important to realise that the multiform EL equations are largely overdetermined equations for the coefficients \(\mathscr {L}_{ij}\). Part of these equations impose restrictions on the allowed coefficients, the idea being that they enforce the integrability of the corresponding theories. The rest consist of standard EL equations associated to these coefficients and give the equations of motion of the integrable hierarchy.

1.1.3 Motivating example: Ablowitz–Kaup–Newell–Segur hierarchy

In [CS3], using the example of the Ablowitz–Kaup–Newell–Segur (AKNS) hierarchy, the notion of Lagrangian multiform was successfully combined with the idea of “compounding hierarchies” introduced in the Lagrangian framework in [N1] (itself inspired by the use of the generating formalism for integrable hierarchies, see e.g. [N2]). This naturally leads to working with generating functions when dealing with hierarchies. The key object was what we can call a generating Lagrangian multiform. The simple idea is to organise the Lagrangian coefficients \(\mathscr {L}_{ij}\) of the 2-form (1.3) into a generating series involving formal (spectral) parameters

$$\begin{aligned} \mathscr {L}(\lambda ,\mu )=\sum _{i,j}\frac{\mathscr {L}_{ij}}{\lambda ^{i+1}\mu ^{j+1}}. \end{aligned}$$
(1.8)

It is clear that there is a one-to-one correspondence between \(\mathscr {L}[u]\) and \(\mathscr {L}(\lambda ,\mu )\): from the latter, one can extract the coefficients by the formula

$$\begin{aligned} \mathscr {L}_{ij}={{\,\textrm{res}\,}}_\lambda {{\,\textrm{res}\,}}_\mu \left( \lambda ^i\mu ^j\mathscr {L}(\lambda ,\mu )\right) , \end{aligned}$$

where \({{\,\textrm{res}\,}}_\lambda \) returns the coefficient of \(\lambda ^{-1}\) in the series expansion, and similarly for \({{\,\textrm{res}\,}}_\mu \). One advantage of working with generating series such as (1.8) stems from the usefulness of generating functions in general: properties of their coefficients are more easily studied and derived from those of the generating function. In our context, this means that we can manipulate an integrable hierarchy as a whole instead of studying each Lagrangian coefficient \(\mathscr {L}_{ij}\) individually. Originally, the latter approach was used in the sense that only a given “starting” Lagrangian coefficient was known, say \(\mathscr {L}_{12}\), and one would try to build the higher coefficients \(\mathscr {L}_{ij}\) so as to obtain a consistent Lagrangian multiform. Methods to compute these coefficients were introduced for instance in [V, SNC2]. Although the recursive algorithm could be applied in principle, in practice this is hard to implement beyond the first few coefficients. Moreover, the Lagrangians \(\mathscr {L}_{ij}\) obtained in this way usually contain derivatives with respect to \(t_1\) or \(t_2\) (the times associated with \(\mathscr {L}_{12}\)). These are not natural times from the point of view of \(\mathscr {L}_{ij}\): this is the so-called problem of “alien derivatives” which was identified and explained in [V]. Having a generating Lagrangian multiform circumvents these issues. This will be elaborated upon in the examples.
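The bookkeeping behind (1.8) and the residue formula can be sketched in a few lines. The example below stores a truncated double series as a dictionary from exponent pairs to coefficients. The coefficient values are placeholder labels and the function names are hypothetical; nothing here is specific to a particular hierarchy.

```python
def generating_series(coeffs):
    """Turn {(i, j): L_ij} into the exponent table of sum_ij L_ij / (lambda^(i+1) mu^(j+1))."""
    return {(-(i + 1), -(j + 1)): c for (i, j), c in coeffs.items()}


def times_monomial(series, i, j):
    """Multiply a double series by lambda^i mu^j (shifts every exponent)."""
    return {(p + i, q + j): c for (p, q), c in series.items()}


def double_residue(series):
    """res_lambda res_mu: the coefficient of lambda^-1 mu^-1 (0 if absent)."""
    return series.get((-1, -1), 0)


def extract(series, i, j):
    """Recover L_ij as res_lambda res_mu (lambda^i mu^j L(lambda, mu))."""
    return double_residue(times_monomial(series, i, j))


# Placeholder labels standing in for actual Lagrangian coefficients.
coeffs = {(1, 2): "L_12", (1, 3): "L_13", (2, 3): "L_23"}
series = generating_series(coeffs)
print(extract(series, 1, 3))
```

Running the extraction on every index pair returns the original table, which is the one-to-one correspondence between the coefficients and the generating series.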

For the AKNS hierarchy, the generating Lagrangian multiform can be written as [CS3]

$$\begin{aligned} \mathscr {L}(\lambda ,\mu )= i{{\,\textrm{Tr}\,}}{\left( \phi (\mu )^{-1} \mathcal {D}_\lambda \phi (\mu )\sigma _3 - \phi (\lambda ) ^{-1} \mathcal {D}_\mu \phi (\lambda )\sigma _3 \right) } - {{\,\textrm{Tr}\,}}{ \frac{Q(\lambda )Q(\mu )}{\mu -\lambda } },\nonumber \\ \end{aligned}$$
(1.9)

with \(Q(\lambda )=-i\phi (\lambda )\sigma _3\phi (\lambda ) ^{-1}\), \(\phi (\lambda )\) being a group-valued formal series in \(1/\lambda \) with constant term equal to the identity and whose coefficients contain the dynamical variables. The operator \(\displaystyle \mathcal {D}_\lambda = \sum _{j \ge 0} \lambda ^{-j-1} \partial _{t_j}\) is a formal collection of all the AKNS flows \(\partial _{t_j}\), and similarly for \(\mathcal {D}_\mu \). The generating Lagrangian multiform (1.9) generates all the coefficients \(\mathscr {L}_{ij}\) systematically and without the problem of alien derivatives, reproducing the first few coefficients which had been constructed in [SNC, SNC2, PV], as it should. Its multiform EL equations yield the defining equations of the AKNS hierarchy as discussed by Flaschka–Newell–Ratiu (FNR) in [FNR], namely

$$\begin{aligned} \partial _{t_i} Q(\lambda ) = [Q^{(i)}(\lambda ), Q(\lambda )],~~ i\ge 0, \end{aligned}$$
(1.10)

where \(\displaystyle Q(\lambda )=\sum _{j=0}^\infty Q_j\lambda ^{-j}\) and \(\displaystyle Q^{(i)}(\lambda )=\sum _{j=0}^i Q_j\lambda ^{i-j}\) and \(Q_0=-i\sigma _3\). More precisely, the multiform EL equations for (1.9) produce the equations (1.10) in generating form

$$\begin{aligned} \mathcal {D}_\mu Q(\lambda ) = \frac{[Q(\mu ),Q(\lambda )]}{ \mu - \lambda }, \end{aligned}$$
(1.11)

where we used the formal series identity

$$\begin{aligned} \sum _{k=0}^\infty \frac{Q^{(k)}(\lambda )}{\mu ^{k+1}} = \frac{Q(\mu )}{\mu - \lambda }. \end{aligned}$$
(1.12)
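The identity (1.12) can be checked directly. As a sketch, interpreting \(\frac{1}{\mu -\lambda }\) as its expansion in powers of \(\lambda /\mu \) (the convention assumed here), multiply the left-hand side by \(\mu -\lambda \) and use \(Q^{(0)}(\lambda )=Q_0\) together with \(Q^{(k)}(\lambda )-\lambda Q^{(k-1)}(\lambda )=Q_k\) for \(k\ge 1\):

$$\begin{aligned} (\mu -\lambda )\sum _{k=0}^\infty \frac{Q^{(k)}(\lambda )}{\mu ^{k+1}} = Q^{(0)}(\lambda )+\sum _{k=1}^\infty \frac{Q^{(k)}(\lambda )-\lambda Q^{(k-1)}(\lambda )}{\mu ^{k}} = \sum _{k=0}^\infty \frac{Q_k}{\mu ^{k}} = Q(\mu ). \end{aligned}$$

Multiplying (1.10) by \(\mu ^{-i-1}\) and summing over \(i\ge 0\) then reproduces (1.11).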

1.2 Motivation, main results and plan

Motivation The present work is motivated by the following observations made on the generating Lagrangian multiform (1.9) and the generating FNR equations (1.11):

  1.

    The potential term in \(\mathscr {L}(\lambda ,\mu )\) has a characteristic form which can be identified as the expression

    $$\begin{aligned} {{\,\textrm{Tr}\,}}_{12}\left( r_{12}(\lambda ,\mu )Q_1(\lambda )Q_2(\mu )\right) \end{aligned}$$

    where \(r_{12}(\lambda ,\mu )=\frac{P_{12}}{\mu -\lambda }\) is the rational r-matrix, known to describe the Hamiltonian structure of the AKNS hierarchy. One could then imagine replacing this particular r-matrix with another skew-symmetric r-matrix. This leads to the question of whether the nice properties of the generating Lagrangian multiform still hold. One of our main results is that this is the case by virtue of the CYBE. Correspondingly, the RHS of (1.11) can also be written as \([{{\,\textrm{Tr}\,}}_2 r_{12}(\lambda ,\mu ) Q_2(\mu ),Q_1(\lambda )]\) and the same generalisation can be contemplated.

  2.

    The choice of expanding all the objects as formal series in \(1/\lambda \) and \(1/\mu \) is a sign that one is performing an expansion around the point at infinity. However, nothing would prevent us from considering other points in \(\mathbb {C}P^1\).

  3.

    The Pauli matrix \(\sigma _3\) appearing in (1.9) is a special choice of constant element in the underlying loop algebra of \(\mathfrak {sl}_2\) and the form of \(Q(\lambda )\) indicates that one is building a phase space for the field theory as a (co)adjoint orbit around this particular element. One could consider other elements in the loop algebra to construct different phase spaces and hence different models. Moreover, one could also consider other Lie algebras than \(\mathfrak {sl}_2\).
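The identification made in the first observation can be verified in one line. As a sketch, write \(P_{12}=\sum _{i,j}E_{ij}\otimes E_{ji}\) in terms of the standard matrix units (the \(\mathfrak {gl}_N\) form of the tensor Casimir; for \(\mathfrak {sl}_2\) the Casimir differs by a term proportional to \(\varvec{1}\otimes \varvec{1}\), which drops out below because \(Q(\lambda )\) is traceless). Then

$$\begin{aligned} {{\,\textrm{Tr}\,}}_{12}\left( P_{12}Q_1(\lambda )Q_2(\mu )\right) =\sum _{i,j}{{\,\textrm{Tr}\,}}\big (E_{ij}Q(\lambda )\big )\,{{\,\textrm{Tr}\,}}\big (E_{ji}Q(\mu )\big ) =\sum _{i,j}Q(\lambda )_{ji}\,Q(\mu )_{ij} ={{\,\textrm{Tr}\,}}\big (Q(\lambda )Q(\mu )\big ), \end{aligned}$$

so that \({{\,\textrm{Tr}\,}}_{12}\left( r_{12}(\lambda ,\mu )Q_1(\lambda )Q_2(\mu )\right) ={{\,\textrm{Tr}\,}}\frac{Q(\lambda )Q(\mu )}{\mu -\lambda }\), which is the potential term of (1.9) up to its overall sign. The same computation with a single trace gives \({{\,\textrm{Tr}\,}}_2\, r_{12}(\lambda ,\mu )Q_2(\mu )=\frac{Q(\mu )}{\mu -\lambda }\), which turns the right-hand side of (1.11) into the commutator form quoted in the first observation.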

The careful implementation of these natural observations requires some machinery which is presented in Sect. 2. On a first reading, the reader may choose to read the rest of this introduction, which contains a summary of the formalism and results, and go directly to Sect. 3.

The idea is to substitute the loop algebra of \(\mathfrak {sl}_2\) with a much more versatile structure: the Lie algebra of \(\mathfrak {g}\)-valued adèles associated with a Lie algebra \(\mathfrak {g}\). This algebra is presented in [STS2] as the relevant structure to implement the second observation above. By doing so in our context, we build a “universal” generating Lagrangian multiform which is capable of describing a large class of ultralocal integrable hierarchies and we provide a large variety of examples.

In a nutshell, for a matrix Lie algebra \(\mathfrak {g}\), the Lie algebra of \(\mathfrak {g}\)-valued adèles is defined as

$$\begin{aligned} \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}) {:}{=}\coprod _{a \in \mathbb {C}P^1} \mathfrak {g}\otimes \mathbb {C}(\hspace{-1.99997pt}( \lambda _a )\hspace{-1.99997pt}), \end{aligned}$$

where \(\lambda _a=\lambda -a\) for \(a\in \mathbb {C}\) and \(\lambda _\infty =\frac{1}{\lambda }\) are the local series expansion parameters. An element \(\varvec{X}(\varvec{\lambda }) = (X^a(\lambda _a))_{a \in \mathbb {C}P^1}\) of this algebra is a tuple in which all but finitely many of the formal Laurent series \(X^a(\lambda _a) \in \mathfrak {g}\otimes \mathbb {C}(\hspace{-1.99997pt}( \lambda _a )\hspace{-1.99997pt})\) are Taylor series in \(\lambda _a\), i.e. there exists a finite subset \(S \subset \mathbb {C}P^1\) such that \(X^a(\lambda _a) \in \mathfrak {g}\otimes \mathbb {C}\llbracket \lambda _a \rrbracket \) for every \(a \in \mathbb {C}{\setminus } S\). Let \(R_\lambda (\mathfrak {g})\) denote the Lie algebra of \(\mathfrak {g}\)-valued rational functions in the formal variable \(\lambda \) and define the map

$$\begin{aligned} \varvec{\iota }_{\varvec{\lambda }}: R_\lambda (\mathfrak {g}) \longrightarrow \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}), \quad f \longmapsto (\iota _{\lambda _a} f)_{a \in \mathbb {C}P^1} \end{aligned}$$
(1.13)

where \(\iota _{\lambda _a} f \in \mathfrak {g}\otimes \mathbb {C}(\hspace{-1.99997pt}( \lambda _a )\hspace{-1.99997pt})\) is the Laurent expansion of \(f\in R_\lambda (\mathfrak {g})\) at \(a \in \mathbb {C}P^1\). Using certain solutions of the CYBE, it is possible to obtain a direct sum decomposition of this Lie algebra into two maximally isotropic Lie subalgebras

$$\begin{aligned} \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}) = \varvec{\mathcal {A}}^+_{\varvec{\lambda }}(\mathfrak {g}) \dotplus \varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g}). \end{aligned}$$
(1.14)

We can also define a group \(\varvec{\mathcal {A}}^+_{\varvec{\lambda }}(G)\) associated to \(\varvec{\mathcal {A}}^+_{\varvec{\lambda }}(\mathfrak {g})\). If \(\mu \) is another formal variable, we can work with double formal series locally in \(\lambda _a\) and \(\mu _b\), \(a,b\in \mathbb {C}P^1\) and consider tuples of the form \(\varvec{X}(\varvec{\lambda },\varvec{\mu })=(X^{a,b}(\lambda _a,\mu _b))_{a,b \in \mathbb {C}P^1}\).

Thanks to this adèlic framework, we can retain the power of the algebraic formulation of formal power series while working locally around arbitrary points in \(\mathbb {C}P^1\). We introduce the following generalisation of (1.9) which realises the above three observations

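$$\begin{aligned} \varvec{\mathscr {L}}(\varvec{\lambda },\varvec{\mu }) \,{:}{=}\, {\textbf {K}}(\varvec{\lambda },\varvec{\mu }) - {\textbf {U}}(\varvec{\lambda },\varvec{\mu }), \end{aligned}$$
(1.15)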

where the kinetic and potential terms are given by

$$\begin{aligned} {} {\textbf {K}}(\varvec{\lambda },\varvec{\mu })&\,{:}{=}\, {{\,\text {Tr}\,}}\big ( \varvec{\phi }(\varvec{\lambda })^{-1} \mathcal {D}_{\varvec{\mu }} \varvec{\phi }(\varvec{\lambda }) (\varvec{\iota }_{\varvec{\lambda }} F(\lambda ))_- \big ) - {{\,\text {Tr}\,}}\big (\varvec{\phi }(\varvec{\mu })^{-1} \mathcal {D}_{\varvec{\lambda }} \varvec{\phi }(\varvec{\mu }) (\varvec{\iota }_{\varvec{\mu }} F(\mu ))_- \big ), \end{aligned}$$
(1.16a)
$$\begin{aligned} {\textbf {U}}(\varvec{\lambda },\varvec{\mu })&\,{:}{=}\, \tfrac{1}{2} {{\,\text {Tr}\,}}_{12}\big ( (\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }})r_{12}(\lambda ,\mu ) {{\textbf {Q}}}_1(\varvec{\lambda }) {{\textbf {Q}}}_2(\varvec{\mu })\big ). \end{aligned}$$
(1.16b)

Here \(\varvec{\phi }(\varvec{\lambda })\) is an element of the group \(\varvec{\mathcal {A}}^+_{\varvec{\lambda }}(G)\), \(\varvec{Q}(\varvec{\lambda })= \varvec{\phi }(\varvec{\lambda }) \big ( \varvec{\iota }_{\varvec{\lambda }} F(\lambda ) \big )_- \varvec{\phi }(\varvec{\lambda })^{-1}\) is an element of \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\), where \(\big ( \varvec{\iota }_{\varvec{\lambda }} F(\lambda ) \big )_- = \big ( F^a(\lambda _a)_- \big )_{a \in \mathbb {C}P^1}\) is the collection of principal parts of a \(\mathfrak {g}\)-valued rational function \(F(\lambda )\in R_\lambda (\mathfrak {g})\). In terms of components of the tuples, we have

$$\begin{aligned} \mathscr {L}^{a,b}(\lambda _a, \mu _b)= & {} {{\,\textrm{Tr}\,}}\big ( \phi ^a(\lambda _a)^{-1} \mathcal {D}_{\mu _b} \phi ^a(\lambda _a) F^a(\lambda _a)_- \big )- {{\,\textrm{Tr}\,}}\big (\phi ^b(\mu _b)^{-1} \mathcal {D}_{\lambda _a} \phi ^b(\mu _b) F^b(\mu _b)_- \big )\\{} & {} - \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{12}\big ( (\iota _{\lambda _a} \iota _{\mu _b} + \iota _{\mu _b}\iota _{\lambda _a})r_{12}(\lambda ,\mu )Q^a_1(\lambda _a)Q^b_2(\mu _b)\big ), \end{aligned}$$

for every \(a,b \in \mathbb {C}P^1\). The operator \(\mathcal {D}_{\varvec{\lambda }} \,{:}{=}\, (\mathcal {D}_{\lambda _a})_{a \in \mathbb {C}P^1}\) denotes the \(\mathbb {C}P^1\)-tuple of formal operators \(\mathcal {D}_{\lambda _a}\) which contain the partial differential operators \(\partial _{t^a_n}\) (see (3.6)). The times \(t_n^a\) will be the times of the integrable hierarchies we describe. The rational function \(r_{12}(\lambda ,\mu )\) is the classical r-matrix defining the type of ultralocal hierarchies we consider (e.g. rational or trigonometric) and corresponds to the r-matrix yielding the decomposition (1.14).

Main results

  1.

    We show that the generating Lax equation

    $$\begin{aligned} \mathcal {D}_{\varvec{\mu }} \varvec{Q}_1(\varvec{\lambda }) = \big [ {{\,\textrm{Tr}\,}}_2 \big ( \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }) \big ), \varvec{Q}_1(\varvec{\lambda }) \big ] \end{aligned}$$
    (1.17)

    is variational. It arises as the multiform EL equations associated to our generating Lagrangian multiform (1.15). This is the content of Theorem 3.12. This generalises the analogous result first obtained in [SNC] in the context of the Zakharov–Mikhailov models [ZM1]. The generating Lax equation plays here for field theories a role similar to the traditional Lax equation for finite dimensional systems. This is explained in Sect. 3.1. We relate it to a generating zero curvature equation which is shown to hold as a consequence of the CYBE for the r-matrix appearing in (1.15).

  2.

    We relate for the first time the CYBE with the relatively recent notion of Lagrangian multiforms. The closure relation (1.6) in generating form, i.e. the closure relation for (1.15), is shown to be a consequence of the CYBE for the r-matrix appearing in (1.15), see Theorem 3.13. On the one hand, this provides a variational interpretation of the CYBE, a fundamental equation that has only been introduced and studied from a Hamiltonian point of view so far. On the other hand, given the importance of the CYBE as a criterion for classical integrability, this further establishes the Lagrangian multiform approach as a variational criterion for integrability.

  3.

    Specialising the generating Lagrangian multiform (1.15), we recover known examples of integrable hierarchies and produce several new examples. We also introduce an easy method for coupling hierarchies together.

Plan of the paper

In Sect. 2, we introduce the Lie algebra of \(\mathfrak {g}\)-valued adèles and establish its decomposition into two complementary maximally isotropic Lie subalgebras which allows us to introduce the classical r-matrix of interest via the corresponding projectors onto the Lie subalgebras. This generalises to the adèlic case the well-known interpretation of a classical r-matrix as a difference of projectors. This is done explicitly for the rational and trigonometric cases. Section 3 introduces the main elements of our framework: we state the generalisation of the generating FNR equations (1.11), which we call the generating Lax equation, taking into account the above observations. Its properties are directly connected to the CYBE. Then we introduce the generating Lagrangian multiform that produces the generating Lax equation as its multiform EL equations. Again, its properties, in particular the closure relation, are shown to be a direct consequence of the CYBE. The subsequent Sects. 4–6 are devoted to examples. Several were known previously, and these are used to show how our framework contains them naturally, e.g. the AKNS hierarchy and the sine-Gordon hierarchy. For the latter example, we explain in detail how the first few known Lagrangian coefficients are recovered but without the problem of alien derivatives. Other examples, such as the trigonometric Zakharov–Mikhailov hierarchy, are new. For the recently introduced deformed Gross–Neveu models, the new feature brought in by our construction is that they are naturally embedded into an integrable hierarchy. Various conclusions and discussions are presented in Sect. 8. Finally, we recall in an appendix the relationship between the trigonometric r-matrix used in this paper and the more familiar r-matrix of the sine-Gordon model.

2 Lie Algebra of \(\mathfrak {g}\)-valued Adèles

2.1 General setup

Let \(N \in \mathbb {Z}_{\ge 1}\) and consider either the Lie algebra \(\mathfrak {gl}_N\) of all \(N \times N\) matrices with complex entries or its Lie subalgebra \(\mathfrak {sl}_N\) of traceless matrices. We will treat both of these cases in parallel, using the common notation \(\mathfrak {g}\) throughout. The generalisation to other matrix Lie algebras is straightforward but for simplicity we shall restrict to these two cases. We also denote by G the associated Lie group which corresponds either to the general linear group \(GL_N\) of invertible \(N \times N\) matrices or to its Lie subgroup \(SL_N\) of matrices with determinant 1.

We use the trace \({{\,\textrm{Tr}\,}}: \mathfrak {gl}_N \rightarrow \mathbb {C}\) to endow the Lie algebra \(\mathfrak {g}\) with the non-degenerate invariant symmetric bilinear form \(\mathfrak {g}\times \mathfrak {g}\rightarrow \mathbb {C}\) given by \((X, Y) \mapsto {{\,\textrm{Tr}\,}}(XY)\). Let \(P_{12}\) be the tensor Casimir of \(\mathfrak {g}\) with the property that \({{\,\textrm{Tr}\,}}_2(P_{12} X_2) = X\) for any \(X \in \mathfrak {g}\). Explicitly, for \(\mathfrak {gl}_N\) it is given by \(P_{12} = \sum _{i,j=1}^N E_{ij} \otimes E_{ji}\) where \(E_{ij}\) for \(i,j=1, \ldots , N\) is the standard basis of \(\mathfrak {gl}_N\). Similarly, for \(\mathfrak {sl}_N\) we can write \(P_{12} = \sum _a I_a \otimes I^a\) where \(\{ I_a \}\) and \(\{ I^a \}\) are dual bases of \(\mathfrak {sl}_N\) with respect to the above bilinear form. For clarity, let us also recall that the notation \(X_2\) means \(\varvec{1}\otimes X\) and the notation \({{\,\textrm{Tr}\,}}_2(\dots )\) means that we apply the trace only in the second tensor factor.
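The defining property \({{\,\textrm{Tr}\,}}_2(P_{12} X_2) = X\) is easy to test numerically. The sketch below does so for \(N=3\) with exact rational arithmetic, storing an element of \(\textrm{End}(\mathbb {C}^N)\otimes \textrm{End}(\mathbb {C}^N)\) as a four-index array of coefficients in the basis \(E_{ij}\otimes E_{kl}\). For \(\mathfrak {sl}_N\) it assumes the standard explicit form \(P_{12}-\frac{1}{N}\varvec{1}\otimes \varvec{1}\) of \(\sum _a I_a \otimes I^a\), which is not spelled out in the text. All function names are hypothetical.

```python
from fractions import Fraction

N = 3  # matrix size; an arbitrary choice for this sketch

IDENTITY = [[int(i == j) for j in range(N)] for i in range(N)]


def tensor(A, B):
    """A (x) B, stored as T[i][j][k][l] = coefficient of E_ij (x) E_kl."""
    return [[[[Fraction(A[i][j]) * B[k][l] for l in range(N)] for k in range(N)]
             for j in range(N)] for i in range(N)]


def casimir_gl():
    """P_12 = sum_{i,j} E_ij (x) E_ji for gl_N."""
    T = tensor([[0] * N for _ in range(N)], IDENTITY)  # all-zero tensor
    for i in range(N):
        for j in range(N):
            T[i][j][j][i] = Fraction(1)
    return T


def casimir_sl():
    """Assumed form for sl_N: the gl_N Casimir minus (1/N) 1 (x) 1."""
    P, one = casimir_gl(), tensor(IDENTITY, IDENTITY)
    return [[[[P[i][j][k][l] - one[i][j][k][l] / N for l in range(N)]
              for k in range(N)] for j in range(N)] for i in range(N)]


def tensor_mul(S, T):
    """Product in End (x) End: (A (x) B)(C (x) D) = AC (x) BD, extended linearly."""
    return [[[[sum(S[i][m][k][n] * T[m][j][n][l] for m in range(N) for n in range(N))
               for l in range(N)] for k in range(N)] for j in range(N)] for i in range(N)]


def trace_2(T):
    """Trace over the second tensor factor only."""
    return [[sum(T[i][j][k][k] for k in range(N)) for j in range(N)] for i in range(N)]


def check(P, X):
    """True when Tr_2(P_12 X_2) equals X, with X_2 = 1 (x) X."""
    return trace_2(tensor_mul(P, tensor(IDENTITY, X))) == X


X_gl = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]   # generic element of gl_3
X_sl = [[1, 2, 3], [4, 5, 6], [7, 8, -6]]   # traceless, so an element of sl_3
print(check(casimir_gl(), X_gl), check(casimir_sl(), X_sl))
```

The \(\mathfrak {sl}_N\) Casimir reproduces only traceless matrices: applied to a matrix with non-zero trace it returns its traceless part instead.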

Let \(\lambda \) be a formal variable. For any \(a \in \mathbb {C}\) we define the formal local coordinate around a as \(\lambda _a {:}{=}\lambda - a\) and to the point at infinity we associate the formal local coordinate \(\lambda _\infty {:}{=}\lambda ^{-1}\). We consider the Lie algebra of \(\mathfrak {g}\)-valued adèles defined as

$$\begin{aligned} \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}) {:}{=}\coprod _{a \in \mathbb {C}P^1} \mathfrak {g}\otimes \mathbb {C}(\hspace{-1.99997pt}( \lambda _a )\hspace{-1.99997pt}). \end{aligned}$$

Its elements consist of tuples \(\varvec{X}(\varvec{\lambda }) = (X^a(\lambda _a))_{a \in \mathbb {C}P^1}\) with all but finitely many of the formal Laurent series \(X^a(\lambda _a) \in \mathfrak {g}\otimes \mathbb {C}(\hspace{-1.99997pt}( \lambda _a )\hspace{-1.99997pt})\) being Taylor series, i.e. there exists a finite subset \(S \subset \mathbb {C}P^1\) such that \(X^a(\lambda _a) \in \mathfrak {g}\otimes \mathbb {C}\llbracket \lambda _a \rrbracket \) for every \(a \in \mathbb {C}{\setminus } S\). The Lie bracket of two elements \(\varvec{X}(\varvec{\lambda }) = (X^a(\lambda _a))_{a \in \mathbb {C}P^1}\) and \(\varvec{Y}(\varvec{\lambda }) = (Y^a(\lambda _a))_{a \in \mathbb {C}P^1}\) in \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) is defined component-wise, as

$$\begin{aligned}{}[\varvec{X}(\varvec{\lambda }), \varvec{Y}(\varvec{\lambda })] = \big ( [X^a(\lambda _a), Y^a(\lambda _a)] \big )_{a \in \mathbb {C}P^1}. \end{aligned}$$

Let \(R_\lambda \) denote the algebra of rational functions in the formal variable \(\lambda \). The Laurent expansion of a rational function \(f \in R_\lambda \) at any \(a \in \mathbb {C}P^1\) defines a homomorphism

$$\begin{aligned} \iota _{\lambda _a}: R_\lambda \longrightarrow \mathbb {C}(\hspace{-1.99997pt}( \lambda _a )\hspace{-1.99997pt}), \quad f \longmapsto \iota _{\lambda _a} f. \end{aligned}$$
(2.1)
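To fix ideas, here is a worked example of the maps (2.1) for the simplest non-trivial rational function, \(f(\lambda )=\frac{1}{\lambda -c}\) with \(c\in \mathbb {C}\) (a choice made purely for illustration):

$$\begin{aligned} \iota _{\lambda _c} f = \lambda _c^{-1}, \qquad \iota _{\lambda _a} f = \sum _{n=0}^\infty \frac{(-1)^n}{(a-c)^{n+1}}\,\lambda _a^{n} \quad (a\in \mathbb {C},\ a\ne c), \qquad \iota _{\lambda _\infty } f = \sum _{n=0}^\infty c^{n}\,\lambda _\infty ^{n+1}. \end{aligned}$$

Only the expansion at \(a=c\) has a non-trivial principal part, so the tuple of all expansions is a Taylor series at every point except c, and its component at infinity has no constant term.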

We will consider two possible non-degenerate invariant bilinear forms on the Lie algebra \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\), namely

$$\begin{aligned} \langle \!\langle \cdot , \cdot \rangle \!\rangle _k: \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}) \times \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}) \longrightarrow \mathbb {C}\end{aligned}$$
(2.2a)

for \(k=0\) and \(k = -1\), defined as

$$\begin{aligned} \langle \!\langle \varvec{X}(\varvec{\lambda }), \varvec{Y}(\varvec{\lambda }) \rangle \!\rangle _k {:}{=}\sum _{a \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\lambda _a {{\,\textrm{Tr}\,}}\big ( X^a(\lambda _a) Y^a(\lambda _a) \big ) \lambda ^k d\lambda , \end{aligned}$$
(2.2b)

for any \(\varvec{X}(\varvec{\lambda }) = (X^a(\lambda _a))_{a \in \mathbb {C}P^1}\) and \(\varvec{Y}(\varvec{\lambda }) = (Y^a(\lambda _a))_{a \in \mathbb {C}P^1}\). Strictly speaking, the rational function \(\lambda ^k\) on the right hand side of (2.2b) should be expanded at \(a \in \mathbb {C}P^1\), namely we should write \(\iota _{\lambda _a} \lambda ^k\) instead of \(\lambda ^k\). In order to simplify the notation, such expansions will always be implicit when taking residues. Here, for any \(a \in \mathbb {C}P^1\), the residue \({{\,\textrm{res}\,}}^\lambda _a: \mathbb {C}(\hspace{-1.99997pt}( \lambda _a )\hspace{-1.99997pt}) d\lambda _a \rightarrow \mathbb {C}\) returns the coefficient of \(\lambda _a^{-1} d\lambda _a\). For computing the residue at infinity we note that \(d\lambda = - \lambda _\infty ^{-2} d\lambda _\infty \). Note that only finitely many terms in the sum in (2.2b) are non-zero by definition of \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\).

Let \(R_\lambda (\mathfrak {g}) {:}{=}\mathfrak {g}\otimes R_\lambda \) denote the Lie algebra of \(\mathfrak {g}\)-valued rational functions in the formal variable \(\lambda \). We have an embedding of Lie algebras

$$\begin{aligned} \varvec{\iota }_{\varvec{\lambda }}: R_\lambda (\mathfrak {g}) \longrightarrow \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}), \quad f \longmapsto (\iota _{\lambda _a} f)_{a \in \mathbb {C}P^1} \end{aligned}$$
(2.3)

where \(\iota _{\lambda _a} f \in \mathfrak {g}\otimes \mathbb {C}(\hspace{-1.99997pt}( \lambda _a )\hspace{-1.99997pt})\) is the Laurent expansion of \(f \in R_\lambda (\mathfrak {g})\) at \(a \in \mathbb {C}P^1\) in the second tensor factor, as in (2.1). The Lie subalgebra \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g}) \subset \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) is maximally isotropic with respect to \(\langle \!\langle \cdot , \cdot \rangle \!\rangle _k\), for any \(k \in \mathbb {Z}\), by the strong residue theorem; see for instance [Ta, Corollary 1].
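The isotropy statement can be seen at work in a minimal example. As a sketch, take \(k=0\), \(f=A\otimes \lambda ^{-1}\) and \(g=B\otimes 1\) with \(A,B\in \mathfrak {g}\) arbitrary (choices made only for illustration). The one-form \({{\,\textrm{Tr}\,}}(fg)\,d\lambda ={{\,\textrm{Tr}\,}}(AB)\,\lambda ^{-1}d\lambda \) is regular at every \(a\in \mathbb {C}{\setminus }\{0\}\), while at infinity \(\lambda ^{-1}d\lambda =-\lambda _\infty ^{-1}d\lambda _\infty \), so that

$$\begin{aligned} \langle \!\langle \varvec{\iota }_{\varvec{\lambda }} f, \varvec{\iota }_{\varvec{\lambda }} g \rangle \!\rangle _0 = \underbrace{{{\,\textrm{Tr}\,}}(AB)}_{a=0} + \underbrace{\big (-{{\,\textrm{Tr}\,}}(AB)\big )}_{a=\infty } = 0. \end{aligned}$$

The contributions of the two poles cancel, as the residue theorem requires for any pair of rational functions.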

In the remainder of this section we will describe two possible complementary Lie algebras to \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\) in \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\), which are maximally isotropic with respect to \(\langle \!\langle \cdot , \cdot \rangle \!\rangle _0\) and \(\langle \!\langle \cdot , \cdot \rangle \!\rangle _{-1}\), respectively. These two main examples, which can be found for instance in [Dr2, Example 4], correspond to the rational r-matrix and the trigonometric r-matrix, respectively.

Notation We will generally use boldface to denote \(\mathbb {C}P^1\)-tuples. For instance, given any \(n \in \mathbb {Z}\) we will write \(\varvec{\lambda }^n \varvec{X}(\varvec{\lambda })\) for the element \((\lambda _a^n X^a(\lambda _a))_{a \in \mathbb {C}P^1} \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) of the Lie algebra of \(\mathfrak {g}\)-valued adèles. More generally, we would write \(\varvec{\lambda }^n \varvec{X}(\varvec{\lambda }) d \varvec{\lambda }\) as a shorthand for the \(\mathbb {C}P^1\)-tuple \((\lambda _a^n X^a(\lambda _a) d\lambda _a)_{a \in \mathbb {C}P^1}\). Note, crucially, that although \(d\lambda _a = d\lambda \) for all \(a \in \mathbb {C}\), we have \(d\lambda _\infty = - \lambda ^{-2} d\lambda \) for the point at infinity. Therefore the two expressions \(\varvec{\lambda }^n \varvec{X}(\varvec{\lambda }) d \varvec{\lambda }\) and \(\varvec{\lambda }^n \varvec{X}(\varvec{\lambda }) d \lambda \) subtly differ only in the component at infinity. If \(\mu \) is another formal variable then \(\varvec{\mu }\) will denote a separate \(\mathbb {C}P^1\)-tuple carrying an independent label \(b \in \mathbb {C}P^1\). For instance, we would have

$$\begin{aligned}{}[\varvec{X}(\varvec{\lambda }), \varvec{Y}(\varvec{\mu })] = \big ( \delta _{ab}[X^a(\lambda _a), Y^b(\mu _b)] \big )_{a,b \in \mathbb {C}P^1} \end{aligned}$$

for any \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) and \(\varvec{Y}(\varvec{\mu }) \in \varvec{\mathcal {A}}_{\varvec{\mu }}(\mathfrak {g})\). We will make use of such notation with multiple formal variables extensively from Sect. 3 onwards.
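As an illustrative check of the remark on the component at infinity (the superscript \(\infty \) on a tuple is used here only as ad hoc notation for that component), the relation \(\lambda _\infty = \lambda ^{-1}\) gives

$$\begin{aligned} \big ( \varvec{\lambda }^n \varvec{X}(\varvec{\lambda }) d \varvec{\lambda } \big )^\infty&= \lambda _\infty ^n X^\infty (\lambda _\infty ) d\lambda _\infty , \\ \big ( \varvec{\lambda }^n \varvec{X}(\varvec{\lambda }) d \lambda \big )^\infty&= \lambda _\infty ^n X^\infty (\lambda _\infty ) d\lambda = - \lambda _\infty ^{n-2} X^\infty (\lambda _\infty ) d\lambda _\infty , \end{aligned}$$

so the two components at infinity differ by the factor \(- \lambda _\infty ^{-2}\), while all components at \(a \in \mathbb {C}\) agree.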

2.2 Rational r-matrix

Throughout this section we fix the choice \(k=0\) in the bilinear form (2.2). Consider the Lie subalgebra of \(\mathfrak {g}\)-valued integral adèles

$$\begin{aligned} \varvec{\mathcal {A}}^{\textrm{rat}}_{\varvec{\lambda }}(\mathfrak {g}) {:}{=}\mathfrak {g}\otimes \lambda _\infty \mathbb {C}\llbracket \lambda _\infty \rrbracket \times \coprod _{a \in \mathbb {C}} \mathfrak {g}\otimes \mathbb {C}\llbracket \lambda _a \rrbracket . \end{aligned}$$
(2.4)

Note that we have excluded the constant term from the Taylor series at infinity. We shall also need the corresponding group

$$\begin{aligned} \varvec{\mathcal {A}}^{\textrm{rat}}_{\varvec{\lambda }}(G) {:}{=}{\widehat{G}}_\infty \times \coprod _{a \in \mathbb {C}} {\widehat{G}}_a, \end{aligned}$$
(2.5)

where in the \(GL_N\) case \({\widehat{G}}_a\) consists of all invertible \(N \times N\) matrices with entries in \(\mathbb {C}\llbracket \lambda _a \rrbracket \) while \({\widehat{G}}_\infty \) consists of all invertible \(N \times N\) matrices with off-diagonal entries in \(\lambda _\infty \mathbb {C}\llbracket \lambda _\infty \rrbracket \) and diagonal entries in \(1 + \lambda _\infty \mathbb {C}\llbracket \lambda _\infty \rrbracket \). In the \(SL_N\) case the groups \({\widehat{G}}_a\) for all \(a \in \mathbb {C}P^1\) are defined in the same way but with the added condition that the matrices are of determinant 1.

For later practical purposes, it is convenient to collect the following notations in a definition.

Definition 2.1

Let \(a \in \mathbb {C}\) and \(X^a(\lambda _a) \in \mathfrak {g}\otimes \mathbb {C}(\hspace{-1.99997pt}( \lambda _a )\hspace{-1.99997pt})\) be a Laurent series in \(\lambda _a\) with coefficients in \(\mathfrak {g}\). We shall use the notation

$$\begin{aligned} X^a(\lambda _a)^{\textrm{rat}}_- \in \mathfrak {g}\otimes \lambda _a^{-1} \mathbb {C}[\lambda _a^{-1}] \end{aligned}$$
(2.6a)

to represent the pole part of \(X^a(\lambda _a)\). Similarly, for \(X^\infty (\lambda _\infty ) \in \mathfrak {g}\otimes \mathbb {C}(\hspace{-1.99997pt}( \lambda _\infty )\hspace{-1.99997pt}) = \mathfrak {g}\otimes \mathbb {C}(\hspace{-1.99997pt}( \lambda ^{-1} )\hspace{-1.99997pt})\), we denote by

$$\begin{aligned} X^\infty (\lambda _\infty )^{\textrm{rat}}_- \in \mathfrak {g}\otimes \mathbb {C}[\lambda _\infty ^{-1}] = \mathfrak {g}\otimes \mathbb {C}[\lambda ] \end{aligned}$$
(2.6b)

the pole part of \(X^\infty (\lambda _\infty )\). Note that the constant term in \(\lambda _\infty \) is included around infinity.
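As a quick illustration of Definition 2.1, consider hypothetical coefficients \(A, B, C, D \in \mathfrak {g}\), chosen purely for the sake of the example. The two notions of pole part then read

$$\begin{aligned} X^a(\lambda _a) = A \lambda _a^{-2} + B \lambda _a^{-1} + C + D \lambda _a + \ldots \quad&\Longrightarrow \quad X^a(\lambda _a)^{\textrm{rat}}_- = A \lambda _a^{-2} + B \lambda _a^{-1}, \\ X^\infty (\lambda _\infty ) = B \lambda _\infty ^{-1} + C + D \lambda _\infty + \ldots \quad&\Longrightarrow \quad X^\infty (\lambda _\infty )^{\textrm{rat}}_- = B \lambda _\infty ^{-1} + C = B \lambda + C. \end{aligned}$$

The constant term \(C\) is thus discarded at a finite point \(a \in \mathbb {C}\) but retained at infinity.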

The Lie subalgebra \(\varvec{\mathcal {A}}^{\textrm{rat}}_{\varvec{\lambda }}(\mathfrak {g}) \subset \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) is clearly maximally isotropic with respect to the bilinear form \(\langle \!\langle \cdot , \cdot \rangle \!\rangle _0\) defined in (2.2). Here we made use of the fact that the constant term was excluded from the Taylor series at infinity in the definition (2.4). It follows that the Lie algebra \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) decomposes as a direct sum of vector spaces

$$\begin{aligned} \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}) = \varvec{\mathcal {A}}^{\textrm{rat}}_{\varvec{\lambda }}(\mathfrak {g}) \dotplus \varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g}) \end{aligned}$$
(2.7)

into complementary Lagrangian (i.e. maximal isotropic) Lie subalgebras. Let \(\pi ^{\textrm{rat}}_\pm \) denote the projections onto \(\varvec{\mathcal {A}}^{\textrm{rat}}_{\varvec{\lambda }}(\mathfrak {g})\) and \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\), respectively, relative to (2.7).
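The role of the excluded constant term at infinity can be made explicit in a short computation. We assume here, consistently with the residues used in the trigonometric case later in this section, that the contribution of the point at infinity to \(\langle \!\langle \varvec{X}(\varvec{\lambda }), \varvec{Y}(\varvec{\lambda }) \rangle \!\rangle _0\) is \({{\,\textrm{res}\,}}^\lambda _\infty {{\,\textrm{Tr}\,}}\big ( X^\infty (\lambda _\infty ) Y^\infty (\lambda _\infty ) \big ) d\lambda \), the residue being the coefficient of \(\lambda _\infty ^{-1} d\lambda _\infty \). Writing \(X^\infty (\lambda _\infty ) = \sum _{n \ge 0} X^\infty _n \lambda _\infty ^n\) and similarly for \(Y^\infty (\lambda _\infty )\), and using \(d\lambda = - \lambda _\infty ^{-2} d\lambda _\infty \), one finds

$$\begin{aligned} {{\,\textrm{res}\,}}^\lambda _\infty {{\,\textrm{Tr}\,}}\big ( X^\infty (\lambda _\infty ) Y^\infty (\lambda _\infty ) \big ) d\lambda = - {{\,\textrm{Tr}\,}}\big ( X^\infty _0 Y^\infty _1 + X^\infty _1 Y^\infty _0 \big ). \end{aligned}$$

This vanishes when \(X^\infty _0 = Y^\infty _0 = 0\), as imposed in (2.4), but not for general Taylor series at infinity.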

Definition 2.2

(Rational r-matrix). Recall the notation \(P_{12}\) for the tensor Casimir of \(\mathfrak {g}\) from Sect. 2.1. The rational r-matrix is defined as the following rational function of the formal variables \(\lambda \) and \(\mu \):

$$\begin{aligned} r_{12}^\textrm{rat}(\lambda ,\mu )=\frac{P_{12}}{\mu -\lambda }. \end{aligned}$$
(2.8)

As is well-known, it is skew-symmetric: \(r_{12}^\textrm{rat}(\lambda ,\mu )=-r_{21}^{\textrm{rat}}(\mu ,\lambda )\). The following result shows that its known connection to projectors associated to the decomposition of a Lie algebra into isotropic Lie subalgebras extends to the present adèles setting.

Proposition 2.3

For any \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\), its projections onto the complementary subalgebras \(\varvec{\mathcal {A}}^\textrm{rat}_{\varvec{\lambda }}(\mathfrak {g})\) and \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\) relative to the direct sum decomposition (2.7) are given respectively by \(\pi ^{\textrm{rat}}_+ \varvec{X}(\varvec{\lambda }) = \big ( (\pi ^{\textrm{rat}}_+ X)^a(\lambda _a) \big )_{a \in \mathbb {C}P^1}\) and \(\pi ^\textrm{rat}_- \varvec{X}(\varvec{\lambda }) = \big ( (\pi ^{\textrm{rat}}_- X)^a(\lambda _a) \big )_{a \in \mathbb {C}P^1}\) where

$$\begin{aligned} (\pi ^{\textrm{rat}}_+ X)^a(\lambda _a)&= \sum _{b \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\mu _b {{\,\textrm{Tr}\,}}_2 \bigg ( \iota _{\mu _b} \iota _{\lambda _a} r_{12}^\textrm{rat}(\lambda ,\mu ) X^b(\mu _b)_2 \bigg ) d\mu , \end{aligned}$$
(2.9a)
$$\begin{aligned} (\pi ^{\textrm{rat}}_- X)^a(\lambda _a)&= - \sum _{b \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\mu _b {{\,\textrm{Tr}\,}}_2 \bigg ( \iota _{\lambda _a} \iota _{\mu _b} r_{12}^{\textrm{rat}}(\lambda ,\mu ) X^b(\mu _b)_2 \bigg ) d\mu . \end{aligned}$$
(2.9b)

Proof

Let \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\). We consider, to begin with, its projection onto \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\). The \(\mathfrak {g}\)-valued rational function in \(R_\lambda (\mathfrak {g})\) constructed out of the pole parts of the collection of Laurent series in \(\varvec{X}(\varvec{\lambda })\) is given by

$$\begin{aligned}&\sum _{b \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\mu _b {{\,\textrm{Tr}\,}}_2 \bigg ( \iota _{\mu _b} \frac{P_{12}}{\lambda - \mu } X^b(\mu _b)_2 \bigg ) d\mu = \sum _{b \in \mathbb {C}} \sum _{n = 0}^\infty {{\,\textrm{res}\,}}^\mu _b \frac{\mu _b^n}{\lambda _b^{n+1}} X^b(\mu _b) d\mu \\&\quad - \sum _{n = 0}^\infty {{\,\textrm{res}\,}}^\mu _\infty \frac{\lambda ^n}{\mu ^{n+1}} X^\infty (\mu _\infty ) d\mu = \sum _{b \in \mathbb {C}P^1} X^b(\lambda _b)^{\textrm{rat}}_- \end{aligned}$$

where in the first equality we took the trace and split the term at \(b=\infty \) from the rest of the sum over \(b \in \mathbb {C}P^1\). The expression (2.9b) is then obtained by taking the Laurent series expansion of this rational function at each \(a \in \mathbb {C}P^1\), corresponding to applying the map (2.3).

Consider now the projection of \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) onto \(\varvec{\mathcal {A}}^{\textrm{rat}}_{\varvec{\lambda }}(\mathfrak {g})\). Note that for any \(a \in \mathbb {C}\) we have

$$\begin{aligned}&\sum _{b \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\mu _b {{\,\textrm{Tr}\,}}_2 \bigg ( \iota _{\mu _b} \iota _{\lambda _a} \frac{P_{12}}{\mu - \lambda } X^b(\mu _b)_2 \bigg ) d\mu = \sum _{n=0}^\infty \lambda _a^n \sum _{b \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\mu _b \iota _{\mu _b} \mu _a^{-n-1} X^b(\mu _b) d\mu . \end{aligned}$$

If \(\varvec{X}(\varvec{\lambda }) \in \varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\), say \(\varvec{X}(\varvec{\lambda }) = \varvec{\iota }_{\varvec{\lambda }} f(\lambda )\) for some \(f(\lambda ) \in R_\lambda (\mathfrak {g})\), then the above vanishes at each order in the \(\lambda _a\)-expansion by the residue theorem. Indeed, the coefficient of \(\lambda _a^n\) is given by the sum of all the residues of \((\mu - a)^{-n-1} f(\mu ) d\mu \). On the other hand, if \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}^{\textrm{rat}}_{\varvec{\lambda }}(\mathfrak {g})\) then the only term contributing to the sum over \(b \in \mathbb {C}P^1\) is the term for \(b=a\) which is equal to \(X^a(\lambda _a)\). The same statements hold for \(a = \infty \) and hence the result follows. \(\square \)
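To see Proposition 2.3 at work in the simplest situation, consider the following sketch, in which the point \(c \in \mathbb {C}\) and the coefficient \(A \in \mathfrak {g}\) are hypothetical choices made only for illustration. Let \(\varvec{X}(\varvec{\lambda })\) have the single non-zero component \(X^c(\lambda _c) = A \lambda _c^{-1}\). Then \(\pi ^{\textrm{rat}}_- \varvec{X}(\varvec{\lambda }) = \varvec{\iota }_{\varvec{\lambda }} \big ( A / (\lambda - c) \big )\), and the components of \(\pi ^{\textrm{rat}}_+ \varvec{X}(\varvec{\lambda }) = \varvec{X}(\varvec{\lambda }) - \pi ^{\textrm{rat}}_- \varvec{X}(\varvec{\lambda })\) are

$$\begin{aligned} (\pi ^{\textrm{rat}}_+ X)^c(\lambda _c)&= 0, \\ (\pi ^{\textrm{rat}}_+ X)^a(\lambda _a)&= - \iota _{\lambda _a} \frac{A}{\lambda - c} = \sum _{n=0}^\infty \frac{A \, \lambda _a^n}{(c-a)^{n+1}} \quad \text {for } a \in \mathbb {C} {\setminus } \{ c \}, \\ (\pi ^{\textrm{rat}}_+ X)^\infty (\lambda _\infty )&= - \iota _{\lambda _\infty } \frac{A}{\lambda - c} = - \sum _{n=0}^\infty A \, c^n \lambda _\infty ^{n+1}. \end{aligned}$$

Each component is a Taylor series, with no constant term at infinity, as required by (2.4). The coefficient of \(\lambda _a^n\) in the second line agrees with the residue of \((\mu - a)^{-n-1} A (\mu - c)^{-1} d\mu \) at \(\mu = c\), in accordance with (2.9a).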

Define the linear operator \(r^{\textrm{rat}} {:}{=}\pi ^{\textrm{rat}}_+ - \pi ^{\textrm{rat}}_-\). It follows from Proposition 2.3 that its kernel is given by

$$\begin{aligned} \bigg ( (\iota _{\mu _b} \iota _{\lambda _a} + \iota _{\lambda _a} \iota _{\mu _b}) \frac{P_{12}}{\mu - \lambda } \bigg )_{a, b \in \mathbb {C}P^1}. \end{aligned}$$
(2.10)

The kernel of the identity operator \(id = \pi ^{\textrm{rat}}_+ + \pi ^{\textrm{rat}}_-\) is similarly given by an expansion of zero (see e.g. [LL, Chap. 2]) since

$$\begin{aligned} \bigg ( (\iota _{\mu _b} \iota _{\lambda _a} - \iota _{\lambda _a} \iota _{\mu _b}) \frac{P_{12}}{\mu - \lambda } d\mu \bigg )_{a, b \in \mathbb {C}P^1} = \big ( P_{12} \delta _{ab} \delta (\lambda _a, \mu _a) d\mu _a \big )_{a, b \in \mathbb {C}P^1} \end{aligned}$$
(2.11)

where we defined

$$\begin{aligned} \delta (\lambda _a, \mu _a) {:}{=}\sum _{n \in \mathbb {Z}} \lambda _a^n \mu _a^{-n-1}. \end{aligned}$$
(2.12)
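For a finite point \(a \in \mathbb {C}\), the identity (2.11) can be checked directly from the two geometric series expansions of \((\mu - \lambda )^{-1} = (\mu _a - \lambda _a)^{-1}\). The following is a sketch of this verification:

$$\begin{aligned} \iota _{\mu _a} \iota _{\lambda _a} \frac{1}{\mu - \lambda }&= \sum _{n = 0}^\infty \lambda _a^n \mu _a^{-n-1}, \qquad \iota _{\lambda _a} \iota _{\mu _a} \frac{1}{\mu - \lambda } = - \sum _{n = 0}^\infty \mu _a^n \lambda _a^{-n-1} = - \sum _{n = - \infty }^{-1} \lambda _a^n \mu _a^{-n-1}, \\ (\iota _{\mu _a} \iota _{\lambda _a} - \iota _{\lambda _a} \iota _{\mu _a}) \frac{1}{\mu - \lambda }&= \sum _{n \in \mathbb {Z}} \lambda _a^n \mu _a^{-n-1} = \delta (\lambda _a, \mu _a). \end{aligned}$$

For \(a \ne b\) the function \((\mu - \lambda )^{-1}\) is regular at \((\lambda , \mu ) = (a, b)\), so both orders of expansion coincide, which accounts for the factor \(\delta _{ab}\). Moreover, for any Laurent series \(X^a(\mu _a) = \sum _n X^a_n \mu _a^n\) we have \({{\,\textrm{res}\,}}^\mu _a \delta (\lambda _a, \mu _a) X^a(\mu _a) d\mu _a = \sum _n X^a_n \lambda _a^n = X^a(\lambda _a)\), as expected for the kernel of the identity.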

Lemma 2.4

Let \(\varvec{X}(\varvec{\mu }) = \big ( X^a(\mu _a) \big )_{a \in \mathbb {C}P^1} \in \varvec{\mathcal {A}}_{\varvec{\mu }}(\mathfrak {g})\) with \(\displaystyle X^a(\mu _a) = \sum _{n=-N_a}^\infty X^a_n \mu _a^n\) for some \(N_a \in \mathbb {Z}\), where \(N_a > 0\) for finitely many \(a \in \mathbb {C}P^1\). For any \(a \in \mathbb {C}\) we have

$$\begin{aligned} \iota _{\mu _a} \frac{X^a(\mu _a)}{\mu - \lambda } = - \sum _{r=-N_a}^\infty \mu _a^r \big ( \lambda _a^{-r-1} X^a(\lambda _a) \big )^{\textrm{rat}}_- \end{aligned}$$

and at infinity we have

$$\begin{aligned} \iota _{\mu _\infty } \frac{X^\infty (\mu _\infty )}{\mu - \lambda } = \sum _{r=-N_\infty }^\infty \mu _\infty ^{r+1} \big ( \lambda _\infty ^{-r} X^\infty (\lambda _\infty ) \big )^{\textrm{rat}}_-. \end{aligned}$$

Proof

First, let \(a \in \mathbb {C}\). Then we have

$$\begin{aligned} \iota _{\mu _a} \frac{X^a(\mu _a)}{\mu - \lambda }&= - \sum _{n= - N_a}^\infty X^a_n \mu _a^n \sum _{s=0}^\infty \mu _a^s \lambda _a^{-s-1} = - \sum _{n=-N_a}^\infty \sum _{r=n}^\infty X^a_n \mu _a^r \lambda _a^{n-r-1}\\&= - \sum _{r=-N_a}^\infty \mu _a^r \sum _{n=-N_a}^r X^a_n \lambda _a^{n-r-1} = - \sum _{r=-N_a}^\infty \mu _a^r \big ( \lambda _a^{-r-1} X^a(\lambda _a) \big )^{\textrm{rat}}_- \end{aligned}$$

where in the second equality we changed variables from s to \(r = s+n\) in the second sum and in the second line we changed the order of the sums.

Consider now the point at infinity. We have

$$\begin{aligned} \iota _{\mu _\infty } \frac{X^\infty (\mu _\infty )}{\mu - \lambda }&= \sum _{n=-N_\infty }^\infty X^\infty _n \mu _\infty ^n \sum _{s=0}^\infty \mu _\infty ^{s+1} \lambda _\infty ^{-s} = \sum _{n=-N_\infty }^\infty \sum _{r=n}^\infty X^\infty _n \mu _\infty ^{r+1} \lambda _\infty ^{n-r}\\&= \sum _{r=-N_\infty }^\infty \mu _\infty ^{r+1} \sum _{n=-N_\infty }^r X^\infty _n \lambda _\infty ^{n-r} = \sum _{r=-N_\infty }^\infty \mu _\infty ^{r+1} \big ( \lambda _\infty ^{-r} X^\infty (\lambda _\infty ) \big )^{\textrm{rat}}_- \end{aligned}$$

where in the second equality we changed variables \(s = r-n\) as before and in the second line we changed the order of the sums. \(\square \)
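As a sanity check of Lemma 2.4 at a finite point \(a \in \mathbb {C}\), take the hypothetical simple-pole series \(X^a(\mu _a) = A \mu _a^{-1}\) with \(A \in \mathfrak {g}\), so that \(N_a = 1\). Both sides can then be computed by hand:

$$\begin{aligned} \iota _{\mu _a} \frac{A \mu _a^{-1}}{\mu - \lambda }&= - A \mu _a^{-1} \sum _{s=0}^\infty \mu _a^s \lambda _a^{-s-1} = - \sum _{r=-1}^\infty \mu _a^r A \lambda _a^{-r-2}, \\ - \sum _{r=-1}^\infty \mu _a^r \big ( \lambda _a^{-r-1} A \lambda _a^{-1} \big )^{\textrm{rat}}_-&= - \sum _{r=-1}^\infty \mu _a^r A \lambda _a^{-r-2}, \end{aligned}$$

where in the second line we used that \(- r - 2 \le -1\) for all \(r \ge -1\), so that \(\lambda _a^{-r-2}\) coincides with its own pole part.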

2.3 Trigonometric r-matrix

Throughout this section we will choose \(k=-1\) in the bilinear form (2.2). We shall also make use of the standard nilpotent subalgebras \(\mathfrak {n}_\pm \) and Borel subalgebras \(\mathfrak {b}_\pm \) of \(\mathfrak {g}\). Explicitly, \(\mathfrak {n}_+\) (resp. \(\mathfrak {n}_-\)) is spanned by \(E_{ij}\) for \(i < j\) (resp. \(i > j\)). In the \(\mathfrak {gl}_N\) case \(\mathfrak {b}_+\) (resp. \(\mathfrak {b}_-\)) is spanned by \(E_{ij}\) for \(i \le j\) (resp. \(i \ge j\)) while in the \(\mathfrak {sl}_N\) case \(\mathfrak {b}_+\) (resp. \(\mathfrak {b}_-\)) is spanned by \(E_{ij}\) for \(i < j\) (resp. \(i > j\)) together with \(E_{ii} - E_{jj}\) for \(i < j\). The Cartan subalgebra \(\mathfrak {h}\) is spanned by \(E_{ii}\) for \(i = 1, \ldots , N\) in the \(\mathfrak {gl}_N\) case and by \(E_{ii} - E_{jj}\) for \(i < j\) in the \(\mathfrak {sl}_N\) case. We have the direct sum decompositions \(\mathfrak {b}_\pm = \mathfrak {h}\oplus \mathfrak {n}_\pm \). We shall also make use of the corresponding subgroups \(N_\pm \), \(B_\pm \) and H in G. For \(GL_N\) these are the groups of unipotent upper/lower-triangular \(N \times N\) matrices, invertible upper/lower-triangular \(N \times N\) matrices and invertible diagonal \(N \times N\) matrices, respectively. For \(SL_N\) we add the condition that the matrices are of determinant 1.

Recall the notation \(P_{12}\) for the tensor Casimir of \(\mathfrak {g}\) from Sect. 2.1. We can split this into three parts as \(P_{12} = P^-_{12} + P^0_{12} + P^+_{12}\) where \(P^\pm _{12} \in \mathfrak {n}_\pm \otimes \mathfrak {n}_\mp \) and \(P^0_{12} \in \mathfrak {h}\otimes \mathfrak {h}\). Explicitly, in the \(\mathfrak {gl}_N\) case these read

$$\begin{aligned} P^+_{12} = \sum _{\begin{array}{c} i,j=1\\ i< j \end{array}}^N E_{ij} \otimes E_{ji}, \quad P^0_{12} = \sum _{i=1}^N E_{ii} \otimes E_{ii}, \quad P^-_{12} = \sum _{\begin{array}{c} i,j=1\\ i < j \end{array}}^N E_{ji} \otimes E_{ij}. \end{aligned}$$

For \(\mathfrak {sl}_N\) the expression for \(P^0_{12}\) is given in terms of dual bases \(\{ u^i \}\) and \(\{ u_i \}\) of the Cartan subalgebra \(\mathfrak {h}\) with respect to the trace bilinear form as \(P^0_{12} = \sum _{i=1}^{N-1} u^i \otimes u_i\). We note that \(P^+_{21} = P^-_{12}\), \(P^0_{21} = P^0_{12}\) and \(P_{21} = P_{12}\). We also define the corresponding projectors \(P^\pm : \mathfrak {g}\rightarrow \mathfrak {n}_\pm \) and \(P^0: \mathfrak {g}\rightarrow \mathfrak {h}\) onto the nilpotent Lie subalgebras \(\mathfrak {n}_\pm \) and the Cartan subalgebra \(\mathfrak {h}\), respectively, given for any \(X \in \mathfrak {g}\) as

$$\begin{aligned} P^\pm X {:}{=}{{\,\textrm{Tr}\,}}_2(P^\pm _{12} X_2), \quad P^0 X {:}{=}{{\,\textrm{Tr}\,}}_2(P^0_{12} X_2), \end{aligned}$$

so that \(id _{\mathfrak {g}} = P^- + P^0 + P^+\).
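For orientation, here is how these objects look in the smallest case \(\mathfrak {gl}_2\). The matrix entries \(x_{ij}\) below are placeholder names introduced only for this example.

$$\begin{aligned} P^+_{12}&= E_{12} \otimes E_{21}, \qquad P^0_{12} = E_{11} \otimes E_{11} + E_{22} \otimes E_{22}, \qquad P^-_{12} = E_{21} \otimes E_{12}, \\ X&= \sum _{i,j=1}^2 x_{ij} E_{ij} \quad \Longrightarrow \quad P^+ X = x_{12} E_{12}, \quad P^0 X = x_{11} E_{11} + x_{22} E_{22}, \quad P^- X = x_{21} E_{21}. \end{aligned}$$

Here we used \({{\,\textrm{Tr}\,}}(E_{ji} X) = x_{ij}\), so that \(P^+\), \(P^0\) and \(P^-\) extract the strictly upper triangular, diagonal and strictly lower triangular parts of \(X\), respectively.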

In the trigonometric setting, the role of the Lie subalgebra \(\varvec{\mathcal {A}}^{\textrm{rat}}_{\varvec{\lambda }}(\mathfrak {g}) \subset \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) in (2.4) will be played by the following alternative Lie subalgebra

$$\begin{aligned} \varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g}) \,{:}{=}\,\varvec{\mathcal B}_{\varvec{\lambda }}^{0, \infty }(\mathfrak {g}) \times \coprod _{a \in \mathbb {C}^\times } \mathfrak {g}\otimes \mathbb {C}\llbracket \lambda _a \rrbracket \end{aligned}$$
(2.13)

where \(\mathbb {C}^\times {:}{=}\mathbb {C}{\setminus } \{0\}\) and

$$\begin{aligned} \varvec{\mathcal B}_{\varvec{\lambda }}^{0, \infty }(\mathfrak {g}) \subset \big ( \mathfrak {b}_+ \oplus \mathfrak {g}\otimes \lambda \mathbb {C}\llbracket \lambda \rrbracket \big ) \times \big ( \mathfrak {b}_- \oplus \mathfrak {g}\otimes \lambda _\infty \mathbb {C}\llbracket \lambda _\infty \rrbracket \big ) \end{aligned}$$

is the Lie subalgebra consisting of pairs of Taylor series \(X^0(\lambda ) = \sum _{n=0}^\infty X^0_n \lambda ^n\) and \(X^\infty (\lambda _\infty ) = \sum _{n=0}^\infty X^\infty _n \lambda _\infty ^n\) with \(X^0_n, X^\infty _n \in \mathfrak {g}\) for all \(n \ge 1\) but with \(X^0_0 \in \mathfrak {b}_+\) and \(X^\infty _0 \in \mathfrak {b}_-\) subject to the constraint \(P^0 X^0_0 = - P^0 X^\infty _0\). We shall also need the corresponding group \(\varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(G)\) defined as follows.

In the \(GL_N\) case we let \({\widehat{B}}_+\) denote the group of all invertible \(N \times N\) matrices with entries below the diagonal in \(\lambda \mathbb {C}\llbracket \lambda \rrbracket \) and entries on or above the diagonal in \(\mathbb {C}\llbracket \lambda \rrbracket \). Likewise, we let \({\widehat{B}}_-\) be the group of all invertible \(N \times N\) matrices with entries on or below the diagonal in \(\mathbb {C}\llbracket \lambda _\infty \rrbracket \) and entries above the diagonal in \(\lambda _\infty \mathbb {C}\llbracket \lambda _\infty \rrbracket \). Concretely, an element of \({\widehat{B}}_+\) can be expanded as a Taylor series \(g(\lambda ) = \sum _{n=0}^\infty g_n \lambda ^n\) with \(g_0\) upper triangular and \(g_n \in \mathfrak {gl}_N\) for \(n \ge 1\), while an element of \({\widehat{B}}_-\) is a Taylor series \(h(\lambda _\infty ) = \sum _{n=0}^\infty h_n \lambda _\infty ^n\) with \(h_0\) lower triangular and \(h_n \in \mathfrak {gl}_N\) for \(n \ge 1\). As usual, in the \(SL_N\) case we define the subgroups \({\widehat{B}}_\pm \) as in the \(GL_N\) case but with the added condition that the matrices are of determinant 1. We then set

$$\begin{aligned} \varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(G)\, {:}{=}\,\varvec{\mathcal B}_{\varvec{\lambda }}^{0, \infty }(G) \times \coprod _{a \in \mathbb {C}^\times } {\widehat{G}}_a \end{aligned}$$
(2.14)

where the first factor is the subgroup \(\varvec{\mathcal B}_{\varvec{\lambda }}^{0, \infty }(GL_N) \subset {\widehat{B}}_+ \times {\widehat{B}}_-\) consisting of pairs of Taylor series \(g^0(\lambda ) = \sum _{n=0}^\infty g^0_n \lambda ^n\) and \(g^\infty (\lambda _\infty ) = \sum _{n=0}^\infty g^\infty _n \lambda _\infty ^n\) with \(g^0_n, g^\infty _n \in \mathfrak {gl}_N\) for all \(n \ge 1\) but where the upper triangular matrix \(g^0_0\) and the lower triangular matrix \(g^\infty _0\) are subject to the constraint \(P^0\,g^0_0 = (P^0\,g^\infty _0)^{-1}\).

Note that for consistency we should really keep denoting the local coordinate at the origin as \(\lambda _0\), following the general notation introduced in Sect. 2.1. However, since \(\lambda _0\) is nothing but \(\lambda \), we will most often prefer to write the local coordinate at the origin simply as \(\lambda \), rather than use the more cumbersome notation \(\lambda _0\).

It will be convenient in what follows to introduce slightly different notions of pole parts of Laurent series at the origin and infinity in the trigonometric case. As they are important in practical calculations, we gather them in the following definition.

Definition 2.5

Given any \(\displaystyle X^0(\lambda ) = \sum _{n=-N_0}^\infty X^0_n \lambda ^n \in \mathfrak {g}\otimes \mathbb {C}(\hspace{-1.99997pt}( \lambda )\hspace{-1.99997pt})\) we define

$$\begin{aligned} X^0(\lambda )^{\textrm{trig}}_- \,{:}{=}\,\big ( P^- + \tfrac{1}{2} P^0 \big ) X^0_0 + X^0(\lambda )^{\textrm{rat}}_- \in \mathfrak {b}_- \oplus \mathfrak {g}\otimes \lambda ^{-1} \mathbb {C}[\lambda ^{-1}]. \end{aligned}$$
(2.15a)

Similarly, for any \(\displaystyle X^\infty (\lambda ^{-1}) = \sum _{n=-N_\infty }^\infty X^\infty _n \lambda ^{-n} \in \mathfrak {g}\otimes \mathbb {C}(\hspace{-1.99997pt}( \lambda ^{-1} )\hspace{-1.99997pt})\) we define

$$\begin{aligned} X^\infty (\lambda _\infty )^{\textrm{trig}}_- \,{:}{=}\, \big ( P^+ + \tfrac{1}{2} P^0 \big ) X^\infty _0 + \sum _{n=-N_\infty }^{-1} X^\infty _n \lambda _\infty ^n \in \mathfrak {b}_+ \oplus \mathfrak {g}\otimes \lambda _\infty ^{-1} \mathbb {C}[\lambda _\infty ^{-1}]. \end{aligned}$$
(2.15b)

Furthermore, for a Laurent series \(\displaystyle X^b(\lambda _b) = \sum _{n=-N_b}^\infty X^b_n \lambda ^n_b \in \mathfrak {g}\otimes \mathbb {C}(\hspace{-1.99997pt}( \lambda _b )\hspace{-1.99997pt})\) at any other point \(b \in \mathbb {C}^\times \) we set

$$\begin{aligned} X^b(\lambda _b)^{\textrm{trig}}_- \,{:}{=}\, - \big ( P^- + \tfrac{1}{2} P^0 \big ) X^b(-b)^{\textrm{rat}}_- + X^b(\lambda _b)^{\textrm{rat}}_- \in \mathfrak {b}_- \oplus \mathfrak {g}\otimes \lambda _b^{-1} \mathbb {C}[\lambda _b^{-1}],\nonumber \\ \end{aligned}$$
(2.15c)

where in the first term \(X^b(-b)^{\textrm{rat}}_-\) is the pole part \(X^b(\lambda _b)^{\textrm{rat}}_-\) at b evaluated at \(\lambda = 0\). In particular, as compared to the pole part \(X^b(\lambda _b)^{\textrm{rat}}_- \in \mathfrak {g}\otimes \lambda _b^{-1} \mathbb {C}[\lambda _b^{-1}]\) introduced in (2.6a), we note that the pole part \(X^b(\lambda _b)^{\textrm{trig}}_-\) includes a constant term (provided that \(( P^- + \tfrac{1}{2} P^0) X^b(-b)^{\textrm{rat}}_- \ne 0\)) which, moreover, is valued in \(\mathfrak {b}_-\).
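To illustrate (2.15c), consider the hypothetical simple-pole series \(X^b(\lambda _b) = A \lambda _b^{-1}\) at some \(b \in \mathbb {C}^\times \), with \(A \in \mathfrak {g}\) chosen only for the example. Since \(\lambda _b = -b\) at \(\lambda = 0\), we have \(X^b(-b)^{\textrm{rat}}_- = - A/b\) and therefore

$$\begin{aligned} X^b(\lambda _b)^{\textrm{trig}}_-&= \frac{1}{b} \big ( P^- + \tfrac{1}{2} P^0 \big ) A + \frac{A}{\lambda - b}, \\ X^b(\lambda _b)^{\textrm{trig}}_- \big |_{\lambda = 0}&= - \frac{1}{b} \big ( P^+ + \tfrac{1}{2} P^0 \big ) A, \qquad X^b(\lambda _b)^{\textrm{trig}}_- \big |_{\lambda = \infty } = \frac{1}{b} \big ( P^- + \tfrac{1}{2} P^0 \big ) A. \end{aligned}$$

In this example the trigonometric pole part is thus a rational function of \(\lambda \) whose value at the origin lies in \(\mathfrak {b}_+\) and whose value at infinity lies in \(\mathfrak {b}_-\), with opposite Cartan components.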

Proposition 2.6

The Lie subalgebra \(\varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g}) \subset \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) is maximally isotropic with respect to \(\langle \!\langle \cdot , \cdot \rangle \!\rangle _{-1}\). Moreover, we have a direct sum of vector spaces

$$\begin{aligned} \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}) = \varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g}) \dotplus \varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g}) \end{aligned}$$
(2.16)

into complementary Lagrangian (i.e. maximal isotropic) Lie subalgebras.

Proof

To see that \(\varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g})\) is isotropic with respect to the bilinear form \(\langle \!\langle \cdot , \cdot \rangle \!\rangle _{-1}\), let \(\varvec{X}(\varvec{\lambda }), \varvec{Y}(\varvec{\lambda }) \in \varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g})\) be arbitrary and consider the pairing \(\langle \!\langle \varvec{X}(\varvec{\lambda }), \varvec{Y}(\varvec{\lambda }) \rangle \!\rangle _{-1}\) as given in (2.2b). There are no contributions from any \(a \in \mathbb {C}^\times \). The only contributions come from 0 and \(\infty \), which read

$$\begin{aligned}&{{\,\textrm{res}\,}}^\lambda _0 {{\,\textrm{Tr}\,}}\big ( X^0(\lambda ) Y^0(\lambda ) \big ) \lambda ^{-1} d\lambda + {{\,\textrm{res}\,}}^\lambda _\infty {{\,\textrm{Tr}\,}}\big ( X^\infty (\lambda _\infty ) Y^\infty (\lambda _\infty ) \big ) \lambda ^{-1} d\lambda \\&\quad = {{\,\textrm{Tr}\,}}(X^0_0 Y^0_0) - {{\,\textrm{Tr}\,}}(X^\infty _0 Y^\infty _0)\\&\quad = {{\,\textrm{Tr}\,}}\big ( P^0(X^0_0) P^0(Y^0_0) \big ) - {{\,\textrm{Tr}\,}}\big ( P^0(X^\infty _0) P^0(Y^\infty _0) \big ) = 0. \end{aligned}$$

In the first equality we wrote \(X^0(\lambda ) = \sum _{n=0}^\infty X^0_n \lambda ^n\), \(X^\infty (\lambda _\infty ) = \sum _{n=0}^\infty X^\infty _n \lambda _\infty ^n\) and similarly for \(Y^0(\lambda )\) and \(Y^\infty (\lambda _\infty )\). The second equality above follows from the fact that \(X^0_0, Y^0_0 \in \mathfrak {b}_+\) and \(X^\infty _0, Y^\infty _0 \in \mathfrak {b}_-\) and the last step makes use of the conditions in the definition of \(\varvec{\mathcal B}_{\varvec{\lambda }}^{0, \infty }(\mathfrak {g})\) that \(P^0 X^0_0 = - P^0 X^\infty _0\) and \(P^0 Y^0_0 = - P^0 Y^\infty _0\). In order to show that \(\varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g})\) is maximally isotropic it suffices to prove the second statement, namely that we have the direct sum decomposition of vector spaces as in (2.16).
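The two steps used above can be spelled out as follows; this is only a sketch of the elementary computations involved. For the residue at infinity, \(\lambda = \lambda _\infty ^{-1}\) gives \(\lambda ^{-1} d\lambda = - \lambda _\infty ^{-1} d\lambda _\infty \), which produces the relative sign. For the reduction to Cartan components, decompose \(X^0_0 = P^0 X^0_0 + P^+ X^0_0\) and \(Y^0_0 = P^0 Y^0_0 + P^+ Y^0_0\) in \(\mathfrak {b}_+ = \mathfrak {h}\oplus \mathfrak {n}_+\), so that

$$\begin{aligned} {{\,\textrm{Tr}\,}}(X^0_0 Y^0_0)&= {{\,\textrm{Tr}\,}}\big ( P^0(X^0_0) P^0(Y^0_0) \big ) + {{\,\textrm{Tr}\,}}\big ( P^0(X^0_0) P^+(Y^0_0) \big ) + {{\,\textrm{Tr}\,}}\big ( P^+(X^0_0) Y^0_0 \big ) \\&= {{\,\textrm{Tr}\,}}\big ( P^0(X^0_0) P^0(Y^0_0) \big ), \end{aligned}$$

since the product of a strictly upper triangular matrix with an upper triangular one has vanishing diagonal. The same argument applies to \(X^\infty _0, Y^\infty _0 \in \mathfrak {b}_-\) with lower triangular matrices.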

To any \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) we associate the rational function

$$\begin{aligned} f_X(\lambda ) = X^0(\lambda )^{\textrm{trig}}_- + \sum _{b \in \mathbb {C}^\times } X^b(\lambda _b)^{\textrm{trig}}_- + X^\infty (\lambda _\infty )^{\textrm{trig}}_- \end{aligned}$$
(2.17)

in \(R_\lambda (\mathfrak {g})\). Consider the element \(\widetilde{\varvec{X}}(\varvec{\lambda }) = ({{\widetilde{X}}}^a(\lambda _a))_{a \in \mathbb {C}P^1}\) defined by

$$\begin{aligned} {{\widetilde{X}}}^a(\lambda _a) \,{:}{=}\, X^a(\lambda _a) - \iota _{\lambda _a} f_X(\lambda ) \end{aligned}$$

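for every \(a \in \mathbb {C}P^1\). Since \(f_X(\lambda )\) has the same singular part (the terms of negative degree in \(\lambda _a\)) as \(X^a(\lambda _a)\) at each point, we have \({{\widetilde{X}}}^a(\lambda _a) \in \mathfrak {g}\otimes \mathbb {C}\llbracket \lambda _a \rrbracket \) for every \(a \in \mathbb {C}P^1\). More precisely, noting that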

$$\begin{aligned} X^b(\lambda _b)^{\textrm{trig}}_- \big |_{\lambda = 0} = \big ( P^+ + \tfrac{1}{2} P^0 \big ) X^b(-b)^{\textrm{rat}}_-, \quad X^\infty (\lambda _\infty )^{\textrm{trig}}_- \big |_{\lambda = 0} = \big ( P^+ + \tfrac{1}{2} P^0 \big ) X^\infty _0 \end{aligned}$$

for every \(b \in \mathbb {C}^\times \), we have, in fact, \(\widetilde{X}^0(\lambda _0) \in \mathfrak {b}_+ \oplus \mathfrak {g}\otimes \lambda \mathbb {C}\llbracket \lambda \rrbracket \) whose leading term in \(\mathfrak {b}_+\) is given by

$$\begin{aligned} \big ( P^+ + \tfrac{1}{2} P^0 \big ) \bigg ( X^0_0 - X^\infty _0 - \sum _{b \in \mathbb {C}^\times } X^b(-b)^{\textrm{rat}}_- \bigg ) \in \mathfrak {b}_+. \end{aligned}$$
(2.18)

Likewise, we have

$$\begin{aligned} X^b(\lambda _b)^{\textrm{trig}}_- \big |_{\lambda = \infty }&= - \big ( P^- + \tfrac{1}{2} P^0 \big ) X^b(-b)^{\textrm{rat}}_-, \\ X^0(\lambda )^{\textrm{trig}}_- \big |_{\lambda = \infty }&= \big ( P^- + \tfrac{1}{2} P^0 \big ) X^0_0 \end{aligned}$$

from which it follows that \({{\widetilde{X}}}^\infty (\lambda _\infty ) \in \mathfrak {b}_- \oplus \mathfrak {g}\otimes \lambda _\infty \mathbb {C}\llbracket \lambda _\infty \rrbracket \) with leading coefficient in \(\mathfrak {b}_-\) given by

$$\begin{aligned} \big ( P^- + \tfrac{1}{2} P^0 \big ) \bigg ( - X^0_0 + X^\infty _0 + \sum _{b \in \mathbb {C}^\times } X^b(-b)^{\textrm{rat}}_- \bigg ) \in \mathfrak {b}_-. \end{aligned}$$
(2.19)

Moreover, comparing the Cartan components of (2.18) and (2.19) we see that these are opposite. Hence we conclude that \(\widetilde{\varvec{X}}(\varvec{\lambda }) \in \varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g})\). In other words,

$$\begin{aligned} \varvec{X}(\varvec{\lambda }) = \widetilde{\varvec{X}}(\varvec{\lambda }) + \varvec{\iota }_{\varvec{\lambda }} f_X(\lambda ) \end{aligned}$$

gives the desired decomposition of a general element \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) as in (2.16).

This decomposition is unique since any element which belongs to both \(\varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g})\) and \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\) must vanish. Indeed, suppose \(f(\lambda ) \in R_\lambda (\mathfrak {g})\) is such that \(\varvec{\iota }_{\varvec{\lambda }} f(\lambda ) \in \varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g})\). Then it is clear from the definition of \(\varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g})\) in (2.13) that \(f(\lambda )\) cannot be singular at any point in \(\mathbb {C}P^1\) and so must be constant. But then it follows from the definition of \(\varvec{\mathcal B}_{\varvec{\lambda }}^{0, \infty }(\mathfrak {g})\) that this constant must in fact be zero. \(\square \)
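As a minimal illustration of the decomposition (2.16), take the hypothetical element \(\varvec{X}(\varvec{\lambda })\) whose only non-zero component is a constant \(X^0(\lambda ) = C \in \mathfrak {g}\) at the origin; the name \(C\) is introduced only for this example. Then (2.17) gives the constant rational function \(f_X(\lambda ) = \big ( P^- + \tfrac{1}{2} P^0 \big ) C\) and

$$\begin{aligned} {{\widetilde{X}}}^0(\lambda )&= C - \big ( P^- + \tfrac{1}{2} P^0 \big ) C = \big ( P^+ + \tfrac{1}{2} P^0 \big ) C \in \mathfrak {b}_+, \\ {{\widetilde{X}}}^\infty (\lambda _\infty )&= - \big ( P^- + \tfrac{1}{2} P^0 \big ) C \in \mathfrak {b}_-, \qquad {{\widetilde{X}}}^a(\lambda _a) = - \big ( P^- + \tfrac{1}{2} P^0 \big ) C \quad \text {for } a \in \mathbb {C}^\times . \end{aligned}$$

The Cartan components of \({{\widetilde{X}}}^0\) and \({{\widetilde{X}}}^\infty \) are \(\tfrac{1}{2} P^0 C\) and \(- \tfrac{1}{2} P^0 C\), so the constraint in the definition of \(\varvec{\mathcal B}_{\varvec{\lambda }}^{0, \infty }(\mathfrak {g})\) is indeed satisfied.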

Definition 2.7

(Trigonometric r-matrix). The trigonometric r-matrix is defined as the following function of two formal variables \(\lambda \) and \(\mu \):

$$\begin{aligned} r^{\textrm{trig}}_{12}(\lambda , \mu ) = \frac{1}{2} \bigg ( P^+_{12} - P^-_{12} + \frac{\mu + \lambda }{\mu - \lambda } P_{12} \bigg ) = \frac{\mu P_{12}}{\mu - \lambda } - P^-_{12} - \tfrac{1}{2} P^0_{12}. \end{aligned}$$
(2.20)
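The equality of the two expressions in (2.20) follows from a one-line manipulation, recorded here for convenience. Using \(P_{12} = P^-_{12} + P^0_{12} + P^+_{12}\) we have

$$\begin{aligned} \frac{\mu + \lambda }{\mu - \lambda } = \frac{2 \mu }{\mu - \lambda } - 1 \quad \Longrightarrow \quad \frac{1}{2} \bigg ( P^+_{12} - P^-_{12} + \frac{\mu + \lambda }{\mu - \lambda } P_{12} \bigg ) = \frac{\mu P_{12}}{\mu - \lambda } + \frac{1}{2} \big ( P^+_{12} - P^-_{12} - P_{12} \big ) = \frac{\mu P_{12}}{\mu - \lambda } - P^-_{12} - \tfrac{1}{2} P^0_{12}. \end{aligned}$$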

It is skew-symmetric, namely we have \(r^{\textrm{trig}}_{21}(\mu , \lambda ) = - r^{\textrm{trig}}_{12}(\lambda , \mu )\). It provides the trigonometric counterpart of the kernel (2.10) for the choice of complement (2.13). Indeed, we have the following analogue of Proposition 2.3 in the trigonometric case.

Proposition 2.8

For any \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\), its projections onto the complementary subalgebras \(\varvec{\mathcal {A}}^\textrm{trig}_{\varvec{\lambda }}(\mathfrak {g})\) and \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\) relative to the direct sum decomposition (2.16) are given respectively by \(\pi ^{\textrm{trig}}_+ \varvec{X}(\varvec{\lambda }){=}\big ( (\pi ^{\textrm{trig}}_+ X)^a(\lambda _a) \big )_{a \in \mathbb {C}P^1}\) and \(\pi ^{\textrm{trig}}_- \varvec{X}(\varvec{\lambda }){=}\big ( (\pi ^{\textrm{trig}}_- X)^a(\lambda _a) \big )_{a \in \mathbb {C}P^1}\) where

$$\begin{aligned} (\pi ^{\textrm{trig}}_+ X)^a(\lambda _a)&= \sum _{b \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\mu _b {{\,\textrm{Tr}\,}}_2 \big ( \iota _{\mu _b} \iota _{\lambda _a} r^{\textrm{trig}}_{12}(\lambda , \mu ) X^b(\mu _b)_2 \big ) \mu ^{-1} d\mu , \end{aligned}$$
(2.21a)
$$\begin{aligned} (\pi ^{\textrm{trig}}_- X)^a(\lambda _a)&= - \sum _{b \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\mu _b {{\,\textrm{Tr}\,}}_2 \big ( \iota _{\lambda _a} \iota _{\mu _b} r^{\textrm{trig}}_{12}(\lambda , \mu ) X^b(\mu _b)_2 \big ) \mu ^{-1} d\mu . \end{aligned}$$
(2.21b)

Proof

We first describe the image of \(\pi ^{\textrm{trig}}_-\) explicitly. Then we show that \(\pi ^{\textrm{trig}}_-\) sends \(\varvec{\mathcal {A}}^\textrm{trig}_{\varvec{\lambda }}(\mathfrak {g})\) to zero and that it acts as the identity on \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\). Hence, \(\pi ^{\textrm{trig}}_-\) is the projection onto \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\) along \(\varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g})\). Similarly, we prove that \(\pi ^{\textrm{trig}}_+\) is the projection onto \(\varvec{\mathcal {A}}^\textrm{trig}_{\varvec{\lambda }}(\mathfrak {g})\) along \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\).

Given any \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\), consider the \(\mathfrak {g}\)-valued rational function

$$\begin{aligned} f_X(\lambda )&= - \sum _{b \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\mu _b {{\,\textrm{Tr}\,}}_2 \big ( \iota _{\mu _b} r^{\textrm{trig}}_{12}(\lambda , \mu ) X^b(\mu _b)_2 \big ) \mu ^{-1} d\mu \\&= \sum _{b \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\mu _b \bigg ( \iota _{\mu _b} \mu ^{-1} \big ( P^- + \tfrac{1}{2} P^0\big )(X^b(\mu _b)) + \iota _{\mu _b} \frac{1}{\lambda - \mu } X^b(\mu _b) \bigg ) d\mu . \end{aligned}$$

We compute the residues at each \(b \in \mathbb {C}^\times \) and then at the origin and infinity. Firstly, for the residue at \(b \in \mathbb {C}^\times \) we find \(X^b(\lambda _b)^{\textrm{trig}}_-\). For the residue at the origin we find \(X^0(\lambda )^{\textrm{trig}}_-\) and, likewise, for the residue at infinity we find

$$\begin{aligned} - \big ( P^- + \tfrac{1}{2} P^0 \big ) X^\infty _0 + X^\infty (\lambda _\infty )^{\textrm{rat}}_- = X^\infty (\lambda _\infty )^\textrm{trig}_-, \end{aligned}$$

where in the first expression we are using the pole part \(X^\infty (\lambda _\infty )^{\textrm{rat}}_- \in \mathfrak {g}\otimes \mathbb {C}[\lambda _\infty ^{-1}]\) defined in (2.6b) of a Laurent series at infinity, and in the second expression we are using the other notion of pole part \(X^\infty (\lambda _\infty )^\textrm{trig}_-\) introduced above in (2.15b). Putting the above together we conclude that \(f_X(\lambda )\) is the rational function (2.17) used in the proof of Proposition 2.6. By construction we have \((\pi ^\textrm{trig}_- X)^a(\lambda _a) = \iota _{\lambda _a} f_X(\lambda )\) for every \(a \in \mathbb {C}P^1\).
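The residue at a point \(b \in \mathbb {C}^\times \) can be checked explicitly on a simple example. For the hypothetical single-pole series \(X^b(\mu _b) = A \mu _b^{-1}\), with \(A \in \mathfrak {g}\) chosen only for illustration, the expansions \(\mu ^{-1} = (b + \mu _b)^{-1} = b^{-1} - b^{-2} \mu _b + \ldots \) and \(\iota _{\mu _b} (\lambda - \mu )^{-1} = \sum _{n \ge 0} \mu _b^n \lambda _b^{-n-1}\) give

$$\begin{aligned} {{\,\textrm{res}\,}}^\mu _b \, \iota _{\mu _b} \mu ^{-1} \big ( P^- + \tfrac{1}{2} P^0 \big ) A \mu _b^{-1} d\mu&= \frac{1}{b} \big ( P^- + \tfrac{1}{2} P^0 \big ) A = - \big ( P^- + \tfrac{1}{2} P^0 \big ) X^b(-b)^{\textrm{rat}}_-, \\ {{\,\textrm{res}\,}}^\mu _b \, \iota _{\mu _b} \frac{1}{\lambda - \mu } A \mu _b^{-1} d\mu&= A \lambda _b^{-1} = X^b(\lambda _b)^{\textrm{rat}}_-, \end{aligned}$$

where we used \(X^b(-b)^{\textrm{rat}}_- = - A/b\). The sum of the two lines reproduces \(X^b(\lambda _b)^{\textrm{trig}}_-\) as defined in (2.15c).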

Now suppose \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}^\textrm{trig}_{\varvec{\lambda }}(\mathfrak {g})\). Clearly \(X^b(\lambda _b)^{\textrm{rat}}_- = 0\), hence also \(X^b(\lambda _b)^{\textrm{trig}}_- = 0\) using the definition (2.15c), so the sum over \(b \in \mathbb {C}^\times \) on the right hand side of (2.17) vanishes. On the other hand, \(X^0(\lambda )^{\textrm{trig}}_- = \tfrac{1}{2} P^0 X^0_0\) and \(X^\infty (\lambda _\infty )^{\textrm{trig}}_- = \tfrac{1}{2} P^0 X^\infty _0\). But since \(P^0 X^0_0 = - P^0 X^\infty _0\) by definition of \(\varvec{X}(\varvec{\lambda })\) belonging to \(\varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g})\), it follows that the remaining two terms in (2.17) cancel. So we have shown that \(\pi ^{\textrm{trig}}_- \varvec{X}(\varvec{\lambda }) = 0\) for any \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}^\textrm{trig}_{\varvec{\lambda }}(\mathfrak {g})\).

Conversely, suppose that \(\varvec{X}(\varvec{\lambda }) = \varvec{\iota }_{\varvec{\lambda }} f(\lambda )\) for some \(f(\lambda ) \in R_\lambda (\mathfrak {g})\). If the latter has a pole at some \(a \in \mathbb {C}^\times \) then its pole part there is given by \(X^a(\lambda _a)^{\textrm{rat}}_-\). If it has a pole at the origin then its pole part there is equal to

$$\begin{aligned} X^0(\lambda )^{\textrm{trig}}_- - \sum _{b \in \mathbb {C}^\times } \big ( P^- + \tfrac{1}{2} P^0 \big ) X^b(-b)^{\textrm{rat}}_- - \big ( P^- + \tfrac{1}{2} P^0 \big ) X^\infty _0, \end{aligned}$$
(2.22)

where \(X^\infty _0\) is the constant term in the expansion of \(f(\lambda )\) at infinity. Indeed, recall from (2.15a) that \(X^0(\lambda )^{\textrm{trig}}_-\) is given by the pole part \(X^0(\lambda )^{\textrm{rat}}_-\) at the origin plus \((P^- + \tfrac{1}{2} P^0) X^0_0\) where \(X^0_0\) is given here by the value at the origin of all the other pole parts of \(f(\lambda )\). This is why we must subtract the latter from \(X^0(\lambda )^{\textrm{trig}}_-\) in (2.22) to be left only with the desired pole part at the origin. Finally, the pole part of \(f(\lambda )\) at infinity is given by

$$\begin{aligned} X^\infty (\lambda _\infty )^{\textrm{trig}}_- + \big ( P^- + \tfrac{1}{2} P^0 \big ) X^\infty _0. \end{aligned}$$
(2.23)

Indeed, the pole part at infinity should contain the constant term but \(X^\infty (\lambda _\infty )^{\textrm{trig}}_-\) only contains part of it. The remaining part is precisely the piece added in (2.23). It now follows that the expression on the right hand side of (2.17) built from \(\varvec{X}(\varvec{\lambda }) = \varvec{\iota }_{\varvec{\lambda }} f(\lambda )\) coincides exactly with the partial fraction decomposition of \(f(\lambda )\). This establishes that \(\pi ^{\textrm{trig}}_- \varvec{X}(\varvec{\lambda }) = \varvec{X}(\varvec{\lambda })\) for any \(\varvec{X}(\varvec{\lambda }) \in \varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\). Together with the vanishing of \(\pi ^{\textrm{trig}}_-\) on \(\varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g})\) shown above, this shows that \(\pi ^{\textrm{trig}}_-\) is indeed the projection onto \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\) along \(\varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g})\).

It remains to consider \(\pi ^{\textrm{trig}}_+\). For any \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) and \(a \in \mathbb {C}\) we have

$$\begin{aligned} (\pi ^{\textrm{trig}}_+ X)^a(\lambda _a)&= - \sum _{b \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\mu _b \Big ( \iota _{\mu _b} \mu ^{-1} \big ( P^- + \tfrac{1}{2} P^0\big )(X^b(\mu _b)) \Big ) d\mu \nonumber \\&\qquad + \sum _{n=0}^\infty \lambda _a^n \sum _{b \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\mu _b \iota _{\mu _b} \mu _a^{-n-1} X^b(\mu _b) d\mu . \end{aligned}$$
(2.24)

If \(\varvec{X}(\varvec{\lambda }) \in \varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\) then the first term on the right hand side vanishes by the residue theorem and the second term likewise at each order in the \(\lambda _a\)-expansion. If instead we consider \(a = \infty \) then

$$\begin{aligned} (\pi ^{\textrm{trig}}_+ X)^\infty (\lambda _\infty )&= - \sum _{b \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\mu _b \Big ( \iota _{\mu _b} \mu ^{-1} \big ( P^- + \tfrac{1}{2} P^0\big )(X^b(\mu _b)) \Big ) d\mu \nonumber \\&\quad - \sum _{n=0}^\infty \lambda ^{-n-1} \sum _{b \in \mathbb {C}P^1} {{\,\textrm{res}\,}}^\mu _b \iota _{\mu _b} \mu ^n X^b(\mu _b) d\mu , \end{aligned}$$
(2.25)

but both terms vanish once again by the residue theorem if \(\varvec{X}(\varvec{\lambda }) \in \varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\). So we deduce that \(\pi ^{\textrm{trig}}_+ \varvec{X}(\varvec{\lambda }) = 0\) for every \(\varvec{X}(\varvec{\lambda }) \in \varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\).

Suppose now that \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}^\textrm{trig}_{\varvec{\lambda }}(\mathfrak {g})\). The first term on the right hand side of (2.24) gets a contribution only from the terms \(b=0\) and \(b=\infty \), which read

$$\begin{aligned}&- {{\,\textrm{res}\,}}^\mu _0 \Big ( \mu ^{-1} \big ( P^- + \tfrac{1}{2} P^0\big )(X^0(\mu )) \Big ) d\mu - {{\,\textrm{res}\,}}^\mu _\infty \Big ( \mu ^{-1} \big ( P^- + \tfrac{1}{2} P^0\big )(X^\infty (\mu _\infty )) \Big ) d\mu \\&\quad = - \big ( P^- + \tfrac{1}{2} P^0\big ) X^0_0 + \big ( P^- + \tfrac{1}{2} P^0\big ) X^\infty _0 = \big ( P^- + P^0\big ) X^\infty _0 = X^\infty _0 \end{aligned}$$

where we wrote \(X^0(\mu ) = \sum _{n=0}^\infty X^0_n \mu ^n\) and \(X^\infty (\mu _\infty ) = \sum _{n=0}^\infty X^\infty _n \mu _\infty ^n\). The second equality above follows since by assumption we have \(X^0_0 \in \mathfrak {b}_+\) so that \(P^- X^0_0 = 0\) and also \(P^0 X^0_0 = - P^0 X^\infty _0\). The third equality also follows since by assumption \(X^\infty _0 \in \mathfrak {b}_-\). The second sum in (2.24) is just as in the rational case, however since the series at infinity now contains a constant term we get a contribution to the sum over \(b \in \mathbb {C}P^1\) from both \(b=a\) and \(b = \infty \), yielding \(X^a(\lambda _a) - X^\infty _0\). So in total, we deduce that \((\pi ^{\textrm{trig}}_+ X)^a(\lambda _a) = X^a(\lambda _a)\) for every \(a \in \mathbb {C}\).
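As a sanity check on the residue at infinity used in the first equality above, the following sketch spells out the change of variable. It assumes that the local coordinate at infinity is \(\mu _\infty = \mu ^{-1}\) and that the residue at infinity is the coefficient of \(\mu _\infty ^{-1} d\mu _\infty \), which is consistent with the powers appearing in (2.25) but is not restated in this proof. The symbol \(Y_n\) is introduced only for this illustration and stands for \(\big ( P^- + \tfrac{1}{2} P^0 \big ) X^\infty _n\).

```latex
\begin{aligned}
\mu^{-1} d\mu &= \mu_\infty \, d\big( \mu_\infty^{-1} \big) = - \mu_\infty^{-1} d\mu_\infty , \\
\mathrm{res}^\mu_\infty \Big( \mu^{-1} \sum_{n=0}^\infty Y_n \mu_\infty^n \Big) d\mu
  &= \mathrm{res}_{\mu_\infty = 0} \Big( - \sum_{n=0}^\infty Y_n \mu_\infty^{n-1} \Big) d\mu_\infty
   = - Y_0 .
\end{aligned}
```

Under these assumptions the overall minus sign in front of the residue turns this into \(+ \big ( P^- + \tfrac{1}{2} P^0 \big ) X^\infty _0\), while the same computation at the origin gives \(- \big ( P^- + \tfrac{1}{2} P^0 \big ) X^0_0\) without any sign change.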

Consider now the case \(a = \infty \). The first term on the right hand side of (2.25) is again equal to \(X^\infty _0\) while the second term gives \(\displaystyle \sum _{n=1}^\infty X^\infty _n \lambda _\infty ^n = X^\infty (\lambda _\infty ) - X^\infty _0\). Putting these together we deduce that \((\pi ^{\textrm{trig}}_+ X)^\infty (\lambda _\infty ) = X^\infty (\lambda _\infty )\). In conclusion, we have shown that \(\pi ^{\textrm{trig}}_+ \varvec{X}(\varvec{\lambda }) = \varvec{X}(\varvec{\lambda })\) for all \(\varvec{X}(\varvec{\lambda }) \in \varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g})\) so that \(\pi ^{\textrm{trig}}_+\) is the projection onto \(\varvec{\mathcal {A}}^\textrm{trig}_{\varvec{\lambda }}(\mathfrak {g})\) along \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\), as claimed. \(\square \)

We can now define the linear operator \(r^{\textrm{trig}} {:}{=}\pi ^{\textrm{trig}}_+ - \pi ^{\textrm{trig}}_-\). It follows from Proposition 2.8 that its kernel reads

$$\begin{aligned} \big ( (\iota _{\mu _b} \iota _{\lambda _a} + \iota _{\lambda _a} \iota _{\mu _b}) r^{\textrm{trig}}_{12}(\lambda , \mu ) \big )_{a, b \in \mathbb {C}P^1}. \end{aligned}$$
(2.26)

Moreover, the kernel of the identity operator \(id = \pi ^\textrm{trig}_+ + \pi ^{\textrm{trig}}_-\) is similarly given by an expansion of zero since

$$\begin{aligned} \big ( (\iota _{\mu _b} \iota _{\lambda _a} - \iota _{\lambda _a} \iota _{\mu _b}) r^{\textrm{trig}}_{12}(\lambda , \mu ) \mu ^{-1} d\mu \big )_{a, b \in \mathbb {C}P^1} = \big ( P_{12} \delta _{ab} \delta (\lambda _a, \mu _a) d\mu _a \big )_{a, b \in \mathbb {C}P^1} \end{aligned}$$
(2.27)

using the same notation \(\delta _{ab}\) and \(\delta (\lambda , \mu )\) as in the rational case.

The following is the analogue of Lemma 2.4 in the trigonometric case.

Lemma 2.9

Let \(\varvec{X}(\varvec{\mu }) = \big ( X^a(\mu _a) \big )_{a \in \mathbb {C}P^1} \in \varvec{\mathcal {A}}_{\varvec{\mu }}(\mathfrak {g})\) with \(X^a(\mu _a) = \sum _{n=-N_a}^\infty X^a_n \mu _a^n\) for some \(N_a \in \mathbb {Z}\), where \(N_a > 0\) for only finitely many \(a \in \mathbb {C}P^1\). For any \(a \in \mathbb {C}\) we have

$$\begin{aligned} \iota _{\mu _a} {{\,\textrm{Tr}\,}}_2 \big ( r^{\textrm{trig}}_{12}(\lambda , \mu ) X^a(\mu _a)_2 \big ) = - \sum _{r=-N_a}^\infty \mu _a^r \big ( (\lambda _a^{-r} + a \lambda _a^{-r-1}) X^a(\lambda _a) \big )^\textrm{trig}_-, \end{aligned}$$

while at infinity we have

$$\begin{aligned} \iota _{\mu _\infty } {{\,\textrm{Tr}\,}}_2 \big ( r^{\textrm{trig}}_{12}(\lambda , \mu ) X^\infty (\mu _\infty )_2 \big ) = \sum _{r=-N_\infty }^\infty \mu _\infty ^r \big ( \lambda _\infty ^{-r} X^\infty (\lambda _\infty ) \big )^{\textrm{trig}}_-. \end{aligned}$$

Proof

First, let \(a \in \mathbb {C}^\times \). We have

$$\begin{aligned}&\iota _{\mu _a} {{\,\textrm{Tr}\,}}_2 \big ( r^{\textrm{trig}}_{12}(\lambda , \mu ) X^a(\mu _a)_2 \big ) = - \big ( P^- + \tfrac{1}{2} P^0\big )(X^a(\mu _a)) - \iota _{\mu _a} \frac{\mu }{\lambda - \mu } X^a(\mu _a)\\&\quad = - \sum _{r=-N_a}^\infty \mu _a^r (P^- + \tfrac{1}{2} P^0) X_r^a - \sum _{n=-N_a}^\infty \sum _{s=0}^\infty (\mu _a + a) \frac{\mu _a^{n+s}}{\lambda _a^{s+1}} X_n^a\\&\quad = - \sum _{r=-N_a}^\infty \mu _a^r \bigg ( (P^- + \tfrac{1}{2} P^0) X_r^a + \sum _{n=-N_a}^{r-1} \lambda _a^{n-r} X^a_n + a \sum _{n=-N_a}^r \lambda _a^{n-r-1} X^a_n \bigg ). \end{aligned}$$

In the third equality we split the double sum into two terms, containing \(\mu _a\) and \(a\) respectively from the first factor \((\mu _a + a)\). We changed variable from \(s\) to \(r = s+n+1\) in the first and from \(s\) to \(r = s+n\) in the second, and then exchanged the order of summation in each of the two double sums. It remains to note that

$$\begin{aligned} \sum _{n=-N_a}^{r-1} \lambda _a^{n-r} X^a_n + a \sum _{n=-N_a}^r \lambda _a^{n-r-1} X^a_n = \big ( (\lambda _a^{-r} + a \lambda _a^{-r-1}) X^a(\lambda _a) \big )^{\textrm{rat}}_- \end{aligned}$$

and that this is equal to \(- X^a_r\) when evaluated at \(\lambda = 0\). The result at \(a \in \mathbb {C}^\times \) now follows by definition (2.15c) of the pole part at a.
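The evaluation at \(\lambda = 0\) can be checked directly. The sketch below assumes, as in the rest of this proof, that \(\lambda _a = \lambda - a\), so that \(\lambda _a = -a\) at \(\lambda = 0\), and it uses \(a \, (-a)^{n-r-1} = - (-a)^{n-r}\).

```latex
\begin{aligned}
&\sum_{n=-N_a}^{r-1} (-a)^{n-r} X^a_n + a \sum_{n=-N_a}^{r} (-a)^{n-r-1} X^a_n \\
&\quad = \sum_{n=-N_a}^{r-1} (-a)^{n-r} X^a_n - \sum_{n=-N_a}^{r} (-a)^{n-r} X^a_n
   = - (-a)^{0} X^a_r = - X^a_r .
\end{aligned}
```

All terms with \(n < r\) cancel pairwise, so only the \(n = r\) term of the second sum survives.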

At the origin we have

$$\begin{aligned}&\iota _\mu {{\,\textrm{Tr}\,}}_2 \big ( r^{\textrm{trig}}_{12}(\lambda , \mu ) X^0(\mu )_2 \big ) = - \sum _{r=-N_0}^\infty \mu ^r (P^- + \tfrac{1}{2} P^0) X^0_r - \sum _{n=-N_0}^\infty \sum _{s=1}^\infty \mu ^{n+s} \lambda ^{-s} X^0_n\\&\quad = - \sum _{r=-N_0}^\infty \mu ^r \bigg ( (P^- + \tfrac{1}{2} P^0) X^0_r + \sum _{n=-N_0}^{r-1} \lambda ^{n-r} X^0_n \bigg ) = - \sum _{r=-N_0}^\infty \mu ^r \big ( \lambda ^{-r} X^0(\lambda ) \big )^{\textrm{trig}}_-. \end{aligned}$$

In the second equality we changed variable in the double sum from s to \(r = n+s\) and then changed the order of the two sums. We have also added the term \(r=-N_0\) in this double sum since this term vanishes due to the range in the sum over n being empty. The last equality uses the definition (2.15a). Note that the result at the origin coincides with the result obtained above for \(a \in \mathbb {C}^\times \) but taken at \(a=0\). This is not completely obvious since the definitions of the pole parts (2.15a) and (2.15c) at 0 and a generic point \(a \in \mathbb {C}^\times \) are different. Likewise, at infinity we have

$$\begin{aligned} \iota _{\mu _\infty } {{\,\textrm{Tr}\,}}_2 \big ( r^{\textrm{trig}}_{12}(\lambda , \mu ) X^\infty (\mu _\infty )_2 \big )= & {} - \sum _{r=-N_\infty }^\infty \mu _\infty ^r (P^- + \tfrac{1}{2} P^0) X^\infty _r + \sum _{n=-N_\infty }^\infty \sum _{s=0}^\infty \mu _\infty ^{n+s} \lambda _\infty ^{-s} X^\infty _n\\= & {} \sum _{r=-N_\infty }^\infty \mu _\infty ^r \bigg ( - (P^- + \tfrac{1}{2} P^0) X^\infty _r + \sum _{n=-N_\infty }^r \lambda _\infty ^{n-r} X^\infty _n \bigg )\\= & {} \sum _{r=-N_\infty }^\infty \mu _\infty ^r \big ( \lambda _\infty ^{-r} X^\infty (\lambda _\infty ) \big )^{\textrm{trig}}_-. \end{aligned}$$

In the second equality we changed variable in the double sum from s to \(r = n+s\) and then changed the order of the two sums. The last equality uses (2.15b). \(\square \)
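For reference, the three expansions of \(\mu / (\lambda - \mu )\) that enter the proof can be written out as geometric series. This is only a sketch: it assumes the local coordinates \(\lambda _a = \lambda - a\), \(\mu _a = \mu - a\) for \(a \in \mathbb {C}\) and \(\lambda _\infty = \lambda ^{-1}\), \(\mu _\infty = \mu ^{-1}\), and in each case it expands in the \(\mu \)-variable, treating the \(\lambda \)-variable as invertible.

```latex
\begin{aligned}
\iota_{\mu_a} \frac{\mu}{\lambda - \mu} &= \frac{\mu_a + a}{\lambda_a - \mu_a}
  = (\mu_a + a) \sum_{s=0}^\infty \frac{\mu_a^{s}}{\lambda_a^{s+1}},
  \qquad a \in \mathbb{C}^\times, \\
\iota_{\mu} \frac{\mu}{\lambda - \mu} &= \frac{\mu}{\lambda} \cdot \frac{1}{1 - \mu / \lambda}
  = \sum_{s=1}^\infty \mu^{s} \lambda^{-s}, \\
\iota_{\mu_\infty} \frac{\mu}{\lambda - \mu} &= \frac{\lambda_\infty}{\mu_\infty - \lambda_\infty}
  = - \sum_{s=0}^\infty \mu_\infty^{s} \lambda_\infty^{-s}.
\end{aligned}
```

The extra minus sign in the last line is what turns the second term of the computation at infinity into a sum with a plus sign, and the lower limit \(s = 0\) there (as opposed to \(s = 1\) at the origin) accounts for the different upper limits \(r\) and \(r-1\) in the inner sums over \(n\).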

3 Generating Lagrangian Multiform and CYBE

In this section we will treat uniformly both the rational and trigonometric cases discussed in Sect. 2.2 and Sect. 2.3, respectively. More precisely, we shall work with the Lie algebra of \(\mathfrak {g}\)-valued adèles \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) equipped with the bilinear form (2.2) with either \(k=0\) or \(k=-1\). The corresponding vector space direct sum decompositions (2.7) and (2.16) will be denoted by

$$\begin{aligned} \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}) = \varvec{\mathcal {A}}^+_{\varvec{\lambda }}(\mathfrak {g}) \dotplus \varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g}) \end{aligned}$$

where \(\varvec{\mathcal {A}}^+_{\varvec{\lambda }}(\mathfrak {g})\) stands for the rational Lie subalgebra \(\varvec{\mathcal {A}}^{\textrm{rat}}_{\varvec{\lambda }}(\mathfrak {g})\) when \(k=0\) and the trigonometric Lie subalgebra \(\varvec{\mathcal {A}}^{\textrm{trig}}_{\varvec{\lambda }}(\mathfrak {g})\) when \(k=-1\). Correspondingly, we shall use the common notation \(\varvec{\mathcal {A}}^+_{\varvec{\lambda }}(G)\) for the groups \(\varvec{\mathcal {A}}^\textrm{rat}_{\varvec{\lambda }}(G)\) in the rational case and \(\varvec{\mathcal {A}}^\textrm{trig}_{\varvec{\lambda }}(G)\) in the trigonometric case.

Given a general element \(\varvec{X}(\varvec{\lambda }) = (X^a(\lambda _a))_{a \in \mathbb {C}P^1} \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) of the Lie algebra of \(\mathfrak {g}\)-valued adèles, we will also denote by \(X^a(\lambda _a)_-\) the principal part of the formal Laurent series \(X^a(\lambda _a)\), which stands for \(X^a(\lambda _a)^{\textrm{rat}}_-\) in the rational case, see (2.1), or for \(X^a(\lambda _a)^{\textrm{trig}}_-\) in the trigonometric case, see (2.5).

3.1 Dynamical equations

We will describe integrable field theories by taking the point of view that the spatial coordinate x, on which all the Hamiltonian fields are usually taken to depend, should be treated on an equal footing to all the other times in the hierarchy. To explain this new perspective on integrable hierarchies it is useful to begin by recalling the traditional point of view.

The dynamical equations of different integrable field theories in the same hierarchy are usually described as zero curvature equations

$$\begin{aligned} \partial _x V^a_n(\lambda ) - \partial _{t^a_n} U(\lambda ) + [V^a_n(\lambda ), U(\lambda )] = 0 \end{aligned}$$
(3.1)

where the Lax matrix \(U(\lambda ) \in R_\lambda (\mathfrak {g})\) is a coadjoint orbit of \(\varvec{\mathcal {A}}^+_{\varvec{\lambda }}(G)\) in \(R_\lambda (\mathfrak {g})\) which encodes the finite collection of fields of the hierarchy. The \(V^a_n(\lambda ) \in R_\lambda (\mathfrak {g})\), associated to the times \(t^a_n\) for some labels \(a \in \mathbb {C}\) and \(n \in \mathbb {Z}\) to be specified below and which we also refer to as Lax matrices, are coadjoint orbits of \(\varvec{\mathcal {A}}^+_{\varvec{\lambda }}(G)\) in \(R_\lambda (\mathfrak {g})\) built out of differential polynomials in the fields. From this traditional point of view, (3.1) represents a set of equations which is seen as a natural extension of the Lax equations \(\partial _{t^a_n} L(\lambda ) = [M^a_n(\lambda ), L(\lambda )]\), used to describe finite-dimensional systems, to the field theory case where every degree of freedom now depends on x. In particular, \(U(\lambda )\) is usually treated as the fundamental object since the \(V^a_n(\lambda )\) can all be built out of it and as such it is seen as the natural analogue of the Lax matrix \(L(\lambda )\) in the field theory case.

The crucial point is that the particular flow \(\partial _x\) can, and from our point of view should, be thought of as a linear combination of some of the elementary time flows \(\partial _{t^a_n}\). But if we are to treat the coadjoint orbit \(U(\lambda ) \in R_\lambda (\mathfrak {g})\) on an equal footing to all the other coadjoint orbits \(V^a_n(\lambda ) \in R_\lambda (\mathfrak {g})\) then we should also abandon the idea that each \(V^a_n(\lambda )\) is parametrised by differential polynomials with respect to x of the finite collection of fields contained in \(U(\lambda )\). Instead, we should treat all the coadjoint orbits \(V^a_n(\lambda ) \in R_\lambda (\mathfrak {g})\) as truly independent. We shall see, in a sense which is much closer in spirit to the Lax formalism for finite-dimensional systems, that all the Lax matrices \(V^a_n(\lambda )\) can be derived from a single object \(\varvec{Q}(\varvec{\lambda }) \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\), a certain adjoint orbit of \(\varvec{\mathcal {A}}^+_{\varvec{\lambda }}(G)\) in the full space of adèles \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\). In particular, the latter will satisfy a Lax equation (see (3.16) below)

$$\begin{aligned} \partial _{t^a_n} \varvec{Q}(\varvec{\lambda }) = [ \varvec{\iota }_{\varvec{\lambda }} V^a_n(\lambda ), \varvec{Q}(\varvec{\lambda }) ] \end{aligned}$$

with respect to all the times \(t^a_n\). As such, in our approach to hierarchies of integrable field theories, \(\varvec{Q}(\varvec{\lambda })\) will play a very similar role to that of the usual Lax matrix \(L(\lambda )\) for finite-dimensional systems. For us, the fundamental object will therefore be \(\varvec{Q}(\varvec{\lambda })\) rather than \(U(\lambda )\). The relationship between these two objects, and in particular the connection between our approach to hierarchies of integrable field theories and the usual one recalled above, comes from fixing a particular linear combination of time flows as our choice of spatial derivative \(\partial _x\). We discuss this in detail in Sect. 3.1.4, together with what we call the FNR procedure.

Since there is a close parallel between our treatment of integrable field theories and various familiar constructions in the theory of finite-dimensional integrable systems, we will draw the comparison throughout this section in a series of remarks.

3.1.1 Adjoint orbit

Let \(\varvec{\phi }(\varvec{\lambda }) = (\phi ^a(\lambda _a))_{a \in \mathbb {C}P^1} \in \varvec{\mathcal {A}}^+_{\varvec{\lambda }}(G)\). We regard the entries of the matrix coefficients in the expansions

$$\begin{aligned} \phi ^a(\lambda _a) = \sum _{n=0}^\infty \phi ^a_n \lambda _a^n \end{aligned}$$

for all \(a \in \mathbb {C}P^1\) as an infinite collection of dynamical variables. In general, these are not all independent. For instance, \(\phi ^a(\lambda _a)\) should be invertible in the \(GL_N\) case, which means that the first term \(\phi ^a_0\) should be invertible, or \(\phi ^a(\lambda _a)\) should have determinant 1 in the \(SL_N\) case, which will impose non-trivial relations between the coefficients at each order in \(\lambda _a\). The infinitely many degrees of freedom contained in \(\varvec{\phi }(\varvec{\lambda })\), or equivalently in \(\varvec{Q}(\varvec{\lambda })\) defined in (3.3) below, will be used to describe infinitely many different hierarchies of integrable field theories. We will refer to these as group or algebra coordinates (respectively): they represent the dependent variables and are the fields satisfying the equations of motion of a hierarchy.

A particular integrable hierarchy will be determined by a choice of non-dynamical rational function, with poles in a finite subset \(S \subset \mathbb {C}P^1\), which we can write using a partial fraction decomposition as

$$\begin{aligned} F(\lambda ) = \sum _{a \in S} F^a(\lambda _a)_- \in R_\lambda (\mathfrak {g}) \end{aligned}$$

where \(F^a(\lambda _a)_- \in \mathfrak {g}\otimes \mathbb {C}[\lambda _a^{-1}]\) are (rational or trigonometric, depending on the case) principal parts at each \(a \in S\). In particular, \(F^a(\lambda _a)_- d\lambda \) has a pole of order \(N_a > 0\) at any \(a \in S \cap \mathbb {C}\) and a pole of order \(N_\infty + 2 \ge 2\) at infinity if \(\infty \in S\). Its expansion at all of the points \(a \in \mathbb {C}P^1\) defines an element \(\varvec{\iota }_{\varvec{\lambda }} F(\lambda ) = (\iota _{\lambda _a} F(\lambda ))_{a \in \mathbb {C}P^1} \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) of the \(\mathfrak {g}\)-valued adèles, via the embedding (2.3). By design, we have \((\iota _{\lambda _a} F(\lambda ))_- = F^a(\lambda _a)_-\) for each \(a \in S\) and \((\iota _{\lambda _a} F(\lambda ))_- = 0\) for every other point \(a \in \mathbb {C}P^1{\setminus } S\). The element of the \(\mathfrak {g}\)-valued adèles with these components, which we can denote by

$$\begin{aligned} \big ( \varvec{\iota }_{\varvec{\lambda }} F(\lambda ) \big )_- \,{:}{=}\, \big ( F^a(\lambda _a)_- \big )_{a \in \mathbb {C}P^1} \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}), \end{aligned}$$
(3.2)

is just a finite collection of principal parts. We consider its adjoint orbit under the group element \(\varvec{\phi }(\varvec{\lambda }) \in \varvec{\mathcal {A}}^+_{\varvec{\lambda }}(G)\) introduced above, namely

$$\begin{aligned} \varvec{Q}(\varvec{\lambda }) \,{:}{=}\, \varvec{\phi }(\varvec{\lambda }) \big ( \varvec{\iota }_{\varvec{\lambda }} F(\lambda ) \big )_- \varvec{\phi }(\varvec{\lambda })^{-1} \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}). \end{aligned}$$
(3.3)

Explicitly, its component at any pole \(a \in S\) is \(Q^a(\lambda _a)=\phi ^a(\lambda _a) F^a(\lambda _a)_- \phi ^a(\lambda _a)^{-1}\) while the component at any other \(a \in \mathbb {C}P^1{\setminus } S\) vanishes. For \(a \in S\) we can further expand \(Q^a(\lambda _a)\) as a Laurent series in \(\lambda _a\), namely

$$\begin{aligned} Q^a(\lambda _a) = \sum _{n=-N_a}^\infty Q^a_n \lambda _a^n, \end{aligned}$$
(3.4)

for some \(Q^a_n \in \mathfrak {g}\), where \(N_a > 0\) is the order of the pole of \(F(\lambda )\) at \(a \in S \cap \mathbb {C}\). For the point at infinity we can have \(N_\infty \ge 0\).
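To illustrate (3.3) and (3.4), consider the simplest situation: the rational case, a matrix group, and a simple pole at some \(a \in S \cap \mathbb {C}\), so that \(N_a = 1\) and \(F^a(\lambda _a)_- = A \lambda _a^{-1}\) for a constant \(A \in \mathfrak {g}\). The symbol \(A\) and the restriction to a simple pole are assumptions made only for this sketch.

```latex
\begin{aligned}
\phi^a(\lambda_a)^{-1} &= (\phi^a_0)^{-1} - (\phi^a_0)^{-1} \phi^a_1 (\phi^a_0)^{-1} \lambda_a + O(\lambda_a^2), \\
Q^a(\lambda_a) &= \phi^a(\lambda_a) \, A \lambda_a^{-1} \, \phi^a(\lambda_a)^{-1}
  = Q^a_{-1} \lambda_a^{-1} + Q^a_0 + O(\lambda_a), \\
Q^a_{-1} &= \phi^a_0 A (\phi^a_0)^{-1}, \qquad
Q^a_0 = \big[ \phi^a_1 (\phi^a_0)^{-1}, Q^a_{-1} \big].
\end{aligned}
```

Already at this order the coefficients \(Q^a_n\) are not independent coordinates but functions of the \(\phi ^a_n\): the leading coefficient \(Q^a_{-1}\) is conjugate to \(A\), and \(Q^a_0\) is a commutator with it.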

Remark 3.1

The adjoint orbit (3.3) within the full Lie algebra of \(\mathfrak {g}\)-valued adèles \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) will play the role of the Lax matrix in the present infinite-dimensional setting. For comparison, it is useful to recall that in the finite-dimensional setting the Lax matrix is given by a coadjoint orbit

$$\begin{aligned} L(\lambda ) = \Pi _- \Big ( \varvec{\phi }(\varvec{\lambda }) \big ( \varvec{\iota }_{\varvec{\lambda }} F(\lambda ) \big )_- \varvec{\phi }(\varvec{\lambda })^{-1} \Big ) \in R_\lambda (\mathfrak {g}) \end{aligned}$$
(3.5)

where \(\Pi _-\) denotes either \(\pi _-^{\textrm{rat}}\) or \(\pi _-^\textrm{trig}\), depending on whether we are in the rational or trigonometric setting, but without applying the expansion \(\varvec{\iota }_{\varvec{\lambda }}\) so that we obtain an element of \(R_\lambda (\mathfrak {g})\) rather than \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\). In particular, the rational function \(L(\lambda )\) depends only on finitely many of the dynamical variables in \(\varvec{\phi }(\varvec{\lambda }) \in \varvec{\mathcal {A}}^+_{\varvec{\lambda }}(G)\).

3.1.2 Generating Lax equation

As our aim is to work with hierarchies of equations of motion, to each point \(a\in \mathbb {C}P^1\) we attach an infinite family of time coordinates \(t^a_n\) for \(n\in \mathbb {Z}\). Related to each time is the usual partial derivative \(\partial _{t_n^a}\) (meant as a total derivative when acting on functions of the fields). For our purposes, let us define the following generating operators

$$\begin{aligned} \mathcal {D}_{\lambda _a} {:}{=}\sum _{n\in \mathbb {Z}} \lambda _a^n \partial _{t^a_n},~~a\in \mathbb {C}, \quad \mathcal {D}_{\lambda _\infty } {:}{=}\sum _{n\in \mathbb {Z}} \lambda _\infty ^{n+k+1} \partial _{t^\infty _n} \end{aligned}$$
(3.6)

with \(k = 0\) in the rational case and \(k=-1\) in the trigonometric case. We let \(\mathcal {D}_{\varvec{\lambda }} {:}{=} (\mathcal {D}_{\lambda _a})_{a \in \mathbb {C}P^1}\) denote the \(\mathbb {C}P^1\)-tuple of these differential operators. Then, if \(\mu \) is another formal variable we will use the notation

$$\begin{aligned} \mathcal {D}_{\varvec{\lambda }} \varvec{Q}(\varvec{\mu }) = \bigg ( \mathcal {D}_{\lambda _a}Q^b(\mu _b) \bigg )_{a, b \in \mathbb {C}P^1} \end{aligned}$$
(3.7)

which encodes the flows \(\partial _{t^a_m} Q^b_n\) of all the dynamical variables \(Q^b_n\) with respect to all the times \(t^a_m\) for each pair of points \(a, b \in \mathbb {C}P^1\).
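Unpacking (3.7) in components may help. The sketch below only combines the expansion (3.4) with the definition (3.6), for a point \(b \in S\), taking a finite point \(a \in \mathbb {C}\) in the first line and \(a = \infty \) in the second.

```latex
\begin{aligned}
\mathcal{D}_{\lambda_a} Q^b(\mu_b)
  &= \sum_{m \in \mathbb{Z}} \sum_{n=-N_b}^\infty \lambda_a^{m} \mu_b^{n} \, \partial_{t^a_m} Q^b_n ,
  \qquad a \in \mathbb{C}, \\
\mathcal{D}_{\lambda_\infty} Q^b(\mu_b)
  &= \sum_{m \in \mathbb{Z}} \sum_{n=-N_b}^\infty \lambda_\infty^{m+k+1} \mu_b^{n} \, \partial_{t^\infty_m} Q^b_n .
\end{aligned}
```

So \(\partial _{t^a_m} Q^b_n\) is read off as the coefficient of \(\lambda _a^m \mu _b^n\) for finite \(a\), and as the coefficient of \(\lambda _\infty ^{m+k+1} \mu _b^n\) at infinity.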

Following the first observation in Sect. 1.2 of the introduction, we want to declare the evolution of \(\varvec{Q}(\varvec{\lambda }) \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) with respect to the above infinite family of times \(t^a_n\) to be governed by the following general Lax equation in r-matrix and generating form

$$\begin{aligned} \mathcal {D}_{\varvec{\mu }} \varvec{Q}_1(\varvec{\lambda }) = \big [ {{\,\textrm{Tr}\,}}_2 \big ( \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }) \big ), \varvec{Q}_1(\varvec{\lambda }) \big ]. \end{aligned}$$
(3.8)

However, a few comments and precautions are necessary. First, writing such an equation with the understanding that \(\mathcal {D}_{\varvec{\mu }}\) is the \(\mathbb {C}P^1\)-tuple of commuting differential operators defined in (3.6) assumes that the vector fields on the right-hand side commute, if we want to be able to interpret the times \(t_n^a\) as coordinates on a manifold. In other words, defining the generating vector field \({{\mathcal {X}}}_{\varvec{\mu }}\) acting on \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) by

$$\begin{aligned} {{\mathcal {X}}}_{\varvec{\mu }} \varvec{Q}_1(\varvec{\lambda }) = \big [ {{\,\textrm{Tr}\,}}_2 \big ( \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }) \big ), \varvec{Q}_1(\varvec{\lambda }) \big ], \end{aligned}$$
(3.9)

we must first prove that \([\mathcal{X}_{\varvec{\mu }},{{\mathcal {X}}}_{\varvec{\nu }}]=0\). Only then can we set \(\mathcal{X}_{\varvec{\mu }}=\mathcal {D}_{\varvec{\mu }}\) and view the generating Lax equation (3.8) as describing compatible time flows on \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\). This is shown below in Proposition 3.4 and is a beautiful consequence of the CYBE for r.

Second, note that the right-hand side of (3.8) lives in \(\coprod _{a, b \in \mathbb {C}P^1} \mathfrak {g}\otimes \lambda ^{-N_a} \mu ^{-N_b} \mathbb {C}\llbracket \lambda , \mu \rrbracket \). Indeed, at \(b \in \mathbb {C}P^1\) the power of \(\mu _b\) is bounded below by \(-N_b\) since \(\iota _{\lambda _a} \iota _{\mu _b} r_{12}(\lambda , \mu )\) is a Taylor series in \(\mu _b\) while \(Q^b_2(\mu _b)\) is a Laurent series with leading term of order \(\mu _b^{-N_b}\) by definition (3.4). By the following lemma we then also deduce that at \(a \in \mathbb {C}P^1\) the power of \(\lambda _a\) on the right hand side of (3.8) is bounded below by \(-N_a\). For the left hand side, this means that the flows with respect to the times \(t_m^b\) with \(m<-N_b\) are trivial: \(\varvec{Q}(\varvec{\lambda })\) does not depend on those times and, for all practical purposes related to a hierarchy of field theories, they can be ignored.

Lemma 3.2

We have

$$\begin{aligned} \big [ {{\,\textrm{Tr}\,}}_2 \big ( \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }) \big ),\varvec{Q}_1(\varvec{\lambda }) \big ] = \big [ {{\,\textrm{Tr}\,}}_2 \big ( \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }) \big ),\varvec{Q}_1(\varvec{\lambda }) \big ]. \end{aligned}$$

Proof

Using the identity (2.11) (or (2.27) in the trigonometric case) we deduce that for any \(a, b \in \mathbb {C}P^1\) we have

$$\begin{aligned} \big [ {{\,\textrm{Tr}\,}}_2 \big ( (\iota _{\lambda _a} \iota _{\mu _b} - \iota _{\mu _b} \iota _{\lambda _a}) r_{12}(\lambda ,\mu ) Q^b_2(\mu _b) \big ),Q^a_1(\lambda _a) \big ] \propto \delta (\lambda _a, \mu _a) [Q^a(\lambda _a), Q^a(\mu _a)]. \end{aligned}$$

Since \([Q^a(\lambda _a), Q^a(\mu _a)]\) vanishes when \(\lambda _a = \mu _a\) it is proportional to \(\lambda _a - \mu _a\) and so it follows that the right hand side above vanishes, as required. \(\square \)
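The last step of this proof uses the standard fact that a formal delta function is annihilated by multiplication by the difference of its arguments. The sketch below assumes the usual definition \(\delta (\lambda , \mu ) = \sum _{n \in \mathbb {Z}} \lambda ^n \mu ^{-n-1}\); the definition adopted in the rational case lies outside the present section and may differ from this one by a normalisation, which would not affect the conclusion.

```latex
\begin{aligned}
(\lambda - \mu) \, \delta(\lambda, \mu)
  &= \sum_{n \in \mathbb{Z}} \lambda^{n+1} \mu^{-n-1} - \sum_{n \in \mathbb{Z}} \lambda^{n} \mu^{-n} = 0 ,
\end{aligned}
```

where the two sums agree after shifting \(n \rightarrow n - 1\) in the first. Applied with \(\lambda _a\) and \(\mu _a\) in place of \(\lambda \) and \(\mu \), this is why a commutator proportional to \(\lambda _a - \mu _a\) gives zero when multiplied by \(\delta (\lambda _a, \mu _a)\).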

Remark 3.3

The Lax equation (3.8) is to be compared with the Lax equation in the usual finite-dimensional setting for the evolution of the Lax matrix with respect to the times associated with the coefficients in the partial fraction decomposition of the quadratic Hamiltonian

$$\begin{aligned} H(\mu ) = \tfrac{1}{2} {{\,\textrm{Tr}\,}}\big ( L(\mu )^2 \big ) = \sum _{a \in S} \sum _{n=0}^{n_a-1} \frac{H^a_n}{(\mu - a)^{n+1}}. \end{aligned}$$

If we gather together the flows \(\partial _{t^a_n} = \{ H^a_n, \cdot \}\) associated with the Hamiltonians \(H^a_n\) by defining the differential operator valued rational function

$$\begin{aligned} \mathcal {D}_\mu = \sum _{a \in S} \sum _{n=0}^{n_a-1} \frac{\partial _{t^a_n}}{(\mu - a)^{n+1}}, \end{aligned}$$

which is to be compared with the adèlic object (3.6) in the present infinite-dimensional setting, then the Lax equations in the finite-dimensional setting take the form

$$\begin{aligned} \mathcal {D}_\mu L_1(\lambda ) = \big [ {{\,\textrm{Tr}\,}}_2 \big ( r_{12}(\lambda ,\mu ) L_2(\mu ) \big ), L_1(\lambda ) \big ]. \end{aligned}$$
(3.10)

Both sides of this equation are \(\mathfrak {g}\)-valued rational functions in both \(\lambda \) and \(\mu \) with poles in \(\lambda \) and \(\mu \) at each \(a \in S\) of order at most \(N_a\).

Proposition 3.4

The flows (3.8) are compatible as a consequence of the commutativity of the corresponding vector fields, i.e. for any three formal variables \(\lambda \), \(\mu \) and \(\nu \) we have

$$\begin{aligned} {{\mathcal {X}}}_{\varvec{\nu }} {{\mathcal {X}}}_{\varvec{\mu }} \varvec{Q}(\varvec{\lambda }) = \mathcal{X}_{\varvec{\mu }} {{\mathcal {X}}}_{\varvec{\nu }} \varvec{Q}(\varvec{\lambda }). \end{aligned}$$
(3.11)

Proof

We have

$$\begin{aligned} {{\mathcal {X}}}_{\varvec{\nu }} {{\mathcal {X}}}_{\varvec{\mu }} \varvec{Q}_1(\varvec{\lambda })&= \big [ {{\,\textrm{Tr}\,}}_2 \big ( \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) {{\mathcal {X}}}_{\varvec{\nu }} \varvec{Q}_2(\varvec{\mu }) \big ),\varvec{Q}_1(\varvec{\lambda }) \big ] \nonumber \\&\quad + \big [ {{\,\textrm{Tr}\,}}_2 \big ( \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }) \big ), {{\mathcal {X}}}_{\varvec{\nu }} \varvec{Q}_1(\varvec{\lambda }) \big ] \nonumber \\&= {{\,\textrm{Tr}\,}}_{23} \Big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \big [ \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} r_{23}(\mu ,\nu ) \varvec{Q}_3(\varvec{\nu }),\varvec{Q}_2(\varvec{\mu }) \big ],\varvec{Q}_1(\varvec{\lambda }) \Big ] \nonumber \\&\quad + {{\,\textrm{Tr}\,}}_{23} \Big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }), \big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\nu }} r_{13}(\lambda ,\nu ) \varvec{Q}_3(\varvec{\nu }),\varvec{Q}_1(\varvec{\lambda }) \big ] \Big ]. \end{aligned}$$
(3.12)

By using the cyclicity of the trace over space 2 in the first term on the right hand side and the Jacobi identity on the last term, this can be rewritten as

$$\begin{aligned} {{\mathcal {X}}}_{\varvec{\nu }} {{\mathcal {X}}}_{\varvec{\mu }} \varvec{Q}_1(\varvec{\lambda })&= {{\,\textrm{Tr}\,}}_{23} \Big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} \big [ r_{12}(\lambda ,\mu ), r_{23}(\mu ,\nu ) \big ] \varvec{Q}_2(\varvec{\mu }) \varvec{Q}_3(\varvec{\nu }),\varvec{Q}_1(\varvec{\lambda }) \Big ]\\&\quad + {{\,\textrm{Tr}\,}}_{23} \Big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} \big [ r_{12}(\lambda ,\mu ), r_{13}(\lambda ,\nu ) \big ] \varvec{Q}_2(\varvec{\mu }) \varvec{Q}_3(\varvec{\nu }),\varvec{Q}_1(\varvec{\lambda }) \Big ]\\&\quad + {{\,\textrm{Tr}\,}}_{23} \Big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\nu }} r_{13}(\lambda ,\nu ) \varvec{Q}_3(\varvec{\nu }),\big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }), \varvec{Q}_1(\varvec{\lambda }) \big ] \Big ]. \end{aligned}$$

Likewise, exchanging \(\mu \leftrightarrow \nu \) in (3.12) we obtain

$$\begin{aligned} {{\mathcal {X}}}_{\varvec{\mu }} {{\mathcal {X}}}_{\varvec{\nu }} \varvec{Q}_1(\varvec{\lambda })&= {{\,\textrm{Tr}\,}}_{23} \Big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\nu }} r_{13}(\lambda ,\nu ) \big [ \varvec{\iota }_{\varvec{\nu }} \varvec{\iota }_{\varvec{\mu }} r_{32}(\nu ,\mu ) \varvec{Q}_2(\varvec{\mu }),\varvec{Q}_3(\varvec{\nu }) \big ],\varvec{Q}_1(\varvec{\lambda }) \Big ]\\&\quad + {{\,\textrm{Tr}\,}}_{23} \Big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\nu }} r_{13}(\lambda ,\nu ) \varvec{Q}_3(\varvec{\nu }), \big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }),\varvec{Q}_1(\varvec{\lambda }) \big ] \Big ]\\&= {{\,\textrm{Tr}\,}}_{23} \Big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} \big [ r_{13}(\lambda ,\nu ), r_{32}(\nu ,\mu ) \big ] \varvec{Q}_2(\varvec{\mu }) \varvec{Q}_3(\varvec{\nu }),\varvec{Q}_1(\varvec{\lambda }) \Big ]\\&\quad + {{\,\textrm{Tr}\,}}_{23} \Big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\nu }} r_{13}(\lambda ,\nu ) \varvec{Q}_3(\varvec{\nu }), \big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }),\varvec{Q}_1(\varvec{\lambda }) \big ] \Big ], \end{aligned}$$

where in the second equality we used Lemma 3.2 to swap the order of \(\varvec{\iota }_{\varvec{\nu }}\) and \(\varvec{\iota }_{\varvec{\mu }}\) in the first term, along with the cyclicity of the trace over space 3. Thus \(\big [{{\mathcal {X}}}_{\varvec{\nu }}, {{\mathcal {X}}}_{\varvec{\mu }}\big ]\varvec{Q}_1(\varvec{\lambda })\) equals

$$\begin{aligned}&{{\,\textrm{Tr}\,}}_{23}\Big [\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} \Big ( [ r_{12}(\lambda , \mu ), r_{13}(\lambda , \nu )] + [r_{12}(\lambda , \mu ), r_{23}(\mu , \nu )]\\&\quad - [ r_{13}(\lambda ,\nu ), r_{32}(\nu , \mu )] \Big ) \varvec{Q}_2(\varvec{\mu }) \varvec{Q}_3(\varvec{\nu }) ,\varvec{Q}_1(\varvec{\lambda }) \Big ] \end{aligned}$$

which vanishes as a consequence of the CYBE (1.1). \(\square \)
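
As a sanity check on the role played by the CYBE in this last step, one can verify (1.1) directly in the simplest case. The following sketch assumes the standard rational r-matrix \(r_{12}(\lambda ,\mu ) = C_{12}/(\mu - \lambda )\), where \(C_{12} \in \mathfrak {g}\otimes \mathfrak {g}\) denotes the split Casimir; this specific form and its normalisation are assumptions made for illustration only. Since \(C_{12}\) is symmetric and invariant, we have \([C_{12}, C_{13} + C_{23}] = 0\) and \([C_{13}, C_{12} + C_{23}] = 0\), so that, setting \(X {:}{=}[C_{12}, C_{13}]\), we get \([C_{12}, C_{23}] = -X\) and \([C_{13}, C_{23}] = X\). Using also \(r_{32}(\nu ,\mu ) = C_{23}/(\mu - \nu )\), the left hand side of (1.1) becomes

$$\begin{aligned} \frac{X}{(\mu - \lambda )(\nu - \lambda )} - \frac{X}{(\mu - \lambda )(\nu - \mu )} + \frac{X}{(\nu - \lambda )(\nu - \mu )} = \frac{(\nu - \mu ) - (\nu - \lambda ) + (\mu - \lambda )}{(\mu - \lambda )(\nu - \lambda )(\nu - \mu )} \, X = 0. \end{aligned}$$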

3.1.3 Generating zero curvature equation

In the context of integrable field theories the role of the Lax equation, cf. (3.10), is replaced by the zero curvature equation for a Lax connection. Therefore, as a first step towards relating the present formalism to integrable field theories, we now associate with each time \(t^a_n\), for any \(a \in S\) and \(n \ge -N_a\), a rational Lax matrix \(V^a_n(\lambda ) \in R_\lambda (\mathfrak {g})\) such that any pair of these satisfies a zero curvature equation.

The equations of motion (3.8) can be written succinctly as

$$\begin{aligned} \mathcal {D}_{\varvec{\mu }} \varvec{Q}(\varvec{\lambda }) = [ \varvec{\iota }_{\varvec{\lambda }} \varvec{V}(\lambda ; \varvec{\mu }), \varvec{Q}(\varvec{\lambda }) ] \end{aligned}$$
(3.13)

where we have introduced

$$\begin{aligned} \varvec{V}(\lambda ; \varvec{\mu }) \,{:}{=}{{\,\textrm{Tr}\,}}_2 \big ( \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }) \big ). \end{aligned}$$
(3.14)

Note that in (3.14) we do not expand the right hand side in powers of \(\lambda _a\) for \(a \in \mathbb {C}P^1\), i.e. we do not apply the homomorphism \(\varvec{\iota }_{\varvec{\lambda }}\). Instead, this expansion is taken explicitly in (3.13). In particular, the semi-colon in the notation \(\varvec{V}(\lambda ; \varvec{\mu })\) is used to emphasise that \(\lambda \) is just a formal variable whereas \(\varvec{\mu }\) is the usual boldface notation used as a shorthand for a collection \(\big ( V^b(\lambda ; \mu _b) \big )_{b \in \mathbb {C}P^1}\) where

$$\begin{aligned} V^b(\lambda ; \mu _b)&= \sum _{n=-N_b}^\infty V^b_n(\lambda ) \mu _b^n\,,~~b\in \mathbb {C}\,, \end{aligned}$$
(3.15a)
$$\begin{aligned} V^\infty (\lambda ; \mu _\infty )&= \sum _{n=-N_\infty }^\infty V^\infty _n(\lambda ) \mu _\infty ^{n+k+1}\,. \end{aligned}$$
(3.15b)

As usual, we take \(k = 0\) in the rational case and \(k=-1\) in the trigonometric case. Here \(V^b_n(\lambda ) \in R_\lambda (\mathfrak {g})\) are \(\mathfrak {g}\)-valued rational functions in \(\lambda \) with a pole at \(\lambda = b\). Unpacking the notation in (3.13) slightly, recalling the definition of the operators \(\mathcal {D}_{\varvec{\mu }}\) and (3.6), we see that the flow of \(\varvec{Q}(\varvec{\lambda })\) with respect to the time \(t^a_n\) is controlled by \(V^a_n(\lambda )\), namely we have the Lax equation

$$\begin{aligned} \partial _{t^a_n} \varvec{Q}(\varvec{\lambda }) = [ \varvec{\iota }_{\varvec{\lambda }} V^a_n(\lambda ), \varvec{Q}(\varvec{\lambda }) ]. \end{aligned}$$
(3.16)

Moreover, by the following proposition \(V^b(\lambda ; \mu _b)\) can be seen as a generating series in \(\mu _b\) of a hierarchy of Lax matrices \(V^b_n(\lambda )\) associated with the times \(t^b_n\).

Proposition 3.5

We have the zero curvature equation in generating form

$$\begin{aligned} \mathcal {D}_{\varvec{\nu }} \varvec{V}(\lambda ; \varvec{\mu })- \mathcal {D}_{\varvec{\mu }} \varvec{V}(\lambda ; \varvec{\nu })+ \big [\varvec{V}(\lambda ; \varvec{\mu }), \varvec{V}(\lambda ; \varvec{\nu })\big ]=0. \end{aligned}$$
(3.17)

Equivalently, in components we have the zero curvature equation

$$\begin{aligned} \partial _{t^b_n} V^a_m(\lambda ) - \partial _{t^a_m} V^b_n(\lambda ) + \big [V^a_m(\lambda ), V^b_n(\lambda ) \big ]=0 \end{aligned}$$

for every \(a, b \in \mathbb {C}P^1\) and \(m \ge - N_a\) and \(n \ge - N_b\).

Proof

Using the Lax equation (3.8) we find

$$\begin{aligned} \mathcal {D}_{\varvec{\nu }} \varvec{V}(\lambda ; \varvec{\mu })&= {{\,\textrm{Tr}\,}}_2 \big ( \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \mathcal {D}_{\varvec{\nu }} \varvec{Q}_2(\varvec{\mu }) \big )\\&= {{\,\textrm{Tr}\,}}_{23} \big ( \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \big [ \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} r_{23}(\mu ,\nu ) \varvec{Q}_3(\varvec{\nu }),\varvec{Q}_2(\varvec{\mu }) \big ] \big )\\&= {{\,\textrm{Tr}\,}}_{23} \big ( \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} \big [ r_{12}(\lambda ,\mu ), r_{23}(\mu ,\nu ) \big ] \varvec{Q}_2(\varvec{\mu }) \varvec{Q}_3(\varvec{\nu }) \big ), \end{aligned}$$

where in the last equality we used the cyclicity of the trace in space 2. Likewise, we also have

$$\begin{aligned} \mathcal {D}_{\varvec{\mu }} \varvec{V}(\lambda ; \varvec{\nu })&= {{\,\textrm{Tr}\,}}_3 \big ( \varvec{\iota }_{\varvec{\nu }} r_{13}(\lambda ,\nu ) \mathcal {D}_{\varvec{\mu }} \varvec{Q}_3(\varvec{\nu }) \big )\\&= {{\,\textrm{Tr}\,}}_{23} \big ( \varvec{\iota }_{\varvec{\nu }} r_{13}(\lambda ,\nu ) \big [ \varvec{\iota }_{\varvec{\nu }} \varvec{\iota }_{\varvec{\mu }} r_{32}(\nu ,\mu ) \varvec{Q}_2(\varvec{\mu }),\varvec{Q}_3(\varvec{\nu }) \big ] \big )\\&= {{\,\textrm{Tr}\,}}_{23} \big ( \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} \big [ r_{13}(\lambda ,\nu ), r_{32}(\nu ,\mu ) \big ] \varvec{Q}_2(\varvec{\mu }) \varvec{Q}_3(\varvec{\nu }) \big ) \end{aligned}$$

where in the final step we used Lemma 3.2 to swap the order of \(\varvec{\iota }_{\varvec{\nu }}\) and \(\varvec{\iota }_{\varvec{\mu }}\), before using the cyclicity of the trace in space 3. Finally, we have

$$\begin{aligned} \big [\varvec{V}(\lambda ; \varvec{\mu }), \varvec{V}(\lambda ; \varvec{\nu })\big ]&= {{\,\textrm{Tr}\,}}_{23} \big [ \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }), \varvec{\iota }_{\varvec{\nu }} r_{13}(\lambda ,\nu ) \varvec{Q}_3(\varvec{\nu }) \big ]\\&= {{\,\textrm{Tr}\,}}_{23} \big ( \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} \big [ r_{12}(\lambda ,\mu ), r_{13}(\lambda ,\nu ) \big ] \varvec{Q}_2(\varvec{\mu }) \varvec{Q}_3(\varvec{\nu }) \big ). \end{aligned}$$

The result now follows by the classical Yang–Baxter equation (1.1). \(\square \)
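
For the reader's convenience, here is a sketch of how the component form of the zero curvature equation is read off from (3.17). We take finite points \(a, b \in \mathbb {C}\) and assume, as suggested by comparing (3.13), (3.15a) and (3.16), that \(\mathcal {D}_{\mu _a} = \sum _{m \ge -N_a} \mu _a^m \partial _{t^a_m}\); the definition of \(\mathcal {D}_{\varvec{\mu }}\) is given earlier and is only paraphrased here. The \((a,b)\) components of the three terms in (3.17) then read

$$\begin{aligned} \mathcal {D}_{\nu _b} V^a(\lambda ; \mu _a)&= \sum _{m \ge -N_a} \sum _{n \ge -N_b} \mu _a^m \nu _b^n \, \partial _{t^b_n} V^a_m(\lambda ),\\ \mathcal {D}_{\mu _a} V^b(\lambda ; \nu _b)&= \sum _{m \ge -N_a} \sum _{n \ge -N_b} \mu _a^m \nu _b^n \, \partial _{t^a_m} V^b_n(\lambda ),\\ \big [ V^a(\lambda ; \mu _a), V^b(\lambda ; \nu _b) \big ]&= \sum _{m \ge -N_a} \sum _{n \ge -N_b} \mu _a^m \nu _b^n \, \big [ V^a_m(\lambda ), V^b_n(\lambda ) \big ]. \end{aligned}$$

Collecting the coefficient of \(\mu _a^m \nu _b^n\) in (3.17) gives the component equation stated in Proposition 3.5. When one of the points is infinity, the same argument applies with the shifted powers of (3.15b).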

Remark 3.6

Note the clear resemblance between the generating series (3.14) for the hierarchy of Lax matrices \(V^a_n(\lambda )\) and the usual generating rational function

$$\begin{aligned} M(\lambda ; \mu ) = {{\,\textrm{Tr}\,}}_2 \big ( r_{12}(\lambda ,\mu ) L_2(\mu ) \big ) \end{aligned}$$

in the finite-dimensional case. The coefficients in the partial fraction decomposition of the latter with respect to \(\mu \) are \(\mathfrak {g}\)-valued rational matrices \(M^a_n(\lambda )\) which control the flow of the Lax matrix \(L(\lambda )\) with respect to the associated time \(t^a_n\) via the Lax equation \(\partial _{t^a_n} L(\lambda ) = [ M^a_n(\lambda ), L(\lambda ) ]\), which is to be compared with (3.16).

In the finite-dimensional case, however, one may also need to consider the more general generating rational function

$$\begin{aligned} M^{(n)}(\lambda ; \mu ) = {{\,\textrm{Tr}\,}}_2 \big ( r_{12}(\lambda ,\mu ) L_2(\mu )^{n-1} \big ) \end{aligned}$$

for integers \(n \ge 2\). Indeed, the Lax equation in (3.10) involves \(M(\lambda ; \mu ) = M^{(2)}(\lambda ; \mu )\) which is only associated with the quadratic Hamiltonians \(H(\mu ) = \frac{1}{2} {{\,\textrm{Tr}\,}}(L(\mu )^2)\). But in the finite-dimensional setting one should equally consider the Lax equations where \(M^{(n)}(\lambda ; \mu )\) replaces \(M^{(2)}(\lambda ; \mu )\) since these describe the flows of the Lax matrix \(L(\lambda )\) with respect to the higher order Hamiltonians built from \(\frac{1}{n} {{\,\textrm{Tr}\,}}(L(\mu )^n)\).

In the present infinite-dimensional context, we observe that the generating series (3.14) is sufficient to produce the infinite number of Lax matrices \(V^a_n(\lambda )\) associated with the infinite number of times \(t^a_n\) in the hierarchy that one expects from the traditional examples of the AKNS or the sine-Gordon hierarchies (see below). It is not clear to us what an appropriate analog of taking higher powers of \(L(\lambda )\) is in terms of \(\varvec{Q}(\varvec{\lambda })\) and whether the resulting Lax matrices and commuting flows would be independent of those obtained already.

Remark 3.7

It is instructive to compare the generating series (3.14) for the hierarchy of Lax matrices \(V^a_n(\lambda )\) with formulas for similar generating series of Lax matrices obtained in the more traditional approach to integrable field theories which involves the monodromy matrix associated to a given auxiliary equation \(\partial _x\Psi =U\Psi \). For example, in [FT, pp. 203–204], it is shown that the object

$$\begin{aligned} V(x,\lambda ,\mu )=\frac{1}{2(\lambda -\mu )}(\varvec{1}+W(x,\mu ))(-i\sigma _3)(\varvec{1}+W(x,\mu ))^{-1} \end{aligned}$$
(3.18)

“is the generating series of the Lax matrices \(V_n(x,\lambda )\) appearing in the zero curvature equation representation of the higher NS equations”. The expansion is to be understood as

$$\begin{aligned} V(x,\lambda ,\mu )=\sum _{n=1}^{\infty }V_n(x,\lambda )\mu ^{-n}. \end{aligned}$$
(3.19)

The point is that (3.18) can be rewritten as

$$\begin{aligned} V(x,\lambda ,\mu )=-\frac{1}{2}{{\,\textrm{Tr}\,}}_2\left( \iota _{\mu _\infty }r_{12}(\lambda ,\mu ) (\varvec{1}+W(x,\mu ))_2(-i\sigma _3)_2(\varvec{1}+W(x,\mu ))_2^{-1}\right) . \end{aligned}$$
(3.20)
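
To make this rewriting explicit, suppose for illustration that the rational r-matrix is \(r_{12}(\lambda ,\mu ) = C_{12}/(\mu - \lambda )\), with \(C_{12}\) the split Casimir normalised so that \({{\,\textrm{Tr}\,}}_2 (C_{12} X_2) = X_1\) for \(X \in \mathfrak {g}\); this is an assumption on conventions that are not restated in this remark. Writing \(X(x,\mu ) {:}{=}(\varvec{1}+W(x,\mu ))(-i\sigma _3)(\varvec{1}+W(x,\mu ))^{-1}\), we then get

$$\begin{aligned} -\frac{1}{2}{{\,\textrm{Tr}\,}}_2\left( \iota _{\mu _\infty } \frac{C_{12}}{\mu - \lambda } X_2(x,\mu ) \right) = -\frac{1}{2(\mu - \lambda )} X(x,\mu ) = \frac{1}{2(\lambda - \mu )} X(x,\mu ), \end{aligned}$$

where \(\iota _{\mu _\infty }\) is taken to mean the expansion \(1/(\mu - \lambda ) = \sum _{k \ge 0} \lambda ^k \mu ^{-k-1}\) in powers of \(1/\mu \), in line with (3.19).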

We note the explicit dependence of the preferred variable x, indicative of the fact that this object has been built from a particular, preferred time x associated to the Lax matrix denoted \(U(x,\lambda )\), which is nothing but \(V_1(x,\lambda )\), as it should be. Other than this dependence, formula (3.18) has exactly the same structure as our formula (3.14) when specialised to the AKNS hierarchy, see Sect. 4. Indeed, in that case the only pole to consider is at infinity and the function \(F(\lambda )\) is taken to be \(-i\sigma _3\). Hence the only non zero element in the tuple (3.14) is

$$\begin{aligned} V^\infty (\lambda ; \mu _\infty ) = {{\,\textrm{Tr}\,}}_2 \big ( \iota _{ \mu _\infty } r_{12}(\lambda ,\mu ) \phi ^\infty _2( \mu _\infty )(-i\sigma _3)_2\phi ^\infty _2( \mu _\infty )^{-1} \big ). \end{aligned}$$
(3.21)

To complete the comparison, note that the term \((\varvec{1}+W(x,\mu ))\) in (3.20) comes from writing the monodromy matrix \(T(x,y,\lambda )\) on the finite interval [y, x], associated to \(U(x,\lambda )\), as

$$\begin{aligned} T(x,y,\mu )=(\varvec{1}+W(x,\mu ))e^{Z(x,y,\mu )}(\varvec{1}+W(y,\mu ))^{-1} \end{aligned}$$
(3.22)

where Z is a diagonal matrix and both Z and W are Taylor series in \(1/\mu \) (with no constant term for W). We refer the curious reader to [FT] for more details about Z and W which are not of importance for our discussion here. Considering for instance the case of fast decaying fields as \(|x|\rightarrow \infty \), we can work with the monodromy matrix on \((-\infty ,x)\)

$$\begin{aligned} T^-(x,\mu )=(\varvec{1}+W(x,\mu ))e^{Z^-(x,\mu )}. \end{aligned}$$
(3.23)

This is the object that plays the role of our group element \(\phi ^\infty ( \mu _\infty )\). Indeed, formally plugging \(T^-(x,\mu )\) into (3.21) in place of \(\phi ^\infty ( \mu _\infty )\), and remembering that \(e^{Z^-(x,\mu )}\) commutes with \(\sigma _3\), we see that we get (3.20) (up to an irrelevant factor \(-1/2\) which comes from a different choice of normalisation between us and [FT]).

In [ACDK, AC], the argument from [FT] was generalised to obtain the analog of formula (3.20) but where one now builds it from the monodromy matrix associated to the time \(t_k\) and Lax matrix \(V_k(t_k,\lambda )\), for an arbitrary but fixed \(k\ge 1\). This represented the first step towards providing a generating function of Lax matrices that treats all times in the AKNS hierarchy equally. Our formula (3.14) achieves this fully in that it makes no reference to a preferred time and an associated monodromy matrix as a starting point. It is also valid well beyond the realm of AKNS only, as our various examples below demonstrate.

It will be useful, in view of applying our general framework to construct explicit examples in the next few sections, to be more explicit about the form of the Lax matrices \(V^a_n(\lambda )\). This can be done using Lemma 2.4 in the rational case or Lemma 2.9 in the trigonometric case.

Proposition 3.8

In the rational case, for every \(a \in \mathbb {C}\) and \(n \ge - N_a\), we have

$$\begin{aligned} V^a_n(\lambda ) = - \big ( \lambda _a^{-n-1} Q^a(\lambda _a) \big )^\textrm{rat}_-, \end{aligned}$$

while at infinity, for any \(n \ge - N_\infty \) we have

$$\begin{aligned} V^\infty _n(\lambda ) = \big ( \lambda _\infty ^{-n} Q^\infty (\lambda _\infty ) \big )^{\textrm{rat}}_-. \end{aligned}$$

In the trigonometric case, for every \(a \in \mathbb {C}\) and \(n \ge -N_a\) we have

$$\begin{aligned} V^a_n(\lambda ) = - \big ( (\lambda _a^{-n} + a \lambda _a^{-n-1}) Q^a(\lambda _a) \big )^{\textrm{trig}}_-, \end{aligned}$$

which at the origin simply reads \(V^0_n(\lambda ) = - \big ( \lambda ^{-n} Q^0(\lambda ) \big )^{\textrm{trig}}_-\), while at infinity we have, for every \(n \ge - N_\infty \),

$$\begin{aligned} V^\infty _n(\lambda ) = \big ( \lambda ^n Q^\infty (\lambda ^{-1}) \big )^{\textrm{trig}}_-. \end{aligned}$$

Proof

In the rational (resp. trigonometric) case this is a direct consequence of the definition (3.15) together with Lemma 2.4 (resp. Lemma 2.9). \(\square \)
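
To illustrate how the first formula arises, here is a sketch of the computation for a finite pole \(a \in \mathbb {C}\). It assumes that the rational r-matrix is \(r_{12}(\lambda ,\mu ) = C_{12}/(\mu - \lambda )\) with \({{\,\textrm{Tr}\,}}_2 (C_{12} X_2) = X_1\), and that \((\cdot )^{\textrm{rat}}_-\) extracts the strictly negative powers of \(\lambda _a\); both are assumptions about conventions fixed earlier (see Lemma 2.4) and not restated here. With \(\lambda _a = \lambda - a\), \(\mu _a = \mu - a\) and \(Q^a(\mu _a) = \sum _{m \ge -N_a} Q^a_m \mu _a^m\), expanding for small \(\mu _a\) gives

$$\begin{aligned} \iota _{\mu _a} \frac{1}{\mu - \lambda }&= \iota _{\mu _a} \frac{1}{\mu _a - \lambda _a} = - \sum _{k=0}^\infty \mu _a^k \lambda _a^{-k-1},\\ V^a(\lambda ; \mu _a)&= - \sum _{k=0}^\infty \sum _{m=-N_a}^\infty \mu _a^{k+m} \lambda _a^{-k-1} Q^a_m = - \sum _{n=-N_a}^\infty \mu _a^n \sum _{m=-N_a}^{n} \lambda _a^{m-n-1} Q^a_m. \end{aligned}$$

Comparing with (3.15a), the coefficient of \(\mu _a^n\) is \(V^a_n(\lambda ) = - \sum _{m=-N_a}^{n} Q^a_m \lambda _a^{m-n-1}\), which consists exactly of the negative powers of \(\lambda _a\) in \(\lambda _a^{-n-1} Q^a(\lambda _a)\), in agreement with the proposition.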

Recall that \(Q^a(\lambda _a) = 0\) if \(a \in \mathbb {C}P^1{\setminus } S\) so that, in fact, \(V^a_n(\lambda ) = 0\) unless \(a \in S\). By construction each Lax matrix \(V^a_n(\lambda ) \in R_\lambda (\mathfrak {g})\) for any \(a \in S\) and \(n \ge -N_a\), or rather their embedding in \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) via (1.13), is a coadjoint orbit in \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\). For instance, in the rational case for \(a \in \mathbb {C}\cap S\) we have

$$\begin{aligned} V^a_n(\lambda ) = - \Big ( \phi ^a(\lambda _a) \lambda _a^{-n-1} F^a(\lambda _a)_- \phi ^a(\lambda _a)^{-1} \Big )^{\textrm{rat}}_- \in \mathfrak {g}\otimes \mathbb {C}[\lambda _a^{-1}] \subset R_\lambda (\mathfrak {g}). \end{aligned}$$

3.1.4 Connection to integrable field theory and FNR procedure

Up to this point, the framework we have been discussing is very similar to the one used to describe finite-dimensional integrable systems, as emphasised in Remarks 3.1, 3.3 and 3.6. However, as we will see explicitly in all the examples discussed in later sections, our formalism encodes entire hierarchies of integrable field theories!

The FNR procedure

One way to make explicit contact with the traditional approach to integrable field theory is to choose a preferred coordinate, denote it by x and set it as a particular combination of the fundamental times \(t^a_n\) for \(a \in S\) and \(n \ge -N_a\). Quite generally, we can choose some finite subsets \(T_a \subset \mathbb {Z}_{\ge -N_a}\) for each \(a \in S\) and define \(\displaystyle \partial _x {:}{=}\sum _{a \in S} \sum _{n \in T_a} r^a_n \partial _{t^a_n}\) for some \(r^a_n \in \mathbb {C}^*\). The Lax matrix associated with the coordinate x is then given by

$$\begin{aligned} U(\lambda ) {:}{=}\sum _{a \in S} \sum _{n \in T_a} r^a_n V^a_n(\lambda ) \in R_\lambda (\mathfrak {g}). \end{aligned}$$
(3.24)

As explained above, \(\varvec{\iota }_{\varvec{\lambda }} U(\lambda )\) is then a coadjoint orbit in the dual space \(\varvec{\iota }_{\varvec{\lambda }} R_\lambda (\mathfrak {g})\) of \(\varvec{\mathcal {A}}^+_{\varvec{\lambda }}(\mathfrak {g})\). This is the coadjoint orbit alluded to at the very start of this section which encodes the finite collection of fields of our integrable hierarchy. It follows from (3.13), or even more directly from (3.16), that the spatial dependence of \(\varvec{Q}(\varvec{\lambda })\) is governed by the Lax equation

$$\begin{aligned} \partial _x \varvec{Q}(\varvec{\lambda }) = [ \varvec{\iota }_{\varvec{\lambda }} U(\lambda ), \varvec{Q}(\varvec{\lambda }) ]. \end{aligned}$$
(3.25)

As we will see on examples, the equation (3.25) can be solved recursively to express the coefficients \(Q^a_n\) of \(Q^a(\lambda _a)\), cf. (3.4), as differential polynomials in the fields, i.e. the variables contained in the Lax matrix \(U(\lambda )\). All other Lax matrices \(V^a_n(\lambda )\) associated to the fundamental times \(t^a_n\) will then have components expressed as differential polynomials of the fields.

We will outline below how (3.25) can, in principle, be solved recursively for each \(Q^a_n\). Since certain details of the recursive procedure depend on the model considered, we will only illustrate here the part of the construction which applies universally to all models in Lemma 3.9 below. We will see later on examples how to apply this construction to specific models.

To state the lemma, we first need to make a few observations and definitions. Since the Laurent expansions \(\iota _{\lambda _a} V^a_n(\lambda )\) each have a non-zero principal part, it follows from the definition (3.24) that we can write

$$\begin{aligned} \iota _{\lambda _a} U(\lambda ) = \sum _{p = - n_a}^\infty U^a_p \lambda _a^p \end{aligned}$$
(3.26)

for some \(n_a \ge 1\) and non-zero leading coefficient \(U^a_{-n_a} \in \mathfrak {g}\). By definition (3.3) we have that \(Q^a(\lambda _a)=\phi ^a(\lambda _a) F^a(\lambda _a)_- \phi ^a(\lambda _a)^{-1}\). It thus follows from the relationship between each \(\iota _{\lambda _a} V^a_n(\lambda )\) and \(Q^a(\lambda _a)\), as described explicitly in Proposition 3.8, that the coefficients of the most singular terms in the formal Laurent series \(Q^a(\lambda _a)\) and \(\iota _{\lambda _a} U(\lambda )\), given in (3.4) and (3.26) respectively, are proportional. In other words, we have \(Q^a_{-N_a} = c\, U^a_{-n_a}\) for some \(c \in \mathbb {C}^*\). Explicitly, it can be seen from Proposition 3.8 that c is given up to a sign by the coefficient \(r^a_n\) in (3.24) with \(n = \max T_a\). We can thus write

$$\begin{aligned} \lambda _a^{N_a} Q^a(\lambda _a) = c\, U^a_{-n_a} + \sum _{r=1}^\infty Q^a_{-N_a+r} \lambda _a^r. \end{aligned}$$
(3.27)

Now let \(\mathfrak {k} {:}{=}\ker (ad \, U^a_{-n_a})\) and \(\mathfrak {i} {:}{=}{{\,\textrm{im}\,}}(ad \, U^a_{-n_a})\). We fix any complements \(\mathfrak {k}'\) of \(\mathfrak {k}\) and \(\mathfrak {i}'\) of \(\mathfrak {i}\) in \(\mathfrak {g}\) so that we have the direct sum decompositions

$$\begin{aligned} \mathfrak {g}= \mathfrak {k} \oplus \mathfrak {k}' = \mathfrak {i} \oplus \mathfrak {i}'. \end{aligned}$$
(3.28)

Let \(\pi _{\mathfrak {k}}: \mathfrak {g}\rightarrow \mathfrak {k}\) and \(\pi _{\mathfrak {k}'}: \mathfrak {g}\rightarrow \mathfrak {k}'\) denote the projections onto \(\mathfrak {k}\) and \(\mathfrak {k}'\) relative to the first decomposition. Likewise, let \(\pi _{\mathfrak {i}}: \mathfrak {g}\rightarrow \mathfrak {i}\) and \(\pi _{\mathfrak {i}'}: \mathfrak {g}\rightarrow \mathfrak {i}'\) denote the projections onto \(\mathfrak {i}\) and \(\mathfrak {i}'\) relative to the second decomposition in (3.28).

Lemma 3.9

For any \(r \ge 1\), \(\pi _{\mathfrak {k}'}(Q^a_{-N_a+r})\) is expressible as a differential polynomial in x of the elements \(Q^a_{-N_a+s}\) for \(s < r\).

Proof

Using the explicit forms (3.4) and (3.26) for the Laurent series of \(Q^a(\lambda _a)\) and \(\iota _{\lambda _a} U(\lambda )\), we may rewrite the component of (3.25) at \(a \in S\) more explicitly as

$$\begin{aligned} \sum _{n=-N_a}^\infty \lambda _a^n \partial _x Q^a_n = \sum _{m=-N_a}^\infty \sum _{p = - n_a}^\infty \lambda _a^{m+p} [ U^a_p, Q^a_m] = \sum _{n=-N_a-n_a}^\infty \lambda _a^n \sum _{p = - n_a}^\infty [ U^a_p, Q^a_{n-p}]. \end{aligned}$$

In the second equality we have changed variables in the double sum from \(m \ge -N_a\) to \(n {:}{=}m+p \ge -N_a - n_a\). Comparing the coefficients of \(\lambda _a^{-N_a - n_a + r}\) on both sides of the above equation for all \(r \ge 0\) we find the following. For every \(0 \le r \le n_a - 1\),

$$\begin{aligned}{}[U^a_{-n_a}, Q^a_{-N_a + r}] = - \sum _{q=1}^r [U^a_{-n_a+q}, Q^a_{-N_a+r-q}] \end{aligned}$$
(3.29a)

where we changed variables in the sum from p to \(q {:}{=}p+n_a\). Notice that for \(r=0\) this gives \([U^a_{-n_a}, Q^a_{-N_a}] = 0\) which is consistent with the observation in (3.27) that \(Q^a_{-N_a}\) is proportional to \(U^a_{-n_a}\). On the other hand, for \(r \ge n_a\) we have

$$\begin{aligned}{}[ U^a_{-n_a}, Q^a_{-N_a+r}] = \partial _x Q^a_{-N_a -n_a+ r} - \sum _{q=1}^r [ U^a_{-n_a+q}, Q^a_{-N_a+r-q}]. \end{aligned}$$
(3.29b)

Denoting the right hand side of the equations (3.29) by \(B_r\), for each \(r \ge 0\) we can rewrite all of them more uniformly as

$$\begin{aligned}{}[ U^a_{-n_a}, Q^a_{-N_a+r}] = B_r \end{aligned}$$
(3.30)

for \(r \ge 0\). Since the left hand side of (3.30) lies in \(\mathfrak {i}\) we have, for every \(r \ge 0\),

$$\begin{aligned} \big [ U^a_{-n_a}, \pi _{\mathfrak {k}'}(Q^a_{-N_a+r}) \big ] = \pi _{\mathfrak {i}}(B_r), \quad 0 = \pi _{\mathfrak {i}'}(B_r), \end{aligned}$$
(3.31)

where in the first equation we have also decomposed \(Q^a_{-N_a+r}\) relative to the first decomposition in (3.28) and used the fact that \(\pi _{\mathfrak {k}}(Q^a_{-N_a+r})\) commutes with \(U^a_{-n_a}\).

Now the linear map \(ad \, U^a_{-n_a}: \mathfrak {k}' \rightarrow \mathfrak {i}\) is a bijection. Indeed, it is clearly surjective by definition of \(\mathfrak {i}\). To see that it is injective, note that if \([U^a_{-n_a}, X] = [U^a_{-n_a}, Y]\) for any \(X, Y \in \mathfrak {k}'\) then \(X-Y \in \mathfrak {k}\) and hence \(X - Y = 0\), as required. It follows that \(\pi _{\mathfrak {k}'}(Q^a_{-N_a+r})\) is uniquely determined in terms of \(\pi _{\mathfrak {i}}(B_r)\) for every \(r \ge 0\) by the first equation in (3.31). The result now follows. \(\square \)
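
As a simple illustration of the bijection used in this proof, consider \(\mathfrak {g}= \mathfrak {sl}_2\) with leading coefficient \(U^a_{-n_a} = -i\sigma _3\); this choice is an assumption made for the example, motivated by the AKNS case mentioned in Remark 3.7. With \(\sigma _\pm \) the raising and lowering matrices, satisfying \([\sigma _3, \sigma _\pm ] = \pm 2 \sigma _\pm \), we have \(\mathfrak {k} = \mathbb {C}\sigma _3\) and we may choose \(\mathfrak {k}' = \mathfrak {i} = \mathbb {C}\sigma _+ \oplus \mathbb {C}\sigma _-\) and \(\mathfrak {i}' = \mathfrak {k}\). If \(\pi _{\mathfrak {i}}(B_r) = b_+ \sigma _+ + b_- \sigma _-\), the first equation in (3.31) is then solved explicitly by

$$\begin{aligned} \pi _{\mathfrak {k}'}(Q^a_{-N_a+r}) = \tfrac{i}{2} \big ( b_+ \sigma _+ - b_- \sigma _- \big ), \end{aligned}$$

as one checks using \([-i\sigma _3, \sigma _\pm ] = \mp 2i \sigma _\pm \).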

In order to completely determine the coefficients \(Q^a_{-N_a+r}\) for \(r \ge 0\), it remains to show that the \(\pi _{\mathfrak {k}}(Q^a_{-N_a+r})\) for every \(r \ge 0\) can also be determined recursively. This is the part which will typically depend on the model considered. Here we will show, generalising an argument for the ZS-AKNS \(n \times n\) hierarchy given in [TU, Theorem 2.2], see also [Sa], how this can be done under the assumption that there is a polynomial \(P_a\) with coefficients in \(\mathbb {C}[\lambda _a]\) such that \(P_a\big ( \lambda _a^{N_a} F^a(\lambda _a)_- \big ) = 0\) and \(P_a'(c\, U^a_{-n_a}) \in \mathbb {C}[\lambda _a]\) is invertible in \(\mathbb {C}\llbracket \lambda _a \rrbracket \). Recalling (3.27), we have the identity

$$\begin{aligned} P_a \bigg ( \! c\, U^a_{-n_a} + \sum _{r=1}^\infty Q^a_{-N_a+r} \lambda _a^r \bigg ) = 0. \end{aligned}$$

And using the fact that each \(\pi _{\mathfrak {k}}(Q^a_{-N_a+r})\) commutes with \(U^a_{-n_a}\), by definition of \(\mathfrak {k}\), we can then rewrite the above in the form

$$\begin{aligned} P_a(c\, U^a_{-n_a}) + P'_a(c\, U^a_{-n_a}) \sum _{r=1}^\infty \pi _{\mathfrak {k}}(Q^a_{-N_a+r}) \lambda _a^r = \mathcal {R} \Big ( \big \{ Q^a_{-N_a+r} \big \}_{r=1}^\infty \Big ) \end{aligned}$$
(3.32)

where the right hand side is a sum of terms, each of which contains either higher powers of \(\sum _{r=1}^\infty \pi _{\mathfrak {k}}(Q^a_{-N_a+r}) \lambda _a^r\) or at least one factor of \(\sum _{r=1}^\infty \pi _{\mathfrak {k}'}(Q^a_{-N_a+r}) \lambda _a^r\). Since we are assuming that \(P'_a(c\, U^a_{-n_a}) \in \mathbb {C}[\lambda _a]\) is invertible, it follows by comparing powers of \(\lambda _a^r\) on both sides of (3.32) that \(\pi _{\mathfrak {k}}(Q^a_{-N_a+r})\) can be expressed as a finite sum of terms involving only \(\pi _{\mathfrak {k}}(Q^a_{-N_a+s})\) for \(s < r\) or \(\pi _{\mathfrak {k}'}(Q^a_{-N_a+s})\) for \(s \le r\).

In conjunction with Lemma 3.9, this shows that each \(\pi _{\mathfrak {k}}(Q^a_{-N_a+r})\) and \(\pi _{\mathfrak {k}'}(Q^a_{-N_a+r})\), and therefore \(Q^a_{-N_a+r}\) itself, can be determined recursively for each \(r \ge 0\). In particular, all the coefficients \(Q^a_n\), \(n \ge - N_a\) of the Laurent series \(Q^a(\lambda _a)\) in (3.4) can be expressed as differential polynomials in x of the coefficients of the rational function \(U(\lambda )\). The same conclusion still holds even when there is no polynomial \(P_a\) with the above properties, as will be shown on the example of the sine-Gordon hierarchy in Sect. 5.
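
To illustrate the role of the polynomial \(P_a\), consider, purely as an example, \(\mathfrak {g}= \mathfrak {sl}_2\) in the fundamental representation with \(\lambda _a^{N_a} F^a(\lambda _a)_- = -i\sigma _3\), and assume that the leading coefficient is \(c\, U^a_{-n_a} = -i\sigma _3\) (as happens when \(\phi ^a(\lambda _a) = \varvec{1} + O(\lambda _a)\)), so that one may take \(P_a(X) = X^2 + \varvec{1}\). Write \(Q_r {:}{=}Q^a_{-N_a+r}\), with \(Q_0 = -i\sigma _3\), and decompose \(Q_r = a_r \sigma _3 + e_r\) with \(a_r \sigma _3 = \pi _{\mathfrak {k}}(Q_r)\) and \(e_r = \pi _{\mathfrak {k}'}(Q_r)\) off-diagonal; the symbols \(a_r\), \(e_r\), u and v are notation introduced only for this example. Since \(\sigma _3\) anticommutes with \(e_r\), the coefficient of \(\lambda _a^r\) in \(\big ( \sum _{s \ge 0} Q_s \lambda _a^s \big )^2 + \varvec{1} = 0\) gives, for \(r \ge 1\),

$$\begin{aligned} -2i\, a_r \varvec{1} + \sum _{s=1}^{r-1} Q_s Q_{r-s} = 0 \quad \Longrightarrow \quad a_r \varvec{1} = -\frac{i}{2} \sum _{s=1}^{r-1} Q_s Q_{r-s}. \end{aligned}$$

Thus \(\pi _{\mathfrak {k}}(Q_r)\) is determined by the \(Q_s\) with \(s < r\). For instance \(a_1 = 0\) and, if \(Q_1 = u \sigma _+ + v \sigma _-\), then \(Q_1^2 = uv \varvec{1}\) and hence \(a_2 = -\frac{i}{2} uv\).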

It is important to observe that our choice of ‘spatial’ coordinate x defined by the linear combination \(\partial _x = \sum _{a \in S} \sum _{n \in T_a} r^a_n \partial _{t^a_n}\) and its associated Lax matrix in (3.24) was completely arbitrary. Indeed, one of the main advantages of working with the adjoint orbit \(\varvec{Q}(\varvec{\lambda })\) in \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) rather than the coadjoint orbit \(U(\lambda )\) in \(R_\lambda (\mathfrak {g})\) is that it keeps all the times on an equal footing by not singling out a particular (linear combination of) time as ‘space’.

On the redundancy of the FNR procedure

The previous discussion casts in the present framework the original idea of [FNR] whereby one should first solve for the coordinates in \(\varvec{Q}(\varvec{\lambda })\) in terms of the finite collection of fields contained in a given Lax matrix \(U(\lambda )\), now interpreted as fields depending on a preferred space variable x. The other times in the hierarchies are viewed as (compatible) time flows imposed on this finite collection of fields and define a preferred field theory alongside its higher symmetries.

Here we want to elaborate on a point of view originally advocated in [CS3] whereby the above “traditional” approach is not needed at all and, in fact, represents a conceptual obstruction to the formalism we want to put forward in this work: we treat all the times in a hierarchy as well as all the (algebra or group) coordinates (i.e. the dependent variables contained in \(\varvec{Q}(\varvec{\lambda })\) or \(\varvec{\phi }(\varvec{\lambda })\) respectively) on the same footing. From this point of view, one should consider the entirety of the Lax equations contained in the generating Lax equation (3.8), or equivalently, the collection of zero curvature equations (3.5). The point is that the latter implement the FNR procedure anyway but they present the advantage of being amenable to a covariant Hamiltonian formulation, which was one of the main results of [CS1, CS3]. This aspect is beyond the scope of the present work but remains one motivation for it. The fact that the zero curvature equations contain the equations of the FNR procedure was already observed and used in the particular example of the AKNS hierarchy in [AC]. For convenience, let us sketch the argument here in the simplest case of a single pole \(a\in \mathbb {C}\), with a collection of times \(t_n^a\), \(n\ge -N_a\). Suppose we fix \(n\ge -N_a\) and we want to solve

$$\begin{aligned} \partial _{t_n^a} Q^a(\lambda _a) = [ \iota _{ \lambda _a} V_n^a(\lambda ), Q^a( \lambda _a) ], \end{aligned}$$
(3.33)

given \(Q^a_{-N_a}\), along the lines of Lemma 3.9 and the discussion after it. Without loss of generality, shifting the power of \(\lambda _a\) by \(N_a\), we can always assume for simplicity that \(N_a=0\). Then, (3.33) amounts to the collection of equations

$$\begin{aligned} \partial _{t_n^a} Q^a_j = \sum _{p =0}^n[ Q_{j+n-p+1}^a, Q^a_p ],~~j\ge 0. \end{aligned}$$
(3.34)

As discussed above, in certain cases (which include the AKNS hierarchy and the sG hierarchy as we show explicitly in Sect. 5), this allows one to express all the algebra coordinates in \(Q_j\), \(j\ge n\) as differential polynomials with respect to \(t_n^a\) in the coordinates contained in \(Q_k\), \(k=0,\dots ,n\). Now consider the zero curvature equations, for \(m\ge n+1\),

$$\begin{aligned} \partial _{t_n^a} V_m^a(\lambda ) - \partial _{t_m^a} V_n^a(\lambda )+ [ V_m^a(\lambda ), V_n^a(\lambda ) ]=0. \end{aligned}$$
(3.35)

Looking at the coefficient of \(1/\lambda ^j\), for \(j=n+2,\dots ,m+1\), we find that they contain the equations

$$\begin{aligned} \partial _{t_n^a} Q^a_{m+1-k} = \sum _{p =0}^n[ Q_{m+n+2-k-p}^a, Q^a_p ],~~k=n+2,\dots ,m+1. \end{aligned}$$
(3.36)

If we set \(j=m+1-k\), these become

$$\begin{aligned} \partial _{t_n^a} Q^a_{j} = \sum _{p =0}^n[ Q_{j+n+1-p}^a, Q^a_p ],~~j=0,\dots ,m-n-1. \end{aligned}$$
(3.37)

So the collection of zero curvature equations (3.35) for \(m\ge n+1\) produces exactly the set of FNR equations (3.34). Hence, there is no point in implementing the FNR procedure a priori to determine the “fields” and then impose the zero curvature equations to determine their equations of motion. The latter suffices. With this in mind, we will come back to this point in certain examples below to illustrate our position and show how abandoning the FNR procedure allows us to eliminate the problem of alien derivatives mentioned in the introduction.
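
As a minimal illustration of this argument, take \(N_a = 0\), \(n = 0\) and \(m = 1\), and assume the rational form of Proposition 3.8 with \((\cdot )^{\textrm{rat}}_-\) extracting the negative powers of \(\lambda _a\), so that \(V^a_0(\lambda ) = - Q^a_0 \lambda _a^{-1}\) and \(V^a_1(\lambda ) = - Q^a_0 \lambda _a^{-2} - Q^a_1 \lambda _a^{-1}\). Inserting these into the zero curvature equation of Proposition 3.5 for the pair of times \(t^a_0\), \(t^a_1\), namely \(\partial _{t^a_0} V^a_1 - \partial _{t^a_1} V^a_0 + [V^a_1, V^a_0] = 0\), and collecting powers of \(\lambda _a\) gives

$$\begin{aligned} \lambda _a^{-2}:&\quad \partial _{t^a_0} Q^a_0 = [Q^a_1, Q^a_0],\\ \lambda _a^{-1}:&\quad \partial _{t^a_0} Q^a_1 = \partial _{t^a_1} Q^a_0. \end{aligned}$$

The first equation is the \(j=0\) member of (3.34) with \(n=0\). The second is consistent with (3.34), which gives \([Q^a_2, Q^a_0]\) for both sides.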

3.2 Generating Lagrangian multiform

In this section, we introduce the main object of this paper, the generating Lagrangian multiform (1.15)–(3.40), and we show that the Lax equation (3.8) derives from it. Although the equations of motion (3.8) can be written in terms of \(\varvec{Q}(\varvec{\lambda }) \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) alone, in order to write this multiform we need the group-valued element \(\varvec{\phi }(\varvec{\lambda }) \in \varvec{\mathcal {A}}^+_{\varvec{\lambda }}(G)\). This is very reminiscent of the fact that writing down the Zakharov–Mikhailov action describing the Zakharov–Shabat equations of motion requires introducing a group valued field [ZM1]. Recall the definition of \(\varvec{Q}(\varvec{\lambda }) \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) in (3.3) as an adjoint orbit of the element \((\varvec{\iota }_{\varvec{\lambda }} F(\lambda ))_- \in \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\), defined in (3.2), under the action of \(\varvec{\phi }(\varvec{\lambda }) \in \varvec{\mathcal {A}}^+_{\varvec{\lambda }}(G)\).

We consider the following generating Lagrangian multiform

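$$\begin{aligned} \varvec{\mathscr {L}}(\varvec{\lambda },\varvec{\mu }) \,{:}{=}\, {{\textbf {K}}}(\varvec{\lambda },\varvec{\mu }) - {{\textbf {U}}}(\varvec{\lambda },\varvec{\mu }), \end{aligned}$$
(3.38)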

where the kinetic and potential terms are given by

$$\begin{aligned} {{\textbf {K}}}(\varvec{\lambda },\varvec{\mu })&\,{:}{=}\, {{\,\text {Tr}\,}}\big ( \varvec{\phi }(\varvec{\lambda })^{-1} \mathcal {D}_{\varvec{\mu }} \varvec{\phi }(\varvec{\lambda }) (\varvec{\iota }_{\varvec{\lambda }} F(\lambda ))_- \big )- {{\,\text {Tr}\,}}\big (\varvec{\phi }(\varvec{\mu })^{-1} \mathcal {D}_{\varvec{\lambda }} \varvec{\phi }(\varvec{\mu }) (\varvec{\iota }_{\varvec{\mu }} F(\mu ))_- \big ), \end{aligned}$$
(3.39a)
$$\begin{aligned} {{\textbf {U}}}(\varvec{\lambda },\varvec{\mu })&\,\,{:}{=}\, \,\tfrac{1}{2} {{\,\text {Tr}\,}}_{12}\big ( (\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }})r_{12}(\lambda ,\mu ) {{\textbf {Q}}}_1(\varvec{\lambda }) {{\textbf {Q}}}_2(\varvec{\mu })\big ). \end{aligned}$$
(3.39b)

As mentioned at the end of Sect. 2.1, the boldface notation (1.15) is used as a shorthand for an equality of components

$$\begin{aligned} \mathscr {L}^{a,b}(\lambda _a, \mu _b) = K^{a,b}(\lambda _a, \mu _b) - U^{a,b}(\lambda _a, \mu _b) \end{aligned}$$

for every \(a,b \in \mathbb {C}P^1\), and the kinetic and potential terms (1.16) in components are given explicitly by

$$\begin{aligned} K^{a,b}(\lambda _a,\mu _b)&= {{\,\textrm{Tr}\,}}\big ( \phi ^a(\lambda _a)^{-1} \mathcal {D}_{\mu _b} \phi ^a(\lambda _a) F^a(\lambda _a)_- \big ) \nonumber \\&\quad - {{\,\textrm{Tr}\,}}\big (\phi ^b(\mu _b)^{-1} \mathcal {D}_{\lambda _a} \phi ^b(\mu _b) F^b(\mu _b)_- \big ), \end{aligned}$$
(3.40a)
$$\begin{aligned} U^{a,b}(\lambda _a,\mu _b)&= \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{12}\big ( (\iota _{\lambda _a} \iota _{\mu _b} + \iota _{\mu _b} \iota _{\lambda _a})r_{12}(\lambda ,\mu )Q^a_1(\lambda _a)Q^b_2(\mu _b)\big ). \end{aligned}$$
(3.40b)

The kinetic term (1.16a) is clearly skew-symmetric under the exchange \(\varvec{\lambda }\leftrightarrow \varvec{\mu }\), so the skew-symmetry of \(\varvec{\mathscr {L}}(\varvec{\lambda },\varvec{\mu })\) is equivalent to the skew-symmetry of the potential term (1.16b), namely

$$\begin{aligned}&{{\,\textrm{Tr}\,}}_{12}\big ( (\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }}) r_{12}(\lambda ,\mu )\varvec{Q}_1(\varvec{\lambda }) \varvec{Q}_2(\varvec{\mu })\big )\\&\quad = - {{\,\textrm{Tr}\,}}_{12}\big ( (\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }}) r_{21}(\mu ,\lambda ) \varvec{Q}_1(\varvec{\lambda }) \varvec{Q}_2(\varvec{\mu })\big ). \end{aligned}$$

This holds since r is skew-symmetric.
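Spelled out, and only as a sketch of the step used here: exchanging \(\varvec{\lambda }\leftrightarrow \varvec{\mu }\) in (3.39b), relabelling the auxiliary spaces \(1\leftrightarrow 2\) under the trace, and using that \(\varvec{Q}_1\) and \(\varvec{Q}_2\) commute gives

$$\begin{aligned} \varvec{U}(\varvec{\mu },\varvec{\lambda })&= \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{12}\big ( (\varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }} + \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }}) r_{12}(\mu ,\lambda ) \varvec{Q}_1(\varvec{\mu }) \varvec{Q}_2(\varvec{\lambda })\big )\\&= \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{12}\big ( (\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }}) r_{21}(\mu ,\lambda ) \varvec{Q}_1(\varvec{\lambda }) \varvec{Q}_2(\varvec{\mu })\big ) = - \varvec{U}(\varvec{\lambda },\varvec{\mu }), \end{aligned}$$

where the last equality uses \(r_{21}(\mu ,\lambda )=-r_{12}(\lambda ,\mu )\).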

3.2.1 Extracting Lagrangians and Lagrangian multiforms

We have been using the generating formalism efficiently so far. Here, we spend some time discussing the connection of our generating Lagrangian multiform with Lagrangians and Lagrangian multiforms. This will be useful to reformulate the multiform EL equations and the closure relation in generating form, allowing us to continue to take advantage of this formalism in general computations.

From the definition of the generating Lagrangian multiform (1.15), we see that the kinetic term \(K^{a,b}(\lambda _a, \mu _b)\) given by (3.40a) is a Laurent series in both \(\lambda _a\) and \(\mu _b\), with powers bounded below by \(-N_a\) and \(-N_b\), respectively. In particular, for any \(m, n \in \mathbb {Z}\) the coefficient of \(\lambda _a^m \mu _b^n\) is well defined. The same is true for the potential term (3.40b) by the following lemma.

Lemma 3.10

For any \(m, n \in \mathbb {Z}\) and any \(a, b \in \mathbb {C}P^1\), the coefficient of \(\lambda _a^m \mu _b^n\) in the potential term \(U^{a,b}(\lambda _a, \mu _b)\) given by (3.40b) is a well defined expression which is quadratic in the coefficients of \(Q^a(\lambda _a)\) and \(Q^b(\mu _b)\).

Proof

If \(b \ne a\) then \((\iota _{\lambda _a} \iota _{\mu _b} + \iota _{\mu _b} \iota _{\lambda _a})r_{12}(\lambda ,\mu )\) is valued in \((\mathfrak {g}\otimes \mathfrak {g}) \otimes \mathbb {C}\llbracket \lambda _a, \mu _b \rrbracket \). Since by definition (3.4) we have \(Q^a(\lambda _a) \in \mathfrak {g}\otimes \lambda _a^{-N_a} \mathbb {C}\llbracket \lambda _a \rrbracket \) and \(Q^b(\mu _b) \in \mathfrak {g}\otimes \mu _b^{-N_b} \mathbb {C}\llbracket \mu _b \rrbracket \), it follows that \(U^{a,b}(\lambda _a, \mu _b)\) is a Laurent series in both \(\lambda _a\) and \(\mu _b\), with powers bounded below by \(-N_a\) and \(-N_b\), respectively.

If \(b=a\) then \((\iota _{\lambda _a} \iota _{\mu _a} + \iota _{\mu _a} \iota _{\lambda _a})r_{12}(\lambda ,\mu )\) contains a doubly infinite Laurent series in \(\lambda _a \mu _a^{-1}\) coming from the expansion of \(1/(\lambda - \mu )\), possibly also multiplied by some polynomial in \(\lambda _a\) and \(\mu _a\) depending on the precise form of the r-matrix. Multiplying this by the Laurent series \(Q^a(\lambda _a) \in \mathfrak {g}\otimes \lambda _a^{-N_a} \mathbb {C}\llbracket \lambda _a \rrbracket \) and \(Q^a(\mu _a) \in \mathfrak {g}\otimes \mu _a^{-N_a} \mathbb {C}\llbracket \mu _a \rrbracket \), we produce terms of the form \(\lambda _a^{r+j+p} \mu _a^{s-j+q}\) with \(r, s \ge -N_a\), \(j \in \mathbb {Z}\) and \(p, q\) ranging over finitely many possible values. In order to form a term proportional to \(\lambda _a^m \mu _a^n\) we need \(m = r+j+p\) and \(n=s-j+q\). But then \(m-j-p = r \ge -N_a\) so that \(j \le m +N_a - p\) and also \(n+j-q = s \ge - N_a\) so that \(j \ge -n -N_a+q\). In other words, \(j \in \mathbb {Z}\) must be bounded from above and below so that it ranges only over finitely many values. Hence, there are only finitely many terms contributing to the coefficient of \(\lambda _a^m \mu _a^n\) and the result follows. \(\square \)
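To see the counting in the case \(b=a\) at work, consider the sample values \(N_a=1\), \(p=q=0\) and the target monomial \(\lambda _a^0\mu _a^0\), i.e. \(m=n=0\) (these values are chosen for illustration only and are not tied to a particular r-matrix). The two bounds give \(-1\le j\le 1\), and for each such \(j\) the exponents \(r=m-j-p\) and \(s=n+j-q\) are fixed:

$$\begin{aligned} (j,r,s)\in \{(-1,1,-1),\,(0,0,0),\,(1,-1,1)\}, \end{aligned}$$

so exactly three products contribute to this coefficient.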

As a consequence, for any \(a,b \in \mathbb {C}P^1\) and \(m, n \in \mathbb {Z}\) with \(m \ge - N_a\) and \(n \ge - N_b\), we may now extract the following Lagrangian coefficients associated to the times \(t_m^a\) and \(t_n^b\):

Definition 3.11

(Elementary Lagrangians).

$$\begin{aligned} \mathscr {L}^{a,b}_{m,n} {:}{=}{{\,\textrm{res}\,}}^\lambda _a {{\,\textrm{res}\,}}^\mu _b \mathscr {L}^{a,b}(\lambda _a, \mu _b)\lambda ^{-m-1}d\lambda \,\mu ^{-n-1}d{\mu }. \end{aligned}$$
(3.41)

Recall the notational convention explained after (2.2b), in particular for residues computed at infinity. In short, Definition (3.41) means that \(\mathscr {L}^{a,b}_{m,n}\) is the coefficient of \(\lambda _a^m\mu _b^n\) in the expansion of \(\mathscr {L}^{a,b}(\lambda _a, \mu _b)\), as one would want. This is what we use to compute elementary Lagrangians in all our examples.

As explained below, when building a hierarchy, one chooses a finite set \(S\subset \mathbb {C}P^1\) and all but a finite number of the elementary Lagrangians \(\mathscr {L}^{a,b}_{m,n}\) vanish (those for which a and/or b is in \(\mathbb {C}P^1{\setminus } S\)). The Lagrangian multiform of the hierarchy is then given by

$$\begin{aligned} \mathscr {L}^{\textrm{S}} {:}{=}\tfrac{1}{2} \sum _{a,b \in S}\sum _{m,n}\mathscr {L}^{a,b}_{m,n}\,d{t^a_m} \wedge d{t^b_n}=\sum _{(m,a)<(n,b)}\mathscr {L}^{a,b}_{m,n}\,d{t^a_m} \wedge d{t^b_n}. \end{aligned}$$
(3.42)

Note that we introduced an order on the pairs \((m,a)\in \mathbb {Z}\times S\) in the last equality (recall that \(\mathscr {L}^{a,b}_{m,n}=-\mathscr {L}^{b,a}_{n,m}\)). With \(S=\{a_1,\dots ,a_n\}\), it is defined by

$$\begin{aligned} (m,a_i)< (n,a_j)\Leftrightarrow i<j ~\text {or}~ (i=j ~\text {and}~ m< n). \end{aligned}$$
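For instance, for a two-point set \(S=\{a_1,a_2\}\) (an illustrative choice), every pair \((m,a_1)\) precedes every pair \((n,a_2)\), while pairs attached to the same point are ordered by their integer label. The right-hand side of (3.42) then splits as

$$\begin{aligned} \mathscr {L}^{\textrm{S}} = \sum _{m<n}\mathscr {L}^{a_1,a_1}_{m,n}\,d{t^{a_1}_m} \wedge d{t^{a_1}_n} + \sum _{m,n}\mathscr {L}^{a_1,a_2}_{m,n}\,d{t^{a_1}_m} \wedge d{t^{a_2}_n} + \sum _{m<n}\mathscr {L}^{a_2,a_2}_{m,n}\,d{t^{a_2}_m} \wedge d{t^{a_2}_n}. \end{aligned}$$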

These definitions generalise the correspondence explained in the introductory section 1.1.3 between \(\mathscr {L}[u]\) and \(\mathscr {L}(\lambda ,\mu )\) for the AKNS hierarchy. As we will see in detail in Sect. 4, the latter indeed corresponds to the case where \(S=\{\infty \}\). In practice, one calculates the elementary Lagrangians (3.41) directly by computing the appropriate Laurent series expansion of \(\mathscr {L}^{a,b}(\lambda _a, \mu _b)\). The corresponding Lagrangian multiform is easily obtained as in (3.42).

The essential point of the present discussion is to identify the generating form of the two main equations of the theory of Lagrangian multiforms: the multiform EL equations \(\delta d \mathscr {L}^{\textrm{S}}=0\) and the closure relation \(d \mathscr {L}^{\textrm{S}}=0\) which should hold on solutions of the multiform EL equations. We see that the key object to translate in generating form is therefore \(d \mathscr {L}^{\textrm{S}}\). In view of (3.42), \(d \mathscr {L}^\textrm{S}\) has the form

$$\begin{aligned} d \mathscr {L}^\textrm{S}=\sum _{(k,c)<(m,a)<(n,b)}\left( \partial _{t_k^c}\mathscr {L}^{a,b}_{m,n}+\partial _{t_n^b}\mathscr {L}^{c,a}_{k,m}+\partial _{t_m^a}\mathscr {L}^{b,c}_{n,k} \right) \,dt_k^c\wedge d{t^a_m} \wedge d{t^b_n}. \end{aligned}$$

The generating function corresponding to the coefficient \(\partial _{t_k^c}\mathscr {L}^{a,b}_{m,n}+\partial _{t_n^b}\mathscr {L}^{c,a}_{k,m}+\partial _{t_m^a}\mathscr {L}^{b,c}_{n,k}\) is

$$\begin{aligned} \mathcal {D}_{\nu _c}\mathscr {L}^{a,b}(\lambda _a, \mu _b)+\mathcal {D}_{\mu _b}\mathscr {L}^{c,a}(\nu _c,\lambda _a )+\mathcal {D}_{\lambda _a}\mathscr {L}^{b,c}(\mu _b,\nu _c). \end{aligned}$$

Summarizing our discussion, the set S was fixed but arbitrary, so going back to the adélic setting, we will be working compactly with

$$\begin{aligned} \delta \mathcal {D}_{\varvec{\nu }}\mathscr {L}(\varvec{\lambda }, \varvec{\mu })+\delta \mathcal {D}_{ \varvec{\mu }}\mathscr {L}(\varvec{\nu },\varvec{\lambda })+\delta \mathcal {D}_{\varvec{\lambda }}\mathscr {L}( \varvec{\mu },\varvec{\nu }) \end{aligned}$$

when deriving the multiform EL equations in generating form, and with

$$\begin{aligned} \mathcal {D}_{\varvec{\nu }}\mathscr {L}(\varvec{\lambda }, \varvec{\mu })+\mathcal {D}_{ \varvec{\mu }}\mathscr {L}(\varvec{\nu },\varvec{\lambda })+\mathcal {D}_{\varvec{\lambda }}\mathscr {L}( \varvec{\mu },\varvec{\nu }) \end{aligned}$$

when studying the closure relation.

3.2.2 Generating multiform Euler–Lagrange equations

Having introduced the main object of our framework, we proceed to derive the associated multiform EL equations (in generating form) and show that they give the generating Lax equation (3.8).

Theorem 3.12

The generating Lax equation (3.8) is variational: the multiform EL equations deriving from the generating Lagrangian multiform take the form

$$\begin{aligned} \mathcal {D}_{\varvec{\mu }} \varvec{Q}_1(\varvec{\lambda }) = \big [ {{\,\textrm{Tr}\,}}_2 \big ( \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }) \big ), \varvec{Q}_1(\varvec{\lambda }) \big ]. \end{aligned}$$

Proof

We derive the equations induced by the requirement \(\delta d\mathscr {L}=0\) in generating form. This means that we compute \(\delta \mathcal {D}_{\varvec{\nu }} \varvec{\mathscr {L}}(\varvec{\lambda },\varvec{\mu }) + \circlearrowleft =0\), where \(\circlearrowleft \) means cyclic permutations of \(\lambda ,\mu ,\nu \), and set the independent coefficients to zero. We start with the kinetic terms.

$$\begin{aligned} \mathcal {D}_{\varvec{\nu }} \varvec{K}(\varvec{\lambda },\varvec{\mu }) =&{{\,\textrm{Tr}\,}}\Big (- \varvec{\phi }^{-1}(\varvec{\lambda }) \mathcal {D}_{\varvec{\nu }} \varvec{\phi }(\varvec{\lambda }) \varvec{\phi }^{-1}(\varvec{\lambda }) \mathcal {D}_{\varvec{\mu }} \varvec{\phi }(\varvec{\lambda }) (\varvec{\iota }_{\varvec{\lambda }}F(\lambda ))_- + \varvec{\phi }^{-1}(\varvec{\lambda }) \mathcal {D}_{\varvec{\nu }} \mathcal {D}_{\varvec{\mu }} \varvec{\phi }(\varvec{\lambda }) (\varvec{\iota }_{\varvec{\lambda }}F(\lambda ))_- \\&+ \varvec{\phi }^{-1}(\varvec{\mu })\mathcal {D}_{\varvec{\nu }} \varvec{\phi }(\varvec{\mu }) \varvec{\phi }^{-1}(\varvec{\mu }) \mathcal {D}_{\varvec{\lambda }} \varvec{\phi }(\varvec{\mu }) (\varvec{\iota }_{\varvec{\mu }}F(\mu ))_- - \varvec{\phi }^{-1}(\varvec{\mu })\mathcal {D}_{\varvec{\nu }} \mathcal {D}_{\varvec{\lambda }} \varvec{\phi }(\varvec{\mu })(\varvec{\iota }_{\varvec{\mu }}F(\mu ))_- \Big ) \end{aligned}$$

so that \(\mathcal {D}_{\varvec{\nu }} \varvec{K}(\varvec{\lambda },\varvec{\mu }) + \circlearrowleft \) is equal to

$$\begin{aligned}&{{\,\textrm{Tr}\,}}\Big (\left[ \varvec{\phi }^{-1}(\lambda ) \mathcal {D}_{\varvec{\mu }}\varvec{\phi }(\varvec{\lambda }),\varvec{\phi }^{-1}(\varvec{\lambda }) \mathcal {D}_{\varvec{\nu }} \varvec{\phi }(\varvec{\lambda })\right] (\varvec{\iota }_{\varvec{\lambda }}F(\lambda ))_- \Big )+ \circlearrowleft . \end{aligned}$$
(3.43)

After we apply the \(\delta \) differential we get

$$\begin{aligned} \delta \mathcal {D}_{\varvec{\nu }} \varvec{K}(\varvec{\lambda },\varvec{\mu }) + \circlearrowleft= & {} {{\,\textrm{Tr}\,}}\Big ( \mathcal {D}_{\varvec{\nu }} \varvec{\phi }(\varvec{\lambda })\varvec{\phi }^{-1}(\varvec{\lambda }) \mathcal {D}_{\varvec{\mu }} \varvec{Q}(\varvec{\lambda }) - \mathcal {D}_{\varvec{\mu }} \varvec{\phi }(\varvec{\lambda })\varvec{\phi }^{-1}(\varvec{\lambda }) \mathcal {D}_{\varvec{\nu }} \varvec{Q}(\varvec{\lambda })\Big )\delta \varvec{\phi }(\lambda )\varvec{\phi }^{-1}(\varvec{\lambda })\\{} & {} \quad +{{\,\textrm{Tr}\,}}\Big (\varvec{\phi }^{-1}(\varvec{\lambda }) \mathcal {D}_{\varvec{\nu }} \varvec{Q}(\varvec{\lambda })\delta \mathcal {D}_{\varvec{\mu }} \varvec{\phi }(\varvec{\lambda }) - \varvec{\phi }^{-1}(\varvec{\lambda }) \mathcal {D}_{\varvec{\mu }} \varvec{Q}(\varvec{\lambda }) \delta \mathcal {D}_{\varvec{\nu }} \varvec{\phi }(\varvec{\lambda }) \Big ) + \circlearrowleft . \end{aligned}$$

We now turn to the potential term

$$\begin{aligned} \varvec{U}(\varvec{\lambda },\varvec{\mu }) =\tfrac{1}{2} {{\,\textrm{Tr}\,}}_{12}\big ( (\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }})r_{12}(\lambda ,\mu ) \varvec{Q}_1(\varvec{\lambda }) \varvec{Q}_2(\varvec{\mu })\big ). \end{aligned}$$
(3.44)

We drop \(\varvec{\lambda }\) and \(\varvec{\mu }\) in \(\varvec{\phi }\) and \(\varvec{Q}\) for conciseness since they follow the spaces 1 and 2 consistently. Let us also denote \((\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }})r_{12}(\lambda ,\mu )\) by \(\varvec{r}_{12}\). We compute

$$\begin{aligned} \mathcal {D}_{\varvec{\nu }} \varvec{U}(\varvec{\lambda },\varvec{\mu }) = \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{12}\big ( \varvec{r}_{12} \left( \mathcal {D}_{\varvec{\nu }} \varvec{Q}_1 \varvec{Q}_2+ \varvec{Q}_1 \mathcal {D}_{\varvec{\nu }} \varvec{Q}_2\right) \big ) \end{aligned}$$
(3.45)

and after applying the \(\delta \)-differential we get

$$\begin{aligned} \delta \mathcal {D}_{\varvec{\nu }} \varvec{U}(\varvec{\lambda },\varvec{\mu })= \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{12}\big ( \varvec{r}_{12} \left( \delta \mathcal {D}_{\varvec{\nu }} \varvec{Q}_1 \varvec{Q}_2+ \mathcal {D}_{\varvec{\nu }} \varvec{Q}_1 \delta \varvec{Q}_2+\delta \varvec{Q}_1 \mathcal {D}_{\varvec{\nu }} \varvec{Q}_2+\varvec{Q}_1 \delta \mathcal {D}_{\varvec{\nu }} \varvec{Q}_2\right) \big ) \nonumber \\ \end{aligned}$$
(3.46)

and similarly for the cyclic permutations. We use the following identities

$$\begin{aligned} {{\,\textrm{Tr}\,}}_{12} \varvec{r}_{12} \delta \mathcal {D}_{\varvec{\nu }} \varvec{Q}_1\varvec{Q}_2= & {} {{\,\textrm{Tr}\,}}_{12}(-\varvec{Q}_2 r_{12} \varvec{\mathcal {D}}_{\varvec{\nu }} \varvec{Q}_1 - \varvec{Q}_1 \mathcal {D}_{\varvec{\nu }} \varvec{\phi }_1 \varvec{\phi }^{-1}_1 \varvec{Q}_2 r_{12} \\{} & {} + \mathcal {D}_{\varvec{\nu }}\varvec{\phi }_1\varvec{\phi }^{-1}_1 \varvec{Q}_2 \varvec{r}_{12} \varvec{Q}_1 + \varvec{\phi }_1 \mathcal {D}_{\varvec{\nu }} X_1 \varvec{\phi }^{-1}_1 \varvec{Q}_2 r_{12})\delta \varvec{\phi }_1 \varvec{\phi }^{-1}_1\\{} & {} +{{\,\textrm{Tr}\,}}_{12} [\varvec{Q}_1, \varvec{r}_{12} \varvec{Q}_2] \delta \mathcal {D}_{\varvec{\nu }}\varvec{\phi }_1 \varvec{\phi }^{-1}_1\\= & {} {{\,\textrm{Tr}\,}}_{12}( [\mathcal {D}_{\varvec{\nu }}\varvec{Q}_1, \varvec{r}_{12}\varvec{Q}_2] - \mathcal {D}_{\varvec{\nu }} \varvec{\phi }_1 \varvec{\phi }^{-1}_1 [\varvec{Q}_1, \varvec{r}_{12}\varvec{Q}_2])\delta \varvec{\phi }_1 \varvec{\phi }^{-1}_1\\{} & {} +{{\,\textrm{Tr}\,}}_{12} [\varvec{Q}_1, \varvec{r}_{12} \varvec{Q}_2] \delta \mathcal {D}_{\varvec{\nu }}\varvec{\phi }_1 \varvec{\phi }^{-1}_1, \\ {{\,\textrm{Tr}\,}}_{12} \varvec{r}_{12}\varvec{Q}_1 \delta \mathcal {D}_{\varvec{\nu }} \varvec{Q}_2= & {} {{\,\textrm{Tr}\,}}_{12}( - \varvec{r}_{12} \varvec{Q}_1 \mathcal {D}_{\varvec{\nu }} \varvec{Q}_2 - \varvec{Q}_2 \mathcal {D}_{\varvec{\nu }} \varvec{\phi }_2 \varvec{\phi }^{-1}_2 \varvec{r}_{12} \varvec{Q}_1 \\{} & {} + \mathcal {D}_{\varvec{\nu }} \varvec{\phi }_2 \varvec{\phi }^{-1}_2 \varvec{r}_{12} \varvec{Q}_1 \varvec{Q}_2 + \varvec{\phi }_2 \mathcal {D}_{\varvec{\nu }} X_2 \varvec{\phi }^{-1}_2 \varvec{r}_{12} \varvec{Q}_1) \delta \varvec{\phi }_2 \varvec{\phi }^{-1}_2\\{} & {} +{{\,\textrm{Tr}\,}}_{12} [\varvec{Q}_2,\varvec{r}_{12}\varvec{Q}_1]\delta \mathcal {D}_{\varvec{\nu }} \varvec{\phi }_2 \varvec{\phi }^{-1}_2\\= & {} {{\,\textrm{Tr}\,}}_{12} ([\mathcal {D}_{\varvec{\nu }} \varvec{Q}_2, 
\varvec{r}_{12}\varvec{Q}_1] - \mathcal {D}_{\varvec{\nu }} \varvec{\phi }_2 \varvec{\phi }^{-1}_2[\varvec{Q}_2,\varvec{r}_{12} \varvec{Q}_1]) \delta \varvec{\phi }_2 \varvec{\phi }^{-1}_2\\{} & {} +{{\,\textrm{Tr}\,}}_{12} [\varvec{Q}_2,\varvec{r}_{12}\varvec{Q}_1]\delta \mathcal {D}_{\varvec{\nu }} \varvec{\phi }_2 \varvec{\phi }^{-1}_2, \end{aligned}$$

and

$$\begin{aligned}{} & {} {{\,\textrm{Tr}\,}}_{12} \varvec{r}_{12} \delta \varvec{Q}_1 \mathcal {D}_{\varvec{\nu }} \varvec{Q}_2 = {{\,\textrm{Tr}\,}}_{12} [\varvec{Q}_1,\varvec{r}_{12} \mathcal {D}_{\varvec{\nu }} \varvec{Q}_2] \delta \varvec{\phi }_1 \varvec{\phi }^{-1}_1,\\{} & {} {{\,\textrm{Tr}\,}}_{12} \varvec{r}_{12} \mathcal {D}_{\varvec{\nu }} \varvec{Q}_1 \delta \varvec{Q}_2 = {{\,\textrm{Tr}\,}}_{12} [\varvec{Q}_2,\varvec{r}_{12} \mathcal {D}_{\varvec{\nu }} \varvec{Q}_1] \delta \varvec{\phi }_2\varvec{\phi }^{-1}_2, \end{aligned}$$

to express \(\delta \mathcal {D}_{\varvec{\nu }} \varvec{U}(\varvec{\lambda },\varvec{\mu })\) on the basis of \(\delta \varvec{\phi }_1\), \(\delta \mathcal {D}_{\varvec{\mu }} \varvec{\phi }_1\) and \(\delta \mathcal {D}_{\varvec{\nu }} \varvec{\phi }_1\) (and similarly on the space 2). Then, we collect the coefficients of \(\delta \varvec{\phi }_1\), \(\delta \mathcal {D}_{\varvec{\mu }} \varvec{\phi }_1\) and \(\delta \mathcal {D}_{\varvec{\nu }} \varvec{\phi }_1\) which provide the independent equations. From \(\delta \mathcal {D}_{\varvec{\nu }} \varvec{K}(\varvec{\lambda },\varvec{\mu }) + \circlearrowleft \) we have

$$\begin{aligned}{} & {} {{\,\textrm{Tr}\,}}_1 ( -\mathcal {D}_{\varvec{\mu }} \varvec{\phi }_1 \varvec{\phi }^{-1}_1 \mathcal {D}_{\varvec{\nu }}\varvec{Q}_1 + \mathcal {D}_{\varvec{\nu }}\varvec{\phi }_1 \varvec{\phi }^{-1}_1 \mathcal {D}_{\varvec{\mu }}\varvec{Q}_1) \delta \varvec{\phi }_1 \varvec{\phi }^{-1}_1+ {{\,\textrm{Tr}\,}}_1( - \mathcal {D}_{\varvec{\mu }} \varvec{Q}_1 \delta \mathcal {D}_{\varvec{\nu }} \varvec{\phi }_1 \varvec{\phi }^{-1}_1 \nonumber \\{} & {} \qquad + \mathcal {D}_{\varvec{\nu }}\varvec{Q}_1 \delta \mathcal {D}_{\varvec{\mu }} \varvec{\phi }_1 \varvec{\phi }^{-1}_1) + \circlearrowleft \end{aligned}$$
(3.47)

and from \(\delta \mathcal {D}_{\varvec{\nu }} \varvec{U}(\varvec{\lambda },\varvec{\mu }) + \circlearrowleft \), using the skew-symmetry of r, we obtain

$$\begin{aligned}{} & {} {{\,\textrm{Tr}\,}}_{12} [\varvec{Q}_1,\varvec{r}_{12} \varvec{Q}_2] \delta \mathcal {D}_{\varvec{\nu }}\varvec{\phi }_1 \varvec{\phi }^{-1}_1 - {{\,\textrm{Tr}\,}}_{13}[\varvec{Q}_1,\varvec{r}_{13}\varvec{Q}_3] \delta \mathcal {D}_{\varvec{\mu }} \varvec{\phi }_1 \varvec{\phi }^{-1}_1\nonumber \\{} & {} +{{\,\textrm{Tr}\,}}_{12}( [\mathcal {D}_{\varvec{\nu }} \varvec{Q}_1, \varvec{r}_{12}\varvec{Q}_2] - \mathcal {D}_{\varvec{\nu }} \varvec{\phi }_1 \varvec{\phi }^{-1}_1 [\varvec{Q}_1, \varvec{r}_{12}\varvec{Q}_2] + [\varvec{Q}_1,\varvec{r}_{12}\mathcal {D}_{\varvec{\nu }}\varvec{Q}_2])\delta \varvec{\phi }_1 \varvec{\phi }^{-1}_1\\{} & {} + {{\,\textrm{Tr}\,}}_{13}(-[\mathcal {D}_{\varvec{\mu }}\varvec{Q}_1, \varvec{r}_{13}\varvec{Q}_3] + \mathcal {D}_{\varvec{\mu }} \varvec{\phi }_1 \varvec{\phi }^{-1}_1[\varvec{Q}_1,\varvec{r}_{13} \varvec{Q}_3] -[\varvec{Q}_1,\varvec{r}_{13}\mathcal {D}_{\varvec{\mu }} \varvec{Q}_3]) \delta \varvec{\phi }_1 \varvec{\phi }^{-1}_1+ \circlearrowleft .\nonumber \end{aligned}$$
(3.48)

The coefficients of \(\delta \mathcal {D}_{\varvec{\mu }} \varvec{\phi }_1\) and \(\delta \mathcal {D}_{\varvec{\nu }} \varvec{\phi }_1\) in \(\delta \mathcal {D}_{\varvec{\nu }} \varvec{\mathscr {L}}(\varvec{\lambda },\varvec{\mu }) + \circlearrowleft =0\) give

$$\begin{aligned} \mathcal {D}_{\varvec{\mu }} \varvec{Q}_1 = \tfrac{1}{2} [{{\,\textrm{Tr}\,}}_2 \varvec{r}_{12}\varvec{Q}_2,\varvec{Q}_1], \quad \mathcal {D}_{\varvec{\nu }} \varvec{Q}_1 = \tfrac{1}{2} [{{\,\textrm{Tr}\,}}_3 \varvec{r}_{13}\varvec{Q}_3,\varvec{Q}_1], \end{aligned}$$

i.e. two equivalent copies of the same equation under the irrelevant change \(2 \leftrightarrow 3\) and \(\mu \leftrightarrow \nu \). Explicitly, it reads

$$\begin{aligned} \mathcal {D}_{\varvec{\mu }} \varvec{Q}_1(\varvec{\lambda })= \tfrac{1}{2} \big [ {{\,\textrm{Tr}\,}}_2 \big ( (\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }}) r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }) \big ),\varvec{Q}_1(\varvec{\lambda }) \big ], \end{aligned}$$
(3.49)

which gives the desired result (3.8) upon recalling Lemma 3.2. The coefficient of \(\delta \varvec{\phi }_1\) is just a consequence of this equation and of the commutativity of the flows: \([\mathcal {D}_{\varvec{\mu }}, \mathcal {D}_{\varvec{\nu }}]=0\). The coefficients of \(\delta \varvec{\phi }_2\), \(\delta \varvec{\phi }_3\) etc. contained in the cyclic permutations \(\circlearrowleft \) give equivalent equations under the corresponding cyclic permutations of the spectral parameters and auxiliary spaces.

\(\square \)
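For the reader's convenience, here is a sketch of how the two shorter identities used in the proof arise. It assumes, in line with the adjoint orbit description (3.3), that \(\varvec{Q} = \varvec{\phi }\, (\varvec{\iota }_{\varvec{\lambda }}F(\lambda ))_-\, \varvec{\phi }^{-1}\) and that \((\varvec{\iota }_{\varvec{\lambda }}F(\lambda ))_-\) is fixed data which is not varied. Then

$$\begin{aligned} \delta \varvec{Q} = \delta \varvec{\phi }\, (\varvec{\iota }_{\varvec{\lambda }}F(\lambda ))_-\, \varvec{\phi }^{-1} - \varvec{\phi }\, (\varvec{\iota }_{\varvec{\lambda }}F(\lambda ))_-\, \varvec{\phi }^{-1} \delta \varvec{\phi }\, \varvec{\phi }^{-1} = [\delta \varvec{\phi }\, \varvec{\phi }^{-1}, \varvec{Q}], \end{aligned}$$

and cyclicity of the trace gives \({{\,\textrm{Tr}\,}}(A[B,\varvec{Q}]) = {{\,\textrm{Tr}\,}}([\varvec{Q},A]B)\) for any \(A\), \(B\). Taking \(A=\varvec{r}_{12}\mathcal {D}_{\varvec{\nu }} \varvec{Q}_2\) and \(B=\delta \varvec{\phi }_1 \varvec{\phi }^{-1}_1\), and using that \(\delta \varvec{Q}_1\) commutes with \(\mathcal {D}_{\varvec{\nu }} \varvec{Q}_2\) since they act on different spaces, one finds

$$\begin{aligned} {{\,\textrm{Tr}\,}}_{12} \varvec{r}_{12} \delta \varvec{Q}_1 \mathcal {D}_{\varvec{\nu }} \varvec{Q}_2 = {{\,\textrm{Tr}\,}}_{12} \big ( \varvec{r}_{12} \mathcal {D}_{\varvec{\nu }} \varvec{Q}_2 [\delta \varvec{\phi }_1 \varvec{\phi }^{-1}_1, \varvec{Q}_1] \big ) = {{\,\textrm{Tr}\,}}_{12} [\varvec{Q}_1,\varvec{r}_{12} \mathcal {D}_{\varvec{\nu }} \varvec{Q}_2] \delta \varvec{\phi }_1 \varvec{\phi }^{-1}_1. \end{aligned}$$

The identity involving \(\delta \varvec{Q}_2\) follows in the same way in space 2.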

3.2.3 Generating closure relation

Theorem 3.13

The generating closure relation

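$$\begin{aligned} \mathcal {D}_{\varvec{\nu }} \varvec{\mathscr {L}}(\varvec{\lambda }, \varvec{\mu }) + \mathcal {D}_{\varvec{\lambda }} \varvec{\mathscr {L}}(\varvec{\mu }, \varvec{\nu }) + \mathcal {D}_{\varvec{\mu }} \varvec{\mathscr {L}}(\varvec{\nu }, \varvec{\lambda }) = 0 \end{aligned}$$
(3.50)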

holds when (3.8) is satisfied. It is a consequence of the CYBE for r.

Proof

First consider the kinetic term (3.39a). We have

$$\begin{aligned} \mathcal {D}_{\varvec{\nu }} \varvec{K}(\varvec{\lambda },\varvec{\mu })&= {{\,\textrm{Tr}\,}}\big ( \varvec{\phi }(\varvec{\lambda })^{-1} \mathcal {D}_{\varvec{\nu }} \mathcal {D}_{\varvec{\mu }} \varvec{\phi }(\varvec{\lambda }) (\varvec{\iota }_{\varvec{\lambda }} F(\lambda ))_- \big )\\&\quad - {{\,\textrm{Tr}\,}}\big (\varvec{\phi }(\varvec{\mu })^{-1} \mathcal {D}_{\varvec{\nu }} \mathcal {D}_{\varvec{\lambda }} \varvec{\phi }(\varvec{\mu }) (\varvec{\iota }_{\varvec{\mu }} F(\mu ))_- \big )\\&\quad - {{\,\textrm{Tr}\,}}\big ( \varvec{\phi }(\varvec{\lambda })^{-1} \mathcal {D}_{\varvec{\nu }} \varvec{\phi }(\varvec{\lambda }) \varvec{\phi }(\varvec{\lambda })^{-1} \mathcal {D}_{\varvec{\mu }} \varvec{\phi }(\varvec{\lambda }) (\varvec{\iota }_{\varvec{\lambda }} F(\lambda ))_- \big )\\&\quad + {{\,\textrm{Tr}\,}}\big (\varvec{\phi }(\varvec{\mu })^{-1} \mathcal {D}_{\varvec{\nu }} \varvec{\phi }(\varvec{\mu }) \varvec{\phi }(\varvec{\mu })^{-1} \mathcal {D}_{\varvec{\lambda }} \varvec{\phi }(\varvec{\mu }) (\varvec{\iota }_{\varvec{\mu }} F(\mu ))_- \big ). \end{aligned}$$

It follows by adding the cyclic permutations of this expression in the variables \(\lambda \), \(\mu \) and \(\nu \) that

$$\begin{aligned} \mathcal {D}_{\varvec{\nu }} \varvec{K}(\varvec{\lambda }, \varvec{\mu }) + \mathcal {D}_{\varvec{\lambda }} \varvec{K}(\varvec{\mu }, \varvec{\nu }) + \mathcal {D}_{\varvec{\mu }} \varvec{K}(\varvec{\nu }, \varvec{\lambda }) =0. \end{aligned}$$
(3.51)

Consider now the potential term (1.16b). Using Theorem 3.12 we find

$$\begin{aligned} \mathcal {D}_{\varvec{\nu }} \varvec{U}(\varvec{\lambda },\varvec{\mu })&= \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{12}\big ( (\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }})r_{12}(\lambda ,\mu ) \mathcal {D}_{\varvec{\nu }} \varvec{Q}_1(\varvec{\lambda }) \varvec{Q}_2(\varvec{\mu })\big )\\&\quad + \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{12}\big ( (\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }})r_{12}(\lambda ,\mu ) \varvec{Q}_1(\varvec{\lambda }) \mathcal {D}_{\varvec{\nu }} \varvec{Q}_2(\varvec{\mu })\big )\\&= \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{123}\big ( (\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }})r_{12}(\lambda ,\mu ) \big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\nu }} r_{13}(\lambda ,\nu ) \varvec{Q}_3(\varvec{\nu }),\varvec{Q}_1(\varvec{\lambda }) \big ] \varvec{Q}_2(\varvec{\mu })\big )\\&\quad + \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{123}\big ( (\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }})r_{12}(\lambda ,\mu ) \varvec{Q}_1(\varvec{\lambda }) \big [ \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} r_{23}(\mu ,\nu ) \varvec{Q}_3(\varvec{\nu }),\varvec{Q}_2(\varvec{\mu }) \big ] \big ). \end{aligned}$$

By using the cyclicity of the trace in space 1 and 2 in the first and second terms, respectively, we may write this as

$$\begin{aligned} \mathcal {D}_{\varvec{\nu }} \varvec{U}(\varvec{\lambda },\varvec{\mu })&= - \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{123}\big ( \big [ (\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }})r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }), \varvec{Q}_1(\varvec{\lambda }) \big ] \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\nu }} r_{13}(\lambda ,\nu ) \varvec{Q}_3(\varvec{\nu }) \big )\\&\quad - \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{123}\big ( \big [ (\varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} + \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }})r_{12}(\lambda ,\mu ) \varvec{Q}_1(\varvec{\lambda }),\varvec{Q}_2(\varvec{\mu }) \big ] \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} r_{23}(\mu ,\nu ) \varvec{Q}_3(\varvec{\nu }) \big )\\&= - {{\,\textrm{Tr}\,}}_{123}\big ( \big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_2(\varvec{\mu }), \varvec{Q}_1(\varvec{\lambda }) \big ] \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\nu }} r_{13}(\lambda ,\nu ) \varvec{Q}_3(\varvec{\nu }) \big )\\&\quad - {{\,\textrm{Tr}\,}}_{123}\big ( \big [ \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} r_{12}(\lambda ,\mu ) \varvec{Q}_1(\varvec{\lambda }),\varvec{Q}_2(\varvec{\mu }) \big ] \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} r_{23}(\mu ,\nu ) \varvec{Q}_3(\varvec{\nu }) \big ) \end{aligned}$$

where in the second equality we used Lemma 3.2 in both terms. By using once again the cyclicity of the trace in space 1 and 2 in the first and second terms, respectively, we arrive at the expression

$$\begin{aligned} \mathcal {D}_{\varvec{\nu }} \varvec{U}(\varvec{\lambda },\varvec{\mu })&= {{\,\textrm{Tr}\,}}_{123}\big ( \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} \big ( [r_{12}(\lambda ,\mu ), r_{13}(\lambda ,\nu )] \nonumber \\&\quad + [r_{12}(\lambda ,\mu ), r_{23}(\mu ,\nu )] \big ) \varvec{Q}_1(\varvec{\lambda }) \varvec{Q}_2(\varvec{\mu }) \varvec{Q}_3(\varvec{\nu })\big ). \end{aligned}$$
(3.52a)

Likewise, using the skew-symmetry of the r-matrix we find

$$\begin{aligned} \mathcal {D}_{\varvec{\lambda }} \varvec{U}(\varvec{\mu }, \varvec{\nu })&= - \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{123}\big ( (\varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} + \varvec{\iota }_{\varvec{\nu }} \varvec{\iota }_{\varvec{\mu }})r_{23}(\mu , \nu ) \big [ \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\lambda }} r_{12}(\lambda , \mu ) \varvec{Q}_1(\varvec{\lambda }),\varvec{Q}_2(\varvec{\mu }) \big ] \varvec{Q}_3(\varvec{\nu })\big )\\&\quad - \tfrac{1}{2} {{\,\textrm{Tr}\,}}_{123}\big ( (\varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} + \varvec{\iota }_{\varvec{\nu }} \varvec{\iota }_{\varvec{\mu }})r_{23}(\mu ,\nu ) \varvec{Q}_2(\varvec{\mu }) \big [ \varvec{\iota }_{\varvec{\nu }} \varvec{\iota }_{\varvec{\lambda }} r_{13}(\lambda , \nu ) \varvec{Q}_1(\varvec{\lambda }),\varvec{Q}_3(\varvec{\nu }) \big ] \big ). \end{aligned}$$

Then by following the same steps as above for \(\mathcal {D}_{\varvec{\nu }} \varvec{U}(\varvec{\lambda },\varvec{\mu })\) we deduce that

$$\begin{aligned} \mathcal {D}_{\varvec{\lambda }} \varvec{U}(\varvec{\mu }, \varvec{\nu })&= {{\,\textrm{Tr}\,}}_{123}\big ( \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} \big ( [r_{12}(\lambda ,\mu ), r_{23}(\mu ,\nu )] \nonumber \\&\quad + [r_{13}(\lambda ,\nu ), r_{23}(\mu ,\nu )] \big ) \varvec{Q}_1(\varvec{\lambda }) \varvec{Q}_2(\varvec{\mu }) \varvec{Q}_3(\varvec{\nu })\big ). \end{aligned}$$
(3.52b)

Similarly, we also find using the skew-symmetry of the r-matrix that

$$\begin{aligned} \mathcal {D}_{\varvec{\mu }} \varvec{U}(\varvec{\nu }, \varvec{\lambda })&= {{\,\textrm{Tr}\,}}_{123}\big ( \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} \big ( [r_{13}(\lambda ,\nu ), r_{23}(\mu ,\nu )] \nonumber \\&\quad + [r_{12}(\lambda ,\mu ), r_{13}(\lambda ,\nu )] \big ) \varvec{Q}_1(\varvec{\lambda }) \varvec{Q}_2(\varvec{\mu }) \varvec{Q}_3(\varvec{\nu })\big ). \end{aligned}$$
(3.52c)

It now follows from combining the three equations in (3.52) and using the classical Yang–Baxter equation for the skew-symmetric r-matrix that

$$\begin{aligned} \mathcal {D}_{\varvec{\nu }} \varvec{U}(\varvec{\lambda }, \varvec{\mu }) + \mathcal {D}_{\varvec{\lambda }} \varvec{U}(\varvec{\mu }, \varvec{\nu }) + \mathcal {D}_{\varvec{\mu }} \varvec{U}(\varvec{\nu }, \varvec{\lambda }) = 0. \end{aligned}$$
(3.53)
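Explicitly, each of the three commutators appears twice in the sum of (3.52a)–(3.52c), so that

$$\begin{aligned}&\mathcal {D}_{\varvec{\nu }} \varvec{U}(\varvec{\lambda }, \varvec{\mu }) + \mathcal {D}_{\varvec{\lambda }} \varvec{U}(\varvec{\mu }, \varvec{\nu }) + \mathcal {D}_{\varvec{\mu }} \varvec{U}(\varvec{\nu }, \varvec{\lambda })\\&\quad = 2 {{\,\textrm{Tr}\,}}_{123}\big ( \varvec{\iota }_{\varvec{\lambda }} \varvec{\iota }_{\varvec{\mu }} \varvec{\iota }_{\varvec{\nu }} \big ( [r_{12}(\lambda ,\mu ), r_{13}(\lambda ,\nu )] + [r_{12}(\lambda ,\mu ), r_{23}(\mu ,\nu )] + [r_{13}(\lambda ,\nu ), r_{23}(\mu ,\nu )] \big ) \varvec{Q}_1(\varvec{\lambda }) \varvec{Q}_2(\varvec{\mu }) \varvec{Q}_3(\varvec{\nu })\big ). \end{aligned}$$

The combination of commutators in the inner bracket is the left-hand side of (1.1), once skew-symmetry in the form \(r_{32}(\nu ,\mu )=-r_{23}(\mu ,\nu )\) is used to rewrite its last term as \(-[r_{13}(\lambda ,\nu ), r_{32}(\nu ,\mu )] = [r_{13}(\lambda ,\nu ), r_{23}(\mu ,\nu )]\).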

The result now follows from (3.51) and (3.53) put together. \(\square \)

The rest of the paper is devoted to examples. To specify an example, the following ingredients need to be fixed:

(i):

a skew-symmetric r-matrix as in Sect. 2 (rational or trigonometric in this work),

(ii):

an effective divisor \(\mathcal D {:}{=}\sum _{a \in S} N_a a\), in particular with support given by a finite subset \(S \subset \mathbb {C}P^1\) and with \(N_a \in \mathbb {Z}_{\ge 1}\) for each \(a \in S\), (\(N_\infty \in \mathbb {Z}_{\ge 0}\) if \(\infty \in S\)),

(iii):

a Lie algebra \(\mathfrak {g}\) which for simplicity we take to be either \(\mathfrak {gl}_N\) or \(\mathfrak {sl}_N\),

(iv):

a \(\mathfrak {g}\)-valued rational function \(F(\lambda ) \in R_\lambda (\mathfrak {g})\) with pole divisor \((F)_\infty = \mathcal D\), i.e. with a pole of order \(N_a \in \mathbb {Z}_{\ge 1}\) at each point \(a \in S\), (\(N_\infty \in \mathbb {Z}_{\ge 0}\) if \(\infty \in S\)).

Each section contains an example of a hierarchy for which the above formalism produces Lagrangian multiforms, Lax matrices and zero curvature equations. Some sections consist of known examples that we recover or cast in a new light, e.g. AKNS and sine-Gordon. Other examples are new to the best of our knowledge and show the power of the formalism, e.g. the trigonometric Zakharov–Mikhailov class of models or the examples where we couple different integrable field theories together.

4 AKNS Hierarchy

We keep this section short as it is a matter of “closing the loop”: we reproduce the motivating example of Sect. 1.1.3 (which was dealt with in detail in [CS3]) and which was the starting point of this whole project. The main objective is to illustrate how to use our machinery on the simplest and best-known example. We choose the rational r-matrix and we fix the required data as follows:

$$\begin{aligned} S=\{\infty \},~~N_\infty =0,~~\mathfrak {g}=\mathfrak {sl}_2,~~F(\lambda )=-i\sigma _3. \end{aligned}$$
(4.1)

The adjoint orbit description of Sect. 3.1 is implemented with

$$\begin{aligned} \phi ^\infty (\lambda _\infty )=\varvec{1}+\sum _{n=1}^\infty \phi _n^\infty \lambda _{\infty }^n, \end{aligned}$$
(4.2)

and gives

$$\begin{aligned} Q^\infty (\lambda _\infty )=\sum _{n=0}^\infty Q_n^{\infty }\lambda _\infty ^n, \end{aligned}$$
(4.3)

with \(Q_0^{\infty }=-i\sigma _3\) and \(Q_1^{\infty }=i[\sigma _3,\phi _1^\infty ]\), the familiar first two elements in the AKNS hierarchy. Since there is only one pole in this example, let us drop the subscripts and superscripts and simply write the fundamental objects in (4.2) and (4.3) as

$$\begin{aligned} \phi (\lambda )=\varvec{1}+\sum _{n=1}^\infty \phi _n \lambda ^{-n},~~Q(\lambda )=\sum _{n=0}^\infty Q_n\lambda ^{-n}. \end{aligned}$$
(4.4)
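As a quick consistency check of the first two coefficients, one can expand the dressing formula order by order. The sketch below assumes, as in Sect. 3.1 and as written explicitly in (5.5)–(5.6) for the sine-Gordon case, that \(Q(\lambda )=\phi (\lambda )F(\lambda )\phi (\lambda )^{-1}\) with \(F=-i\sigma _3\); only terms up to order \(\lambda ^{-1}\) are kept.

$$\begin{aligned} \phi (\lambda )^{-1}&=\varvec{1}-\phi _1\lambda ^{-1}+O(\lambda ^{-2}),\\ Q(\lambda )&=\big (\varvec{1}+\phi _1\lambda ^{-1}\big )(-i\sigma _3)\big (\varvec{1}-\phi _1\lambda ^{-1}\big )+O(\lambda ^{-2})\\&=-i\sigma _3+i[\sigma _3,\phi _1]\lambda ^{-1}+O(\lambda ^{-2}), \end{aligned}$$

so that \(Q_0=-i\sigma _3\) and \(Q_1=i[\sigma _3,\phi _1]\), in agreement with the values quoted above.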

Similarly, we will just write \(t_n\) instead of \(t_n^\infty \) for the times of the hierarchy. The generating Lax equation (3.8) gives us, using the definitions (3.6), (3.14) and (3.15),

$$\begin{aligned} \partial _{t_n}Q(\lambda )=\left[ V_n(\lambda ),Q(\lambda )\right] \end{aligned}$$
(4.5)

where

$$\begin{aligned} V_n(\lambda )=\sum _{r=0}^n Q_r \lambda ^{n-r} \end{aligned}$$
(4.6)

are the Lax matrices of the hierarchy. Eqs (4.5) are the central equations of [FNR] where only Hamiltonian aspects of the theory were developed. The associated zero curvature equations read

$$\begin{aligned} \partial _{t_k}V_n(\lambda )-\partial _{t_n}V_k(\lambda )+[V_n(\lambda ),V_k(\lambda )]=0,~~n,k\ge 0, \end{aligned}$$
(4.7)

and produce the equations of motion of the hierarchy. The famous (unreduced) NLS system corresponds to \(n=1\) and \(k=2\). From our generating Lagrangian (1.15), we can of course reproduce the generating Lagrangian of [CS3] and all the Lagrangians forming the Lagrangian multiform that gives these equations as its (multiform) EL equations. Since \(S=\{\infty \}\) we only have \(\mathscr {L}^{\infty ,\infty }(\lambda _\infty , \mu _\infty )\) to consider. As above, let us simply denote it as \(\mathscr {L}(\lambda , \mu )\). The coefficient \(\mathscr {L}_{mn}\) of \({\lambda ^{-m-1}\mu ^{-n-1}}\) in its expansion reads

$$\begin{aligned} \mathscr {L}_{mn}=\sum _{i=1}^{m} {{\,\textrm{Tr}\,}}{\tilde{\phi }}_{i} \partial _{t_{n}} \phi _{ m-i+1} X_0 - \sum _{i=1}^{n} {{\,\textrm{Tr}\,}}\tilde{\phi }_{i} \partial _{t_{m}} \phi _{ n-i+1} X_0 -U_{mn} \end{aligned}$$
(4.8)

where we wrote \(\displaystyle \phi ^{-1}(\lambda )=\varvec{1}+\sum _{n=1}^\infty {\tilde{\phi }}_n\lambda ^{-n}\) for convenience and where \(U_{mn}\) is given by

$$\begin{aligned} U_{mn}=- {{\,\textrm{Tr}\,}}\sum _{j=0}^{m} Q_{m+n+1-j} Q_j. \end{aligned}$$
(4.9)

These are the coefficients of the AKNS Lagrangian multiform found in [CS3] (up to an overall minus sign) to which we refer for more details. It was explained in [CS3] that there exists a parametrization of \(\phi (\lambda )\) in terms of very nice coordinates \(\displaystyle e(\lambda )=\sum _{i=1}^\infty e_i\lambda ^{-i}\), \(\displaystyle f(\lambda )=\sum _{i=1}^\infty f_i\lambda ^{-i}\) as

$$\begin{aligned} \phi (\lambda )=\frac{1}{\sqrt{2i}}\begin{pmatrix} \sqrt{2i-e(\lambda )f(\lambda )} &{} e(\lambda )\\ -f(\lambda ) &{} \sqrt{2i-e(\lambda )f(\lambda )} \end{pmatrix}. \end{aligned}$$
(4.10)

For the reader’s convenience, let us give for instance

$$\begin{aligned} \mathscr {L}_{12}=\frac{1}{2} (f_1 \partial _{t_2} e_{1} - e_1 \partial _{t_2} f_{1}) - \frac{1}{2} \sum _{j=1}^{2}(f_j \partial _{t_1} e_{2-j+1} - e_j \partial _{t_1} f_{2-j+1})-2ie_2f_2-e_1^2f_1^2\nonumber \\ \end{aligned}$$
(4.11)

and

$$\begin{aligned} \mathscr {L}_{13}= & {} \frac{1}{2} (f_1 \partial _{t_3} e_{1} - e_1 \partial _{t_3} f_{1}) - \frac{1}{2} \sum _{j=1}^{3}(f_j \partial _{t_1} e_{3-j+1} - e_j \partial _{t_1} f_{3-j+1})\nonumber \\{} & {} \quad -2i(e_2f_3+e_3f_2)-\frac{3}{2}e_1f_1(f_1e_2+f_2e_1) \end{aligned}$$
(4.12)

Of course, one can check that the equations of motion for these Lagrangians give precisely the zero curvature equations (4.7) for \((k,n)=(1,2)\) and \((k,n)=(1,3)\) respectively. For instance, varying \(\mathscr {L}_{12}\) with respect to \(e_j\), \(f_j\), \(j=1,2\), we have

$$\begin{aligned} \partial _{t_1}e_1+2ie_2=0\,&,&~~\partial _{t_1}f_1-2if_2=0, \end{aligned}$$
(4.13)
$$\begin{aligned} \partial _{t_2}e_1-\partial _{t_1}e_2-2e_1^2f_1=0\,&,&~~\partial _{t_2}f_1-\partial _{t_1}f_2+2f_1^2e_1=0. \end{aligned}$$
(4.14)

This is equivalent to (4.7) for \((k,n)=(1,2)\), upon recalling that

$$\begin{aligned} \displaystyle Q_1=\begin{pmatrix} 0 &{} \sqrt{2i}e_1\\ \sqrt{2i}f_1 &{} 0 \end{pmatrix},~~\displaystyle Q_2=\begin{pmatrix} e_1f_1 &{} \sqrt{2i}e_2\\ \sqrt{2i}f_2 &{} -e_1f_1 \end{pmatrix}. \end{aligned}$$

The top two equations can be used to eliminate \(e_2,f_2\) in the bottom two equations. With \(t_2=t\), \(t_1=x\), \(e_1=\frac{1}{\sqrt{2i}}q\), \(f_1=\frac{1}{\sqrt{2i}}r\) we get

$$\begin{aligned} i\partial _{t}q+\frac{1}{2}\partial _{x}^2q-q^2r=0,~~-i\partial _{t}r+\frac{1}{2}\partial _{x}^2r-r^2q=0, \end{aligned}$$
(4.15)

and the reduction \(r=\mp q^*\) yields the well-known (de)focusing NLS equation

$$\begin{aligned} i\partial _{t}q+\frac{1}{2}\partial _{x}^2q\pm |q|^2q=0 \end{aligned}$$

for the complex field q. Similarly, \(\mathscr {L}_{13}\) gives the complex modified KdV equation.
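The elimination step leading to (4.15) can be spelled out as follows. This is only a sketch of the substitution, using nothing beyond (4.13)–(4.14) and the change of variables \(t_2=t\), \(t_1=x\), \(e_1=q/\sqrt{2i}\), \(f_1=r/\sqrt{2i}\) stated above.

$$\begin{aligned} e_2&=\frac{i}{2}\partial _xe_1,\qquad f_2=-\frac{i}{2}\partial _xf_1,\\ 0&=\partial _te_1-\frac{i}{2}\partial _x^2e_1-2e_1^2f_1=\frac{1}{\sqrt{2i}}\Big (\partial _tq-\frac{i}{2}\partial _x^2q+iq^2r\Big ),\\ 0&=\partial _tf_1+\frac{i}{2}\partial _x^2f_1+2f_1^2e_1=\frac{1}{\sqrt{2i}}\Big (\partial _tr+\frac{i}{2}\partial _x^2r-ir^2q\Big ). \end{aligned}$$

Multiplying the first bracket by \(i\) and the second by \(-i\) gives the two equations in (4.15).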

5 Sine-Gordon Hierarchy

For the example of the sine-Gordon equation

$$\begin{aligned} u_{xy}+\sin u=0, \end{aligned}$$
(5.1)

we choose the trigonometric r-matrix (2.20). The required data is fixed as follows

$$\begin{aligned} S=\{0,\infty \},~~N_0=1=N_\infty ,~~\mathfrak {g}=\mathfrak {sl}_2,~~F(\lambda )=\frac{i}{2}\left( \frac{1}{\lambda }\sigma _++\sigma _--\sigma _+-\lambda \sigma _-\right) ,\nonumber \\ \end{aligned}$$
(5.2)

and we work with the basis \(\sigma _3\), \(\sigma _+\), \(\sigma _-\). The adjoint orbit description of Sect. 3.1 is implemented with

$$\begin{aligned}{} & {} \phi ^0(\lambda )=\sum _{n=0}^\infty \phi _n^0\lambda ^n,~~\phi _0^0=e^{i\frac{u}{4}\sigma _3}, \end{aligned}$$
(5.3)
$$\begin{aligned}{} & {} \phi ^\infty (\lambda _\infty )=\sum _{n=0}^\infty \phi _n^\infty \lambda _{\infty }^n,~~\phi _0^\infty =e^{-i\frac{u}{4}\sigma _3}. \end{aligned}$$
(5.4)

The phase space coordinate u will be the sine-Gordon field as will become clear soon. This gives, with \((\iota _{\lambda _0} F(\lambda ))^\textrm{trig}_-=\frac{i}{2}\left( \frac{1}{\lambda }\sigma _++\sigma _-\right) \) and \((\iota _{\lambda _\infty } F(\lambda ))^\textrm{trig}_-=-\frac{i}{2}\left( \lambda \sigma _-+\sigma _+\right) \),

$$\begin{aligned}{} & {} Q^0(\lambda _0)=\frac{i}{2}\phi ^0(\lambda )\left( \frac{1}{\lambda }\sigma _++\sigma _-\right) \phi ^0(\lambda )^{-1}=\sum _{n=-1}^\infty Q_n^{0}\lambda ^n, \end{aligned}$$
(5.5)
$$\begin{aligned}{} & {} Q^\infty (\lambda _\infty )= -\frac{i}{2}\phi ^\infty (\lambda _\infty )\left( \frac{1}{\lambda _\infty } \sigma _-+\sigma _+\right) \phi ^\infty (\lambda _\infty )^{-1}=\sum _{n=-1}^\infty Q_n^{\infty }\lambda _\infty ^n, \end{aligned}$$
(5.6)

with \(Q_{-1}^0=\frac{i}{2}e^{\frac{iu}{2}}\sigma _+\) and \(Q_{-1}^{\infty }=-\frac{i}{2}e^{\frac{iu}{2}}\sigma _-\). We now show how to use our formalism to recover the sine-Gordon equation (in light cone coordinates) as well as its first higher compatible flow which is nothing but the modified KdV equation, as presented in [Su]. We take advantage of this example to illustrate how our formalism also produces the Lagrangian multiform corresponding to these 3 times. In this context, our motivation is to show that the so-called “alien derivatives” problem that was discussed in [V] does not appear with our approach. The problem only arises if one insists on using the variational equations we obtain to eliminate some of the phase space coordinates in favour of the sine-Gordon field u and its derivatives with respect to a given time. In other words, we show in detail how our general discussion about the FNR procedure, when applied at the variational level, leads to this alien derivative problem. This is yet another reason in our opinion why it is preferable to work with the natural phase space coordinates that are provided by \(\varvec{\phi }(\varvec{\lambda })\).

It is convenient to parametrise (5.3)–(5.4) as

$$\begin{aligned}{} & {} \phi ^0(\lambda _0)=e^{i\frac{u}{4}\sigma _3}(\varvec{1}+\psi ^0(\lambda )),~~\psi ^0(\lambda )=\sum _{n=1}^\infty \psi _n^0\lambda ^n, \end{aligned}$$
(5.7)
$$\begin{aligned}{} & {} \phi ^\infty (\lambda _\infty )=e^{-i\frac{u}{4}\sigma _3}(\varvec{1}+\psi ^\infty (\lambda _\infty )),~~\psi ^\infty (\lambda _\infty )=\sum _{n=1}^\infty \psi _n^\infty \lambda _{\infty }^n, \end{aligned}$$
(5.8)

where we recall that \(\det \phi ^0=1=\det \phi ^\infty \) should hold. Using the gauge freedom of multiplying \(\phi ^0(\lambda _0)\) (resp. \(\phi ^\infty (\lambda _\infty )\)) on the right by a matrix which commutes with \((\iota _{\lambda _0} F(\lambda ))^{\textrm{trig}}_-\) (resp. \((\iota _{\lambda _\infty } F(\lambda ))^{\textrm{trig}}_-\)), we can work with

$$\begin{aligned}{} & {} \psi ^0(\lambda )=\sum _{n=1}^\infty \psi _n^0\lambda ^n,~~ \psi _n^0=\begin{pmatrix} A_n^0 &{} 0\\ C_n^0 &{} D_n^0 \end{pmatrix}, \end{aligned}$$
(5.9)
$$\begin{aligned}{} & {} \psi ^\infty (\lambda _\infty )=\sum _{n=1}^\infty \psi _n^\infty \lambda _{\infty }^n,~~\psi _n^\infty =\begin{pmatrix} A_n^\infty &{} B_n^\infty \\ 0 &{} D_n^\infty \end{pmatrix}. \end{aligned}$$
(5.10)

Note that one can show that there is a bijection between the group coordinates \(A_n^0\), \(C_n^0\), \(D_n^0\) and \(A_n^\infty \), \(B_n^\infty \), \(D_n^\infty \), and the algebra coordinates \(a_n^0\), \(b_n^0\) and \(c_n^0\), which we would introduce via \(Q_n^0=a_n^0\sigma _3+b_n^0\sigma _++c_n^0\sigma _-\) (and similarly at \(\infty \)). The reader familiar with the FNR construction or only interested in zero curvature equations would tend to use the algebra coordinates. However, since our Lagrangians are naturally expressed with group coordinates, we use the latter both for the zero curvature equations and the Lagrangians. It also facilitates comparison between the two ways of obtaining the equations of motion.

By our general results in Sects. 3.1.2 and 3.1.3, all the time flows commute and all the corresponding zero curvature equations of Proposition 3.5 hold, with the Lax matrices reading for \(n\ge -1\), (see Proposition 3.8)

$$\begin{aligned}{} & {} V_n^0(\lambda )=-(P^- + \tfrac{1}{2} P^0) Q^0_n -\frac{1}{\lambda } Q^0_{n-1} -\cdots - \frac{1}{\lambda ^{n}} Q_{0}^0 - \frac{1}{\lambda ^{n+1}} Q_{-1}^0 , \end{aligned}$$
(5.11)
$$\begin{aligned}{} & {} V_n^\infty (\lambda )=(P^+ + \tfrac{1}{2} P^0) Q^\infty _n +\lambda Q^\infty _{n-1} +\cdots + \lambda ^n Q^\infty _0+\lambda ^{n+1} Q^\infty _{-1}. \end{aligned}$$
(5.12)

The sine-Gordon equation is recovered by taking the pair of Lax matrices \((V^0_0(\lambda ),V^\infty _0(\lambda ))\) and the compatible higher flow attached to the pair \((V^\infty _0(\lambda ),V^\infty _1(\lambda ))\) gives the mKdV equation in potential form. The third possible Lax pair is \((V^0_0(\lambda ),V^\infty _1(\lambda ))\) and will be called the mixed equation. For convenience, let us label the corresponding times as follows \(t^0_0=y\), \(t^\infty _0=x\), \(t^\infty _1=z\). Therefore, we focus on the following three zero curvature equations

  1. \(\partial _{x} V_0^0(\lambda ) -\partial _{y}V^\infty _0(\lambda )+\left[ V_0^0(\lambda ),V^\infty _0(\lambda )\right] =0\) (sG);

  2. \(\partial _{z} V_0^\infty (\lambda ) -\partial _{x}V^\infty _1(\lambda )+\left[ V_0^\infty (\lambda ),V^\infty _1(\lambda )\right] =0\) (mKdV);

  3. \(\partial _{z} V_0^0(\lambda ) -\partial _{y}V^\infty _1(\lambda )+\left[ V_0^0(\lambda ),V^\infty _1(\lambda )\right] =0\) (mixed).

A direct calculation gives

$$\begin{aligned}{} & {} Q_{-1}^0=\frac{i}{2}e^{\frac{iu}{2}}\sigma _+,~~Q_0^0=\frac{i}{2}\begin{pmatrix} -C_1^0 &{} 2 A_1^0e^{i\frac{u}{2}}\\ e^{-i\frac{u}{2}} &{} C_1^0 \end{pmatrix}, \end{aligned}$$
(5.13)
$$\begin{aligned}{} & {} Q_1^0=\frac{i}{2}\begin{pmatrix} -C_2^0-A_1^0C_1^0 &{} (2 A_2^0+(A_1^0)^2)e^{i\frac{u}{2}}\\ (2 D_1^0-(C_1^0)^2)e^{-i\frac{u}{2}} &{} C_2^0+A_1^0C_1^0 \end{pmatrix}, \end{aligned}$$
(5.14)
$$\begin{aligned}{} & {} Q_{-1}^{\infty }=-\frac{i}{2}e^{\frac{iu}{2}}\sigma _-,~~Q_0^\infty =-\frac{i}{2}\begin{pmatrix} B_1^\infty &{} e^{-i\frac{u}{2}}\\ 2D_1^\infty e^{i\frac{u}{2}} &{} -B_1^\infty \end{pmatrix}, \end{aligned}$$
(5.15)
$$\begin{aligned}{} & {} Q_{1}^{\infty }=-\frac{i}{2}\begin{pmatrix} B_2^\infty +B_1^\infty D_1^\infty &{} (2A_1^\infty -(B_1^\infty )^2)e^{-i\frac{u}{2}}\\ (2D_2^\infty +(D_1^\infty )^2)e^{i\frac{u}{2}} &{} -B_2^\infty -B_1^\infty D_1^\infty \end{pmatrix}. \end{aligned}$$
(5.16)

Hence,

$$\begin{aligned}{} & {} V^0_0(\lambda )=-(P^- + \tfrac{1}{2} P^0) Q^0_0 - \frac{1}{\lambda }Q_{-1}^0=-\frac{i}{4} \begin{pmatrix} -C_1^0 &{} 2e^{i\frac{u}{2}}/\lambda \\ 2e^{-i\frac{u}{2}} &{} C_1^0 \end{pmatrix}, \end{aligned}$$
(5.17)
$$\begin{aligned}{} & {} V^\infty _0(\lambda )=(P^+ + \tfrac{1}{2} P^0) Q^\infty _0 + \lambda Q_{-1}^\infty =-\frac{i}{4}\begin{pmatrix} B_1^\infty &{} 2e^{-i\frac{u}{2}}\\ 2\lambda e^{i\frac{u}{2}} &{} -B_1^\infty \end{pmatrix}, \end{aligned}$$
(5.18)
$$\begin{aligned}{} & {} V^\infty _1(\lambda )=(P^+ + \tfrac{1}{2} P^0) Q^\infty _1 + \lambda Q_{0}^\infty +\lambda ^2 Q_{-1}^\infty \nonumber \\{} & {} \qquad \qquad \quad =-\frac{i}{4}\begin{pmatrix} 2\lambda B_1^\infty +B_2^\infty -A_1^\infty B_1^\infty &{} 2(\lambda +2A_1^\infty -(B_1^\infty )^2)e^{-i\frac{u}{2}}\\ 2\lambda (\lambda -2A_1^\infty ) e^{i\frac{u}{2}} &{} -2\lambda B_1^\infty -B_2^\infty +A_1^\infty B_1^\infty \end{pmatrix}. \end{aligned}$$
(5.19)

Therefore, we obtain the following equations of motion from the zero curvature equations:

$$\begin{aligned} \text {(sG)}~~{\left\{ \begin{array}{ll} C_1^0=-u_y,\\ B_1^\infty =-u_x,\\ \partial _xC_1^0+\partial _yB_1^\infty -2\sin u=0\,. \end{array}\right. } \end{aligned}$$
(5.20)

The first two equations show that the group coordinates \(C_1^0\), \(B_1^\infty \) can be thought of as auxiliary fields and can be eliminated from the dynamics to get (5.1), as desired.
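For orientation, the entries of the (sG) zero curvature equation can be computed directly from (5.17)–(5.18). The following is a sketch of that computation, keeping only the \((1,1)\) and \((1,2)\) entries; the \((2,1)\) entry leads to the same two constraints as the \((1,2)\) entry.

$$\begin{aligned} \big [V_0^0(\lambda ),V_0^\infty (\lambda )\big ]_{11}&=-\frac{1}{16}\cdot 4\big (e^{iu}-e^{-iu}\big )=-\frac{i}{2}\sin u,\\ \big (\partial _xV_0^0-\partial _yV_0^\infty +[V_0^0,V_0^\infty ]\big )_{11}&=\frac{i}{4}\big (\partial _xC_1^0+\partial _yB_1^\infty -2\sin u\big ),\\ \big (\partial _xV_0^0-\partial _yV_0^\infty +[V_0^0,V_0^\infty ]\big )_{12}&=\frac{1}{4}\Big (\frac{1}{\lambda }\big (u_x+B_1^\infty \big )e^{i\frac{u}{2}}+\big (u_y+C_1^0\big )e^{-i\frac{u}{2}}\Big ). \end{aligned}$$

Setting each power of \(\lambda \) to zero reproduces the three equations in (5.20). Substituting \(C_1^0=-u_y\) and \(B_1^\infty =-u_x\) into the \((1,1)\) entry gives \(-2(u_{xy}+\sin u)=0\), i.e. (5.1).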

$$\begin{aligned} \text {(mKdV)}~~{\left\{ \begin{array}{ll} B_1^\infty =-u_x,\\ A_1^\infty =-\frac{i}{2}\partial _xB_1^\infty +\frac{1}{4}(B_1^\infty )^2,\\ u_z-u_x(2A_1^\infty -(B_1^\infty )^2)-2i\partial _x(2A_1^\infty -(B_1^\infty )^2)-(B_1^\infty )^3+3 A_1^\infty B_1^\infty -B_2^\infty =0\,\\ u_z+2u_xA_1^\infty -4i\partial _xA_1^\infty -A_1^\infty B_1^\infty -B_2^\infty =0,\\ \partial _zB_1^\infty +\partial _x(A_1^\infty B_1^\infty -B_2^\infty )=0 \end{array}\right. }\nonumber \\ \end{aligned}$$
(5.21)

We see that both (5.20) and (5.21) contain the same equation for \(B_1^\infty \) in terms of u, as it should be. A comment is in order. Under the first two equations, the third and fourth equations consistently give the same expression for \(B_2^\infty \). In turn, substituting all the auxiliary fields into the last equation yields mKdV in potential form (i.e. mKdV for \(v=u_x\))

$$\begin{aligned}{} & {} u_{xz}+u_{xxxx}+\frac{3}{2}u_{xx}u_x^2=0. \end{aligned}$$
(5.22)
$$\begin{aligned}{} & {} \text {(mixed)}~~{\left\{ \begin{array}{ll} C_1^0=-u_y,\\ A_1^\infty B_1^\infty -B_2^\infty =u_z,\\ \partial _y B_1^\infty =\sin u,\\ i\partial _y\left( e^{-iu/2}(2A_1^\infty -(B_1^\infty )^2)\right) +\frac{1}{2}C_1^0\left( 2A_1^\infty -(B_1^\infty )^2 \right) e^{-iu/2}+B_1^\infty e^{iu/2}=0,\\ 2i\partial _y\left( A_1^\infty e^{iu/2}\right) +B_1^\infty e^{-iu/2}-A_1^\infty C_1^0e^{iu/2}=0,\\ i\partial _zC_1^0-i\partial _y(A_1^\infty B_1^\infty -B_2^\infty )+2A_1^\infty (e^{iu}+e^{-iu})-(B_1^\infty )^2e^{-iu}=0. \end{array}\right. }\nonumber \\ \end{aligned}$$
(5.23)

Using the first two equations to eliminate the auxiliary fields and noting that the fourth and fifth equations are equivalent (modulo the third equation), we obtain after simplification the following system of equations for the three fields u, \(A_1^\infty \) and \(B_1^\infty \),

$$\begin{aligned} \text {(mixed)}~~{\left\{ \begin{array}{ll} \partial _y B_1^\infty =\sin u,\\ \partial _yA_1^\infty =\frac{i}{2}B_1^\infty e^{-iu},\\ -2iu_{yz}+2A_1^\infty (e^{iu}+e^{-iu})-(B_1^\infty )^2e^{-iu}=0. \end{array}\right. } \end{aligned}$$
(5.24)

Note that this system of equations in (yz) can be perfectly studied on its own and is integrable. However, from our point of view, it should be included together with (5.20) and (5.21) into the sG hierarchy. This leads to interesting observations which are related to the Lagrangian multiform description we present below. First of all, using \(B_1^\infty =-u_x\) and the sine-Gordon equation, we see that the first equation in (5.24) is trivially satisfied. Similarly, the second equation in (5.24) is a consequence of \(B_1^\infty =-u_x\), the sine-Gordon equation and the second equation in (5.21). Perhaps more interesting is the fact that combining the first three equations in (5.21) with the second equation in (5.23) yields

$$\begin{aligned} u_{z}+u_{xxx}+\frac{1}{2}u_x^3=0, \end{aligned}$$
(5.25)

of which (5.22) is simply a differential consequence.
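The passage from the auxiliary-field equations to (5.25) can be made explicit. The sketch below uses the first, second and fourth equations of (5.21) together with the second equation of (5.23); the third equation of (5.21) gives the same \(B_2^\infty \), as noted above, so it could be used instead of the fourth.

$$\begin{aligned} A_1^\infty&=\frac{i}{2}u_{xx}+\frac{1}{4}u_x^2,\qquad B_2^\infty =u_z+3u_xA_1^\infty -4i\partial _xA_1^\infty ,\\ u_z&=A_1^\infty B_1^\infty -B_2^\infty =-u_z-4\big (u_xA_1^\infty -i\partial _xA_1^\infty \big ),\\ u_xA_1^\infty -i\partial _xA_1^\infty&=\frac{1}{4}u_x^3+\frac{1}{2}u_{xxx}, \end{aligned}$$

so that \(2u_z+u_x^3+2u_{xxx}=0\), which is (5.25) after dividing by 2.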

We now turn to the extraction of the coefficients of the Lagrangian multiform for the corresponding time flows. We need \(\mathscr {L}_{00}^{0,\infty }\) for (sG), \(\mathscr {L}_{01}^{0,\infty }\) for (mixed) and \(\mathscr {L}_{01}^{\infty ,\infty }\) for (mKdV). We have

$$\begin{aligned} K^{0,\infty }(\lambda _0, \mu _\infty )= & {} {{\,\textrm{Tr}\,}}\left[ \frac{i}{2}\left( \frac{1}{\lambda }\sigma _++\sigma _-\right) \left( \phi ^{0}_0 + \lambda \phi ^{0}_1 +\lambda ^2\phi ^{0}_2 +O(\lambda ^3)\right) ^{-1}\right. \\{} & {} \quad \left. \times \left( \frac{1}{\mu _{\infty }} \partial _{t^\infty _{-1}}+\partial _{t^\infty _0} + \mu _{\infty }\partial _{t^\infty _1}+O(\mu _{\infty }^2) \right) \left( \phi ^{0}_0 + \lambda \phi ^{0}_1 + \lambda ^2 \phi ^{0}_2 +O(\lambda ^3)\right) \right] \\{} & {} \quad +{{\,\textrm{Tr}\,}}\left[ \frac{i}{2}\left( \frac{1}{\mu _\infty }\sigma _-+\sigma _+\right) \left( \phi ^{\infty }_0 + \mu _\infty \phi ^{\infty }_1 + \mu _\infty ^2 \phi ^{\infty }_2+O(\mu _\infty ^3)\right) ^{-1}\right. \\{} & {} \quad \left. \times \left( \frac{1}{\lambda } \partial _{t^0_{-1}}+\partial _{t^0_0} + \lambda \partial _{t^0_1}+O(\lambda ^2) \right) \left( \phi ^{\infty }_0 + \mu _\infty \phi ^{\infty }_1 + \mu _\infty ^2 \phi ^{\infty }_2 +O(\mu _\infty ^3)\right) \right] . \end{aligned}$$

Hence, dropping irrelevant total derivative terms and using again \(t^0_0=y\), \(t^\infty _0=x\), \(t^\infty _1=z\) for convenience, we find

$$\begin{aligned} K_{00}^{0,\infty }= & {} \frac{1}{4}C_1^0 u_x+\frac{1}{4}B_1^\infty u_y ,~~K_{01}^{0,\infty }=\frac{1}{4}C_1^0u_z-\frac{1}{4}(A_1^\infty B_1^\infty -B_2^\infty )u_y\\{} & {} -\frac{i}{2}A_1^\infty \partial _{y}B_1^\infty +\frac{i}{2}B_1^\infty \partial _{y}A_1^\infty . \end{aligned}$$

To compute the potential terms, observe that for the trigonometric r-matrix, we have

$$\begin{aligned} ( \iota _{ \lambda _0} \iota _{ \mu _\infty } + \iota _{ \mu _\infty } \iota _{\lambda _0})r_{12}(\lambda ,\mu )=-2P_{12}^--P_{12}^0+2P_{12}\sum _{n=0}^{\infty }\lambda ^n\mu _{\infty }^n. \end{aligned}$$

Hence,

$$\begin{aligned} U_{00}^{0,\infty }= & {} {{\,\textrm{Tr}\,}}\left( Q_0^0(P^++\frac{1}{2}P^0)Q_0^\infty +Q_{-1}^0Q_{-1}^\infty \right) =\frac{1}{4}\left( e^{iu}+e^{-iu}-C_1^0B_1^\infty \right) ,\\ U_{01}^{0,\infty }= & {} {{\,\textrm{Tr}\,}}\left( Q_0^0(P^++\frac{1}{2}P^0)Q_1^\infty +Q_{-1}^0Q_{0}^\infty \right) \\= & {} \frac{1}{4}\left( 2A_1^\infty -(B_1^\infty )^2\right) e^{-iu}+\frac{1}{4}C_1^0\left( A_1^\infty B_1^\infty -B_2^\infty \right) -\frac{1}{2}A_1^\infty e^{iu}. \end{aligned}$$

This gives us the desired Lagrangian densities for (sG) and (mixed) as

$$\begin{aligned} \mathscr {L}_{\text {sG}}\equiv \mathscr {L}_{00}^{0,\infty }= K_{00}^{0,\infty }-U_{00}^{0,\infty }, \end{aligned}$$
(5.26)

and

$$\begin{aligned} \mathscr {L}_{\text {mixed}}\equiv \mathscr {L}_{01}^{0,\infty }= K_{01}^{0,\infty } - U_{01}^{0,\infty }. \end{aligned}$$
(5.27)

Similarly, we find

$$\begin{aligned} K_{01}^{\infty ,\infty }=-\frac{1}{4}B_1^\infty u_z-\frac{1}{4}\left( A_1^\infty B_1^\infty -B_2^\infty \right) u_x-\frac{i}{2}A_1^\infty \partial _{x}B_1^\infty +\frac{i}{2}B_1^\infty \partial _{x}A_1^\infty \end{aligned}$$

and, with

$$\begin{aligned} ( \iota _{ \lambda _\infty } \iota _{ \mu _\infty } + \iota _{ \mu _\infty } \iota _{ \lambda _\infty })r_{12}(\lambda ,\mu )=-2P_{12}^--P_{12}^0+P_{12}\sum _{n=0}^{\infty }\frac{\mu _{\infty }^n}{\lambda _\infty ^n}-P_{12}\sum _{n=0}^{\infty }\frac{\lambda _\infty ^{n+1}}{\mu _{\infty }^{n+1}}, \end{aligned}$$

we get

$$\begin{aligned} U_{01}^{\infty ,\infty }={{\,\textrm{Tr}\,}}\left( Q_0^\infty (P^++\frac{1}{2}P^0)Q_1^\infty \right) =\frac{1}{4}B_1^\infty \left( A_1^\infty B_1^\infty -B_2^\infty \right) +\frac{1}{2}A_1^\infty \left( 2A_1^\infty -(B_1^\infty )^2\right) . \end{aligned}$$

Thus, the Lagrangian density for (mKdV) is given by

$$\begin{aligned} \mathscr {L}_{\text {mKdV}}\equiv \mathscr {L}_{01}^{\infty ,\infty }=K_{01}^{\infty ,\infty }-U_{01}^{\infty ,\infty }. \end{aligned}$$
(5.28)

It remains to derive the EL equations associated to each Lagrangian. For instance, by varying \(B_1^\infty \), \(C_1^0\) and u in \(\mathscr {L}_{\text {sG}}\) we find exactly the three equations in (5.20). Similarly, it can be checked that the EL equations for \(\mathscr {L}_{\text {mKdV}}\) and \(\mathscr {L}_{\text {mixed}}\) reproduce (5.21) and (5.23) respectively.
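As an illustration of the first claim, the variation of \(\mathscr {L}_{\text {sG}}\) can be carried out in a few lines. The expression below is obtained by inserting the formulas for \(K_{00}^{0,\infty }\) and \(U_{00}^{0,\infty }\) given above into (5.26), with total derivatives dropped as before; it is meant as a sketch of the check rather than a new result.

$$\begin{aligned} \mathscr {L}_{\text {sG}}&=\frac{1}{4}C_1^0u_x+\frac{1}{4}B_1^\infty u_y+\frac{1}{4}C_1^0B_1^\infty -\frac{1}{2}\cos u,\\ \frac{\partial \mathscr {L}_{\text {sG}}}{\partial C_1^0}&=\frac{1}{4}\big (u_x+B_1^\infty \big )=0,\qquad \frac{\partial \mathscr {L}_{\text {sG}}}{\partial B_1^\infty }=\frac{1}{4}\big (u_y+C_1^0\big )=0,\\ \frac{\partial \mathscr {L}_{\text {sG}}}{\partial u}-\partial _x\frac{\partial \mathscr {L}_{\text {sG}}}{\partial u_x}-\partial _y\frac{\partial \mathscr {L}_{\text {sG}}}{\partial u_y}&=\frac{1}{2}\sin u-\frac{1}{4}\partial _xC_1^0-\frac{1}{4}\partial _yB_1^\infty =0. \end{aligned}$$

Up to overall factors, these are the three equations of (5.20).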

In particular, all the equations that determine the group coordinates in terms of u and its (relevant) derivatives are reproduced variationally. It is an interesting feature of our construction that the FNR procedure is thus also obtained variationally. An important by-product is that the so-called problem of “alien derivatives” is eliminated systematically. In the present context, a manifestation of this problem would be, for instance, that the Lagrangian \(\mathscr {L}_{\text {mixed}}\) contains terms with derivatives of u with respect to x, while this Lagrangian is supposed to produce equations of motion with respect to the variables y and z only. Clearly, our Lagrangians do not suffer from this problem since, by construction, they only ever involve the two times for which they are supposed to produce equations of motion. The problem is an artefact of using some of the equations of motion to solve for some of the fields in terms of u and its derivatives. In other words, it is an artefact of implementing the FNR procedure a priori to eliminate some of the group coordinates. If we do implement this procedure of elimination, we obtain Lagrangians which form a Lagrangian multiform equivalent to the one given originally in [Su] and which suffers from this problem. Eliminating the auxiliary fields in favour of u and its derivatives, we obtain

$$\begin{aligned} \mathscr {L}_{\text {sG}}=-\frac{1}{4}u_xu_y-\frac{1}{2}\cos u \end{aligned}$$

which is a well-known Lagrangian for (5.1), as well as

$$\begin{aligned} \mathscr {L}_{\text {mKdV}}=\frac{1}{4}u_xu_z+\frac{1}{16}u_x^4-\frac{1}{4}u_{xx}^2-\frac{i}{4}\partial _x\left( \frac{1}{6}u_x^3+iu_xu_{xx} \right) \end{aligned}$$

and

$$\begin{aligned} \mathscr {L}_{\text {mixed}}=-\frac{1}{4}u_yu_z-\frac{1}{2}u_{xx}(u_{xy}+\sin u)+\frac{1}{4}u_x^2 \cos u-\frac{i}{4}\partial _y\left( \frac{1}{6}u_x^3+iu_xu_{xx}\right) . \end{aligned}$$
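The elimination can be checked term by term. As a sketch, the lines below substitute \(C_1^0=-u_y\), \(B_1^\infty =-u_x\), \(A_1^\infty B_1^\infty -B_2^\infty =u_z\) and \(A_1^\infty =\frac{i}{2}u_{xx}+\frac{1}{4}u_x^2\) (all taken from (5.20), (5.21) and (5.23)) into \(\mathscr {L}_{\text {sG}}\) and into the potential part \(-U_{01}^{0,\infty }\) of \(\mathscr {L}_{\text {mixed}}\); the kinetic part of \(\mathscr {L}_{\text {mixed}}\) is not reproduced here.

$$\begin{aligned} \mathscr {L}_{\text {sG}}&=-\frac{1}{4}u_yu_x-\frac{1}{4}u_xu_y-\frac{1}{4}\big (2\cos u-u_xu_y\big )=-\frac{1}{4}u_xu_y-\frac{1}{2}\cos u,\\ 2A_1^\infty -(B_1^\infty )^2&=iu_{xx}-\frac{1}{2}u_x^2,\\ -U_{01}^{0,\infty }&=-\frac{1}{4}\Big (iu_{xx}-\frac{1}{2}u_x^2\Big )e^{-iu}+\frac{1}{4}u_yu_z+\frac{1}{2}\Big (\frac{i}{2}u_{xx}+\frac{1}{4}u_x^2\Big )e^{iu}\\&=\frac{1}{4}u_yu_z-\frac{1}{2}u_{xx}\sin u+\frac{1}{4}u_x^2\cos u. \end{aligned}$$

The last line accounts for the \(\sin u\) and \(\cos u\) terms in \(\mathscr {L}_{\text {mixed}}\) and explicitly shows the appearance of x-derivatives once the auxiliary fields are eliminated.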

Changing x to \(-x\), multiplying all our Lagrangians by 2 and dropping the irrelevant total derivatives in x and y, we recover exactly the three Lagrangian coefficients, eqs (31)-(33), in [Su]. Our Lagrangian multiform expressed with the group coordinates (and restricted to the three times x, y, z) is thus equivalent to that in [Su] but, as noted before, it does not suffer from the alien derivative problem.

The poles at 0 and \(\infty \) play a symmetric role in the construction so it would be natural to consider also the time \(t_1^0\) and the associated Lax matrix \(V^0_1(\lambda )\). This naturally leads to two additional zero curvature equations (denote \(t^0_1=t\) and the other times as above) that can be combined with (sG)

  1. \(\partial _{t} V_0^0(\lambda ) -\partial _{y}V^0_1(\lambda )+\left[ V_0^0(\lambda ),V^0_1(\lambda )\right] =0\) (mKdV2);

  2. \(\partial _{t} V_0^\infty (\lambda ) -\partial _{x}V^0_1(\lambda )+\left[ V_0^\infty (\lambda ),V^0_1(\lambda )\right] =0\) (mixed 2).

The first one is called (mKdV2) as it is another copy of the mKdV equation but in (yt) instead of (xz). It is a compatible flow with (sG) where we can think of the roles of x and y being swapped. Then, naturally (mixed 2) is the remaining compatible flow between the variables x and t. All the expressions for the Lax matrices, the zero curvature equations and the corresponding Lagrangians are similar to the above ones with the appropriate changes and we omit them. To complete the picture related to the four times we have focussed on, it would remain to consider the zero curvature equation

$$\begin{aligned} \partial _{z} V_1^0(\lambda ) -\partial _{t}V^\infty _1(\lambda )+\left[ V_1^0(\lambda ),V^\infty _1(\lambda )\right] =0. \end{aligned}$$

The set of equations of motion is not particularly enlightening. When embedded in the hierarchy of the five zero curvature equations already discussed, this system is a consequence of them, as it should be. Our construction gives us the means to derive the corresponding Lagrangian density \(\mathscr {L}_{11}^{0,\infty }\) if required but again we omit its lengthy expression here.

FNR procedure for the sine-Gordon hierarchy

We have discussed the FNR procedure at the level of the EL equations above, using some of the equations to eliminate certain auxiliary coordinates/fields. Here, we discuss it at the level of the algebra coordinates using the Lax equation. This is more in line with the original work [FNR] and with the explanation around Lemma 3.9 for which it provides an illustration in the sG case. We recall that our point of view is that the procedure is unnecessary. We show it in the sG case to make contact with a more traditional approach but also because to our knowledge, this is the first time that the FNR construction is obtained for a hierarchy other than AKNS. In the present sG case, it is based on the Lax equations (5.29)–(5.30) below.

The generating Lax equation (3.8) gives the following equations, for \(n\ge -1\),

$$\begin{aligned} \partial _{t_n^0} Q^0(\lambda )= & {} \left[ V^0_n(\lambda ),Q^0(\lambda )\right] =\left[ - \big ( \lambda ^{-n} Q^0(\lambda ) \big )^{\textrm{trig}}_-,Q^0(\lambda )\right] , \end{aligned}$$
(5.29)
$$\begin{aligned} \partial _{t_n^\infty } Q^\infty (\lambda _\infty )= & {} \left[ V^\infty _n(\lambda ),Q^\infty (\lambda _\infty )\right] =\left[ \big ( \lambda _\infty ^{-n} Q^\infty (\lambda _\infty ) \big )^{\textrm{trig}}_-,Q^\infty (\lambda _\infty )\right] .\nonumber \\ \end{aligned}$$
(5.30)

We could use the Lax equations (5.29)–(5.30) to derive the coefficients of \(Q_n^0\) and \(Q_n^\infty \) as differential polynomials in the coordinate u. Given the form of \(F(\lambda )\) here, we do not fall into the area of applicability of the argument given after Lemma 3.9. Nevertheless, it is still possible to proceed. We illustrate this with (5.30), the other case being similar.

Our choices (5.2) and (5.4) give \(c_{-1}=-\tfrac{i}{2} e^{iu/2}\), \(a_{-1}=0=b_{-1}\). Then, consider (5.30) for \(n=0\) with \(Q^\infty (\lambda )=\begin{pmatrix} a(\lambda ) &{} b(\lambda )\\ c(\lambda ) &{} -a(\lambda ) \end{pmatrix}\) (we drop the superscript for conciseness). Writing \(t_0^\infty =x\) for convenience and projecting onto \(\sigma _3\), \(\sigma _+\) and \(\sigma _-\), we obtain

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _{x}a(\lambda )=b_0c(\lambda )-\frac{1}{\lambda }c_{-1}b(\lambda ),\\ \partial _{x}b(\lambda )=a_0b(\lambda )-2b_0a(\lambda ),\\ \partial _{x}c(\lambda )=-a_0c(\lambda )+\frac{2}{\lambda }c_{-1}a(\lambda ). \end{array}\right. } \end{aligned}$$
(5.31)

Looking at the \(\lambda ^j\) coefficient, this yields the following system

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _{x}a_j=b_0c_j-c_{-1}b_{j+1},\\ \partial _{x}b_j=a_0b_j-2b_0a_j,\\ \partial _{x}c_j=-a_0c_j+2c_{-1}a_{j+1}, \end{array}\right. } \end{aligned}$$
(5.32)

which we should use to determine the coefficients recursively. Suppose we have determined \(a_k\), \(b_k\), \(c_k\) for \(k=1,\dots ,n-1\); then the first equation gives us \(b_{n}\) and hence the second equation yields \(a_n\). However, we cannot deduce \(c_n\) from the third equation since it would require the knowledge of \(a_{n+1}\). It is possible to replace (5.31) by the following equivalent system

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _{x}a(\lambda )=b_0c(\lambda )-\frac{1}{\lambda }c_{-1}b(\lambda ),\\ \partial _{x}b(\lambda )=a_0b(\lambda )-2b_0a(\lambda ),\\ a^2(\lambda )+b(\lambda )c(\lambda )=-\frac{1}{4\lambda }. \end{array}\right. } \end{aligned}$$
(5.33)

To see this, note that (5.31) implies \(\partial _{x}{{\,\textrm{Tr}\,}}Q^\infty (\lambda _\infty )^2=0\) so that

$$\begin{aligned} a^2(\lambda )+b(\lambda )c(\lambda )=\frac{1}{2}{{\,\textrm{Tr}\,}}Q^\infty (\lambda _\infty )^2=-\frac{1}{4\lambda _\infty } \end{aligned}$$

as it should by construction. Conversely, assume (5.33) holds. The third equation implies \(2\partial _{x}a(\lambda ) a(\lambda )+ \partial _{x}b(\lambda )c(\lambda )+b(\lambda )\partial _{x}c(\lambda )=0\). Using the first two equations to eliminate \(\partial _{x}a(\lambda )\) and \(\partial _{x}b(\lambda )\) yields \(b(\lambda )\left( \partial _{x}c(\lambda )+a_0c(\lambda )-\frac{2}{\lambda }c_{-1}a(\lambda ) \right) =0\), and the claim follows. Now the advantage of system (5.33) is that the j-th term of the third equation gives the following relation:

$$\begin{aligned} \sum _{i=0}^{j+1}\left( a_{i}a_{j-i}+b_{i}c_{j-i}\right) =-\frac{1}{4}\delta _{j,-1}. \end{aligned}$$
(5.34)

Spelling it out, one sees that it can be used to determine \(c_n\) from \(a_k\), \(b_k\), \(c_k\), \(k=1,\dots ,n-1\) and \(b_{n}\), \(a_n\) obtained from the first two equations as explained before. Thus, (5.33) allows us to determine all \(a_j\), \(b_j\), \(c_j\), \(j\ge 0\) recursively. We find the first few as

$$\begin{aligned}{} & {} a_0=\frac{i}{2}u_x,~~b_0=-\tfrac{i}{2} e^{-iu/2},~~c_0=\tfrac{i}{2} e^{iu/2}\left( 1+iu_{xx}+\tfrac{1}{2}u_x^2\right) , \end{aligned}$$
(5.35)
$$\begin{aligned}{} & {} a_1=-\tfrac{i}{2} \left( u_x+u_{xxx}+\tfrac{1}{2} u_x^3\right) ,~~b_1=\tfrac{i}{2} e^{-iu/2}\left( 1-iu_{xx}+\tfrac{1}{2}u_x^2\right) , \end{aligned}$$
(5.36)
$$\begin{aligned}{} & {} c_1=-\tfrac{i}{2} e^{iu/2}\left( \tfrac{1}{2} u_x^2+\tfrac{3}{8} u_x^4+iu_{xx}+u_xu_{xxx}-\tfrac{1}{2} u_{xx}^2 +iu_{xxxx}+\tfrac{3i}{8} u_{xx}u_x^2\right) .\nonumber \\ \end{aligned}$$
(5.37)
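The start of the recursion can be made explicit. The sketch below uses only the third equation of (5.32) at \(j=-1\), the relation (5.34) at \(j=-1\), and the initial data \(c_{-1}=-\tfrac{i}{2}e^{iu/2}\), \(a_{-1}=0=b_{-1}\) stated above.

$$\begin{aligned} \partial _xc_{-1}&=-a_0c_{-1}+2c_{-1}a_0=a_0c_{-1}\quad \Longrightarrow \quad a_0=\frac{\partial _xc_{-1}}{c_{-1}}=\frac{i}{2}u_x,\\ b_0c_{-1}&=-\frac{1}{4}\quad \Longrightarrow \quad b_0=-\frac{1}{4c_{-1}}=-\frac{i}{2}e^{-iu/2}, \end{aligned}$$

in agreement with the first two entries of (5.35).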

Now, for instance, the expression we find for \(a_0\) is consistent with the fact that \(a_0=-\tfrac{i}{2} B_1^\infty \) from (5.15) and with the second equation in (5.20). This is what we mean when we say that the FNR procedure is automatically implemented with our Lagrangian approach. We reiterate that the advantage of not applying it is that the problem of alien derivatives disappears and that dependent variables are also treated on an equal footing, like the independent variables.

6 Hierarchies of Zakharov–Mikhailov Type

In this section, we introduce a rather large class of models and their hierarchies by using the following data

$$\begin{aligned} S= & {} \{ a_1, \dots , a_{P}\} \subset \mathbb {C},~~ P >0,~~\mathfrak {g}= \mathfrak {gl}_N, \end{aligned}$$
(6.1)
$$\begin{aligned} F(\lambda )= & {} - \sum _{i=1}^{P} \sum _{r=0}^{n_i} \frac{A_{ir}}{(\lambda -a_i)^{r+1}}. \end{aligned}$$
(6.2)

Each \(A_{ir} \in \mathfrak {gl}_N\) is a non-dynamical constant matrix and we have chosen to write the order \(N_{a_i}\) of the pole \(a_i\), \(i =1,\dots , P\) as \(N_{a_i}= n_{i} +1\) for convenience. All the poles in S are distinct. The r-matrix can be the rational or trigonometric one at this stage.

The motivation behind such choices is that in the simplest setting (rational r-matrix and simple poles), our construction reproduces the Zakharov–Shabat Lax pair with simple poles whose equations of motion were cast in variational form in [ZM1]. In fact, our construction automatically embeds this single Lax pair, its zero curvature equation and its Lagrangian into an integrable hierarchy. This point of view was first introduced in [SNC] where the class of Zakharov–Mikhailov (ZM) models was cast into the formalism of Lagrangian multiforms. Allowing for higher order poles gives us the generalisation discussed in [Di, Chap. 20]. When we switch to the trigonometric r-matrix, we produce for the first time the trigonometric version of the large class of ZM models and their hierarchies. Finally, when specialising the construction via an appropriate reduction and choice of matrices \(A_{ir}\), we obtain as a special case the class of models studied in [ABW]. Their integrability is guaranteed by construction and they are naturally embedded in an integrable hierarchy, a new feature for these models that were originally obtained as standalone models by a different method related to the 4d Chern–Simons construction (see conclusions for details and references). These examples are detailed in the next three subsections.

6.1 Rational Zakharov–Mikhailov models

We first describe in detail how to reproduce the class of Lax pairs and Lagrangians originally discussed in the pioneering paper [ZM1]. The generalisation to higher order poles presented in [Di] will be straightforward. The r-matrix is fixed to be the rational one in this subsection. We split the data (6.1)–(6.2) in the following way: \(P=P_1+P_2\), \(P_1,P_2 >0\), and

$$\begin{aligned} S= & {} \{ a_1, \dots , a_{P_1}, b_1, \dots , b_{P_2}\} \subset \mathbb {C},~~ \mathfrak {g}= \mathfrak {gl}_N, \end{aligned}$$
(6.3)
$$\begin{aligned} F(\lambda )= & {} - \sum _{i=1}^{P_1} \sum _{r=0}^{n_i} \frac{A_{ir}}{(\lambda -a_i)^{r+1}} - \sum _{j=1}^{P_2} \sum _{r=0}^{m_j} \frac{B_{jr}}{(\lambda -b_j)^{r+1}}. \end{aligned}$$
(6.4)

For notational convenience, we simply denoted \(A_{j+P_1,r}=B_{jr}\) and \(n_{j+P_1}=m_j\) for \(j=1,\dots ,P_2\).

6.1.1 Case of simple poles

Following [ZM1], let us consider a Lax pair of the form (Footnote 2)

$$\begin{aligned} U(\lambda ) = \sum _{i=1}^{P_1}\frac{U_i}{\lambda - a_i},\quad V(\lambda ) = \sum _{j=1}^{P_2} \frac{V_j}{\lambda - b_j}. \end{aligned}$$
(6.5)

A prominent example of an integrable field theory that falls into this class is the Faddeev-Reshetikhin model [FR] which was proposed as an ultralocal variant of the principal chiral model. The main result of [ZM1] is that the equations of motion encoded in the zero curvature equation \(\partial _\eta U(\lambda )-\partial _\xi V(\lambda ) + [U(\lambda ),V(\lambda )] =0\) associated to the auxiliary problem

$$\begin{aligned} \Psi _\xi =U\Psi ,~~\Psi _\eta =V\Psi , \end{aligned}$$
(6.6)

are variational and are obtained as the EL equations of the following Lagrangian density

$$\begin{aligned} \mathscr {L}_{ZM} = {{\,\textrm{Tr}\,}}\left( \sum _{i=1}^{P_1} \varphi _i^{-1} \partial _{\eta } \varphi _i U^{(0)}_i - \sum _{j=1}^{P_2} \psi _j ^{-1} \partial _{\xi } \psi _jV^{(0)}_j - \sum _{i=1}^{P_1} \sum _{j=1}^{P_2}\frac{\varphi _i U^{(0)}_i\varphi _i^{-1} \, \psi _jV^{(0)}_j\psi _j ^{-1}}{ a_i- b_j}\right) .\nonumber \\ \end{aligned}$$
(6.7)

The key insight to obtain this result is to parametrise \(U_i\) as \(\varphi _i U^{(0)}_i \varphi _i^{-1}\) and \(V_j\) as \(\psi _j V^{(0)}_j \psi _j^{-1}\). The matrices \(U^{(0)}_i\) and \(V^{(0)}_j\) are constant and all the dynamical variables are contained in the fields \(\varphi _i\) and \(\psi _j\).

We can reproduce (6.5) by choosing \(n_i=0\) and \(m_j=0\) in our data (6.4). Since

$$\begin{aligned} \left( \iota _{ \lambda _{a_i}}F(\lambda )\right) ^\textrm{rat}_-=-\frac{A_{i0}}{\lambda -a_i},~~\left( \iota _{ \lambda _{b_j}}F(\lambda )\right) ^{\textrm{rat}}_-=-\frac{B_{j0}}{\lambda -b_j}, \end{aligned}$$

a direct calculation using Proposition 3.8 gives

$$\begin{aligned} V_{-1}^{a_i}(\lambda )=\frac{\phi _0^{a_i}A_{i0}(\phi _0^{a_i})^{-1}}{\lambda -a_i},~~V_{-1}^{b_j}(\lambda )=\frac{\phi _0^{b_j}B_{j0}(\phi _0^{b_j})^{-1}}{\lambda -b_j}. \end{aligned}$$
(6.8)

Therefore, it remains to make the identifications \(\phi ^{a_i}_0 = \varphi _i\) and \(A_{i0} = U^{(0)}_i\), and \(\phi ^{b_j}_0 = \psi _j\) and \(B_{j0} = V^{(0)}_j\) and take linear combinations \(\displaystyle \partial _\xi =\sum _{i=1}^{P_1}\partial _{t_{-1}^{a_i}}\), \(\displaystyle \partial _{\eta }=\sum _{j=1}^{P_2}\partial _{t_{-1}^{b_j}}\) of the elementary time flows \(\partial _{t_{-1}^{a_i}}\) and \(\partial _{t_{-1}^{b_j}}\). The corresponding Lax matrices are simply the sum of the elementary Lax matrices (6.8) which gives precisely (6.5).

To understand how to recover the Lagrangian (6.7) with our method, note that the zero curvature equation associated to the elementary times \(t_{-1}^{a_i}\) and \(t_{-1}^{b_j}\) reads

$$\begin{aligned} \partial _{t_{-1}^{a_i}}V_{-1}^{b_j}(\lambda )- \partial _{t_{-1}^{b_j}}V_{-1}^{a_i}(\lambda )+\left[ V_{-1}^{b_j}(\lambda ),V_{-1}^{a_i}(\lambda ) \right] =0. \end{aligned}$$
(6.9)

Summing these elementary zero curvature equations over \(i=1,\dots ,P_1\) and \(j=1,\dots ,P_2\) yields the desired \(\partial _\eta U(\lambda )-\partial _\xi V(\lambda ) + [U(\lambda ),V(\lambda )] =0\). Therefore, to find the Lagrangian \(\mathscr {L}_{ZM}\) it suffices to sum the elementary Lagrangians \(\mathscr {L}^{a_i,b_j}_{-1-1}\) (the coefficient of \(\lambda _{a_i}^{-1}\mu _{b_j}^{-1}\) in \(\mathscr {L}^{a_i,b_j}(\lambda _{a_i},\mu _{b_j})\) which yields the equations of motion in (6.9)). A direct calculation gives

$$\begin{aligned} \mathscr {L}^{a_i,b_j}_{-1-1}=-{{\,\textrm{Tr}\,}}\left( (\phi _0^{a_i})^{-1} \partial _{t_{-1}^{b_j}} \phi _0^{a_i} A_{i0} - (\phi _0^{b_j})^{-1} \partial _{t_{-1}^{a_i}} \phi _0^{b_j} B_{j0} - \frac{\phi _0^{a_i} A_{i0}(\phi _0^{a_i})^{-1}\, \phi _0^{b_j} B_{j0}(\phi _0^{b_j})^{-1}}{ a_i- b_j}\right) \nonumber \\ \end{aligned}$$
(6.10)

and the claim follows, i.e., with the identifications made above, we derive \(\mathscr {L}_{ZM}\) (up to an irrelevant overall minus sign) as in (6.7) by taking the double sum \(\displaystyle \sum _{i=1}^{P_1}\sum _{j=1}^{P_2}\mathscr {L}^{a_i,b_j}_{-1-1}\).
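The two summations used above can be spelled out as a bookkeeping check. This is only a sketch, assuming the identifications \(\phi ^{a_i}_0 = \varphi _i\), \(A_{i0} = U^{(0)}_i\), \(\phi ^{b_j}_0 = \psi _j\), \(B_{j0} = V^{(0)}_j\) made above, with \(U(\lambda )=\sum _i V_{-1}^{a_i}(\lambda )\) and \(V(\lambda )=\sum _j V_{-1}^{b_j}(\lambda )\). Summing (6.9) over i and j gives

$$\begin{aligned} 0=\sum _{i=1}^{P_1}\sum _{j=1}^{P_2}\left( \partial _{t_{-1}^{a_i}}V_{-1}^{b_j}- \partial _{t_{-1}^{b_j}}V_{-1}^{a_i}+\left[ V_{-1}^{b_j},V_{-1}^{a_i} \right] \right) =\partial _\xi V-\partial _\eta U+[V,U], \end{aligned}$$

which is the zero curvature equation for (6.5) after an overall change of sign. In the same way, summing (6.10) term by term gives

$$\begin{aligned} \sum _{i=1}^{P_1}\sum _{j=1}^{P_2}\mathscr {L}^{a_i,b_j}_{-1-1}=-{{\,\textrm{Tr}\,}}\left( \sum _{i=1}^{P_1} \varphi _i^{-1} \partial _{\eta } \varphi _i U^{(0)}_i - \sum _{j=1}^{P_2} \psi _j ^{-1} \partial _{\xi } \psi _jV^{(0)}_j - \sum _{i=1}^{P_1} \sum _{j=1}^{P_2}\frac{\varphi _i U^{(0)}_i\varphi _i^{-1} \, \psi _jV^{(0)}_j\psi _j ^{-1}}{ a_i- b_j}\right) =-\mathscr {L}_{ZM}, \end{aligned}$$

since the sum over j turns \(\partial _{t_{-1}^{b_j}}\) into \(\partial _\eta \) in the first term and the sum over i turns \(\partial _{t_{-1}^{a_i}}\) into \(\partial _\xi \) in the second.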

It was shown for the first time in [SNC] that the ZM Lagrangian can be incorporated into a Lagrangian multiform where each coefficient is a copy of the original ZM Lagrangian associated to the corresponding times. The explicit case of 3 times was considered. We now explain how to recover this multiform from our data. Instead of splitting the data (6.1)–(6.2) into two types of poles as in (6.3)–(6.4), we split it into three types of poles by setting \(P=P_1+P_2+P_3\) and restrict our attention to simple poles, i.e. we set

$$\begin{aligned} S= & {} \{ a_1, \dots , a_{P_1}, b_1, \dots , b_{P_2},c_1,\dots ,c_{P_3}\} \subset \mathbb {C},~~ P_1,P_2,P_3 >0,~~\mathfrak {g}= \mathfrak {gl}_N,\nonumber \\ \end{aligned}$$
(6.11)
$$\begin{aligned} F(\lambda )= & {} - \sum _{i=1}^{P_1} \frac{A_{i}}{\lambda -a_i} - \sum _{j=1}^{P_2} \frac{B_{j}}{\lambda -b_j}- \sum _{k=1}^{P_3} \frac{C_{k}}{\lambda -c_k}. \end{aligned}$$
(6.12)

As before, we take the linear combinations \(\displaystyle \partial _\xi =\sum _{i=1}^{P_1}\partial _{t_{-1}^{a_i}}\), \(\displaystyle \partial _{\eta }=\sum _{j=1}^{P_2}\partial _{t_{-1}^{b_j}}\) of the elementary time flows, as well as the new combination \(\displaystyle \partial _{\nu }=\sum _{k=1}^{P_3}\partial _{t_{-1}^{c_k}}\). The original ZM Lagrangian is now denoted by \(\mathscr {L}_{\xi \eta }\) and is accompanied by two new copies

$$\begin{aligned} \mathscr {L}_{\eta \nu }=\sum _{j=1}^{P_2}\sum _{k=1}^{P_3}\mathscr {L}^{b_j,c_k}_{-1-1},~~\mathscr {L}_{\nu \xi }=\sum _{k=1}^{P_3}\sum _{i=1}^{P_1}\mathscr {L}^{c_k,a_i}_{-1-1}. \end{aligned}$$
(6.13)

The Lagrangian multiform in [SNC, Section 2.4] is precisely

$$\begin{aligned} \mathscr {L}=\mathscr {L}_{\xi \eta }\,d\xi \wedge d\eta +\mathscr {L}_{\eta \nu } \, d\eta \wedge d\nu + \mathscr {L}_{\nu \xi }\,d\nu \wedge d\xi . \end{aligned}$$
(6.14)

The associated Lax matrices and zero curvature equations also reproduce those of [SNC].

6.1.2 Case of higher poles

The generalisation of the ZM result to Lax matrices with higher order poles of the form

$$\begin{aligned} U(\lambda ) = \sum _{i=1}^{P_1} U_i(\lambda ), \quad V(\lambda ) = \sum _{j=1}^{P_2} V_j(\lambda ), \end{aligned}$$
(6.15)

where

$$\begin{aligned} U_i = \sum _{r=0}^{n_i} \frac{U_{ir}}{(\lambda - a_i)^{r+1}},\quad V_j = \sum _{r=0}^{m_j} \frac{V_{jr}}{(\lambda - b_j)^{r+1}} \end{aligned}$$
(6.16)

was presented in [Di]. We can reproduce it by simply allowing \(n_i\) and \(m_j\) in the data (6.4) to be arbitrary positive integers and by following the same steps as for simple poles. In that case we find

$$\begin{aligned} V^{a_i}_{-1}(\lambda ) = - \sum _{r=0}^{n_i} \frac{Q^{a_i}_{ -r -1}}{(\lambda - a_i)^{r+1}} \equiv U_i(\lambda ),~~ V^{b_j}_{-1}(\lambda ) =- \sum _{r=0}^{m_j} \frac{Q^{b_j}_{ -r -1}}{(\lambda - b_j)^{r+1}}\equiv V_j(\lambda )\nonumber \\ \end{aligned}$$
(6.17)

where the coefficients are identified as

$$\begin{aligned}&Q^{a_i}_{-r-1} {=}{:}- U_{ir}\,,~~ i=1,\dots ,P_1\,,~~ r=0,\dots ,n_i\,, \end{aligned}$$
(6.18)
$$\begin{aligned}&Q^{b_j}_{-r-1} {=}{:}- V_{jr} \,,~~ j=1,\dots ,P_2\,,~~ r=0,\dots ,m_j\,, \end{aligned}$$
(6.19)

and calculated from the group coordinates using the following expansions

$$\begin{aligned}&Q^{a_i}(\lambda _{a_i}) = -\phi ^{a_i}(\lambda _{a_i}) \sum _{r=0}^{n_i} \frac{A_{ir}}{(\lambda - a_i)^{r+1}} \phi ^{a_i}(\lambda _{a_i})^{-1}=\sum _{k=-n_i-1}^\infty Q^{a_i}_k (\lambda - {a_i})^k\,, \end{aligned}$$
(6.20)
$$\begin{aligned}&Q^{b_j}(\lambda _{b_j}) = -\phi ^{b_j}(\lambda _{b_j}) \sum _{r=0}^{m_j} \frac{B_{jr}}{(\lambda - b_j)^{r+1}} \phi ^{b_j}(\lambda _{b_j})^{-1}= \sum _{k=-m_j-1}^\infty Q^{b_j}_k (\lambda - {b_j})^k\,. \end{aligned}$$
(6.21)

As before, we simply assemble the elementary time flows into \(\displaystyle \partial _\xi =\sum _{i=1}^{P_1}\partial _{t_{-1}^{a_i}}\) and \(\displaystyle \partial _{\eta }=\sum _{j=1}^{P_2}\partial _{t_{-1}^{b_j}}\) which have the desired Lax pair (6.15). This gives the corresponding equations of motion in zero curvature form \(\partial _\eta U(\lambda )-\partial _\xi V(\lambda ) + [U(\lambda ),V(\lambda )] =0\). The Lagrangian producing these equations of motion is obtained by adding the elementary Lagrangians \(\mathscr {L}^{a_i,b_j}_{-1-1}\). We give some details to show that we recover exactly [Di, Formula 20.2.12] (in the case of non coinciding poles which we consider here).

The kinetic part of \(\mathscr {L}^{a_i,b_j}_{-1-1}\) reads

$$\begin{aligned} \begin{aligned} K^{a_i,b_j}_{-1-1}&=\mathop {\textrm{res}}\limits _{\lambda = a_i} \mathop {\textrm{res}}\limits _{\mu = b_j} {{\,\textrm{Tr}\,}}( - \phi ^{a_i}(\lambda _{a_i}) ^{-1} \mathcal {D}_{\mu _{b_j}} \phi ^{a_i}(\lambda _{a_i}) \sum _{r=0}^{n_i} \frac{A_{ir}}{(\lambda -a_i)^{r+1}} \\&\quad + \phi ^{b_j}(\lambda _{b_j}) ^{-1} \mathcal {D}_{\lambda _{a_i}}\phi ^{b_j}(\lambda _{b_j}) \sum _{r=0}^{m_j}\frac{B_{jr}}{(\lambda -b_j)^{r+1}} )\\&= {{\,\textrm{Tr}\,}}(- \mathop {\textrm{res}}\limits _{\lambda = a_i} \phi ^{a_i}(\lambda _{a_i}) ^{-1} \partial _{t^{b_j}_{-1}} \phi ^{a_i}(\lambda _{a_i})\sum _{r=0}^{n_i} \frac{A_{ir}}{(\lambda -a_i)^{r+1}} \\&\quad + \mathop {\textrm{res}}\limits _{\mu = b_j} \phi ^{b_j}(\lambda _{b_j}) ^{-1} \partial _{t^{a_i}_{-1}} \phi ^{b_j}(\lambda _{b_j})\sum _{r=0}^{m_j}\frac{B_{jr}}{(\lambda -b_j)^{r+1}} )\\&= {{\,\textrm{Tr}\,}}(- \mathop {\textrm{res}}\limits _{\lambda =a_i} g_i^{-1} \partial _{t^{b_j}_{-1}}g_i A_i + \mathop {\textrm{res}}\limits _{\mu = b_j} h_j^{-1} \partial _{t^{a_i}_{-1}}h_j B_j) \end{aligned} \end{aligned}$$

where in the last equality, we introduced \(g_i\) (resp. \(h_j\)) to denote the truncation of \(\phi ^{a_i}(\lambda _{a_i})\) (resp. \(\phi ^{b_j}(\lambda _{b_j})\)) up to the order \(n_i\) (resp. \(m_j\)), in order to help make the comparison with Dickey’s formula. The equality holds since the truncation is possible under the residue. We also denoted \(\displaystyle A_i {:}{=}\sum _{r=0}^{n_i} \frac{A_{ir}}{(\lambda -a_i)^{r+1}} \) and \(\displaystyle B_j {:}{=}\sum _{r=0}^{m_j}\frac{B_{jr}}{(\lambda -b_j)^{r+1}}\) for conciseness.

The potential term reads, noting that \(\iota _{\lambda _{a_i}}\iota _{\mu _{b_j}} = \iota _{\mu _{b_j}}\iota _{\lambda _{a_i}}\) when \(a_i\ne b_j\),

$$\begin{aligned} \begin{aligned} U^{a_i,b_j}_{-1-1}&= {{\,\textrm{Tr}\,}}_{12} \left( \mathop {\textrm{res}}\limits _{\lambda =a_i} \mathop {\textrm{res}}\limits _{\mu = b_j}\iota _{\lambda _{a_i}}\iota _{\mu _{b_j}}\frac{P_{12}}{\mu - \lambda } (Q^{a_i}( \lambda _{a_i}))_1 (Q^{b_j} (\mu _{b_j}))_2\right) \\&=- \mathop {\textrm{res}}\limits _{\lambda =a_i} {{\,\textrm{Tr}\,}}\left( Q^{a_i}(\lambda _{a_i}) \left( Q^{b_j} (\lambda _{b_j}) \right) ^{\textrm{rat}}_- \right) \\&= - \mathop {\textrm{res}}\limits _{\lambda =a_i} {{\,\textrm{Tr}\,}}\left( (Q^{a_i}(\lambda _{a_i}))^{\textrm{rat}}_- ( Q^{b_j} (\lambda _{b_j}) )^{\textrm{rat}}_- \right) \\&= - \mathop {\textrm{res}}\limits _{\lambda =a_i} {{\,\textrm{Tr}\,}}\left( (g_i A_i g_i^{-1})^{\textrm{rat}}_- ( h_j B_j h_j^{-1} )^{\textrm{rat}}_- \right) . \end{aligned} \end{aligned}$$
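The first step in this computation can be made explicit with a geometric series. The following is a sketch under two assumptions that are not restated in this subsection: that \(\iota _{\mu _{b_j}}\) denotes the expansion in powers of \(\mu -b_j\) at fixed \(\lambda \ne b_j\), and that \(P_{12}\) is the permutation operator, so that \({{\,\textrm{Tr}\,}}_{12}(P_{12}X_1Y_2)={{\,\textrm{Tr}\,}}(XY)\). Then

$$\begin{aligned} \iota _{\mu _{b_j}}\frac{1}{\mu -\lambda }&=-\sum _{n=0}^{\infty }\frac{(\mu -b_j)^{n}}{(\lambda -b_j)^{n+1}}, \\ \mathop {\textrm{res}}\limits _{\mu = b_j}\,\iota _{\mu _{b_j}}\frac{1}{\mu -\lambda }\,Q^{b_j}(\mu _{b_j})&=-\sum _{n=0}^{m_j}\frac{Q^{b_j}_{-n-1}}{(\lambda -b_j)^{n+1}}=-\left( Q^{b_j}(\lambda _{b_j})\right) ^{\textrm{rat}}_-, \end{aligned}$$

where the expansion (6.21) was used and the residue selects the coefficient of \((\mu -b_j)^{-1}\). Since the resulting rational function is regular at \(\lambda =a_i\ne b_j\), only the negative powers of \(\lambda -a_i\) in \(Q^{a_i}(\lambda _{a_i})\) can contribute to the remaining residue at \(\lambda =a_i\).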

We obtain Dickey’s Lagrangian, up to an overall sign and a relative sign due to a different convention in the zero-curvature equation, by taking the following sums

$$\begin{aligned} L_D = - \sum _{i=1}^{P_1}\sum _{j=1}^{P_2} \left( K^{a_i,b_j}_{-1-1}-U^{a_i,b_j}_{-1-1}\right) . \end{aligned}$$
(6.22)

6.1.3 Interplay between hierarchies associated to simple and higher order poles

Following Proposition 3.8, the Lax matrices read, for each \(n \ge -n_i -1\) and \(i=1,\dots ,P_1\), and for each \(m \ge -m_j -1\) and \(j=1,\dots ,P_2\):

$$\begin{aligned}&V^{a_i}_n(\lambda ) = - \left( \frac{Q^{a_i}(\lambda _{a_i})}{(\lambda - a_i)^{n+1}} \right) ^{\textrm{rat}}_-= - \sum _{r=0}^{n+n_i +1} \frac{Q^{a_i}_{n-r}}{(\lambda - a_i)^{r+1}}, \end{aligned}$$
(6.23)
$$\begin{aligned}&V^{b_j}_m(\lambda ) = - \left( \frac{Q^{b_j}(\lambda _{b_j})}{(\lambda - b_j)^{m+1}} \right) ^\textrm{rat}_-= - \sum _{r=0}^{m+m_j +1} \frac{Q^{b_j}_{m-r}}{(\lambda - b_j)^{r+1}}. \end{aligned}$$
(6.24)
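The finite sums in (6.23)–(6.24) follow from the Laurent expansion (6.20). As a short sketch, assuming that \((\cdot )^{\textrm{rat}}_-\) keeps the terms with negative powers of \(\lambda -a_i\):

$$\begin{aligned} \frac{Q^{a_i}(\lambda _{a_i})}{(\lambda - a_i)^{n+1}}=\sum _{k=-n_i-1}^{\infty }Q^{a_i}_k(\lambda -a_i)^{k-n-1}, \qquad k-n-1=-(r+1)~\Leftrightarrow ~k=n-r, \end{aligned}$$

and the constraint \(k\ge -n_i-1\) with \(r\ge 0\) gives \(0\le r\le n+n_i+1\). For instance, \(n=-1\) reproduces (6.17), while the lowest value \(n=-n_i-1\) leaves the single term \(-Q^{a_i}_{-n_i-1}/(\lambda -a_i)\).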

At first glance, it is tempting to suggest that a Dickey hierarchy with certain fixed order \(n_i\) and \(m_j\) simply sits higher or lower in another Dickey hierarchy with different fixed \(n_i\) and \(m_j\). The situation is much more complicated in general. To illustrate what we mean and show that this is too naive, let us focus on the field content of a Lax matrix around a pole a and compare the ZM case (where a is a simple pole) with the Dickey case (where a has order \(n_1+1>1\)). The corresponding Lax matrices are

$$\begin{aligned} V^{ZM,a}_n(\lambda ) = - \left( \frac{Q^{ZM,a}(\lambda _{a})}{(\lambda - a)^{n+1}} \right) ^{\textrm{rat}}_-= - \sum _{r=0}^{n +1} \frac{Q^{ZM,a}_{n-r}}{(\lambda - a)^{r+1}},~~n\ge -1, \end{aligned}$$
(6.25)

and

$$\begin{aligned} V^{D,a}_n(\lambda ) = - \left( \frac{Q^{D,a}(\lambda _{a})}{(\lambda - a)^{n+1}} \right) ^{\textrm{rat}}_-= - \sum _{r=0}^{n+n_1 +1} \frac{Q^{D,a}_{n-r}}{(\lambda - a)^{r+1}},~~n\ge -n_1-1. \end{aligned}$$
(6.26)

In general, it is always the case that the Dickey hierarchy contains the ZM case as its lowest level. Indeed,

$$\begin{aligned} V^{D,a}_{-n_1-1}(\lambda ) =\left( (\lambda -a)^{n_1}\phi ^a(\lambda _a)\sum _{r=0}^{n_1} \frac{A^D_{r}}{(\lambda -a)^{r+1}}(\phi ^a(\lambda _a))^{-1} \right) _-^{\textrm{rat}}= \frac{\phi ^{a}_0 A^D_{n_1} (\phi ^{a}_0)^{-1}}{\lambda - a}\nonumber \\ \end{aligned}$$
(6.27)

and it suffices to choose \(A^D_{n_1}=A^{ZM}_{0}\) to see that this is equal to

$$\begin{aligned} V^{ZM,a}_{-1}(\lambda ) = \left( \phi ^a(\lambda _a)\frac{A^{ZM}_{0}}{\lambda -a}(\phi ^a(\lambda _a))^{-1} \right) _-^{\textrm{rat}}=\frac{\phi ^{a}_0 A^{ZM}_{0} (\phi ^{a}_0)^{-1}}{\lambda - a}. \end{aligned}$$
(6.28)

However, the crucial point is that \(Q^{ZM,a}(\lambda _{a})\) and \(Q^{D,a}(\lambda _{a})\) are constructed as orbits around different elements in general so the phase space is different in general. This means that the previous identification only gives some of the fields of the Dickey case which happen to be identifiable with the full phase space for ZM. The “converse” is not true in general. The Dickey case can only be seen as a higher flow in the ZM hierarchy if we construct it around a special element of the form \(\displaystyle \sum _{r=0}^{n_1} \frac{A^D_{r}}{(\lambda -a)^{r+1}}\) with \(A^D_{r}=0\) for \(r= 0,\dots ,n_1-1\) and \(A^D_{n_1}=A^{ZM}_{0}\). In that case, we see that

$$\begin{aligned} V^{D,a}_{n}(\lambda ) = V^{ZM,a}_{n+n_1}(\lambda ) \end{aligned}$$
(6.29)

so that the two hierarchies simply correspond to shifting the starting point in the elementary times \(t_j^a\). This discussion was local in the sense that we looked at a typical pole a. Of course, similar conclusions hold around the other poles. If one assembles them to obtain compound times, then the situation is similar but technically more complicated. The summary is that in general, the Dickey case is a genuine generalisation of the ZM case unless it is constructed as an orbit around a specific element dictated by the ZM element. Of course, this comparison extends to the corresponding Lagrangians since the building blocks are the same as for the Lax matrices.
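The shift (6.29) can be checked in two lines. This is a sketch assuming that the same group element \(\phi ^a(\lambda _a)\) is used in both constructions and that the Dickey element is the special one, \(A^{ZM}_{0}/(\lambda -a)^{n_1+1}\):

$$\begin{aligned} Q^{D,a}(\lambda _a)&=-\phi ^a(\lambda _a)\frac{A^{ZM}_{0}}{(\lambda -a)^{n_1+1}}(\phi ^a(\lambda _a))^{-1}=\frac{Q^{ZM,a}(\lambda _a)}{(\lambda -a)^{n_1}}, \qquad Q^{D,a}_k=Q^{ZM,a}_{k+n_1}, \\ V^{D,a}_{n}(\lambda )&=-\left( \frac{Q^{D,a}(\lambda _{a})}{(\lambda - a)^{n+1}} \right) ^{\textrm{rat}}_-=-\left( \frac{Q^{ZM,a}(\lambda _{a})}{(\lambda - a)^{n+n_1+1}} \right) ^{\textrm{rat}}_-=V^{ZM,a}_{n+n_1}(\lambda ). \end{aligned}$$

The range \(n\ge -n_1-1\) in (6.26) is mapped to \(n+n_1\ge -1\), which is the range in (6.25).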

6.2 Trigonometric Zakharov–Mikhailov models

We can repeat the construction of the previous subsection but with the rational r-matrix replaced by the trigonometric one. To the best of our knowledge, this produces for the first time a new class of models which we call trigonometric Zakharov–Mikhailov models.

For conciseness, we simply illustrate this on the simplest example of simple poles in the data (6.4). To derive the elementary Lax matrices, we need to use the trigonometric formula in Proposition 3.8 which brings interesting differences compared to the rational case, already for the lowest times \(t_{-1}^{a_i}\) and \(t_{-1}^{b_j}\). With \(Q_{-1}^{a_i}=\phi _0^{a_i}A_{i0}(\phi _0^{a_i})^{-1}\) and \(Q_{-1}^{b_j}=\phi _0^{b_j}B_{j0}(\phi _0^{b_j})^{-1}\), the corresponding elementary Lax matrices read

$$\begin{aligned}{} & {} V_{-1}^{a_i}(\lambda )=-\frac{a_iQ_{-1}^{a_i}}{\lambda -a_i}-\left( P^{-}+\frac{1}{2}P^0\right) Q_{-1}^{a_i}, \end{aligned}$$
(6.30)
$$\begin{aligned}{} & {} V_{-1}^{b_j}(\lambda )=-\frac{b_jQ_{-1}^{b_j}}{\lambda -b_j}-\left( P^{-}+\frac{1}{2}P^0\right) Q_{-1}^{b_j}. \end{aligned}$$
(6.31)

It will be convenient to introduce the following notations, for \(M\in \mathfrak {gl}_N\):

$$\begin{aligned} \left( P^{+}+\frac{1}{2}P^0\right) M=M^>,~~\left( P^{-}+\frac{1}{2}P^0\right) M=M^<. \end{aligned}$$
(6.32)

In particular \(M=M^>+M^<\). We derive from our general formula the following elementary Lagrangian:

$$\begin{aligned}{} & {} \mathscr {L}^{a_i,b_j}_{-1-1}={{\,\textrm{Tr}\,}}\left( (\phi _0^{a_i})^{-1}\partial _{t^{b_j}_{-1}}\phi _0^{a_i}A_{i0}-(\phi _0^{b_j})^{-1}\partial _{t^{a_i}_{-1}}\phi _0^{b_j}B_{j0}\right) \\{} & {} \qquad \quad -\frac{b_j}{b_j-a_i}{{\,\textrm{Tr}\,}}\left( \phi _0^{a_i}A_{i0}(\phi _0^{a_i})^{-1}\phi _0^{b_j}B_{j0}(\phi _0^{b_j})^{-1}\right) +{{\,\textrm{Tr}\,}}\left[ \phi _0^{a_i}A_{i0}(\phi _0^{a_i})^{-1}\left( \phi _0^{b_j}B_{j0}(\phi _0^{b_j})^{-1}\right) ^<\right] \nonumber . \end{aligned}$$
(6.33)

The last term represents the main difference with the rational case, see (6.10).

We now show that the so-called anisotropic chiral model presented in Section 6 of [FR] can be obtained as a particular case of our trigonometric ZM Lagrangians and ZS Lax matrices. We will refer to it as the anisotropic Faddeev-Reshetikhin model to avoid confusion with the “anisotropic chiral model” terminology used in [FR], which would assume that we parametrise the currents differently from our coadjoint parametrization, see (6.34).

We proceed in two steps. First, we specialise our data as follows: in (6.3), we take \(P_1=P_2=1\) and write \(a_1=a\) and \(b_1=b\); in (6.4), we simply write

$$\begin{aligned} F(\lambda )=\frac{A}{\lambda -a}+\frac{B}{\lambda -b}. \end{aligned}$$

We also restrict \(\mathfrak {g}\) to \(\mathfrak {sl}_2\) (Footnote 3). Second, we apply the automorphism discussed in “Appendix A” to make the connection with [FR] easier. Let us denote for convenience \(t_{-1}^{a}=\xi \), \(t_{-1}^{b}=\eta \),

$$\begin{aligned} Q_{-1}^{a}=\phi _0^{a}A(\phi _0^{a})^{-1}\equiv J_0,~~Q_{-1}^{b}=\phi _0^{b}B(\phi _0^{b})^{-1}\equiv J_1, \end{aligned}$$
(6.34)

and the Lax pair (6.30)–(6.31),

$$\begin{aligned}{} & {} V_{-1}^{a}(\lambda )\equiv U(\lambda )=-\frac{aJ_0}{\lambda -a}-J_0^<, \end{aligned}$$
(6.35)
$$\begin{aligned}{} & {} V_{-1}^{b}(\lambda )\equiv V(\lambda )=-\frac{bJ_1}{\lambda -b}-J_1^<. \end{aligned}$$
(6.36)

The Lagrangian (6.33) becomes

$$\begin{aligned} \mathscr {L}_{\textrm{aFR}}= {{\,\textrm{Tr}\,}}\left( (\phi _0^{a})^{-1}\partial _{\eta }\phi _0^{a}A-(\phi _0^{b})^{-1}\partial _{\xi }\phi _0^{b}B-\frac{b}{b-a}J_0J_1+J_0J_1^<\right) . \end{aligned}$$
(6.37)

Varying with respect to \(\phi _0^{a}\) and \(\phi _0^{b}\), the EL equations read (Footnote 4)

$$\begin{aligned} \partial _\eta J_0=\left[ -\frac{bJ_1}{a-b}-J_1^<,J_0\right] ,~~\partial _\xi J_1=\left[ -\frac{aJ_0}{b-a}-J_0^<,J_1\right] . \end{aligned}$$
(6.38)

Projecting on the basis \(J_{0,1}=J_{0,1}^+\sigma _++J_{0,1}^-\sigma _-+J_{0,1}^3\sigma _3\), we get

$$\begin{aligned} \quad \quad {\left\{ \begin{array}{ll} \partial _\eta J_0^+=\frac{2b}{a-b}J_1^+J_0^3-\frac{a+b}{a-b}J_1^3J_0^+,\\ \partial _\eta J_0^-=\frac{2a}{b-a}J_1^-J_0^3+\frac{a+b}{a-b}J_1^3J_0^-,\\ \partial _\eta J_0^3=\frac{b}{b-a}J_1^+J_0^-+\frac{a}{a-b}J_1^-J_0^+, \end{array}\right. } \qquad {\left\{ \begin{array}{ll} \partial _\xi J_1^+=\frac{2a}{b-a}J_0^+J_1^3+\frac{a+b}{a-b}J_0^3J_1^+,\\ \partial _\xi J_1^-=\frac{2b}{a-b}J_0^-J_1^3-\frac{a+b}{a-b}J_0^3J_1^-,\\ \partial _\xi J_1^3=\frac{a}{a-b}J_0^+J_1^-+\frac{b}{b-a}J_0^-J_1^+. \end{array}\right. }\nonumber \\ \end{aligned}$$
(6.39)
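The projection leading to (6.39) can be reproduced as follows. This is a sketch under conventions that are assumed here rather than quoted from the text: \([\sigma _3,\sigma _\pm ]=\pm 2\sigma _\pm \), \([\sigma _+,\sigma _-]=\sigma _3\), and \(M^<=M^-\sigma _-+\tfrac{1}{2}M^3\sigma _3\) for \(M=M^+\sigma _++M^-\sigma _-+M^3\sigma _3\). For \(X=x^+\sigma _++x^-\sigma _-+x^3\sigma _3\) one has

$$\begin{aligned} [X,J]&=2\left( x^3J^+-x^+J^3\right) \sigma _+ +2\left( x^-J^3-x^3J^-\right) \sigma _- +\left( x^+J^--x^-J^+\right) \sigma _3, \end{aligned}$$

and the matrix \(X=-\frac{bJ_1}{a-b}-J_1^<\) appearing in the first equation of (6.38) has components

$$\begin{aligned} x^+=-\frac{b}{a-b}J_1^+,\qquad x^-=-\frac{a}{a-b}J_1^-,\qquad x^3=-\frac{a+b}{2(a-b)}J_1^3. \end{aligned}$$

Taking \(J=J_0\), the \(\sigma _+\) component gives \(\partial _\eta J_0^+=\frac{2b}{a-b}J_1^+J_0^3-\frac{a+b}{a-b}J_1^3J_0^+\), and the \(\sigma _-\) and \(\sigma _3\) components give the other two equations of the left-hand system. The right-hand system follows in the same way from \(X=-\frac{aJ_0}{b-a}-J_0^<\) and \(J=J_1\).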

The residue at infinity of the zero curvature equation for the Lax pair (6.35)–(6.36) yields the equation \(-\partial _\eta J_0^<+ \partial _\xi J_1^<+\left[ J_0^<,J_1^<\right] =0\) in addition to (6.38). However, when projecting, one can see that this is a consequence of the system (6.39).

To make the comparison with the equations for the fields \(S_{1,2,3}\) and \(T_{1,2,3}\) used in [FR], we use the automorphism mentioned above and express the final answer using the Pauli matrices \(\sigma _{1,2,3}\). We also implement the changes \(\lambda \rightarrow e^{2\lambda }\), \(a\rightarrow e^{2a}\), \(b\rightarrow e^{-2a}\) to go from rational to hyperbolic parametrisation. We find

$$\begin{aligned} e^{\lambda /2\sigma _3}U(e^{2\lambda })e^{-\lambda /2\sigma _3}= & {} -\frac{1}{2}\left[ w_1(\lambda -a)\frac{1}{2}\left( e^aJ_0^++e^{-a}J_0^-\right) \sigma _1\right. \\{} & {} \left. + w_2(\lambda -a)\frac{i}{2}\left( e^aJ_0^+-e^{-a}J_0^-\right) \sigma _2+w_3(\lambda -a)J_0^3 \sigma _3 \right] , \end{aligned}$$

and

$$\begin{aligned} e^{\lambda /2\sigma _3}V(e^{2\lambda })e^{-\lambda /2\sigma _3}= & {} -\frac{1}{2}\left[ w_1(\lambda +a)\frac{1}{2}\left( e^{-a}J_1^++e^{a}J_1^-\right) \sigma _1\right. \\{} & {} \left. + w_2(\lambda +a)\frac{i}{2}\left( e^{-a}J_1^+-e^{a}J_1^-\right) \sigma _2+w_3(\lambda +a)J_1^3 \sigma _3\right] , \end{aligned}$$

where \(w_1(\lambda )=w_2(\lambda )=\frac{1}{\sinh \lambda }\), \(w_3(\lambda )=\coth \lambda \). It remains to compare with the Lax operator (6.22) in [FR] and remember that they work with x and t instead of the light-cone coordinates \(\xi \) and \(\eta \). This leads to the identifications

$$\begin{aligned} {\left\{ \begin{array}{ll} S_1=-\frac{1}{4}\left( e^aJ_0^++e^{-a}J_0^-\right) ,\\ S_2=-\frac{i}{4}\left( e^aJ_0^+-e^{-a}J_0^-\right) ,\\ S_3=-\frac{1}{2}J_0^3, \end{array}\right. }\quad {\left\{ \begin{array}{ll} T_1=-\frac{1}{4}\left( e^{-a}J_1^++e^{a}J_1^-\right) ,\\ T_2=-\frac{i}{4}\left( e^{-a}J_1^+-e^{a}J_1^-\right) ,\\ T_3=-\frac{1}{2}J_1^3. \end{array}\right. } \end{aligned}$$
(6.40)
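The hyperbolic coefficients \(w_{1,2,3}\) can be recovered directly from (6.35). The following sketch reads the conjugating factor as \(e^{\lambda \sigma _3/2}\) and assumes \(e^{\lambda \sigma _3/2}\sigma _\pm e^{-\lambda \sigma _3/2}=e^{\pm \lambda }\sigma _\pm \) and \(\sigma _\pm =\tfrac{1}{2}(\sigma _1\pm i\sigma _2)\). After \(\lambda \rightarrow e^{2\lambda }\), \(a\rightarrow e^{2a}\), the three components of the conjugated \(U\) are

$$\begin{aligned} \sigma _+:&\quad -\frac{e^{2a}e^{\lambda }}{e^{2\lambda }-e^{2a}}J_0^+=-\frac{e^{a}J_0^+}{2\sinh (\lambda -a)}, \\ \sigma _-:&\quad -\left( \frac{e^{2a}}{e^{2\lambda }-e^{2a}}+1\right) e^{-\lambda }J_0^-=-\frac{e^{-a}J_0^-}{2\sinh (\lambda -a)}, \\ \sigma _3:&\quad -\left( \frac{e^{2a}}{e^{2\lambda }-e^{2a}}+\frac{1}{2}\right) J_0^3=-\frac{1}{2}\coth (\lambda -a)J_0^3. \end{aligned}$$

Rewriting \(e^{a}J_0^+\sigma _++e^{-a}J_0^-\sigma _-=\tfrac{1}{2}\left( e^aJ_0^++e^{-a}J_0^-\right) \sigma _1+\tfrac{i}{2}\left( e^aJ_0^+-e^{-a}J_0^-\right) \sigma _2\) gives the stated form with \(w_1=w_2=1/\sinh \) and \(w_3=\coth \). The computation for \(V\) is identical with \(a\rightarrow -a\) and \(J_0\rightarrow J_1\).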

Using (6.40), eqs (6.39) become

$$\begin{aligned}{} & {} \partial _{\eta }S_a=2i\sum _{b,c}\epsilon ^{abc}w_b(2a)T_bS_c, \end{aligned}$$
(6.41)
$$\begin{aligned}{} & {} \partial _{\xi }T_a=-2i\sum _{b,c}\epsilon ^{abc}w_b(2a)S_bT_c, \end{aligned}$$
(6.42)

which are of the same form as (6.26)-(6.27) in [FR] when moving from the light-cone coordinates \(\xi ,\eta \) to the coordinates \(x,t\).

6.3 Deformed Gross–Neveu models

Here, we show how to produce the Lax pair and Lagrangian for the deformed Gross-Neveu model discussed in [ABW, Section 16.2] (see also [By] and references therein for the particular case of rank \(M=1\)) as a particular case of our construction. The deformation is controlled by the r-matrix in the potential term which appears naturally in our construction. In fact, more than just the single Lagrangian and its Lax pair, we can in principle generate all the elementary Lagrangians in the whole Lagrangian multiform and all the elementary Lax pairs for the hierarchy containing this model as its main representative. This explains the origin of the integrability of such a class of models observed in [ABW, By] and is seen to be a particular case of our construction.

The idea is to apply a reduction, in the spirit of [Mik], to a Zakharov–Mikhailov model. The r-matrix could in principle be any skew-symmetric solution of the CYBE as we have already mentioned. Of course, if we want to resort to our explicit formulas for Lagrangians or Lax matrices, then it will be either the rational or trigonometric one since we have given an explicit construction only in those cases. Nevertheless, we will write most results without specifying the r-matrix to emphasize this observation.

Choose the data in (6.3)–(6.4) as follows

$$\begin{aligned} S= & {} \{ a,a^*\},~~a\notin \mathbb {R},~~\mathfrak {g}= \mathfrak {gl}_N, \end{aligned}$$
(6.43)
$$\begin{aligned} F(\lambda )= & {} \frac{A}{\lambda -a} -\frac{A^\dagger }{\lambda -a^*}. \end{aligned}$$
(6.44)

In particular, we chose \(N_a=1=N_{a^*}\). As mentioned, we want to use the idea of reduction which we implement as a reality condition on the objects of the theory. Writing

$$\begin{aligned} Q^a(\lambda _a)=\sum _{k=-1}^\infty Q_k^{a}(\lambda -a)^k \end{aligned}$$
(6.45)

and

$$\begin{aligned} Q^{a^*}(\lambda _{a^*})=\sum _{k=-1}^\infty Q_k^{a^*}(\lambda -{a^*})^k \end{aligned}$$
(6.46)

we require \(Q_k^{{a^*}}=-\left( Q_k^{a}\right) ^\dagger \) for all \(k\ge -1\). Accordingly, at the group level, we require that when writing

$$\begin{aligned} \varphi ^{-1}_a(\lambda _a)=\sum _{k=0}^\infty \widetilde{\varphi }_k^{a}(\lambda -a)^k \end{aligned}$$
(6.47)

and

$$\begin{aligned} \varphi _{a^*}(\lambda _{a^*})=\sum _{k=0}^\infty \varphi _k^{{a^*}}(\lambda -{a^*})^k \end{aligned}$$
(6.48)

we must have \(\widetilde{\varphi }_k^{a}=\left( \varphi _k^{{a^*}}\right) ^\dagger \) for all \(k\ge 0\). Then, for any skew-symmetric r-matrix which is well-defined at \(\lambda =a\) and \(\mu =a^*\), a direct computation gives

$$\begin{aligned} \mathscr {L}^{a,a^*}_{-1-1}= & {} {{\,\textrm{Tr}\,}}\left( \left( \varphi _0^{{a^*}}\right) ^\dagger \partial _{t_{-1}^{a^*}}\varphi _0^{{a}} A+ \left( \varphi _0^{{a}}\right) ^\dagger \partial _{t_{-1}^{a}}\varphi _0^{{a^*}}A^\dagger \right) \nonumber \\{} & {} +{{\,\textrm{Tr}\,}}_{12}\left( r_{12}(a,a^*)\left( \varphi _0^{{a}} A\left( \varphi _0^{{a^*}}\right) ^\dagger \right) _1 \left( \varphi _0^{{a}} A\left( \varphi _0^{{a^*}}\right) ^\dagger \right) _2^\dagger \right) . \end{aligned}$$
(6.49)

It remains to choose A as a rank M matrix and parametrize it as \(A=(uv)^\dagger \) where u is a constant \(N\times M\) matrix and v is a constant \(M\times N\) matrix (\(M\le N\)). Then, setting \(U=\varphi _0^{{a^*}}u\), \(V=v\left( \varphi _0^{{a}}\right) ^\dagger \), \(t_{-1}^a={\bar{z}}\) and \(t_{-1}^{a^*}=z\), we get

$$\begin{aligned} \mathscr {L}^{a,a^*}_{-1-1}={{\,\textrm{Tr}\,}}\left( V\partial _{{\bar{z}}}U+U^\dagger \partial _z V^\dagger \right) +{{\,\textrm{Tr}\,}}_{12}\left( r_{12}(a,a^*)\left( UV\right) _1^\dagger (UV)_2\right) . \end{aligned}$$
(6.50)

This is the Lagrangian given in [ABW] (without the covariant derivative), with the relation to their notation being \(r_a(A)_1={{\,\textrm{Tr}\,}}_{2}(r_{12}(a,a^*)A_2)\) so that the potential term reads

$$\begin{aligned} {{\,\textrm{Tr}\,}}_{12}\left( r_{12}(a,a^*)\left( UV\right) _1^\dagger (UV)_2\right) ={{\,\textrm{Tr}\,}}\left( r_{a}(UV)\left( UV\right) ^\dagger \right) . \end{aligned}$$
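The passage from (6.49) to the expression in terms of U and V uses only the definitions \(A=(uv)^\dagger \), \(U=\varphi _0^{{a^*}}u\), \(V=v\left( \varphi _0^{{a}}\right) ^\dagger \), the constancy of u and v, and cyclicity of the trace. As a sketch of the intermediate steps:

$$\begin{aligned} \left( UV\right) ^\dagger&=\varphi _0^{{a}}(uv)^\dagger \left( \varphi _0^{{a^*}}\right) ^\dagger =\varphi _0^{{a}}A\left( \varphi _0^{{a^*}}\right) ^\dagger , \\ {{\,\textrm{Tr}\,}}\left( \left( \varphi _0^{{a}}\right) ^\dagger \partial _{{\bar{z}}}\varphi _0^{{a^*}}\,uv\right)&={{\,\textrm{Tr}\,}}\left( v\left( \varphi _0^{{a}}\right) ^\dagger \partial _{{\bar{z}}}\left( \varphi _0^{{a^*}}u\right) \right) ={{\,\textrm{Tr}\,}}\left( V\partial _{{\bar{z}}}U\right) , \\ {{\,\textrm{Tr}\,}}\left( \left( \varphi _0^{{a^*}}\right) ^\dagger \partial _{z}\varphi _0^{{a}}\,v^\dagger u^\dagger \right)&={{\,\textrm{Tr}\,}}\left( u^\dagger \left( \varphi _0^{{a^*}}\right) ^\dagger \partial _{z}\left( \varphi _0^{{a}}v^\dagger \right) \right) ={{\,\textrm{Tr}\,}}\left( U^\dagger \partial _z V^\dagger \right) . \end{aligned}$$

The first identity shows that the two factors in the potential term of (6.49) are \(\left( UV\right) ^\dagger \) in the first tensor slot and \(UV\) in the second, and \({{\,\textrm{Tr}\,}}_{12}\left( r_{12}X_1Y_2\right) ={{\,\textrm{Tr}\,}}\left( X\,{{\,\textrm{Tr}\,}}_2(r_{12}Y_2)\right) \) then gives the form written in terms of \(r_a\).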

The interpretation of the parameter appearing in the r-matrix (a here, s in [ABW]) is clear in our context: it corresponds to the pole structure of the constant matrix in our data (6.4).

The corresponding Lax pair is derived from (3.14) and reads, with \(K=UV\),

$$\begin{aligned} V_{-1}^{a}(\lambda )={{\,\textrm{Tr}\,}}_2 \big ( r_{12}(\lambda ,a)K^\dagger _2\big ),~~V_{-1}^{a^*}(\lambda )=-{{\,\textrm{Tr}\,}}_2 \big ( r_{12}(\lambda ,a^*)K_2\big ), \end{aligned}$$
(6.51)

and coincides with the Lax connection (16.7) in [ABW]. Hence, the zero curvature equation yields

$$\begin{aligned}{} & {} \partial _z {{\,\textrm{Tr}\,}}_2 \big ( {{\,\textrm{res}\,}}_\lambda ^a r_{12}(\lambda ,a)K^\dagger _2\big )=\left[ {{\,\textrm{Tr}\,}}_2 \big ( {{\,\textrm{res}\,}}_\lambda ^a r_{12}(\lambda ,a)K^\dagger _2\big ), {{\,\textrm{Tr}\,}}_2 \big ( r_{12}(a,a^*)K_2\big )\right] ,\\{} & {} \partial _{{\bar{z}}} {{\,\textrm{Tr}\,}}_2 \big ( {{\,\textrm{res}\,}}_\lambda ^{a^*} r_{12}(\lambda ,a^*)K_2\big )=\left[ {{\,\textrm{Tr}\,}}_2 \big ( r_{12}(a^*,a)K^\dagger _2\big ), {{\,\textrm{Tr}\,}}_2 \big ( {{\,\textrm{res}\,}}_\lambda ^{a^*} r_{12}(\lambda ,a^*)K_2\big ) \right] , \end{aligned}$$

which reduces to (Footnote 5)

$$\begin{aligned}{} & {} \partial _z K^\dagger =\left[ K^\dagger , {{\,\textrm{Tr}\,}}_2 \big ( r_{12}(a,a^*)K_2\big )\right] ,\\{} & {} \partial _{{\bar{z}}} K=\left[ {{\,\textrm{Tr}\,}}_2 \big ( r_{12}(a^*,a)K^\dagger _2\big ), K \right] . \end{aligned}$$
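As an illustration, these equations can be specialised to the rational case. The sketch below assumes the normalisation \(r_{12}(\lambda ,\mu )=P_{12}/(\mu -\lambda )\) suggested by the potential term in Sect. 6.1.2, with \(P_{12}\) the permutation operator so that \({{\,\textrm{Tr}\,}}_2(P_{12}K_2)=K\), and takes the r-matrix in the second equation evaluated at \(\lambda =a^*\):

$$\begin{aligned} {{\,\textrm{Tr}\,}}_2 \big ( r_{12}(a,a^*)K_2\big )=\frac{K}{a^*-a},\qquad {{\,\textrm{Tr}\,}}_2 \big ( r_{12}(a^*,a)K^\dagger _2\big )=\frac{K^\dagger }{a-a^*}, \end{aligned}$$

so that

$$\begin{aligned} \partial _z K^\dagger =\frac{[K^\dagger ,K]}{a^*-a},\qquad \partial _{{\bar{z}}} K=\frac{[K^\dagger ,K]}{a-a^*}. \end{aligned}$$

Since \([K^\dagger ,K]\) is Hermitian, the second equation is the Hermitian conjugate of the first when \({\bar{z}}\) is the complex conjugate of z, which is consistent with the reality condition imposed by the reduction.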

In our opinion, it is rather beautiful that our generating Lagrangian multiform produces this class of models which was originally obtained via a completely different method, related to 4d Chern–Simons theory (see the conclusion for details and references). Unlike the latter method which necessarily focuses on a single Lagrangian at a time, we can also obtain all the Lagrangians corresponding to the higher commuting flows of the hierarchy, if desired.

7 Coupling Integrable Hierarchies Together

To show the flexibility of the construction, we explain by way of two examples how we can couple integrable field theories together in a simple way. The reader familiar with integrable hierarchies will recognize the procedure of assembling elementary time flows and the corresponding Lax matrices into linear combinations. What we gain here is the possibility to derive the corresponding Lagrangian (multiform) systematically for the new model as well. The procedure is an analog in the ultralocal case of the construction presented in [DLMV1] for a class of non-ultralocal field theories. Unlike the latter, the coupling here is at the level of an entire hierarchy. We give an example in the rational class and one in the trigonometric class of models. In the rational class, we couple together the AKNS hierarchy with the hierarchy of the Faddeev-Reshetikhin model (the simplest instance of a ZM model). In the trigonometric class, we couple the sine-Gordon hierarchy as discussed in Sect. 5 with the hierarchy of the anisotropic Faddeev-Reshetikhin model as presented in Sect. 6.2. In each case, for conciseness, we present all the details for the lowest levels of the hierarchy but it should be clear by now that one can extract higher levels (Lagrangians and Lax matrices) systematically if desired.

7.1 AKNS-FR hierarchy

To couple models in the AKNS hierarchy with models in the simplest ZM hierarchy (with two poles), we assemble the corresponding data as

$$\begin{aligned} S=\{a,-a,\infty \},~~a\in \mathbb {C}^\times ,~~N_a=N_{-a}=1,~~N_\infty =0,~~\mathfrak {g}=\mathfrak {sl}_2, \end{aligned}$$
(7.1)

and we choose

$$\begin{aligned} F(\lambda )=-i\alpha \sigma _3+\frac{A}{\lambda -a}+\frac{B}{\lambda +a}\equiv \alpha F^{AKNS}(\lambda )+F^{FR}(\lambda ), \end{aligned}$$
(7.2)

where A, B are constant \(\mathfrak {sl}_2\) matrices. The parameter \(\alpha \) is the coupling between the two theories: \(\alpha =0\) gives a pure FR theory while sending \(\alpha \) to infinity produces a pure AKNS hierarchy. The effect of multiplying \(F^{AKNS}(\lambda )=-i\sigma _3\) by \(\alpha \) is to yield \(Q^\infty (\lambda _\infty )=\alpha Q(\lambda )\) where \(Q(\lambda )\) is the AKNS series (4.4). Hence the Lax matrix \(V_n^\infty (\lambda )\) is equal to the AKNS Lax matrix \(V_n(\lambda )\) multiplied by \(\alpha \). With this in mind, we have for instance \(V_1^\infty (\lambda )=-i\alpha \lambda \sigma _3+\alpha Q_1\).

For simplicity, we illustrate the coupling by looking at the main model in each hierarchy (NLS in AKNS and FR in ZM), i.e. by considering the Lax pair

$$\begin{aligned} V_{-1}^a(\lambda )+V_1^\infty (\lambda )\equiv U(\lambda ),~~ V_{-1}^{-a}(\lambda )+V_2^\infty (\lambda )\equiv V(\lambda ), \end{aligned}$$
(7.3)

with associated times \(\xi \) and \(\eta \) respectively. This choice of Lax pair corresponds to assembling the four flows \(t_1^\infty \), \(t_2^\infty \) (AKNS) and \(t_{-1}^a\), \(t_{-1}^{-a}\) (FR) such that \(\partial _\xi =\partial _{t^{a}_{-1}}+\partial _{t_1^\infty }\) and \(\partial _{\eta }=\partial _{t^{-a}_{-1}}+\partial _{t^\infty _2}\).

Denoting \(Q_{-1}^a=J_0\) and \(Q_{-1}^{-a}=J_1\) and recalling the above comments on the effect of multiplying by \(\alpha \), we have

$$\begin{aligned} U(\lambda )=\frac{J_0}{\lambda -a} -i\alpha \lambda \sigma _3+\alpha Q_1 \equiv U_{FR}(\lambda )+\alpha U_{NLS}(\lambda ), \end{aligned}$$
(7.4)
$$\begin{aligned} V(\lambda )= \frac{J_1}{\lambda +a} -i\lambda ^2 \alpha \sigma _3+\lambda \alpha Q_1+\alpha Q_2 \equiv V_{FR}(\lambda )+\alpha V_{NLS}(\lambda ). \end{aligned}$$
(7.5)

The zero curvature equation \(\partial _\eta U(\lambda )-\partial _\xi V(\lambda ) + [U(\lambda ),V(\lambda )] =0\) yields the following four (matrix) equations by looking at the residue at \(\lambda =a\), \(\lambda =-a\), \(\lambda =\infty \) and at the constant term in the \(1/\lambda \) expansion respectively,

$$\begin{aligned} \partial _\eta J_0+\frac{1}{2a}\left[ J_0,J_1\right] +\alpha \left[ J_0,V_{NLS}(a)\right] =0, \end{aligned}$$
(7.6)
$$\begin{aligned} \partial _\xi J_1+\frac{1}{2a}\left[ J_0,J_1\right] -\alpha \left[ U_{NLS}(-a),J_1\right] =0, \end{aligned}$$
(7.7)
$$\begin{aligned} \alpha \partial _{\xi }Q_1+i\alpha ^2[\sigma _3,Q_2]+i\alpha [J_0,\sigma _3]=0, \end{aligned}$$
(7.8)
$$\begin{aligned} \alpha \partial _{\eta }Q_1-\alpha \partial _{\xi } Q_2+\alpha ^2[Q_1,Q_2] -ia\alpha [J_0,\sigma _3]-i\alpha [\sigma _3,J_1]+\alpha [J_0,Q_1]=0. \end{aligned}$$
(7.9)
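As a quick check of how (7.8) and (7.9) arise, one can expand the pole parts of (7.4)–(7.5) at large \(\lambda \), taking the second pole at \(\lambda =-a\) as in (7.1)–(7.2), and collect powers of \(\lambda \) in the zero curvature equation. The sketch below only records the orders \(\lambda ^1\) and \(\lambda ^0\); the orders \(\lambda ^3\) and \(\lambda ^2\) cancel identically because \(\sigma _3\) is constant and \([\sigma _3,Q_1]+[Q_1,\sigma _3]=0\).

$$\begin{aligned} \frac{J_0}{\lambda -a}&=\frac{J_0}{\lambda }+\frac{aJ_0}{\lambda ^2}+O(\lambda ^{-3}),\qquad \frac{J_1}{\lambda +a}=\frac{J_1}{\lambda }-\frac{aJ_1}{\lambda ^2}+O(\lambda ^{-3}),\\ \lambda ^1:&\quad -\alpha \partial _\xi Q_1-i\alpha ^2[\sigma _3,Q_2]-i\alpha [J_0,\sigma _3]=0,\\ \lambda ^0:&\quad \alpha \partial _\eta Q_1-\alpha \partial _\xi Q_2+\alpha ^2[Q_1,Q_2]-i\alpha [\sigma _3,J_1]+\alpha [J_0,Q_1]-ia\alpha [J_0,\sigma _3]=0. \end{aligned}$$

Up to an overall sign in the first line, these are (7.8) and (7.9). The term \(-ia\alpha [J_0,\sigma _3]\) comes from the subleading coefficient \(aJ_0/\lambda ^2\) of the pole at \(\lambda =a\) paired with \(-i\alpha \lambda ^2\sigma _3\).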

Setting \(\alpha =0\), (7.6) and (7.7) give the FR version of the principal chiral model [ZM2, FR] which is usually written as

$$\begin{aligned} \partial _{\eta } J_0+\partial _{\xi } J_1+\frac{1}{a}\left[ J_0,J_1\right] =0,~~\partial _{\eta } J_0-\partial _{\xi } J_1=0. \end{aligned}$$
(7.10)
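Explicitly, (7.10) is nothing but the sum and the difference of (7.6) and (7.7) at \(\alpha =0\):

$$\begin{aligned} (7.6)+(7.7):&\quad \partial _\eta J_0+\partial _\xi J_1+\frac{1}{a}\left[ J_0,J_1\right] =0,\\ (7.6)-(7.7):&\quad \partial _\eta J_0-\partial _\xi J_1=0. \end{aligned}$$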

In the limit \(\alpha \rightarrow \infty \) (recall that \(\partial _{\xi }\) scales like \(\alpha \partial _{t_1}\) and \(\partial _{\eta }\) scales like \(\alpha \partial _{t_2}\), with \(t_1\), \(t_2\) the NLS times), we see that (7.8)–(7.9) yield the NLS system (4.13)–(4.14)

$$\begin{aligned} \partial _{t_1}Q_1+i[\sigma _3,Q_2]=0,~~ \partial _{t_2}Q_1-\partial _{t_1} Q_2+[Q_1,Q_2]=0. \end{aligned}$$
(7.11)
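The limit can be made explicit with a short computation. As a sketch, substitute \(\partial _\xi \rightarrow \alpha \partial _{t_1}\) and \(\partial _\eta \rightarrow \alpha \partial _{t_2}\) in (7.8)–(7.9), and assume that \(J_0\), \(J_1\) remain bounded as \(\alpha \) grows (an assumption made here for the purpose of the estimate):

$$\begin{aligned}&\alpha ^2\left( \partial _{t_1}Q_1+i[\sigma _3,Q_2]\right) +i\alpha [J_0,\sigma _3]=0,\\&\alpha ^2\left( \partial _{t_2}Q_1-\partial _{t_1}Q_2+[Q_1,Q_2]\right) +\alpha \left( [J_0,Q_1]-ia[J_0,\sigma _3]-i[\sigma _3,J_1]\right) =0. \end{aligned}$$

Dividing by \(\alpha ^2\), the terms involving the FR currents are of order \(1/\alpha \) and drop out as \(\alpha \rightarrow \infty \), leaving (7.11).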

The Lagrangian of this coupled model is obtained by adding the NLS Lagrangian \(\mathscr {L}_{12}^{\infty \infty }\) (which is \(\mathscr {L}_{12}\) in (4.11) properly rescaled)

$$\begin{aligned} \mathscr {L}_{12}^{\infty \infty }= & {} \frac{\alpha }{2} (f_1 \partial _{t_2^\infty } e_{1} - e_1 \partial _{t_2^\infty } f_{1}) - \frac{\alpha }{2} \sum _{j=1}^{2}(f_j \partial _{t_1^\infty } e_{2-j+1} - e_j \partial _{t_1^\infty } f_{2-j+1})\nonumber \\{} & {} -\alpha ^2\left( 2ie_2f_2+e_1^2f_1^2\right) , \end{aligned}$$
(7.12)

the FR Lagrangian \(\mathscr {L}_{-1-1}^{a,-a}\)

$$\begin{aligned} \mathscr {L}_{-1-1}^{a,-a}={{\,\textrm{Tr}\,}}\left[ (\phi _0^{a})^{-1} \partial _{t_{-1}^{-a}} \phi _0^{a}A- (\phi _0^{-a})^{-1} \partial _{t_{-1}^{a}} \phi _0^{-a}B-\frac{J_0J_1}{2a}\right] , \end{aligned}$$
(7.13)

and the following two mixed elementary Lagrangians (discarding some irrelevant total derivatives),

$$\begin{aligned} \mathscr {L}_{-12}^{a,\infty }= & {} {{\,\textrm{Tr}\,}}\left[ (\phi _0^{a})^{-1} \partial _{t_{2}^{\infty }} \phi _0^{a}A\right] - \frac{\alpha }{2} \sum _{j=1}^{2}(f_j \partial _{t_{-1}^{a}} e_{2-j+1} - e_j \partial _{t_{-1}^{a}} f_{2-j+1})\nonumber \\{} & {} -\alpha {{\,\textrm{Tr}\,}}\left[ J_0V_{NLS}(a)\right] , \end{aligned}$$
(7.14)
$$\begin{aligned} \mathscr {L}_{1-1}^{\infty ,-a}=\frac{\alpha }{2} (f_1 \partial _{t_{-1}^{-a}} e_{1} - e_1 \partial _{t_{-1}^{-a}} f_{1})-{{\,\textrm{Tr}\,}}\left[ (\phi _0^{-a})^{-1} \partial _{t_{1}^{\infty }} \phi _0^{-a}B-\alpha J_1U_{NLS}(-a)\right] .\nonumber \\ \end{aligned}$$
(7.15)

Summing these four contributions, we get our Lagrangian for the coupled model

$$\begin{aligned} \mathscr {L}_{\mathrm{NLS-FR}}= & {} {{\,\textrm{Tr}\,}}\left[ (\phi _0^{a})^{-1} \partial _{\eta } \phi _0^{a} A-(\phi _0^{-a})^{-1} \partial _{\xi } \phi _0^{-a}B\right] \nonumber \\{} & {} +\frac{\alpha }{2} (f_1 \partial _{\eta } e_{1} - e_1 \partial _{\eta } f_{1}) - \frac{\alpha }{2} \sum _{j=1}^{2}(f_j \partial _{\xi } e_{2-j+1} - e_j \partial _{\xi } f_{2-j+1})\nonumber \\{} & {} -\alpha ^2\left( 2ie_2f_2+e_1^2f_1^2\right) -{{\,\textrm{Tr}\,}}\left[ \frac{J_0J_1}{2a} +\alpha J_0 V_{NLS}(a)-\alpha J_1 U_{NLS}(-a) \right] .\nonumber \\ \end{aligned}$$
(7.16)

It can be checked directly that the variations with respect to \(\phi _0^{a}\), \(\phi _0^{-a}\), \(e_{2},f_2\) and \(e_1,f_1\) give (7.6), (7.7), (7.8) and (7.9) respectively.
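To illustrate one of these checks, here is a sketch of the variation of (7.16) with respect to \(\phi _0^{a}\). It assumes, as in the trigonometric case of Sect. 7.2, that \(J_0=\phi _0^{a}A(\phi _0^{a})^{-1}\), and it parametrises the variation as \(\delta \phi _0^{a}=X\phi _0^{a}\) with X an arbitrary \(\mathfrak {sl}_2\)-valued function (X is notation introduced only for this sketch). The symbol \(\simeq \) denotes equality up to a total derivative.

$$\begin{aligned} \delta J_0&=[X,J_0],\qquad \delta {{\,\textrm{Tr}\,}}\left[ (\phi _0^{a})^{-1}\partial _\eta \phi _0^{a}A\right] ={{\,\textrm{Tr}\,}}\left[ (\partial _\eta X)J_0\right] \simeq -{{\,\textrm{Tr}\,}}\left[ X\partial _\eta J_0\right] ,\\ \delta \mathscr {L}_{\mathrm{NLS-FR}}&\simeq -{{\,\textrm{Tr}\,}}\left[ X\left( \partial _\eta J_0+\frac{1}{2a}[J_0,J_1]+\alpha [J_0,V_{NLS}(a)]\right) \right] . \end{aligned}$$

Since X is arbitrary and the trace form is nondegenerate on \(\mathfrak {sl}_2\), this reproduces (7.6). The same computation with \(\delta \phi _0^{-a}=X\phi _0^{-a}\) gives (7.7).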

7.2 sG-aFR hierarchy

The same strategy can of course be applied in the trigonometric case and we illustrate this by assembling the data of the sine-Gordon (sG) hierarchy as in Sect. 5 with that of the anisotropic Faddeev-Reshetikhin (aFR) model as in Sect. 6.2, in the following way

$$\begin{aligned} S=\{0,a,b,\infty \},~~a,b\in \mathbb {C}^\times ,N_0=N_a=N_b=N_\infty =1,~~\mathfrak {g}=\mathfrak {sl}_2, \end{aligned}$$
(7.17)

and we choose

$$\begin{aligned} F(\lambda )=\frac{i\beta }{2}\left( \frac{1}{\lambda }\sigma _++\sigma _--\sigma _+-\lambda \sigma _-\right) +\frac{A}{\lambda -a}+\frac{B}{\lambda -b}, \end{aligned}$$
(7.18)

where A, B are constant \(\mathfrak {sl}_2\) matrices and it is understood that \(b=1/a\). We keep b instead of 1/a as it makes notations lighter but all calculations are done with \(b=1/a\). The parameter \(\beta \) is the coupling between the two theories: \(\beta =0\) gives a pure aFR theory while sending \(\beta \) to infinity produces a pure sG model.

To illustrate the procedure on the easiest case, we choose the main representative of each hierarchy, i.e. we consider the Lax pair (recall from Sect. 6.2 that we set \(J_0=Q_{-1}^a=\phi _0^{a}A(\phi _0^{a})^{-1}\) and \(J_1=Q_{-1}^b=\phi _0^{b}B(\phi _0^{b})^{-1}\))

$$\begin{aligned} U(\lambda )&=V_{-1}^a(\lambda )+V_0^0(\lambda )= {} \frac{-aJ_0^>-\lambda J_0^<}{\lambda -a} -\frac{i\beta }{4} \begin{pmatrix} -C_1^0 &{}{} 2e^{i\frac{u}{2}}/\lambda \\ 2e^{-i\frac{u}{2}} &{}{} C_1^0 \end{pmatrix}\nonumber \\{}&{} \equiv U_{\text {aFR}}(\lambda )+\beta U_{\text {sG}}(\lambda ),\end{aligned}$$
(7.19)
$$\begin{aligned} V(\lambda )&=V_{-1}^b(\lambda )+V_0^\infty (\lambda )= \frac{-b J_1^>-\lambda J_1^<}{\lambda -b} -\frac{i\beta }{4} \begin{pmatrix} B_1^\infty &{} 2e^{-i\frac{u}{2}}\\ 2\lambda e^{i\frac{u}{2}} &{} -B_1^\infty \end{pmatrix}\nonumber \\&\equiv V_{\textrm{aFR}}(\lambda )+\beta V_{\textrm{sG}}(\lambda ), \end{aligned}$$
(7.20)

with associated times \(\xi \) and \(\eta \) respectively. This corresponds to assembling the two sG times \(t_0^0\), \(t_0^\infty \) with the two aFR times \(t_{-1}^a\), \(t_{-1}^b\) such that \(\partial _\xi =\partial _{t^{a}_{-1}}+\partial _{t_0^0}\) and \(\partial _{\eta }=\partial _{t^{b}_{-1}}+\partial _{t^\infty _0}\). The zero curvature equation \(\partial _\eta U(\lambda )-\partial _\xi V(\lambda ) + [U(\lambda ),V(\lambda )] =0\) yields the following four equations by looking at the residue at \(\lambda =0\), \(\lambda =\infty \), \(\lambda =a\), \(\lambda =b\) respectively,

$$\begin{aligned} u_\eta +\beta B_1^\infty =-2iJ_1^3, \end{aligned}$$
(7.21)
$$\begin{aligned} u_\xi +\beta C_1^0=-2iJ_0^3, \end{aligned}$$
(7.22)
$$\begin{aligned} \partial _\eta J_0=\left[ V_{\textrm{aFR}}(a),J_0\right] -\beta \left[ J_0,V_{\textrm{sG}}(a)\right] , \end{aligned}$$
(7.23)
$$\begin{aligned} \partial _\xi J_1=\left[ U_{\textrm{aFR}}(b),J_1\right] +\beta \left[ U_{\textrm{sG}}(b),J_1\right] . \end{aligned}$$
(7.24)

Equations (7.21)–(7.22) should be compared with the first two equations in (5.20) and (7.23)–(7.24) should be compared with (6.38). The last independent equation contained in the zero curvature can be obtained for instance by setting \(\lambda =1\). It can be shown that only the component \(\sigma _3\) gives an equation that is not a consequence of those already written. It takes the form

$$\begin{aligned}{} & {} \frac{i\beta }{4}\left( \partial _{\eta }C_1^0+\partial _{\xi }B_1^\infty \right) -\frac{i\beta ^2}{2}\sin u\nonumber \\{} & {} \quad +\frac{1}{2}\frac{a+1}{a-1}\left( \partial _{\eta }J_0^3+\partial _{\xi }J_1^3\right) +\frac{1}{(a-1)(b-1)}\left( aJ_0^+J_1^--bJ_0^-J_1^+\right) \nonumber \\{} & {} \quad +\frac{i\beta }{2(a-1)}\left( J_0^-e^{-i\frac{u}{2}}-aJ_0^+e^{i\frac{u}{2}}-J_1^+e^{-i\frac{u}{2}}+aJ_1^-e^{i\frac{u}{2}}\right) =0. \end{aligned}$$
(7.25)

We can use (7.21)–(7.24) to cast this equation in the following more suggestive form which shows the coupling between sG and the aFR currents

$$\begin{aligned} i\left( \partial _{\eta }J_0^3+\partial _{\xi }J_1^3\right) +u_{\eta \xi }+\beta ^2\sin u+\frac{\beta }{2}\left( (J_0^- -J_1^+)e^{-i\frac{u}{2}}+a(J_1^--J_0^+)e^{i\frac{u}{2}}\right) =0.\nonumber \\ \end{aligned}$$
(7.26)

We can derive the Lagrangian producing (7.21)–(7.24) and (7.26) by adding the sine-Gordon Lagrangian (5.26) (with appropriate inclusion of \(\beta \))

$$\begin{aligned} \mathscr {L}_\textrm{sG}=\frac{\beta }{4}C_1^0 \partial _{t^\infty _0}u+\frac{\beta }{4}B_1^\infty \partial _{t^0_0}u-\frac{\beta ^2}{4}\left( e^{iu}+e^{-iu}-C_1^0B_1^\infty \right) , \end{aligned}$$
(7.27)

the anisotropic FR Lagrangian (6.37)

$$\begin{aligned} \mathscr {L}_\textrm{aFR}= {{\,\textrm{Tr}\,}}\left( (\phi _0^{a})^{-1}\partial _{t_{-1}^b}\phi _0^{a}A-(\phi _0^{b})^{-1}\partial _{t_{-1}^a}\phi _0^{b}B-\frac{b}{b-a}J_0J_1+J_0J_1^<\right) , \end{aligned}$$
(7.28)

and the following two mixed elementary Lagrangians

$$\begin{aligned} \mathscr {L}_{-10}^{a,\infty }=\frac{\beta }{4}B_1^\infty \partial _{t_{-1}^a}u+{{\,\textrm{Tr}\,}}\left( (\phi _0^{a})^{-1}\partial _{t^\infty _0}\phi _0^{a}A -\beta J_0V_{\textrm{sG}}(a)\right) , \end{aligned}$$
(7.29)
$$\begin{aligned} \mathscr {L}_{0-1}^{0,b}=\frac{\beta }{4}C_1^0 \partial _{t_{-1}^b}u-{{\,\textrm{Tr}\,}}\left( (\phi _0^{b})^{-1}\partial _{t^0_0}\phi _0^{b}B -\beta J_1U_{\textrm{sG}}(b)\right) . \end{aligned}$$
(7.30)

We obtain

$$\begin{aligned}{} & {} \mathscr {L}_{\mathrm{sG-aFR}}=\frac{\beta }{4}C_1^0 \partial _{\eta }u+\frac{\beta }{4}B_1^\infty \partial _{\xi }u+{{\,\textrm{Tr}\,}}\left( (\phi _0^{a})^{-1}\partial _{\eta }\phi _0^{a}A-(\phi _0^{b})^{-1}\partial _{\xi }\phi _0^{b}B\right) \nonumber \\{} & {} \quad -\frac{\beta ^2}{4}\left( e^{iu}+e^{-iu}-C_1^0B_1^\infty \right) -{{\,\textrm{Tr}\,}}\left( \frac{b}{b-a}J_0J_1-J_0J_1^< + \beta J_0V_{\textrm{sG}}(a) -\beta J_1U_{\textrm{sG}}(b)\right) . \end{aligned}$$
(7.31)

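The variations with respect to \(C_1^0\), \(B_1^\infty \), \(\phi _0^{a}\) and \(\phi _0^{b}\) give (7.21), (7.22), (7.23) and (7.24) respectively. The variation with respect to u gives (7.26) once (7.21)–(7.22) are used to eliminate \(C_1^0\) and \(B_1^\infty \); on solutions of (7.21)–(7.24), this is equivalent to (7.25).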
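As an illustration of the simplest of these checks, here is a sketch of the variation with respect to \(C_1^0\). It assumes the component convention \(J_1=J_1^3\sigma _3+J_1^+\sigma _++J_1^-\sigma _-\), so that \({{\,\textrm{Tr}\,}}(J_1\sigma _3)=2J_1^3\); this convention is not restated in the present section. From (7.19), the part of \(U_{\textrm{sG}}(b)\) depending on \(C_1^0\) is \(\frac{i}{4}C_1^0\sigma _3\), while \(V_{\textrm{sG}}(a)\) does not involve \(C_1^0\). Hence

$$\begin{aligned} \beta {{\,\textrm{Tr}\,}}\left( J_1U_{\textrm{sG}}(b)\right) \Big |_{C_1^0}&=\frac{i\beta }{2}C_1^0J_1^3,\\ \frac{\partial \mathscr {L}_{\mathrm{sG-aFR}}}{\partial C_1^0}&=\frac{\beta }{4}\partial _\eta u+\frac{\beta ^2}{4}B_1^\infty +\frac{i\beta }{2}J_1^3=0. \end{aligned}$$

Multiplying by \(4/\beta \) gives \(u_\eta +\beta B_1^\infty =-2iJ_1^3\), which is (7.21). The variation with respect to \(B_1^\infty \) is entirely analogous and yields (7.22).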

8 Discussion and Conclusion

By introducing a certain generating Lagrangian multiform, we were able to relate two important but so far separate aspects of integrable systems: the well-established theory of the classical r-matrix and the comparatively much newer framework of Lagrangian multiforms. In doing so, we bring closer together the vast amount of results in the Hamiltonian approach to integrable systems and the Lagrangian approach in the form advocated in the seminal paper [LN]. A rich byproduct of this effort is that the generating Lagrangian multiform and its accompanying generating Lax equation and zero curvature equation provide a systematic framework to construct integrable hierarchies of field theories, both in terms of Lagrangians and of Lax matrices. This was illustrated at length on many examples, both known and new. As already emphasised in the introduction, this versatility to accommodate a very large class of examples stems from the fact that we work in the adèlic framework.

The most immediate open question that comes to mind relates to the restrictions imposed on the classical r-matrix appearing in the generating Lagrangian multiform. Certain aspects of our construction appear to remain true under only the assumption that r is a solution of the CYBE (1.1). In particular, the restriction to the rational or trigonometric case that we studied in detail only played a role in the explicit construction of the projectors associated to the decomposition of the Lie algebra of \(\mathfrak {g}\)-valued adèles. It is easy to imagine that one could use a more general skew-symmetric r-matrix provided similar technicalities can be dealt with. Specifically, given a solution r of the CYBE, one would like to establish results along the following lines:

- Define a pair of linear operators on the Lie algebra of \(\mathfrak {g}\)-valued adèles \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) as

$$\begin{aligned} \pi _\pm : \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}) \longrightarrow \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}), \quad \varvec{X}(\varvec{\lambda }) \longmapsto \big ( (\pi _\pm X)_a(\lambda _a) \big )_{a \in \mathbb {C}P^1} \end{aligned}$$
(8.1)

with formulas similar to e.g. (2.21).

- Show that the linear maps \(\pi _\pm \) so defined are projection operators onto complementary subspaces of \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\), i.e. \((\pi _\pm )^2 = \pi _\pm \) and \(\pi _+ + \pi _- = id \).

- Show that the images \(\pi _\pm \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) of the projection operators \(\pi _\pm \) are both Lie subalgebras of \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) and are isotropic with respect to the bilinear form analogous to that defined in (2.2).

If one could accomplish this then it would follow that one would have a direct sum decomposition of \(\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) into complementary Lagrangian Lie subalgebras

$$\begin{aligned} \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}) = \pi _+ \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}) \dotplus \pi _- \varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g}). \end{aligned}$$

The corresponding r-matrix would be defined as \(r {:}{=}\pi _+ - \pi _- \in {{\,\textrm{End}\,}}\varvec{\mathcal {A}}_{\varvec{\lambda }}(\mathfrak {g})\) and would presumably have a kernel of the form \(\big ( (\iota _{\mu _b} \iota _{\lambda _a} + \iota _{\lambda _a} \iota _{\mu _b}) r_{12}(\lambda , \mu ) \big )_{a, b \in \mathbb {C}P^1}\). We could then insert this kernel into our generating Lagrangian multiform and construct integrable hierarchies by the same method as we have done. One candidate to see if such a programme can be realised is the elliptic r-matrix [Sk, Be].

The other obvious restriction of the present work is the condition that r be skew-symmetric. In fact, we wrote the CYBE (1.1) in its non-skew-symmetric form on purpose. Once again, some of our results appear to hold without this assumption. This is the case for the commutativity of the vector fields (3.9) as can be seen from the proof of Proposition 3.4. The extension of our construction to the non-skew-symmetric case, hence to non-ultralocal field theories, appears rather challenging as the current form of our generating Lagrangian multiform simply does not allow for such an extension. We are currently investigating this exciting issue which promises to have connections with the framework of classical affine Gaudin models, developed in [V1, DLMV2], that provides a unifying formalism for constructing and studying a very broad class of non-ultralocal classical integrable field theories. A first step in that direction was achieved recently in [CDS], where it was shown how to incorporate the non-skew-symmetric case naturally in the context of finite-dimensional integrable hierarchies.

It was shown in [V2] that classical affine Gaudin models are closely related to 4d mixed topological-holomorphic Chern–Simons theory introduced and studied in [Cos1, Cos2, CWY1, CWY2, CY], see also [DLMV3, BSV, LV]. In fact, 4d Chern–Simons theory also naturally provides a framework for constructing a very broad class of ultralocal integrable field theories (see also [Zo] for a description of ultralocal integrable field theories as affine Gaudin models). In this context, it was shown in [CSV], see also [FSY], that the rational Zakharov–Mikhailov models, one of the main classes of examples that we reproduced here, could be obtained from 4d Chern–Simons theory with certain line defects. However, the construction of [CSV] is, by design, able to produce only the action of a single Zakharov–Mikhailov model, as opposed to its entire hierarchy, starting from that of 4d Chern–Simons theory. It seems natural to wonder if such a construction, and in fact the whole 4d Chern–Simons approach, could be adapted to our generating Lagrangian multiform framework in order to derive entire integrable hierarchies and not just single models from this point of view.

In the simplest case of the AKNS hierarchy, the concept of Hamiltonian multiform, initially introduced in [CS2], was illustrated in [CS3]. The main idea is that it is possible to apply a version of the covariant Legendre transformation to an entire Lagrangian multiform to obtain the Hamiltonian analog of a multiform. Each coefficient of the resulting Hamiltonian multiform can be seen as a covariant Hamiltonian for the field theory described by the associated Lagrangian coefficient in the Lagrangian multiform. Important accompanying objects are the symplectic multiform and the multitime Poisson bracket which generalise to an entire hierarchy the concepts of multisymplectic form and of covariant Poisson bracket respectively. The latter are essential ingredients of the framework generally called covariant Hamiltonian field theory, see e.g. [Gi] and references therein for a very useful recent review of the many facets of this rich topic. We believe it is important to try and obtain the generating Hamiltonian multiform and related structures corresponding to our generating Lagrangian multiform. Indeed, historically, one of the driving motivations of the above mentioned covariant Hamiltonian approach to field theory has been to allow for a (canonical) quantization of field theories that removes from the start the breaking of covariance associated to the standard Hamiltonian approach. The idea of covariant Hamiltonian field theory is to use a Poisson bracket that does not suffer from the lack of covariance of the traditional Poisson bracket: a covariant Poisson bracket. The results of [CS2, CS3] show that one can extend this idea to a whole integrable hierarchy and that the classical r-matrix plays a key role in this “covariant” context, see also [CS1, CSV]. 
The hope is that this could allow one to use the nice features of integrability encoded in the passage from the classical r-matrix to the quantum R-matrix, to fully implement the idea of covariant canonical quantization for such field theories.

Finally, our work also opens the possibility for quantization using another route: combining Feynman’s path integral ideas with a Lagrangian multiform, thus taking advantage again of integrability features now encoded in a Lagrangian object entering the path integral. This tantalising idea was first put forward and explored in [KN] but is still very much in its infancy.