1 Introduction

Modern philosophers take seriously the ontological status of fields. But what they usually have in mind are relatively concrete entities, such as the electric and magnetic fields, and not elusive gauge fields, such as the electromagnetic potential. How then, to classify “gauge” degrees of freedom? Do these have an ontological significance similar to electric and magnetic fields, or are they only a notational convenience, born of a redundancy in our representations of the world? In the words of John Earman, are gauge degrees of freedom only “redundant descriptive fluff” [1]?

The eliminativist view of gauge degrees of freedom advocates not only that gauge degrees of freedom are redundant, but that they are also eliminable. The most developed form of eliminativism proposes a different, non-local gauge-invariant basis to describe our physical quantities. Non-local, yes, but controllably so: this is called the holonomy-basis.Footnote 1 Whether one can really write down a theory—an action functional or a Hamiltonian—in terms of holonomies (or Wilson loops) is a challenging open question, to say the least, and so the status of holonomies as fundamental ontological building blocks is anything but secure. But we will not pursue this formidable challenge in this paper.

Likewise, the overall status of gauge degrees of freedom is too large a topic to be reviewed here. We plan only to analyze a recent argument against the eliminativist view, and show that it is founded on an incorrect mathematical treatment—and it is therefore not tenable in its current form. In the rest of this section, we introduce the argument and give a prospectus for the paper.

1.1 The \(\theta _{\text {YM}}\)-Term

In a recent paper, [3] engages with the details of the eliminativist program in the context of QCD. Dougherty’s first aim is to convince the reader that a \(\theta _{\text {\tiny YM}}\)-term in the QCD Lagrangian is mandatory.

In brief, the argument is as follows: the \(\theta _{\text {\tiny YM}}\)-term is necessary to account for certain experimental facts. To be more specific: the smallness of the masses of the up and down quarks gives rise to a chiral symmetry, whose effects (a parity doubling of the hadron spectrum, cf. [4, Sec. 19.10]) are not observed in experiments. This means that this chiral symmetry must be broken somehow. But the spontaneous breaking of this symmetry would generate Goldstone bosons, which are also not observed. Therefore, one must be able to break chiral symmetry without creating Goldstone bosons.

A solution is to have the breaking be effected through an anomaly.Footnote 2 Namely, under chiral transformations (also called a global U\((1)_A\) symmetry), it turns out that the path-integral measure for quark fields fails to be invariant: under that transformation the measure acquires a phase. Specifically, for a fermion field of flavor f, the chiral symmetry acts by a shift \(\psi _f\mapsto \exp (i\gamma _5\alpha _f)\psi _f\) (with \(\gamma _5\) the fifth Dirac gamma-matrix), whereas the fermion path-integral transforms asFootnote 3

$$\begin{aligned} \mathcal {D}\psi \mathcal {D}\overline{\psi }\mapsto \exp \left( i2(\theta _{\text {YM}}\text {-term})\sum _f\alpha _f \right) \mathcal {D}\psi \mathcal {D}\overline{\psi }, \end{aligned}$$
(1.1)

where

$$\begin{aligned} \theta _{\text {YM}}\text {-term} = \frac{1}{8\pi ^2} \int \text {tr}( F\wedge F). \end{aligned}$$
(1.2)

Therefore, according to this argument, mathematical consistency and experimental evidence—the lack of both the relevant Goldstone bosons and of the parity doubling of the hadron spectrum—together would provide support for the physical significance of the \( \theta _{\text {YM}}\)-term as arising from a chiral anomaly. It is here important to stress the role fermions play in making the \(\theta _{\text {YM}}\)-term inescapable.

So far, so good. Admittedly, this is not the end of the story: such a term would be CP-violating and thus gives rise to other questions of observability. However, the relation between CP-violation and the \(\theta _{\text {YM}}\text {-term}\) is not directly relevant to the central points of this paper, which is why we will avoid discussing it.Footnote 4

Having sketched the broader context for the discussion, we now very briefly embed within it Dougherty’s criticism of the holonomy formalism. We should state from the outset that our intention in this paper is only to set straight a specific misunderstanding of this criticism. The main target of our criticism is the mistaken belief that, due to how the \( \theta _{\text {YM}}\)-term transforms under gauge transformations, it cannot be accounted for within an eliminativist interpretation, such as the holonomy formalism. In his words: “This eliminative interpretation of gauge is at odds with the our current best theory of high-energy physics.” Or, a bit later, in more detail: “In this paper I defend the physical significance of the distinction between large and small gauge transformations against the eliminative interpretation of gauge.” [3, p.1]. We contend that: (1) Dougherty’s ‘large gauge transformations’ are not gauge transformations in the first place (a fact that, as we will prove, goes beyond a terminological dispute), and, more importantly, (2) the objective properties Dougherty (mis)attributes to ‘large gauge transformations’ are in fact captured in an eliminativist formalism such as the holonomy one, which only eliminates the bona-fide gauge transformations. So, first, it is apt to get clear on the distinction between ‘large’ and ‘small’ gauge transformations, and on how, if at all, such a distinction could be pressed into service against the holonomy-based, eliminativist interpretation.

1.2 Dougherty’s Criticism

In his defense of eliminativism, [2] cites [8]’s use of the holonomy formalism in attempting to resolve the \(U(1)_A\) puzzle without the introduction of a \(\theta _{\text {YM}}\)-term (we will briefly describe this puzzle in Sect. 3.3).Footnote 5 According to [3] (cf. p.1, 7, 8, 16) the \(\theta _{\text {YM}}\)-term is only gauge-invariant under gauge transformations that have a particular behaviour at infinity (or at the relevant boundaries); the remaining transformations, called ‘large gauge transformations’, do not, according to Dougherty, leave the \(\theta _{\text {YM}}\)-term invariant. In his criticism of the eliminativist view, [3, p.1] writes passages such as (italic ours):

That is, a large gauge transformation relates representatives of different physical states. Mathematical differences between these representatives can reflect a physical difference, signaling the existence of some quantities and possibilities that cannot exist according to the received [eliminativist] philosophical position.

Or, later on [3, p.1]:

The Yang-Mills \([\theta \text {-}]\)vacuum term is not preserved by all gauge transformations. If the eliminative view of gauge transformations is right, this means that the Yang-Mills vacuum term is physically meaningless. If gauge transformations are redundancies then mathematical differences between gauge equivalent configurations can’t reflect physical differences. So the value of the Yang-Mills vacuum term can’t represent any physical fact.

Or, again [3, p.9]:

If we reject the size distinction [between small and large gauge transformations] and demand that gauge transformations on the boundary be treated just as gauge transformations elsewhere then this integral [that gives rise to the \(\theta \)-term] is ill-defined. The vacuum Yang-Mills term must therefore be excluded.

Dougherty’s claim then is that the non-eliminativist would be comfortable in separating the wheat from the chaff, for they could say: “some ‘gauge transformations’ relate distinct physical possibilities while others don’t. Thankfully, I, the non-eliminativist, haven’t eliminated any of them, so I can still tell the two kinds apart!” This strategy, it is claimed, is not available to Healey’s preferred holonomy formalism. The claim is that, since Healey’s eliminativism does not license a distinction between different types of gauge transformations, no restriction to one type of gauge transformation is allowed. In particular, one cannot keep just those transformations that would guarantee invariance of the \(\theta _{\text {YM}}\)-term. Therefore Healey would either have to equate what should be physically distinct states—those which, according to Dougherty, correspond to differences due to ‘large gauge transformations’—or be obliged to set \(\theta _{\text {YM}}\) to zero and thereby fall foul of the fact that at least allowing for a non-zero \(\theta _{\text {\tiny YM}}\)-term is a theoretical requirement.

As we hope to make clear, we disagree with Dougherty’s argument and conclusions. In particular, we disagree that “The Yang-Mills \([\theta \text {-}]\)vacuum term is not preserved by all gauge transformations.” It is preserved by all gauge transformations, as long as one is attentive to the strict meaning of these transformations. Our criticism could be chalked up to a terminological dispute, one of little substance to the debate about eliminativism. The reason the criticism matters is that, apart from trivial issues of terminology, holonomies only eliminate the stricter kind of ‘gauge transformations’ and are perfectly well able to register the effects of what Dougherty calls “large gauge transformations”. In particular, the \(\theta _{\text {\tiny YM}}\)-term contribution to the Yang-Mills action is gauge invariant and can be expressed in terms of holonomies. Indeed, lattice QCD, a formalism that employs holonomies (or rather, Wilson loops) as its basic variables, includes \(\theta \)-terms without any hangups (see e.g. [9] and references therein).

1.3 Our Criticism of Dougherty’s Criticism

Dougherty’s argument that the \(\theta _{\text {YM}}\)-term is only gauge-invariant under gauge transformations that have a particular behaviour at the boundaries is incorrect. For the \(\theta _{\text {YM}}\)-term is manifestly gauge-invariant under the action of all gauge transformations.

Nonetheless, behind Dougherty’s argument, there is a subtle and tempting reason to erroneously assume that the \(\theta _{\text {YM}}\)-term is gauge-variant. For, as Dougherty correctly states, the \(\theta _{\text {YM}}\)-term can also be expressed as a pure boundary contribution to the action functional over a topologically trivial domain M (i.e. one diffeomorphic to a 4-disk). And it is well-known that this boundary contribution (over the 3-sphere), which takes the form of a Chern-Simons boundary integral, can acquire different values even on configurations that have vanishing curvature, and are often thus called ‘pure-gauge’ (however, see the comment below Equation (3.1) for why this practice is misleading). The values of such boundary contributions can differ by an integer multiple of \(2\pi \). So, it would be natural to say that these values have some sort of gauge-dependence, i.e. that they change under “large gauge transformations”; this putative change is the one Dougherty wrongly appeals to in his argument.

The mistake, to be explicated below, is partly due to a terminological confusion: it lies in the construal of the term “large gauge transformation”, which is often mixed up with what are called “transition functions.” Although transition functions share some features with gauge transformations, they are fundamentally different objects which encode gauge invariant information. It is only under a particular type of change in the transition functions—changes which cannot be attributed to any gauge transformation—that the \(\theta _{\text {YM}}\)-term fails to be invariant. Thus, in order to clarify the mistake, it is helpful to first clarify the terminology. But to be clear: independent of the terminological dispute, the physical effects of transition functions can be captured by the holonomy formalism, and so are no obstacle to the eliminativist interpretation of gauge.

In practice, the term “large gauge transformation” has been used with two meanings:

(i) a smooth Lie-group-valued function on space or spacetimeFootnote 6 that is not connected to the group identity, i.e. not infinitesimally generated through exponentiation;

(ii) in the presence of asymptotic boundaries, a gauge transformation that does not asymptote to the identity.

In this article, we will exclusively use the term “large gauge transformation” in the sense attached to (i), i.e. not being connected to the identity.

To make his argument stick, Dougherty must use transformations that satisfy both (i) and (ii), i.e. transformations whose pullback to the boundary neither vanishes,Footnote 7 nor is connected to the identity. This is because only such transformations would change the value of the boundary Chern-Simons integral which re-expresses the \(\theta _\text {\tiny YM}\)-term.Footnote 8 However, the combination of (i) and (ii) required by Dougherty selects an empty set of functions. This is because there is no smoothFootnote 9 Lie-group-valued function over \(\mathbb R^4\) that tends at infinity to a function over \({\partial }\mathbb R^4 \cong S^3\) that is not connected to the identity. This fact is strictly necessary to ensure the mathematical consistency of the equality between the bulk-integral defining the \(\theta _\text {\tiny YM}\)-term (which is manifestly gauge-invariant under all gauge transformations) and its expression in terms of a Chern-Simons boundary integral (which is not invariant under large gauge transformations over \(S^3\)). The goal of the following sections is to explain these facts, dissolve the apparent tension between them, and explore their consequences in sufficient detail.

Here, we briefly sketch with equations an abstract argument showing that the necessary transformations cannot be smoothly extended into the bulk (all notation will be explained later). For now we consider the simplest possible caseFootnote 10: that of a gauge potential A that is pure gauge on a 4-disk \(D^4\). Thus, \(A=g^{-1}{\textrm{d}}g\) for some \(g: D^4\rightarrow G\), and its associated curvature vanishes, i.e. \(F(A)=F(g^{-1}{\textrm{d}}g) =0\), so that the \(\theta _{\text {\tiny YM}}\)-term, defined as \( \frac{1}{8\pi ^2} \int _{ D^4 } \text {tr}( F\wedge F)\), manifestly vanishes—in all gauges. Thus,

$$\begin{aligned} 0 = \frac{1}{8\pi ^2} \int _{ D^4} \text {tr}( F\wedge F) = \frac{1}{24\pi ^2} \oint _{ {\partial }D^4 = S^3} \text {tr}( g^{-1} {\textrm{d}}g \wedge g^{-1} {\textrm{d}}g \wedge g^{-1} {\textrm{d}}g ) = : \textsf{CS}_{ S^3}( h^{-1}{\textrm{d}}h) ,\end{aligned}$$
(1.3)

where the second equality will be shown in the next section; and \(\textsf{CS}_{ S^3}\) is by definition the Chern-Simons functional (on \(S^3\)), with \( h:S^3 \rightarrow G\) here set to \(h=g_{|S^3}\).

The puzzle arises thus: it is a mathematical fact that certain \(\tilde{h}:S^3 \rightarrow G\) yield a non-vanishing \(\textsf{CS}_{S^3}(\tilde{h}^{-1}{\textrm{d}}\tilde{h})\). So how could the above equation (1.3) avoid mathematical inconsistency? In brief: such \(\tilde{h}\)’s are not of the form \(h=g_{|S^3}\) for a smooth \(g: D^4\rightarrow G\). That is, the \(\tilde{h}\)’s that yield these different values are “homotopically” different: they cannot be smoothly deformed into each other, and are thus said to differ by a “large” transformation. At a bit more length, the answer to our question then is that, crucially, large transformations of this kind cannot be extended into the \(D^4 \) bulk smoothly and therefore cannot define “gauge transformations” of the bulk configuration \(A=0\); there are no such transformations whose restriction to the boundary fits in (i) above. In other words, the large boundary transformations required to yield a non-zero value of the Chern-Simons functional are not of the form \(h=g_{|S^3}\) for a smooth \(g:D^4\rightarrow G\); and such transformations would not have the usual properties of gauge transformations. That is: such \(\tilde{h}\) are not restrictions to the boundary of gauge transformations of any kind—which, as we know, leave the value of the \(\theta _{\text {YM}}\)-term invariant. In this understanding, [3, p. 8 and 9] is mistaken when he says that: “we find that the Yang-Mills vacuum term varies under some gauge transformations,” and hence concludes: “if we [...] demand that gauge transformations on the boundary be treated just as gauge transformations elsewhere then [the integral \(\int \text {tr}(F\wedge F)\)] is ill-defined [and] the vacuum Yang-Mills term must therefore be excluded.”
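
As a concrete illustration of such an \(\tilde{h}\), here is a sketch for the special case \(G={\textrm{SU}}(2)\); the specific map, the coordinates \(x^\mu \) on \(\mathbb R^4\), and the Pauli matrices \(\sigma _a\) are our illustrative choices and are not taken from [3]. Take \(\tilde{h}\) to be the standard identification of \(S^3\) with \({\textrm{SU}}(2)\). The 3-form \(\text {tr}(\tilde{h}^{-1}{\textrm{d}}\tilde{h})^{\wedge 3}\) is then left-invariant, hence a constant multiple of the round volume form, so it suffices to evaluate it at the point \(x=(0,0,0,1)\), where \(\tilde{h}^{-1}{\textrm{d}}\tilde{h}=i\sigma _a{\textrm{d}}x^a\):

$$\begin{aligned} \tilde{h}(x)&= x^4\,\mathbb {1} + i\,x^a\sigma _a, \qquad x\in S^3\subset \mathbb R^4,\quad a=1,2,3, \\ \text {tr}\big ( \tilde{h}^{-1}{\textrm{d}}\tilde{h}\wedge \tilde{h}^{-1}{\textrm{d}}\tilde{h}\wedge \tilde{h}^{-1}{\textrm{d}}\tilde{h}\big )\Big |_{x=(0,0,0,1)}&= i^3\,\text {tr}(\sigma _a\sigma _b\sigma _c)\,{\textrm{d}}x^a\wedge {\textrm{d}}x^b\wedge {\textrm{d}}x^c = 2\,\epsilon _{abc}\,{\textrm{d}}x^a\wedge {\textrm{d}}x^b\wedge {\textrm{d}}x^c = 12\,{\textrm{d}}x^1\wedge {\textrm{d}}x^2\wedge {\textrm{d}}x^3, \\ \frac{1}{24\pi ^2}\oint _{S^3}\text {tr}\big ( \tilde{h}^{-1}{\textrm{d}}\tilde{h}\wedge \tilde{h}^{-1}{\textrm{d}}\tilde{h}\wedge \tilde{h}^{-1}{\textrm{d}}\tilde{h}\big )&= \pm \frac{12\,\text {Vol}(S^3)}{24\pi ^2} = \pm \frac{12\cdot 2\pi ^2}{24\pi ^2} = \pm 1. \end{aligned}$$

The overall sign depends on the orientation chosen for \(S^3\). The non-zero result is consistent with the claim above: this \(\tilde{h}\) has unit winding number and so cannot be the boundary value of any smooth \(g:D^4\rightarrow G\).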

Homotopically different h’s on the right hand side of (1.3) represent physically different configurations also in the bulk, and indeed must be accompanied by different curvatures in the bulk. In due course, we will prove all of these statements, thus avoiding a mathematical contradiction: the gauge-invariance properties of the \(\theta _{\text {YM}}\)-term cannot depend on the way we decide to write it, viz. as a bulk or as a boundary term.

1.4 Prospectus

This paper will proceed as follows. In section 2, we will give a brief introduction to the main mathematical concepts at play. We briefly review Chern classes in Sect. 2.1. There, we will recall what these classes have to do with the \(\theta _{\text {YM}}\) term in QCD, and discuss their gauge and topological invariance. In the following subsection 2.2, we finally bring in what Dougherty calls “large gauge transformations,” that underpin his argument and show in particular that they have nothing to do with gauge-transformations: they are quantities that encode the topological properties of the underlying bundle, and are not related to choices of gauge. Such topological properties are represented by the particular gluing, or relations, between topologically trivial charts; and the winding numbers encode this ‘gluing’ information.

These conclusions are valid for manifolds without boundary. In Sect. 3 we describe how these conclusions can be extended to the context of manifolds with boundaries. Here it is important to distinguish the Euclidean signature setting from the Lorentzian one. In the former case, in section 3.1, we can complete asymptotic boundaries and fall back on the results for the boundary-less manifolds. For the latter case, in section 3.2, we get two disconnected boundaries, and thus (assuming the fields behave nicely at space-like infinity), the \(\theta _{\text {YM}}\) topological invariant becomes a difference of two Chern-Simons terms, or of two winding numbers. Nonetheless, the conclusions about invariance remain, but now they apply to the difference of winding numbers. In Sect. 4 we conclude: Sect. 4.1 summarizes the main points made in the paper. Finally, in Sect. 4.2, we briefly smoke a peace-pipe with Dougherty, by giving a criticism of our own of the eliminativism he targets. This criticism does take into account the role of the \(\theta _{\text {YM}}\)-term—but not its properties under gauge transformation, which, pace Dougherty, are compatible with eliminativism.

Since this article is an answer to [3], we follow him in accepting the same, intrinsically semiclassical, but standard, account of chiral symmetry breaking, cf. e.g. [4]. However, as we acknowledge in Appendix B, a fully non-perturbative account also exists [10, Ch. 3].

2 Topological Invariants and Fiber Bundles

In this Section, we will introduce aspects of the topology of fiber bundles, and proceed to assess gauge-invariance of the \(\theta _{\text {YM}}\)-term for closed manifolds in several different ways. In Sect. 2.1, we introduce the \(\theta _{\text {YM}}\)-term—also known as the Chern-number. Seen as a bulk, i.e. spacetime, integral, we show both gauge and topological invariance of the term. In Sect. 2.2 we relate this invariant to the appearance of ‘large’ transformations: they appear as Wess-Zumino integrals related to transition functions between charts. We also show that gauge transformations on a 4-dimensional disk-region cannot have non-trivial winding number at its boundary. This is entirely compatible with, and indeed required by, our considerations in this paper.

For completeness, in Appendix A we give a brief introduction to fibre bundles as the mathematical structure underpinning gauge theories. In this appendix we introduce the basic machinery: the connection-form (and its relational interpretation), and the relation between charts, gauge transformations and transition functions, crucial to our appraisal of the conclusions of [3].

Here is a summary of the concepts from Appendix A that we will require in what follows:

Summary of Appendix A. A gauge field configuration can be defined either:

(1) “abstractly,” by providing a bundle \(\pi :P\rightarrow M\) and an Ehresmann connection \(\omega \in \Omega ^1(P,\mathfrak g)\); or

(2) “in coordinates,” by providing an atlas of charts \(U_\alpha \subset M\), a set of sections \(\sigma _\alpha : U_\alpha \rightarrow P \), and compatibleFootnote 11 transition functions \(\mathfrak {t}_{\alpha \beta }:U_{\alpha \beta }\rightarrow G\) (these three ingredients define P), together with a choice of compatibleFootnote 12 gauge fields \(A_\alpha \in \Omega ^1(U_\alpha ,\mathfrak g)\) (this corresponds to the choice of \(\omega \)).

The coordinate description is redundant because it requires the introduction of auxiliary choices of sections, \(\sigma _\alpha \); different choices are related by “gauge transformations” of the \(A_\alpha \)’s and of the \(\mathfrak {t}_{\alpha \beta }\)’s. Therefore, gauge invariance requires all physical observables to depend on the choice of P and \(\omega \) only.Footnote 13

Crucially, transition functions and gauge transformations play entirely different roles. Gauge transformations act on the transition functions, but not vice-versa, and a gauge transformation’s domain of definition is the whole chart \(U_\alpha \), and not merely the overlaps \(U_{\alpha \beta }\) as is the case for the transition functions \(\mathfrak {t}_{\alpha \beta }\)’s. These technical differences reflect the fact that the \(g_{\alpha }\)’s and \(\mathfrak {t}_{\alpha \beta }\)’s play conceptually different roles. From the perspective of P, the gauge transformations \(g_\alpha \)’s encode the freedom of choosing a local section \(\sigma _\alpha \) (which is necessarily defined on the whole of \(U_\alpha \)). Conversely, the \(\mathfrak {t}_{\alpha \beta }\) encode—albeit somewhat redundantly—the way in which the charts are glued to one another, and thus the global structure of the bundle P.

2.1 The Chern-Number

For a closed 4-dimensional manifold M—that is, M compact and without boundary—the quantity (the notation will be explained in a moment, for now it is enough to notice that the integrand depends on A and is gauge-invariant)

$$\begin{aligned} \textsf{Ch}[P]:=\int _{{M}} \textsf{ch}_A \end{aligned}$$

is a topological invariant—not of M—but of the fibre bundle P over M. A connection-form \(\omega \) is defined over P and a collection of local gauge potentials \(A_\alpha \) is defined over an atlas of M, as above. Since \(\textsf{ch}_A\) is gauge-invariant, the integral can then be obtained through an appropriate partition of unity associated to the atlas. As a topological invariant of P, \(\textsf{Ch}[P]\) is not only completely gauge-invariant, but also independent of the choice of \(\omega \) over P. We call \(\textsf{Ch}[P]\) the (second) Chern-number of P.Footnote 14

If we write our physics in terms of gauge potentials, and allow them to live in different bundles, e.g. P and \(P'\), then the potentials A and \(A'\) might lead to different values of \(\textsf{Ch}[P]\). The question then is: how does A “know about” topological properties of P? And how can \(\textsf{Ch}[P]\) depend only on the topology of P and not on the detailed choices e.g. of A that go into its computation? This is the content of the Chern-Weil theorem (e.g. [11, Ch. 11.1]), that we briefly review below.

From now onwards, we will restrict to \(G = {\textrm{SU}}(N)\).

First, the Chern-number is computed as follows:

$$\begin{aligned} \textsf{Ch}[P]=\int _M \textsf{ch}_A=\frac{1}{8\pi ^2}\int _M \text {tr}(F\wedge F) \end{aligned}$$
(2.1)

where

$$\begin{aligned} \textsf{ch}_A:= \frac{1}{8\pi ^2}\text {tr}( F\wedge F). \end{aligned}$$
(2.2)

Of course, \(\textsf{Ch}[P]\) is nothing but the “\(\theta _{\text {\tiny YM}}\)-term” (cf. (1.2)). Or, more specifically: the \(\theta _{\text {YM}}\)-term in the QCD Lagrangian can be written using (2.1) as:

$$\begin{aligned} \mathcal {L}_\theta = \theta \,\textsf{Ch}[P] \end{aligned}$$
(2.3)

where \(\theta \) is just a real-valued coefficient. The integrand \(\textsf{ch}_A\) defines the second Chern-class of the bundle P. The second Chern-class is manifestly gauge-invariant, given the gauge transformation properties of F (A.8) and the cyclicity of the trace.Footnote 15 This means that on the overlaps \(U_{\alpha \beta }\), \( \textsf{ch}_{A_\alpha } = \textsf{ch}_{A_\beta }\), which is why no chart index appears in the equations above, and why the integral can be performed with no further complications.

This also immediately tells us that \(\textsf{Ch}[P]\) can at most depend on the choice of \(\omega \), and not of gauge (i.e. of sections). We are now ready to review the Chern-Weil theorem, which shows that \(\textsf{Ch}[P]\) is not only gauge-invariant but also independent of the choice of \(\omega \) on P; that is, it depends only on the topological properties of P.

A first hint of the ‘topological’ nature of \(\textsf{Ch}[P]\) comes from the observation that it does not change under a small arbitrary variation of A (i.e. the equations of motion of the action \(S[A] = \int \textsf{ch}_A\) are identically satisfied). This follows immediately from \(\delta F={\textrm{d}}_A\delta A\) and the Bianchi identity \({\textrm{d}}_A F=0\) where \({\textrm{d}}_A:={\textrm{d}}+[A, \cdot ]\) is the exterior gauge-covariant derivative (for the adjoint representation). But invariance can be proven also for finite, rather than infinitesimal, changes in connection. Consider two connections A and \(A'\), and now define \(\gamma := A' - A \in \Omega ^1(M)\) and a one-parameter family of connections \(A_s=A+s\gamma \), \(s\in [0,1]\), interpolating between A and \(A'\) (the space of connections is an affine space). Then, denoting the curvature of \(A_s\) as \(F_s\), one finds

$$\begin{aligned} \textsf{ch}_{A'}-\textsf{ch}_{A}&\equiv \frac{1}{8\pi ^2} \int ^1_0 \frac{{\textrm{d}}}{{\textrm{d}}s}\text {tr}( F_s\wedge F_s){\textrm{d}}s\nonumber \\&=\frac{1}{4\pi ^2}\int _0^1\text {tr}({\textrm{d}}_{A_s} \gamma \wedge F_s){\textrm{d}}s=\frac{1}{4\pi ^2} {\textrm{d}}\Big (\int _0^1\text {tr}(\gamma \wedge F_s){\textrm{d}}s \Big ). \end{aligned}$$
(2.4)

Thus the difference \( \textsf{ch}_{A'}-\textsf{ch}_{A}\) is an exact differential form and therefore vanishes when integrated over a closed manifold.Footnote 16 Since A and \(A'\) are arbitrary connections, it follows that \(\int _M \textsf{ch}_A\) over a closed manifold M does not depend on the choice of connection, i.e. that it is a topological invariant.
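
The intermediate steps behind (2.4) can be spelled out as follows. This is a sketch assuming the convention \(F={\textrm{d}}A+A\wedge A\) (the one compatible with (2.8)), and reading \([A,\gamma ]=A\wedge \gamma +\gamma \wedge A\) for the 1-form \(\gamma \):

$$\begin{aligned} \frac{{\textrm{d}}}{{\textrm{d}}s}F_s&= {\textrm{d}}\gamma + A_s\wedge \gamma + \gamma \wedge A_s = {\textrm{d}}_{A_s}\gamma , \\ \frac{{\textrm{d}}}{{\textrm{d}}s}\text {tr}(F_s\wedge F_s)&= \text {tr}({\textrm{d}}_{A_s}\gamma \wedge F_s)+\text {tr}(F_s\wedge {\textrm{d}}_{A_s}\gamma ) = 2\,\text {tr}({\textrm{d}}_{A_s}\gamma \wedge F_s), \\ {\textrm{d}}\,\text {tr}(\gamma \wedge F_s)&= \text {tr}({\textrm{d}}_{A_s}\gamma \wedge F_s)-\text {tr}(\gamma \wedge {\textrm{d}}_{A_s}F_s) = \text {tr}({\textrm{d}}_{A_s}\gamma \wedge F_s). \end{aligned}$$

The second line uses the fact that 2-forms commute under the trace, and the third uses the Bianchi identity \({\textrm{d}}_{A_s}F_s=0\). The factor of 2 in the second line is what turns the prefactor \(\frac{1}{8\pi ^2}\) into \(\frac{1}{4\pi ^2}\).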

Summary. The gauge invariance of \(\textsf{ch}_A\) tells us that \(\textsf{Ch}[P]\) depends at most on \(\omega \), and the Chern-Weil theorem tells us that \(\textsf{Ch}[P]\) does not depend on A (and therefore on \(\omega \)) at all. Therefore, \(\textsf{Ch}[P]\) can only reflect a (topological) property of the bundle P on which the connection is defined. A nontrivial, and extremely deep, fact is that the second Chern number of P is always an integer:

$$\begin{aligned} \textsf{Ch}[P] \in \mathbb Z. \end{aligned}$$
(2.5)

We conclude this Section with a simple remark. The discussion above clearly shows that the Chern number (2.1) (and thus the \(\theta _{\text {YM}}\)-term) is gauge-invariant under all possible gauge transformations. And, just to be clear, this even holds at the level of the integrands:

$$\begin{aligned} \textsf{ch}_{A^g}=\textsf{ch}_A \qquad \text{ for } \text{ all }\quad g=g(x). \end{aligned}$$
(2.6)

This fact follows simply from the transformation properties of F (A.8) and the (graded) cyclicity of the trace (for \(\lambda , \eta \) as p and q-forms, respectively)

$$\begin{aligned} \text {tr}(\lambda \wedge \eta )= (-1)^{pq}\text {tr}(\eta \wedge \lambda ). \end{aligned}$$
(2.7)

Therefore any non-gauge invariance of the \(\theta _{\text {\tiny YM}}\)-term is vetoed by this simple demonstration.
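
Explicitly, the one-line computation reads as follows. This is a sketch assuming that the curvature transforms by conjugation, \(F^g=g^{-1}Fg\), which is what we take (A.8) to state for \(A^g=g^{-1}Ag+g^{-1}{\textrm{d}}g\):

$$\begin{aligned} \text {tr}(F^g\wedge F^g) = \text {tr}(g^{-1}F g\wedge g^{-1}F g) = \text {tr}(g^{-1}\,F\wedge F\, g) = \text {tr}(F\wedge F). \end{aligned}$$

The last step is (2.7) with the 0-form g moved around the trace. No condition on the boundary behaviour or the homotopy class of g enters at any point.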

2.2 Transition Functions and Large Gauge Transformations

As we have just witnessed, the Chern-number, and hence the so-called \(\theta _{\text {YM}}\)-term (2.1), is completely gauge-invariant. Thus the inevitable question: whence Dougherty’s claims?

Here we will focus on his claim that “The Yang-Mills \([\theta \text {-}]\)vacuum term is not preserved by all gauge transformations,” as discussed in Sect. 1.2 (where we include the full quote). We will now argue that one way Dougherty might have arrived at this conclusion, ignoring the previous simple argument for the gauge invariance of the \(\theta _{\text {YM}}\)-term, is through an incautious invocation of boundaries.

Before we get to boundaries of the entire Universe, in Sect. 3, let us revisit the computation of the Chern-number under a new guise, by breaking up the manifold into charts and therefore introducing internal boundaries. Over each chart we can identify the gauge potential with a \({\mathfrak {g}}\)-valued differential 1-form A. However, this identification does not hold globally as emphasized in our discussion of transition functions (cf. equation (A.3)): one should be careful when drawing global conclusions from the following local statements.

First, we recall that the Chern density (2.2), i.e. \(\textsf{ch}_A:= \frac{1}{8\pi ^2}\text {tr}( F\wedge F)\), is a top-form on a 4-dimensional manifold and it is therefore closed.Footnote 17 Hence, the Poincaré lemma implies that the restriction of \(\textsf{ch}_A\) to a contractible space is exact, i.e. can be written as the differential of a 3-form. Indeed, on each chart \(U_\alpha \)—which is a contractible space where the connection A can be identified with a \({\mathfrak {g}}\)-valued 1-form \(A_\alpha \) (we will omit the chart-label \(\alpha \))—one has the following crucial identityFootnote 18 involving the Chern-Simons 3-form \(\textsf{cs}_A\)Footnote 19

$$\begin{aligned} \textsf{ch}_A = {\textrm{d}}\textsf{cs}_A \qquad \text {where}\qquad \textsf{cs}_A := \frac{1}{8\pi ^2} \text {tr}( A \wedge {\textrm{d}}A+ \tfrac{2}{3} A\wedge A \wedge A) . \end{aligned}$$
(2.8)

There are two subtleties lurking behind this identity: one is the fact that it holds only chart-wise, and the second is that the Chern-Simons form is not gauge-invariant, since:

$$\begin{aligned} \textsf{cs}_{A^g}-\textsf{cs}_{A} = \textsf{wz}_g +\frac{1}{16\pi ^2} {\textrm{d}}\;\text {tr}( {\textrm{d}}g g^{-1} \wedge A) \end{aligned}$$
(2.9)

where the Wess-Zumino term \(\textsf{wz}_g\) is just the Chern-Simons form evaluated on the flat connection \(g^{-1}{\textrm{d}}g\):

$$\begin{aligned} \textsf{wz}_g := \textsf{cs}_{g^{-1}{\textrm{d}}g}= - \frac{1}{24\pi ^2}\text {tr}(g^{-1}{\textrm{d}}g\wedge g^{-1}{\textrm{d}}g\wedge g^{-1}{\textrm{d}}g). \end{aligned}$$
(2.10)
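
The coefficient in (2.10) can be checked directly from the definition of \(\textsf{cs}_A\) in (2.8). A short sketch, writing \(A=g^{-1}{\textrm{d}}g\) for the flat connection:

$$\begin{aligned} {\textrm{d}}A&= {\textrm{d}}(g^{-1})\wedge {\textrm{d}}g = -g^{-1}{\textrm{d}}g\wedge g^{-1}{\textrm{d}}g = -A\wedge A, \\ \textsf{cs}_{A}&= \frac{1}{8\pi ^2}\text {tr}\big ( -A\wedge A\wedge A+\tfrac{2}{3}A\wedge A\wedge A\big ) = -\frac{1}{24\pi ^2}\text {tr}(A\wedge A\wedge A). \end{aligned}$$

The first line is just the statement that \(g^{-1}{\textrm{d}}g\) has vanishing curvature, \(F={\textrm{d}}A+A\wedge A=0\).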

In particle physics lingo, equations (2.6), (2.8), and (2.9) together say that “while the topological charge [\(\textsf{ch}_A\)] is gauge-invariant, the topological current [\(\textsf{cs}_A\)] is not.” [12, p. 31].

However, as demanded by mathematical consistency between the invariance of \(\textsf{ch}\) and its relation to \(\textsf{cs}\) in the first equation of (2.8), both sides of (2.9) must be closed 3-forms, and therefore \(\textsf{wz}_g\) is necessarily a closed 3-form, i.e.Footnote 20

$$\begin{aligned} {\textrm{d}}\textsf{wz}_g \equiv 0. \end{aligned}$$
(2.11)
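The closure (2.11) can also be verified directly from (2.10), without appeal to (2.8) and (2.9). A sketch, writing \(\theta := g^{-1}{\textrm{d}}g\) (a shorthand introduced here), and using \({\textrm{d}}\theta = -\theta \wedge \theta \) together with the graded cyclicity of the trace, \(\text {tr}(\alpha \wedge \beta ) = (-1)^{pq}\, \text {tr}(\beta \wedge \alpha )\) for a matrix-valued p-form \(\alpha \) and q-form \(\beta \):

$$\begin{aligned} {\textrm{d}}\, \text {tr}(\theta \wedge \theta \wedge \theta )&= \text {tr}({\textrm{d}}\theta \wedge \theta \wedge \theta ) - \text {tr}(\theta \wedge {\textrm{d}}\theta \wedge \theta ) + \text {tr}(\theta \wedge \theta \wedge {\textrm{d}}\theta ) = -\text {tr}(\theta \wedge \theta \wedge \theta \wedge \theta ), \\ \text {tr}(\theta \wedge \theta ^{\wedge 3})&= (-1)^{1\cdot 3}\, \text {tr}(\theta ^{\wedge 3}\wedge \theta ) = -\text {tr}(\theta ^{\wedge 4}) \quad \Longrightarrow \quad \text {tr}(\theta ^{\wedge 4}) = 0 . \end{aligned}$$

Hence \({\textrm{d}}\, \text {tr}(\theta \wedge \theta \wedge \theta ) = 0\), which is (2.11) up to the constant prefactor in (2.10).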

Therefore, the gauge invariance of \(\textsf{ch}_A\) is not affected, even if we write it in terms of the gauge-variant functional \(\textsf{cs}\):

$$\begin{aligned} \textsf{ch}_{A^g} = {\textrm{d}}\textsf{cs}_{A^g} = {\textrm{d}}( \textsf{cs}_A + \textsf{wz}_g + {\textrm{d}}\; \frac{1}{16\pi ^2}\text {tr}( {\textrm{d}}g g^{-1} \wedge A) ) = {\textrm{d}}\textsf{cs}_A = \textsf{ch}_{A}. \end{aligned}$$
(2.12)

In particular, taking \(A=0\) and integrating this equation on a manifold with boundary, we see that the boundary integral of the Wess-Zumino term associated to a gauge transformation in the bulk necessarily vanishes. Equation (2.12) is a first important check, which we will now corroborate with a different calculation.
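Spelled out, the \(A=0\) case reads as follows. This is a sketch for a region \(U\) with boundary, contained in a single chart, and a gauge transformation g defined smoothly on all of \(U\) (the symbol \(U\) is introduced here for illustration). It uses that \(A=0\) has vanishing curvature and that, by (2.9) at \(A=0\), \(\textsf{cs}_{0^g} = \textsf{wz}_g\):

$$\begin{aligned} 0 = \int _{U} \textsf{ch}_{0} = \int _{U} \textsf{ch}_{0^g} = \oint _{{\partial }U} \textsf{cs}_{0^g} = \oint _{{\partial }U} \textsf{wz}_g . \end{aligned}$$

The second equality is (2.12), the third is (2.8) combined with Stokes' theorem, and the last is (2.9) evaluated at \(A=0\).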

This different computation resolves possible confusion having to do with a particular way of expressing \(\textsf{Ch}[P]\). Namely, there is still one manner of computing \(\textsf{Ch}[P]\) chart by chart, using (2.8), which may confusingly appear gauge-variant. We will now set up the puzzle and then dissolve it. Instead of dealing with these issues on a very general basis, we will specialize our discussion to a more concrete example.

Consider the closed manifold \( M = S^4\) covered by two charts, isomorphic to 4-dimensional disks, \(U_{1}, U_2=D^4\), that overlap on a “transition belt” around the equator, \(U_{12}=S^3\times [-1,1]\).

We know that at the interface, by (A.3), \(A_1=A^{\mathfrak {t}}_2\), \(\mathfrak {t}\equiv \mathfrak {t}_{21}\). Denoting the subsets of the domains of the charts that lie above/below the equator, respectively, by \(\tilde{U}_1 = U_1 \setminus (S^3\times [-1,0])\) and \(\tilde{U}_2 = U_2 \setminus (S^3\times [0,1])\) (notice that \({\partial }\tilde{U}_1 = - {\partial }\tilde{U}_2 = S^3 \times \{0\} \simeq S^3\subset U_{12}\)), we have

$$\begin{aligned} \textsf{Ch}[P]&= \int _{\tilde{U}_1} \textsf{ch}_{A_1} + \int _{\tilde{U}_2} \textsf{ch}_{A_2} \nonumber \\&= \oint _{{\partial }\tilde{U}_1}( \textsf{cs}_{A_1} - \textsf{cs}_{A_2})= \oint _{{\partial }\tilde{U}_1}( \textsf{cs}_{A^{\mathfrak {t}}_2} - \textsf{cs}_{A_2}) = \oint _{{\partial }\tilde{U}_1} \textsf{wz}_{\mathfrak {t}} \end{aligned}$$
(2.13)

where we used (2.9) and (2.10) (with \(\mathfrak {t}:U_{12}\rightarrow G\) replacing g in the latter equation).Footnote 21
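For readers who want the intermediate steps of (2.13), a sketch: the first line applies (2.8) and Stokes' theorem on each hemisphere, with the relative sign coming from \({\partial }\tilde{U}_2 = -{\partial }\tilde{U}_1\); the second line applies (2.9) with \(A=A_2\) and \(g=\mathfrak {t}\), and drops the exact term because \({\partial }\tilde{U}_1\simeq S^3\) has no boundary.

$$\begin{aligned} \int _{\tilde{U}_1} \textsf{ch}_{A_1} + \int _{\tilde{U}_2} \textsf{ch}_{A_2}&= \oint _{{\partial }\tilde{U}_1} \textsf{cs}_{A_1} + \oint _{{\partial }\tilde{U}_2} \textsf{cs}_{A_2} = \oint _{{\partial }\tilde{U}_1}( \textsf{cs}_{A_1} - \textsf{cs}_{A_2}), \\ \oint _{{\partial }\tilde{U}_1}( \textsf{cs}_{A^{\mathfrak {t}}_2} - \textsf{cs}_{A_2})&= \oint _{{\partial }\tilde{U}_1} \textsf{wz}_{\mathfrak {t}} + \frac{1}{16\pi ^2}\oint _{{\partial }\tilde{U}_1} {\textrm{d}}\; \text {tr}( {\textrm{d}}\mathfrak {t}\, \mathfrak {t}^{-1} \wedge A_2) = \oint _{{\partial }\tilde{U}_1} \textsf{wz}_{\mathfrak {t}} . \end{aligned}$$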

Thus we see that, setting \({{\partial }\tilde{U}_1}\simeq S^3\) and denoting \(\textsf{WZ}_{S^3}(g) = \int _{S^3} \textsf{wz}_g\),

$$\begin{aligned} \mathbb Z \ni \textsf{Ch}[P] = \textsf{WZ}_{S^3}(\mathfrak {t}). \end{aligned}$$
(2.14)

This equation is of crucial importance for us. We have not used gauge transformations, and yet, something that “looks like” a gauge-transformation, namely, a transition function, as in (A.3), has appeared in the computation. Now we will verify that the Wess-Zumino invariant related to \(\mathfrak {t}\) cannot change by applying a gauge transformation.

First of all, as discussed in Sect. A, \(\mathfrak {t}\) encodes a topological property of the bundle. It is therefore not to be interpreted as a gauge transformation, but as part of the definition of P. But things are subtle because, as we summarized in the last paragraph of Sect. A, \(\mathfrak {t}\) participates in the definition of P in a way that depends on the choice of gauge, i.e. of sections \(\sigma _\alpha \). As a consequence, under a change in the choice of sections, the transition functions transform according to (A.7):

$$\begin{aligned} \mathfrak {t}\mapsto g_{2}^{-1} \mathfrak {t}g_1. \end{aligned}$$
(2.15)

Thus, the question arises: why does the following equality,

$$\begin{aligned} \textsf{WZ}_{S^3}(\mathfrak {t}) = \textsf{WZ}_{S^3}(g_2^{-1} \mathfrak {t}g_1), \end{aligned}$$
(2.16)

hold?

From a strictly three-dimensional, or boundary, perspective there is no reason why this should be the case. In particular, we could always choose \(g_1 = e \) (the identity of G) and \(g_2\) such that \((g_2)_{|U_{12}}= \mathfrak {t}\), thus apparently trivializing the value of \(\textsf{WZ}_{S^3}\). However, once we take into account the whole domain of definition of the \(g_\alpha \)’s, which extends into the four-dimensional bulk of the two hemispheres, the above choice might simply be unavailable. That is, if \(\mathfrak {t}: S^3 \rightarrow G\) is large in the sense (i) of Sect. 1.3—not connected to the identity—there is no smooth extension of it that goes from the belt overlap \(U_{12}=S^3\) to the chart domain \(U_2 = D^4\). An extension would necessarily have to “break” somewhere inside \(U_2\). Only for \(\mathfrak {t}\)’s connected to the identity will there be a smooth \(g_2\) such that \((g_2)_{|U_{12}}= \mathfrak {t}\).

We can easily perform a proof by contradiction (reductio). For suppose it were possible to smoothly extend such \(g_\alpha \)’s into the interior of their charts. Then, following a radial evolution in the disk \(U_2=D^4\), we would find a \(g(r,x)\) such that \(g(r=1,x) = \mathfrak {t}(x)\) and \( \lim _{r\rightarrow 0} g(r, x)= g_o\) for all \(x\in S^3\), where \(g_o\) is some fixed element of G. But exploiting this radial parametrization we can define a 1-parameter family of gauge transformations \(\{ h_r(x):S^3 \rightarrow G \, | \, h_r(x) = g(r,x) \}_{r\in [0,1]}\), defined at the intersection \(S^3\), such that \(\textsf{WZ}(h_{r=0}=g_o)=0\) and \(\textsf{WZ}(h_{r=1}=\mathfrak {t})\ne 0\). But this cannot be right: \(\textsf{WZ}(h_r)\in \mathbb Z\), and since one cannot continuously jump between discrete values, \(\textsf{WZ}\) has to be constant on path-connected components of its domain. Let us prove this explicitly (by adding a differentiability assumption): denoting \(h_r(x) = g(r, x)\) and \(\xi _r = \frac{{\textrm{d}}h_r}{{\textrm{d}}r}h_r^{-1} \), we have, for an arbitrary \(r=r_o\),

$$\begin{aligned} \frac{{\textrm{d}}}{{\textrm{d}}r}\textsf{WZ}_{S^3}(h_r) {}_{|r=r_o}= \oint _{S^3} \frac{{\textrm{d}}}{{\textrm{d}}r}\textsf{wz}_{h_r}{}_{|r=r_o} = \frac{1}{24\pi ^2}\oint _{S^3} {\textrm{d}}\; \text {tr}( {\textrm{d}}\xi _{r_o} \wedge h_{r_o}^{-1} {\textrm{d}}h_{r_o}) = 0 \end{aligned}$$
(2.17)

where the second equality follows from (2.10).

The point is that any smooth map \(g_\alpha (r,x)\) from the 4-disk \(D^4\) into G, i.e. a gauge transformation according to (i),Footnote 22 automatically provides through “radial evolution” a homotopy of maps \(h_r(x) = g_{\alpha }(r,x):S^3\rightarrow G\) between a constant function \(h_{r=0}(x) = \lim _{r\rightarrow 0} g_{\alpha }(r,x) = g_o\) (at the central point) and its boundary value \(h_{r=1}(x) = g_{\alpha }(r=1,x)\). Or, in other words, the boundary value \(g_{\alpha }(r=1,x)\) of any gauge transformation on such charts must be connected to the identity.

And \(\textsf{WZ}_{S^3}(h)\) computes a “winding number” of the map \(h: S^3 \rightarrow G\); this is a topological quantity that cannot be undone by a smooth deformation of h. It follows from the above that a gauge transformation cannot change the winding number at the boundary. That is, the boundary value of a bulk gauge transformation \(g_\alpha \) must have trivial winding number as a map from \({\partial }U_\alpha \rightarrow G\), i.e. \(\textsf{WZ}_{S^3}(g_\alpha {}_{|{\partial }U_\alpha }) \equiv 0\). This of course means that \(\mathfrak {t}\) and \(g_2^{-1}\mathfrak {t}g_1\) are in the same homotopy class as maps from \(S^3\) into G, and therefore have the same winding number, as per equation (2.16).
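The last step can also be made explicit at the level of the integrand. The following is a sketch that relies on an additivity property of the Wess-Zumino form which is not stated in the text above; it is added here as an assumption, and it can be checked from (2.10) using \({\textrm{d}}(g^{-1}{\textrm{d}}g) = -g^{-1}{\textrm{d}}g\wedge g^{-1}{\textrm{d}}g\) and \({\textrm{d}}({\textrm{d}}h\, h^{-1}) = {\textrm{d}}h\, h^{-1}\wedge {\textrm{d}}h\, h^{-1}\). For two maps \(g, h: S^3\rightarrow G\):

$$\begin{aligned} \textsf{wz}_{gh}&= \textsf{wz}_{g} + \textsf{wz}_{h} + \frac{1}{8\pi ^2}\, {\textrm{d}}\; \text {tr}( g^{-1}{\textrm{d}}g \wedge {\textrm{d}}h\, h^{-1}) \quad \Longrightarrow \quad \textsf{WZ}_{S^3}(gh) = \textsf{WZ}_{S^3}(g) + \textsf{WZ}_{S^3}(h), \quad \textsf{WZ}_{S^3}(g^{-1}) = -\textsf{WZ}_{S^3}(g), \\ \textsf{WZ}_{S^3}(g_2^{-1} \mathfrak {t}g_1)&= -\textsf{WZ}_{S^3}(g_2{}_{|S^3}) + \textsf{WZ}_{S^3}(\mathfrak {t}) + \textsf{WZ}_{S^3}(g_1{}_{|S^3}) = \textsf{WZ}_{S^3}(\mathfrak {t}). \end{aligned}$$

The final equality uses \(\textsf{WZ}_{S^3}(g_\alpha {}_{|{\partial }U_\alpha }) = 0\) for gauge transformations that extend smoothly into the bulk of their chart, which reproduces (2.16).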

Therefore, we conclude that in the simple case analyzed here, the second Chern number of the bundle \(\pi :P\rightarrow S^4\) is fully encoded into the winding number of the “equatorial” transition function \(\mathfrak {t}: S^3 \rightarrow G\). This winding number is an intrinsic property of \(\mathfrak {t}\) that cannot be changed by any gauge transformation.

So far we have discussed bundles on manifolds without boundaries. But to satisfactorily vanquish all doubts about gauge-invariance, we should also guarantee that it emerges when the \(\theta _{\text {\tiny YM}}\)-term is expressed not at intersections, but at boundaries. This is only possible when the curvature vanishes at the boundary; e.g. asymptotically. We now turn to this.

3 Manifolds with Boundaries

In the first subsection, Sect. 3.1, we will examine Chern classes within a single bounded Euclidean manifold and their relation to the Chern-Simons and Wess-Zumino functionals. In Sect. 3.2 we briefly examine the Lorentzian case, with two boundaries: one asymptotic past Cauchy surface and one asymptotic future one. Like most of the literature (e.g. [4, p. 454-455]), we neglect spatial boundary terms at infinity (on which A is supposed to vanish). The Chern class then gives a difference of past and future Chern-Simons terms, (naively) representing a transition between different vacua of the theory. In Sect. 3.3, we briefly discuss the meaning of non-trivial bundle topology vis à vis the meaning of individual winding numbers.

3.1 In Euclidean Signature

Setting aside an exhaustive treatment of fibre bundles over manifolds with boundaries, which goes beyond the scope of this article, we will content ourselves with discussing what happens first for \(M\cong D^4\) with a boundary \(S^3\), and then for \(M \cong \mathbb R^4\) complemented with its asymptotic boundary \(B^3_\infty \cong S^3\).

First, we recall that gauge transformations on \(D^4\) induce gauge transformations on \({\partial }D^4=S^3\) that are necessarily connected to the identity (as 3d objects). Armed with this fact, we can already see why our conclusions of gauge-invariance will hold in the bounded case: even if different enough A’s give different Chern-numbers (since they may yield different Chern-Simons terms at the boundary, according to (2.8)), such A’s would not be related by a gauge transformation, as guaranteed by equation (2.12). This proof was easy, but it doesn’t yet get to the bottom of the puzzle, which we can only articulate when expressing such integrals in terms of winding numbers, i.e. Wess-Zumino functionals. And for that, we need boundary conditions guaranteeing that the curvature vanishes,Footnote 23 which we can treat jointly with the asymptotic case.

Topologically, the space \(M \cong \mathbb R^4\) is justFootnote 24 a 4-disk, and we denote it \(\mathbb R^4_\infty \cong D^4\) to emphasize the addition of a sphere at infinity, \({\partial }\mathbb R^4_\infty = B^3_\infty \cong S^3\). The simple remark that \(D^4\) constituted one of two hemispheres in the previous discussion will become useful later.

The gain is that, now, a single chart covers the whole space; the loss is that this raises a puzzle: without any need for a transition function, what is left of the previous arguments we applied for the \(\textsf{WZ}\) term?

As standard, we start by requiring that the field strength vanishes sufficiently fast at infinity to render the Yang-Mills action, supplemented by the \(\theta _{\text {YM}}\) term, finite. This implies in particular that the gauge potential must approach a curvature-free configuration at infinity:

$$\begin{aligned} A \xrightarrow {x\rightarrow \infty } h^{-1} {\textrm{d}}h \quad \text {for some}\quad h : B^3_\infty \cong S^3 \rightarrow G. \end{aligned}$$
(3.1)

Note that this h need not be seen as a gauge transformation—vanishing curvature guarantees (3.1)—and thus a characterization as “pure gauge” can be misleading. For such an h may still ‘wind around’ the boundary, in which case A cannot be of the form \(A= g^{-1} {\textrm{d}}g \) throughout the region. That is, an A that has non-trivial winding number at the boundary must have curvature in the bulk.Footnote 25

For such an A, from (2.8) and (2.10) one has:

$$\begin{aligned} \int _{\mathbb R^4_\infty } \textsf{ch}_A= \int _{B^3_\infty }\textsf{wz}_{h} = \textsf{WZ}_{B^3_\infty } (h). \end{aligned}$$
(3.2)

(We avoid the Chern-number notation, \(\textsf{Ch}\), because we do not have a closed base manifold; this preference will be maintained in what follows.) Again, we know that no gauge transformation, which by definition must be extendible into \(\mathbb R^4_\infty \), can be large at the boundary, nor can it change the local value of \(\textsf{ch}_A\), and therefore none can change the value of either of the integrals above. This quantity is therefore fully gauge-invariant, just as the left-hand side shows manifestly.

Intriguingly, even in this single-boundary case, the Wess-Zumino invariant is still an integer! Of course, had we computed the quantity \(\int \textsf{ch}_A\) with arbitrary boundary conditions, we could have obtained any (gauge-invariant) value, depending on the boundary conditions. \(\textsf{WZ}_{B^3_\infty }(h)\) is valued in the integers because of the asymptotic conditions required on the gauge potentials, which are necessary for the integral to converge. As before, this integer counts how many times the boundary map \(h:S^3 \rightarrow G\) winds around the group.

A deeper reason why this integral still yields an integer is that, due to the boundary conditions, it can be recast as an integral over a closed manifold, as before. That is, in the Euclidean case being studied here, we can connect the above computations with the previous ones performed for the closed manifold case, at the end of Sect. 2.2. It turns out that given the asymptotic boundary conditions (3.1), there is a “minimal” way to extend the bundle over \(M= \mathbb R^4_\infty \cong D^4\) to a bundle \(\overline{P}\) over a closed manifold \(\overline{M} \cong S^4\) (where we denote the closure by an overbar). Then, with this extension,

$$\begin{aligned} \textsf{Ch}[\overline{P}] = \int _{\mathbb R^4_\infty } \textsf{ch}_A. \end{aligned}$$
(3.3)

To understand \(\overline{P}\), it is enough to observe that the asymptotic boundary conditions (3.1) are just the minimalFootnote 26 requirements to be able to compactify \(\mathbb R^4\) to \(S^4\). If the field strength vanishes at infinity rapidly enough, we can compactify \(\mathbb R^4\) to \(S^4\) by simply adding oneFootnote 27 point at infinity—the North Pole in the stereographic projection of \(S^4\)—and declaring that at this point \(F=0\)—the only value it can assume by continuity. This compactification will take us back to our previously covered example.

3.2 In Lorentzian Signature

But there is still one remaining piece of the puzzle. Much of what we have done is based on a Euclidean-signature intuition for the manifold \(\mathbb R^4_\infty \): the \(\theta _{\text {YM}}\)-term measures the topology of a canonically defined bundle \(\overline{P} \rightarrow S^4\) and \(\textsf{WZ}_{S^3_\infty }(h)\) measures the winding number of the asymptotic field configuration around the 3-sphere at infinity. Thinking about the Lorentzian case opens new perspectives.

To think about the manifold with Lorentzian signature, we can imagine squishing the boundary at infinity \(B^3_\infty \sim S^3\) from opposite sides, making it look more and more like a ‘thin lens’. This effectively separates the boundary into three components: a past and a future Cauchy surface, \(\Sigma _{\pm }\), and a “celestial sphere” \(S^2_\infty \) at spatial infinity.Footnote 28 Each Cauchy surface supports some (asymptotic) gauge-potential configuration that encodes a classical state of the theory. In our case, these states have half of their support on the northern (southern) hemisphere of \(S^3_\infty \) corresponding to the asymptotic past (future, respectively) Cauchy surfaces.

It is easy to find configurations that are curvature-free at asymptotic past and future infinities, \(\Sigma _{\pm \infty }\). For the same reason as in the previous case,Footnote 29 asymptotic conditions guarantee that the Chern-Simons terms are integer numbers, \(n_\pm \). And due to the fixed orientation of these surfaces, the Chern class gives a difference between these numbers, i.e. \(\int \textsf{ch}_A=n_+-n_-\).
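Schematically, and under the assumptions stated in the text (spatial boundary terms neglected, A curvature-free on \(\Sigma _\pm \)), the steps can be sketched as follows. The labels \(h_\pm \) for the asymptotic maps on \(\Sigma _\pm \) are introduced here for illustration only. The integrality of \(n_\pm \) presumably also relies on \(h_\pm \) tending to a constant at spatial infinity, so that each Cauchy surface is effectively compactified to a 3-sphere; this is an assumption of the sketch, not something derived in it.

$$\begin{aligned} \int \textsf{ch}_A&= \int _{\Sigma _+} \textsf{cs}_A - \int _{\Sigma _-} \textsf{cs}_A , \\ A_{|\Sigma _\pm } = h_\pm ^{-1}{\textrm{d}}h_\pm \quad&\Longrightarrow \quad \int _{\Sigma _\pm } \textsf{cs}_A = \int _{\Sigma _\pm } \textsf{wz}_{h_\pm } = n_\pm . \end{aligned}$$

The first line is (2.8) with Stokes' theorem, the relative sign reflecting the opposite orientations of \(\Sigma _\pm \) as boundary components; the second uses the definition (2.10).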

Therefore, in a similar fashion to what we did throughout the paper, we can reconcile the fact that curvature-free boundary states h, as in (3.1), can encode the physical, i.e. gauge-invariant, value of the \(\theta _{\text {\tiny YM}}\)-term with the fact that this term depends only on the curvature.

To summarize some of these results from different contexts: while it is true that only the curvatures figure in the argument of \(\int \textsf{ch}_A\), this term is only related to Chern-Simons terms on the boundaries of the manifold (cf. (2.8)), and these latter terms do not depend on the curvature. For closed unbounded manifolds, winding numbers appear as differences of Chern-Simons terms at transition patches; for Euclidean bounded manifolds, the boundary is connected and we obtain a single winding-number (that cannot be changed by gauge transformations that properly extend into the bulk); but here, since the configurations are “pure gauge” at disconnected boundaries, we extract winding numbers from each connected boundary Chern-Simons term. The \(\theta _{\text {YM}}\)-term, \(\int \textsf{ch}_A\), will thus be related to a difference of winding numbers due to the inward/outward orientation of the two Cauchy slices with respect to the 4-dimensional bulk.

But, as emphasized after equation (3.1), curvature-free vacuum states with different nontrivial winding numbers,Footnote 30 although perfectly admissible, must include curvature in the bulk. This means that, although the individual boundary winding numbers associated to each boundary are not distinguishable by curvature invariants, transitions between them are. And this is because, crucially, the transition between different curvature-free boundary states with non-trivial winding numbers can never proceed through curvature-free histories.Footnote 31 Within the bulk of spacetime, one has to go through non-vanishing values of F that contribute to \( \textsf{ch}_A\), values which are uncontroversially encoded in the holonomies.

3.3 Non-trivial Bundle Topology and the \(\theta \)-Vacuum

The quantity \(\int \textsf{ch}_A\) itself is computable even from an eliminativist perspective, since it is fully based on curvature observables encoded e.g. in infinitesimal holonomies. Therefore, even if the eliminativist view is incapable of describing the different, spatial and curvature-free A’s (the different winding numbers), the integral \(\int \textsf{ch}_A\) could still have physical significance.

A simple comparison can be carried out with the observability of the energy levels of an atom. The energy of a given level (analogously: the winding number of a vacuum state associated to the past component of the boundary) is not a well defined concept, nor a physically meaningful one. Nonetheless the difference between the energies of two different levels is meaningful and physically measurable from the atomic spectra; and these differences are analogous to non-vanishing values of the \(\theta _{\text {\tiny YM}}\)-term.

Maybe a more suggestive comparison is the phase of a quantum state in a Hilbert space. Although the phase of a single quantum state is not accessible by measurement (only the state’s ray in Hilbert space is), phase differences between states play a crucial role in quantum mechanics through interference phenomena. Perhaps the closest analogy here is to a Berry phase, where a system described by a certain ray is adiabatically altered and finally brought back to the initial ray. The interesting point is that the initial and final states of the system can have different phases even if they belong to the same ray. The phase difference in this system is encoded in the integral of a quantity over the evolution of the system. In the analogy, the initial flat configuration—corresponding to a ray on Hilbert space—is altered, with curvature being generated, and then it is brought back to the same ‘ray’ or flat configuration: the different winding numbers play the role of the different phases, which is encoded along the 4-manifold.

Indeed, [2, p. 179] makes a very similar analogy:

Models related by a “large” gauge transformation are characterized by different Chern-Simons numbers, and one might take these to exhibit a difference in the intrinsic properties of the situations they represent. But it is questionable whether the Chern-Simons number of a gauge-configuration represents an intrinsic property of that configuration, even if a difference in Chern-Simons numbers represents an intrinsic difference between gauge-configuration. Perhaps Chern-Simons numbers are like velocities in models of special relativity.

These observations then underpin the second role of the \( \theta _{\text {YM}}\)-term. That is, gauge theory allows the existence of distinct boundary states (e.g. initial and final states) that are all curvature-free but labelled by different winding numbers. These boundary states then represent different choices of initial and final vacua for the theory and the \( \theta _{\text {YM}}\)-term can represent, in a semiclassical (“instanton”) approximation, a transition from one such curvature-free boundary state to a different one [15, 16]. That is, as we saw, for asymptotically flat configurations, the Chern number gives a difference between winding numbers, \(\int \textsf{ch}_A=n_+-n_-=:\nu \). If one wants to include configurations with different winding numbers in the path integral, with weight factors \(f(\nu )\) for each sector, one can use the cluster decomposition of expectation values to argue that \(f(\nu )=\exp (i\theta \nu )\), where \(\theta \) is a free-parameter (cf. [4, p. 456]).Footnote 32 Thus the inclusion of the \( \theta _{\text {\tiny YM}}\)-term in the Lagrangian corresponds to allowing a superposition of all winding numbers, and the same parameter in the path integral will be included in the superposition of vacuum states.
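The step from cluster decomposition to the exponential weight can be sketched as follows; this is a schematic version of the standard argument cited above, not a full derivation. Suppose a configuration of total winding number \(\nu \) is built from two widely separated regions carrying \(\nu _1\) and \(\nu _2\) (labels introduced here for illustration), with \(\nu = \nu _1 + \nu _2\). Requiring that the weights factorize over the two regions gives

$$\begin{aligned} f(\nu _1 + \nu _2) = f(\nu _1)\, f(\nu _2) \quad \Longrightarrow \quad f(\nu ) = f(1)^{\nu } =: e^{i\theta \nu } . \end{aligned}$$

Here \(\theta \) is fixed by the value \(f(1)\); that it is real is an additional assumption not derived in this sketch.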

Therefore, if the vacuum state can be computed through a path integral, and if this path integral is compatible with the cluster decomposition, one introduces the \(\theta \)-vacuum stateFootnote 33:

$$\begin{aligned} |\theta \rangle = \sum _n e^{i \theta n} |n\rangle \end{aligned}$$
(3.4)

which transforms by a phase under shifts of the winding number. Then, each \(\theta \)-vacuum defines an independent sector of the quantum theory. The existence of the state (3.4) is compatible both with the impossibility of distinguishing vacuum states with different winding number (\(|n\rangle \)) from each other via local observables and with the physical significance of the difference between winding numbers.Footnote 34
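To make the phase explicit, a minimal sketch: let \(T\) denote an operator that shifts the winding number by one unit, \(T|n\rangle = |n+1\rangle \) (the symbol \(T\) and this normalization are assumptions introduced here for illustration). Relabelling the sum in (3.4) then gives

$$\begin{aligned} T|\theta \rangle = \sum _n e^{i \theta n}\, |n+1\rangle = e^{-i\theta } \sum _n e^{i \theta (n+1)}\, |n+1\rangle = e^{-i\theta }\, |\theta \rangle , \end{aligned}$$

so \(|\theta \rangle \) changes only by the overall phase \(e^{-i\theta }\) and therefore stays on the same ray.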

One important point to observe from this argument, vis à vis eliminativism, is that it is at least a logical possibility to have a representation of \(\mathcal {L}_\theta \) in the physics and yet have no way of discerning the individual winding numbers entering the \(\theta \)-vacuum. That is, we can talk about transitions by appeal to the bulk properties of curvature, and not by appeal to the difference between boundary winding numbers. Indeed, this is what [2, p. 198] is referring to, when he writes: “there is no possibility of introducing a parameter \(\theta \)”. This quote is the sole evidence that [3] provides for Healey’s belief that the holonomy formalism cannot produce a \(\theta \)-term, but, again, this reading is mistaken. It takes Healey to be referring to the \(\theta \)-term, and not to the \(\theta \)-vacuum. But Healey is indeed referring to the impossibility of introducing individual winding numbers explicitly,Footnote 35 not to the impossibility of writing the \(\theta \)-term in the action in terms of holonomy variables. Furthermore, Healey’s quote goes on to cite [8] to clarify that “from the [holonomy] perspective there is no need to introduce any [\(\theta \)] in the first place [even though in principle] one can introduce an arbitrary parameter \(\theta \) in the [holonomy] representation [...]”. However, assessing whether the holonomy framework can offer a viable resolution of the \(U(1)_A\)-puzzle requires the introduction of the matter field and is beyond the scope of our discussion. And, moreover, there are other possibilities. Accounting for certain non-perturbative properties of the quantization of a gauge system [10, Ch. 3], one can provide an explanation of chiral symmetry breaking without either introducing Goldstone bosons or invoking the topology of P as encoded in the \(\theta _{\text {YM}}\)-term. We discuss this in appendix B.

Here, we should again emphasize: in this paper, our intent was not to examine the full, non-perturbative quantum picture, nor [8]’s claims, nor their relation to [2]’s, and thus we have refrained from assessing the significance of the \(\theta _\text {\tiny YM}\)-term in these respective domains. Our intent was rather to correct a mistake in the treatment of gauge in the semiclassical picture—i.e. whether the \(\theta _{\text {YM}}\)-contribution to the Yang-Mills action is gauge invariant and can be accounted for in an eliminativist frameworkFootnote 36—irrespective of whether this picture, on its own, provides a completely satisfactory account of chiral symmetry breaking or not.

4 Conclusions

4.1 Summary of our discussion

About the eliminative view and the gauge-invariant properties of the \(\theta _{\text {YM}}\)-term, [3, p. 16] concludes:

[I] showed that if the eliminative view were true then the vacuum Yang-Mills \(\theta _{\text {YM}}\)-term [(2.1)] [...] would lead to inconsistency when integrated over any region [...] By Stokes’ theorem it is a matter of mathematical fact that this integral coincides with the integral of \(\textsf{cs}_A\). But this integral varies under large gauge transformations. So if I were to eliminate gauge from the theory then each configuration would be assigned contradictory values for the vacuum Yang-Mills term of the action: one for each class of representative gauge potentials that differ by a large gauge transformation.

Our discussion has explained, qualified, and rectified Dougherty’s statement.

The \(\theta _{\text {YM}}\)-term is manifestly gauge-invariant under all gauge transformations, as shown in Sect. 2. This is just a consequence of the cyclic trace identity and the transformation properties of the curvature—and Stokes’ theorem cannot change this fact.

Nonetheless, we felt it was important to explain some sources of confusion surrounding the \(\theta _{\text {YM}}\)-term. For instance, it may be expressed as Wess-Zumino integrals on gluing surfaces, and the arguments of these integrals look like gauge transformations. So doesn’t that indicate their gauge-variance, contrary to the brute fact mentioned above?

This puzzle is solved once we take into account that the arguments of these integrals on the gluing surfaces are transition functions, and not gauge transformations, and that in fact, non-trivial transition functions cannot be trivialized by gauge transformations. Gauge transformations are smooth, and they are associated to charts of the manifold. These two simple requirements mean gauge transformations cannot affect the value of the integral of \(\textsf{cs}_A\) on the boundary of the manifold: in accordance with the invariance of the Chern number.

Every difference that is attributed, in this loose manner of speaking, to ‘large gauge transformations’, has a gauge-invariant explanation solely in terms of curvature; and holonomies are sensitive to curvature.

The same conclusion holds for asymptotic boundaries, for configurations that are asymptotically curvature-free. A non-trivial winding number at the asymptotic boundary requires a non-vanishing curvature for A in the bulk: such an A is not a “pure-gauge” configuration. That is how the winding number can be represented by the \(\theta _{\text {YM}}\)-term, which depends only on the curvature. In Lorentzian signature (with appropriate boundary conditions at spacelike infinity) this means that transitions over time between winding numbers must be associated with curvature at some point in time.

[3] equivocates between the invariance of the \(\theta _{\text {YM}}\)-term and the variance of the Chern-Simons \(\textsf{cs}_A\). We have shown that there is no equivocation, since the equality of the two requires \(\textsf{cs}_A\) to be integrated over a boundary, and this quantity does not vary under bona-fide gauge transformations either.

Instead of this explanation for the discrepancy, Dougherty invokes a “size distinction”. The distinction in question is one between gauge transformations that may act solely on the boundary and those whose action on the boundary must be a smooth extension of those acting on the bulk. The relevance of this distinction assumes there is a choice to be made here, on whether to accept gauge transformations as acting solely on the boundary of the manifold or not. Moreover, [3] ties the eliminativist to the more permissive choice, where the action of any group-valued function supported on the boundary, whether a bona-fide gauge transformation or not, is interpreted as a viable gauge transformation. We have shown that this view is mathematically inconsistent. To be as clear as possible: no such choice exists. A size-distinction would lead to two different and incompatible notions of gauge. A boundary transformation that changes the (total) winding number cannot be extended to a bulk transformation that sends one solution of the equations of motion to another, as a gauge transformation would, and therefore this transformation cannot be called ‘a symmetry’, and is thus not an option the eliminativist can embrace.

Now we are equipped to answer the two rhetorical questions that [3, p. 16] poses in the conclusions of his paper: “[It is] not enough to simply make an exception for large gauge transformations. Do we make an exception for any gauge transformation that’s nontrivial on the boundary of any region? Only those on the sphere at infinity that also spoil the gauge invariance of the vacuum Yang-Mills term?” We can say, respectively: “No, allow gauge transformations that are non-trivial at the boundary; and yes, we can exclude those that spoil gauge-invariance, but we would do so without making an exception, since the latter are not gauge transformations, and the effects that you attribute to these transformations are perfectly well encoded in the bulk curvature, which is explicitly contained in the holonomies.” Had this not been so, the \(\theta \)-term could never figure in lattice QCD, a formalism that employs holonomies as its basic variables. But of course, these terms frequently appear in this formalism (see [9] and references therein).Footnote 37

While it is true that on a manifold with asymptotic boundaries one can nonetheless use Stokes’ theorem to extract interesting and nontrivial features of the vacuum structure of Yang-Mills theory, none of these features provide a smoking gun against the eliminative view of gauge.

In sum, the eliminativist view, and the holonomy interpretation [2], is perfectly capable of encoding a non-zero \(\theta _{\text {YM}}\)-term in the action functional. Whether it needs to do this to resolve the \(U(1)_A\) puzzle, or whether it has an alternative route as claimed by [8], is a different story that goes beyond the scope of this paper.

4.2 Against eliminativism nonetheless

Having arrived at the end of this paper, we can smoke a peace-pipe with Dougherty. As tobacco acceptable to both parties, we notice that the most developed understanding of the solution to the U\((1)_A\)-puzzle (i.e. the breaking of chiral symmetry without the introduction of Goldstone bosons), requires the physical significance of structures associated to the existence of the gauge symmetry: be it the role of the fibre bundle topology in the standard semi-classical account, or the role of different connected components of \(\mathcal {G}_3\) in the non-perturbative one. In both cases, the arguments militate against any naive implementation of eliminativism.

More broadly, eliminativism about gauge fields is unwarranted for many reasons, some of which we now briefly summarize. Gauge degrees of freedom simplify mathematical treatments of physical theories by allowing us to write our theories in terms of Lorentz-invariant action functionals (and path integrals): there is no available local Hamiltonian or Lagrangian, even in the Abelian case (i.e. electromagnetism) that employs only electric and magnetic fields.

Moreover, as a guide to theory-building, gauge degrees of freedom are introduced to mandate the local Gauss law: action functionals that employ them automatically ensure both the local Gauss law and charge conservation. At a pedestrian level, they guarantee that the details of the dynamics of the forces that interact with the charges will preserve the conservation of charge [17]. In this sense, gauge degrees of freedom fill an explanatory gap: they guarantee conservation laws and provide a framework by which to build theories that automatically respect these laws.

Fibre bundles provide a yet deeper, geometrical explanation of these degrees of freedom. Fibre bundles—and the connection and its curvature—allow us to formalize the notion that certain properties that are taken as, in a certain sense, “intrinsic”, such as “being a proton”, are in fact relational.

General relativity is relational in a similar way, and, similarly, has a good deal of structure that could be construed as eliminable. But, we would wager, most eliminativists are reluctant to limn that redundant structure (Healey certainly is, cf. [2, Ch. 4.2]). The parallel becomes blatant once we formulate general relativity in terms of connection forms (see footnote 14). As discussed at length by [18], applying the principal fibre bundle formalism to general relativity puts coordinate and gauge transformations on a par. Indeed, the Chern-number can also be calculated for a connection associated to parallel transport of tangent vectors on spacetime, where it bears many of the same properties as the more general Chern-number, associated to parallel transport of general vector bundles over spacetime.

More broadly, vis-à-vis eliminativism we see no relevant disanalogy between gauge fields and metrics, due to the simple fact that in the spacetime case there is certainly redundancy of mathematical representation (in that case, of geometry through the metric). But there, most would agree, this redundancy does not warrant a complete elimination of spacetime metrics from our theories. We see no reason to distinguish, in this respect, gauge and gravitational theories.

We believe empirical signatures of the \(\theta \)-term are certainly compatible with, if not explained by, the reality of certain non-trivial topological, relational properties of the bundle.Footnote 38 Although this is not contrary to eliminativism (as already emphasized, the \(\theta \)-term can be computed by means of holonomy variables), the holonomy formalism is certainly not the most perspicuous language in which to articulate these properties.

In sum, gauge degrees of freedom fill an explanatory gap, have a neat relationist interpretation, and are thoroughly warranted if we value consilience with other important theoretical structures of physics, such as Hamiltonians, actions, Lorentz invariance, etc. Demands for their complete elimination from our theoretical description of nature seem to ignore the criteria by which we interpret theories. However, a less sanguine deflation of their ontological status, one that ascribes to them only relational status and relies on Leibniz equivalence to count/discern physical possibilities, is warranted. And such a position sits well with a via media position in the debate between spacetime substantivalism and relationism.