1 Introduction

Determining when two theories, models, or formulations of a theory are equivalent to one another (and in what sense) is a significant topic within the philosophy of science (Glymour, 1970; Quine, 1975; Weatherall, 2018). The attention afforded to this issue presumably rests on the idea that only by understanding these issues of equivalence can one come to understand how a theory, model, or formulation comes to limn reality. Arguably, the quest for such understanding has also aided scientific progress in the past; examples include the equivalence of wave mechanics and matrix mechanics (von Neumann, 2018; Muller, 1997a, b), Feynman’s and Schwinger’s approaches to quantum field theory (Dyson, 1949), Lagrangian and Hamiltonian mechanics (Barrett, 2019; Curiel, 2014; North, 2009), the AdS/CFT correspondence (Maldacena, 1998; de Haro, 2021), and many others.

The existing literature on theoretical equivalence is vast, and has focused on developing criteria for—and assessing the conditions under which—particular theories can be understood as being equivalent, as well as applying these criteria to specific examples in order to illuminate our understanding of particular theories and the interconnections that their structures may possess. In a recent discussion concerning the equivalence of Lagrangian and Hamiltonian mechanics, Barrett (2019) sketches an interesting connection between questions of theoretical equivalence and questions concerning the content or structure of a physical theory. While theoretical equivalence is certainly a significant topic within the philosophy of science, van Fraassen (1986) famously considers the question, ‘what is the content of a theory?’, to be the central foundational question of philosophy of science. In identifying this relationship between questions of theoretical equivalence and the content of a theory, Barrett argues that whenever we commit to a method of identifying the content of a theory, we also necessarily commit to a standard of equivalence between theories. The converse also applies because when we commit to a particular standard of equivalence between theories, we are (for Barrett) also saying which features of our theories are significant or ‘contentful’, as these are the very features that our assessment of equivalence will consider.

Within the philosophy of physics, philosophers typically (and justifiably) focus on the dynamical content encoded in the equations of motion as the relevant physical content of a theory. Often, this then (understandably) results in dynamical equivalence being taken to be a sufficient condition for empirical equivalence. Examples include Weatherall (2016) arguing for the theoretical equivalence of the electromagnetic field formulation and the gauge potential formulation of classical electromagnetism (EM) and Knox (2011) arguing for the theoretical equivalence of general relativity (GR) and the teleparallel equivalent of general relativity (TPG). In both cases, the authors maintain that the ‘contentful’ features of the theories in question are fully captured in the theories’ dynamics (or equivalently, their equations of motion), and in doing so adopt a standard of equivalence which holds that equivalent dynamics is sufficient for demonstrating empirical equivalence. For example, when discussing the empirical equivalence of the two formulations of EM, Weatherall (2016, p. 1078, our emphasis) stipulates “[G]iven a Faraday tensor \(F_{ab}\) that satisfies Maxwell’s equations [...] on both formulations, the empirical content of a model is exhausted by its associated Faraday tensor. In this sense, the theories are empirically equivalent [...]”. Similarly, Knox (2011, p. 272) indicates that “the [TPG] Lagrangian above turns out to be identical, up to a divergence, to the Einstein-Hilbert Lagrangian in standard GR [...] the equivalence of the Lagrangians is enough to establish empirical equivalence". As we shall see, while they do not explicitly advocate a particular view of theory structure in their analyses, this standard of empirical equivalence (i.e., equivalent dynamics) nonetheless implies and is broadly consistent with a certain fairly typical version of the semantic view of scientific theories. 
This view, as usually articulated, holds that a theory’s content is captured by models comprised of the right kinds of mathematical objects, where these objects obey some specified dynamics.

While this is certainly an understandable position given the prominence of dynamics in physical theories, recent decades have seen both philosophers and physicists investigating content that is not entirely determined by a theory’s dynamics: in particular, the content inherent to describing isolated subsystems, their associated boundary conditions, and their relationship to their environments. Recently, philosophers have used isolated subsystems to investigate the empirical significance of gauge symmetries (Greaves & Wallace, 2014; Teh, 2016; Murgueitio Ramírez & Teh, forthcoming; Gomes, 2021; Wolf et al., 2023), explored the important explanatory role that the boundary conditions associated with such isolated subsystems play in mathematical modeling (Bursten, 2021), and considered the implications that boundary conditions might have for our conception of laws and the Humean mosaic (McKenna, forthcoming). Physicists have likewise focused on isolated subsystems and the boundary phenomena associated with them, as can be seen in prominent examples including edge modes in the quantum Hall effect (Wen, 1995), the study of black hole entropy (Gibbons & Hawking, 1977), slip/no-slip boundary conditions in fluid flow (Lauga et al., 2007), and the AdS/CFT correspondence (Maldacena, 1998).

Furthermore, when viewing the content of a physical theory as including the kinds of boundary content associated with isolated subsystems, it becomes clear that an analysis of empirical equivalence that relies upon only dynamics is deficient.Footnote 1 In particular, in this paper we highlight how the analysis in the aforementioned examples from Knox and Weatherall does not account for such boundary phenomena; doing so leads to a verdict that both pairs of theories, as presented by the authors, are in fact empirically (a fortiori theoretically) inequivalent. These results thereby invite the following conclusions:

  1. Adjudications of theoretical equivalence cannot be made independently of clearly committing oneself to particular judgments regarding a theory’s relevant content. If one fails to account properly for the contentful features of a theory, one may be left with an adjudication of theoretical equivalence that is incorrect, a view of the theories’ structure that is deficient, or both.

  2. The content of physical theories can extend beyond dynamics. Boundary phenomena, boundary conditions, and the modeling of subsystem-environment decompositions are relevant to questions concerning the content of physical theories, and likewise concerning theoretical equivalence, because these items are important for capturing the empirical content of physical theories. Indeed, some philosophers have begun to discuss boundary conditions alongside other elements that are typically invoked when specifying theoretical structure (see e.g. Greaves and Wallace, 2014; Teh, 2016).

2 Views on theoretical equivalence

Discussions of theoretical equivalence almost invariably begin with a notion of empirical equivalence. If two theories disagree in terms of the empirical content associated with them, then no further analysis is necessary: they are inequivalent tout court. The reason for this is that empirical goings-on are naturally regarded as supervening on physical goings-on. At a minimum, theories should necessarily have the same empirical content if they are to be considered equivalent. This means that two theories must have the same range of applicability regarding empirical scenarios they describe and provide indistinguishable predictions for the observational phenomena. To be slightly more specific, we can understand that models M of a theory T will have empirical substructures, which can represent observable phenomena. Suppose, for every M of T, there is an \(M'\) of \(T'\), where the empirical substructures of M and \(M'\) are isomorphic. Then, T and \(T'\) can be understood to be empirically equivalent (van Fraassen, 1980).

This is a fairly general way of stating what empirical equivalence amounts to. As we have seen above, showing that two theories possess equivalent dynamical content through their equations of motion is often taken to be sufficient to demonstrate empirical equivalence within the physical sciences. While we do not attempt to provide a fully exhaustive and all-encompassing definition of empirical equivalence (we can be pragmatic about this!—see Section 4.3), one of the goals of this paper is to demonstrate clearly that within the physical sciences there is important content beyond dynamics that should factor into our analyses of empirical equivalence. That is, dynamical equivalence alone is not sufficient to establish empirical equivalence.
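The dependence of an equivalence verdict on what one counts as empirical content can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than part of van Fraassen's definition: models are represented as Python dictionaries, isomorphism of empirical substructures is simplified to equality, the condition is required in both directions, and the names `covers`, `dynamics_only`, and `dynamics_and_boundary` are hypothetical.

```python
def covers(T, T_prime, empirical):
    """True if every model of T has a counterpart in T_prime with the
    same empirical substructure (isomorphism simplified to equality)."""
    available = {empirical(m) for m in T_prime}
    return all(empirical(m) in available for m in T)


def empirically_equivalent(T, T_prime, empirical):
    # Assumption: the covering condition is required in both directions.
    return covers(T, T_prime, empirical) and covers(T_prime, T, empirical)


# Hypothetical toy theories: T records only which dynamical solution a
# model instantiates; T_prime additionally records boundary data.
T = [{"dynamics": "sol_1"}, {"dynamics": "sol_2"}]
T_prime = [
    {"dynamics": "sol_1", "boundary": "b_1"},
    {"dynamics": "sol_1", "boundary": "b_2"},
    {"dynamics": "sol_2", "boundary": "b_1"},
]


def dynamics_only(model):
    """Empirical substructure = dynamical content alone."""
    return model["dynamics"]


def dynamics_and_boundary(model):
    """Empirical substructure = dynamical content plus boundary data."""
    return (model["dynamics"], model.get("boundary"))


print(empirically_equivalent(T, T_prime, dynamics_only))          # True
print(empirically_equivalent(T, T_prime, dynamics_and_boundary))  # False
```

The same pair of toy theories is judged equivalent or inequivalent depending only on the choice of extraction function, which is the structural point at issue here; nothing about real physical theories follows from the sketch itself.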

Those of a positivist persuasion would consider empirical equivalence to be a sufficient criterion for establishing theoretical equivalence because they would argue that a theory’s meaning and content are exhausted by its empirical consequences. Yet, most subscribe to the idea that empirical equivalence is a necessary but not sufficient condition for theoretical equivalence, because there are meaningful theoretical claims beyond strict empirical consequences: two theories may differ with regard to “what structure they attribute to the world, what sorts of entities exist in the world, or what the laws of nature are” (Weatherall, 2018, p. 5). This has motivated philosophers to propose further, stronger criteria for establishing theoretical equivalence that go beyond empirical consequences. These can be roughly broken down into formal notions of equivalence and interpretational equivalence. This literature is vast and we make no attempt at a fully exhaustive description of the possibilities.

Definitional equivalence is a formal criterion developed initially by Quine (1975) and Glymour (1970), and captures the idea that two theories should be inter-translatable. This means that one should be able to take all of the vocabulary of theory T, and translate it into the vocabulary of theory \(T'\), and vice versa, in a manner that faithfully preserves the content of each theory. Furthermore, there is generally an idea that these translations between theories should be unique and invertible. Other formal attempts at cashing out equivalence in something like this manner include categorical equivalence and Morita equivalence. Categorical equivalence uses tools from category theory to address situations that seem otherwise to be problem cases for definitional equivalence, such as when transformations between models are many-to-one (Weatherall, 2018). This is the case, for example, when multiple gauge choices in one theory correspond to one model on the other side of the transformation. Morita equivalence attempts to weaken definitional equivalence by providing a notion of equivalence that applies to theories that are formulated using different sorts (i.e., different classes of entities) (Barrett & Halvorson, 2016).

Interpretational equivalence, in contrast with definitional equivalence, seeks to capture the notion that two theories are equivalent when they license all of the same claims about the phenomena they describe, going beyond purely empirical or formal considerations (Coffey, 2014; Teitel, 2021). In other words, theories T and \(T'\) can be understood to postulate the same ontologies and make the same claims about this shared ontology.

With these notions of equivalence on the table, we next move on to analyzing some recent discussions in the philosophical literature surrounding the issue of theoretical equivalence, and to evaluating these respective adjudications of theoretical equivalence for particular theories. The examples we will consider are (i) the equivalence of the Faraday tensor formulation and the gauge field formulation of electromagnetism (Weatherall, 2016), and (ii) the equivalence of general relativity (GR) and the teleparallel equivalent of general relativity (TPG) (Knox, 2011).

3 Adjudicating theoretical equivalence

3.1 Example 1: Faraday tensor EM and gauge potential EM

We begin with the very familiar example of classical electromagnetism (EM) and consider how it has recently been presented in the philosophical literature on theory equivalence. Weatherall (2016) examines two different formulations of classical EM: EM\(_1\), where electromagnetism is presented in terms of the electric and magnetic fields through the Faraday tensor \(F_{\mu \nu }\), and EM\(_2\), where electromagnetism is presented in terms of the electromagnetic gauge potential \(A_{\mu }\). There is a near-universal consensus amongst both physicists and philosophers that EM\(_1\) and EM\(_2\) are in fact theoretically (hence empirically) equivalent.Footnote 2 However, due to issues that will become apparent shortly, cashing out this equivalence in terms of the criteria for theoretical equivalence that we have discussed is more subtle than one might expect. Thus, the question is not whether these formulations are truly equivalent, but rather how to state this equivalence perspicuously according to plausible criteria that physically equivalent theories should satisfy.

Weatherall proceeds by noting that these two different formulations do not meet the standard criteria for definitional equivalence as proposed by Glymour (1970) due to there being non-isomorphic translations between formulations, and then arguing that one can use categorical equivalence to capture the theoretical equivalence of these formulations. However, before the argument even gets to the point at which we must decide which notion of formal equivalence is suitable for this example, it is important to emphasize that an argument for full theoretical equivalence also necessarily depends upon establishing that models derived from these different formulations really do capture all of the same empirical content and are thus empirically equivalent as well. Within the argument, dynamical equivalence is assumed to be sufficient to demonstrate empirical equivalence, but as we shall see this seems to ignore significant empirical content that is not available in the dynamical equations of motion. Consequently, the argument fails to go through even before we get into the thornier issues surrounding the formal equivalence of these theories. To stress (and to repeat), there is an overwhelming consensus from both physicists and philosophers that EM\(_1\) and EM\(_2\) are empirically equivalent. We are not challenging this, but rather challenging the philosophical criteria used in this analysis because these criteria fail to capture the empirical equivalence that these two formulations readily display within the practice of physics.

Examining Weatherall’s analysis more closely, we find that he utilizes a conceptual framework that is broadly consistent with the semantic conception of scientific theories. We will have more to say on this later, but essentially the standard articulation of the semantic view holds that theories are collections of dynamically equivalent models. Weatherall takes EM\(_1\) to be a theory given by models built out of the objects \(\left<M, \eta _{\mu \nu }, F_{\mu \nu }, J^\mu \right>\), where M is a smooth manifold, \(\eta _{\mu \nu }\) is the Minkowski metric, \(F_{\mu \nu }\) is the Faraday tensor, and \(J^{\mu }\) is the charge density current. These models furthermore must all satisfy the dynamics encoded by Maxwell’s equations

$$\begin{aligned} \nabla _{[\rho }F_{\mu \nu ]}&= 0, \end{aligned}$$
(1)
$$\begin{aligned} \nabla _{\mu } F^{\mu \nu }&= J^{\nu }. \end{aligned}$$
(2)

On the other hand, EM\(_2\) is a theory given by \(\left<M, \eta _{\mu \nu }, A_{\mu }, J^\mu \right>\), where \(A_{\mu } = (\phi , \hat{A})\) is the four-potential vector field. These models likewise satisfy Maxwell’s equations in the form

$$\begin{aligned} \Box A^{\mu } = J^{\mu }, \end{aligned}$$
(3)

where \(\Box \) is the D’Alembertian operator.Footnote 3 Weatherall’s analysis quite understandably holds that these two ‘theories’ or ‘formulations’ of a single theory (whichever you prefer), are empirically equivalent:

Empirical equivalence:

“We stipulate that on both formulations, the empirical content of a model is exhausted by its associated Faraday tensor [that satisfies Maxwell’s equations]. In this sense, the theories are empirically equivalent, since for any model of EM\(_1\), there is a corresponding model of EM\(_2\) with the same empirical content (for some fixed \(J^a\)), and vice versa” (Weatherall, 2016, p. 1078). In other words, EM\(_1\) and EM\(_2\) both share all of the same dynamical content and are thus empirically equivalent.

Assuming that this claim of empirical equivalence goes through, one then naturally proceeds to analyze formal equivalence. As the familiar story goes, these different formulations are very closely related. Given the Faraday tensor \(F_{\mu \nu }\) that satisfies Maxwell’s equations, there is always a vector field \(A_{\mu }\) that also satisfies Maxwell’s equations and satisfies

$$\begin{aligned} F_{\mu \nu } = \nabla _{[\mu }A_{\nu ]}. \end{aligned}$$
(4)

Similarly, given a vector field \(A_{\mu }\) that satisfies Maxwell’s equations, there is always a corresponding tensor \(F_{\mu \nu }\) that satisfies Maxwell’s equations and can be defined via (4) (all of these facts follow from elementary properties of differential forms). As Weatherall notes, however, one cannot find an isomorphism between the spaces of models of these two formulations of classical electromagnetism. Starting with the EM\(_2\) formulation and given a vector potential \(A_{\mu }\), one can uniquely define a Faraday tensor \(F_{\mu \nu }\) in EM\(_1\). Conversely, going in the other direction and given a Faraday tensor \(F_{\mu \nu }\) in EM\(_1\), one cannot uniquely determine a model in EM\(_2\) due to the gauge freedom present in the four-potential \(A_{\mu }\). That is, \(F_{\mu \nu }\) is compatible with infinitely many different \(A_{\mu }\) because (4) continues to hold when \(A_{\mu }\) is replaced by any \(A'_{\mu }\) such that

$$\begin{aligned} A'_{\mu } = A_{\mu } + G_{\mu }, \qquad \nabla _{[\mu }G_{\nu ]} = 0 \end{aligned}$$
(5)

(i.e., if \(G_{\mu }\) is a closed one-form). Given that a straightforward application of definitional equivalence is blocked, Weatherall motivates abandoning Glymour’s criterion for definitional equivalence (i.e., for every model in T, there is an isomorphic translation to a model in \(T'\) that preserves all of the same empirical content) in favor of demonstrating an equivalence between categories of models that preserve empirical content. Here, we now understand the models of EM\(_2\) to be \(\left<M, \eta _{\mu \nu }, [A_{\mu }], J^\mu \right>\), where \([A_{\mu }]\) is understood as an “equivalence class of physically equivalent vector potentials” that correspond to the same \(F_{\mu \nu }\) (Weatherall, 2016, p. 1079). Note that this adjustment depends on the argument that EM\(_1\) and EM\(_2\) are actually empirically equivalent, and that this equivalence class of vector potentials contains the same empirical information as its counterpart in the corresponding field formulation.
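The many-to-one character of the map from potentials to field strengths can be checked numerically in a deliberately simplified setting. The following is a minimal sketch under stated assumptions: it works in 1+1 dimensions, where the field strength has a single independent component \(\partial _t A_x - \partial _x A_t\) (the overall normalization of the antisymmetrization in (4) is ignored, since it does not affect the comparison), and the particular potential and the function \(\chi \) generating the closed one-form are hypothetical choices made only for illustration.

```python
import math

H = 1e-4  # step size for central finite differences


def d_dt(f, t, x):
    """Numerical partial derivative with respect to t."""
    return (f(t + H, x) - f(t - H, x)) / (2 * H)


def d_dx(f, t, x):
    """Numerical partial derivative with respect to x."""
    return (f(t, x + H) - f(t, x - H)) / (2 * H)


def field_strength(A_t, A_x, t, x):
    """Single independent component d_t A_x - d_x A_t in 1+1 dimensions."""
    return d_dt(A_x, t, x) - d_dx(A_t, t, x)


# Hypothetical potential, chosen only for illustration.
A_t = lambda t, x: x ** 2
A_x = lambda t, x: t * x

# Closed one-form G = d(chi) with chi = t^2 * x + sin(x),
# so that d_t G_x - d_x G_t = 2t - 2t = 0.
G_t = lambda t, x: 2 * t * x
G_x = lambda t, x: t ** 2 + math.cos(x)

# Shifted potential A' = A + G, as in (5).
A2_t = lambda t, x: A_t(t, x) + G_t(t, x)
A2_x = lambda t, x: A_x(t, x) + G_x(t, x)

points = [(0.3, -1.2), (1.0, 0.5), (2.5, 3.0)]
for t, x in points:
    F = field_strength(A_t, A_x, t, x)
    F_shifted = field_strength(A2_t, A2_x, t, x)
    print(f"t={t}, x={x}: F={F:.6f}, F'={F_shifted:.6f}")
```

At each sample point the two potentials yield the same field strength up to discretization error, so a model of EM\(_1\) cannot single out one of them.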

Categorical equivalence:

Categorical equivalence is stated in terms of categories of models that preserve empirical content. Thus, according to Weatherall’s construction, we can translate between models of EM\(_1\) and EM\(_2\) and their respective vocabularies in a manner that faithfully preserves empirical content, provided that EM\(_2\) is redefined such that \([A_{\mu }]\) is an equivalence class of vector potentials that lead to the same \(F_{\mu \nu }\). Then, we have \(\left<M, \eta _{\mu \nu }, F_{\mu \nu }, J^\mu \right> \Longleftrightarrow \left<M, \eta _{\mu \nu }, [A_{\mu }], J^\mu \right>\), and this further notion of formal equivalence is then used to argue that both formulations are theoretically equivalent (Weatherall, 2016, p. 1083).

While it is certainly correct that all models in both formulations possess the same dynamical content, as we shall soon see, this does not mean that they necessarily share all of the same empirical content. Consider a simple environment-subsystem decomposition that includes a basic Faraday cage, described by a finite spatial subsystem region with a surface boundary \(\partial M\). The Faraday cage is a perfect electrical conductor, meaning that it effectively shields the subsystem from electromagnetic fields in the environment and any electric charge is accumulated on the boundary in the form of a surface charge \(\sigma \). When decomposing the environment and subsystem, boundary conditions delineate the subsystem from the environment. There are two relevant boundary conditions in this example: (1) \(E_{\parallel } = 0\), meaning that the electric field vanishes on the boundary in the direction parallel to the surface and (2) \(E_{\perp } = 4\pi \sigma \), meaning that the electric field is proportional to the surface charge in the direction perpendicular to the surface.Footnote 4 Let us now consider EM\(_1\) and EM\(_2\) models of the Faraday cage subsystem.

Beginning with the Faraday tensor formulation EM\(_1\), this construction in terms of the electric and magnetic fields will lead to the conclusion that the Faraday tensor describing the subsystem is always zero. This is simply a consequence of the fact that regardless of what the external electric and magnetic fields are, the conducting boundary will always arrange the surface charge \(\sigma \) to cancel the effect of the external fields. Thus, \(F_{\mu \nu } = 0\) inside the cage regardless of facts about the external fields and surface charge. By contrast, the gauge field formulation EM\(_2\) shows that the gauge potentials describing the subsystem will instead be constant, which is a result of the simple fact that the Faraday tensor is zero and any points lying inside the conductor must then lie at the same potential. While so far this is what we expect of potentials that lead to \(F_{\mu \nu } = 0\), it is also true that specifying the scalar electric potential \(\phi \) on the boundary (and thus the potential for the entire interior) uniquely specifies the surface charge \(\sigma \) on the boundary (Zangwill, 2012, p. 200).Footnote 5 Furthermore, in general one can fully construct a solution for \(\phi \) for both the subsystem and exterior in terms of the surface charge \(\sigma \).Footnote 6

How do these considerations influence our verdict on the empirical equivalence of these two formulations? EM\(_1\) treats the Faraday tensor as the fundamental object of interest. The same Faraday tensor \(F_{\mu \nu }\) within the isolated subsystem could potentially correspond to two empirically distinct surface charges \(\sigma _1\) and \(\sigma _2\) (in fact, it corresponds to infinitely many distinct surface charges!). However, EM\(_2\) treats gauge potentials as the fundamental objects of interest. Once we construct the gauge potential \(\phi \), it will always distinguish between \(\sigma _1\) and \(\sigma _2\) because specifying the potential uniquely specifies the surface charge. To be completely explicit, let us adopt Weatherall’s initial characterization of EM\(_1\) and EM\(_2\) as models given by \(\left<M, \eta _{\mu \nu }, F_{\mu \nu }, J^\mu \right>\) and \(\left<M, \eta _{\mu \nu }, A_{\mu }, J^\mu \right>\), respectively.Footnote 7 Furthermore, let us say that we are interested in an empirical description of a Faraday cage with a surface charge \(\sigma _1\). On Weatherall’s characterization, EM\(_1\) corresponds to \(\left<M, \eta _{\mu \nu }, 0, J^\mu \right>\) and EM\(_2\) corresponds to \(\left<M, \eta _{\mu \nu }, \phi _s(\sigma _1) , J^\mu \right>\), where \(\phi _s\) is the scalar potential for the subsystem. This EM\(_1\) description could correspond to infinitely many subsystems all with different surface charges because they will all lead to \(F_{\mu \nu }=0\), whereas the EM\(_2\) description uniquely describes the subsystem with the particular surface charge we are considering here. In other words, each model in the U(1) gauge orbit within EM\(_2\) makes a specific assertion about the surface charge because the potential \(\phi \) fixes the surface charge uniquely. The dynamically equivalent counterparts within EM\(_1\) are completely silent on the matter.
Consequently, the model of the subsystem in EM\(_2\) has the information necessary to model empirical facts about boundary phenomena and the external environment, empirical information that the model of the subsystem in EM\(_1\) simply does not have when we hold that the empirical content of the theory is given exclusively in terms of those particular mathematical objects and their dynamics. On this reading, these descriptions of the subsystem are not empirically equivalent because they do not carry the same empirical information about the target system, nor can the same empirical consequences be deduced from them.Footnote 8
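The contrast just drawn can be illustrated with a simpler textbook configuration than the cage in an external field. The sketch below is an assumption-laden toy and not the construction used in the argument above: it takes an isolated, uniformly charged conducting spherical shell of radius \(R\) in Gaussian units, with no external fields, and fixes the additive constant in the potential by the convention that \(\phi \rightarrow 0\) at infinity. The function names are hypothetical.

```python
import math


def interior_field(sigma: float) -> float:
    """Field strength inside a uniformly charged conducting shell:
    zero for every value of the surface charge density sigma."""
    return 0.0


def interior_potential(sigma: float, radius: float) -> float:
    """Constant interior potential in Gaussian units, assuming
    phi -> 0 at infinity: phi = Q / R = 4 * pi * R * sigma."""
    return 4 * math.pi * radius * sigma


def normal_field_just_outside(sigma: float, radius: float) -> float:
    """E_perp at the outer surface: Q / R^2 with Q = 4 * pi * R^2 * sigma."""
    total_charge = 4 * math.pi * radius ** 2 * sigma
    return total_charge / radius ** 2


R = 1.0
sigma_1, sigma_2 = 0.5, 2.0  # two hypothetical surface charge densities

for sigma in (sigma_1, sigma_2):
    print(
        f"sigma={sigma}: F_inside={interior_field(sigma)}, "
        f"phi_inside={interior_potential(sigma, R):.4f}, "
        f"E_perp_outside={normal_field_just_outside(sigma, R):.4f}"
    )
```

Under these assumptions the interior field is the same for both surface charges, the interior potential differs between them, and the normal field just outside the shell reproduces the boundary condition \(E_{\perp } = 4\pi \sigma \). The sensitivity of the potential to \(\sigma \) relies on the stated convention at infinity.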

It is important to emphasize here that we are not arguing against the actual empirical equivalence of the Faraday tensor and gauge field formulations of electromagnetism. Essentially, a physicist can deduce the same empirical claims about such a subsystem from both formulations as using these formulations in practice involves specifying further items (like the boundary conditions and their relationship to the surface charges) that are necessary to build the electromagnetic fields and potentials relevant to describing the system. Rather, we are arguing that the philosophical criterion for evaluating empirical equivalence in terms of dynamics alone is insufficient to account for the equivalence of EM\(_1\) and EM\(_2\). Indeed, this view leads to the conclusion that EM\(_1\) and EM\(_2\) (when stated as consisting of models \(\left<M, \eta _{\mu \nu }, F_{\mu \nu }, J^\mu \right>\) and \(\left<M, \eta _{\mu \nu }, A_{\mu }, J^\mu \right>\) respectively) are not equivalent because there is significantly more empirical information contained within a model specified as \(\left<M, \eta _{\mu \nu }, A_{\mu }, J^\mu \right>\). Here, the attempt to demonstrate theoretical equivalence gets tripped up, not in the nuanced technicalities surrounding formal notions of equivalence, but rather in the more mundane issue of empirical equivalence. This suggests that dynamical equivalence is not sufficient for empirical equivalence and that there is further empirical information that must be added to secure a verdict of equivalence. As we shall argue in §4.2, there is a relatively straightforward way of modifying our philosophical view of theory structure which can bring the empirical claims from these formulations into alignment (again from the perspective of the philosophical criteria we are employing) and restore the near-universal intuition that these formulations are in fact equivalent tout court.

3.2 Example 2: GR and TPG

The example above is not the only instance from the recent philosophical literature where there has been a proclaimed equivalence between two theories that relies on understanding dynamical equivalence as being sufficient for complete empirical equivalence. This has also come up in the context of general relativity (GR) and the teleparallel equivalent of general relativity (TPG) (Knox, 2011).

Both GR and TPG are theories of gravitation, but they differ in a number of ways. The most obvious is that rather than using the curved, symmetric Levi-Civita connection \(\Gamma ^{\rho }_{\mu \nu }\), TPG uses the Weitzenböck connection \(\dot{\Gamma }^{\rho }_{\mu \nu }\), which has non-vanishing torsion and vanishing curvature.Footnote 9 That is, rather than expressing gravity as a manifestation of spacetime curvature as GR does, TPG holds that gravity is a manifestation of spacetime torsion. TPG views gravity as a force because torsion directs bodies experiencing gravitation away from geodesics, as opposed to the situation in GR, whereby bodies experiencing gravitation follow the geodesics resulting from spacetime curvature. Furthermore, TPG is usually formulated in terms of tetrads \(e^a_\mu \), rather than a metric \(g_{\mu \nu }\). Tetrads, or frame fields, are sets of four linearly independent fields \(e^a = e^a_\mu dx^\mu \) that at each point p of a differentiable manifold M specify a basis for the tangent space \(T_p M\).Footnote 10 TPG uses frame fields \(h^a_\mu = e^a_\mu + B^a_\mu \) that are constructed to be invariant under local translations \(x^a \mapsto x^a + \epsilon ^a\), where \(B^a_\mu \) is the translation gauge potential. This gauge potential transforms as \(\delta B^a_\mu = -\partial _\mu \epsilon ^a\) so as to make the frame field invariant under such local translations. It is for this reason that TPG is often declared to be a gauge theory of the translation group (Aldrovandi & Pereira, 2013).Footnote 11

GR and TPG are seemingly very distinct theories, constructed using different mathematical structures—but Knox (2011) has argued that GR and TPG should in fact be understood as being equivalent to one another. She argues for this conclusion based upon: (i) the establishment of dynamical equivalence (and thus, for her argument, empirical equivalence) and definitional equivalence between the two theories, and (ii) an interpretation of TPG that holds that both TPG and GR actually postulate the same underlying spacetime structure despite the surface level appearances, which appears to be motivated by her advocacy of spacetime functionalism. As before, theoretical equivalence is taken to be a combination of demonstrating empirical equivalence, along with some stronger notions of equivalence that demonstrate clear formal relations between the theories or resolve interpretive issues such that we can understand both theories as making the same claims about the target phenomena. While the spacetime functionalist component of her argument certainly brings up a host of interesting issues, this is not the place to fully adjudicate the interpretational issues she raises regarding TPG and GR. However, we would like to focus specifically on the discussion of empirical equivalence between the theories.

The claim that TPG and GR are empirically equivalent is motivated by appealing to actions used in each theory,

$$\begin{aligned} S_{TPG}&= \frac{1}{16 \pi G} \int d^4x {h} T,&S_{GR}&= \frac{1}{16 \pi G} \int d^4x \sqrt{g} R, \end{aligned}$$
(7)

where h is the determinant of the tetrad, T is the torsion scalar defined as

$$\begin{aligned} T := \mathcal {S}_\rho ^{\mu \nu } T^\rho _{\mu \nu }, \end{aligned}$$
(8)

\(\mathcal {S}_\rho ^{\mu \nu }\) is the so-called superpotential tensor, \(T^\rho _{\mu \nu }\) is the torsion tensor, g is the determinant of the metric, and R is the Ricci scalar. The superpotential tensor is built out of the torsion tensor and the so-called contorsion tensor, which is defined as

$$\begin{aligned} K^{\rho }_{\mu \nu } := \Gamma ^{\rho }_{\mu \nu } - \dot{\Gamma }^{\rho }_{\mu \nu }, \end{aligned}$$
(9)

where we see that it is simply the difference between the Weitzenböck connection, \( \dot{\Gamma }^{\rho }_{\mu \nu }\), and the Levi-Civita connection, \(\Gamma ^{\rho }_{\mu \nu }\). This is significant because this allows one to translate between the mathematical structures of the teleparallel theory and those of general relativity. One can use this to re-write the TPG action in the language of GR asFootnote 12

$$\begin{aligned} S_{TPG} = \frac{1}{16 \pi G} \int d^4x \sqrt{g} R + \frac{1}{8 \pi G} \int d^{4}x \sqrt{g} \nabla _\mu T_{\alpha }^{\alpha \mu }. \end{aligned}$$
(10)

This shows that the TPG action is identical to the Einstein-Hilbert action of GR plus a total divergence term, which ensures that these actions both lead to the same dynamical equations of motion. On the basis of these observations, Knox makes three arguments regarding the equivalence of GR and TPG:

Empirical equivalence:

The equivalence of the actions up to a total divergence term, which indicates that they both share equivalent equations of motion, guarantees the empirical equivalence of TPG and GR (Knox, 2011, p. 272).

Definitional equivalence:

The relationship between the Levi-Civita connection and the Weitzenböck connection allows us to translate directly between GR and TPG. While definitional equivalence is not explicitly mentioned in her argument, this is a clear appeal to a similar notion of equivalence. Anything we express in the language of GR can be equivalently expressed in the language of TPG and vice versa in a way that preserves the content of each theory. For example, we have already seen how one moves between different connection coefficients and translates between spacetime curvature and torsion, but one can similarly translate between the frame fields of TPG and the metric of GR as \(g_{\mu \nu } = \eta _{ab}h^a_\mu h^b_\nu \), where \(\eta _{ab}\) is the Minkowski metric (Knox, 2011, p. 272).

Interpretational equivalence:

TPG and GR both encode the same spacetime structure, upon adopting spacetime functionalism (which, for Knox, is the view that spacetime structure is whatever identifies a class of local inertial frames; for critical discussion of this view, see e.g. Read and Menon (2021)), and thus can be understood as licensing the same claims about the phenomena they describe (Knox, 2011, p. 273).
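
The translation invoked under definitional equivalence can be made slightly more concrete. The following short derivation is a sketch that uses only the relation \(g_{\mu \nu } = \eta _{ab}h^a_\mu h^b_\nu \) quoted above, and assumes a Lorentzian signature so that \(\det (\eta _{ab}) = -1\). It indicates how the tetrad determinant h appearing in \(S_{TPG}\) relates to the determinant g of the metric:

$$\begin{aligned} g = \det (g_{\mu \nu }) = \det (\eta _{ab}) \left( \det h^a_\mu \right) ^2 = -h^2 \quad \Longrightarrow \quad \sqrt{-g} = |h|. \end{aligned}$$

On this reading, the factor h in the TPG action plays the same role as the metric volume density in the GR action (written \(\sqrt{-g}\) in (12)), at least for tetrads of positive determinant.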

The argument that the actions are empirically equivalent hinges on the ability to throw away the total divergence term present in (10). Once this term is discarded, the actions are equivalent full stop and the argument for definitional equivalence goes through as well because these terms can be safely ignored when making these kinds of translations between TPG and GR. But why can this total divergence term simply be thrown away?

When discussing a particular theory whose content is captured by an action S, typically one takes the empirical content of that theory to be derived from a variational principle (see footnote 13). The ‘principle of least action’ is a variational principle which holds that the action is stationary, i.e. that its first variation vanishes, when the equations of motion (i.e., the dynamics) of the system are satisfied. Consider the simple textbook example of a free massive particle in motion, where our variables are position q(t) and velocity \(\dot{q}(t)\) and the action is given by \(S = \int _{t_i}^{t_f} L[q, \dot{q}, t] dt\). Varying this action gives

$$\begin{aligned} \delta S=\int _{t_{i}}^{t_{f}}\left[ \frac{\partial L}{\partial q}-\frac{d}{d t}\left( \frac{\partial L}{\partial \dot{q}}\right) \right] \delta q d t+\frac{\partial L}{\partial \dot{q}}\left( t_{f}\right) \delta q\left( t_{f}\right) -\frac{\partial L}{\partial \dot{q}}\left( t_{i}\right) \delta q\left( t_{i}\right) =0. \end{aligned}$$
(11)

Here we find the familiar Euler-Lagrange equations of motion in the first term. However, we also have two further terms which are the result of a total divergence that appears after the integration by parts necessary to write the Euler-Lagrange equations in their standard form. In this case we are simply concerned with the motion of a particle between two fixed end points, \(q (t_i)\) and \(q (t_f)\), so that \(\delta q (t_i) = \delta q (t_f) = 0\). These remaining terms thus automatically go to zero, leaving just the dynamics of our system captured in the first term. These total divergence terms do not affect the underlying dynamics of the system; furthermore, it is important to emphasize that any terms like this must vanish for there to be a well-defined variational principle at all, as a proper functional derivative could not be defined otherwise.
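
A minimal worked instance may help here. Assuming the free-particle Lagrangian \(L = \frac{1}{2} m \dot{q}^2\) (with m the particle's mass) and an arbitrary smooth function \(F(q, t)\), both introduced purely for illustration, the variation in (11) and the effect of adding a total derivative to the Lagrangian read:

$$\begin{aligned} \delta S&= -\int _{t_i}^{t_f} m \ddot{q}\, \delta q\, dt + \Big [ m \dot{q}\, \delta q \Big ]_{t_i}^{t_f}, \\ L'&= L + \frac{dF}{dt} \quad \Longrightarrow \quad \delta S' = \delta S + \Big [ \frac{\partial F}{\partial q}\, \delta q \Big ]_{t_i}^{t_f}. \end{aligned}$$

With \(\delta q (t_i) = \delta q (t_f) = 0\), both bracketed terms vanish, so S and \(S'\) yield the same equation of motion \(m \ddot{q} = 0\). Note that this relies on F depending only on q and t; were F to depend on \(\dot{q}\) as well, fixing the end points alone would no longer suffice.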

Given that we typically throw away total divergence terms because we know that they have to vanish anyway, our work, apparently, is done. The TPG action encodes the same dynamics as the GR action, so the equations of motion will be the same and we are left to choose the language in which to express them: the force equations of TPG or the geodesic equations of GR. That is,

$$\begin{aligned} \delta S_{GR} = \delta S_{TPG} = \frac{1}{16 \pi G}\int d^4x\sqrt{-g}G_{\mu \nu }\delta g^{\mu \nu }, \end{aligned}$$
(12)

where \(G_{\mu \nu }\) contains the dynamical equations of motion. Thus, “the equivalence of the Lagrangians is enough to establish empirical equivalence” (Knox, 2011, p. 272). The question of the theoretical equivalence between TPG and GR then hinges only upon the interpretive questions.

When doing GR, we often consider manifolds without boundary. This guarantees that the total divergence term in (10) is zero because Stokes’ theorem allows us to convert a total divergence term into a boundary term. In the event that there is no boundary, this term vanishes automatically. For example, this is exactly what is done in using GR to model cosmological solutions as we are attempting to model the entire universe and its contents filling an infinite space. What if we wanted to model some isolated subsystem instead? Consider an isolated subsystem \(\mathcal {S}\) that is being modelled with respect to an external environment \(\mathcal {E}\). For example, we might be interested in describing the mass-energy content of a region of spacetime, such as the mass-energy content contained within a black hole, as defined by an external observer who is sufficiently far away so that they do not interact with any of the relevant gravitational or material fields. In this event, it is not appropriate to consider manifolds without boundary. Rather, the manifold M must have a boundary \(\partial M\) along with appropriate boundary conditions to properly describe a subsystem \(\mathcal {S}\) isolated from its environment \(\mathcal {E}\). Total divergence terms such as the one we have considered then cannot be automatically discarded and generally will not vanish.
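
Schematically, the conversion licensed by Stokes’ theorem can be written out as follows. This is a sketch that borrows the boundary notation used for the GHY term in (17): \(n_\mu \) is the unit normal to \(\partial M\), \(\epsilon \) is the sign factor defined there, and h here denotes the determinant of the induced metric on the boundary (not the tetrad determinant of (7)). The vector field \(V^\mu \) is an arbitrary placeholder:

$$\begin{aligned} \int _{M} d^{4}x \sqrt{-g}\, \nabla _\mu V^\mu = \oint _{\partial M} d^{3}\Omega \, \epsilon \sqrt{h}\, n_\mu V^\mu . \end{aligned}$$

Taking \(V^\mu = T_{\alpha }^{\alpha \mu }\), the second term in (10) becomes a flux of the contracted torsion through \(\partial M\): it drops out when there is no boundary, but otherwise depends on the fields at the boundary. Orientation and sign conventions for \(n_\mu \) and \(\epsilon \) vary between presentations, so the overall sign should be read as convention-dependent.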

When considering the Einstein-Hilbert action in the presence of the boundary \(\partial M\), such residual total divergence terms are indeed present and we must find appropriate boundary conditions to render this a well-defined variation (see footnote 14). Here, it is natural to consider Dirichlet boundary conditions, \(\left. \delta g_{\mu \nu }\right| _{\partial M}=0\), as these boundary conditions are often used in the context of asymptotically flat spacetimes. These are spacetimes that approach flatness \(g_{\mu \nu } \rightarrow \eta _{\mu \nu }\) at null-infinity and are particularly significant for a number of reasons. Here is Penrose (1982) on the issue:

Asymptotically flat spacetimes are interesting, not because they are thought to be realistic models for the entire universe, but because they describe the gravitational fields of isolated systems, and because it is only with asymptotic flatness that general relativity begins to relate in a clear way to many of the important aspects of the rest of physics, such as energy, momentum, radiation, etc.

That is, in the asymptotic regime we can clearly define critical, empirically relevant concepts such as mass, energy, and momentum, and relate them to these concepts as they are understood in other realms of physics. (In brief: in the asymptotic regime, one has Killing fields, with which one can associate conserved quantities in a well-understood way: see e.g. de Haro (2021).)

Upon imposing Dirichlet boundary conditions \(\left. \delta g_{\mu \nu }\right| _{\partial M}=0\), we find that there is a problem. There are multiple boundary terms and it is only the term that depends on the tangential derivatives of the metric that vanishes, while another term that depends on the normal derivatives survives (see footnote 15). This is because Dirichlet boundary conditions fix only the values of the metric on the boundary, which does not require that the normal derivatives of its variation vanish. In other words, this action does not have a well-defined variation under these boundary conditions and cannot be used to represent or model isolated subsystems of the type that Penrose refers to in his description of asymptotically flat spacetimes. This is closely related to what Belot (2018) observes when he notes that two isomorphic solutions in GR do not always represent the same physical possibilities. For example, he notes that while cosmological solutions and asymptotically flat solutions are isomorphic dynamically, they (obviously) do not represent the same physical possibilities. The boundary conditions imposed for each solution are physically relevant facts! This discussion of the variational problem in GR reveals that the Einstein-Hilbert action only has the resources to represent one of the two physical possibilities we have mentioned (cosmological solutions), and that we need to look elsewhere to represent asymptotically flat solutions. We see that even within GR, dynamically equivalent solutions do not necessarily represent the same physical possibility. Thus, merely demonstrating the dynamical equivalence between a GR action and a TPG action likewise would not necessarily indicate that the two theories are physically equivalent.

This indicates the importance of boundary conditions in specifying the content of our theory and the scope of the empirical scenarios and target systems that our models and theories can represent. Let us now compare the analogous scenario in TPG to see how the teleparallel theory fares in describing isolated subsystems with asymptotic characteristics.

Amazingly, upon varying the TPG action and imposing Dirichlet boundary conditions, we find that the TPG action indeed does have a well-defined variation (Oshita & Wu, 2017). The variation of the additional boundary term that distinguishes the TPG and Einstein-Hilbert actions ensures that the total variation is well-defined for asymptotic spacetimes because the additional terms perfectly cancel out the previously problematic terms (see footnote 16).

The reason for this can be traced to the fact that the TPG action contains only first derivatives of the frame fields, whereas the Einstein-Hilbert formulation contains second derivatives of the metric. The additional boundary term effectively removes the second derivatives of the metric that fail to vanish when working with the Einstein-Hilbert action.
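
A one-dimensional toy analogue, which is not drawn from the discussion above and is offered only as an illustration, makes this mechanism explicit. Assume a particle of mass m with the second-order Lagrangian \(L_2 = -\frac{1}{2} m q \ddot{q}\), which differs from the first-order Lagrangian \(L_1 = \frac{1}{2} m \dot{q}^2\) by a total derivative (the labels \(L_1\), \(L_2\), \(S_1\), \(S_2\) are ours):

$$\begin{aligned} L_2&= L_1 - \frac{d}{dt}\left( \frac{1}{2} m q \dot{q} \right) , \\ \delta S_2&= -\int _{t_i}^{t_f} m \ddot{q}\, \delta q\, dt + \frac{1}{2} m \Big [ \dot{q}\, \delta q - q\, \delta \dot{q} \Big ]_{t_i}^{t_f}. \end{aligned}$$

Fixing the end points (\(\delta q = 0\) at \(t_i\) and \(t_f\)) removes the \(\dot{q}\, \delta q\) term but leaves the term in \(q\, \delta \dot{q}\), since \(\delta \dot{q}\) is unconstrained there. Adding the boundary term \(\frac{1}{2} m \big [ q \dot{q} \big ]_{t_i}^{t_f}\) to \(S_2\) returns \(S_1\), whose variation is well-defined under the same conditions. In this analogy \(S_2\) plays the role of the Einstein-Hilbert action and \(S_1\) that of an action containing only first derivatives.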

This TPG action functions perfectly well for describing such isolated subsystems. As this specific argument for theoretical equivalence is presently formulated (relying on the dynamical equivalence of two different actions), TPG and GR are not empirically equivalent—and so, per the above, should not be regarded as being equivalent, full stop. Under this articulation, these theories do not even have the resources to model all of the same target systems, much less discuss whether one can compare the empirical consequences derived from them for said target systems. As before, the account of empirical equivalence gets tripped up when considering isolated subsystems and it seems like merely showing the dynamical equivalence of models derived from particular actions used in the respective theories is not enough to ensure that they can support the same empirical claims. For anyone who may understandably be perturbed by the thought that GR cannot describe such systems: do not worry. This will be addressed in §4.2, where we will argue that we can make an argument for the empirical equivalence of GR and TPG. However, as is also the case with the example of electromagnetism, this will require modifying our philosophical criteria concerning theory structure and considering empirical information that goes beyond mere dynamical equivalence between models.

4 Views on theory structure

What is happening here? We have two fairly prominent examples of arguments for the theoretical equivalence of the respective theories considered in these examples. One of these examples (TPG and GR) is more contentious given the extent of the interpretive arguments that need to be made to secure interpretational equivalence, but the other (Faraday tensor and vector potential formulations of EM) is fairly uncontroversial. Yet, as articulated, these arguments for theoretical equivalence cannot even support claims of empirical equivalence for these respective theories. Something has clearly gone wrong!

Perhaps it is the way in which the theories have been stated that has disrupted these claims of empirical equivalence. After all, in making an adjudication of theoretical equivalence, it is certainly important to correctly specify the empirical content contained by a theory. Views on the structure of scientific theories can be roughly broken down into three camps: the ‘syntactic’, ‘semantic’, and ‘pragmatic’ views. The syntactic view seeks to axiomatize a theory in terms of abstract mathematical sentences. The semantic view casts a theory in terms of models and the kinds of mathematical objects that comprise these models. While the syntactic view was initially dominant as it emerged first as an outgrowth from logical empiricism, van Fraassen has prominently advocated for the semantic view by arguing that the semantic view, with its focus on models, can often more simply demonstrate the logical claims of a theory than a set of axioms (see footnote 17). Furthermore, he argues that the semantic view is a far more comprehensive and useful tool because it avoids the restrictions inherent to describing a theory in a particular axiomatic language, and allows us to conceptualize the objects and classes of structures that comprise a model in terms of a variety of valid, non-unique descriptions (van Fraassen, 1980, pp. 43–4). Finally, the pragmatic view is a more recent perspective that emphasizes representational aims, model pluralism, scientific practice, and other non-formal characteristics (Cartwright, 1983; Hacking, 1983; Kitcher, 1993; Winther, 2021).

In this article, we will focus on viewing these adjudications of theoretical equivalence through the lenses of both the semantic and pragmatic views. While neither Weatherall nor Knox explicitly advocates a particular view of theory structure, both authors’ focus on dynamics and models reflects at the very least a straightforward consistency with fairly standard articulations of the semantic view, making this a natural place to start. Regarding the syntactic view, it should be noted that nothing automatically precludes a discussion in syntactic-friendly terms; however, these authors do not engage with this approach in any obvious way, so likewise we will not do so here. And finally, we will engage with the pragmatic view, as its focus on model pluralism and scientific practice is particularly relevant for the questions at hand and arguably can shed some light on these adjudications of equivalence.

4.1 The semantic view

The semantic view of theories holds that a theory is individuated via classes of models. One modern way of expressing the semantic view is to say that a theory \(\mathcal {T}\) has a set of ‘kinematically possible models’ \(\mathcal {K}\) (KPMs), defined by tuples of the form \(\left<O_1, \ldots , O_n\right>\), where these \(O_i\) are mathematical objects, e.g. tensor fields on a differentiable manifold. Furthermore, these objects come with a set of particular dynamical equations that define the relationships and interactions between the \(O_i\). KPMs that satisfy these dynamical equations form a subset \(\mathcal {D} \subset \mathcal {K}\) of KPMs known as the ‘dynamically possible models’ (DPMs). In other words, “the KPMs can be thought of as representing the range of metaphysical possibilities consistent with the theory’s basic ontological assumptions. The DPMs represent a narrower set of physical possibilities” (Pooley, 2013, p. 532). This dynamical content is then understood to capture the empirical content of the models that comprise the theory, via what van Fraassen calls the ‘empirical substructures’ of each of these models (van Fraassen, 1980, p. 45).

It is clear that Weatherall draws from this framework in his analysis. For example, his descriptions of EM\(_1\) and EM\(_2\) as theories with associated respective classes of models \(\left<M, \eta _{\mu \nu }, F_{\mu \nu }, J^\mu \right>\) and \(\left<M, \eta _{\mu \nu }, A_{\mu }, J^\mu \right>\) identify the relevant KPMs, while his specification that these models obey Maxwell’s equations identifies the particular DPMs that correspond to the theories in question.

While the utilization of the standard semantic view is not as obvious in Knox (see footnote 18), it is clear that something like this is being supposed in her identifying the theory of GR with the empirical content contained within the Einstein-Hilbert action. Recall that in her argument it is the local equivalence of the two actions that cements the case for empirical equivalence, which really is just the statement that both theories share the same dynamical content when the actions are varied per standard variational principles. In identifying the Einstein-Hilbert action as capturing GR’s content and adjudicating the empirical equivalence of GR and TPG based on the dynamical equivalence of these actions, there is a natural consistency with the standard semantic expression of GR in the philosophical literature.

In more detail: in the above-introduced model-based language (Pooley, 2013, 2015), GR is usually given by KPMs of the form \(\left<M, g_{\mu \nu }, \Phi \right>\), where (again) M is a smooth, four dimensional differentiable manifold, \(g_{\mu \nu }\) is the metric tensor field on M, and \(\Phi \) represents the matter fields of the theory. The DPMs of GR are the subset of the KPMs that obey the Einstein equation, which is given by

$$\begin{aligned} G_{\mu \nu } = 8\pi T_{\mu \nu }, \end{aligned}$$
(15)

where

$$\begin{aligned} G_{\mu \nu } := R_{\mu \nu } - \frac{1}{2}Rg_{\mu \nu } \end{aligned}$$
(16)

is the familiar Einstein tensor and \(T_{\mu \nu }\) is the stress-energy tensor. For Knox, the Einstein-Hilbert action contains all of these objects in which we are interested and which comprise the kinematic possibilities of GR; varying this action isolates the dynamical possibilities. We could likewise identify TPG with the KPMs \(\left<M, e^a_\mu , \Phi \right>\), whose DPMs are the subset of KPMs that also obey the Einstein field equations (written in terms of the primitive objects of TPG, i.e. the objects specified in the KPMs of that theory).

Read through this lens, both Weatherall and Knox are operating within a framework whereby they are identifying the relevant empirical content of the theories they are interested in with the dynamics obeyed by the models that comprise these theories. It is a very straightforward argument. There are theories given by models of the form \( \left<M, g_{\mu \nu }, \Phi \right>\) and \(\left<M, e^a_\mu , \Phi \right>\), as well as \(\left<M, \eta _{\mu \nu }, F_{\mu \nu }, J^\mu \right>\) and \(\left<M, \eta _{\mu \nu }, A_{\mu }, J^\mu \right>\). The first pair obeys the dynamics encoded by the Einstein field equations and the second pair obeys the dynamics encoded by the Maxwell equations. Therefore, both pairs are empirically equivalent to each other. The key assumption, of course, is that dynamics is sufficient to fully specify the empirical content of these theories and the models that comprise them. Yet, as we have already seen, there is important empirical content that this characterization leaves out: namely, the empirical content associated with boundary conditions and boundary-related phenomena.

4.2 Boundary possible models

In both of the examples considered, boundary conditions play a crucial role in determining the empirical content of models derived from the respective theories, particularly as it relates to describing subsystems. When the empirical information within these models is cast exclusively in terms of dynamics as in the standard semantic view, this additional empirical information is not accounted for in adjudications of theory equivalence, leading to sometimes bizarre and counter-intuitive results when these arguments are taken at face value. The above examples offer helpful illustrations of the importance of boundary conditions in empirical claims, and mesh well with recent philosophical investigations concerning the role of boundary conditions in scientific inquiry. In particular, Bursten (2021) argues that while boundary conditions have traditionally been understood in the philosophy literature as contingent facts akin to initial conditions, they are more properly understood as components of mathematical models. This has to do with, among other things, their role in specifying the scope of mathematical models and generating descriptions of novel phenomena. In a similar spirit, McKenna (forthcoming) emphasizes that boundary conditions sometimes display behavior and admit of generalizations most often associated with laws.

Taken together, the specific examples we have discussed and these more general observations suggest a possible modification of the now-standard KPM/DPM version of the semantic approach to account for the role of boundary conditions in the models that are taken to capture the structure of our scientific theories. Here, we introduce a third class of models, proposed by Read (2016), known as ‘boundary possible models’ \(\mathcal {B}\) (BPMs). Here, \(\mathcal {B} \subset \mathcal {K}\) denotes the subset of KPMs compatible with particular boundary conditions. The intersection \(\mathcal {B} \cap \mathcal {D} \subset \mathcal {K}\) then specifies those KPMs that are compatible with both particular boundary conditions and particular dynamics. This is depicted in Fig. 1 (see footnote 19).

Fig. 1. The relation between \(\mathcal {K}\), \(\mathcal {D}\), and \(\mathcal {B}\) for a generic \(\mathcal {T}\)
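
In set-builder form, the three classes and their overlap can be sketched as follows. The schematic symbols \(E\) (standing for the dynamical equations) and \(C\) (standing for a choice of boundary conditions on \(\partial M\)) are introduced here only as placeholders:

$$\begin{aligned} \mathcal {K}&= \left\{ \left<O_1, \ldots , O_n\right> \right\} ,&\mathcal {D}&= \left\{ k \in \mathcal {K} : E[k] = 0 \right\} , \\ \mathcal {B}&= \left\{ k \in \mathcal {K} : C[k]\big |_{\partial M} = 0 \right\} ,&\mathcal {B} \cap \mathcal {D}&= \left\{ k \in \mathcal {K} : E[k] = 0 \text { and } C[k]\big |_{\partial M} = 0 \right\} . \end{aligned}$$

Neither \(\mathcal {B}\) nor \(\mathcal {D}\) need contain the other: a KPM may satisfy the boundary conditions without solving the dynamical equations, and vice versa.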

How could this help us spell out the empirical equivalence of, say, EM\(_1\) and EM\(_2\), in a way that captures this richer view of empirical content? For EM\(_1\) and EM\(_2\), this is very straightforward. We listed the boundary conditions relevant to describing the boundary surface of a perfect conductor. Let us call the corresponding BPMs \(\mathcal {B}_C\). Models from both EM\(_1\) and EM\(_2\) have dynamics given by Maxwell’s equations. Let us call the corresponding DPMs \(\mathcal {D}_M\). The empirical content of the model from EM\(_1\) is then given by the subset of KPMs defined by \(\mathcal {B}_{C} \cap \mathcal {D}_{M} \) and the empirical content of the model from EM\(_2\) is likewise given by \(\mathcal {B}_{C} \cap \mathcal {D}_{M} \). Clearly, these models now possess the same empirical information because we have included the boundary conditions necessary to pick out a unique description of the subsystem within the philosophical criteria that dictate the structural content of the models. Recall that this was previously unavailable in EM\(_1\) when its models were only described in terms of dynamics.

The key is simply to realize that boundary conditions are essential information in any attempt to represent a subsystem-environment decomposition. That is, whether we are using EM\(_1\) or EM\(_2\), we must specify boundary conditions in order to actually build solutions for the mathematical objects of which those descriptions make use (electric and magnetic fields versus gauge fields, respectively). Yet, under the standard semantic way of expressing these theories (in terms of mathematical objects and their dynamics), for a system like the Faraday cage one formulation contains more empirical information than the other precisely because the boundary conditions we used in building the mathematical objects are left out of the formal description of the theory. To be clear, this information is readily available (and often models cannot even be constructed without it) any time one uses standard techniques in electromagnetism. What we are pointing out is that this information has not found its way into the philosophical criteria we use to describe the content of the theory. If boundary conditions are admitted to the formal criteria that define the structure of a theory, this incongruity dissolves because these boundary conditions contain the information that is needed for the Faraday tensor formulation EM\(_1\) to distinguish between different surface charges from within the Faraday cage; something that EM\(_2\) more naturally does because information regarding the surface charges finds its way into the gauge potentials. Thus, it then becomes clear that EM\(_1\) and EM\(_2\) are indeed empirically equivalent once we admit boundary conditions into the semantic criteria. Of course, one could still resist the claim that they are theoretically equivalent due to interpretational issues and implications stemming from the different ontologies postulated by EM\(_1\) and EM\(_2\) (e.g. see Maudlin (2018) or Teitel (2021)), but at least their empirical equivalence in this context is secure.

Coming back to the more complicated example of TPG and GR, the action \(S_{TPG}\) and its variation \(\delta S_{TPG} = 0\) capture the empirical content for models that are compatible with both Dirichlet boundary conditions and the Einstein field equations. That is, \(S_{TPG}\) gives us the subset of the KPMs that satisfies Dirichlet boundary conditions and the dynamics of the Einstein field equations, \(\mathcal {B}_{D} \cap \mathcal {D}_{EFE} \). As we saw, while the Einstein-Hilbert action (which we will now switch to specifying as \(S_{EH}\)) shares the same dynamics \(\mathcal {D}_{EFE}\), it is not capable of representing isolated subsystems with the Dirichlet boundary conditions \(\mathcal {B}_{D}\). This invites two questions: can the models derived from the Einstein-Hilbert action represent any isolated subsystems, and can isolated subsystems with Dirichlet boundary conditions be modeled within the framework of GR at all?

The answer to the former question is that there are boundary conditions that make the variation of the Einstein-Hilbert action well-defined. Recall that the terms involving normal derivatives of the metric did not vanish among the boundary terms in (14). Neumann boundary conditions, rather than specifying the values of the metric on the boundary, specify the values of the metric’s derivatives on the boundary. It turns out that when one imposes suitable Neumann boundary conditions, the terms involving both the tangential and the normal derivatives of the metric vanish (Freidel et al., 2021). We can then clearly see with this framework that \(S_{EH}\) and \(S_{TPG}\) do not share the same empirical content because \( \mathcal {B}_{N} \cap \mathcal {D}_{EFE} \ne \mathcal {B}_{D} \cap \mathcal {D}_{EFE}\) (recall again that \(S_{TPG}\) gives us the subset of the KPMs \(\mathcal {B}_{D} \cap \mathcal {D}_{EFE}\)).

Finally, how does GR actually model isolated subsystems with Dirichlet boundary conditions and study important concepts found in asymptotic spacetimes? The answer is that we must set aside the Einstein-Hilbert action \(S_{EH}\) in favor of what is known as the Gibbons-Hawking-York (GHY) action \(S_{GHY}\):

$$\begin{aligned} S_{\textrm{GHY}}=\frac{1}{16 \pi G} \int d^{4} x \sqrt{-g} R + \frac{1}{8 \pi G} \oint _{\partial M} d^{3} \Omega \epsilon \sqrt{h} K, \end{aligned}$$
(17)

where \(K=\nabla ^\mu n_\mu \) is the trace of the extrinsic curvature (with \(n_\mu \) the unit normal to the boundary), h is the determinant of the induced metric on the boundary, and \(\epsilon \) is \(+1\) when the boundary hypersurface is spacelike and \(-1\) when the boundary hypersurface is timelike (York, 1972; Gibbons & Hawking, 1977). We see here that this action is equal to the Einstein-Hilbert action plus a boundary term. When varying this action, we find the bulk term that contains the dynamical Einstein field equations \(G_{\mu \nu }\), the boundary term from before, and a further boundary term originating from the GHY term. Upon imposing Dirichlet boundary conditions \(\left. \delta g_{\mu \nu }\right| _{\partial M}=0\), we find that the variation of the GHY boundary term exactly cancels out the previously non-vanishing terms. Thus, for manifolds with boundary subject to Dirichlet boundary conditions, we have

$$\begin{aligned} \delta S_{\textrm{GHY}}=\frac{1}{16 \pi G} \int _{M} d^{4} x \sqrt{-g} G_{\mu \nu } \delta g^{\mu \nu }. \end{aligned}$$
(18)

This follows the exact same pattern as the variation of the TPG action. The additional boundary term plays a similar role and cancels out previously problematic terms, yielding a well-defined variation.
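
For concreteness, the cancellation can be displayed in one common textbook convention. The explicit form of the boundary terms below is an assumption on our part rather than something taken from the equations above: \(h^{\alpha \beta }\) denotes the induced metric on \(\partial M\) with raised spacetime indices, \(n^\mu \) is the unit normal, and the terms involving tangential derivatives of \(\delta g_{\alpha \beta }\) have already been set to zero by the Dirichlet condition:

$$\begin{aligned} \delta S_{EH}&= \frac{1}{16 \pi G} \int _{M} d^{4}x \sqrt{-g}\, G_{\mu \nu } \delta g^{\mu \nu } - \frac{1}{16 \pi G} \oint _{\partial M} d^{3}\Omega \, \epsilon \sqrt{h}\, h^{\alpha \beta } n^\mu \partial _\mu \delta g_{\alpha \beta }, \\ \delta \left( \frac{1}{8 \pi G} \oint _{\partial M} d^{3}\Omega \, \epsilon \sqrt{h}\, K \right)&= \frac{1}{16 \pi G} \oint _{\partial M} d^{3}\Omega \, \epsilon \sqrt{h}\, h^{\alpha \beta } n^\mu \partial _\mu \delta g_{\alpha \beta }. \end{aligned}$$

The second line uses \(\delta K = \frac{1}{2} h^{\alpha \beta } n^\mu \partial _\mu \delta g_{\alpha \beta }\) on the boundary, together with the fact that the induced metric is held fixed there. Summing the two lines reproduces (18). Because the same factor \(\epsilon \) multiplies both boundary integrals, the cancellation does not depend on the sign convention chosen for it.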

We see that \(S_{GHY}\) gives us the subset of KPMs \(\mathcal {B}_{D} \cap \mathcal {D}_{EFE}\). This matches up with the subset of KPMs given to us by \(S_{TPG}\), which as we have seen is also \(\mathcal {B}_{D} \cap \mathcal {D}_{EFE}\). Indeed, \(\delta S_{TPG} = \delta S_{GHY}\) when \(\mathcal {B}_{D}\) is imposed, so we know that both actions share the same dynamical content and the same representational capacity when it comes to isolated subsystems. Important quantities that depend on these boundary terms and conditions such as the ADM mass \(M_{ADM}\) and black hole entropy \(S_{BH}\) are found to be in agreement. For example, \(M_{ADM}\) is one of the quantities to which Penrose referred and represents the mass-energy content of a spacetime. Using \(S_{GHY}\) and \(S_{TPG}\) to determine this quantity gives the same results, which are crucially dependent on the role and behavior of the boundary terms and conditions that we have discussed (Dyer & Hinterbichler, 2009; Wald, 1993; Iyer & Wald, 1994; Hammad, 2019). As Freidel and Teh (2021) have noted, these boundary terms can also effectively bring the Noether charges of a theory into alignment with the corresponding Hamiltonian charges (i.e., the ADM mass), which connects such quantities to Hamiltonian observables. Coming to black hole entropy \(S_{BH}\), one can use the Euclidean semi-classical path integral approach and find that one obtains identical results for this quantity, with the boundary terms present in both \(S_{TPG}\) and \(S_{GHY}\) contributing the entire entropy in the calculation (Gibbons & Hawking, 1977; Gibbons et al., 1978; Oshita & Wu, 2017).
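
The bookkeeping of this section can be collected in one place. The display below only restates, in the notation already introduced, which subset of KPMs each action has been argued to pick out:

$$\begin{aligned} S_{EH}&\;\longmapsto \; \mathcal {B}_{N} \cap \mathcal {D}_{EFE},&S_{GHY}&\;\longmapsto \; \mathcal {B}_{D} \cap \mathcal {D}_{EFE},&S_{TPG}&\;\longmapsto \; \mathcal {B}_{D} \cap \mathcal {D}_{EFE}. \end{aligned}$$

Comparing \(S_{EH}\) with \(S_{TPG}\) therefore compares different subsets of KPMs, whereas comparing \(S_{GHY}\) with \(S_{TPG}\) compares like with like.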

How could one go about arguing for theoretical equivalence of GR and TPG given our characterization of the semantic view that includes KPMs, DPMs, and BPMs? One way would involve taking inspiration from the characterization of equivalence found in Nguyen (2017), which he has dubbed ‘representational equivalence’. This would mean showing that models from both GR and TPG can represent the same target systems, and that they make the same empirical claims about these target systems. We have already partially done that by showing that \(S_{GHY}\) and \(S_{TPG}\) coincide in the target subsystems they can represent and by discussing how they align in the empirical claims they make about boundary-dependent phenomena that go beyond the shared dynamics of all these models. One could similarly investigate other actions, models, and isolated subsystems in both GR and TPG and ensure that they align in both representational capacity and empirical claims. This still leaves open the admittedly more difficult interpretative questions regarding whether GR and TPG license all of the same interpretive claims about the world and their target systems, but it at least provides a straightforward path to perspicuously demonstrating their empirical equivalence.

Our conception of a theory should specify the empirical content of the theory. KPMs define the objects of interest to us within a particular theory, but we would not say that defining a theory exclusively in terms of KPMs is satisfying because it plainly fails to specify empirical content. We also want to specify how these objects interact with each other and behave empirically. DPMs specify their dynamics. However, as the above examples demonstrate, dynamics does not constitute the full extent of the empirical content of these models. We also want to specify the subsystem-environment decompositions that these models can represent, as well as any boundary-related empirical content that goes beyond the dynamics of these objects. Just as KPMs are insufficient to fully specify a theory’s empirical content, so too are DPMs alone: the latter should be supplemented with BPMs to more fully specify the empirical content of a theory.

Coming back to the issues of empirical and theoretical equivalence, it is clear that one’s conception of theory content and structure will have a non-trivial impact on any subsequent adjudication of theoretical equivalence. The identification of GR’s content with the dynamics resulting from the Einstein-Hilbert action and of EM\(_1\)’s content with a Faraday tensor obeying Maxwell’s equations does not fully specify the empirical content of those theories, and thus is responsible for incorrect adjudications of empirical equivalence when compared with their allegedly equivalent counterparts. Both Knox and Weatherall do make some qualifying statements. Knox (2011, p. 272) notes that the local equivalence of the TPG and EH actions up to a divergence may lead to some global worries, while Weatherall (2016, p. 1078) notes that he stipulates that the empirical content of EM is exhausted by Faraday tensors compatible with Maxwell’s equations. Yet, it is clear that in both cases, there are indeed global worries that render their adjudications problematic and that these qualifying statements do not do justice to the empirical content that is lost when one looks exclusively at local dynamics. Overall, as we have argued above, their analyses and conclusions can still obtain provided that these further considerations are accounted for. However, it is important to acknowledge both that these models require additional specifications beyond the equations of motion in order to generate the totality of their empirical content and that this is a relevant consideration in adjudicating equivalence.

4.3 The pragmatic view

We can also draw from the pragmatic view of theories to illuminate these adjudications of theoretical equivalence as well as the importance of considering carefully one’s view of theory structure. Rather than totally repudiating the syntactic and semantic views, the pragmatic view acknowledges the utility of many of the formal components of these other perspectives, while also emphasizing non-formal considerations. While there is significant variety amongst proponents of this view (Cartwright, 1983; Hacking, 1983; Kitcher, 1993; Winther, 2021), two strands of thought stand out as particularly relevant to the present discussion: (i) model pluralism and (ii) focus on scientific practice.

On (i): Cartwright claims that models, rather than theories, are the appropriate level of scientific investigation and argues that there are many different but legitimate reasons to utilize different models. “Models serve a variety of purposes, and individual models are to be judged according to how well they serve the purpose at hand” (Cartwright, 1983, p. 152). One model might be focused on accuracy for a particular quantity, while another might be trying to incorporate additional phenomena into the description and, consequently, might be less focused on maximizing the accuracy of any one particular quantity.

This point is made quite generally, but we can see something similar going on in GR. We have already encountered two actions used in GR, \(S_{EH}\) and \(S_{GHY}\), but there are others, including, but not limited to, the \(\Gamma \)-\(\Gamma \) action

$$\begin{aligned} S_{\Gamma \Gamma } = \frac{1}{16 \pi G} \int d^4 x\sqrt{-g} g^{\mu \nu }\left( \Gamma _{\mu \beta }^{\alpha } \Gamma _{\alpha \nu }^{\beta }-\Gamma _{\mu \nu }^{\alpha } \Gamma _{\alpha \beta }^{\beta }\right) \end{aligned}$$
(19)

and the ADM action

$$\begin{aligned} S_{ADM} = \frac{1}{16 \pi G} \int d^4 x\sqrt{-g} (\tilde{R} + K^{\mu \nu }K_{\mu \nu } - K^2), \end{aligned}$$
(20)

where \(\tilde{R}\) is the three-dimensional Ricci scalar of the spatial slice in the \(3+1\) decomposition of the ADM formulation and \(K_{\mu \nu }\) is the extrinsic curvature, with trace \(K\). \(S_{\Gamma \Gamma }\) turns out to be especially convenient for demonstrating that GR corresponds to the self-coupling of a massless spin-2 particle, owing to the cubic form of the Lagrangian, which is analogous both to Yang-Mills fields and spin-1 particles and to chiral fields and spin-0 particles (Deser, 1970, 1987). Additionally, a \(3+1\) decomposition like that used in \(S_{ADM}\) is particularly important because, among many other benefits, it is useful for solving initial value problems, as it allows us to mathematically formulate the Einstein equations as “a Cauchy problem with constraints” (Gourgoulhon, 2007, pp. 11–12). Consequently, it has become the foundation for most approaches in numerical relativity. Whether we choose an action based on convenience, clarity, or necessity, there are many options at our disposal for modeling phenomena in GR. Under this pragmatic approach of embracing model pluralism, it is clear that GR is much broader than the dynamical content of any one of these actions alone and that any adjudication of theoretical equivalence would need to address this broader scope.
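
To make concrete the sense in which such actions share their dynamics while differing in boundary content, the relation between \(S_{EH}\) and \(S_{\Gamma \Gamma }\) can be sketched as follows. This is an illustrative sketch under stated assumptions: we assume the normalization \(S_{EH} = \frac{1}{16 \pi G} \int d^4 x \sqrt{-g} R\) and the convention \(R_{\mu \nu } = \partial _{\alpha } \Gamma _{\mu \nu }^{\alpha } - \partial _{\nu } \Gamma _{\mu \alpha }^{\alpha } + \Gamma _{\mu \nu }^{\alpha } \Gamma _{\alpha \beta }^{\beta } - \Gamma _{\mu \beta }^{\alpha } \Gamma _{\alpha \nu }^{\beta }\), and the symbol \(w^{\alpha }\) is shorthand introduced only for this illustration.

$$\begin{aligned}
% Assumptions: S_EH normalized as (1/16 pi G) \int d^4x sqrt(-g) R; w^alpha is illustrative shorthand.
\sqrt{-g} R&= \sqrt{-g} g^{\mu \nu }\left( \Gamma _{\mu \beta }^{\alpha } \Gamma _{\alpha \nu }^{\beta }-\Gamma _{\mu \nu }^{\alpha } \Gamma _{\alpha \beta }^{\beta }\right) + \partial _{\alpha }\left( \sqrt{-g} w^{\alpha }\right) , \qquad w^{\alpha } = g^{\mu \nu } \Gamma _{\mu \nu }^{\alpha } - g^{\alpha \mu } \Gamma _{\mu \beta }^{\beta }, \\
S_{EH}&= S_{\Gamma \Gamma } + \frac{1}{16 \pi G} \int d^4 x \partial _{\alpha }\left( \sqrt{-g} w^{\alpha }\right) . \end{aligned}$$

On these assumptions, the two actions differ only by the integral of a total divergence, which contributes solely at the boundary of the integration region. Variations that vanish near the boundary therefore yield the same equations of motion for both actions, while the boundary term itself is precisely the kind of content that an analysis restricted to bulk dynamics leaves out.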

Another theme that the pragmatic view emphasizes, point (ii) above, is that our view of theories should be commensurate with scientific practice. While acknowledging the utility of formal criteria, Teh (forthcoming, p. 7) has argued that a theory should be more properly viewed as a collection of physical representations, “accompanied by a keen ‘know how’ about what we can do with such representations and how they are related to each other.” This emphasis on ‘know how’ implores us to consider scientific practice in specifying the structure of theories and has indeed been a major focus of advocates for the pragmatic view (Hacking, 1983; Kitcher, 1993). Clearly, practitioners of GR use many different dynamically equivalent actions depending on the problem at hand, but this discussion also highlights how boundary phenomena have become more relevant in both the physics and philosophy communities in recent years. As we have already noted, physicists have been exploring boundary phenomena and isolated subsystems, with examples including edge modes in the quantum Hall effect, black hole entropy, and slip/no-slip boundary conditions in fluid flow, while philosophers have been interested in them as a way to cash out the direct empirical significance of symmetries and the explanatory capabilities of models. Furthermore, it is worth emphasizing that these examples in physics feature novel phenomena that become apparent only when we consider boundary content, since descriptions of these phenomena are not available from bulk dynamics alone. Consequently, our views on theory structure should be updated to accommodate these kinds of empirical phenomena.

Here, we see a potential connection between the semantic and pragmatic approaches. While the pragmatic view does emphasize non-formal elements of modeling and theory structure, its embrace of pluralism also allows it to accommodate a variety of strategies in describing theory structure, including the use of more formal notions. Indeed, some philosophers have even argued that “the semantic conception in its bare minimal expression” is very compatible with “pragmatic elements and themes” (Suárez, 2019, p. 348). We can thus rely on pragmatic considerations such as scientific practice to inform us of what structures should find their way into formal representations of the models in our theories. Before the theoretical and empirical importance of boundaries was truly appreciated, it might have made more sense to view a theory exclusively in terms of its dynamics and mathematical objects. However, as scientific practice (and philosophical interest) has changed and brought these boundary phenomena more into focus, it now makes sense to adjust our views on the structure of theories to be commensurate with scientific practice. As we saw in the previous section, one can easily accommodate boundary conditions within a traditional semantic analysis of a theory.

5 Consequences and conclusions

Discussions concerning both the equivalence and the structure of physical theories have been and will continue to be important themes in the philosophy of science. As we have seen (following Barrett (2019)), each of these questions bears upon the other because adopting a particular standard of equivalence will necessarily specify a view of what the contentful features of a theory actually are; and similarly, adopting a particular view of theory content or structure will necessarily set the standard by which equivalence is to be judged.

The aforementioned examples in the literature regarding the supposed theoretical equivalence between EM\(_1\) and EM\(_2\) and between TPG and GR both illustrate that these questions do indeed interact with each other and suggest that these questions need to be tackled in parallel. In navigating these issues surrounding theory equivalence and structure, we take one moral from this discussion to be that adopting a pragmatic attitude towards theory structure can be very fruitful. Indeed, we saw that in both examples considered, the failure of empirical equivalence arose from the authors adopting views of theory structure that, while useful and consistent with a fairly standard view in the philosophy literature, used formal criteria that were overly restrictive regarding the empirical substructures that one could attribute to the theories. Thus, additional empirical content related to boundary phenomena and isolated subsystems did not make its way into the analysis.

However, the pragmatic view can help bring these discussions of equivalence and structure into alignment. As we have seen in these examples, the pragmatic view indicates that we should be pluralistic regarding our representations of models and theories, as well as update the components we consider when utilizing formal descriptions of theory structure by supplementing the standard semantic representation with boundary conditions. In so doing, we can construct an argument for the equivalence of TPG and GR that also reflects the full richness of the empirical content that these theories are currently understood to possess. While the example of EM\(_1\) and EM\(_2\) is not quite as dramatic given that the interpretational issues are not generally considered to be quite as difficult, there is something similar going on. When boundary conditions are included in the formal criteria that describe theory structure, it is clear that EM\(_1\) and EM\(_2\) are equivalent and that the issue merely stemmed from adopting an overly restrictive view of the empirical content contained within the formal descriptions. Furthermore, this pragmatic attitude provides flexibility in that it allows us to continuously update our understanding of theory structure as previous empirical substructures become better understood and novel empirical substructures come into view. In the context of these many empirical realizations surrounding boundary phenomena and their increased importance to both physicists and philosophers, it is clear that such an update is needed and that boundary conditions and phenomena must be considered in discussions of theory structure and equivalence.