All the previous accounts of symmetry have one thing in common: they are all instances of global symmetry principles. This means that the symmetry transformations are the same at all points in space-time. The introduction of a new twist on the idea of symmetry in 1918 unlocked even greater powers of this abstract formalism to probe reality, paving the way for a novel type of field theory to flourish. Today, the tremendous success of the mathematical framework underlying the standard model, providing a unified and overarching theory of all non-gravitational forces, can be understood to rest on the insights gained from what is known as gauge theory. The idea fueling this novel approach is related to a new kind of symmetry, called gauge symmetry. It is a local symmetry, meaning that the parameters of the transformation are now functions of the space-time coordinates \(x^\mu \). This principle was first fully formulated, independently, by Hermann Weyl and Emmy Noether in the same year (Brading 2002). However, the course of the history of gauge theory, and in parallel the road to unification, would take meandering paths.

1 Back to Geometry: The Principle of Covariance

Einstein’s theory of general relativity, sketched in (3.14), is an extremely elegant and aesthetic physical theory. It is based on two very subtle principles, a physical and a mathematical requirement.

The physical principle is known as the equivalence principle. Sometimes, seemingly innocuous observations have the power to help uncover deep truths about the workings of nature. Say, a ball rolling in a toy wagon or a spinning bucket. To quote from Matthews (1994):

As a child, the Nobel Prize-winning physicist Richard Feynman asked his father why a ball in his toy wagon moved backward whenever he pulled the wagon forward. His father said that the answer lay in the tendency of moving things to keep moving, and of stationary things to stay put. “This tendency is called inertia,” said Feynman senior. Then, with uncommon wisdom, he added: “But nobody knows why it is true.”

Inertia, the measure of a body’s resistance to acceleration, is encoded in Newton’s second law, which relates the force F to the resulting acceleration \(a=\ddot{x}\) and reads \(F=m_{\text {i}} a\). The mass term \(m_{\text {i}}\) appearing in this equation is called inertial mass. This is to distinguish it from the mass term appearing in Newton’s law of universal gravitation,Footnote 1 called gravitational mass \(m_{\text {g}}\). A simple experiment, going back to Newton, is the following. A bucket partly filled with water is hung from a long cord and rotated many times until the cord becomes strongly twisted. When the bucket is released, after ensuring the water is at rest, it rotates in the other direction due to the cord untwisting. Slowly the water begins to rotate with the bucket, and as it does so the water moves to the sides of the bucket. In effect, the surface of the water becomes concave. This effect is not due to the water spinning relative to the bucket, as, at some point, the bucket and the water are spinning at the same rate while the surface stays concave. Again, the question of inertia emerges. Why should the surface of the water bulge? What is the origin of this effect? One explanation was proposed by the philosopher and physicist Ernst Mach. He attributed the source of inertia to the whole matter content of the universe, an idea today referred to as Mach’s principle (Misner et al. 1973). This principle guided Einstein in his formulation of general relativity (Penrose 2004, p. 753). In his equivalence principle, Einstein asserted that the gravitational mass \(m_{\text {g}}\) is equivalent to the inertial mass \(m_{\text {i}}\). In other words, the acceleration a body experiences due to the pull of the gravitational force is independent of the nature of the body. The insight leading to the postulation of this principle Einstein would later call “the happiest thought of my life” (Thorne 1995, p. 97).
This thought was the following, quoting Einstein in Thorne (1995, p. 96f.):

I was sitting in a chair in the patent office at Bern, when all of a sudden a thought occurred to me: “If a person falls freely, he will not feel his own weight.”

In effect, the principle of equivalence states that there is no local way of knowing if one is feeling the effect of gravitational pull or the force due to acceleration. So a freely falling observer will not detect any traces of gravity in her local reference frame, and only the laws of special relativity apply. Einstein soon derived two testable consequences of the equivalence principle, namely that gravity bends light, and that the frequency of radiation varies with the strength of gravity (Torretti 1999, p. 290). Unfortunately, it was later shown that Mach’s principle is not actually incorporated in general relativity (Penrose 2004, p. 753), and still today the origins of inertia are puzzling (Matthews 1994). Thus, seemingly obvious and uncontroversial aspects of reality can have very deep and mysterious connotations.

Leaving the physical world and returning to the realm of mathematical abstractions, Einstein required another principle to base general relativity on. This mathematical requirement is intimately interwoven with the ideas of invariance and symmetry, and is called the principle of (general) covariance. In a nutshell, it states that the contents of physical theories should be independent of the choice of coordinates needed to make explicit calculations. In accordance with the insights gained from analyzing symmetry transformations, the equations of general relativity are invariant under general (differentiable) coordinate transformations: they are covariant. Lorentz transformations, seen in (3.38), can also be understood as coordinate transformations. This means that they no longer imply a change in the physical system, but now relate to the choice of the coordinate system used for labeling and measuring abstract vectors and tensors. As an example, some manipulations on the vector \(\varvec{a} \in \mathbb {R}^3\) only become possible once a coordinate system is chosen, relative to which the vector components \(a_1\), \(a_2\), and \(a_3\) can be assigned numbers. For instance, if \(\varvec{a} = (1, 1, 1)\) is one manifestation, then a 45\(^{\circ }\) rotation around the \(x_3\)-axis of the coordinate system reveals \(\varvec{a}^\prime = (\sqrt{2}, 0, 1)\). As this is still the same abstract entity, its properties, such as the length,Footnote 2 must stay unchanged: \(|\varvec{a}| = \sqrt{3} =|\varvec{a}^\prime |\). Covariance may not appear like a particularly profound insight into the workings of nature, as one could argue that these are common-sense requirements for a physical theory. However, the ramifications are far-reaching and profound.
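The rotation example can be checked numerically. The following is a minimal Python sketch, not part of the original text; the helper names `rotate_axes_about_x3` and `length` are hypothetical and chosen only for illustration. It rotates the coordinate axes (not the vector) by 45 degrees about the \(x_3\)-axis and compares the lengths before and after.

```python
import math

def rotate_axes_about_x3(vec, angle_deg):
    """Components of `vec` relative to axes rotated by `angle_deg` about the x3-axis."""
    t = math.radians(angle_deg)
    a1, a2, a3 = vec
    return (math.cos(t) * a1 + math.sin(t) * a2,
            -math.sin(t) * a1 + math.cos(t) * a2,
            a3)

def length(vec):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in vec))

a = (1.0, 1.0, 1.0)
a_prime = rotate_axes_about_x3(a, 45.0)

print(a_prime)                     # the components change with the coordinate system
print(length(a), length(a_prime))  # the length is the same in both systems
```

The components depend on the chosen axes, while the length, a property of the abstract vector, does not.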

Formally, general coordinate transformations in four-dimensional space-time are defined as follows

$$\begin{aligned} x^\mu \rightarrow x^{\prime \mu } = x^\mu + \xi ^\mu (x^\nu ), \end{aligned}$$
(4.1)

where \(\xi ^\mu \) is some smooth function of the coordinates. This can be rephrased infinitesimallyFootnote 3 in general terms for a vector \(\text {d} x^\mu \) as

$$\begin{aligned} \text {d} x^\mu \rightarrow \text {d} x^{\prime \mu } = \frac{\partial x^{\prime \mu }}{\partial x^{ \nu }} \text {d} x^\nu , \end{aligned}$$
(4.2)

where the new set of general coordinates is denoted by the prime symbol, and \(x^{\prime \mu } = x^{\prime \mu } ( x^\nu )\) describes the same point in space-time as \(x^\nu \). These transformations are represented by elements of \(GL(4,\mathbb {R})\), i.e., invertible real \(4 \times 4\) matrices:

$$\begin{aligned} \varDelta ^{\mu ^\prime }_{\ \nu } := \frac{\partial x^{\prime \mu }}{\partial x^{ \nu }}. \end{aligned}$$
(4.3)

It should be noted that

$$\begin{aligned} \varDelta ^{\mu ^\prime }_{\ \nu } \varDelta ^{\nu }_{\ \lambda ^\prime } = \delta ^{\mu ^\prime }_{\ \lambda ^\prime } =\delta ^{\mu }_{\ \lambda } =\varDelta ^{\mu }_{\ \nu ^\prime } \varDelta ^{\nu ^\prime }_{\ \lambda }, \end{aligned}$$
(4.4)

employing Kronecker’s delta. These transformation matrices can be used to define vectors and tensors. In other words, the details of how an object transforms covariantly under general coordinate transformations render it a vector or a tensor. As an example, a second-rank tensor transforms as

$$\begin{aligned} T^{\prime \nu }_{\ \ \mu } = \varDelta ^{\nu ^\prime }_{\ \sigma } \varDelta ^{\rho }_{\ \mu ^\prime } T^{\sigma }_{\ \rho }. \end{aligned}$$
(4.5)

Although the placement of the indices in the subscript or superscript is related to the details of the transformation properties,Footnote 4 these technicalities are irrelevant for this discussion. It suffices to recall that the metric tensor can be utilized to lower or raise indices, e.g., \(A_\mu = g_{\mu \nu } A^\nu \). The metric also transforms as a second-rank tensor:

$$\begin{aligned} g^{\prime }_{\mu \nu } = \varDelta ^{\sigma }_{\ \mu ^\prime } \varDelta _{\ \nu ^\prime }^{\rho } g_{\sigma \rho }. \end{aligned}$$
(4.6)

Looking at the transformation properties of a derivative of a vector \(\partial _\mu A^\nu \), with \(\partial _\mu := \partial / \partial x^\mu \), one finds

$$\begin{aligned} \begin{aligned} \partial ^\prime _\mu A^{\prime \nu }&= \left( \varDelta ^\sigma _{\ \mu ^\prime } \partial _\sigma \right) \left( \varDelta _{\ \rho }^{\nu ^\prime } A^\rho \right) \\&= \varDelta _{\ \mu ^\prime }^\sigma \left( \partial _\sigma \varDelta ^{\nu ^\prime }_{\ \rho } \right) A^\rho + \varDelta _{\ \mu ^\prime }^\sigma \varDelta ^{\nu ^\prime }_{\ \rho } \left( \partial _\sigma A^\rho \right) , \end{aligned} \end{aligned}$$
(4.7)

using the product rule. The first term in the second line of the equation breaks the transformation law for a second-rank tensor. In order to restore covariance, a new kind of derivative is introduced, called the covariant derivative

$$\begin{aligned} \nabla _\mu A^{ \nu } := \partial _\mu A^{ \nu } - \Gamma ^{\nu }_{\ \mu \lambda } A^\lambda , \end{aligned}$$
(4.8)

where \(\Gamma ^{\nu }_{\ \mu \lambda }\) are the Christoffel symbols seen in (3.13), which have the following special transformation properties

$$\begin{aligned} \Gamma ^{\prime \nu }_{\ \mu \lambda } = \varDelta ^{\alpha }_{\ \mu ^\prime } \varDelta ^{\beta }_{\ \lambda ^\prime } \varDelta ^{\nu ^\prime }_{\ \gamma } \Gamma ^{\gamma }_{\ \alpha \beta } + \varDelta ^{\alpha }_{\ \mu ^\prime } \varDelta ^{\beta }_{\ \lambda ^\prime } \left( \partial _\alpha \varDelta ^{\nu ^\prime }_{\ \beta } \right) . \end{aligned}$$
(4.9)

Now the covariant derivative can be seen to transform correctly under general coordinate transformations

$$\begin{aligned} (\nabla _\mu A^{\nu })^\prime = \nabla ^\prime _\mu A^{\prime \nu } = \varDelta _{\ \mu ^\prime }^\sigma \varDelta ^{\nu ^\prime }_{\ \rho } \left( \nabla _\sigma A^{ \rho } \right) , \end{aligned}$$
(4.10)

where the second term in the transformed Christoffel symbols is responsible for the cancellation of the undesired expression in (4.7). Note that (4.4) was employed for the calculation. See Misner et al. (1973), Peebles (1993), Lawrie (2013).
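To make the transformation matrices more tangible, the following Python sketch evaluates them for a simple case that is an assumption of this example and not taken from the text: the change from Cartesian coordinates \((x, y)\) to polar coordinates \((r, \varphi )\) in the flat two-dimensional plane, rather than in four-dimensional space-time. It checks the inverse relation (4.4) and applies the tensor transformation law (4.6) to the Euclidean metric. All function names are hypothetical.

```python
import math

def jacobian_to_polar(x, y):
    """Delta^{mu'}_nu = d(r, phi)/d(x, y); rows indexed by mu', columns by nu."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def jacobian_to_cartesian(r, phi):
    """Delta^nu_{mu'} = d(x, y)/d(r, phi); rows indexed by nu, columns by mu'."""
    return [[math.cos(phi), -r * math.sin(phi)],
            [math.sin(phi), r * math.cos(phi)]]

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transform_metric(g, inv_jac):
    """g'_{mu nu} = Delta^sigma_{mu'} Delta^rho_{nu'} g_{sigma rho}, cf. (4.6)."""
    return [[sum(inv_jac[s][m] * inv_jac[r][n] * g[s][r]
                 for s in range(2) for r in range(2))
             for n in range(2)] for m in range(2)]

# An arbitrary point away from the origin, where polar coordinates are regular.
x, y = 1.2, 0.7
r, phi = math.hypot(x, y), math.atan2(y, x)

to_polar = jacobian_to_polar(x, y)
to_cartesian = jacobian_to_cartesian(r, phi)

identity = matmul(to_polar, to_cartesian)  # should be the Kronecker delta, cf. (4.4)
g_polar = transform_metric([[1.0, 0.0], [0.0, 1.0]], to_cartesian)

print(identity)
print(g_polar, r * r)  # expected form: diag(1, r^2)
```

In this sketch the transformed metric comes out as \(\text {diag}(1, r^2)\), the familiar line element of the plane in polar coordinates.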

In summary, the innocent requirement that geometric entities, like vectors and tensors, should be independent of their coordinate representation conjures up a novel mathematical machinery. Yet again, the basic operation of taking the derivative is recast in a more general form, bringing with it powerful new properties and relationships. Indeed, it is interesting to note that the Christoffel symbols are associated not only with the curvature of space-time, see (3.13), and the covariant derivatives of (4.8), but also the differential geometric notions of parallel transport and geodesics, a generalization of the idea of a straight line to curved space-time. Additional equations relating to general relativity are (4.47) and (4.56).

Guided by the principles of equivalence and covariance, Einstein was able to formulate the famous geometrodynamic field equations, one of the most aesthetic and accurate physical theories. One experiment confirmed the effect of gravity, as predicted by general relativity, on clocks up to an accuracy of \(10^{-16}\) hertz (Chou et al. 2010). Another experiment measured the “twisting” of space-time, called frame-dragging, due to the rotation of Earth to be \(37.2 \pm 7.2\) milliarcseconds per year. The theoretical value was calculated to be 39.2 milliarcseconds per year (Everitt et al. 2011). This remarkable agreement between experiment and theory is rivaled only by the relativistic quantum field theory of electrodynamics, known as quantum electrodynamics, which won Sin-Itiro Tomonaga, Julian Schwinger, and Feynman a Nobel Prize in 1965. In this theory, the magnetic moment of the electron can be computed. The experimental measurement can be performed with an impressive precision of fourteen digits, in agreement with the theoretical value (Hanneke et al. 2008). For more details on the field equations of general relativity, see Sect. 10.1.2.

2 The History of Gauge Theory

A key feature of general relativity is that it is a local theory. Only local coordinate systems are meaningful. Christoffel symbols describe the effects of transporting geometrical information along curves in a manifold, allowing coordinate systems to be related to each other. In detail, the value of \(\Gamma ^{\nu }_{\ \mu \lambda }\), defined via the metric tensor at each point in space-time, depends on the properties of the gravitational field, allowing the relative “orientation” of local coordinate systems to be compared. Weyl took this idea to the next level (Moriyasu 1983). He wondered if the effects of other forces of nature could be associated with a corresponding mathematical quantity similar to \(\Gamma ^{\nu }_{\ \mu \lambda }\). Weyl was specifically thinking about electromagnetism.

He embarked on a quest that would eventually reveal “one of the most significant and far-reaching developments of physics in this [20th] century” (Moriyasu 1983, p. 1) in 1918, when he was attempting to derive a unified theory of electromagnetism and gravitation (Weyl 1918). The same year Noether published her famous theorems relating symmetry to conserved quantities, Weyl was independently attempting to explain the conservation of the electric charge with a novel local symmetry. He called the invariance related to this new symmetry Eichinvarianz. Although the notion was originally related to invariances due to changes in scale, the English translations of Weyl’s work referred to gauge invariance and gauge symmetry. It would, however, require nearly 50 years for gauge invariance to be rediscovered and reformulated as the powerful theory known today. Indeed, the idea of local gauge symmetry was premature in 1918, when the only known elementary particles were electrons and protons.

In more detail, Weyl proposed that the norm of a physical vector should not be a constant, but depend on the location in space-time. Associated with this, a new quantity, similar to the Christoffel symbols, is required in order to relate the lengths of vectors at different positions.

Formally, invariance is restored again if, in analogy to (4.8), the derivative is replaced with a new kind of derivative, resulting in the cancellation of the unwanted terms

$$\begin{aligned} D_\mu := \partial _\mu - c_1 \Lambda _\mu . \end{aligned}$$
(4.11)

\(D_\mu \) is called the gauge-invariant derivative and \(c_1>0\) is some constant. Note that \(\Lambda _\mu (x^\nu )\) is a new vector field, referred to as the gauge field. Weyl’s great insight was his idea to decode these abstract notions and connect them with electromagnetism.

The equations of electromagnetism can be recast in Minkowski space by introducing the so-called 4-vector potential, defined as

$$\begin{aligned} A_\mu :=(\Phi ,\varvec{A}). \end{aligned}$$
(4.12)

The scalar potential \(\Phi \) and the vector potential \(\varvec{A}\) can be derived from the charge density \(\rho \) and the current density \(\varvec{J}\), respectively. Recall that \(\rho \) and \(\varvec{J}\) appear in (2.4). Similarly to \(A_\mu \), they can be understood as the components of the 4-vector current density, or 4-current \(J_\mu =(\rho , \varvec{J})\). Moreover, both the electric and magnetic fields can be derived from the scalar and vector potentials as

$$\begin{aligned} \varvec{B} = \nabla \times \varvec{A}, \quad \varvec{E} = - \partial _t \varvec{A} - \nabla \Phi , \end{aligned}$$
(4.13)

where \(\nabla \) is defined in (2.2). From the new quantity of (4.12), the Maxwell field-strength tensor, hinted at in (3.7), can be explicitly constructed as

$$\begin{aligned} F_{\mu \nu } = \partial _\mu A_\nu - \partial _\nu A_\mu . \end{aligned}$$
(4.14)

Now Maxwell’s equations can be recovered in two different ways. One way is to insert the electromagnetic Lagrangian

$$\begin{aligned} \mathcal {L}_{\text {EM}} = -\frac{1}{4} F_{\mu \nu } F^{\mu \nu } - J_\mu A^\mu , \end{aligned}$$
(4.15)

into the Euler-Lagrange equations (3.6). Alternatively, Maxwell’s equations can be computed directly from \(F_{\mu \nu }\). The inhomogeneous Maxwell equations (2.4a) and (2.4d) are retrieved by virtue of

$$\begin{aligned} \partial _\mu F^{\mu \nu } = \Box A^\nu - \partial ^\nu (\partial _\mu A^\mu ) = J^\nu . \end{aligned}$$
(4.16)

The new type of derivative, called the d’Alembertian operator, is defined as

$$\begin{aligned} \Box := \partial _\mu \partial ^\mu = \partial ^2 / \partial t^2 - \nabla ^2. \end{aligned}$$
(4.17)

The homogeneous equations (2.4b) and (2.4c) can be derived from

$$\begin{aligned} \partial _\mu \hat{F}^{\mu \nu } = 0, \end{aligned}$$
(4.18)

where \(\hat{F}^{\mu \nu }\) is obtained from \(F^{\mu \nu }\) by substituting \(\varvec{E} \rightarrow \varvec{B}\) and \(\varvec{B} \rightarrow -\varvec{E}\). For more details, see Jackson (1998), Collins et al. (1989).

It turns out that the formulation of electrodynamics in this guise leads to a large redundancy associated with the theory. All the equations related to \(A_\mu \), importantly Maxwell’s equations, are invariant under the following transformation

$$\begin{aligned} A^\prime _\mu = A_\mu + c_2\partial _\mu \chi , \end{aligned}$$
(4.19)

where \(\chi \) is an unspecified scalar function of \(x^\nu \) and \(c_2\) is a constant.
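The redundancy can be illustrated numerically: since \(F_{\mu \nu }\) in (4.14) is antisymmetric in its derivatives, the gradient term in (4.19) drops out. The following Python sketch is only an illustration under stated assumptions: the potential, the function \(\chi \), the evaluation point, and the finite-difference step `H` are arbitrary choices made for this example, and all function names are hypothetical.

```python
import math

H = 1e-5  # step for central finite differences (assumed small enough here)

def potential(x):
    """An arbitrary smooth 4-potential A_mu(t, x, y, z), chosen only for illustration."""
    t, x1, x2, x3 = x
    return [x1 * x2, t * x3, math.sin(x1), t * t]

def grad_chi(x):
    """Analytic gradient d_mu chi of the example function chi = sin(t) x y + z^2."""
    t, x1, x2, x3 = x
    return [math.cos(t) * x1 * x2, math.sin(t) * x2, math.sin(t) * x1, 2.0 * x3]

def gauged_potential(x, c2=1.0):
    """A'_mu = A_mu + c2 d_mu chi, cf. (4.19)."""
    return [a + c2 * g for a, g in zip(potential(x), grad_chi(x))]

def partial(field, mu, nu, x):
    """Central-difference estimate of d_mu field_nu at the point x."""
    up = list(x)
    up[mu] += H
    down = list(x)
    down[mu] -= H
    return (field(up)[nu] - field(down)[nu]) / (2.0 * H)

def field_strength(field, x):
    """F_{mu nu} = d_mu A_nu - d_nu A_mu, cf. (4.14)."""
    return [[partial(field, mu, nu, x) - partial(field, nu, mu, x)
             for nu in range(4)] for mu in range(4)]

point = [0.3, -1.1, 0.8, 2.0]
F = field_strength(potential, point)
F_gauged = field_strength(gauged_potential, point)

# Largest entry-wise difference; zero up to finite-difference and rounding error.
max_diff = max(abs(F[m][n] - F_gauged[m][n]) for m in range(4) for n in range(4))
print(max_diff)
```

The two field-strength tensors agree up to numerical error, although the potentials themselves differ at every point.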

Weyl realized that (4.19) could be understood as a local gauge transformation, associated with the covariant derivative \(D_\mu \). In effect, he identified the potential \(A_\mu \) with the gauge field or gauge boson \(\Lambda _\mu \) appearing in (4.11). Technically, for the equations to work, the constants are required to be \(c_1=ie\) and \(c_2=1\), where e denotes the elementary charge. See, for instance, de Wit and Smith (2014), Peskin and Schroeder (1995). Unfortunately, it was shown by Einstein and others that Weyl’s gauge theory based on changes in scale had failed: it led to conflicts with known physical facts (Vizgin 1994; Moriyasu 1983 and Penrose 2004, Section 19.4). The mathematical observation that Maxwell’s equations are gauge invariant was simply seen as an accident, as there was no deeper interpretation of the phenomena able to shed some light on the issue. The potential \(A_\mu \) was just a ghost in the theory.

However, with the development of quantum mechanics, Weyl could re-apply the idea of gauge invariance in a new context. This gave his gauge theory a new meaning. Note that the wave function, as any plane wave, can be expressed as

$$\begin{aligned} \psi (t,\varvec{x}) = C \exp \left( {i(\varvec{k}\cdot \varvec{x} - \omega t)} \right) , \end{aligned}$$
(4.20)

where C is the amplitude, \(\varvec{k}\) the wave vector, and \(\omega \) represents the wave’s angular frequency. For details, see, for instance, Schwabl (2007). A change of the phase of a wave by the amount \(\lambda \) is related to the transformation \(\exp (i \lambda )\). In quantum mechanics, for the wave function of an electron, this is realized by the transformation

$$\begin{aligned} \psi ^\prime = \exp \left( ie \lambda \right) \psi , \end{aligned}$$
(4.21)

where e is the elementary charge. Weyl’s essential idea was to interpret the phase of the wave function as the new local variable. In other words, the value \(\lambda \) is promoted to \(\lambda (x^\nu )\) in (4.21). Instead of changes in scale, this new local gauge transformation is now interpreted as a change in the phase of \(\psi (t, \varvec{x})\), encoded via \(\lambda \) at various points in space-time (Weyl 1929). From the explicit form of the gauge transformation (4.21), the covariant derivative and the transformation properties of the gauge fields can easily be derived.

The transformation of the derivative of the field is given by

$$\begin{aligned} \left( \partial _\mu \psi \right) ^\prime = \partial _\mu \psi ^\prime = \exp \left( ie \lambda \right) \left( \partial _\mu \psi + ie\partial _\mu \lambda \psi \right) , \end{aligned}$$
(4.22)

utilizing the product rule and the chain rule for the derivative of the exponential function. The term \(ie\partial _\mu \lambda \) due to the local parameter breaks the covariance. Introducing the gauge fields in the covariant derivative as \(D_\mu = \partial _\mu - ie A_\mu \), similarly to (4.11), one finds

$$\begin{aligned} \left( D_\mu \psi \right) ^\prime = \left( \partial _\mu \psi \right) ^\prime - ie\left( A_\mu \psi \right) ^\prime . \end{aligned}$$
(4.23)

By inserting (4.22) into this equation, and noting that \(( A_\mu \psi )^\prime = A^\prime _\mu \psi ^\prime \), it can be shown that the additional term in the covariant derivative cancels the quantity destroying the covariance. However, this is only true if the gauge field transforms as follows

$$\begin{aligned} A^\prime _\mu = A_\mu +\partial _\mu \lambda . \end{aligned}$$
(4.24)
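The cancellation can be verified numerically. The following Python sketch restricts everything to a single coordinate x and uses arbitrary example functions for \(\psi \), \(\lambda \), and \(A\); these choices, the value of the coupling `E`, the step `H`, and all function names are assumptions made for illustration only. It checks that \((D\psi )^\prime = \exp (ie\lambda ) D\psi \) holds when the gauge field transforms as in (4.24), whereas the ordinary derivative picks up the extra term of (4.22).

```python
import cmath
import math

E = 0.3   # coupling e; an illustrative value, not a physical one
H = 1e-5  # step for central finite differences

def psi(x):
    """Example wave function of one coordinate."""
    return (1.0 + 0.5 * x) * cmath.exp(2j * x)

def lam(x):
    """Example local gauge parameter lambda(x)."""
    return math.sin(x)

def dlam(x):
    """Analytic derivative of lambda(x)."""
    return math.cos(x)

def gauge_field(x):
    """Example gauge field A(x)."""
    return x * x

def psi_prime(x):
    """psi' = exp(i e lambda) psi, cf. (4.21)."""
    return cmath.exp(1j * E * lam(x)) * psi(x)

def gauge_field_prime(x):
    """A' = A + d lambda, cf. (4.24)."""
    return gauge_field(x) + dlam(x)

def derivative(f, x):
    """Central-difference estimate of df/dx."""
    return (f(x + H) - f(x - H)) / (2.0 * H)

def covariant_derivative(f, a, x):
    """D f = df/dx - i e A f."""
    return derivative(f, x) - 1j * E * a(x) * f(x)

x0 = 0.7
phase = cmath.exp(1j * E * lam(x0))

lhs = covariant_derivative(psi_prime, gauge_field_prime, x0)
rhs = phase * covariant_derivative(psi, gauge_field, x0)

# The ordinary derivative is not covariant: this residue is i e (d lambda) psi'.
residue = derivative(psi_prime, x0) - phase * derivative(psi, x0)

print(abs(lhs - rhs))  # zero up to numerical error
print(abs(residue))    # clearly non-zero
```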

These calculations finally offered new insights for the interpretation of (4.19). The Schrödinger equation, seen in (3.24), is left unchanged after the two gauge transformations (4.21) and (4.24), with \(\lambda =\lambda (x^\nu )\). Despite offering a clear meaning for the new local variables \(\lambda \), it was still believed that the potential \(A_\mu \) had no physically measurable effects. It took nearly thirty years before a simple but ingenious idea uncovered a possible observable effect due to the potential (Aharonov and Bohm 1959), promoting \(A_\mu \) to a physical field in its own right. In a sense, it is more fundamental than the electric or magnetic fields. A year later an experimental verification of the Aharonov-Bohm effect was carried out (Chambers 1960). Looking back at these developments, Feynman would remark (quoted in Moriyasu 1983, p. 21):

It is interesting that something like this can be around for thirty years but, because of certain prejudices of what is and is not significant, continues to be ignored.

To summarize, the electromagnetic interactions of charged particles can be understood as a local gauge theory, embedded in the deeper framework of quantum mechanics. Just as the \(\Gamma ^{\nu }_{\ \mu \lambda }\) describe how coordinate systems are related to each other in general relativity, the connection between phase values of the wave function at different points is given by \(A_\mu \), just as Weyl had originally envisioned. The link to the global symmetry transformations discussed previously is the following. Recalling that (3.29) describes the transformation properties of a quantum field under a group action, the formula given in (4.21) can be understood as a special case thereof. If the variable \(\lambda \), parameterizing the symmetry transformations, were a constant, (4.21) would reveal the transformation property of the field \(\psi \) under a global U(1) symmetry.Footnote 5 The simple mathematical trick of letting the parameter \(\lambda \) become space-time dependent is responsible for the transition between the global and the local symmetry. In other words, and in the general case where the parameters of the symmetry group are not restricted to being scalars as seen in (3.27), the notion of “gauging the symmetry” is the straightforward substitution

$$\begin{aligned} \theta _k \rightarrow \theta _k (x^\nu ). \end{aligned}$$
(4.25)

Adding this small degree of freedom to the mathematical machinery has profound consequences.

Re-expressing (3.29) as the transformation properties related to a local symmetry yields

$$\begin{aligned} \psi ^{\prime } = \exp \left( \theta _k (x^\nu ) \text {X}^{k} \right) \psi =: \text {U} (x^\nu , \theta _i) \psi , \end{aligned}$$
(4.26)

In plain words, the matrix U is an element of a local symmetry group G, with the group generators represented as matrices \(\text {X}^k\) which satisfy commutation relations (3.19), and the parameters \(\theta _k\) are now gauged. Note that (4.21) is a special case of (4.26).

Again, this subtle change of letting the parameter \(\theta _k\) be a function of \(x^\nu \) results in an additional term in the transformation rules, as now \(\partial _\mu \theta _k \ne 0\). This new term is responsible for breaking the covariance. As discussed above, and in the case of general relativity, in order to restore gauge invariance, a gauge field \(A^k_\mu (x^\nu )\) is required.Footnote 6 The covariant derivative is constructed from these fields. Similarly to (4.11) and (4.8)

$$\begin{aligned} D_\mu := \partial _\mu - A^k_{\mu } \text {X}_k =: \partial _\mu - B_{\mu }, \end{aligned}$$
(4.27)

where \(B_{\mu }\) is a matrix constructed from the gauge fields. By replacing \(\partial _\mu \) with \(D_\mu \), gauge invariance is restored. The transformation properties of \(B_{\mu }\) can be calculated as follows. The requirement of covariance also applies to the transformed covariant derivative. So, utilizing the gauge transformation laws specified in (4.26)

$$\begin{aligned} (D_\mu \psi )^\prime = D^\prime _\mu \psi ^\prime = \text {U} \left( D_\mu \psi \right) . \end{aligned}$$
(4.28)

Inserting (4.27) yields

$$\begin{aligned} \left( \partial _\mu - B^\prime _{\mu } \right) \text {U} \psi = \text {U} \left( \partial _\mu - B_{\mu } \right) \psi . \end{aligned}$$
(4.29)

Noting the product rule for derivatives, and rearranging some terms, the following expression can be found, describing the transformation property associated with the gauge fields

$$\begin{aligned} B^\prime _{\mu } = \text {U} B_\mu \text {U}^{-1} + \left( \partial _\mu \text {U} \right) \text {U}^{-1}. \end{aligned}$$
(4.30)
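The transformation law (4.30) can be tested numerically in a small non-abelian example. The following Python sketch makes several assumptions that are not taken from the text: the group is SU(2), the generators are normalized as \(\text {X}^k = -\tfrac{i}{2} \sigma _k\) with the Pauli matrices \(\sigma _k\), everything depends on a single coordinate x, and the functions chosen for \(\theta \), \(B\), and \(\psi \) are arbitrary. All names are hypothetical. It checks the covariance condition (4.28) for a gauge field transformed according to (4.30).

```python
import cmath
import math

H = 1e-5  # step for central finite differences

SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
# Anti-Hermitian generators X^k = -(i/2) sigma_k (an assumed normalization).
X = [[[-0.5j * entry for entry in row] for row in s] for s in SIGMA]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_vec(a, v):
    return [sum(a[i][k] * v[k] for k in range(2)) for i in range(2)]

def lin_comb(ca, a, cb, b):
    """Entry-wise ca * a + cb * b for 2x2 matrices."""
    return [[ca * a[i][j] + cb * b[i][j] for j in range(2)] for i in range(2)]

def theta(x):
    """Example local parameter theta_1(x); theta_2 and theta_3 are set to zero."""
    return math.sin(x) + 0.5 * x

def U(x):
    """U = exp(theta X^1) = cos(theta/2) 1 - i sin(theta/2) sigma_1."""
    c, s = math.cos(theta(x) / 2), math.sin(theta(x) / 2)
    return [[c, -1j * s], [-1j * s, c]]

def U_inv(x):
    c, s = math.cos(theta(x) / 2), math.sin(theta(x) / 2)
    return [[c, 1j * s], [1j * s, c]]

def B(x):
    """Example gauge-field matrix B = A^2 X_2 + A^3 X_3."""
    return lin_comb(x * x, X[1], math.cos(x), X[2])

def psi(x):
    """Example two-component field."""
    return [cmath.exp(1j * x), 1.0 + 0.3 * x]

def dU(x):
    """Central-difference estimate of dU/dx."""
    return lin_comb(1 / (2 * H), U(x + H), -1 / (2 * H), U(x - H))

def B_prime(x):
    """B' = U B U^{-1} + (dU) U^{-1}, cf. (4.30)."""
    return lin_comb(1, mat_mul(mat_mul(U(x), B(x)), U_inv(x)),
                    1, mat_mul(dU(x), U_inv(x)))

def psi_prime(x):
    """psi' = U psi, cf. (4.26)."""
    return mat_vec(U(x), psi(x))

def cov_deriv(field, conn, x):
    """(d - conn) field, with the derivative taken as a central difference."""
    d = [(p - m) / (2 * H) for p, m in zip(field(x + H), field(x - H))]
    return [di - ci for di, ci in zip(d, mat_vec(conn(x), field(x)))]

x0 = 0.4
lhs = cov_deriv(psi_prime, B_prime, x0)          # D' psi'
rhs = mat_vec(U(x0), cov_deriv(psi, B, x0))      # U (D psi)
error = max(abs(l - r) for l, r in zip(lhs, rhs))
print(error)  # zero up to numerical error
```

Because the example gauge field points along generators that do not commute with \(\text {X}^1\), the term \(\text {U} B_\mu \text {U}^{-1}\) differs from \(B_\mu \) here, in contrast to the abelian case of electromagnetism.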

Infinitesimally, i.e., for small parameter values \(\theta _k (x^\nu ) \ll 1\), the gauge transformation can be expressed as

$$\begin{aligned} \begin{aligned} \text {U} = 1 + \theta _k \text {X}^k + \mathcal {O} (\theta ^2), \\ \text {U}^{-1} = 1 - \theta _k \text {X}^k + \mathcal {O} (\theta ^2). \end{aligned} \end{aligned}$$
(4.31)

From this, an expression for the components of the wave function can be derived, similar to (3.29)

$$\begin{aligned} \psi ^{\prime i} = \psi ^i + \theta _k f^{ki}_{\ \ \ j} \psi ^j. \end{aligned}$$
(4.32)

Recall that the adjoint representation (3.28) employs the structure constants \(f^{ijk}\), which encode the generator matrices \(\text {X}^k\). Switching to the gauge fields, one finds that

$$\begin{aligned} A^{\prime k}_\mu = A^{k}_\mu + f^k_{\ ij} \theta ^i A^{j}_\mu +\partial _\mu \theta ^k. \end{aligned}$$
(4.33)

The details of how the commutation relations for the generators \(\text {X}^k\) and the associated structure constants enter the picture can, for instance, be seen in Cheng and Li (1996, p. 232), de Wit and Smith (2014, p. 408f). Again, the expression (4.24) is found as a special case of (4.33).

In order to link the abstract equations of gauge theory with concrete physically relevant quantities, one introduces a free parameter g into the theoryFootnote 7. The local parameter and the gauge fields are rescaled with this value

$$\begin{aligned} \theta _k \rightarrow g \theta _k, \quad A_\mu ^k \rightarrow g A_\mu ^k. \end{aligned}$$
(4.34)

The resulting effect of this trivial exercise is that the value g appears in the Lagrangian and can be interpreted as the physical coupling strength (de Wit and Smith 2014). This is a number that determines the strength of the interaction associated with the gauge fields. As was seen for the case of electromagnetism, \(g=e\). Essentially, the abstract concepts of the formal representation are enriched by encoding additional measurable aspects of the physical reality domain.

After a long journey through symmetry and geometry, all that remains, due to the requirement of gauge invariance, are transformation properties of the wave function and the gauge field determined solely by the structure constants and the parameter of the local symmetry. Today, such theories are called Yang-Mills gauge theories. They were originally proposed by Chen Ning Yang and Robert Mills in 1954 as a gauge theory describing the strong nuclear interaction (Yang and Mills 1954). Yang and Mills postulated that the local gauge group was SU(2). However, this specific theory for the strong force failed. It was known from experiments that the nuclear force only acted on short ranges. Yang and Mills’ theory, however, predicted that the carrier of the force, the gauge field, would be, like the photon in electromagnetism, long-range. This is because there is no way to incorporate gauge invariant mass terms for the gauge field into the Lagrangian (Moriyasu 1983). Nevertheless, this specific kind of gauge theory laid the foundation for modern gauge theory, culminating in the standard model of particle physics. Unfortunately, the potential power inherent in the formal machinery of gauge theories was not anticipated at the time. Indeed, Freeman Dyson would, eleven years after the introduction of Yang-Mills theory, gloomily remark (quoted in Moriyasu 1983, p. 73):

It is easy to imagine that in a few years the concepts of field theory will drop totally out of the vocabulary of day-to-day work in high energy physics.

Quantum field theory (Sect. 10.1.1) and gauge theory were each plagued, individually, by major problems. While the problem of quantum field theory was a mathematical one, the gauge theory problem was related to symmetry. It was found that a gauge invariant Lagrangian cannot contain mass terms for the gauge fields, as they necessarily break the invariance. So how can a physical system with mass be described by a gauge theory and still have properties which violate gauge invariance? The mathematical problem was, in detail, related to infinities appearing in the framework. Quantum field theory is based on perturbation theory, the idea of taking the solution to an easier problem and then adding corrections to approximate the real problem. Unfortunately, the perturbation series are divergent, assigning infinite values to measurable quantities.

It is obvious that, in order for a field theory to be at all sensible or believable, the problems raised by the divergences must be satisfactorily resolved.

Quote from Ryder (1996, p. 308). It is ironic that at a time when experimental physics had entered a golden era, theoretical efforts, after so many promising findings, would dwindle and “the practice of quantum field theory entered a kind of ‘Dark Age’” (Moriyasu 1983, p. 85). However, due to new technological advances, epitomized by the high-energy particle accelerator, more and more particles were discovered. Simply organizing these was a challenge. As an example, Murray Gell-Mann and others introduced new fermions, which they called quarks (Gell-Mann and Ne’eman 1964). Now it was possible to categorize many of the observed particles as being composed of quarks. The quarks themselves are representations of the global symmetry group SU(3). Gell-Mann called this classification scheme the Eightfold Way. Although alluding to the Noble Eightfold Path of Buddhism, the reference is “clearly intended to be ironic or humorous” (Kaiser 2011, p. 161).

Bearing witness to the tremendous success of deciphering the workings of reality in mathematical terms, the mathematical obstacles were overcome. The theory of renormalization, first developed for quantum electrodynamics, is a collection of techniques for dealing with the infinities of perturbative quantum field theory. The divergent parts of the theory can be tamed: the infinities are viewed as rescaling factors which can be ignored. In more detail, the mathematical manipulations related to these scale transformations can be understood in terms of what is called the renormalization group. Wrapping it all up (Peskin and Schroeder 1995, p. 466):

The qualitative behavior of a quantum field theory is determined not by the fundamental Lagrangian, but rather by the nature of the renormalization group flow and its fixed points. These, in turn, depend only on the basic symmetries that are imposed on the family of Lagrangians that flow into one another. This conclusion signals, at the deepest level, the importance of symmetry principles in determining the fundamental laws of physics.

General references are Peskin and Schroeder (1995), Ryder (1996), Cheng and Li (1996).

The solution to the problem of incorporating mass terms into a gauge-invariant theory is discussed in the following section. The details require a journey deep into the undergrowth of the abstract world.

2.1 The Higgs Mechanism

The Higgs mechanism is the mathematical machinery that allows massless gauge invariant Lagrangians to collect mass terms for their quantum fields via the notion of spontaneous symmetry breaking. It is an elaborate mathematical trick used in the standard model to regain the physical mass terms in the most “natural” way possible.

It is a formalism related to a scalar field \(\phi \) described by \(\mathcal {L}_{\text {Higgs}}\), seen in (3.11). The mass parameter \(m_H\) is implicit in the scalar potential \(\mathcal {V}\), which, taking the most general SU(2)-invariant form, is derived to be

$$\begin{aligned} \mathcal {V} (\phi ) = m_H^2 \bar{\phi } \phi + \lambda _H (\bar{\phi } \phi )^2, \end{aligned}$$
(4.35)

with a dimensionless coupling \(\lambda _H\) and \(\bar{\phi }\) denoting the Hermitian conjugate. In perturbation theory, \(\phi \) is expanded around the minimum of \(\mathcal {V}\), i.e.,

$$\begin{aligned} \left. \frac{\partial \mathcal {V}}{\partial \phi } \right| _{\langle \phi \rangle } = 0, \end{aligned}$$
(4.36)

where the vacuum expectation value of the field is defined as \(\langle \phi \rangle := \langle 0| \phi | 0 \rangle \). This specifies the vacuum state of the theory. The mass parameter \(m^2_H\) in (4.35) is related to spontaneous symmetry breaking. If \(m^2_H >0\), i.e., the parameter \(m_H\) is real, this simply would describe the mass of a scalar spin-0 field. Moreover, the shape of the potential is such that there is a single global minimum at \(\phi = 0\). However, by taking \(m^2_H\) to be negative (Footnote 8), the minimum of \(\mathcal {V}\) is shifted. Now there is a local maximum at \(\phi = 0\) and an infinite number of minima appear at

$$\begin{aligned} \bar{\phi } \phi = |\phi |^2 = -\frac{m_H^2}{2 \lambda _H} = \frac{\mu ^2}{2 \lambda _H} =: v^2, \end{aligned}$$
(4.37)

where \(m_H = i \mu \) is the imaginary mass. In summary, the infinitely many minima are located at \(|\phi | = v\) and the original symmetry is spontaneously broken. This is also associated with a non-zero vacuum expectation value of \(\langle \phi \rangle = v\). Now a new field can be introduced, called the Higgs field. Technically, there exists a non-zero component (Footnote 9) of the scalar field \(\phi \), such that \(\langle \phi _i \rangle = v\). The new Higgs field, associated with a Higgs boson, is defined as

$$\begin{aligned} h(x^\nu ) = \phi _i(x^\nu )-v. \end{aligned}$$
(4.38)

This can be interpreted as follows. In perturbative field theory, a scalar field \(\phi \) is expanded about some minimum of the associated potential \(\mathcal {V}(\phi )\). If a minimum with non-zero vacuum expectation value is chosen, the physical Higgs particle is interpreted as quantum fluctuations of \(\phi _i\) about the value v. In essence, the Higgs field “plays the role of a new type of vacuum in gauge theory” (Moriyasu 1983, p. 120). Formally, replacing \(\phi _i\) in the appropriate places in the Lagrangian with \(h+v\) yields the much-awaited mass terms, which appear due to the value v entering the mathematical machinery.
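The shift of the minimum described by (4.35)–(4.38) can be checked numerically. The following is a minimal sketch, assuming the potential is treated as a function of the modulus \(|\phi |\) only; the parameter values and the helper names `potential` and `scan_minimum` are illustrative assumptions, not quantities taken from the standard model.

```python
import math

def potential(phi_abs, m2, lam):
    """Potential (4.35) as a function of |phi|; m2 stands for m_H^2."""
    return m2 * phi_abs**2 + lam * phi_abs**4

def scan_minimum(m2, lam, phi_max=5.0, steps=100_000):
    """Brute-force grid scan over |phi| in [0, phi_max] for the lowest V."""
    best_phi, best_val = 0.0, potential(0.0, m2, lam)
    for k in range(1, steps + 1):
        phi_abs = phi_max * k / steps
        val = potential(phi_abs, m2, lam)
        if val < best_val:
            best_phi, best_val = phi_abs, val
    return best_phi

# Illustrative (assumed) parameter values in arbitrary units.
LAMBDA_H = 0.5
M2_POS = 1.0    # m_H^2 > 0: symmetric phase
M2_NEG = -2.0   # m_H^2 < 0: spontaneously broken phase

phi_unbroken = scan_minimum(M2_POS, LAMBDA_H)
phi_broken = scan_minimum(M2_NEG, LAMBDA_H)

# Prediction of (4.37): v^2 = -m_H^2 / (2 lambda_H)
v_predicted = math.sqrt(-M2_NEG / (2 * LAMBDA_H))

# The Higgs field (4.38) measures the deviation from v; it vanishes at the minimum.
h_at_minimum = phi_broken - v_predicted

print(phi_unbroken, phi_broken, v_predicted, h_at_minimum)
```

For positive \(m_H^2\) the scan stays at \(|\phi | = 0\), while for negative \(m_H^2\) it lands, up to the grid resolution, on the value of v given by (4.37).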

The Higgs scalar \(\phi \) appears in the standard model Lagrangian via the (Yukawa) coupling to the fermions, seen in (3.13). The covariant derivative \(D_\mu \), seen in (3.11), defines the kinetic quantities and takes the following form

$$\begin{aligned} \mathcal D_\mu := \partial _\mu + i g W^i_\mu \tau _i +i g^\prime B_\mu Y. \end{aligned}$$
(4.39)

Here g and \(g^\prime \) are the coupling constants introduced in (4.34). The terms \(\tau _i\) and Y are the generators of the symmetry groups SU(2) and U(1), respectively. Finally, \(W^{i}_{\mu }\) and \(B_\mu \) are the gauge fields associated with the corresponding symmetry groups, and \(i=1,2,3\). The gauge-invariant Lagrangian, containing the field-strength tensors, reads

$$\begin{aligned} \mathcal {L} = -\frac{1}{4} B_{\mu \nu } B^{\mu \nu } -\frac{1}{4} W^i_{\mu \nu } W_i^{\mu \nu }. \end{aligned}$$
(4.40)

The field-strength tensors can be constructed from the gauge fields as follows

$$\begin{aligned} B_{\mu \nu } = \partial _\mu B_\nu - \partial _\nu B_\mu , \end{aligned}$$
(4.41a)
$$\begin{aligned} W^i_{\mu \nu } = \partial _\mu W^i_\nu - \partial _\nu W^i_\mu -g \varepsilon _{ijk} W^j_\mu W^k_\nu , \end{aligned}$$
(4.41b)

where \(\varepsilon _{ijk}\) is the Levi-Civita symbol in three dimensions (Footnote 10). Note that (4.41a) and (4.14) are identical expressions. In the next step, the physical boson fields are constructed from the quantities \(W_\mu ^1, W_\mu ^2 , W_\mu ^3, B_\mu \). This yields the \(W^{\pm }\) bosons (\(W^{\pm }_\mu \)), the Z boson (\(Z_\mu \)), and the photon field (\(A_\mu \)). These gauge bosons are the carriers of the electroweak force. As anticipated, these quantum fields receive mass terms if \(\phi \) is substituted with \(h+v\) from (4.38) in the Lagrangian \(\mathcal {L}_{\text {Higgs}}\), described in (3.11) and (4.35). This is the process of spontaneous symmetry breaking and results in

$$\begin{aligned} m_{W^{\pm }} = \frac{1}{2} v g, \quad m_Z = \frac{1}{2} v \sqrt{g^2 + g^{\prime 2}}, \quad m_\gamma = 0, \end{aligned}$$
(4.42)

without violating the gauge-invariance of the theory. Similarly, the fermions get their mass terms using the same substitution, i.e., also via spontaneous symmetry breaking. But now the part of the Lagrangian describing the coupling of fermions to the Higgs field is employed, seen in (3.13). The result is

$$\begin{aligned} m_{\text {leptons}} = \frac{v \lambda _l}{\sqrt{2}}, \quad m_{\text {quarks}} = \frac{v \lambda _q}{\sqrt{2}}, \end{aligned}$$
(4.43)

where \(\lambda _l\) and \(\lambda _q\) are arbitrary coupling constants. For more details see Collins et al. (1989).
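To get a feeling for the numbers, the mass relations of the type (4.42) and (4.43) can be evaluated directly, with both massive boson masses proportional to v. The following is a rough sketch; the input values \(v \approx 246\) GeV, \(g \approx 0.65\), \(g^\prime \approx 0.36\) and the Yukawa coupling of order one are approximate, assumed values inserted for illustration only, and the function names are hypothetical.

```python
import math

# Approximate, assumed inputs (not taken from the text).
V_GEV = 246.0      # vacuum expectation value in GeV
G = 0.65           # SU(2) coupling g
G_PRIME = 0.36     # U(1) coupling g'

def boson_masses(v, g, g_prime):
    """Tree-level gauge boson masses in the form of (4.42), in units of v."""
    m_w = 0.5 * v * g
    m_z = 0.5 * v * math.sqrt(g**2 + g_prime**2)
    return {"W": m_w, "Z": m_z, "photon": 0.0}

def fermion_mass(v, yukawa):
    """Fermion mass from a Yukawa coupling, in the form of (4.43)."""
    return v * yukawa / math.sqrt(2)

masses = boson_masses(V_GEV, G, G_PRIME)
# A Yukawa coupling of order one (assumed) gives a mass of order v / sqrt(2).
m_heavy_quark = fermion_mass(V_GEV, 1.0)

print(masses, m_heavy_quark)
```

With these inputs the W and Z masses come out at roughly 80 GeV and 91 GeV, and the ratio \(m_W / m_Z = g / \sqrt{g^2 + g^{\prime 2}}\) is independent of v, while the photon remains massless.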

Although the theoretical contraptions, described in (4.35)–(4.43), are today associated with Peter Higgs, there were many contributors. The first discovery of the ideas of symmetry breaking was made in condensed matter physics, namely in the theory of superconductivity, formalized by John Bardeen, Leon Cooper, and Robert Schrieffer. Using quantum field theory techniques (Bardeen et al. 1957), symmetry breaking properties of superconductors were uncovered. This theory of superconductivity would win the authors a Nobel Prize in 1972. Important mathematical details were also gleaned from an earlier phenomenological theory of superconductivity (Ginzburg and Landau 1950). Here, the explicit shape of the scalar potential, seen in (4.35), was introduced, and its critical dependence on the sign of the mass term noted. In 1962, Schwinger discussed gauge invariance and mass (Schwinger 1963). He suggested the following (quoted in Anderson 1963, p. 439):

[...] associating a gauge transformation with a local conservation law does not necessarily require the existence of a zero-mass vector boson.

Building on the works of Schwinger, Philip Warren Anderson spelled out the first accounts of what would later become known as the Higgs mechanism (Anderson 1963). He also incorporated the insights gained from superconductivity. There, in the theory of Bardeen et al. (1957), it was realized that the mechanism of breaking the symmetry was associated with the appearance of a new boson (Nambu 1960). These ideas could be systematically generalized within the context of quantum field theory (Goldstone et al. 1962). Anderson grappled with the technicalities related to the Goldstone theorem, which was a final hurdle in the mass generating mechanism. The term “spontaneous symmetry breaking” was introduced in Baker and Glashow (1962), to account for the fact that the mechanism does not require any explicit mass terms in the Lagrangian to violate gauge invariance. The full model was developed in 1964 by three independent groups (Footnote 11): Englert and Brout (1964), Higgs (1964), Guralnik et al. (1964). However, the names Higgs mechanism and Higgs boson stuck. Indeed, the Nobel Committee, which can honor at most three people with a single prize, awarded the Nobel Prize in 2013 only to François Englert and Higgs, after the 2012 discovery at CERN’s LHC (CERN 2013):

[...] today, the ATLAS and CMS collaborations at the Large Hadron Collider (LHC) presented preliminary new results that further elucidate the particle discovered last year. Having analyzed two and a half times more data than was available for the discovery announcement in July [2012], they find that the new particle is looking more and more like a Higgs boson, the particle linked to the mechanism that gives mass to elementary particles. It remains an open question, however, whether this is the Higgs boson of the Standard Model of particle physics, or possibly the lightest of several bosons predicted in some theories that go beyond the Standard Model. Finding the answer to this question will take time.

A general reference is Gunion et al. (2000). Here the parenthesis closes.

2.2 Tying Up Some Loose Ends

Incidentally, Yang-Mills theory also uncovered a new type of geometry for physics. This understanding only became apparent in the 1970s, and helped in popularizing gauge theories. Interestingly, this new concept in physics of uniting space-time with an “internal” symmetry space had been proposed by mathematicians at nearly the same time. See, for instance, Moriyasu (1983, p. 32), Schottenloher (1995, p. 8). In detail, gauge theories have the topology of a fiber bundle. This means that at every point in space-time a Lie group G is attached; there is an internal symmetry space existing at every space-time coordinate. The group G associated with a point \(x^\nu \) is called a fiber. As a particle moves through space-time, it also follows a path through the internal spaces at each point. The gauge transformations describe how the internal spaces at different points can be transformed into each other. The tangent bundle TM, described in Sect. 3.1.1, is a specific example of a fiber bundle. More details can be found in Drechsler and Mayer (1977), Nash and Sen (1983), Coquereaux and Jadczyk (1988).

Finally, there is one peculiar historical confusion related to Noether and Weyl. It is a good reminder that the devil, as always, is in the details. Many textbooks and review articles on quantum field theory gloss over the fact that Noether actually published two theorems in 1918. The first one famously deals with global symmetries and conserved quantities. However, she also proved a second theorem relating to local symmetry, which, prima facie, has nothing to do with conservation laws. Brading (2002) observes that there is either no, or no detailed, discussion of the second theorem in the literature, for instance O’Raifeartaigh (1997), Vizgin (1994), Kastrup (1984), Moriyasu (1982). Notable exceptions are Utiyama (1959), Byers (1999), Rowe (1999). As mentioned, Weyl, working on his unified field theory of electromagnetism and gravity in 1918, was independently trying to explain the conservation of the electric charge with the notion of a local symmetry. His results, in effect, can be understood as an application of Noether’s second theorem. The confusion arises because “the standard textbook presentation of the connection between conservation of electric charge and gauge symmetry in relativistic field theory involves Noether’s first theorem” (Brading 2002, p. 9). Although these books discuss both local and global symmetries, they do not mention her second theorem. Despite the fact that both ways of deriving the conservation of electric charge, employing local or global symmetries, are correct, the textbook approach via global symmetry is somewhat misleading. There it is implied that the conservation of charge depends on the Euler-Lagrange equations of motion being fulfilled. Noether’s second theorem, and Weyl’s derivation, yield the conservation law based on local symmetry only, without the necessity of the additional constraint due to the equations of motion. See Brading (2002).

The mathematical methods of renormalizing quantum field theories and the spontaneous symmetry breaking mechanism for gauge theories would help pave the way to unification, ultimately unearthing a powerful formalism describing all non-gravitational forces and matter: the standard model of particle physics.

Fig. 4.1

The contents of the physical universe. Matter particles, called fermions due to their half-integer spin, are classified either as quarks or leptons and come in three generations. The six types of quarks are labeled according to their flavor, up (u), down (d), charm (c), strange (s), top (t), and bottom (b) and are the constituents of composite particles (such as protons and neutrons). The muon (\(\mu \)) and tau (\(\tau \)) can be understood as heavier versions of the electron (e), each coming with an associated neutrino (\(\nu \)). The three non-gravitational forces are associated with spin-1 gauge bosons, where the photon (\(\gamma \)) mediates the electromagnetic force, gluons (g) the strong nuclear force, and the Z and \(W^\pm \) bosons the weak force. The Higgs particle (h), a scalar spin-0 boson, is associated with the phenomenon of mass. The graviton (\(\mathcal {G}\)) is the hypothetical quantum particle associated with gravity which, to date, has not been detected. The elementary particles represented by gray circles are massless, and each particle comes with an electric charge, given by the number associated with it on the upper right side. Next to these particles there also exists an elusive mirror world of antiparticles, or antimatter, with identical properties but opposite charge

3 The Road to Unification

The road to unification has been a rocky one. Unification is the epitome of human understanding of reality. What appear as independent phenomena, described by fragmented theories, suddenly become united in a unified framework. It is the ultimate act of translation seen in Fig. 2.1: superficially separate properties of the natural world are encoded and merged into a single formal description. In essence, from the multifarious complexity of nature the formal essence is distilled, a unified theory of phenomena. Such an overarching structure of knowledge has the power to unlock new and unexpected understanding of the workings of nature. This is why, in physics, the ultimate unified field theory describing all fundamental forces and elementary particles is, grandiosely, known as “the theory of everything.” It should be noted, however, that here the context of “everything” excludes emergent complexity, discussed in Chap. 6, and the fact that a conscious entity, the physicist, is doing the inquiring, covered in Chaps. 11 and 14. Nevertheless, this version of the theory of everything tries to explain all observable phenomena related to the fundamental workings of reality. In detail, it should explain all four known forces and describe the behavior of all elementary particles and antiparticles. What this all amounts to can be seen in Fig. 4.1.

In the history of physics there were a few instances where different abstract formalisms representing unrelated aspects of the world could be fused into a single conceptual formalism. For instance, Maxwell’s insight that light was an electromagnetic wave, unifying the fields of optics and electromagnetism. Or the fusion of thermodynamics with statistical mechanics (Gibbs 1884, 1902). In a sense, special relativity can be understood as the merger of electromagnetism with the laws of classical mechanics (Einstein 1905b), and general relativity as the synthesis of inertial and gravitational forces (Einstein 1915).

3.1 Jumping to Higher Dimensions

However, the first unification success regarding the forces of nature goes back to Maxwell. The theory of electromagnetism is a classical unified field theory. As is inherent in its name, the two separate phenomena of electricity and magnetism can be understood as a new single force. Formally, the introduction of the 4-vector potential \(A_\mu \) of (4.12) is enough to derive the following quantities:

  1. The electric and magnetic fields: (4.13).
  2. The corresponding field-strength tensor: (4.14) and (4.41a).
  3. Maxwell’s equations: (2.4).

Although Weyl, as discussed, was successful in spawning the idea of gauge theory, his unification scheme marrying electromagnetism with gravity ultimately failed (Weyl 1918, 1929). However, this approach would eventually lead to the unification of all known forces. Moreover, gauge theory also naturally incorporates matter particles next to particles mediating the interaction. In detail, matter is represented as operator-valued spin one-half fermion quantum fields (spinors) in the Lagrangian, as seen in (3.8), and the force carrying bosonic quantum fields appear by virtue of the gauge-invariant derivative, described in (4.39). This dichotomy between matter and forces was a major problem at the time, as the attempts to unify gravity and the electromagnetic force focused on incorporating matter as classical fields obeying the Schrödinger equation (3.24) or the Dirac equation (3.41) or (3.42). An additional problem was that up to “the 1940s the only known fundamental interactions were the electromagnetic and the gravitational, plus, tentatively, something like the ‘mesonic’ or ‘nuclear’ interaction” (Goenner 2005, p. 303). In effect, lacking the correct quantum field formalism and missing crucial experimental observations, people embarked on the futile quest of unification. To make matters worse, general relativity substitutes the notion of a gravitational field with an elaborate geometry of space-time. To summarize, general relativity is formulated in a (pseudo) Riemannian space-time, with zero torsion and non-vanishing curvature. Torsion and curvature are two natural defining properties of differentiable manifolds, where torsion is related to the twisting of space-time.

Formally, torsion is defined as a tensor

$$\begin{aligned} T(X,Y) = \nabla _X Y - \nabla _Y X - [X,Y], \end{aligned}$$
(4.44)

where X and Y are two vector fields on the manifold and \(\nabla _X\), related to (4.8), computes the covariant derivative of a vector field in the direction of X. The Lie brackets, introduced in (3.19), are now also functions of vector fields. For a basis \(\varvec{e}_i\) one finds

$$\begin{aligned} \nabla _{\varvec{e}_i} \varvec{e}_j = \Gamma ^{k}{}_{i j} \varvec{e}_k, \end{aligned}$$
(4.45)

with the Christoffel symbols seen in (4.9). Thus the components of the torsion tensor are

$$\begin{aligned} T^{k}{}_{i j} = \Gamma ^{k}{}_{i j} - \Gamma ^{k}{}_{j i} - f^{k}{}_{i j}, \end{aligned}$$
(4.46)

for non-vanishing structure constants \(f^{k}{}_{i j}\). For more details, see Nomizu and Sasaki (1994). Similarly, the Riemann curvature tensor is found to be

$$\begin{aligned} R(X,Y) = [\nabla _X, \nabla _Y]- \nabla _{[X,Y]}. \end{aligned}$$
(4.47)

This can also be expressed componentwise as \({R^\rho }_{\sigma \mu \nu }\), the quantity appearing in (3.13), by utilizing the \(\Gamma ^{\rho }{}_{\sigma \mu }\) and their derivatives. See, for instance, Misner et al. (1973, p. 224).
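The component formula (4.46) for the torsion can be made concrete with a small example. The sketch below assumes a coordinate basis, in which the structure constants vanish, and uses the well-known Levi-Civita connection coefficients of the unit 2-sphere as input; the choice of manifold and the function names are assumptions made purely for illustration.

```python
import math

def christoffel_sphere(theta):
    """Levi-Civita coefficients gamma[k][i][j] of the unit 2-sphere in
    coordinates (theta, phi); index 0 = theta, index 1 = phi."""
    gamma = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    gamma[0][1][1] = -math.sin(theta) * math.cos(theta)
    gamma[1][0][1] = math.cos(theta) / math.sin(theta)
    gamma[1][1][0] = gamma[1][0][1]
    return gamma

def torsion(gamma, f=None):
    """Torsion components as in (4.46): T^k_ij = Gamma^k_ij - Gamma^k_ji - f^k_ij.
    If f is None, a coordinate basis with vanishing structure constants is assumed."""
    dim = len(gamma)
    return [[[gamma[k][i][j] - gamma[k][j][i] - (f[k][i][j] if f else 0.0)
              for j in range(dim)]
             for i in range(dim)]
            for k in range(dim)]

# Symmetric (Levi-Civita) connection: all torsion components vanish.
T_sphere = torsion(christoffel_sphere(1.0))

# Adding a non-symmetric piece by hand (hypothetical) produces torsion.
gamma_twisted = christoffel_sphere(1.0)
gamma_twisted[0][0][1] += 0.3
T_twisted = torsion(gamma_twisted)

print(T_sphere, T_twisted[0][0][1], T_twisted[0][1][0])
```

The first case corresponds to the torsion-free setting of general relativity, the second to a connection with \(T \ne 0\); in both cases the result is antisymmetric in the lower indices, as (4.46) requires.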

The equations defining torsion and curvature nicely illustrate how the concepts of group theory, namely the Lie brackets, enter into the language of geometry. In a general sense, general relativity is only one manifestation of possible space-time structures, with \(T=0, R\ne 0\). Varying these parameters categorizes different space-times and gravitational theories. The case \(T \ne 0, R\ne 0\) yields the Riemann-Cartan space-time, which can be associated with a gauge theory of gravity à la Yang-Mills theory, for instance, the gauging of the Lorentz group (Utiyama 1956). This was later extended to a gauged version of the Poincaré group (Footnote 12) (Kibble 1961; Sciama 1962, 1964). Setting the curvature to zero in the Riemann-Cartan space-time uncovers Weitzenböck space-time with \(T \ne 0, R= 0\), a variant Einstein would later work on, as detailed in Sect. 4.3.3. More details are found in Gronwald and Hehl (1996). Generalizing the idea of geometrization was the main avenue for unification at the time. A wealth of details on the history of unified field theories, including an extensive bibliography, can be found in Goenner (2004). A shorter version is Goenner (2005).

Until 1928, Einstein only reacted to the new ideas advanced by others. One notable and bold idea was proposed by the mathematician Theodor Kaluza. He picked up on an obscure theory aimed at unifying gravity with electromagnetism (Nordström 1914). The concept is as simple as it is mysterious: space-time is assumed to be five-dimensional, i.e., comprised of four spatial and one temporal dimension. Kaluza communicated these new ideas to Einstein in 1919, who was initially very supportive. “At first glance I like your idea enormously;” and “The formal unity of your theory is startling” (Goenner 2004, p. 44). Kaluza had succeeded in showing that electromagnetism is a consequence of general relativity in five dimensions. The metric tensor \(g_{\mu \nu }\), from which the curvature of space-time is derived by virtue of (3.13) and (4.47), is extended by one dimension as follows

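$$\begin{aligned} g_{MN} = \begin{pmatrix} g_{\mu \nu } & c A_\mu \\ c A_\nu & \phi \end{pmatrix}, \end{aligned}$$
(4.48)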

where the indices M and N run from 1 to 5, and \(A_\mu \) is the vector potential of (4.12), incorporated with a proportionality factor c. The component \(g_{55} = \phi \) is a new scalar gravitational potential. While promising, this extra-dimensional framework was plagued by inconsistencies. Moreover, could there really be any physical reality at the heart of this idea transcending human perception? Although Kaluza published his work in 1921 (Kaluza 1921), Einstein would remain silent on these matters until 1926. In that year, the physicist Oskar Klein (Footnote 13) reawakened the interest in Kaluza’s ideas (Klein 1926). He not only linked quantum mechanics to the machinery of general relativity in five dimensions; crucially, he was also able to give a physical interpretation of the extra dimension. This idea is today known as compactification, or dimensional reduction. If the extra dimension is “curled up” tightly enough, it becomes undetectable from our familiar slice of reality. Only at sufficiently large energies does the three-dimensional world unveil its richer structure due to the additional compactified dimensions. Today, modern versions of Kaluza-Klein theories can go up to 26 dimensions (Footnote 14).

Back in 1926, Klein had only to grapple with one additional compactified dimension. He imposed a simple topology on the higher-dimensional space-time structure. Instead of simply using a five-dimensional Minkowski space \(M^5\), he assumed a product space \(M^4 \times S^1\), i.e., the product of a four-dimensional Minkowski space and a circle. If the radius of the circle is small enough, our reality appears four-dimensional. A lucid discussion of how this idea might be possible can be found in Einstein and Bergmann (1938). The special topology Klein chose means that Kaluza-Klein theories have a similar geometric structure to gauge theories. Recall that in gauge theory an internal symmetry space is attached at each point in space-time, giving it the structure of a fiber bundle. Now, in Kaluza and Klein’s version, a multi-dimensional compactified space, consisting of the “curled up” dimensions, resides at each point of physical reality.

To illustrate, a quantum field \(\psi (x^M)\), with \(x^M:=(x^{\mu },y)\), where \(x^\mu \) is the usual space-time coordinate, is constrained as follows

$$\begin{aligned} \psi (x^\mu , y) = \psi (x^\mu , y + 2\pi r), \end{aligned}$$
(4.49)

where the scale parameter r gives the “radius” of the fifth dimension. Expanding \(\psi \) in a Fourier series yields

$$\begin{aligned} \psi (x^\mu , y) = \sum _{n = -\infty }^{\infty } \psi _n (x^\mu ) e^{iny/r}. \end{aligned}$$
(4.50)
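That the expansion (4.50) automatically satisfies the periodicity condition (4.49) can be verified numerically. The following minimal sketch evaluates a truncated version of the sum at a fixed space-time point; the mode coefficients, the radius, and the function name `psi` are arbitrary assumptions chosen for illustration.

```python
import cmath
import math

def psi(y, r, coeffs):
    """Truncated Fourier sum (4.50) at a fixed x^mu.
    coeffs maps the mode number n to the (complex) value psi_n(x^mu)."""
    return sum(c * cmath.exp(1j * n * y / r) for n, c in coeffs.items())

# Arbitrary, assumed mode coefficients psi_n at some fixed x^mu.
COEFFS = {-2: 0.1 - 0.2j, -1: 0.5j, 0: 1.0, 1: -0.3, 2: 0.05 + 0.1j}
RADIUS = 0.7   # assumed "radius" r of the fifth dimension, arbitrary units
Y0 = 0.123     # arbitrary position along the extra dimension

value = psi(Y0, RADIUS, COEFFS)
shifted = psi(Y0 + 2 * math.pi * RADIUS, RADIUS, COEFFS)

print(abs(value - shifted))
```

Each mode picks up a phase \(e^{2 \pi i n} = 1\) under the shift \(y \rightarrow y + 2 \pi r\), so the two values agree up to floating-point rounding.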

In the context of quantum mechanics, one can now identify the y-component of a state with given n as being associated with the momentum \(p=|n| /r\). Thus, for a sufficiently small r, only the \(n=0\) state will appear in the low-energy world we live in. As a result, all observed states will be independent of y

$$\begin{aligned} \frac{\partial \psi }{\partial y} \approx 0. \end{aligned}$$
(4.51)

If the radius of compactification r is of the order of the Planck length \(l_p \approx 1.6 \times 10^{-35}\) m, the masses associated with the higher modes (\(n\ne 0\)) would be of the order of the Planck mass \(m_p \approx 2.2 \times 10^{-8}\) kg (Collins et al. 1989, p. 295), removing the effects of the higher-dimensional space-time structure from current technological possibilities. What is today known as Kaluza-Klein theory is in fact an amalgamation of different contributions by both scientists. A detailed account of their various contributions can be found in Goenner and Wünsch (2003).
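The order-of-magnitude statement above can be reproduced with a few lines of arithmetic. The sketch below assumes that the relation \(p = |n|/r\), written in natural units, translates into a mass \(m_n = |n| \hbar / (r c)\) once \(\hbar \) and c are restored; the numerical constants are standard values, while the function names are hypothetical.

```python
HBAR = 1.054571817e-34           # reduced Planck constant in J s
C = 299792458.0                  # speed of light in m/s
JOULE_PER_GEV = 1.602176634e-10  # conversion factor from GeV to J
PLANCK_LENGTH = 1.6e-35          # compactification radius in m, as quoted in the text

def kk_mass_kg(n, r):
    """Mass of the n-th Kaluza-Klein mode, assuming m_n = |n| * hbar / (r * c)."""
    return abs(n) * HBAR / (r * C)

def kk_mass_gev(n, r):
    """The same mass expressed as a rest energy in GeV."""
    return kk_mass_kg(n, r) * C**2 / JOULE_PER_GEV

m0 = kk_mass_kg(0, PLANCK_LENGTH)       # the n = 0 mode stays massless
m1 = kk_mass_kg(1, PLANCK_LENGTH)       # first excited mode, in kg
m1_gev = kk_mass_gev(1, PLANCK_LENGTH)  # first excited mode, in GeV

print(m0, m1, m1_gev)
```

The first excited mode comes out close to the Planck mass of about \(2.2 \times 10^{-8}\) kg, i.e., an energy of order \(10^{19}\) GeV, which is many orders of magnitude above the TeV-scale energies reached at present-day colliders.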

A modern version of Kaluza-Klein theory can, for instance, be found in Kaku (1993), Collins et al. (1989). Now

$$\begin{aligned} g_{MN} (x^\mu , y) = \sum _{n} g_{MN}^{(n)} (x^\mu ) e^{iny/r}, \end{aligned}$$
(4.52)

with

(4.53)

The scalar field \(\phi \) appearing in the theory is known as a dilaton or a radion. So the five-dimensional metric \(g_{MN}\) can be decomposed in four space-time dimensions as the metric tensor of gravity \(g_{\mu \nu }\), a massless spin-1 photon \(A_\mu \), and a predicted massless scalar \(\phi \). However, interpreting \(\phi \) as a physical particle was very radical at the time and most researchers tried to eliminate it. “[...] this prediction seems to have embarrassed the early writers; predicting a new particle [...] was not so accepted in those days” (Green et al. 2012a, p. 15). Finally, inserting (4.53) into the five-dimensional version of (3.13), yields the five-dimensional Lagrangian \(\mathcal {L}_{\text {GR}}^{(5)}\), which can be simplified as

$$\begin{aligned} \mathcal {L}_{\text {KK}} \sim \sqrt{- \text {det}(g_{\mu \nu })} \left( R + \frac{1}{4} \phi F_{\mu \nu } F^{\mu \nu } + \mathcal {F} (\phi ) \right) . \end{aligned}$$
(4.54)

In essence, the Kaluza-Klein Lagrangian unifies general relativity, expressed as \(\mathcal {L}_{\text {GR}}\) in (3.13), and Maxwell’s theory of electromagnetism, encoded as \(\mathcal {L}_{\text {EM}}\) in (3.7). The equation for the scalar field is encapsulated in the function \(\mathcal {F}\).

Some authors did, however, take the prediction of Kaluza-Klein theory seriously and accepted the reality of the scalar field \(\phi \). In their eyes, five-dimensional general relativity is reduced to a scalar-tensor theory of gravity. Such an extension of general relativity was proposed in Brans and Dicke (1961). In the version of Brans-Dicke, the metric tensor \(g_{\mu \nu }\) is paired with a scalar dilaton field \(\phi \). The physical justification for such a theory came from the desire to make Einstein gravity more Machian. This was achieved by promoting the gravitational constant G to become a dynamical variable. This constant appears in Newton’s law of universal gravitation

$$\begin{aligned} F = G \frac{m_1 m_2}{r^2}, \end{aligned}$$
(4.55)

describing the force F between two masses, \(m_1\) and \(m_2\), separated by a distance r. It is also featured in Einstein’s field equations of general relativity, sketched in (3.14)

$$\begin{aligned} G_{\mu \nu } = {8 \pi G \over c^4} T_{\mu \nu }, \end{aligned}$$
(4.56)

where c denotes the speed of light in a vacuum. Hence, in Brans-Dicke gravity, the following substitution is made

$$\begin{aligned} G \rightarrow \phi (x^\nu ), \end{aligned}$$
(4.57)

where the dynamical field \(\phi \) is dependent on the position in space-time (Peacock 1999), yielding a theory closer to the ideas of Mach.

3.2 The Advent of String Theory

The study of string theory has become one of the main focuses within theoretical physics. Its proponents hail it as the only viable candidate for a “theory of everything.” While Einstein and others had hoped to achieve such a feat by staying faithful to the paradigm of geometrodynamics, modern attempts at unification propose that gravity should also undergo the treatment of quantization, forging a theory of quantum gravity. However,

Quantum gravity has always been a theorist’s puzzle par excellence.

Experiment offers little guidance except for the bare fact that both quantum mechanics and gravity do play a role in natural law.

The real hope for testing quantum gravity has always been that in the course of learning how to make a consistent theory of quantum gravity one might learn how gravity must be unified with other forces.

All three quotes from Green et al. (2012a, p. 14). In this respect, string theory has a lot to offer and, indeed, ties together some of the ideas emerging from the early attempts in constructing a unified field theory (Green et al. 2012a, p. 14):

The earliest idea and one of the best ideas ever advanced about unifying general relativity with matter was Kaluza’s suggestion in 1921 that gravity could be unified with electromagnetism by formulating general relativity not in four dimensions but in five dimensions.

Nonetheless, string theory was in fact discovered by accident. Edward Witten, arguably the most important contributor to the enterprise, once remarked (quoted in Penrose 2004, p. 888):

It is said that string theory is part of twenty-first-century physics that fell by chance into the twentieth century.

The evolution of this theory also had many twists and turns. Originally, string theory models, known as dual resonance models, were proposed in the late 1960s to describe the strong nuclear force. These developments started with Veneziano (1968). In 1970, it was independently realized by Yoichiro Nambu, Leonard Susskind, and Holger Bech Nielsen that the equations of this theory should, in fact, be understood as describing one-dimensional extended objects, or strings (Schwarz 2000). The first manifestation of these ideas is known as bosonic string theory, living in 26-dimensional space-time. See, for instance, Polchinski (2005a). One year later, a string theory model for fermions was proposed (Ramond 1971; Neveu and Schwarz 1971). However, these theories, aimed at describing hadrons, i.e., composite particles comprised of quarks held together by the strong force, were competing with another theory which was rapidly gaining popularity. By 1973, quantum chromodynamics had become an established and successful theory describing hadrons. It was formulated as a Yang-Mills gauge theory with an SU(3) symmetry group, capturing the interaction between quarks and gluons, the gauge bosons in the theory. A special property, called asymptotic freedom (Politzer 1973; Gross and Wilczek 1973), was instrumental in developing the theory, and its discovery was honored with a Nobel Prize in 2004. Unsurprisingly, in the wake of quantum chromodynamics, the string model became an oddity within theoretical physics.

Strings and Gravity

In 1974, things changed for string theory. It was known that the theory contained a massless spin-2 particle. “This had been an embarrassment with the original ‘hadronic’ version of string theory, since there is no hadronic particle of this nature” (Penrose 2004, p. 891). Instead of trying to eliminate this unwanted particle, a simple acceptance led to profound consequences: it was identified as the graviton, the elusive quantum particle of gravity (Yoneya 1974). General relativity does not admit a force-carrying particle for the propagation of gravitational interactions (it lacks a quantum gauge boson of gravity), because the space-time curvature itself encodes the gravitational dynamics. Nevertheless, a straightforward quantization scheme of gravity is the following. The metric can be expanded as

$$\begin{aligned} g_{\mu \nu } = \eta _{\mu \nu } + h _{\mu \nu }, \end{aligned}$$
(4.58)

where \(\eta _{\mu \nu }\) is the metric of flat Minkowski space and \(h _{\mu \nu }\) represents the excitation of the gravitational quanta. By inserting this new quantity into Einstein’s field equations, a wave equation can be derived that corresponds to the propagation of a massless spin-2 particle, identifying \(h _{\mu \nu }\) as the graviton (Collins et al. 1989). This made the next step in the evolution of string theory obvious, and its original purpose, as a theory of hadrons, was abandoned. “The possibility of describing particles other than hadrons (leptons, photons, gauge bosons, gravitons, etc.) by a dual model is explored” (Scherk and Schwarz 1974, abstract).
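As a hedged sketch of the step from (4.58) to a wave equation: the trace-reversed field \(\bar{h}_{\mu \nu }\) and the harmonic (de Donder) gauge condition used here are standard textbook choices and are assumptions not fixed by the text above. Keeping only terms of first order in \(h_{\mu \nu }\), the vacuum field equations reduce to

$$\begin{aligned} \bar{h}_{\mu \nu } := h_{\mu \nu } - \frac{1}{2} \eta _{\mu \nu } h, \qquad \partial ^\mu \bar{h}_{\mu \nu } = 0 \quad \Rightarrow \quad \Box \, \bar{h}_{\mu \nu } = 0, \end{aligned}$$

where \(h := \eta ^{\mu \nu } h_{\mu \nu }\) and \(\Box := \eta ^{\mu \nu } \partial _\mu \partial _\nu \). Plane-wave solutions \(\bar{h}_{\mu \nu } \propto \epsilon _{\mu \nu } e^{i k_\lambda x^\lambda }\) then require \(k_\lambda k^\lambda = 0\), which is the dispersion relation of a massless excitation. The symmetric polarization tensor \(\epsilon _{\mu \nu }\) carries the two helicity \(\pm 2\) states expected of a spin-2 particle.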

Unexpectedly, string theory had suddenly become an exciting candidate for a “theory of everything,” as it had “[...] the remarkable property of predicting gravity” (Witten 2001, p. 130). Indeed, as up to then the merger of gravity with quantum physics proved to be such an intractable and elusive puzzle, this was big news:

[...] the fact that gravity is a consequence of string theory is one of the greatest theoretical insights ever.

Again, Witten quoted in Penrose (2004, p. 896). Unfortunately, at the time not many physicists took the idea seriously. It would take another ten years before string theory would experience the next advancement in its evolution: an event that would propel it into the limelight of theoretical physics. After 1984, string theory was transformed into one of the most active areas of theoretical physics. See, for instance, Bradlyn (2009) for a chart of the number of string theory papers published per year from 1973 onward, as cataloged by the ISI Web of Science. Another indicator is Google’s Ngram Viewer,Footnote 15 which “charts the yearly count of selected n-grams (letter combinations) or words and phrases, as found in over 5.2 million books digitized by Google Inc (up to 2008)”.Footnote 16 It is also interesting to graph comma-separated phrases in comparison: “string theory, loop quantum gravity.” This clearly illustrates the predominance of string theory over other proposed “theories of everything,” like loop quantum gravity. While string theory is a theory of quantum gravity originating in the paradigm of quantum field theory, loop quantum gravity has its foundation in general relativity. See, for instance, Smolin (2001) for a popular account of the various paths to quantum gravity, and Giulini et al. (2003) for a technical one. For a general discussion of loop quantum gravity, consult Sect. 10.2.3.

Supersymmetry

In the early 1980s it was realized that, by introducing a crucial novel element into the string theory formalism, some pressing problems could be solved. Inadvertently, a powerful new level of descriptive power would emerge. This missing element was associated with a novel symmetry property, called supersymmetry. Historically, it was originally developed as a symmetry between hadrons, namely a symmetry relating mesons (a composition of a quark and an anti-quark) to baryons (made up of three quarks, like the neutron and proton) (Miyazawa 1966). “Unfortunately, this important work was largely ignored by the physics community” (Kaku 1993, p. 663). Only in 1971 was a refined version of supersymmetry independently discovered from two distinct approaches. In the early version of fermionic string theory (Ramond 1971; Neveu and Schwarz 1971) a new gauge symmetry was discovered, from which supersymmetry was derived (Gervais and Sakita 1971). The second approach was based on the idea of extending the Poincaré algebra described in (3.52), resulting in the super-Poincaré algebra (Gol’fand and Likhtman 1971). Then, in 1974, the first four-dimensional supersymmetric quantum field theory was developed (Wess and Zumino 1974). Even ten years before it would have a fertilizing effect on string theory, supersymmetry was understood as a remarkable symmetry structure in and of itself, fueling advancements in theoretical physics. In essence, it is a symmetry eliminating the distinction between bosons and fermions. Matter particles (fermions described by spinors with 720\(^{\circ }\) rotational invariance) and force-mediating particles (the gauge bosons emerging from the covariant derivatives in the Lagrangian, with 360\(^{\circ }\) rotational invariance) lose their independent existence in the light of supersymmetry.
It also turns out that supersymmetry is the only known way to unify internal gauge symmetries with external space-time symmetries, a marriage otherwise complicated by the Coleman-Mandula theorem (Coleman and Mandula 1967). There is, however, a heavy phenomenological price to pay for the mathematical elegance of supersymmetry. The number of existing particles has to be doubled, as each matter fermion and gauge boson must have a supersymmetric partner, conjuring up a mirror world of Fig. 4.1.

Formally, there exists an operator Q, which converts bosonic states into fermionic ones, and vice versa. Symbolically, \(Q|B\rangle = |F\rangle \).

Infinitesimally, supersymmetric transformations Q can be expressed in group theoretic terms described in (3.30), similarly to the example given for the Lorentz group in (3.30)

$$\begin{aligned} \delta ^{\text {SUSY}} \Phi = i \varepsilon Q \Phi , \end{aligned}$$
(4.59)

where the super-multiplet \(\Phi \) contains all the matter and gauge fields and spans a representation of the supersymmetric algebra associated with Q, and \(\varepsilon \) is the usual infinitesimal transformation parameter. From the Poincaré algebra the super-Poincaré algebra can be constructed by adding Q to the old (bosonic) commutation relations seen in (3.52). The new (fermionic) sector of the algebra is now given by anticommutation relations for the Q, similar to (3.16), which are defined as

$$\begin{aligned} \left\{ X, Y\right\} = XY + YX. \end{aligned}$$
(4.60)

In mathematical terms, the basic tools to construct supersymmetric extensions of the Poincaré algebra are called Clifford algebras (Varadarajan 2004). These are algebras defined via specific anticommutation relations. The operators Q transform as 2-component Weyl spinors under Lorentz transformations. This means that the usual four-dimensional theory is broken down to two dimensions via the Pauli matrices \(\sigma ^i, i=1,2,3\). Mathematically, the Pauli matrices are related to the Dirac matrices \(\gamma ^\mu \), introduced in Sect. 3.2.2.1 (given in the Weyl representation), as follows

$$\begin{aligned} \gamma ^\mu = \begin{pmatrix} 0&{} \sigma ^\mu \\ \bar{\sigma }^\mu &{}0 \end{pmatrix}, \end{aligned}$$
(4.61)

with \(\sigma ^\mu := ( \mathbbm {1}_2, \varvec{\sigma })\) and \(\bar{\sigma }^\mu := ( \mathbbm {1}_2, -\varvec{\sigma })\), where \(\mathbbm {1}_2\) is the two-dimensional identity matrix. Note that the bar is just a notational convention and does not denote the Hermitian conjugate of a matrix, as the previous usage of the symbol could imply. Dirac and Pauli matrices are defined by virtue of their anticommutators

$$\begin{aligned} \begin{aligned} \{\sigma ^i, \sigma ^j\}&= 2 \delta ^{i j} \, \mathbbm {1}_2; \quad i,j=1,2,3,\\ \{ \gamma ^\mu , \gamma ^\nu \}&= 2 g^{\mu \nu } \, \mathbbm {1}_4; \quad \mu ,\nu =0, \dots , 3, \end{aligned} \end{aligned}$$
(4.62)

with the flat Minkowski metric \(g^{\mu \nu }\) and the Kronecker delta. For more details, see, for instance Peskin and Schroeder (1995). Just as the \(\gamma ^\mu \) are associated with a 4-component spinor representation of the Lorentz group generators via (3.48), the Pauli matrices give a 2-component (Weyl) spinor representation. Now each point \(x^\mu \) in space-time is associated with a matrix X by virtue of the Pauli matrices

$$\begin{aligned} x^\mu \leftrightarrow X := \sigma _\mu x^\mu . \end{aligned}$$
(4.63)
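The relations (4.61) to (4.63) can be checked numerically. The following is a minimal sketch in plain Python, not a definitive implementation: the helper names are hypothetical, and the metric signature \((+,-,-,-)\) is an assumption, since the text only speaks of the flat Minkowski metric. It builds the Dirac matrices in the Weyl representation, tests both anticommutation relations, and confirms that \(\det X\) reproduces the Minkowski norm of \(x^\mu \).

```python
# Pure-Python check of (4.61)-(4.63); matrices are nested lists of numbers.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * p for p in row] for row in a]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(p - q) < tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def anticommutator(a, b):
    """{A, B} = AB + BA, as in (4.60)."""
    return add(matmul(a, b), matmul(b, a))

PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
SIGMA = [identity(2)] + PAULI                               # sigma^mu = (1, sigma)
SIGMA_BAR = [identity(2)] + [scale(-1, s) for s in PAULI]   # sigma-bar^mu = (1, -sigma)
ETA = [1, -1, -1, -1]  # assumed diagonal of the metric, signature (+,-,-,-)

def weyl_gamma(mu):
    """Dirac matrix gamma^mu in the Weyl representation (4.61)."""
    top = [[0, 0] + row for row in SIGMA[mu]]
    bottom = [row + [0, 0] for row in SIGMA_BAR[mu]]
    return top + bottom

def pauli_relation_holds():
    """First line of (4.62)."""
    return all(
        close(anticommutator(PAULI[i], PAULI[j]),
              scale(2 if i == j else 0, identity(2)))
        for i in range(3) for j in range(3))

def dirac_relation_holds():
    """Second line of (4.62), with the assumed metric."""
    return all(
        close(anticommutator(weyl_gamma(m), weyl_gamma(n)),
              scale(2 * ETA[m] if m == n else 0, identity(4)))
        for m in range(4) for n in range(4))

def spacetime_matrix(x):
    """X = sigma_mu x^mu (4.63), lowering the index with the assumed metric."""
    X = [[0, 0], [0, 0]]
    for mu in range(4):
        X = add(X, scale(ETA[mu] * x[mu], SIGMA[mu]))
    return X

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def minkowski_norm(x):
    return sum(ETA[mu] * x[mu] ** 2 for mu in range(4))

if __name__ == "__main__":
    x = [2.0, 0.3, -1.1, 0.7]  # arbitrary illustrative point
    print(pauli_relation_holds(), dirac_relation_holds())
    print(det2(spacetime_matrix(x)), minkowski_norm(x))
```

The determinant identity is the reason the matrix X is useful: any transformation of X that preserves its determinant and Hermiticity corresponds to a transformation of \(x^\mu \) that preserves the Minkowski norm.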

The action of the Lorentz group on Weyl spinors is captured by the following

$$\begin{aligned} x^{\prime \mu } = \Lambda ^\mu _\nu x^\nu \leftrightarrow X^\prime := \mathcal {M} X \mathcal {M}^*, \end{aligned}$$
(4.64)

where \(\mathcal {M}^*\) denotes the Hermitian conjugate.Footnote 17 It should be noted that \(\mathcal M \in SL(2,\mathbb {C})\), establishing a relationship between the Lorentz group and SL(2,\(\mathbb {C}\)). See, for instance, Sternberg (1999). A Weyl spinor transforms under this representation as

$$\begin{aligned} \psi _\alpha \rightarrow \psi _\alpha ^\prime = \mathcal M_\alpha {}^{\beta } \psi _\beta , \quad \bar{\psi }_{\dot{\alpha }} \rightarrow \bar{\psi }_{\dot{\alpha }}^\prime = \mathcal M^{*}{}_{\dot{\alpha }}{}^{\dot{\beta }} \bar{\psi }_{\dot{\beta }}. \end{aligned}$$
(4.65)
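The correspondence (4.64) between SL(2,\(\mathbb {C}\)) and Lorentz transformations can be illustrated numerically. The sketch below is written under stated assumptions: the function names are hypothetical, the signature \((+,-,-,-)\) is assumed when writing out \(X = \sigma _\mu x^\mu \) explicitly, and the sample matrix is arbitrary. It rescales an invertible complex matrix to unit determinant, applies \(X \rightarrow \mathcal {M} X \mathcal {M}^*\), and reads off the transformed point.

```python
import cmath

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(m):
    """Hermitian conjugate (written M^* in the text)."""
    return [[m[j][i].conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def to_sl2c(a):
    """Rescale an invertible 2x2 complex matrix so that its determinant is 1."""
    root = cmath.sqrt(det2(a))
    return [[entry / root for entry in row] for row in a]

def spacetime_matrix(x):
    """X = sigma_mu x^mu written out for the assumed signature (+,-,-,-)."""
    t, x1, x2, x3 = x
    return [[t - x3, -x1 + 1j * x2],
            [-x1 - 1j * x2, t + x3]]

def spacetime_point(X):
    """Invert spacetime_matrix: recover (t, x1, x2, x3) from a Hermitian X."""
    t = (X[0][0] + X[1][1]) / 2
    x3 = (X[1][1] - X[0][0]) / 2
    x1 = -(X[0][1] + X[1][0]) / 2
    x2 = (X[0][1] - X[1][0]) / 2j
    return [complex(c).real for c in (t, x1, x2, x3)]

def transform(M, x):
    """Apply X -> M X M^* from (4.64); return the new point and the new matrix."""
    X_prime = matmul(matmul(M, spacetime_matrix(x)), dagger(M))
    return spacetime_point(X_prime), X_prime

def minkowski_norm(x):
    return x[0] ** 2 - x[1] ** 2 - x[2] ** 2 - x[3] ** 2

if __name__ == "__main__":
    x = [1.5, 0.2, -0.4, 0.9]                          # arbitrary point
    M = to_sl2c([[1 + 2j, 0.5], [-0.3j, 2 - 1j]])      # arbitrary SL(2,C) element
    x_new, _ = transform(M, x)
    print(minkowski_norm(x), minkowski_norm(x_new))    # equal up to rounding
```

Because \(\det (\mathcal {M} X \mathcal {M}^*) = |\det \mathcal {M}|^2 \det X = \det X\), the Minkowski norm is preserved for every such \(\mathcal {M}\). Note that \(\mathcal {M}\) and \(-\mathcal {M}\) yield the same transformation of \(x^\mu \).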

In other words, there are two Weyl spinors associated with the two possible representations of the Lorentz group, \(\mathcal {M}\) and \(\mathcal {M}^*\). They are either labeled with the indices \(\alpha , \beta , \dots \), or the dotted indices \(\dot{\alpha }, \dot{\beta }, \dots \), which run from one to two. Again, the bar is simply a notational convention associated with quantities carrying dotted indices and does not denote the Hermitian conjugate of a 2-component spinor \(\psi _\alpha \). These computations imply that for supersymmetry there also exist two operators: \(Q_\alpha \) and \(\bar{Q}_{\dot{\alpha }}\). Now the novel fermionic sector of the super-Poincaré algebra can be defined via the following anticommutators which are added to the set of equations seen in (3.52)

$$\begin{aligned} \{ Q_\alpha , \bar{Q}_{\dot{\beta }} \} = 2 [\sigma ^\mu ]_{\alpha \dot{\beta }}P_\mu . \end{aligned}$$
(4.66)

Note that \([\sigma ^\mu ]_{\alpha \dot{\beta }} := (\mathbbm {1}_2, - \sigma _i)_{\alpha \dot{\beta }}\). All other anticommutators of \(Q_\alpha \) and \(\bar{Q}_{\dot{\alpha }}\) vanish. Finally, the combination of the fermionic and bosonic sectors needs to be specified. The only non-zero relations are

$$\begin{aligned} \begin{aligned}{}[ M^{\mu \nu }, Q_\alpha ]&= i [\sigma ^{\mu \nu }]_\alpha {}^{\beta } Q_\beta , \\ [ M^{\mu \nu }, \bar{Q}^{\dot{\alpha }} ]&= i [\bar{\sigma }^{\mu \nu }]^{\dot{\alpha }}{}_{\dot{\beta }} \bar{Q}^{\dot{\beta }}. \end{aligned} \end{aligned}$$
(4.67)

See Wess and Bagger (1992) for more details. The matrices \(\sigma ^{\mu \nu }\) and \(\bar{\sigma }^{\mu \nu }\) are the generators of the Lorentz transformations for Weyl spinors. They can be expressed using the Pauli matrices \(\sigma ^\mu \). It holds that

$$\begin{aligned} \begin{aligned}{}[\sigma ^{\mu \nu } ]_\alpha {}^{\beta }&= \frac{1}{4} \left( [\sigma ^\mu ]_{\alpha \dot{\gamma }} [\bar{\sigma }^\nu ]^{\dot{\gamma }\beta } - [\sigma ^\nu ]_{\alpha \dot{\gamma }} [\bar{\sigma }^\mu ]^{\dot{\gamma }\beta } \right) , \\ [\bar{\sigma }^{\mu \nu } ]^{\dot{\alpha }}{}_{\dot{\beta }}&= \frac{1}{4} \left( [\bar{\sigma }^\mu ]^{\dot{\alpha }\gamma } [\sigma ^\nu ]_{ \gamma \dot{\beta }} - [\bar{\sigma }^\nu ]^{\dot{\alpha }\gamma } [\sigma ^\mu ]_{ \gamma \dot{\beta }} \right) , \end{aligned} \end{aligned}$$
(4.68)

where \([\bar{\sigma }^\mu ]^{\dot{\alpha }\beta } = (\mathbbm {1}_2, + \sigma _i)^{\dot{\alpha }\beta }\). The spinor representation, whose matrices are derived from the Dirac matrices in (3.48), is related to the Pauli matrices as follows

$$\begin{aligned} \Sigma ^{\mu \nu } = \frac{i}{4} \begin{pmatrix} \sigma ^\mu \bar{\sigma }^\nu - \sigma ^\nu \bar{\sigma }^\mu &{}0\\ 0&{}\bar{\sigma }^\mu {\sigma }^\nu - \bar{\sigma }^\nu {\sigma }^\mu \end{pmatrix}. \end{aligned}$$
(4.69)
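The block-diagonal form (4.69) can be cross-checked against the Dirac matrices of (4.61). The sketch below assumes that the spinor representation referred to in (3.48) has the common textbook form \(\Sigma ^{\mu \nu } = \frac{i}{4} [\gamma ^\mu , \gamma ^\nu ]\); this normalization is an assumption, since (3.48) is not reproduced here, and all function names are hypothetical.

```python
# Pure-Python cross-check of (4.69) against (i/4)[gamma^mu, gamma^nu].

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def sub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * p for p in row] for row in a]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(p - q) < tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))

PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
SIGMA = [identity(2)] + PAULI                               # sigma^mu = (1, sigma)
SIGMA_BAR = [identity(2)] + [scale(-1, s) for s in PAULI]   # sigma-bar^mu = (1, -sigma)

def weyl_gamma(mu):
    """Dirac matrix gamma^mu in the Weyl representation (4.61)."""
    top = [[0, 0] + row for row in SIGMA[mu]]
    bottom = [row + [0, 0] for row in SIGMA_BAR[mu]]
    return top + bottom

def block_diag(a, b):
    """4x4 matrix with 2x2 blocks a (upper left) and b (lower right)."""
    return [row + [0, 0] for row in a] + [[0, 0] + row for row in b]

def spin_generator_from_gamma(mu, nu):
    """Assumed form of (3.48): Sigma^{mu nu} = (i/4) [gamma^mu, gamma^nu]."""
    g_mu, g_nu = weyl_gamma(mu), weyl_gamma(nu)
    return scale(0.25j, sub(matmul(g_mu, g_nu), matmul(g_nu, g_mu)))

def spin_generator_from_pauli(mu, nu):
    """Right-hand side of (4.69)."""
    upper = sub(matmul(SIGMA[mu], SIGMA_BAR[nu]), matmul(SIGMA[nu], SIGMA_BAR[mu]))
    lower = sub(matmul(SIGMA_BAR[mu], SIGMA[nu]), matmul(SIGMA_BAR[nu], SIGMA[mu]))
    return scale(0.25j, block_diag(upper, lower))

def relation_holds():
    return all(
        close(spin_generator_from_gamma(m, n), spin_generator_from_pauli(m, n))
        for m in range(4) for n in range(4))

if __name__ == "__main__":
    print(relation_holds())
    for row in spin_generator_from_pauli(1, 2):
        print(row)
```

With this normalization, \(\Sigma ^{12}\) comes out as \(\frac{1}{2}\,\text {diag}(\sigma ^3, \sigma ^3)\), so each 2-component block carries the eigenvalues \(\pm \frac{1}{2}\) expected of a spin-\(\frac{1}{2}\) object.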

More details can be found in Bilal (2001). Finally, extended supersymmetry algebras are possible, with \(Q^A_{\alpha }\) and \(\bar{Q}^B_{\dot{\beta }}\), where \(A,B = 1, \dots , N\), see Wess and Bagger (1992).

Supersymmetry can exist in various space-time dimensions. However, eleven is the maximal number of dimensions in which a consistent supersymmetric theory can be formulated (Nahm 1978).

Supergravity

From the mid-1970s to the mid-1980s string theory lay dormant amongst the exciting developments related to supersymmetry, and only a handful of dedicated people kept it alive. Gell-Mann, shortly before his 80th birthday, reflected on this as follows in an interview (Siegfried 2009):

I didn’t work on string theory itself, although I did play a role in the prehistory of string theory. I was a sort of patron of string theory—as a conservationist I set up a nature reserve for endangered superstring theorists at Caltech, and from 1972 to 1984 a lot of the work in string theory was done there. John Schwarz and Pierre Ramond, both of them contributed to the original idea of superstrings, and many other brilliant physicists like Joel Sherk and Michael Green, they all worked with John Schwarz and produced all sorts of very important ideas.

However, at the same time, the idea of supersymmetry was uncovering important novel insights. “Perhaps one of the most remarkable aspects of supersymmetry is that it yields field theories that are finite to all orders in perturbation theory” (Kaku 1993, p. 664). This makes the heavy machinery of renormalization redundant. And, as many times before in the history of physics, tinkering with the mathematical formalism uncovered new ideas and powerful tools that could unlock new and unexpected knowledge. In the early years, supersymmetry was understood as a global symmetry. Gauging supersymmetry, that is, reconstructing it as a local gauge symmetry, gave rise to a new type of gauge theory. This new theory, called supergravity, is a supersymmetric theory inevitably accommodating gravity (Freedman et al. 1976). Only two years after string theory was given a new twist as a “theory of everything,” another viable candidate for quantum gravity had been discovered, fascinating the community of theoretical physicists. Not long after the discovery of the eleven-dimensional limit to supersymmetry (Nahm 1978), it was realized in Cremmer et al. (1978) “that supergravity not only permits up to seven extra dimensions but in fact takes its simplest and most elegant form when written in its full eleven-dimensional glory” (Duff 1999, p. 1). Supergravity would provide the impetus for a revival of Kaluza-Klein theory. This allowed \(D=11, N=1\) supergravity to be compactified to a \(D=4, N=8\) theory (Cremmer and Julia 1979), where \(N>1\) describes the extended supersymmetry algebra and D denotes the dimensions of space-time. In an influential paper, Witten proved that the structure of the associated four-dimensional gauge-group is actually determined by the structure of the isometry group (the set of all distance-preserving maps) of the compact seven-dimensional manifold \(\mathcal {K}\) (Witten 1981).
He showed, “what to this day seems to be merely a gigantic coincidence, that seven is not only the maximum dimension of \(\mathcal {K}\) permitted by supersymmetry but the minimum needed for the isometry group to coincide with the standard model gauge group \(SU(3)\times SU(2) \times U(1)\)” (Duff 1999, p. 2). The next steps were the development of \(N=8\) supergravity with SO(8) gauge symmetry in \(D=4\) anti-de Sitter spaceFootnote 18 or \(AdS_4\) (De Wit and Nicolai 1982), and its extension to eleven dimensions, compactified on a seven-dimensional sphere \(S^7\) which admits an SO(8) isometry (Duff and Pope 1983). Moreover, the compactification from eleven-dimensional space-time to \(AdS_4 \times S^7\) could be shown to be the result of spontaneous compactification (Cremmer and Scherk 1977). These were certainly very promising developments. So much so that a then 38-year-old Stephen Hawking was tempted in 1980, in his inaugural lecture as Lucasian professor of mathematics at the University of Cambridge, England (Hawking 1980), to divine that \(N=8\) supergravity was the definitive “theory of everything” (Ferguson 2011). Indeed (as quoted in Ferguson 2011, p. 5):

He [Hawking] said he thought there was a good chance the so-called Theory of Everything would be found before the close of the twentieth century, leaving little for theoretical physicists like himself to do.

The First Superstring Revolution

Alas, things turned out quite differently and supergravity did not fulfill its promising claims. “We therefore conclude that, despite the initial optimism, \(N=8\) supergravity theory is not theoretically or phenomenologically satisfactory” (Collins et al. 1989). Perhaps the most damning problem was the reappearance of non-renormalizability. The infinities meticulously removed from quantum field theory returned to render the theory of supergravity useless. Indeed, all known quantum theories of spin-2 particles, meaning the elusive gravitons, are now known to be non-renormalizable. Yet again, gravity and quantum physics refuse to cooperate. That is, as quantum theories of point particles. This opened up a loophole for string theory, as its theoretical machinery never touched the notion of particles and rested on extended one-dimensional, vibrating strings. Some general references are Wess and Bagger (1992), Buchbinder and Kuzenko (1998), Duff (1999).

Unsurprisingly, the pendulum of interest slowly swung back to string theory in the beginning of the 1980s. A major driving force was the introduction of the newly discovered idea of supersymmetry to the framework, unleashing superstring theory.Footnote 19 An earlier modification to the original Ramond and Neveu-Schwarz models was conjectured to harbor supersymmetry (Gliozzi et al. 1977), which was proved by Green and Schwarz (1981). Unfortunately, these superstring theories appeared to be inconsistent (Alvarez-Gaume and Witten 1984), plagued by anomalies. Then, later in 1984, an avalanche was triggered by some notable developments. For one, a method was found to cancel the anomalies by assigning the gauge group of the theory to be SO(32) or \(E_8 \times E_8\) (Green and Schwarz 1984). Moreover, a new superstring theory was introduced, called heterotic string theory (Gross et al. 1985). The “first superstring revolution” was ignited. In the words of Witten (2001, p. 130):

Since 1984, when generalized methods of “anomaly” cancellation were discovered and the heterotic string was introduced, one has known how to derive from string theory uncannily simple and qualitatively correct models of the strong, weak, electromagnetic, and gravitational interactions.

Hawking, too, realized the potential and adjusted his prophecy accordingly (Ferguson 2011, p. 213f.):

In June 1990, ten years after his inaugural lecture as Lucasian Professor, I asked him [Hawking] how he would change his Lucasian lecture, were he to write it over again. Is the end in sight for theoretical physics? Yes, he said. But not by the end of the century. The most promising candidate to unify the forces and particles was no longer the \(N=8\) supergravity he’d spoken of then. It was superstrings, the theory that was explaining the fundamental objects of the universe as tiny, vibrating strings, and proposing that what we had been thinking of as particles are, instead, different ways a fundamental loop of string can vibrate. Give it twenty or twenty-five years, he said.

To summarize, five consistent string theories have been developed, living in ten-dimensional space-time. In the low-energy limit, they reduce to \(N=1,2\), \(D=10\) supergravity of point particles. String theory is a bizarre contraption. It alludes to outlandish realms of reality, like ten-dimensional space-time and a mirror world of supersymmetric particles lying latent in the undiscovered weaves of the fabric of reality. It is built up of an extraordinarily vast and abstract formal machinery, blurring the borders between mathematics and physics, as was discussed in Sect. 2.1.4. Yet, at its heart, it has a surprisingly simple and colorful intuition attached to it (Greene 2013, p. 146):

What appear to be different elementary particles are actually different “notes” on a fundamental string. The universe—being composed of an enormous number of these vibrating strings—is akin to a cosmic symphony.

String theory has the potential to unify all known forces within a single framework. The theory could support a marriage between gravity and quantum mechanics, by offering a theory of quantum gravity which is not plagued by unwanted infinities. Finally, it can accommodate the symmetries of the standard model. The six extra spatial dimensions are compactified on special geometries, called Calabi-Yau manifolds (Candelas et al. 1985). They are shapes, wrapping the additional dimensions into tiny packages located at each point in four-dimensional space-time, which reside at length scales not accessible to current experimental probes. Basically, Calabi-Yau manifolds are similar to fiber bundles. Moreover, as the strings still vibrate in all ten dimensions after compactification, “the precise size and shape of the extra dimensions has a profound impact on string vibrational patterns and hence on particle properties” (Greene 2004, p. 372). As an example, if the Calabi-Yau spaces have a topology with three holes, there will be three families of elementary particles (fermions), as seen in Fig. 4.1. Technically, the number of particle generations is one half of the Euler characteristic of the chosen Calabi-Yau manifold (Candelas et al. 1985).

In all string theories, including the early bosonic versions (Scherk and Schwarz 1974), a scalar \(\phi \) with a gravitational-strength coupling to matter is found. This scalar field has a very special property in the theory. Its expectation value

$$\begin{aligned} \exp (\langle \phi \rangle ), \end{aligned}$$
(4.70)

controls the string coupling constant, determining the strength of the string interaction. If the coupling constant gets too large, perturbation theory breaks down. This scalar \(\phi \) can also be identified with the dilaton, the scalar field appearing in Kaluza-Klein theory, linking back to the scalar-tensor theory of Brans-Dicke gravity (Brans and Dicke 1961). Indeed, there are proposed theories of superstring cosmology, for instance Lidsey et al. (2000).

General references for string theory are, for instance (Hatfield 1992), (Polchinski 2005a, b), (Green et al. 2012a, b), (Rickles 2014). Examples of non-technical references are Greene (2004, 2013), Randall (2006), Susskind (2006).

After over a decade of intense study, in 1995, the next remarkable step towards a “theory of everything” was achieved. Initiated by a single person, the “second superstring revolution” took place. In that year, Witten published a paper which would change the face of string theory forever (Witten 1995). By moving to eleven-dimensional space-time, and allowing for membranes in the theory, i.e., the higher-dimensional equivalents to vibrating one-dimensional strings, he realized that all five superstring theories could be united within one overarching theory. So the previous embarrassment of having five “theories of everything” was finally explained.

Witten put forward a convincing case that this distinction is just an artifact of perturbation theory and that non-perturbatively these five theories are, in fact, just corners of a deeper theory. [...] Moreover, this deeper theory, subsequently dubbed M-theory, has \(D=11\) supergravity as its low energy limit! Thus the five string theories and \(D=11\) supergravity represent six different special points in the moduli space of M-theory.

Quote from Duff (1999, p. 326). Technical references on M-theory are, for instance, Duff (1999), Kaku (2000), Rickles (2014). More details on the issues plaguing string theory (and quantum gravity in general) can be found in Sect. 10.2.2. For the notion of AdS/CFT duality, see Sect. 13.4.1.2.

3.3 Einstein’s Unified Field Theory

Returning to 1926, Klein’s twist on Kaluza’s original proposal ignited new interest in five-dimensional gravity as a unified field theory incorporating electromagnetism. Einstein remarked in 1927 (Goenner 2004, p. 65):

It appears that the union of gravitation and Maxwell’s theory is achieved in a completely satisfactory way by the five-dimensional theory.

In the following years, many physicists and mathematicians started to study the implications and finer details of Kaluza-Klein theory. See Goenner (2004, Section 7.2.4). Albeit promising, the theory ultimately failed. In 1929, the physicist Vladimir A. Fock summarized the situation as follows (Goenner 2004, p. 105):

Up to now, quantum mechanics has not found its place in this geometric picture [of general relativity]; attempts in this direction (Klein, [...]) were unsuccessful.

The reality status and the meaning of the extra dimensions were also seen as problematic. Indeed, a little more than a year after his initial publication on the matter, Klein conceded (Goenner 2004, p. 112):

Particularly, I no longer think it to be possible to do justice to the deviations from the classical description of space and time necessitated by quantum theory through the introduction of a fifth dimension.

In 1928, Einstein himself took a leading role in the conceptual development of a unified field theory. A new wave of research ensued. Einstein was tinkering with the equations of general relativity and set out to extend the formalism. On the 10th of June of that year, he introduced the idea of teleparallelism, originally called Fernparallelismus, which allowed the comparison of the direction of a tangent vector at various points in space-time (Einstein 1928a). Technically, the underlying space-time is a Weitzenböck space-timeFootnote 20 (Gronwald and Hehl 1996). Four days later, Einstein published his first attempt at constructing a unified theory of gravitation and electromagnetism (Einstein 1928b).

[Since the publication of the 10th of June] I discovered that this theory—at least to a first approximation—yields the field laws for gravitation and electromagnetism easily and naturally. It is thus conceivable, that this theory will supersede the original version of general relativity.

Quote from Einstein (1928b, p. 224, translation mine). With this, Einstein would embark on a scavenger hunt lasting more than two decades, chasing this elusive goal. At the time, he was probably quite upbeat about the project’s future. His intuition as a physicist had been validated by two very unexpected and profound theories of relativity. An additional motivational factor was perhaps also given by the fact that he had disproved Weyl’s attempts at a unified field theory. On the 10th of January 1929, Einstein published an update (Einstein 1929a, p. 1, translation mine):

Indeed, it was possible to assign the same coherent interpretation to the gravitational and the electromagnetic field. However, the derivation of the field equation from Hamilton’s principle did not lead to a straightforward and unambiguous path. These difficulties intensified under further reflection. Since then, I was however successful in finding a satisfactory derivation of the field equations, which I will present in the following.

Unfortunately, there would be more clouds on the horizon. Others doubted the validity of the field equations Einstein presented in Einstein (1929a). To such criticism, on the 21st of March 1929, he responded as follows (Einstein 1929b, p. 156, translation mine):

In the meantime, I have discovered a possibility to solve this problem in a satisfactory manner, founded on Hamilton’s principle.

Then, on the 9th of January 1930 (Einstein 1930, p. 18, translation mine):

A couple of months ago I published an article [...] summarizing the mathematical foundation of the unified field theory. Here I want to recapitulate the essential ideas and also explain how some remarks appearing in previous works can be improved.

Undeterred, Einstein continued with his quest. He was assisted by Walther Mayer, a mathematician specialized in topology and differential geometry. New technical publications followed (Einstein and Mayer 1930, 1931a, b). Einstein knew that his attempts had opened a Pandora’s box of challenges. Indeed, the very notion of teleparallelism, the original seeding insight, had to be abandoned. On the 21st of March 1932, in a letter to Élie Cartan, he observed (as quoted in Goenner 2004, p. 85):

[...] in any case, I have now completely given up the method of distant parallelism. It seems that this structure has nothing to do with the true character of space [...].

By 1932, Einstein had become increasingly isolated in his research. Most physicists considered his attempts to be ultimately futile. Indeed, from 1928 to 1932 Einstein had been faced with criticism by notable scholars, like Hans Reichenbach, a logical positivist philosopher of science, and Weyl (Goenner 2004). But Wolfgang Pauli was most vocal in his criticism. Already on the 29th of September 1929, in a letter to a fellow physicist, Pauli confessed (Goenner 2004, p. 89):

By the way, I now no longer believe one syllable of teleparallelism; Einstein seems to have been abandoned by the dear Lord.

Then, on the 19th of December 1929, Pauli wrote a direct and blunt letter to Einstein (quoted in Goenner 2004, p. 87):

I thank you so much for letting be sent to me your new paper [...], which gives such a comfortable and beautiful review of the mathematical properties of a continuum with Riemannian metric and distant parallelism [...]. Unlike what I told you in spring, from the point of view of quantum theory, now an argument in favor of distant parallelism can no longer be put forward [...]. It just remains [...] to congratulate you (or should I rather say condole you?) that you have passed over to the mathematicians. Also, I am not so naive as to believe that you would change your opinion because of whatever criticism. But I would bet with you that, at the latest after one year, you will have given up the entire distant parallelism in the same way as you have given up the affine theory earlier. And, I do not wish to provoke you to contradict me by continuing this letter, because I do not want to delay the approach of this natural end of the theory of distant parallelism.

Einstein answered on the 24th of December 1929 as follows (Goenner 2004, p. 88):

Your letter is quite amusing, but your statement seems rather superficial to me. Only someone who is certain of seeing through the unity of natural forces in the right way ought to write in this way. Before the mathematical consequences have not been thought through properly, it is not at all justified to make a negative judgment.

In 1931, a collaborator of Einstein published a review article on teleparallelism (Lanczos 1931). It appeared in a journal whose name can be literally translated as “Results in the Exact Sciences.” Pauli, when reviewing the article, sarcastically remarked (Goenner 2004, p. 89):

It is indeed a courageous deed of the editors to accept an essay on a new field theory of Einstein for the “Results in the Exact Sciences.”

A summary of Einstein’s work between 1914 and 1932, appearing in the Preußischen Akademie der Wissenschaften, can be found in Simon (2006).

3.4 A Brief History of Quantum Mechanics

After 1932, things became quiet around Einstein’s unified field theory. He spent the remaining years up to his death in 1955 publishing articles on the philosophy of science, the history of physics, and special and general relativity. He was also concerned with quantum mechanics, a subject he continued to be displeased with. Ironically, Einstein himself was instrumental in the creation of the theory.

Quanta

In Einstein (1905a), he proposed that the experimental data relating to the photoelectric effect should be interpreted as the result of light being made up of discrete quantized packets, called Lichtquanten. These light quanta, known today as photons, each carry the energy

$$\begin{aligned} E=h\nu , \end{aligned}$$
(4.71)

i.e., an energy proportional to the frequency \(\nu \) of the light, where the proportionality constant is Planck’s constant h. Max Planck had proposed this relationship to explain the observed frequency spectrum of black-body radiation (Planck 1901), for which he would receive the Nobel Prize in 1918. With the radical and revolutionary assumption that radiation is not emitted continuously but in discrete amounts, Planck was able to solve a puzzle that had baffled physicists at the time: all previous theoretical calculations of black-body radiation resulted in nonsensical, infinite results. “It is a remarkable fact that so simple a hypothesis [\(E=h\nu \)], even if incomprehensible at first sight, leads to a perfect agreement with everything we can observe and measure” (Omnès 1999, p. 138).
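To give a sense of scale for (4.71), consider green light with a frequency of roughly \(\nu \approx 5.45 \times 10^{14}\,\text {Hz}\), corresponding to a wavelength of about 550 nm. This value is chosen purely for illustration and is not taken from the sources cited above. With \(h \approx 6.626 \times 10^{-34}\,\text {J}\,\text {s}\), a single photon carries

$$\begin{aligned} E = h\nu \approx 6.626 \times 10^{-34}\,\text {J}\,\text {s} \times 5.45 \times 10^{14}\,\text {s}^{-1} \approx 3.6 \times 10^{-19}\,\text {J} \approx 2.3\,\text {eV}. \end{aligned}$$

The smallness of this energy is one way to see why the granular nature of light goes unnoticed in everyday experience.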

For Planck, postulating quanta was an act of despair: “I was ready to sacrifice any of my previous convictions about physics” (quoted in Longair 2003, p. 339). Indeed, he originally believed that the notion of quanta was “a purely formal assumption and [he] really did not give it much thought [...]” (Longair 2003, p. 339). Einstein, by contrast, took the quantum hypothesis literally in order to explain the photoelectric effect. This work, and not his groundbreaking publications on special and general relativity, would win him the Nobel Prize in 1921. Planck and Einstein’s discoveries led to the quantum revolution.

In another notable publication, Einstein proposed the possibility of stimulated emission, the physical process making lasers possible (Einstein 1917). Despite his vital role in the early development of quantum theory, Einstein always remained skeptical of it. In a letter he wrote to Max Born in 1926, he lamented (quoted in Schweber 2008, p. 34):

Quantum mechanics is certainly imposing. But an inner voice tells me that it is not yet the real thing. The theory says a lot, but does not bring us closer to the secret of the “old one.” I, at any rate, am convinced that He is not playing with dice.

Einstein’s reservations explicitly concerned the probabilistic and indeterministic nature of quantum theory, for instance Born’s interpretation of the wave function as a probability amplitude (Born 1926) or, later, Heisenberg’s uncertainty principle (Heisenberg 1927). Einstein still believed that his unified field theory would shed light on these issues, and “that the quantum mechanical properties of particles would follow as a fringe benefit from [the field theory]” (Goenner 2004, p. 8). In the end, his skepticism stemmed from certain philosophical considerations relating to the nature of reality, of which there is an abundance. For instance, the philosopher Charles S. Peirce proposed the theory of tychism, in which he argued that chance and indeterminism are indeed ruling principles in the universe (Peirce 1892), a direct antithesis to Einstein’s opinion.

Entanglement

In what would end as an ironic turn of events, Einstein and collaborators set out to disprove the bizarre consequences of quantum theory. The now famous EPR paradox, an acronym formed from the initials of the authors’ last names, was a clever thought experiment designed to show that quantum mechanics must be incomplete (Einstein et al. 1935). In a nutshell, the experiment showed that quantum mechanics allows for non-local effects: under certain conditions, a measurement conducted on a particle A would instantaneously change the properties of a particle B, regardless of the distance separating the two. Einstein felt victorious, as he did not believe such “spooky actions at a distance” (Kaiser 2011, p. 30) could be possible. But alas, things turned out differently. John Stewart Bell was able to derive a theorem from the EPR paradox. He proved that non-locality was indeed endemic to quantum mechanics (Bell 1964). The experimental validation was given years later in Freedman and Clauser (1972) and notably by Aspect et al. (1981, 1982a, b). In trying to expose the outlandish nature of quantum physics, Einstein helped to distill one of reality’s most mind-boggling properties: entanglement, a term introduced by Schrödinger (1935) to account for the “spooky action at a distance.” Although it seems to imply that in some bizarre way reality is simultaneously interconnected with itself, entanglement does not allow actual information to propagate faster than the speed of light. Hence special relativity is not violated and there are no tenable objections from physics against entanglement. Quite to the contrary, today entanglement plays a central role in the emerging fields of quantum computation, quantum information, and quantum cryptography, the cutting edge of current technological advancements. Indeed (Nielsen and Chuang 2007, pp. 11f.):

Entanglement is a uniquely quantum mechanical resource that plays a key role in many of the most interesting applications of quantum computation and quantum information; entanglement is iron to the classical world’s bronze age. In recent years there has been a tremendous effort trying to better understand the properties of entanglement considered as a fundamental resource of Nature, of comparable importance to energy, information, entropy, or any other fundamental resource.

Key to this surge in research was a theorem proved in 1982. It goes by the name of the no-cloning theorem (Wootters and Zurek 1982). In a nutshell (Kaiser 2011, p. xxv):

[...] the no-cloning theorem stipulates that it is impossible to produce perfect copies (or “clones”) of an unknown arbitrary quantum state. Efforts to copy the fragile quantum state necessarily alter it.
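The reason behind this restriction can be sketched in a few lines. What follows is the standard textbook argument based on unitarity, not necessarily the route taken in the original paper; the symbols U, \(|\psi \rangle \), \(|\phi \rangle \), and the blank state \(|0\rangle \) are introduced here for illustration only. Suppose a single unitary operation U could copy two arbitrary states onto a blank register:

$$\begin{aligned} U \big ( |\psi \rangle \otimes |0\rangle \big ) = |\psi \rangle \otimes |\psi \rangle , \qquad U \big ( |\phi \rangle \otimes |0\rangle \big ) = |\phi \rangle \otimes |\phi \rangle . \end{aligned}$$

Since unitary operations preserve inner products, taking the inner product of the two equations gives

$$\begin{aligned} \langle \phi |\psi \rangle \langle 0|0\rangle = \langle \phi |\psi \rangle = \langle \phi |\psi \rangle ^2 , \end{aligned}$$

which is only satisfied if \(\langle \phi |\psi \rangle \) equals 0 or 1. A single device can therefore copy states that are identical or orthogonal, but not arbitrary unknown states.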

This property thwarts any attempts to intercept the communication of information, allowing for a 100% secure transmission channel: quantum encrypted communications cannot, by the laws of nature, be tapped without the signal being affected. This promise of perfect security would be the gold standard in an age of information processing and global computer networks. Experiments have demonstrated the proof-of-concept, for instance Poppe et al. (2004). And real-world applications followed (Hensler et al. 2007):

On Thursday, October 11 [2007], the State of Geneva announced its intention to use quantum cryptography to secure the network linking its ballot data entry center to the government repository where the votes are stored. The main goal of this initiative, a world first, is to guarantee the integrity of the data as they are processed.

For more details on entanglement and the no-cloning theorem, see Sect. 10.3.2.1.

In the ebb and flow of history one can sometimes lose track of the peculiarities and coincidences leading to a major advancement in science. The popularization of entanglement and the development of the no-cloning theorem are prominent examples of how a very unlikely group of people can end up being responsible for such revolutionary feats: a loose collaboration of physicists, dabbling in psychedelics, Eastern mysticism, parapsychology, and other esoteric concepts. Indeed, it is often hard to appreciate how drastically geopolitics has influenced the development of science and how crucial mindsets and culture can be for setting research agendas. See Kaiser (2011).

Some technical aspects of quantum mechanics can be found in the paragraphs encapsulating the following equations: (3.24) on p. 78, (3.51) on p. 88, and (4.20) on p. 100.

3.5 Einstein’s Final Years

Regarding Einstein’s post-1932 research, in 1945 a publication surfaced in which he once again engaged in the quest to formulate a unified field theory. He would call his new conception the generalized theory of gravity (Einstein 1945). More publications followed in 1948, 1950, 1953, 1954, and 1955. In December 1954, in a note for the fifth edition of Einstein (1956), his passion was still burning:

For the present edition I have completely revised the “Generalization of Gravitation Theory” [Appendix II] under the title “Relativistic Theory of the Non-Symmetric Field.” For I have succeeded—in part in collaboration with my assistant B. Kaufman—in simplifying the derivations as well as the form of the field equations. The whole theory becomes thereby more transparent, without changing its contents.

Einstein, well aware of the pressing conflicts in his approach, was confident that no one else could claim any certainty on the matter either (Einstein 1956, p. 165f.):

Is it conceivable that a field theory permits one to understand the atomistic and quantum structure of reality? Almost everybody will answer this question with “no.” But I believe that at the present time nobody knows anything reliable about it. [...] One can give good reasons why reality cannot at all be represented by a continuous field. From the quantum phenomena it appears to follow with certainty that a finite system of finite energy can be completely described by a finite set of numbers (quantum numbers). This does not seem to be in accordance with a continuum theory, and must lead to an attempt to find a purely algebraic theory for the description of reality. But nobody knows how to obtain the basis of such a theory.

For a detailed account of all of Einstein’s works, see Schilpp (1970).

Although Einstein was isolated from most of the physics community, and not up to date with modern advancements in quantum field theory, his work still managed to catch the attention and fascination of the media. For instance, a newspaper article, appearing on the day of Einstein’s death, hailed (Associated Press 1955, p. 17):

In 1950, after 30 years of intensive study, Einstein expounded a new theory that, if proved, might be the key to the universe.

In the end, Einstein’s efforts at a unified field theory have been reduced to a footnote in history. What has remained is Weyl’s idea of local gauge symmetry, Kaluza’s venture into extra dimensions, and Klein’s compactification scheme. Gauge symmetry would reveal itself as the unifying principle behind the standard model of particle physics, as is discussed in the next section. Additional spatial dimensions, the novel symmetry principle called supersymmetry, and compactification are the fundamental building blocks of string theory to this day.

4 Unification—The Holy Grail of Physics

Chapter 3 set out to establish the power of a simple abstract notion, called symmetry, in decoding the workings of the universe. The notion of symmetry fulfills a dream of physicists: the hope that nature is ultimately not only comprehensible to the human mind, but comprehensible in a way that is pleasing and satisfying. In the words of Nobel laureate Steven Weinberg (Weinberg 1992, p. 165):

We believe that, if we ask why the world is the way it is and then ask why that answer is the way it is, at the end of this chain of explanations we shall find a few simple principles of compelling beauty.

This was witnessed in the insights gained from invariance: the emergence of conservation laws, presented in Sect. 3.1, and the fundamental physical classification of matter states and particle fields, discussed in Sect. 3.2. Perhaps the epitome of Weinberg’s dream comes in the guise of unification, the theme with which this chapter began. Indeed, Weinberg himself was instrumental in showing how symmetry principles serve as tools for crafting a unified theory of all known forces excluding gravity, leading to the standard model of particle physics. As Weinberg (1992, p. 142) put it:

Symmetry principles have moved to a new level of importance in this [twentieth] century [...]: there are symmetry principles that dictate the very existence of all the known forces of nature.

However, before these groundbreaking insights could be uncovered, some obstacles still needed to be removed for quantum field theory and gauge theory to emerge from the “Dark Age.” The single most damning problem was that the mathematics was still plagued by infinities, the demon of non-renormalizability. Almost simultaneously, Weinberg (1967) and Abdus Salam (Salam 1968) “boldly ignored the problem of the ‘non-renormalizable’ infinities and instead proposed a far more ambitious unified gauge theory of the electromagnetic and weak interactions” (Moriyasu 1983, p. 102). They built on the work by Schwinger (1957) and Sheldon Glashow (Glashow 1961) and developed a spontaneously broken gauge theory by incorporating the Higgs mechanism, discussed in Sect. 4.2.1. Nearly a century after Maxwell’s merger of electricity and magnetism, the next step in unifying the forces of nature was in sight: the electroweak interaction. It is based on the gauge group \(SU(2)\times U(1)\). Salam, Glashow, and Weinberg were awarded the Nobel Prize in Physics in 1979 for this achievement.

The final piece of the puzzle was found by Gerard ’t Hooft. He proved that spontaneously broken gauge theories are renormalizable (’t Hooft 1971). The lack of such a proof was all that had been holding back the inevitable step to unification (Moriyasu 1983, p. 113):

However, [renormalizability] could not be proved at the time and the general response of the community of physicists to the Weinberg-Salam theory was best described some years later by Sidney Coleman: “Rarely has so great an accomplishment been so widely ignored.”

The strong nuclear force, responsible for the stability of matter by confining the quarks into hadrons, was successfully described as a gauge theory of a new quantum charge, called color (Han and Nambu 1965; Greenberg and Nelson 1977). Color is a new quantum property carried by the quarks, just as electric charge is a property of some fermions and bosons; recall Fig. 4.1. Hence the term quantum chromodynamics is used to describe this theory. The gauge potential fields are called gluons and mediate the strong interaction between the color-charged quarks. The gauge group is SU(3) and some mathematical tools can be borrowed from the SU(3) classification in the old quark model provided by Gell-Mann and Ne’eman (1964). However, “it is important to keep in mind that neither the theoretical predictions nor the experimental tests of chromodynamics have yet achieved the level of either quantum electrodynamics or the Weinberg-Salam theory” (Moriyasu 1983, p. 122).

Although, basically, the standard model was created by splicing the electroweak theory and the theory of quantum chromodynamics, it ranks as “one of the great successes of the gauge revolution” (Kaku 1993, p. 363). Technically, the standard model is a spontaneously broken quantum Yang-Mills theory describing all known particles and all three non-gravitational forces.

The standard model Lagrangian, seen in (3.8), is invariant under the unified symmetry group

$$\begin{aligned} G_{\text {SM}} = SU(3) \times SU(2) \times U(1). \end{aligned}$$
(4.72)

Fermionic matter is described by spinors which interact via gauge bosons that enter the Lagrangian through the gauge-invariant derivative

$$\begin{aligned} \mathcal D_\mu = \partial _\mu + i \hat{g} \sum _{\alpha =1}^8 G_\mu ^\alpha \lambda ^\alpha + i g \sum _{i=1}^3 W^i_\mu \tau ^i +i g^\prime B_\mu Y, \end{aligned}$$
(4.73)

where the SU(3) generators \(\lambda ^\alpha \), with the corresponding coupling constant \(\hat{g}\) and gluon gauge fields \(G_\mu ^\alpha \), are added to (4.39). The scalar Higgs field is responsible for generating the mass terms without violating covariance, as described in Sect. 4.2.1.
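As a quick consistency check on (4.73), the number of gauge fields can be read off from the group structure. The group SU(N) has \(N^2-1\) generators and U(1) has one, so that

$$\begin{aligned} \underbrace{(3^2-1)}_{SU(3)} + \underbrace{(2^2-1)}_{SU(2)} + \underbrace{1}_{U(1)} = 8 + 3 + 1 = 12 , \end{aligned}$$

matching the eight gluon fields \(G_\mu ^\alpha \), the three fields \(W^i_\mu \), and the single field \(B_\mu \) appearing in the sums. After spontaneous symmetry breaking, the \(W^i_\mu \) and \(B_\mu \) combine into the physical \(W^\pm \) and Z bosons and the photon.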

General references are Moriyasu (1982, 1983), Collins et al. (1989), Kaku (1993), Peskin and Schroeder (1995), Cheng and Li (1996), Ryder (1996), O’Raifeartaigh (1997).