1 Introduction

The first direct detection of gravitational waves (GWs), widely expected in the mid 2010s with advanced ground-based interferometers [219, 2], will represent the culmination of a fifty-year experimental quest [124]. Soon thereafter, newly plentiful GW observations will begin to shed light on the structure, populations, and astrophysics of mostly dark, highly relativistic objects such as black holes and neutron stars. In the low-frequency band that will be targeted by space-based detectors (roughly \(10^{-5}\) to 1 Hz), GW observations will provide a census of the massive black-hole binaries at the center of galaxies, and characterize their merger histories; probe the galactic population of binaries that include highly evolved, degenerate stars; study the stellar-mass objects that spiral into the central black holes in galactic nuclei; and possibly detect stochastic GW backgrounds from the dynamical evolution of the very early universe.

Thus, there are very strong astrophysical motivations to observe the universe in GWs, especially because the systems and phenomena that can be observed in this fashion are largely orthogonal to those accessible to traditional electromagnetic (EM) and astroparticle astronomy. The promise of GWs appears just as great for fundamental physics. Einstein’s theory of gravity, general relativity (GR), has been confirmed by extensive experimental tests; but these have largely been confined to the solar system, where gravity is well approximated by Newtonian gravity with small corrections. A few tests, based on observations of binary compact-object systems, have confirmed the weakest (leading-order) effects of GW generation. By contrast, observation of strong GWs will provide the first direct observational probe of the dynamical, strong-field regime of GR, where the nature and behavior of gravity can be significantly different from the Newtonian picture. GWs are prima facie the perfect probe to investigate gravitation, since they originate directly from the bulk motion of gravitating matter, relieving the need to understand and model the physics of other intermediate messengers, typically photons from stellar surfaces or black-hole surroundings.

Already today we can rely on a very sophisticated understanding of the analytical and numerical techniques required to model GW sources and their GW emission, including the post-Newtonian expansion [84, 190], black-hole perturbation theory [119], numerical relativity for vacuum spacetimes [368], spacetimes with gases or magnetized plasmas [184], and much more. That these techniques should have been developed so much in the absence of a dialogue with experimental data (except for the binary pulsar [293]) is witness to the great perceived promise of GW astronomy. For a bird’s-eye view of the field, see the Living Review by Sathyaprakash and Schutz [395], who cover the physics of GWs, the principles of operation of GW detectors, the nature of GW sources, the data analysis of GW signals, and the science payoffs of GW observations for physics, astrophysics, and cosmology.

This review focuses on the opportunities to challenge or confirm our understanding of gravitational physics that will be offered by forthcoming space-based missions to observe GWs in the low-frequency band between \(10^{-5}\) and 1 Hz. Most of the literature on this subject has focused on one mission design, LISA (the Laser Interferometer Space Antenna [64, 252, 370]), which was studied jointly by NASA and ESA between 2001 and 2011. In 2011, budgetary and programmatic reasons led the two space agencies to end this partnership, and to pursue space-based GW detection separately, studying cheaper, rescoped LISA-like missions.

ESA’s proposed eLISA/NGO [20] would be smaller than LISA, fly on orbits closer to Earth, and operate interferometric links only along two arms. In 2012 eLISA was considered for implementation as ESA’s first large mission (“L1”) in the Cosmic Vision program. A planetary mission was selected instead, but eLISA will be in the running for the next launch slot (“L2”), with a decision coming as soon as 2014. NASA ran studies on a broader range of missions [215], including several variants of LISA to be implemented by NASA alone, as well as options with a geocentric orbit (OMEGA [229]), and without drag-free control (LAGRANGE [304]). The final study report concludes that scientifically compelling missions can be carried out for less, but not substantially less, than the full LISA cost; that scientific performance decreases far more rapidly than cost; and that no design choice or technology can make a dramatic reduction in cost without much greater risks. The NASA study noted the possibility of participation in the ESA-led eLISA mission (if selected by ESA) as a minority partner.

Whatever specific design is eventually selected, it is likely that its architecture, technology, and scientific reach will bear a strong resemblance to LISA’s (with the appropriate scalings in sensitivity, mission duration, and so on). Thus, the research reviewed in this article, which was targeted in large part to LISA, is still broadly relevant to future missions. Such LISA-like observatories are characterized by a few common elements: a set of three spacecraft in long-baseline (Mkm) orbits, monitoring their relative displacements using laser interferometry; drag-free operation (except for LAGRANGE [304]), whereby displacement measurements are referenced to freely falling test masses protected by the spacecraft, which hover around the masses using precise micro-Newton thrusters; frequency correction of laser noise using a variety of means, including onboard cavities and interferometers, arm locking, and a LISA-specific technique known as Time-Delay Interferometry.

The predictions of GR that can be tested by space-based GW observatories include the absence of gravitational fields other than the metric tensor; the number and character of GW polarization states; the speed of GW propagation; the detailed progress of binary inspiral, as driven by nonlinear gravitational dynamics and loss of energy to GWs; the strength and shape of the GWs from binary merger and ringdown; the true nature of astrophysical black holes; and more.

Some of these tests will also be performed with ground-based GW detectors and pulsar-timing observations [500], but space-based tests will almost always have superior accuracy and significance, because low-frequency sources are intrinsically stronger, and will spend a larger time within the band of good detector sensitivity. For binary systems with very asymmetric mass ratios, such as extreme mass-ratio inspirals (EMRIs), LISA-like missions will measure hundreds of thousands of orbital cycles; because successful detections require matching the phase of signals throughout their evolution, it follows that these observations will be exquisitely sensitive to source parameters. The data-analysis detection problem will be correspondingly delicate, but has been tackled both theoretically [38], and in a practical program of mock data challenges for LISA [37, 39, 450].
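The link between cycle count and parameter sensitivity can be illustrated with a back-of-the-envelope estimate. The sketch below is a heuristic only, not a substitute for a proper parameter-estimation analysis: it assumes that a constant fractional offset in orbital frequency accumulates phase linearly over the observed cycles, and that a dephasing of about one radian is enough to spoil a match. The function name and the one-radian threshold are assumptions made for this example.

```python
import math


def fractional_frequency_sensitivity(n_cycles, max_dephasing_rad=1.0):
    """Fractional frequency offset that builds up max_dephasing_rad over n_cycles.

    A constant fractional offset delta in the orbital frequency shifts the
    accumulated phase by 2*pi*n_cycles*delta, so the offset that produces a
    given dephasing is delta = max_dephasing_rad / (2*pi*n_cycles).
    """
    if n_cycles <= 0:
        raise ValueError("n_cycles must be positive")
    return max_dephasing_rad / (2.0 * math.pi * n_cycles)
```

Under these assumptions, a signal tracked for \(3 \times 10^5\) cycles would be sensitive to fractional frequency offsets of order \(5 \times 10^{-7}\), which conveys why EMRI observations constrain source parameters so tightly.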

The rest of this review is organized as follows. Section 2 provides the briefest overview of Einstein’s GR, of the theoretical framework in which it can be tested, and of a few leading alternative theories. It also introduces the “black-hole paradigm,” which augments Einstein’s equations with a few assumptions of physicality that lead to the prediction that the end results of gravitational collapse are black holes described by the Kerr metric. Section 3 reviews the “classic” LISA architecture, as well as possible options for LISA-like variants. Section 4 summarizes the main classes of GW sources that would be observed by LISA-like detectors, and that can be used to test GR. Section 5 examines the tests of gravitational dynamics that can be performed with these sources, while Section 6 discusses the tests of the black-hole nature and structure. (A conspicuous omission is that of possible stochastic GW backgrounds of cosmological origin [82]; indeed, in this article we do not discuss the role of space-based detectors as probes of cosmology and early-universe physics.) Last, Section 7 presents our conclusions and speculations.

2 The Theory of Gravitation

Newton’s theory of gravitation provided a description of the effect of gravity through the inverse square law without attempting to explain the origin of gravity. The inverse square law provided an accurate description of all measured phenomena in the solar system for more than two hundred years, but the first hints that it was not the correct description of gravitation began to appear in the late 19th century, as a result of the improved precision in measuring phenomena such as the perihelion precession of Mercury. Einstein’s contribution to our understanding of gravity was not only practical but also aesthetic, providing a beautiful explanation of gravity as the curvature of spacetime. In developing GR as a generally covariant theory based on a dynamical spacetime metric, Einstein sought to extend the principle of relativity to gravitating systems, and he built on the crucial 1907 insight that the equality of inertial and gravitational mass allowed the identification of inertial systems in homogeneous gravitational fields with uniformly accelerated frames — the principle of equivalence [339]. Einstein was also guided by his appreciation of Ricci and Levi-Civita’s absolute differential calculus (later to become differential geometry), arguably as much as by the requirement to reproduce Newtonian gravity in the weak-field limit. Indeed, one could say that GR was born “of almost pure thought” [471].

Einstein’s theory of GR is described by the action

$${S_{{\rm{GR}}}} = \int {\sqrt {- g} R{{\rm{d}}^4}x},$$

in which g is the determinant of the spacetime metric and \(R = g^{\mu \nu} R_{\mu \nu}\) is the Ricci scalar, where \(R_{\mu \nu} = R_{\mu \alpha \nu}^\alpha\) is the Ricci tensor, \(R_{\beta \gamma \delta}^\alpha = \Gamma _{\beta \delta ,\gamma}^\alpha - \Gamma _{\beta \gamma ,\delta}^\alpha + \Gamma _{\gamma \lambda}^\alpha \Gamma _{\beta \delta}^\lambda - \Gamma _{\delta \lambda}^\alpha \Gamma _{\beta \gamma}^\lambda\) the Riemann curvature tensor, \(\Gamma _{\beta \gamma}^\alpha = {1 \over 2}{g^{\alpha \delta}}({g_{\beta \delta ,\gamma}} + {g_{\delta \gamma ,\beta}} - {g_{\beta \gamma ,\delta}})\) the affine connection, and a comma denotes a partial derivative. When coupled to a matter distribution, this action yields the field equations

$${G_{\mu \nu}} \equiv {R_{\mu \nu}} - {1 \over 2}{g_{\mu \nu}}R = {{8\pi G} \over {{c^4}}}{T_{\mu \nu}},$$

where \(T_{\mu \nu}\) denotes the stress-energy tensor of the matter.
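As a quick consistency check on the field equations above (a standard manipulation, written out here for convenience), one can contract both sides with \(g^{\mu \nu}\). In four spacetime dimensions \(g^{\mu \nu} g_{\mu \nu} = 4\), so with \(T \equiv g^{\mu \nu} T_{\mu \nu}\) (a symbol introduced only for this sketch) one finds

$$R - 2R = - R = {{8\pi G} \over {{c^4}}}T\quad \Longrightarrow \quad {R_{\mu \nu}} = {{8\pi G} \over {{c^4}}}\left({{T_{\mu \nu}} - {1 \over 2}{g_{\mu \nu}}T} \right),$$

so that in vacuum, where the stress-energy tensor vanishes, the field equations reduce to \(R_{\mu \nu} = 0\).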

Since the development of the theory, GR has withstood countless experimental tests [471, 443, 444] based on measurements as different as atomic-clock precision [378], orbital dynamics (most notably lunar laser ranging [309]), astrometry [415], and relativistic astrophysics (most exquisitely the binary pulsar [293, 471], but not only [372]). It is therefore the correct and natural benchmark against which to compare alternative theories using future observations, and we will follow the same approach in this article. Unlike in the case of Newtonian gravity at the time that GR was developed, there are no current observations that GR cannot explain that can be used to guide development of alternatives. Nonetheless, there are crucial aspects of Einstein’s theory that have never been probed directly, such as its strong-field dynamics and the propagation of field perturbations (GWs). Furthermore, it is known that classical GR must ultimately fail at the Planck scale, where quantum effects become important, and traces of the quantum nature of gravity may be accessible at lower energies [400]. As emphasized by Will [471], GR has no adjustable constants, so every test is potentially deadly, and a probe that could reveal new physics.

2.1 Will’s “standard model” of gravitational theories

Will’s Living Review [471] and his older monograph [469] are the fundamental references about the experimental verification of GR. In this section, we give only a brief overview of what may be called Will’s “standard model” for alternative theories of gravity, which proceeds through four steps: a) strong evidence for the equivalence principle supports a metric formulation for gravity; b) metric theories are classified according to what gravitational fields (scalar, vector, tensor) they prescribe; c) slow-motion, weak-field conservative dynamics are described in a unified parameterized post-Newtonian (PPN) formalism, and constrained by experiment and observations; d) finally, equations for the slow-motion generation and weak-field propagation of gravitational radiation are derived separately for each metric theory, and again compared to observations. Many of the tests of gravitational physics envisaged for LISA belong in this last sector of Will’s standard model, and are discussed in Section 5.1 of this review. This scheme however leaves out two other important points of contact between gravitational phenomenology and LISA’s GW observations: the strong-field, nonlinear dynamics of black holes and their structure and excitations, especially as probed by small orbiting bodies. We will deal with these in Sections 5 and 6, respectively; but let us first delve into Will’s standard model.

The equivalence principle and metric theories of gravitation. Einstein’s original intuition [338] placed the equivalence principle [222] as a cornerstone for the theories that describe gravity as curved spacetime. As formulated by Newton, the principle states simply that inertial and gravitational mass are proportional, and therefore all “test” bodies fall with the same acceleration (in modern usage, this is known as the weak equivalence principle, or WEP). Dicke later recognized that in developing GR Einstein had implicitly posited a broader principle (Einstein’s equivalence principle, or EEP) that consists of WEP plus local Lorentz invariance and local position invariance: that is, of the postulates that the outcome of local non-gravitational experiments is independent of, respectively, the velocity and position of the local freely-falling reference frames in which the experiments are performed.

Table 1 Hierarchy of formulations of the equivalence principle.

Turyshev [443] gives a current review of the experimental verification of WEP (shown to hold to parts in \(10^{13}\) by differential free-fall tests [399]), local Lorentz invariance (verified to parts in \(10^{22}\) by clock-anisotropy experiments [276]), and local position invariance (verified to parts in \(10^{5}\) by gravitational-redshift experiments [58], and to much greater precision when looking for possible time variations of fundamental constants [445]). Although these three parts of EEP appear distinct in their experimental consequences, their underlying physics is necessarily related in any theory of gravity, so Schiff conjectured (and others argued convincingly) that any complete and self-consistent theory of gravity that embodies WEP must also realize EEP [471].

EEP leads to metric theories of gravity in which spacetime is represented as a pseudo-Riemannian manifold, freely-falling test bodies move along the geodesics of its metric, and non-gravitational physics is obtained by applying special-relativistic laws in local freely-falling frames. GR is, of course, a metric theory of gravity; so are scalar-vector-tensor theories such as Brans-Dicke theory, which include other gravitational fields in addition to the metric. By contrast, theories with dynamically varying fundamental constants and theories (such as superstring theory) that introduce additional WEP-violating gravitational fields [471, Section 2.3] are not metric. Neither are most theories that provide short-range and long-range modifications to Newton’s inverse-square law [3].

The scalar and vector fields in scalar-vector-tensor theories cannot directly affect the motion of matter and other non-gravitational fields (which would violate WEP), but they can intervene in the generation of gravity and modify its dynamics. These extra fields can be dynamical (i.e., determined only in the context of solving for the evolution of everything else) or absolute (i.e., assigned a priori to fixed values). The Minkowski metric of special relativity is the classic example of absolute field; such fields may be regarded as philosophically unpleasant by those who dislike feigning hypotheses, but they have a right of citizenship in modern physics as “frozen in” solutions from higher energy scales or from earlier cosmological evolution.

The additional fields can potentially alter the outcome of local gravitational experiments: while the local gravitational effects of different metrics far away can always be erased by describing physics in a freely-falling reference frame (which is to say, the local boundary conditions for the metric can be arranged to be flat spacetime), the same is not true for scalar and vector fields, which can then affect local gravitational dynamics by their interaction with the metric. This amounts to a violation not of EEP, but of the strong equivalence principle (SEP), which states that EEP is also valid for self-gravitating bodies and gravitational experiments. SEP is verified to parts in \(10^{4}\) by combined lunar laser-ranging and laboratory experiments [476]. So far, GR appears to be the only viable metric theory that fully realizes SEP.

The PPN formalism. Because the experimental consequences of different metric theories follow from the specific metric that is generated by matter (possibly with the help of the extra gravitational fields), and because all these theories must realize Newtonian dynamics in appropriate limiting conditions, it is possible to parameterize them in terms of the coefficients of a slow-motion, weak-field expansion of the metric. These coefficients appear in front of gravitational potentials similar to the Newtonian potential, but involving also matter velocity, internal energy, and pressure. This scheme is the parameterized post-Newtonian formalism, pioneered by Nordtvedt and extended by Will (see [469] for details).

Of the ten PPN parameters in the current version of the formalism, two are the celebrated γ and β (already introduced by Eddington, Robertson, and Schiff for the “classical” tests of GR) that rule, respectively, the amount of space curvature produced by unit rest mass and the nonlinearity in the superposition of gravitational fields. In GR, γ and β each have the value 1. The other eight parameters, if not zero, give rise to violations of position invariance (ξ), Lorentz invariance (\(\alpha_{1,2,3}\)), or even of the conservation of total momentum (\(\alpha_3\), \(\zeta_{1,2,3,4}\)) and total angular momentum (\(\alpha_{1,2,3}\), \(\zeta_{1,2,3,4}\)).
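The bookkeeping in the previous paragraph can be captured in a small lookup table. The following sketch is only an illustration of that grouping; the dictionary names, the parameter spellings, and the function `violated` are hypothetical choices made for this example, not part of the PPN formalism or of any library.

```python
# GR values of the ten PPN parameters named in the text.
GR_VALUES = {
    "gamma": 1.0, "beta": 1.0, "xi": 0.0,
    "alpha1": 0.0, "alpha2": 0.0, "alpha3": 0.0,
    "zeta1": 0.0, "zeta2": 0.0, "zeta3": 0.0, "zeta4": 0.0,
}

ALPHAS = ["alpha1", "alpha2", "alpha3"]
ZETAS = ["zeta1", "zeta2", "zeta3", "zeta4"]

# Symmetry or conservation law controlled by each group of parameters,
# following the grouping given in the text.
EFFECTS = {
    "position invariance": ["xi"],
    "Lorentz invariance": ALPHAS,
    "conservation of total momentum": ["alpha3"] + ZETAS,
    "conservation of total angular momentum": ALPHAS + ZETAS,
}


def violated(params, tol=0.0):
    """Return the sorted effects violated by a (partial) set of PPN values.

    Parameters missing from `params` are taken to have their GR values.
    """
    full = {**GR_VALUES, **params}
    return sorted(
        effect
        for effect, names in EFFECTS.items()
        if any(abs(full[name] - GR_VALUES[name]) > tol for name in names)
    )
```

For instance, `violated({"gamma": 0.9999})` returns an empty list, since γ and β do not enter any of the listed violations, whereas a nonzero `alpha3` alone flags Lorentz invariance and both conservation laws.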

The PPN formalism is sufficiently accurate to describe the tests of gravitation performed in the solar system, as well as many tests using binary-pulsar observations. The parameter γ is currently constrained to 1 ± a few × \(10^{-5}\) by tests of light delay around massive bodies using the Cassini spacecraft [81]; β to 1 ± a few × \(10^{-4}\) by lunar laser ranging [476]. The other PPN parameters have comparable bounds around zero from solar-system and pulsar measurements, except for \(\alpha_3\), which is known exceedingly well from pulsar observations [471].

2.2 Alternative theories

Tests in the PPN framework have tightly constrained the field of viable alternatives to GR, largely excluding theories with absolute elements that give rise to preferred-frame effects [471]. The (indirect) observation of GW emission from the binary pulsar and the accurate prediction of its orbital decay by Einstein’s quadrupole formula have definitively excluded other theories [471, 422]. Yet more GR alternatives were conceived to illuminate points of principle, but they are not well motivated physically and therefore are hardly candidates for experimental verification. Some of the theories that are still “alive” are described in the following. More details can be found in [469].

2.2.1 Scalar-tensor theories

The addition of a single scalar field \(\varphi\) to GR produces a theory described by the Einstein-frame action (see, e.g., [471]),

$$\tilde I = {(16\pi G)^{- 1}}\int {[\tilde R - 2{{\tilde g}^{\mu \nu}}{\partial _\mu}\varphi {\partial _\nu}\varphi - V(\varphi)]{{(- \tilde g)}^{1/2}}{{\rm{d}}^4}x + {I_{{\rm{matter}}}}({\psi _{\rm{m}}},{A^2}(\varphi){{\tilde g}_{\mu \nu}})} ,$$

where \({{\tilde g}_{\mu \nu}}\) is the metric, the Ricci curvature scalar \({\tilde R}\) yields the general-relativistic Einstein-Hilbert action, and the two adjacent terms are kinetic and potential energies for the scalar field. Note that in the action \(I_{\rm{matter}}\) for matter dynamics, the metric couples to matter through the function \(A(\varphi)\), so this representation is not manifestly metric; it can however be made so by a change of variables that yields the Jordan-frame action,

$$I = {(16\pi G)^{- 1}}\int {[\phi R - {\phi ^{- 1}}\omega (\phi){g^{\mu \nu}}{\partial _\mu}\phi {\partial _\nu}\phi - {\phi ^2}V]{{(- g)}^{1/2}}{{\rm{d}}^4}x + {I_{{\rm{matter}}}}({\psi _{\rm{m}}},{g_{\mu \nu}})} ,$$

where \(\phi \equiv A(\varphi)^{-2}\) is the transformed scalar field, \({g_{\mu \nu}} \equiv {A^2}(\varphi){{\tilde g}_{\mu \nu}}\) is the physical metric underlying gravitational observations, and \(3 + 2\omega(\phi) = [{\rm{d}}(\ln A(\varphi))/{\rm{d}}\varphi]^{-2}\).

The “classic” Brans-Dicke theory corresponds to fixing ω to a constant \(\omega_{\rm{BD}}\), and it is indistinguishable from GR in the limit \(\omega_{\rm{BD}} \to \infty\). In the PPN framework, the only parameter that differs from GR is \(\gamma = (1 + \omega_{\rm{BD}})/(2 + \omega_{\rm{BD}})\). Damour and Esposito-Farese [142] considered an expansion of \(\log A(\varphi)\) around a cosmological background value,

$$\log A(\varphi) = {\alpha _0}(\varphi - {\varphi _0}) + {1 \over 2}{\beta _0}{(\varphi - {\varphi _0})^2} + \cdots ,$$

where setting \(\beta_0\) (and all further coefficients) to zero reproduces Brans-Dicke theory with \(\alpha _0^2 = 1/(2{\omega _{{\rm{BD}}}} + 3)\); \(\beta_0 > 0\) causes the evolution of the scalar field toward \(\varphi_0\) (and therefore toward GR); and \(\beta_0 < 0\) may allow a phase change inside objects like neutron stars, leading to large SEP violations. These parameters are bound by solar-system, binary-pulsar, and GW observations [143, 186].
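To give a feeling for how a bound on γ translates into a bound on the Brans-Dicke parameter, the sketch below simply inverts the relation quoted above, using \(1 - \gamma = 1/(2 + \omega_{\rm{BD}})\). The function names are hypothetical, and the tolerance used in the usage note is an illustrative placeholder of the order of the Cassini constraint, not a quoted measurement.

```python
def gamma_brans_dicke(omega_bd):
    """PPN gamma in Brans-Dicke theory: (1 + omega_BD) / (2 + omega_BD)."""
    return (1.0 + omega_bd) / (2.0 + omega_bd)


def alpha0_squared(omega_bd):
    """Scalar coupling strength: alpha_0^2 = 1 / (2 omega_BD + 3)."""
    return 1.0 / (2.0 * omega_bd + 3.0)


def omega_lower_bound(gamma_tolerance):
    """Smallest omega_BD compatible with 1 - gamma <= gamma_tolerance.

    Follows from 1 - gamma = 1 / (2 + omega_BD), assuming omega_BD > -2
    so that gamma approaches 1 from below.
    """
    if gamma_tolerance <= 0.0:
        raise ValueError("gamma_tolerance must be positive")
    return 1.0 / gamma_tolerance - 2.0
```

With an assumed tolerance of \(2 \times 10^{-5}\), `omega_lower_bound` gives \(\omega_{\rm{BD}}\) of roughly \(5 \times 10^{4}\), and `alpha0_squared` evaluated there is of order \(10^{-5}\), which shows how weakly the scalar field may couple in a Brans-Dicke theory consistent with a constraint at that level.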

Scalar-tensor theories have found motivation in string theory and cosmological models, and have attracted the most attention in terms of tests with GW observations.

2.2.2 Vector-tensor theories

These are obtained by including a dynamical vector field coupled to the metric tensor. The most general second-order action in such a theory takes the form [471]

$$\begin{array}{c} {S = {1 \over {16\pi G}}\int {{{\rm{d}}^4}x\sqrt {- g}} \left[ {(1 + \omega {u_\mu}{u^\mu})R - K_{\alpha \beta}^{\mu \nu}u_{;\mu}^\alpha u_{;\nu}^\beta + \lambda ({u_\mu}{u^\mu} + 1)} \right]} , \\ {{\rm{where}}\,\,K_{\alpha \beta}^{\mu \nu} = {c_1}{g^{\mu \nu}}{g_{\alpha \beta}} + {c_2}\delta _\alpha ^\mu \delta _\beta ^\nu + {c_3}\delta _\beta ^\mu \delta _\alpha ^\nu - {c_4}{u^\mu}{u^\nu}{g_{\alpha \beta}},} \\ \end{array}$$

in which a semicolon denotes covariant differentiation, and the coefficients \(c_i\) are arbitrary constants. There are two types of vector-tensor theories: in unconstrained theories, λ ≡ 0 and the constant ω is arbitrary, while in Einstein-aether theories the vector field is constrained to have unit norm, so the Lagrange multiplier λ is arbitrary and the constraint allows ω to be absorbed into a rescaling of G. For the unconstrained theory, only versions of the theory with \(c_4 = 0\) have been studied, and for these the field equations are [469]

$$\begin{array}{*{20}c} {\quad \quad \quad \quad {R_{\kappa \lambda}} - {1 \over 2}R{g_{\kappa \lambda}} + \omega \Theta _{\kappa \lambda}^{(\omega)} + \eta \Theta _{\kappa \lambda}^{(\eta)} + \epsilon \Theta _{\kappa \lambda}^{(\epsilon)} + \tau \Theta _{\kappa \lambda}^{(\tau)} + \Lambda {g_{\kappa \lambda}} = 0,} \\ {\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \epsilon F_{\mu \nu}^{;\nu} + {1 \over 2}\tau u_{\mu ;\nu}^{;\nu} - {1 \over 2}\omega {u_\mu}R - {1 \over 2}\eta {u^\alpha}{R_{\mu \alpha}} = 0,} \\ {\quad \quad \quad \quad \quad \Theta _{\kappa \lambda}^{(\omega)} = {u_\kappa}{u_\lambda}R + {u^2}{R_{\kappa \lambda}} - {1 \over 2}{g_{\kappa \lambda}}{u^2}R - {{({u^2})}_{;\kappa \lambda}} + {g_{\kappa \lambda}}{\square_g}{u^2},} \\ {\Theta _{\kappa \lambda}^{(\eta)} = 2{u^\alpha}{u_{\left[ \kappa \right.}}{R_{\left. \lambda \right]}}_\alpha - {1 \over 2}{g_{\kappa \lambda}}{u^\alpha}{u^\beta}{R_{\alpha \beta}} - {{({u^\alpha}{u_{\left[ \kappa \right.}})}{}_{\left. {;\lambda} \right]\alpha}} + {1 \over 2}{\square_g}({u_\kappa}{u_\lambda}) + {1 \over 2}{g_{\kappa \lambda}}{{({u^\alpha}{u^\beta})}_{;\alpha \beta}},\quad \quad \quad \quad \,\,} \\ {\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \,\,\,\,\Theta _{\kappa \lambda}^{(\epsilon)} = - 2(F_\kappa ^\alpha {F_{\lambda \alpha}} - {1 \over 4}{g_{\kappa \lambda}}{F_{\alpha \beta}}{F^{\alpha \beta}}),} \\ {\Theta _{\kappa \lambda}^{(\tau)} = {u_{\kappa ;\alpha}}u_\lambda ^{;\alpha} + {u_{\alpha ;\kappa}}u_{;\lambda}^\alpha - {1 \over 2}{g_{\kappa \lambda}}{u_{\alpha ;\beta}}{u^{\alpha ;\beta}} + {{({u^\alpha}{u_{\left[ {\kappa ;\lambda} \right]}} - u_{;\left[ \kappa \right.}^\alpha {u_{\left. \lambda \right]}} - {u_{\left[ \kappa \right.}}u_{\left. \lambda \right]}^{;\alpha})}_{;\alpha}},} \\ \end{array}$$

where \(F_{\mu \nu} = u_{\nu ;\mu} - u_{\mu ;\nu}\), \(u^2 \equiv u^\mu u_\mu\), \(\eta = -c_2\), \(\epsilon = -(c_2 + c_3)/2\), and \(\tau = -(c_1 + c_2 + c_3)\). We use the usual subscript notation, such that “(,)” and “[,]” denote symmetric and antisymmetric sums.

In the constrained Einstein-aether theory [250] the field equations are

$$\begin{array}{l} {{J^\alpha}_{\mu ;\alpha} - {c_4}{{\dot u}_\alpha}u_{;\mu}^\alpha = \lambda {u_\mu},} \\ {{\rm{where}}\,{J^\alpha}_\mu = K_{\mu \nu}^{\alpha \beta}u_{;\beta}^\nu ,} \\ {{G_{\alpha \beta}} = T_{\alpha \beta}^{(u)} + {{8\pi G} \over {{c^4}}}T_{\alpha \beta}^{{\rm{matter}}},} \\ {T_{\alpha \beta}^{(u)} \equiv {{\left({{J_{\left(\alpha \right.}}^\mu {u_{\left. \beta \right)}} - {J^\mu}_{\left(\alpha \right.}{u_{\left. \beta \right)}} - {J_{\left({\alpha \beta} \right)}}{u^\mu}} \right)}_{;\mu}} + {c_1}\left({{u_{\alpha ;\mu}}u_\beta ^{;\mu} - {u_{\mu ;\alpha}}u_{;\beta}^\mu} \right) + {c_4}{{\dot u}_\alpha}{{\dot u}_\beta}} \\ {\quad \quad \quad + \left[ {{u_\nu}J_{;\mu}^{\mu \nu} - {c_4}{{\dot u}^2}} \right]{u_\alpha}{u_\beta} - {1 \over 2}{g_{\alpha \beta}}{L_u},} \\ \end{array}$$

where \({\dot u^\alpha} \equiv {u^\beta}u_{;\beta}^\alpha\), \({L_u} = - K_{\mu \nu}^{\alpha \beta}\,u_{;\alpha}^\mu u_{;\beta}^\nu\) is the aether Lagrangian, and \(T_{\alpha \beta}^{{\rm{matter}}}\) is the usual matter stress-energy tensor [163]. Via field redefinition this theory can be shown to be equivalent to GR if \(c_1 + c_4 = 0\), \(c_1 + c_2 + c_3 = 0\), and \({c_3} = \pm \sqrt {{c_1}({c_1} - 2)}\) [57]. Field redefinition can also be used to set \(c_1 + c_3 = 0\) [185]; if this constraint is imposed then equivalence to GR is only achieved if the \(c_i\) are all zero. This constraint is therefore appropriate to pose Einstein-aether theory as an alternative to test against GR, since then any non-zero values of the \(c_i\) would represent genuine deviations from GR.
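The three equivalence conditions quoted above are easy to check numerically for a given set of coefficients. The sketch below is a minimal illustration; the function name and the tolerance are assumptions of this example, and the check only encodes the algebraic conditions stated in the text, not the field redefinition itself.

```python
import math


def is_gr_equivalent(c1, c2, c3, c4, tol=1e-12):
    """Test the conditions c1 + c4 = 0, c1 + c2 + c3 = 0, c3 = +/- sqrt(c1 (c1 - 2))."""
    if abs(c1 + c4) > tol or abs(c1 + c2 + c3) > tol:
        return False
    radicand = c1 * (c1 - 2.0)
    if radicand < -tol:
        # The square root is not real, so the third condition cannot hold.
        return False
    root = math.sqrt(max(radicand, 0.0))
    return abs(abs(c3) - root) <= tol
```

The statement about the additional constraint \(c_1 + c_3 = 0\) follows directly from these conditions: substituting \(c_3 = -c_1\) into \(c_3^2 = c_1(c_1 - 2)\) gives \(c_1 = 0\), and then the remaining conditions force \(c_2 = c_3 = c_4 = 0\).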

Unconstrained vector-tensor theories were introduced in the 1970s as a straw-man alternative to GR [469], but they have four arbitrary parameters and leave the magnitude of the vector field unconstrained, which is a serious defect. Interest in Einstein-aether theories was prompted by the desire to construct a covariant theory that violated Lorentz invariance under boosts, by having a preferred reference frame, the aether, represented by the vector \(u^\mu\). The preferred reference frame also provides a universal notion of time [202]. Interest in theories that violate Lorentz symmetry has recently been revived as a possible window onto aspects of quantum gravity [22].

2.2.3 Scalar-vector-tensor theories

The natural extensions of scalar-tensor and vector-tensor theories are scalar-vector-tensor theories, in which the gravitational field is coupled to a vector field and one or more scalar fields. These theories are relativistic generalizations of Modified Newtonian Dynamics (MoND), which was proposed in order to reproduce observed rotation curves on galactic scales. The relativistic extensions were designed to also satisfy cosmological observations on larger scales. The action takes the form

$$S = {1 \over {16\pi}}\int {{{\rm{d}}^4}x} \sqrt {- g} ({L_G} + {L_S} + {L_u} + {L_{{\rm{matter}}}}),$$

where \(L_G = (R - 2\Lambda)/G\) and \(L_{\rm{matter}}\) are the usual gravitational and matter Lagrangians. There are two main versions of the theory, which differ in the choice of the scalar-field and vector-field Lagrangians \(L_S\) and \(L_u\).

In Tensor-Vector-Scalar gravity (TeVeS) [61] the dynamical vector field \(u^\mu\) is coupled to a dynamical scalar field ϕ. A second scalar field σ is here considered non-dynamical. The Lagrangians are

$$\begin{array}{*{20}c} {{L_S} = {1 \over {2G}}\left[ {{\sigma ^2}\left({{g^{\alpha \beta}} - {u^\alpha}{u^\beta}} \right){\phi _{,\alpha}}{\phi _{,\beta}} + {1 \over 2}{G \over {{l^2}}}{\sigma ^4}F(G{\sigma ^2})} \right],} \\ {{L_u} = {K \over {2G}}\left[ {{g^{\alpha \beta}}{g^{\mu \nu}}{B_{\alpha \mu}}{B_{\beta \nu}} + 2{\lambda \over K}({u^\mu}{u_\mu} + 1)} \right],\quad \quad \quad \,\,} \\ \end{array}$$

where \(B_{\alpha \beta} = u_{\beta ,\alpha} - u_{\alpha ,\beta}\), F is an unspecified dimensionless function, K is a dimensionless parameter, and l is a constant length parameter. The Lagrange multiplier λ is spacetime dependent, set to enforce the normalization of the vector field, \(u^\mu u_\mu = -1\). In TeVeS the physical metric that governs the gravitational dynamics of ordinary matter does not coincide with \(g_{\mu \nu}\), but is determined by the scalar field through

$${\hat g_{\mu \nu}} = {e^{2\phi}}{g_{\mu \nu}} - 2{u_\mu}{u_\nu}\sinh (2\phi){.}$$

An alternative version of TeVeS, called Bi-Scalar-Tensor-Vector gravity (BSTV) has also been proposed [392], in which the scalar field σ is allowed to be dynamical. TeVeS is able to explain galaxy rotation curves and satisfies constraints from cosmology and gravitational lensing, but stars are very unstable [402] and the Bullet cluster [123] observations (which point to dark matter) cannot be explained.

In Scalar-Tensor-Vector Gravity (STVG) [317] the Lagrangian for the vector field is taken to be

$${L_u} = \omega \left[ {{B^{\mu \nu}}{B_{\mu \nu}} - 2{\mu ^2}{u^\mu}{u_\mu} + {V_u}(u)} \right],$$

with \(B_{\mu \nu}\) defined as before. The three constants ω, μ, and G that enter this action and the gravitational action are then taken to be scalar fields governed by the Lagrangian

$${L_S} = {{16\pi} \over G}\left[ {{1 \over 2}{g^{\nu \rho}}\left({{{{G_{,\nu}}{G_{,\rho}}} \over {{G^2}}} + {{{\mu _{,\nu}}{\mu _{,\rho}}} \over {{\mu ^2}}} - {\omega _{,\nu}}{\omega _{,\rho}}} \right) + {{{V_G}(G)} \over {{G^2}}} + {{{V_\mu}(\mu)} \over {{\mu ^2}}} + {V_\omega}(\omega)} \right]{.}$$

It is claimed that STVG predicts no deviations from GR on the scale of the solar system or for small globular clusters [319], and that it can reproduce galactic rotation curves [97], gravitational lensing in the Bullet cluster [98], and a range of cosmological observations [318]. TeVeS-like theories are constrained by binary-pulsar observations [186]. It has been proposed that an extension of the ESA-led LISA Pathfinder technology-demonstration mission may allow additional constraints on this class of theories [301]. To date the consequences of TeVeS or STVG for GW observations have not been investigated.

2.2.4 Modified-action theories

f(R) gravity. This theory is derived by replacing R with an arbitrary function f(R) in the Einstein-Hilbert action. There are two versions of f(R) gravity. In the metric formalism the action is extremized with respect to the metric coefficients only, and the connection is taken to be the metric connection, depending on the metric components in the standard way. The resulting field equations are

$$(- {R_{;\kappa}}{R_{;\lambda}} + {g_{\kappa \lambda}}{R_{;\mu}}{R^{;\mu}})f^{\prime\prime\prime}(R) + (- {R_{;\kappa \lambda}} + {g_{\kappa \lambda}}{\square}R)f^{\prime\prime}(R) + {R_{\kappa \lambda}}f^{\prime}(R) - {1 \over 2}{g_{\kappa \lambda}}f(R) = 0{.}$$

In the Palatini formalism, the field equations are found by extremizing the action over both the metric and the connection. For an f(R) action the resulting equations are

$$\begin{array}{*{20}c} {{R_{\kappa \lambda}}{f^\prime}(R) - {1 \over 2}{g_{\kappa \lambda}}f(R) = 0,} & {{\nabla _\alpha}\left[ {\sqrt {- g} {f^\prime}(R){g^{\kappa \lambda}}} \right] = 0{.}} \\ \end{array}$$

If the second derivative f″(R) ≠ 0, metric f(R) gravity can be shown to be equivalent to a Brans-Dicke theory with ωBD = 0, while Palatini f(R) gravity is equivalent to a Brans-Dicke theory with ωBD = −3/2, with no constraint imposed on f(R) [180, 419, 146]. In both cases, the Brans-Dicke potential depends on the exact functional form f(R).
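The metric-formalism equivalence can be sketched with a standard argument. The auxiliary field χ and the scalar φ below are notation introduced only for this sketch, and overall constants and matter terms are omitted. Consider the action

$$S = \int {\sqrt {- g}\,[f(\chi) + f^{\prime}(\chi)(R - \chi)]\,{{\rm{d}}^4}x}{.}$$

Varying with respect to χ gives f″(χ)(R − χ) = 0, so that χ = R whenever f″ ≠ 0, and the f(R) action is recovered. Defining φ ≡ f′(χ) and V(φ) ≡ χ(φ)φ − f(χ(φ)), the same action reads

$$S = \int {\sqrt {- g}\,[\phi R - V(\phi)]\,{{\rm{d}}^4}x},$$

which has the Brans-Dicke form with no kinetic term for φ (that is, ωBD = 0) and a potential fixed by the choice of f.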

f(R) theories have attracted a lot of interest in a cosmological context, since the flexibility in choosing the function f(R) allows a wide range of cosmological phenomena to be described [336, 108], including inflation [423, 459] and late-time acceleration [107, 112], without violating constraints from Big-Bang Nucleosynthesis [168]. However, metric f(R) theories are strongly constrained by solar-system and laboratory measurements if the scalar degree of freedom is assumed to be long-ranged, which modifies the form of the gravitational potential [121]. This problem can be avoided by assuming a short-range scalar field, but then f(R) theories can only explain the early expansion of the universe and not late-time acceleration. The Chameleon mechanism [262] has been invoked to circumvent this, as it allows the scalar-field mass to be a function of curvature, so that the field can be short ranged within the solar system but long ranged on cosmological scales.

There are also other issues with f(R) theories. For example, in Palatini f(R) gravity the post-Newtonian metric depends on the local matter density [418], while in metric f(R) gravity with f″(R) < 0 there is a Ricci-scalar instability [153] that arises because the effective gravitational constant increases with increasing curvature, leading to a runaway instability for small stars [56, 55]. We refer the reader to [419, 146] for more complete reviews of the current understanding of f(R) gravity.

Chern-Simons gravity. Yunes and others [6, 8, 12, 13, 103, 104, 111, 158, 189, 212, 218, 272, 323, 416, 488, 491, 496, 501, 503, 499, 340] have recently developed an extensive analysis of the observational consequences of Jackiw and Pi’s Chern-Simons gravity [249], which extends the Hilbert action with an additional Pontryagin term *RR that is quadratic in the Riemann tensor [212]:

$$I = {(16\pi G)^{- 1}}\int {[\tilde R - {1 \over 4}\theta ^{\ast}RR]{{(- g)}^{1/2}}{{\rm{d}}^4}x + {{{\rm I}} _{{\rm{matter}}}}({\psi _{\rm{m}}},{g_{\mu \nu}});}$$

here \({}^*RR = {}^*{R^a}_b{}^{cd}{R^b}_{acd}\) is built with the help of the dual Riemann tensor \(^{\ast}{R^a}_b{}^{cd} = {1 \over 2}{\epsilon ^{cde\,f}}{R^a}_{be\,f}\), and it can be expressed as the divergence of the gravitational Chern-Simons topological current; the scalar field θ can be treated either as a dynamical quantity, or an absolute field. In both cases, *RR vanishes, either dynamically, or as a constraint on acceptable solutions, needed to enforce coordinate-invariant matter dynamics, which restricts the space of solutions available to GR.

Chern-Simons gravity is motivated by string theory and by the attempt to develop a quantum theory of gravity satisfying a gauge principle. The Pontryagin term arises in the standard model of particle physics as a gauge anomaly: the classical gravitational Noether current that comes from the symmetry of the gravitational action is no longer conserved when the theory is quantized, but has a divergence proportional to the Pontryagin term. This anomaly can be canceled by modifying the action via the addition of the Chern-Simons Pontryagin term. The same type of correction arises naturally in string theory through the Green-Schwarz anomaly-canceling mechanism, and in Loop Quantum Gravity to enforce parity and charge-parity conservation.

The presence of the Chern-Simons correction leads to parity violation, which has various observable consequences, with magnitude depending on the Chern-Simons coupling, which string theory predicts will be at the Planck scale. If so, these effects will never be observable, but various mechanisms have been proposed that could enhance the strength of the Chern-Simons coupling, such as non-perturbative instanton corrections [433], fermion interactions [10], large intrinsic curvatures [9] or small string couplings at late times [468]. For further details on all aspects of Chern-Simons gravity, we refer the reader to [11].

General quadratic gravity. This theory arises by adding to the action all possible terms that are quadratic in the Ricci scalar, Ricci tensor, and Riemann tensor. For the action

$$S = {1 \over {16\pi G}}\int {\sqrt {- g} (- 2\Lambda + R + \alpha {R^2} + \beta {R_{\sigma \tau}}{R^{\sigma \tau}} + \gamma {R_{\alpha \beta \gamma \delta}}{R^{\alpha \beta \gamma \delta}}){{\rm{d}}^4}x}$$

the field equations are [372]

$$\begin{array}{*{20}c} {{R_{\kappa \lambda}} - {1 \over 2}R{g_{\kappa \lambda}} + \alpha {K_{\kappa \lambda}} + \beta {L_{\kappa \lambda}} + \Lambda {g_{\kappa \lambda}} = 0,} \\ {{K_{\kappa \lambda}} = - 2{R_{;\kappa \lambda}} + 2{g_{\kappa \lambda}}{\square}R - {1 \over 2}{R^2}{g_{\kappa \lambda}} + 2R{R_{\kappa \lambda}},} \\ {{L_{\kappa \lambda}} \equiv - 2R_{\kappa ;\sigma \lambda}^\sigma + {\square}{R_{\kappa \lambda}} + {1 \over 2}{g_{\kappa \lambda}}{\square}R - {1 \over 2}{g_{\kappa \lambda}}{R_{\sigma \tau}}{R^{\sigma \tau}} + 2R_\kappa ^\sigma {R_{\sigma \lambda}}{.}} \\ \end{array}$$

This class of theories is parameterized by the coefficients α, β, and γ. More recently, Stein and Yunes [504] considered a more general form of quadratic gravity that includes the Pontryagin term from Chern-Simons gravity. Their action was

$$\begin{array}{*{20}c} {S \equiv \int {\sqrt {- g}} \left\{{\kappa R + {\alpha _1}{f_1}(\theta){R^2} + {\alpha _2}{f_2}(\theta){R_{ab}}{R^{ab}} + {\alpha _3}{f_3}(\theta){R_{abcd}}{R^{abcd}}} \right.} \\ {+ {\alpha _4}{f_4}(\theta)R_{abcd}^{\ast} {R^{abcd}} - \left. {{\beta \over 2}[{\nabla _a}\theta {\nabla ^a}\theta + 2V(\theta)] + {{\mathcal L}_{{\rm{matter}}}}} \right\},\quad} \\ \end{array}$$

in which the αi and β are coupling constants, θ is a scalar field, and \({{\mathcal L}_{{\rm{matter}}}}\) is the matter Lagrangian density as before. There are two versions of this theory: a non-dynamical version in which the functions fi(θ) are constants, and a dynamical version in which they are not.

General quadratic theories are known to exhibit ghost fields, that is, negative mass-norm states that violate unitarity (see, e.g., [419] for a discussion and further references). These occur generically, although models whose action is a function only of R and of \({R^2} - 4{R_{\mu \nu}}{R^{\mu \nu}} + {R_{\mu \nu \rho \sigma}}{R^{\mu \nu \rho \sigma}}\) are ghost-free [126]. Ghost fields are also present in Chern-Simons modified gravity [323, 158], which places strong constraints on the parameters of that model.

2.2.5 Massive-graviton theories

Massive-graviton theories were first considered by Pauli and Fierz [350, 175, 176], whose theory is generated by an action of the form

$$\begin{array}{*{20}c} {{S_{{\rm{PF}}}} = M_P^2\int {{{\rm{d}}^4}x} \left[ {- {1 \over 4}{{({\partial _\mu}{h_{\nu \rho}})}^2} + {1 \over 4}{{({\partial _\mu}h)}^2} - {1 \over 2}({\partial _\mu}h)({\partial ^\nu}h_\nu ^\mu) + {1 \over 2}({\partial _\mu}{h_{\nu \rho}})({\partial ^\nu}{h^{\mu \rho}})} \right.} \\ {\left. {- {1 \over 4}{m^2}({h_{\mu \nu}}{h^{\mu \nu}} - {h^2}) + M_P^{- 2}{T_{\mu \nu}}{h^{\mu \nu}}} \right],} \\ \end{array}$$

in which hμν is a rank-two covariant tensor, m and MP are mass parameters, Tμν is the matter energy-momentum tensor, indices are raised and lowered with the Minkowski metric ημν, and h = hμνημν. The terms on the first line of this expression are generated by expanding the Einstein-Hilbert action to quadratic order in hμν. The massive graviton term is \({m^2}({h_{\mu \nu}}{h^{\mu \nu}} - {h^2})\); it contains a spin-2 piece hμν and a spin-0 piece h.

This model suffers from the van Dam-Veltman-Zakharov discontinuity [454, 505]: no matter how small the graviton mass, the Pauli-Fierz theory leads to different physical predictions from those of linearized GR, such as light bending. The theory also predicts that the energy lost into GWs from a binary is twice the GR prediction, which is ruled out by current binary-pulsar observations. It might be possible to circumvent these problems and recover GR in the weak-field limit by invoking the Vainshtein mechanism [446, 41], which relies on nonlinear effects to “hide” certain degrees of freedom for source distances smaller than the Vainshtein radius [40]. The massive graviton can therefore become effectively massless, recovering GR on the scale of the solar system and in binary-pulsar tests, while retaining a mass on larger scales. In such a scenario, the observational consequences for GWs would be a modification to the propagation time for cosmological sources, but no difference in the emission process itself.

There are also non-Pauli-Fierz massive graviton theories [36]. For these, the action is the same as that in Eq. (20), but the first term on the second line (the massive graviton term) takes the more general form

$$- ({k_1}{h_{\mu \nu}}{h^{\mu \nu}} + {k_2}{h^2}),$$

where k1 and k2 are new constants of the theory that represent the squared masses of the spin-2 and spin-0 gravitons respectively. This theory can recover GR in the weak field, since k1 and k2 can independently be taken to zero, with modifications to weak-field effects that are on the order of the graviton mass squared. These theories are generally thought to suffer from instabilities [350, 175, 176], which arise because the spin-0 graviton carries negative energy. However, it was shown in [36] that the spin-0 graviton cannot be emitted without spin-2 gravitons also being generated. The spin-2 graviton energy is positive and greater than that of spin-2 gravitons in GR, which compensates for the spin-0 graviton’s negative energy. The total energy emitted is therefore always positive, and it converges to the GR value in the limit that the spin-2 graviton mass goes to zero.

These alternative massive-graviton theories are therefore perfectly compatible with current observational constraints, but make very different predictions for strong gravitational fields [36], including the absence of horizons for black-hole spacetimes and oscillatory cosmological solutions. Despite these potential problems, the existence of a “massive graviton” can be used as a convenient strawman for GW constraints, since the speed of GW propagation can be readily inferred from GW observations and compared to the speed of light. These proposed tests generally make no reference to an underlying theory but require only that the graviton has an effective mass and hence GWs suffer dispersion. This will be discussed in more detail in Section 5.1.2.
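To give a sense of scale for the dispersion mentioned above, the following sketch assumes the special-relativistic relation E² = p²c² + m²c⁴ with E = hf, which gives a propagation speed v/c = [1 − (c/(fλ))²]^{1/2}, where λ = h/(mc) is the graviton Compton wavelength. This relation is a standard assumption rather than something derived in this section; the function names and sample values are purely illustrative, and cosmological expansion is ignored.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def speed_deficit(freq_hz, compton_wavelength_m):
    """Return 1 - v/c for a GW of frequency freq_hz carried by a massive graviton.

    Uses x^2 / (1 + sqrt(1 - x^2)), which equals 1 - sqrt(1 - x^2) but avoids
    the loss of precision when x = c / (f * lambda) is tiny.
    """
    x2 = (C / (freq_hz * compton_wavelength_m)) ** 2
    if x2 >= 1.0:
        raise ValueError("frequency is below the cutoff set by the graviton mass")
    return x2 / (1.0 + math.sqrt(1.0 - x2))


def arrival_delay_s(distance_m, freq_hz, compton_wavelength_m):
    """Extra travel time relative to light over a flat-space distance."""
    d = speed_deficit(freq_hz, compton_wavelength_m)
    return distance_m / C * d / (1.0 - d)


# Illustrative values only: a 1 mHz wave and a 1e16 m Compton wavelength.
print(speed_deficit(1e-3, 1e16))
print(arrival_delay_s(3.0e25, 1e-3, 1e16))
```

Because the deficit scales as f⁻², the low-frequency part of a chirping signal lags the high-frequency part, which is why a graviton mass would show up as a frequency-dependent phase shift rather than a simple overall delay.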

2.2.6 Bimetric theories of gravity

As their name suggests, there are two metrics in bimetric theories of gravity [382, 384]. One is dynamical and represents the tensor gravitational field; the other is a metric of constant curvature, usually the Minkowski metric, which is non-dynamical and represents a prior geometry. There are various bimetric theories in the literature.

Rosen’s theory has the action [381, 382, 383, 384]

$$S = {1 \over {64\pi G}}\int {{{\rm{d}}^4}x} \left[ {\sqrt {- \det (\eta)} {\eta ^{\mu \nu}}{g^{\alpha \beta}}{g^{\gamma \delta}}\left({{g_{\alpha \gamma {\vert}\mu}}{g_{\beta \delta {\vert}\nu}} - {1 \over 2}{g_{\alpha \beta {\vert}\mu}}{g_{\gamma \delta {\vert}\nu}}} \right)} \right] + {S_{{\rm{matter}}}},$$

in which ημν is the fixed flat, non-dynamical metric, gμν is the dynamical gravitational metric, and the vertical line in subscripts denotes a covariant derivative with respect to ημν. The final term, Smatter, denotes the action for matter fields. The field equations may be written

$${{\square}_\eta}{g_{\mu \nu}} - {g^{\alpha \beta}}{\eta ^{\gamma \delta}}{g_{\mu \alpha {\vert}\gamma}}{g_{\nu \beta {\vert}\delta}} = - 16\pi G\sqrt {\det (g)/\det (\eta)} \left({{T_{\mu \nu}} - {1 \over 2}{g_{\mu \nu}}T} \right){.}$$

Lightman and Lee [287] developed a bimetric theory based on a non-metric theory of gravity due to Belinfante and Swihart [62]. The action for this “BSLL” theory is

$$S = {1 \over {64\pi G}}\int {{{\rm{d}}^4}x\sqrt {- \det (\eta)} \left({{1 \over 4}{h_{\mu \nu {\vert}\alpha}}{h^{\mu \nu {\vert}\alpha}} - {5 \over {64}}{h_{,\alpha}}{h^{,\alpha}}} \right) + {S_{{\rm{matter}}}},}$$

in which ημν is the non-dynamical flat background metric and hμν is a dynamical gravitational tensor related to the gravitational metric gμν via

$$\begin{array}{*{20}c} {{g_{\mu \nu}} = {{\left({1 - {1 \over {16}}h} \right)}^2}\Delta _\mu ^\alpha {\Delta _{\alpha \nu}},} \\ {\delta _\nu ^\mu = \Delta _\nu ^\alpha \left({\delta _\alpha ^\mu - {1 \over 2}h_\alpha ^\mu} \right),} \\ \end{array}$$

in which \(\delta _\nu ^\mu\) is the Kronecker delta and \(\Delta _\nu ^\alpha\) is defined by the second equation. Indices on Δαβ and hαβ are raised and lowered with ημν, but on all other tensors indices are raised and lowered by gμν. Both the Rosen and BSLL bimetric theories give rise to alternative GW polarization states, and have been used to motivate the construction of the parameterized post-Einsteinian (ppE) waveform families discussed in Section 5.2.2.

There is also a bimetric theory due to Rastall [374], in which the metric is an algebraic function of the Minkowski metric and of a vector field Kμ. The action is

$$S = {1 \over {64\pi G}}\int {{{\rm{d}}^4}x} \left[ {\sqrt {- \det (g)} F(N){K_{\mu ;\nu}}{K^{\mu ;\nu}}} \right] + {S_{{\rm{matter}}}},$$

in which F(N) = −N/(N + 2), N = gμνKμKν, and a semicolon denotes a covariant derivative with respect to the gravitational metric gμν. The metric follows from Kμ by way of

$${g_{\mu \nu}} = \sqrt {1 + {\eta ^{\alpha \beta}}{K_\alpha}{K_\beta}} ({\eta _{\mu \nu}} + {K_\mu}{K_\nu}),$$

where ημν is again the non-dynamical flat metric. This theory has not been considered in a GW context and we will not mention it further; more details, including the field equations, can be found in [469].

2.3 The black-hole paradigm

The present consensus is that all of the compact objects observed to reside in galactic centers are supermassive black holes, described by the Kerr metric of GR [377]. This explanation follows naturally in GR from the black-hole uniqueness theorems and from a set of additional assumptions of physicality, briefly discussed below. If a deviation from Kerr is inferred from GW observations, it would imply that the assumptions are violated, or possibly that GR is not the correct theory of gravity. Space-based GW detectors can test black-hole “Kerr-ness” by measuring the GWs emitted by smaller compact bodies that move through the gravitational potentials of the central objects (see Section 6.2). Kerr-ness is also tested by characterizing multiple ringdown modes in the final black hole resulting from the coalescence of two precursors (see Section 6.3).

The current belief that Kerr black holes are ubiquitous follows from work on mathematical aspects of GR in the middle of the 20th century. Oppenheimer and Snyder demonstrated that a spherically-symmetric, pressure-free distribution would collapse indefinitely to form a black hole [341]. This result was assumed to be a curiosity due to spherical symmetry, until it was demonstrated by Penrose [351] and by Hawking and Penrose [224] that singularities arise inevitably after the formation of a trapped surface during gravitational collapse. Around the same time, it was proven that the black-hole solutions of Schwarzschild [401] and Kerr [260] are the only static and axisymmetric black-hole solutions in GR [248, 114, 379]. These results together indicated the inevitability of black-hole formation in gravitational collapse.

The assumptions that underlie the proof of the uniqueness theorem are that the spacetime is a stationary vacuum solution, that it is asymptotically flat, and that it contains an event horizon but no closed timelike curves (CTCs) exterior to the horizon [223]. The lack of CTCs is needed to ensure causality, while the requirement of a horizon is a consequence of the cosmic-censorship hypothesis (CCH) [352]. The CCH states that any singularity that forms in nature must be hidden behind a horizon (i.e., cannot be naked), and therefore cannot affect the rest of the universe; a naked singularity would be undesirable because GR can make no prediction of what happens in its vicinity. However, the CCH and the non-existence of CTCs are not required by Einstein’s equations, and so they could in principle be violated.

Besides the Kerr metric, we know of many other “black-hole-like” solutions to Einstein’s equations: these are vacuum solutions with a very compact central object enclosed by a high-redshift surface. In fact, any metric can become a solution to Einstein’s equation: it is sufficient to insert it in the Einstein tensor, and postulate the resulting “matter” stress-energy tensor as an input to the equations. However, such matter distributions will not in general satisfy the energy conditions (see, e.g., [361]):

  • The weak energy condition is the statement that all timelike observers in a spacetime measure a non-negative energy density, Tμνvμvν ≥ 0, for all future-directed timelike vectors vμ. The null energy condition extends this condition to null observers by replacing vμ by an arbitrary future-directed null vector kμ.

  • The strong energy condition requires the Ricci curvature measured by any timelike observer to be non-negative, \(({T_{\mu \nu}} - T_\alpha ^\alpha {g_{\mu \nu}}/2){\upsilon ^\mu}{\upsilon ^\nu} \ge 0\), for all timelike vμ.

  • The dominant energy condition is the requirement that matter flow along timelike or null world lines: that is, that \(- T_\nu ^\mu {\upsilon ^\nu}\) be a future-directed timelike or null vector field for any future-directed timelike vector vμ.
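For a concrete feel of how restrictive these conditions are, the sketch below evaluates them for a perfect fluid with energy density ρ and isotropic pressure p. The perfect-fluid form of the stress-energy tensor is an assumption made here for illustration, and the function name is hypothetical. In that case the conditions reduce to the familiar algebraic inequalities noted in the comments.

```python
def energy_conditions(rho, p):
    """Evaluate the pointwise energy conditions for a perfect fluid.

    rho is the energy density and p the isotropic pressure, in the same units.
    """
    return {
        "null": rho + p >= 0,                           # rho + p >= 0
        "weak": rho >= 0 and rho + p >= 0,              # adds rho >= 0
        "strong": rho + p >= 0 and rho + 3 * p >= 0,    # adds rho + 3p >= 0
        "dominant": rho >= abs(p),                      # rho >= |p|
    }


# Pressureless dust satisfies all four conditions.
print(energy_conditions(1.0, 0.0))
# A fluid with p = -rho (cosmological-constant-like) violates only the strong one.
print(energy_conditions(1.0, -1.0))
```

The second example illustrates that the conditions are logically distinct: a source can violate the strong energy condition while still satisfying the weak, null, and dominant ones.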

These conditions make sense on broad physical grounds; but even after imposing them, there remain several black-hole-like solutions [427] besides Kerr. Thus, space-based GW detectors offer an important test of the “black-hole paradigm” that follows from GR plus CCH, CTC non-existence, and the energy conditions. This paradigm is especially important: putative black holes are observed to be ubiquitous in the universe, so their true nature has significant implications for our understanding of astrophysics.

If one or many non-Kerr metrics are found, the hope is that observations will allow us to tease apart the various possible explanations:

  • Does the spacetime contain matter, such as an accretion disk, exterior to the black hole?

  • Are the CCH, the no-CTC assumption, or the energy conditions violated?

  • Is the central object an exotic object, such as a boson star [389, 261]?

  • Is gravity coupled to other fields? This can lead to different black-hole solutions [265, 396, 413], although some such solutions are known [428] or suspected [156] to be unstable to generic perturbations.

  • Is the theory of gravity just different from GR? For instance, in Chern-Simons gravity black holes (to linear order in spin) differ from Kerr in their octupole moment [496], and this correction may produce the most significant observational signature in GW observations [416].

While these questions are challenging, we can learn a lot by testing black-hole structure with space-based GW detectors. These tests are discussed in detail in Section 6.

3 Space-Based Missions to Detect Gravitational Waves

The experimental search for GWs began in the 1960s with Joseph Weber’s resonant bars (and resonant claims [124]); it has since grown into an extensive international endeavor that has produced a network of km-scale GW interferometers (LIGO [288], EGO/VIRGO [167], GEO600 [203], and Tama/KAGRA [254]), the proposals for space-based observatories such as LISA, and the effort to detect GWs by using an array of pulsars as reference clocks [232].

In this section we briefly describe the architecture of LISA-like space-based GW observatories, beginning with the “classic” LISA design, and then discussing the variations studied in the 2011–2012 ESA and NASA studies [20, 215]. We also discuss proposals for detectors that would operate in a higher frequency band (between 0.1 and 10 Hz) that would bridge the gap in sensitivity between LISA-like and ground-based observatories.

3.1 The classic LISA architecture

The most cited LISA reference is perhaps the 1998 pre-phase A mission study [64]; a more up-to-date review of the LISA technical architecture is given by Jennrich [252], while the LISA science-case document [370] describes the state of LISA science at the end of the 2000s. Here we give only a quick review of the elements of the mission, referring the reader to those references for in-depth discussions.

LISA principles. LISA consists of three identical cylindrical spacecraft, approximately 3 m wide and 1 m high, that are launched together and, after a 14-month cruise, settle into an Earth-like heliocentric orbit, 20° behind the Earth. The orbit of each spacecraft is tuned slightly differently, resulting in an equilateral-triangle configuration with \(5 \times {10^6}\) km arms (commensurate with the frequencies of the LISA sensitivity band), inclined by 60° with respect to the ecliptic. This configuration is maintained to 1% by orbital dynamics alone for the lifetime of the mission, nominally 5 years, with a goal of 10. In the course of a year, the center of the LISA triangle completes a full revolution on the ecliptic, while the triangle itself, as well as its normal vector, rotate through 360°.

LISA detects GWs by monitoring the fluctuations in the distances between freely-falling reference bodies (in LISA, platinum-gold test masses housed and protected by the spacecraft). LISA uses three pairs of laser interferometric links to measure inter-spacecraft distances with errors near \(10\,{\rm{pm}}\,{\rm{H}}{{\rm{z}}^{- 1/2}}\). The phase shifts accumulated along different interferometer paths are proportional to an integral of GW strain along those trajectories. The test masses suffer from residual acceleration noise at a level of \(3 \times {10^{- 15}}\,{\rm{m}}\,{{\rm{s}}^{- 2}}\), which is dominant below 3 mHz. Together, the position and acceleration noises (divided by a multiple of the armlength) determine the LISA sensitivity to GW strain, which reaches \({10^{- 20}}\,{\rm{H}}{{\rm{z}}^{- 1/2}}\) at frequencies of a few mHz.
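The 3 mHz crossover between the two noise sources can be checked with a short calculation. The sketch below assumes that both figures are amplitude spectral densities, taking the position noise to be 10 pm Hz^−1/2 and the acceleration noise to be 3 × 10^−15 m s^−2 Hz^−1/2, and that an acceleration noise a at frequency f corresponds to a displacement noise a/(2πf)². The function name is hypothetical.

```python
import math


def crossover_frequency_hz(position_noise_m, accel_noise_ms2):
    """Frequency at which acceleration noise, expressed as displacement,
    equals the position-measurement noise.

    Solves accel_noise / (2 pi f)^2 = position_noise for f.
    """
    return math.sqrt(accel_noise_ms2 / position_noise_m) / (2.0 * math.pi)


f_cross = crossover_frequency_hz(10e-12, 3e-15)
print(f"crossover at about {f_cross * 1e3:.1f} mHz")
```

Below this frequency the displacement equivalent of the acceleration noise grows as f⁻², so it dominates the noise budget, consistent with the value quoted in the text.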

Alternatively (but equivalently), we may describe LISA as measuring the GW-induced relative Doppler shifts between the local lasers on each spacecraft and the remote lasers. These Doppler shifts are directly proportional to a difference of instantaneous GW strains (as experienced by the local spacecraft at the time of measurement, and by the distant spacecraft at a time retarded by the LISA armlength divided by the speed of light). In either description, the 1% “breathing” of the LISA constellation occurs at frequencies that are safely below the LISA measurement band.

LISA technology. The distance measurement between the test masses along each arm is in fact split in three: an inter-spacecraft measurement between the two optical benches, and two local measurements between each test mass and its optical bench. To achieve the inter-spacecraft measurement, 2 W of 1064-nm light are sent and received through 40-cm telescopes. The diffracted beams deliver only 100 pW of light to the distant spacecraft, so they cannot be reflected directly; instead, the laser phase is measured and transponded back by modulating the frequency of the local lasers. Each of the two optical assemblies on each spacecraft includes the telescope and a Zerodur optical bench with bonded fused-silica optical components, which implements the inter-spacecraft and local interferometers, as well as a few auxiliary interferometric measurements, which are needed to monitor the stability of the telescope structure, to control point-ahead corrections, and to compare the two lasers on each spacecraft. The LISA phasemeters digitize the signals from the optical-bench photodetectors at 50 MHz, and multiply them with the output of local oscillators, computing phase differences and driving the oscillators to track the frequency of the measured signal. This heterodyne scheme is needed to handle Doppler shifts (as well as laser frequency offsets) as large as 15 MHz.

Because intrinsic fluctuations in the laser frequencies are indistinguishable from GWs in the LISA output, laser frequency noise needs to be suppressed by several orders of magnitude, using a hierarchy of techniques [252]: the lasers are prestabilized to local frequency references; arm locking may be used to further stabilize the lasers using the LISA arms (or their differences) as stable references; finally, Time-Delay Interferometry [149] is applied in post processing to remove residual laser frequency noise by algebraically combining appropriately-delayed single-link measurements, in such a way that all laser-frequency-noise terms appear as canceling pairs in the combination. With this final step, the LISA measurements combine into synthesized-interferometer observables [447] analogous to the readings of ground-based interferometers such as LIGO.
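The cancellation at the heart of Time-Delay Interferometry can be illustrated with a toy model. The sketch below is not the LISA pipeline: it assumes a single noisy laser, two static arms of unequal length, round-trip delays that are whole numbers of samples, and no GW signal or secondary noise. All names are hypothetical. It compares the plain difference of the two round-trip measurements with a combination in which each measurement is also delayed by the other arm's round-trip time.

```python
import random


def delayed(series, delay):
    """Shift a sampled time series by `delay` samples, zero-padding the start."""
    return [0.0] * delay + series[: len(series) - delay]


def tdi_toy(n=4000, d1=33, d2=41, seed=1):
    """Return the rms laser noise left in a plain Michelson difference
    and in a TDI-style combination, for round-trip delays d1 and d2 (samples)."""
    rng = random.Random(seed)
    p = [rng.gauss(0.0, 1.0) for _ in range(n)]  # laser phase noise

    # Round-trip comparisons: light emitted d samples ago returns and is
    # compared against the current laser phase.
    y1 = [back - now for back, now in zip(delayed(p, d1), p)]
    y2 = [back - now for back, now in zip(delayed(p, d2), p)]

    # With unequal arms the laser noise does not cancel in y1 - y2.
    michelson = [a - b for a, b in zip(y1, y2)]

    # Delaying y1 by d2 and y2 by d1 makes every noise term appear twice
    # with opposite signs, so the combination is free of laser noise.
    tdi = [
        (a - b) - (a_del - b_del)
        for a, b, a_del, b_del in zip(y1, y2, delayed(y1, d2), delayed(y2, d1))
    ]

    start = d1 + d2  # discard samples affected by the zero padding

    def rms(values):
        tail = values[start:]
        return (sum(v * v for v in tail) / len(tail)) ** 0.5

    return rms(michelson), rms(tdi)


michelson_rms, tdi_rms = tdi_toy()
print(f"Michelson residual: {michelson_rms:.3f}, TDI residual: {tdi_rms:.2e}")
```

In a real constellation the arms change with time and the delays are not integer multiples of the sampling interval, so the delayed data streams must be interpolated and more elaborate combinations are needed; the toy model only shows why delaying and recombining the measurements removes a common noise source.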

The LISA disturbance reduction system (DRS) minimizes the deviations of the test masses from free-fall trajectories, by shielding them from solar radiation pressure and interplanetary magnetic fields. On each spacecraft, the DRS includes two gravitational reference sensors (GRSs): a GRS consists of a 2-kg test mass, enclosed in an electrode housing with capacitive reading and control of test-mass position and orientation, and accompanied by additional components to cage and reposition the test mass, to maintain vacuum, and to control the accumulation of charge. The other crucial components of the DRS are the micro-Newton thrusters (on each spacecraft, three clusters of four colloid or field-emission-electric propulsion systems), which are controlled in response to the GRS readings to maintain the nominal position of the test mass with respect to the spacecraft. This is known as drag-free control. The thrusters need to provide up to 100 μN of force with < 0.1 μN noise. In addition to this active correction, test-mass acceleration noise is minimized by the accurate knowledge and correction of spacecraft self-gravity, by enforcing magnetic cleanliness, and by controlling thermal fluctuations. A version of the LISA DRS with slightly lower performance will be flown and tested in LISA Pathfinder [305, 25], a single-spacecraft technology precursor mission.

The LISA response to GWs. Compared to ground-based interferometers, the LISA response to GWs is both richer and more complex [166, 462, 448, 133]. First, the revolution and rotation of the LISA constellation imprint a sky-position-dependent signature on long-lasting GW signals (which for LISA include all binary signals): the revolution causes a time-dependent Doppler shift with a period of a year, and a fractional amplitude of \({10^{- 4}}\); the rotation introduces a 1-year periodicity in the LISA equivalent of the ground-based-interferometer antenna patterns [299], endowing an incoming monochromatic GW signal with eight sidebands, separated by \({\rm{y}}{{\rm{r}}^{- 1}} = 3.17 \times {10^{- 8}}\) Hz. These effects are sometimes referred to as the LISA AM and FM modulations. Thus, GW signals as measured by LISA carry information regarding the position of the source; on the other hand, data analysis must account for these effects, possibly including sky position among the parameters of matched-filtering search templates.

Second, the long-wavelength approximation, whereby the entire “interferometer” shrinks and expands as one, cannot be used throughout the LISA band; indeed, the wavelength of GWs reaches the LISA armlength at a frequency of 60 mHz. As a consequence, the LISA response to a few-second impulsive GW is not a single pulse, but a collection of pulses with amplitudes and separations dependent on the sky position and polarization of the source; the effect on high-frequency chirping signals is more subtle, but still present. This further complicates data analysis, and introduces an independent mechanism (a triangulation of sorts) to localize GW sources of short duration and high frequency.
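The numbers quoted in the last two paragraphs follow from simple kinematics. The sketch below assumes a circular heliocentric orbit of radius 1 AU, a Julian year, and the classic 5 × 10^6 km armlength; the function and key names are hypothetical.

```python
import math

C = 299_792_458.0        # speed of light, m/s
AU = 1.495978707e11      # astronomical unit, m
YEAR = 365.25 * 86400.0  # Julian year, s


def lisa_modulation_numbers(arm_m=5.0e9):
    """Order-of-magnitude scales for the LISA response to GWs."""
    orbital_speed = 2.0 * math.pi * AU / YEAR  # circular-orbit speed, m/s
    return {
        # Fractional amplitude of the yearly Doppler modulation, v / c.
        "doppler_fraction": orbital_speed / C,
        # Spacing of the sidebands produced by the yearly rotation, 1 / yr.
        "sideband_spacing_hz": 1.0 / YEAR,
        # Frequency at which the GW wavelength equals the armlength, c / L.
        "transfer_frequency_hz": C / arm_m,
    }


for name, value in lisa_modulation_numbers().items():
    print(f"{name}: {value:.3e}")
```

Passing a shorter armlength to the function shows directly how the frequency at which the long-wavelength approximation fails moves upward for more compact constellations.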

Third, LISA is in effect three detectors in one: this can be understood most easily by considering that subsets of the three LISA arms form three separate Michelson-like interferometers (known in LISA lingo as X, Y, and Z) at 120° angles. More formally, the LISA interferometric measurements can be combined into many different TDI observables (see Figure 1), some resembling actual optical setups, others quite exotic [447], although at most three observables are independent in the sense that any other observable can be reconstructed by time-delaying and summing a generic basis of three observables [452]. Furthermore, such a basis can be chosen so that its components have uncorrelated noises, much like widely separated ground-based detectors [369]. One of these must correspond, in effect, to X + Y + Z, and by symmetry it must be relatively insensitive to GWs in the long-wavelength limit, providing for an independent measurement of a combination of instrument noises.

Figure 1

Time-Delay Interferometry (TDI). LISA-like detectors measure GWs by transmitting laser light between three spacecraft in a triangular configuration, and comparing the optical phase of the incident lasers against reference lasers on each spacecraft. To avoid extreme requirements on laser-frequency stability over the course of the many seconds required for transmission around the triangle, data analysts will generate time-delayed linear combinations of the phase comparisons; the combinations simulate nearly equal-delay optical paths around the sides of the triangle, and (much like an equal-arm Michelson interferometer) they suppress laser frequency noise. Many such combinations, including those depicted here, are possible, but altogether they comprise at most three independent gravitational-wave observables. Image reproduced by permission from [447], copyright by APS.

3.2 LISA-like observatories

The mission-concept studies run in 2011–2012 by ESA [20] and NASA [215] embrace several approaches to limiting cost.

Reducing mass is a broadly useful strategy, because it allows for launch on smaller, cheaper rockets, and because mass has been shown to be a good proxy for mission complexity, and therefore implementation cost. ESA’s NGO design envisages interferometric links along two arms rather than LISA’s three, resulting in an asymmetric configuration with one full and two “half” spacecraft. As a consequence, only one TDI observable can be formed, eliminating the capability of measuring two combinations of GW polarizations simultaneously.

Propellant may be saved by placing the spacecraft closer together (with arms ∼ 1–3 Mkm rather than 5 Mkm) and closer to Earth. At low frequencies, the effect of shorter armlengths is to reduce the response to GWs proportionally; for the same test-mass noise, sensitivity then decreases by the same ratio. At high frequencies, the laser power available for position measurement increases as \({L^{- 2}}\), since beams are broadly defocused at millions of km, improving shot noise, but not other optical noises, by a factor \({L^{- 1}}\) (in rms). As a consequence, the sweet spot of the LISA sensitivity shifts to higher frequencies, although one may instead plan for a less powerful laser, for further cost savings. Spacecraft orbits that differ from LISA’s, such as geocentric options or flat configurations that lie in the ecliptic plane, would also alter GW-signal modulations, and therefore parameter-estimation performance. The NASA report examines such effects briefly [215].
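The armlength scalings described above can be collected in a few lines. The sketch below expresses each quantity as a ratio relative to a 5 Mkm reference arm, assuming the same laser, telescope, and test-mass noise; the function and key names are hypothetical, and only the scalings stated in the text are encoded.

```python
def armlength_scaling(new_arm_m, ref_arm_m=5.0e9):
    """Scale factors relative to a reference armlength, all else being equal."""
    r = new_arm_m / ref_arm_m
    return {
        # Low-frequency response to GWs is proportional to the armlength.
        "low_freq_response": r,
        # Received laser power from a diffracting beam scales as L^-2.
        "received_power": r ** -2,
        # Shot-noise rms scales as (received power)^-1/2, i.e. as L.
        "shot_noise_rms": r,
    }


# A 1 Mkm constellation compared with the classic 5 Mkm design.
print(armlength_scaling(1.0e9))
```

For the 1 Mkm case the low-frequency response drops to one fifth while the received power rises by a factor of 25, which is the trade-off behind the shift of the sensitivity sweet spot to higher frequencies.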

Reducing mission duration also saves money, because it reduces the cost of supporting the mission from the ground, and it allows for shorter “warranties” on the various subsystems. Conversely, missions are made cheaper by accepting more risk of failure or underperformance, since risk is “retired” by extensive testing and by introducing component redundancy, both of which are expensive. The NASA study also explored replacing LISA’s custom subsystems with variants already flown on other missions (as in OMEGA [229]), or eliminating some of them altogether (as in LAGRANGE [304]). However, the significant performance hit and additional risk incurred by such steps is not matched by correspondingly major savings, because the main cost driver for LISA-like missions is the necessity of launching and flying three (or more) independent spacecraft. Switching to atom interferometry would make for very different mission architectures, but the NASA study finds that an atom-interferometer mission would face many of the same cost-driving constraints as a laser-interferometer mission [215, 43].

Indeed, the overarching conclusion of the NASA study is that no technology can provide dramatic cost reductions, and that scientific performance decreases far more rapidly than cost. Thus, “staying the course” and pursuing a modestly descoped LISA-like design, whenever programs and budgets will allow it, may yet be the most promising strategy for GW detection in space.

3.3 Mid-frequency space-based observatories

The DECi-hertz Interferometer Gravitational wave Observatory (DECIGO [408, 256, 257]) is a proposed Japanese mission that would observe GWs at frequencies between 1 mHz and 100 Hz, reaching its best (\(h \sim 10^{-23}\)) sensitivity between 0.1 and 10 Hz, and thus bridging the gap between LISA-like and ground-based detectors. Prior to DECIGO, the possibility of observing GWs in the decihertz band had been studied in the context of a possible follow-up to LISA, the Big Bang Observer (BBO [357, 135, 138]).

The final DECIGO configuration (2024+) envisages four clusters in an Earth-like solar orbit, each cluster consisting of three drag-free spacecraft in a triangle with 1000-km arms. GWs are measured by operating the arms as a Fabry-Pérot interferometer, which requires keeping the arm-lengths constant, in analogy to ground-based interferometers and in contrast to LISA’s transponding scheme. DECIGO’s test masses are 100 kg mirrors, and its lasers have 10 W power. The roadmap toward DECIGO includes two pathfinders: the single-spacecraft DECIGO Pathfinder [23] consists of a 30 cm Fabry-Pérot cavity, and it could detect binaries of \(10^3\)–\(10^5\,M_\odot\) black holes if they exist near the galaxy [480]; next, pre-DECIGO [257] would demonstrate the DECIGO measurement with three spacecraft and modest optical parameters, resulting in a sensitivity 10–100 times worse than one of the final DECIGO clusters.

The DECIGO science objectives [257] include measuring the GW stochastic background from “standard” inflation (with sensitivity down to \(\Omega_{\rm GW} \sim 2 \times 10^{-16}\)), and determining the thermal history of the universe between the end of inflation and nucleosynthesis [325, 274]; searching for hypothesized primordial black holes [390]; characterizing dark energy by using neutron-star binaries as standard candles (either with host redshifts [333], or by the effect of cosmic expansion on the inspiral phasing [408, 334]); illuminating the formation of massive galactic black holes by observing the coalescences of intermediate-mass (\(10^3\)–\(10^4\,M_\odot\)) systems; constraining the structure of neutron stars by measuring their masses (in upwards of 100 000 detections per year); and even searching for planets orbiting neutron-star binaries.

DECIGO can also test alternative theories of gravity, as reviewed in [482]: by observing neutron-star-intermediate-mass-black-hole systems, it can constrain the dipolar radiation predicted in Brans-Dicke scalar-tensor theory (see Section 5.1.3 in this review) better than LISA, and four orders of magnitude better than solar-system experiments [486]; it can constrain the speed of GWs, parametrized as the mass of the graviton (Section 5.1.2) three orders of magnitude better than solar-system experiments, although not as well as LISA [486]; by observing binaries of neutron stars and stellar-mass black holes, it can constrain the Einstein-Dilaton-Gauss-Bonnet [481] and dynamical Chern-Simons [487] theories; it can measure the bulk AdS curvature scale that modifies the evolution of black-hole binaries in the Randall-Sundrum II braneworld model [484]; and it can even look for extra polarization modes (Section 5.1.1) in the stochastic GW background [332].

In the rest of this review we concentrate on the tests of GR that will be possible with low-, rather than mid-frequency GW observatories. Nevertheless, many of the studies performed for LISA-like detectors are easily extended to the ambitious DECIGO program, which would probe GW sources of similar nature, but of different masses or in different phases of their evolution.

4 Summary of Low-Frequency Gravitational-Wave Sources

The low-frequency GW band, which we define as \(10^{-5}\) Hz ≲ f ≲ \(10^{-1}\) Hz, is astrophysically appealing because it is populated by a large number of sources, which are relevant to understanding stellar evolution and stellar populations, to probing the growth and evolution of massive black holes (MBHs) and their links to galaxies, and possibly to observing the primordial universe. Space-based interferometric detectors such as LISA and eLISA are sensitive over this entire band. Figure 2 illustrates the GW strength, as a function of GW frequency, for a variety of interesting source classes. The LISA and eLISA baseline sensitivities are overlaid, illustrating the potential reach of interferometric detectors. The area above the baseline curve is generically called “discovery space”; the height of a source above the curve provides a rough estimate of the expected signal-to-noise ratio (SNR) with which the source could be detected using matched-filtering techniques. SNR and matched filtering are important concepts that underlie much of what will be discussed elsewhere in this review, so we explain them briefly in this section.

Figure 2

The discovery space for space-based GW detectors, covering the low-frequency region of the GW spectrum, \(10^{-5}\) Hz ≲ f ≲ 0.1 Hz. The discovery space is delineated by the LISA threshold sensitivity curve [277] in black, and by the eLISA sensitivity curve in red [21] (the curves were produced using the online sensitivity curve and source plotting website [321]). This region is populated by a wealth of strong sources, often in large numbers, including mergers of MBHs, EMRIs of stellar-scale compact objects into MBHs, and millions of close-orbiting binary systems in the galaxy. Thousands of the strongest signals from these galactic binary systems should be individually resolvable, while the combined signals of millions of them produce a stochastic background at low frequencies. These systems provide ample opportunities for astrophysical tests of GR for gravitational-field strengths that are not well characterized and studied in conventional astronomy.

In a particular GW search based on the evaluation of a detection “statistic” ρ, the detection SNR of a GW signal is defined as the ratio of the expectation value of ρ when the signal is present to the root-mean-square average of ρ when the signal is absent. Typically the output s of the detector is modeled as the sum of a signal \({\bf{h}}(\vec \theta)\), depending on parameters \({\vec \theta}\), and of instrumental noise n. It is normally assumed that the noise is stationary and Gaussian, and that it has uncorrelated frequency components ñ(f). Under these assumptions, the statistical fluctuations of noise are completely determined by the one-sided power spectral density (PSD) \(S_h(f)\), which is defined by the equation

$$\left\langle {\tilde n(f)\tilde n(f^{\prime})^{\ast}} \right\rangle = {1 \over 2}{S_h}(f)\delta (f - f^{\prime}),$$

where 〈〉 denotes the expectation value over realizations of the noise (also known as the ensemble average), and the asterisk denotes complex conjugation. This gives rise to a natural definition of an inner product on the space of possible waveform templates,

$$({{\bf{h}}_1}{\vert}{{\bf{h}}_2}) = 2\int\nolimits_0^\infty \left[ {{{\tilde h_1^{\ast}{{\tilde h}_2} + {{\tilde h}_1}\tilde h_2^{\ast}} \over {{S_h}(f)}}} \right]\,\,\,{\rm{d}}f.$$

In particular, the (sampling) probability of a given realization of noise n0 is just

$$p({\bf{n}} = {{\bf{n}}_0}) \propto \exp \left[ {- {1 \over 2}({{\bf{n}}_0}{\vert}{{\bf{n}}_0})} \right].$$

Searching the data for GW signals usually involves applying a linear filter K(t) to compute the statistic

$$W = \int\nolimits_{- \infty}^\infty K(t)s(t){\rm{d}}t = \int\nolimits_{- \infty}^\infty {\tilde K^{\ast}}(f)\tilde s(f){\rm{d}}f.$$

Redefining the filter by setting \(\tilde F(f) = \tilde K(f){S_h}(f)\) yields the overlap (F|s). The corresponding SNR is

$${S \over N} = {{(F{\vert}h)} \over {\sqrt {\left\langle {(F{\vert}n)(F{\vert}n)} \right\rangle}}} = {{(F{\vert}h)} \over {\sqrt {(F{\vert}F)}}},$$

and from the Cauchy-Schwarz inequality we have

$${\left({{S \over N}} \right)^2} = {{{{(F{\vert}h)}^2}} \over {(F{\vert}F)}} \leq {{(F{\vert}F)(h{\vert}h)} \over {(F{\vert}F)}} = (h{\vert}h),$$

with equality when F = h. This shows that the matched filter obtained when F = h is the optimal linear statistic to search for the signal h in noise characterized by the PSD \(S_h(f)\). Furthermore, the optimal matched-filtering SNR is just \(\sqrt {(h|h)}\). Future references to SNR in this review will always refer to this mathematical object.
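The inner product and the Cauchy-Schwarz bound above can be checked numerically with a minimal sketch. The integral is discretized as a sum over positive-frequency bins; the PSD shape, the toy waveform (an \(f^{-7/6}\) amplitude with a linear phase), and the function names are all assumptions made for illustration, not models of any real detector or source.

```python
import cmath
import math
import random

def inner(h1, h2, psd, df):
    """Discretized (h1|h2) = 2 * sum_f [h1* h2 + h1 h2*] / S_h(f) * df."""
    total = 0.0
    for a, b, s in zip(h1, h2, psd):
        total += (a.conjugate() * b + a * b.conjugate()).real / s
    return 2.0 * total * df

def filter_snr(filt, h, psd, df):
    """SNR of the linear filter F applied to signal h: (F|h) / sqrt((F|F))."""
    return inner(filt, h, psd, df) / math.sqrt(inner(filt, filt, psd, df))

def demo(n_bins=256, df=1e-4, seed=1):
    rng = random.Random(seed)
    freqs = [(k + 1) * df for k in range(n_bins)]
    # Made-up one-sided PSD: flat at high f, rising toward low f.
    psd = [1.0 + (1e-2 / f) ** 2 for f in freqs]
    # Toy frequency-domain signal (illustrative only).
    h = [f ** (-7.0 / 6.0) * cmath.exp(2j * math.pi * 50.0 * f) for f in freqs]
    # A mismatched filter: same amplitudes, randomly perturbed phases.
    wrong = [x * cmath.exp(1j * rng.uniform(-0.5, 0.5)) for x in h]
    optimal = math.sqrt(inner(h, h, psd, df))
    return optimal, filter_snr(h, h, psd, df), filter_snr(wrong, h, psd, df)

optimal, matched, mismatched = demo()
print(optimal, matched, mismatched)
```

With F = h the filter SNR reproduces \(\sqrt{(h|h)}\), while the phase-perturbed filter returns a strictly smaller value, as the inequality requires.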

The three main types of GW sources in the low-frequency band are massive black hole (MBH) binaries, extreme mass-ratio inspirals (EMRIs), and galactic binaries.

MBH binaries comprise two supermassive and nearly equal-mass black holes. These systems typically form following the merger of two galaxies, when the MBHs originally in the centers of the two pre-merger galaxies reach the center of the merged galaxy and form a binary. These binaries generate very strong GW emission and can be detected by space-based GW detectors out to cosmological distances. The signals will be observed from the time they enter the detector band, at \(\sim 10^{-4}\) Hz, until the signal cuts off after the objects have merged. The systems evolve through an inspiral, as the objects orbit one another on a nearly circular orbit of gradually shrinking radius, followed by the merger, and then the ringdown. This last phase of the signal is generated as the black-hole merger product, which is initially perturbed in a highly asymmetric state, settles down to a stationary and axisymmetric configuration. During the inspiral phase the field strength is moderate and the velocity is low compared to the speed of light, so the system can be modeled using post-Newtonian theory [289, 477, 263, 478, 84]. During the merger the system is highly dynamical and requires full nonlinear modeling using numerical relativity [367, 102, 42, 398]. The final ringdown emission can then be computed using black-hole perturbation theory, since the perturbations of the object away from stationarity and axisymmetry are small.

There have been several attempts to develop models that include all three phases of the emission in a single framework. The effective one-body (EOB) model was initially developed analytically, and is based on the idea of modeling the dynamics of a binary by describing the relative motion of binary components as the motion of a test particle in an external spacetime metric (the metric of the “effective” single body) [99, 100, 141]. The EOB model includes a smooth transition to plunge and merger, and then a sharp transition to ringdown. The model incorporates a number of free parameters that have now been estimated by comparison to numerical-relativity simulations [101, 144]. The initial “effective-one-body numerical-relativity” (EOBNR) model described non-spinning black holes, but the formalism has now been extended to include the effects of non-precessing spins [345, 51, 438, 44]. The other model that includes all phases of the waveform is the phenomenological inspiral-merger-ringdown (pIMR) developed by Ajith et al. [4, 5]. This model was constructed by directly fitting an ansatz for the frequency-domain waveform, which was motivated by analytic and numerical results, to the output of numerical relativity simulations. This model has also now been extended to include the effect of non-precessing spins [393].

EMRIs consist of stellar-mass (∼ 0.5–50 \(M_\odot\)) compact objects, either white dwarfs, neutron stars, or black holes, that orbit MBHs. These are expected to occur in the centers of quiescent galaxies, in which a central MBH is surrounded by a cluster of stars. Interactions between these stars can put compact objects onto orbits that come very close to the MBH, leading to gravitational capture. EMRIs are not as strong GW emitters as MBH binaries, but are expected to be observable (if sufficiently close to us) for the final few years before the smaller object merges with the central MBH, when the emitted GWs are in the most sensitive part of the frequency range of space-based detectors. EMRIs will also undergo inspiral, merger, and ringdown, but the signals from the latter two phases are likely to be too weak to be detected. During the inspiral, EMRIs are also expected to be on eccentric and inclined orbits, which colors the emitted GWs with multiple frequency components. During an EMRI, the smaller object spends many thousands of orbits in the strong-field regime, where its velocity is a significant fraction of the speed of light, so post-Newtonian waveforms are inapplicable. However, the extreme mass ratio means that black-hole perturbation theory can be used to compute waveforms, using the mass ratio as an expansion parameter [362, 394].

Stellar binaries in our own galaxy with two compact object components will also be sources for space-based detectors if the binary period is appropriate (\(\sim 10^2\)–\(10^4\) s). The binary components must be compact to ensure that the binary can reach such periods without undergoing mass transfer. Theoretical models and observational evidence suggest that there may be many millions of such binaries that are potential sources. These systems are not expected to evolve significantly over the typical lifetime of a space mission, and will therefore produce continuous and mostly monochromatic GW signals in the band of space-based detectors. For a small number of systems it will be possible to observe a linear drift in frequency over the observation time, due to either GW-driven inspiral or mass transfer. These systems remain in the weak-field regime throughout their observation, and their emission can be represented accurately using the quadrupole formula [353, 354].

In addition to what they will teach us about astrophysics, all these sources are prospective laboratories for testing gravitation. The different character of sources in each class provides different opportunities. MBH binaries are strong-field systems that yield GW signals with large SNRs, making signal detection and characterization less ambiguous than for weaker sources. This will allow detailed explorations of waveform deviations from the predictions of GR. EMRIs provide detailed probes of MBH spacetimes, thanks to the large number of strong-field waveform cycles that will be observable from these systems. Individually-resolvable compact galactic binaries can also be used to test GR, because their waveforms and evolution are well understood and easily described using model templates. In addition, many such binaries may be detectable with both GW and EM observatories, providing opportunities to test the propagation of GWs relative to EM signals.
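Returning to the galactic binaries described above, the sketch below illustrates why only a minority of them show a measurable frequency drift. It assumes a circular, detached binary whose evolution is driven purely by GW emission, so that the GW frequency is twice the orbital frequency and the drift follows the quadrupole formula, \(\dot f = {96 \over 5}{\pi ^{8/3}}{(G{{\mathcal M}}/{c^3})^{5/3}}{f^{11/3}}\), with \({\mathcal M}\) the chirp mass. The chirp mass of 0.3 \(M_\odot\), the one-year observation time, and the resolvability criterion \(\dot f T^2 \gtrsim 1\) (a drift of about one frequency bin) are illustrative assumptions; a real analysis would also depend on SNR.

```python
import math

T_SUN = 4.925e-6   # G * M_sun / c^3 in seconds
YEAR = 3.156e7     # one year in seconds

def gw_frequency(orbital_period_s):
    """GW frequency of a circular binary: twice the orbital frequency."""
    return 2.0 / orbital_period_s

def gw_fdot(chirp_mass_msun, f_gw_hz):
    """Quadrupole-formula frequency drift (Hz/s) for GW-driven inspiral."""
    mc = chirp_mass_msun * T_SUN
    return (96.0 / 5.0) * math.pi ** (8.0 / 3.0) * mc ** (5.0 / 3.0) \
        * f_gw_hz ** (11.0 / 3.0)

def drift_in_bins(chirp_mass_msun, f_gw_hz, t_obs_s=YEAR):
    """Frequency drift over the observation, in units of the bin width 1/T."""
    return gw_fdot(chirp_mass_msun, f_gw_hz) * t_obs_s ** 2

print(gw_frequency(1e4), gw_frequency(1e2))  # band edges for P = 1e4 s, 1e2 s
for f in (1e-3, 1e-2):
    print(f, drift_in_bins(0.3, f))
```

Under these assumptions a binary radiating at 1 mHz drifts by far less than one frequency bin in a year, whereas one at 10 mHz drifts by a few bins, which is consistent with the drift being observable only for a small number of systems at the high-frequency end.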

Many approaches to constraining alternative theories are proposed as null experiments, where the assumption is that GW observations will validate the predictions of GR to the level of the detector’s instrumental noise. The size of residual deviations from GR is then constrained by the size of the errors in the GW measurement. However, in order to perform these null experiments, it is necessary to have accurate models for the GWs predicted in GR. The appropriate modeling scheme, assuming GR and a system in vacuum, depends on the system under consideration, as described above. Additional modeling complications arise from astrophysical phenomena, since tidal coupling, extended body effects, and mass transfer can all leave an impact on source evolution, complicating the interpretation of the observed GWs.

The very capabilities of space-based detectors also introduce complications: in contrast to the rare appearance of GW signals in the output of ground-based interferometers, the data from space-based missions will contain the superposition of millions of individual continuous signals. We expect that we will be able to resolve individually thousands of the strongest sources. Thus, we must deal with problems of confusion (where the presence of many interfering sources complicates their individual detection), subtraction (where strong signals must be carefully modeled and removed from the data before weaker sources, overlapping in frequency, become visible), and global fit (where the parameters of overlapping sources have correlated errors that must be varied together in searches and parameter-estimation studies). These issues have not escaped the attention of LISA scientists, and have been tackled both theoretically [38], and in a practical program of mock data challenges [37, 39, 450]. All these issues in data analysis and modeling will impact the prospects for tests of GR using the observations. Little work has been done to date to estimate these impacts, but we will discuss relevant studies where appropriate.

In the remainder of this section we will briefly describe the three principal source classes expected in the low-frequency band. For each source type, we will discuss the astrophysics of the systems, the estimated event rates for (e)LISA observations, their potential astrophysical implications, and their applications to testing GR, with pointers to later sections that describe the tests in more detail. We focus on the three types of source that we introduced above and that are most likely, from an astrophysical point of view, to be observed by LISA-like observatories: MBH coalescences, EMRIs, and galactic binaries.

It is possible that a space-based detector like LISA or eLISA could detect other sources that could be used for tests of relativity. For instance, if intermediate-mass black holes with mass between \(100\,M_\odot\) and \(10^4\,M_\odot\) do exist, they could be observed as intermediate-mass-ratio inspirals (IMRIs) when they inspiral into the MBHs in the centers of galaxies [18]. These systems would have the potential for the same kind of tests of fundamental physics as EMRIs, but with considerably larger SNR at the same distance and hence would be observable to much greater distances. Cosmological GW backgrounds might also be observed, generated by processes occurring at the TeV scale in the early universe, which would provide constraints on the physics of the early universe and inflation. We refer the reader to [21, 20] for discussions of these sources. We regard them as somewhat more speculative than the other three source types and we do not consider them further here.

4.1 Massive black-hole coalescences

Most (if not all) galactic nuclei come to harbor a MBH during their evolution [296, 377], and individual galaxies are expected to undergo multiple mergers over their lifetime. It follows that the formation of MBH binaries following galaxy mergers is an expected outcome. The mergers of such binaries will be among the strongest sources of low-frequency GWs. The rate at which MBH binaries merge in the universe is uncertain at best, but these events will be detectable by LISA-like detectors to extremely large distances, probing an enormous volume of the visible universe. The detection of any MBH mergers, even at a low rate, would produce interesting astrophysical results.

For a circular binary, the mass of a system that merges at a frequency fm is roughly

$${M_t} \approx {10^5}{M_ \odot}\left({{{40\,\,{\rm{mHz}}} \over {{f_m}}}} \right);$$

the frequency range accessible to space-based GW detectors extends from a few \(10^{-4}\) Hz to a few \(10^{-1}\) Hz, which sets the sensitive mass range to ∼ \(10^4\)–\(10^7\,M_\odot\).
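The quoted mass range follows directly from the scaling relation for \(M_t\). A minimal sketch, in which the function name is hypothetical and "a few" is read as a factor of 4 purely for illustration:

```python
def merger_total_mass(f_merge_hz):
    """Approximate total mass (solar masses) of a circular binary that
    merges at GW frequency f_merge_hz: M_t ~ 1e5 M_sun * (40 mHz / f_m)."""
    return 1e5 * (40e-3 / f_merge_hz)

# Band edges of "a few" 1e-4 Hz and "a few" 1e-1 Hz, taken here as 4e-4 and 0.4.
for f in (4e-4, 4e-2, 4e-1):
    print(f, merger_total_mass(f))
```

Evaluating the relation at these band edges brackets masses from about \(10^4\) to \(10^7\,M_\odot\), matching the sensitive range stated above.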

MBH mergers as low-frequency GW sources. Predictions for the observable population of MBH mergers are based on merger-tree structure-formation models. The overall merger rate depends on the detailed mechanisms of evolutionary growth of the MBH population. The energy budgets of active galactic nuclei suggest that MBHs could grow by efficient accretion processes [490], while other considerations suggest that mergers could contribute significantly to their early growth [247]. Studies of early cosmic structure [298] indicate that MBHs form from the coalescence of many smaller seed black holes. Many models based on this idea have been developed and simulated numerically [216, 460, 437, 60]. The number of events detectable by LISA was estimated to be in the range 3–300 per year [406], with a spectrum of masses in the range \(10^3\)–\(10^7\,M_\odot\). The predicted event rate is not much different for eLISA [20], although some of the marginally detectable events involving lighter black holes at high redshift will no longer be observable.

Figure 3 shows contours of constant SNR (as seen by eLISA) in the redshift-total-mass plane for equal-mass, nonspinning MBH mergers (left panel) and in the total-mass-mass-ratio plane for MBH mergers at a fixed redshift of z = 4 (right panel). For typical events at z ∼ 4, the SNRs of eLISA observations are expected to exceed 100 for total masses ∼ \(5 \times 10^5\,M_\odot\). For comparable-mass mergers at lower redshifts, eLISA observations could exceed SNRs of 1000. SNRs for LISA are typically factors of 2–3 higher. For total mass > \(10^5\,M_\odot\), a significant fraction of this SNR comes from the final merger and ringdown phase.

Figure 3

Contours of constant SNR for MBH binaries observed with eLISA. The left-hand panel shows contours in the total-mass-redshift plane for equal-mass binaries, while the right-hand panel shows contours in the total-mass-mass-ratio plane for sources at redshift z = 4. Image reproduced by permission from [21].

The practical detection of MBH systems will have to rely on careful data analysis. Several detection algorithms for MBH binaries have been studied by a number of different groups, encouraged in part by the Mock LISA Data Challenges [39]. These include Markov Chain Monte Carlo techniques [132], time-frequency analysis [96], particle-swarm optimization [196], nested sampling [172], and genetic algorithms [356]. The most recent mock-data challenge included multiple spinning MBH binary signals in the same dataset, and multiple groups demonstrated their ability to recover these binaries from the LISA data stream [39].

Astrophysics using MBH mergers. GW observations will directly probe the character of the black holes on scales comparable to the size of the event horizons. How the information about the black holes (such as their spin and mass) is encoded in the waveform is a topic of much focused research (see, e.g., [275, 411, 117, 421] for reviews).

According to GR, any astrophysical black hole is fully described by just ten numbers, encoding its mass, position, momentum, and spin. Correspondingly, a binary is fully described by 20 parameters, three of which, related to its center-of-mass momentum, cannot be measured from the GW signal. Einstein’s theory of gravity thus predicts a 17-dimensional parameter space of possible signals. The precision with which GW observations will be able to extract the parameters characterizing a MBH binary system has been the subject of extensive investigation. The first comprehensive study [137] considered the LISA determination of the sky location, luminosity distance, and mass of MBH systems, using a simple model of LISA’s orbital motion around the sun. Later studies [322] considered the angular resolution of detectors in precessing-plane configurations like LISA’s, as well as ecliptic-plane interferometers that lie flat in the ecliptic as they orbit the sun. More recent studies have considered more complete MBH-binary waveforms with spin effects, higher harmonics, and with merger and ringdown signals, and have looked at the effect of these corrections on LISA’s parameter-estimation ability [455, 29, 442, 365, 33, 266, 307]. These papers indicate that observations with a LISA-like detector will be able to determine the masses of the two binary components to 0.01–0.1%, the spins to 0.001–0.1%, the sky position of the binary to 1–10 \(\mathrm{deg}^2\), and the luminosity distance to 0.1–10%. For eLISA, the corresponding results are 0.1–1%, 1–10%, 10–100 \(\mathrm{deg}^2\) and 10–100%. Such precise measurements will provide unprecedented information about MBHs in the universe.

These parameter-estimation studies attest to the quality of the information that will be available to astrophysicists for the phenomenological exploration of the astrophysical character of MBH systems. Of particular interest is the evolution of black-hole spins and the final spin of merger remnants [80, 242], on which GW observations should be able to shed some light. In addition, the set of detected MBH merger systems will provide constraints on the formation history of MBHs [360, 359, 241, 308, 355]. In [405, 198] it was shown that a detector like LISA would be able to tell with high confidence the difference between ten different models for the growth of structure in the universe. For many of the models, the difference would already be apparent after as little as three months of data collection. A similar differentiation among models would also be possible with eLISA [21], and both detectors will also be able to constrain “mixed models” combining different MBH population models. MBH binaries can also be used as standard candles to probe cosmological evolution [234].

Testing relativity using MBH mergers. MBH mergers are good laboratories for testing relativity because of the high SNR expected for the signals and the correspondingly large volume within which they can be observed. This makes them good systems for constraining GW polarizations (see Section 5.1.1) since relatively weak alternative-polarization signals could in principle be detected. They are also excellent systems for testing GW-propagation effects, such as subluminal propagation speeds and the presence of parity violations, since these effects accumulate with distance and MBH binaries can be seen to very high redshifts (see Section 5.1.2). MBH binaries are also well suited for generic tests of GR based on measuring the evolution of inspiral phasing and checking it for consistency with general-relativistic predictions, since the high SNR allows a large number of phasing parameters to be measured (see Section 5.2).

MBH binaries are also the only systems for which the quasinormal ringdown radiation, generated as the highly-distorted post-merger black hole settles down into a quiescent state, will be detectable. Although the amount of energy deposited into different ringdown modes is not well understood, the frequencies and decay rates of the ringdown modes are predicted precisely in GR from perturbation theory. The mass and spin of the final black hole can be measured from a single ringdown mode; if multiple modes are detected, the system can be used to check for consistency with the Kerr metric, and therefore provides a probe of black-hole structure. For a comprehensive review of black-hole ringdowns, we refer the reader to [270, 337, 76]. A more detailed discussion of the use of ringdown radiation to probe black-hole structure may be found in Section 6.3.

Last, the space-based GW observation of MBH binaries offers an unprecedented opportunity to probe the fully-dynamic strong-field regime of GR by comparison of the observed merger signals to the predictions of numerical relativity. This is an area that has not yet been explored in any detail and so we will not discuss it further in this review. However, it is an important subject that should be carefully explored before a space-based GW detector finally becomes a reality.

4.2 Extreme-mass-ratio inspirals

An EMRI is the GW-emitting inspiral of a stellar-mass compact object, either a black hole, neutron star or white dwarf, into a MBH in the center of a galaxy. A main-sequence star with mean density \(\bar \rho = 10\,{\rm{g}}\,{\rm{cm}}^{- 3}\) will be tidally disrupted by a black hole of mass \(10^6\,M_\odot\) at a distance of ∼ 25 Schwarzschild radii [187]. At such separations an extra-galactic EMRI system will not be generating detectable amounts of gravitational radiation in the low-frequency band. The tidal-disruption radius increases with decreasing stellar density and decreasing central-black-hole mass, and the reference values used above are at the upper end of suitable values for main-sequence stars and mHz GW sources. Therefore, it is not expected that the inspiral of a main-sequence star will be a candidate for an EMRI, with the possible exception of such sources in the galactic center, which is sufficiently nearby that radiation from an object orbiting at several tens of Schwarzschild radii might be detectable [187].
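The order of magnitude of the tidal-disruption distance quoted above, and its dependence on density and black-hole mass, can be reproduced with a standard Newtonian estimate, \(r_t \approx \eta\,(3 M_{\rm BH}/4\pi \bar \rho)^{1/3}\), expressed in units of the Schwarzschild radius \(2GM_{\rm BH}/c^2\). The prefactor \(\eta\) is of order unity and depends on the disruption criterion adopted; the values used below are assumptions and are not taken from [187]. The function name is hypothetical.

```python
import math

G = 6.674e-11      # m^3 kg^-1 s^-2
C = 2.998e8        # m / s
M_SUN = 1.989e30   # kg

def tidal_radius_in_rs(m_bh_msun, rho_g_cm3, eta=1.0):
    """Newtonian tidal-disruption radius in units of the Schwarzschild radius.

    eta is an order-unity prefactor (an assumption; conventions differ).
    """
    m_bh = m_bh_msun * M_SUN
    rho = rho_g_cm3 * 1e3                      # g/cm^3 -> kg/m^3
    r_t = eta * (3.0 * m_bh / (4.0 * math.pi * rho)) ** (1.0 / 3.0)
    r_s = 2.0 * G * m_bh / C ** 2
    return r_t / r_s

print(tidal_radius_in_rs(1e6, 10.0, eta=1.0))  # roughly 12
print(tidal_radius_in_rs(1e6, 10.0, eta=2.0))  # roughly 25
print(tidal_radius_in_rs(1e5, 10.0), tidal_radius_in_rs(1e6, 1.0))
```

In these units the estimate scales as \(\bar \rho^{-1/3} M_{\rm BH}^{-2/3}\), so the disruption distance measured in Schwarzschild radii grows for less dense stars and for lighter central black holes, in line with the statement above.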

EMRIs occur in the dense stellar clusters that are found surrounding the black holes in the cores of galaxies, and are triggered by a variety of processes:

  • Direct capture: This is the “standard” scenario, in which two-body encounters in the stellar cluster gradually perturb the compact-object orbits, changing their angular momentum. This can put a compact object onto an orbit that passes very close to the central black hole. The orbit loses energy and angular momentum in bursts of gravitational radiation emitted near periapse, and if sufficient energy is radiated, the object can be left on an orbit that is bound to the central black hole. It will then gradually inspiral into the central black hole as it loses orbital energy and angular momentum to GW emission.

  • Tidal splitting of binaries: Galactic-center stellar clusters will also contain binaries, which can similarly be displaced onto orbits that pass close to the central MBH. If this happens, the binary will typically be disrupted, with one object remaining bound to the MBH and the other becoming unbound and being flung out with high velocity. If this happens to a binary containing one or two compact objects, the captured compact object will be left on a fairly tightly bound orbit a few hundred astronomical units from the MBH, and will become an EMRI on a nearly circular orbit [312].

  • Tidal stripping of giant stars: Giant stars typically have a compact core surrounded by a diffuse hydrogen envelope. If such a star passes close to a MBH, the envelope can be partially or completely stripped by the tidal interaction with the MBH. This will deposit the dense core, which is essentially a white dwarf, on a close orbit about the MBH; the system will eventually become an EMRI [145, 150].

  • In-situ formation: If a MBH has a massive accretion disc, the disc can become unstable to star formation. Stars formed in such a way are biased toward higher masses and will therefore tend to form black-hole remnants. These remnants will be in circular, equatorial orbits around the central MBH, and will inspiral as EMRIs under GW emission.

More details on all of these processes and further references can be found in [18, 16].

EMRIs as low-frequency GW sources. The intrinsic rate at which EMRIs occur depends on a lot of astrophysics that is rather poorly understood. The rate is strongly influenced by several processes, including:

  • Mass segregation: Stellar encounters tend on average to lead to equipartition of energy. This means that after a two-body interaction the heavier object tends to be moving slower, and the lighter object tends to be moving faster. If this occurs for two objects around a MBH, the heavier object sinks in the potential of the MBH and the lighter object moves further out. Components of the stellar distribution thus become segregated according to their mass. The more massive objects, which tend to be the black holes, end up closer to the MBH and are captured preferentially [18, 188, 313].

  • Triaxiality: The centers of galaxies tend to be approximately spherically symmetric, but on larger scales galactic nuclei can be triaxial. Orbits in triaxial potentials tend to be centrophilic: they have a tendency to pass close to the center. This process can funnel stars that are further away from the MBH toward the center and hence increase the inspiral rate [233, 364].

  • Resonant relaxation: In the standard relaxation picture, each encounter is random and uncorrelated, so stars undergo a random walk. The process is driven by the diffusion of energy which then leads to angular-momentum transfer. However, in a stellar cluster around a MBH, each star will be on a Keplerian orbit, which is a fixed ellipse in space. The orbits of two nearby stars will thus exert correlated torques on one another, which can lead to a direct resonant evolution of the angular momentum [375, 376]. This process can lead to a much more rapid diffusion of orbits into the regime where they can become EMRIs.

  • The “Schwarzschild barrier”: Resonant relaxation relies on the orbits having commensurate radial and azimuthal frequencies, so they remain in fixed planes over multiple orbits. In the strong-field potential of a massive object, orbits are no longer Keplerian but undergo significant periapsis precession. Resonant relaxation is only efficient in the regime where precession is negligible. The “Schwarzschild barrier” refers to the boundary between orbits with and without significant precession. Inside this boundary resonant relaxation is strongly quenched, potentially reducing inspiral rates [310].

A full description of these processes and their relative importance can be found in [18, 16]. The range of EMRI rates per galaxy reported in the literature is ∼ 1–1000 Gyr−1 for black holes, and ∼ 10–5000 Gyr−1 for white dwarfs, with most recent best-guess estimates of ∼ 400 Gyr−1 / 20 Gyr−1 / 7 Gyr−1 for black holes / white dwarfs / neutron stars [235, 19].

Uncertainties in the event rate come not only from the intrinsic rate of capture, but also from uncertainties in the number density of black holes of suitable mass. At present, only three objects in the range \(10^4\)–\(10^7\,M_\odot\) are known from kinematic measurements [173]. Using galaxy luminosity functions, the luminosity-velocity-dispersion relation and the MBH-mass-velocity-dispersion relation, one infers the number density of MBHs in the LISA mass range to be roughly constant per logarithmic mass unit, and equal to a few times \(10^{-2}\,{\rm Mpc}^{-3}\). However, when black holes merge, the asymmetric radiation of linear momentum over the final few orbits imparts linear momentum (a “merger kick”) to the remnant black hole. These kicks can be large enough to eject the black hole from its host galaxy. If these black-hole merger kicks are significant for black holes merging in the centers of galaxies, the number of black holes with \(M < 2 \times 10^6\,M_\odot\) may therefore be significantly depleted [147].

EMRI waveforms are complex, since the orbits are expected to be both eccentric and inclined with respect to the equatorial plane of the central black hole. Various algorithms have been suggested for EMRI detection, including hierarchical matched filtering starting with short data segments [193], time-frequency analysis [467, 200, 194], searches with phenomenological templates [464], and Markov Chain Monte Carlo searches [429, 35, 130]. Indeed, EMRIs have proven to be the most difficult source to detect in the Mock LISA Data Challenges, but in the third challenge round two groups correctly identified the three EMRIs with the highest-mass central black holes [39]. Several challenging aspects remain unsolved, such as dealing with source confusion and detecting the EMRIs with lower-mass central black holes, which are expected to dominate the data; recent progress encourages confidence that these problems will be solved by the time a space-based detector is launched.

Assuming that a matched-filtering SNR of 30 is required for EMRI detection, and using the previously quoted EMRI rates per black hole [235], it was estimated that LISA could see several hundred events [192] out to a redshift z ≲ 1. The intrinsic rate uncertainties mean that the actual number could be anywhere from a few tens to a few thousands. Most of the events will be black-hole inspirals into MBHs of mass \(10^5\)–\(10^6\,M_\odot\). The estimate for eLISA is a few tens of events [197, 20, 21], although this is based on a lower SNR detection threshold of 20. (The lower threshold is justified by the results of the Mock LISA Data Challenges, which successfully demonstrated detection of EMRI events with SNR of ∼ 15, albeit with less-than-realistic signal confusion.)

Astrophysics using EMRIs. EMRI observations will tell us about the properties of MBHs at low redshift and about the physics of nuclear stellar clusters. Each EMRI observation can determine many of the parameters of the system, including the mass and spin of the central black hole, to accuracies of a fraction of a percent [46, 238]. With as few as ten EMRI observations, LISA would be able to constrain the slope of the MBH mass function at low redshift to ∼ ±0.3, which is the level of the current observational uncertainty [199]. eLISA would have the same capability given the same number of detections. If LISA observed as many EMRIs as current estimates predict, the constraints on the slope would improve by an order of magnitude, although the corresponding improvement for eLISA would be only a factor of two. EMRIs will also provide information on the distribution of black-hole spins. The masses of the compact objects observed in EMRIs will provide information about mass-segregation processes, while the overall number of events encodes information about the complicated physics that determines the event rate. Finally, the eccentricities and inclinations of the observed EMRIs will provide information on the relative efficiencies of the various channels through which EMRIs could be formed [18, 16].

Testing relativity using EMRIs. EMRIs are well suited for testing relativity for several reasons: the systems are expected to be clean since the small object is likely to be a black hole with no internal structure, and the influence of any gaseous material in the spacetime is expected to be negligible (but see Section 6); the waveforms can be computed very accurately using black-hole perturbation theory [362, 48], allowing very small deviations from the model to be identified; the inspiral is slow, meaning that many waveform cycles are generated in the strong-field regime, and can be used to test the spacetime structure [179]; and the waveform structure is complex, due to the expected eccentricity and inclination of the orbits, and to the intrinsic richness of dynamics in Kerr (or Kerr-like) spacetimes [18, 154].

The primary fundamental-physics application of EMRI observations is probing black-hole structure, since the EMRI waveforms encode a map of the spacetime geometry close to the central black hole. This can be used to test whether the central object is indeed described by the Kerr metric, as we expect, or is some other kind of object. A discrepancy could indicate an astrophysical perturbation in the system, a violation of the cosmic-censorship hypothesis [352] or of the energy conditions of GR, or even evidence for the existence of an exotic massive compact object such as a boson star. All of this will be discussed in detail in Section 6. EMRIs can also be used for tests of GW polarization (Section 5.1.1), and to constrain alternative theories of gravity through detection of modifications to the inspiral rate (see Section 5.1.3).

4.3 Galactic binaries

The most numerous source class in the low-frequency GW band is that of ultra-compact binaries of two stellar-mass compact objects: white dwarfs, neutron stars, or black holes. Most of these will be in the Milky Way. Numerical population synthesis [330, 328] and theoretical studies of the highly evolved stellar population in the galaxy [245, 246] predict that there will be \(\sim 10^7\) ultra-compact binaries generating GWs in the mHz band visible to space-based detectors like LISA or eLISA.

Ultra-compact binaries as low-frequency GW sources. For an instrument with sensitivity similar to the “classic” LISA design, the complete population of compact galactic binaries would separate into two distinct observational classes: confused binaries and resolvable binaries. Confused binaries are those with orbital frequencies that are close enough to other binaries in the population that they cannot be readily distinguished from one another in the data [63, 231, 131]; their signals are overlapping and cannot be disentangled from one another, unless one source happens to be much closer and stronger than the others. For LISA, the confused binaries make up a stochastic foreground signal from the galaxy, which would be present in the data for all frequencies below the confusion limit fc ≃ 3 mHz. Various attempts have been made to estimate the spectrum of this foreground [329, 439, 162, 161, 386, 335], but the final profile would also depend on details of the algorithms used to analyze the LISA data [278, 131, 431, 136, 90, 290]. For an instrument with a sensitivity comparable to eLISA, which is approximately a factor of two worse in the relevant range, this confusion foreground would be much less prominent, and perhaps not detectable [20, 21].

Resolvable binaries are found both above and below the fc confusion limit. Sources below fc are binaries that happen to be nearby, and so dominate the emission near their central frequency. For sources above fc, the characteristic orbital frequencies are separated by a few frequency bin widths Δf ∼ 1/Tobs, where Tobs is the length of observation. An instrument like LISA would be expected to resolve as many as twenty thousand individual binaries [329, 335], while for eLISA the number is expected to be a few thousand [20, 21, 335].
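The bin-width argument can be made concrete with a short calculation. The following is only a rough sketch: the one-year observation time is an assumed value chosen for illustration, and the function names are hypothetical. It shows that the band below a 3 mHz confusion limit contains of order \(10^5\) frequency bins, far fewer than the estimated number of galactic binaries, which is why most low-frequency sources overlap.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year in seconds


def frequency_bin_width(t_obs_years):
    """Approximate frequency resolution, Delta f ~ 1 / T_obs, in Hz."""
    return 1.0 / (t_obs_years * SECONDS_PER_YEAR)


def bins_below(f_max_hz, t_obs_years):
    """Number of resolution bins of width 1 / T_obs between 0 and f_max_hz."""
    return int(f_max_hz / frequency_bin_width(t_obs_years))


# Assumed values for illustration: T_obs = 1 yr, confusion limit at 3 mHz.
df = frequency_bin_width(1.0)
n_bins = bins_below(3e-3, 1.0)
print(f"bin width for 1 yr of data: {df:.2e} Hz")
print(f"bins below 3 mHz:           {n_bins}")
```

Longer observations narrow the bins, so the number of individually resolvable sources grows with mission duration.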

Among the resolvable binaries there are several verification binaries (see [430]; for a current list see [327]). These are binaries that are expected to be resolvable in the LISA/eLISA data, but have already been detected and characterized using electromagnetic telescopes. These known binaries will provide early verification that the mission is operating correctly at the nominal sensitivity.

Astrophysics using ultra-compact binaries. The large number of compact binaries detectable throughout the entire galaxy makes these sources an exceptional resource to address questions about the stellar evolutionary history of the galaxy and the astrophysical dynamics of strong-field interactions in compact systems. If the confusion foreground is observed, it will be modulated by detector motion, a fact that can be used to estimate the multipole moments of the spatial distribution of binaries in the galaxy [407, 162]. A detector like LISA might also be able to detect the extra-galactic population of ultra-compact binaries, providing a valuable probe of its characteristics [171, 127, 129]. However, an extra-galactic population is unlikely to be observed by eLISA.

Detections of galactic binaries will be dominated by double-white-dwarf binaries, of which relatively few are currently known from electromagnetic observations, so (e)LISA will be in a unique position to illuminate their physics. This will shed light on their formation and evolution, including the poorly understood phase of common-envelope evolution. This is the phase during the evolution of a binary that follows unstable mass transfer. In the frame that co-rotates with the binary, the region of space around a star where orbiting material is bound to that star is known as the Roche lobe. As the orbit of a binary shrinks during inspiral, the Roche lobes also shrink, and one of them can eventually become smaller than the corresponding star. At that point the outer layers of the star are no longer gravitationally bound to the star so they are lost, and this mass is transferred to the other star in the binary. The process can be stable if, when the material is transferred, the orbital radius increases, the Roche lobe increases, and the star therefore no longer overflows the Roche lobe. Typically as the star loses mass it also becomes bigger, so stable mass transfer requires the Roche lobe to increase in size more quickly than the star. Mass transfer can also occur unstably if, when the material is lost, the orbit shrinks, leading to further shrinking of the Roche lobe and more mass transfer (or if the rate at which the star increases in size due to mass loss is faster than the rate at which the Roche lobe increases). After an unstable mass-transfer event, the companion star and the core of the donor star will end up as a binary contained entirely within the material that was transferred from the donor, the “common envelope.” Subsequently, energy is transferred from the orbit to the envelope, leading to shrinking of the orbit until the objects merge or, in the case of interest for ultra-compact binaries, the envelope has been completely expelled. The physics of this phase of the evolution is very poorly understood, but it is normally characterized in terms of the amount of energy transferred between the binary and the envelope. The distribution of orbital periods and masses for ultra-compact galactic binaries will encode important clues about this process.

(e)LISA is also expected to observe interacting binaries where mass transfer is ongoing. GW observations can shed light on the fraction of mass-transferring systems; these systems may allow coincident electromagnetic observations, making them even more precise probes of astrophysical processes such as mass transfer and tidal coupling [403, 475, 404, 474].

Testing relativity using ultra-compact binaries. Resolvable compact binaries offer a good laboratory for testing GR because their waveforms and evolution are well understood and easily described using model templates, and because a reasonable fraction of them will be visible both to (e)LISA and to electromagnetic telescopes.

Ultra-compact binaries will provide constraints on massive-graviton theories, either by comparison of GW-propagation delays to the observable electromagnetic signals from the binary, or by the dispersion of the GW as the binary slowly evolves. This is discussed in Section 5.1.2. The ultra-compact binaries also provide laboratories for testing waveform evolution, and for GW polarization and dispersion studies, precisely because they evolve slowly and are easily described by post-Newtonian analysis. This is discussed in Sections 5.1.1 and 5.2.

5 Gravitational-Wave Tests of Gravitational Physics

Almost since its inception, GR was understood to possess propagating, undulatory solutions — GWs, described at leading order by the celebrated quadrupole formula [258]. It took several decades to establish firmly that these waves were real physical phenomena and not merely artifacts of gauge freedom.

How would GW observations test the GR description of strong gravitational interactions, and possibly distinguish between GR and alternative theories? To answer this question we need to take a quick detour through GW data analysis. At least for foreseeable detectors, individual GW signals will typically be immersed in overwhelming noise, and therefore will need to be dug out with techniques akin to matched filtering [251], which by definition can only recover signals of shapes known in advance (the templates), or very similar signals. A matched-filtering search is set up by first selecting a parameterized template family (where the parameters are the source properties relevant to GW emission), and then filtering the detector data through discrete samplings of the family that cover the expected ranges of source parameters. The best-fitting templates correspond to the most likely parameter values, and by studying the quality of fits across parameter space it is possible to derive posterior probability densities for the parameters.
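The search procedure just described can be illustrated with a toy example. This is a minimal sketch, not a realistic pipeline: it assumes white Gaussian noise (so the inner product reduces to a plain sum over samples), uses a sinusoid as a stand-in for a physical waveform, and all names and numerical values are hypothetical.

```python
import math
import random


def template(freq, n_samples, dt):
    """Unit-amplitude sinusoid; a stand-in for a real waveform model."""
    return [math.sin(2.0 * math.pi * freq * i * dt) for i in range(n_samples)]


def inner(a, b):
    """White-noise inner product: a plain sum over samples."""
    return sum(x * y for x, y in zip(a, b))


def matched_filter_snr(data, tmpl, sigma):
    """Project the data on the normalized template (noise std dev sigma)."""
    return inner(data, tmpl) / (sigma * math.sqrt(inner(tmpl, tmpl)))


def search(data, freqs, dt, sigma):
    """Filter the data through a discrete template bank; return the best fit."""
    scored = [(matched_filter_snr(data, template(f, len(data), dt), sigma), f)
              for f in freqs]
    best_snr, best_freq = max(scored)
    return best_freq, best_snr


random.seed(0)
n, dt, sigma = 4096, 1.0, 1.0
true_freq, amplitude = 0.0125, 0.3  # signal amplitude well below the noise
data = [amplitude * s + random.gauss(0.0, sigma)
        for s in template(true_freq, n, dt)]
bank = [0.010 + 0.0005 * k for k in range(11)]  # frequencies 0.0100 to 0.0150
best_freq, best_snr = search(data, bank, dt, sigma)
print(best_freq, round(best_snr, 1))
```

Although the signal is invisible sample by sample, the template at the true frequency accumulates a large SNR over the full data span, while neighboring templates drift out of phase and average to nearly zero.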

After a detection, the first-order question that we may ask is whether the best-fitting GR template is a satisfactory explanation for the measured data, or whether a large residual is left that cannot be explained as instrument noise, at least within our understanding of noise statistics and systematics. (Slightly more involved tests are also possible: for instance, we may divide measured signals into sections, estimate source parameters separately for each, and verify that they agree.) If a large residual is found, many hypotheses would be a priori more likely than a violation of GR: the fitting algorithm may have failed; another GW signal, possibly of unexpected origin, may be present in the data; the data may reflect a rare or poorly understood instrumental glitch; the GW source may be perturbed by nearby astrophysical objects, or the signal may even be affected by intervening gravitational lenses.

Having ruled out such non-fundamental explanations, the only way to quantify the evidence for or against GR is to consider it alongside an alternative model to describe the data. This alternative model could be a phenomenological one (discussed below) or a self-consistent calculation within an alternative theory of gravity. If the alternative theories under consideration include one or more adjustable parameters that connect them to GR (such as ωBD for Brans-Dicke theory, see Section 2.1), and if those parameters can be propagated through the mathematics of source modeling and GW generation, then GR template families can be enlarged to include them, and the extra parameters can be estimated from GW observations. These extra parameters may have a more phenomenological character, as would, for instance, a putative graviton mass that would affect GW propagation, without finding direct justification in a specific theory. Indeed, many of the “classic tests” discussed below (Section 5.1) fall within this class. To test GR against “unconnected” theories without adjustable parameters, we would instead filter the data through separate GR and alternative-theory template families, and decide which model and theory are favored by the data using Bayesian model comparison, which we now describe briefly.

In complex data-analysis scenarios such as those encountered for GW detectors, the techniques of Bayesian inference [211, 414] are particularly useful for making assessments about the information content of data and for studying tests of gravitational theory, where the goal is to examine the hypothesis that the data might be described by some theory other than GR. In a traditional “frequentist” analysis of data, one computes the value of a statistic and then accepts or rejects a hypothesis about the data (e.g., that it contains a GW signal) based on whether or not the statistic exceeds a threshold. The threshold is set on the basis of a false-alarm rate, which is a statement about how the statistic would be distributed if the experiment was repeated many times. Evaluating the distribution of the statistic relies on a detailed and reliable understanding of the measurement process (noise, instrument response, astrophysical uncertainties, etc.). By contrast, Bayesian inference attempts to infer as much as possible about a particular set of data that has been observed, instead of making a statement about what would happen if the experiment were repeated.

Bayesian inference relies on the application of Bayes’ 1763 theorem: given the observed data d and a parameterized model \(M(\vec \theta)\), the theorem relates the posterior probability of the parameters given the data, \(p(\vec \theta |d,M)\), to the likelihood \(p(d|\vec \theta ,M)\) of observing the data d given the parameters \({\vec \theta}\), and the prior probability \(p(\vec \theta |M)\) that the parameters would take that value:

$$p(\vec \theta {\vert}d,M) = {{p(d{\vert}\vec \theta ,M)p(\vec \theta {\vert}M)} \over {p(d{\vert}M)}}.$$

The term \(p(d|M) = \int {p(d|\vec \theta ,M)p(\vec \theta |M)d\vec \theta}\) in the denominator is the evidence for the model M. While Bayes’ theorem follows trivially from the definition of conditional probability, its power comes from the idea of updating the prior knowledge of a system given the results of observations. However, its practical application is complicated by the necessity of attributing priors, and the correct evaluation of likelihoods relies on the same detailed understanding of the statistical properties of the measurement noise as in the frequentist case.
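For a single parameter, the posterior and the evidence can be evaluated by brute force on a grid. The sketch below is a toy illustration under stated assumptions: the data are a constant level mu plus Gaussian noise of known standard deviation, the prior is flat, and the data values and function names are hypothetical.

```python
import math


def log_likelihood(data, mu, sigma):
    """Gaussian log-likelihood of the data for a constant signal level mu."""
    norm = -0.5 * len(data) * math.log(2.0 * math.pi * sigma ** 2)
    return norm - 0.5 * sum((d - mu) ** 2 for d in data) / sigma ** 2


def grid_posterior(data, sigma, mu_min, mu_max, n_grid=2001):
    """Return (grid, posterior density, evidence) for a flat prior on mu."""
    step = (mu_max - mu_min) / (n_grid - 1)
    grid = [mu_min + i * step for i in range(n_grid)]
    prior = 1.0 / (mu_max - mu_min)
    joint = [math.exp(log_likelihood(data, mu, sigma)) * prior for mu in grid]
    # Trapezoid rule for the evidence integral p(d|M).
    evidence = step * (sum(joint) - 0.5 * (joint[0] + joint[-1]))
    posterior = [j / evidence for j in joint]
    return grid, posterior, evidence


data = [0.9, 1.3, 0.7, 1.1, 1.0]  # made-up measurements
grid, post, evidence = grid_posterior(data, sigma=0.5, mu_min=-5.0, mu_max=5.0)
mu_map = grid[post.index(max(post))]
print("posterior peak:", round(mu_map, 3), " evidence:", evidence)
```

Grid evaluation scales exponentially with the number of parameters, which is why realistic GW analyses rely on stochastic sampling methods instead.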

The evidence represents a measure of the consistency of the observed data with the model M, and can be used to compare two models (e.g., the GR and modified-gravity descriptions of a GW-emitting system) by evaluating the odds ratio for model 1 over model 2,

$${{\mathcal O}_{12}} = {{p(d{\vert}{M_1})p({M_1})} \over {p(d{\vert}{M_2})p({M_2})}},$$

where \(p(M_1)\) and \(p(M_2)\) are the prior probabilities assigned to models 1 and 2, respectively. Model 1 is preferred if the odds ratio is sufficiently large, and model 2 if it is sufficiently small, but the decision on which hypothesis is best supported by the data is influenced by the choice of the priors, which will reflect the analyst’s assessment of the relative correctness of the alternatives (see [451] for a discussion of this point).
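A toy calculation shows how the odds ratio behaves. In this sketch, model 1 has no free parameter (a deviation fixed at zero, standing in for GR), while model 2 allows a deviation mu with a flat prior of assumed half-width. The data values, prior width, and function names are all hypothetical, and equal model priors are assumed by default.

```python
import math


def log_like(data, mu, sigma):
    """Gaussian log-likelihood for a constant deviation mu."""
    return sum(-0.5 * ((d - mu) / sigma) ** 2
               - math.log(sigma * math.sqrt(2.0 * math.pi)) for d in data)


def evidence_fixed(data, sigma):
    """Evidence for model 1: no free parameter, deviation fixed at zero."""
    return math.exp(log_like(data, 0.0, sigma))


def evidence_free(data, sigma, half_width, n_grid=4001):
    """Evidence for model 2: flat prior on mu in [-half_width, half_width]."""
    step = 2.0 * half_width / (n_grid - 1)
    vals = [math.exp(log_like(data, -half_width + i * step, sigma))
            for i in range(n_grid)]
    integral = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return integral / (2.0 * half_width)


def odds_ratio(data, sigma, half_width, prior_1=0.5, prior_2=0.5):
    """Odds for model 1 over model 2, including the model priors."""
    return (evidence_fixed(data, sigma) * prior_1) / (
        evidence_free(data, sigma, half_width) * prior_2)


consistent = [0.1, -0.2, 0.05, 0.0, -0.1]  # scatter around zero
deviant = [1.1, 0.9, 1.2, 1.0, 0.8]        # scatter around one
odds_consistent = odds_ratio(consistent, 0.5, 5.0)
odds_deviant = odds_ratio(deviant, 0.5, 5.0)
print(odds_consistent, odds_deviant)
```

When the data scatter around zero, the simpler model is favored because the extended model spreads its prior over values that the data exclude; widening the assumed prior on mu strengthens this preference, which is one concrete way in which prior choices influence the conclusion.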

In the absence of well-defined alternative-theory foils, it may be desirable to proceed along the lines of the PPN formalism (Section 2.1) and immerse the GR predictions in expanded waveform families, designed to isolate differences in the resulting GW phenomenology (Section 5.2). Proposals to do so include schemes where the waveform-phasing post-Newtonian coefficients, which are normally deterministic functions of a smaller number of source parameters, are estimated individually from the data [28, 27]; the ambitiously-named parameterized post-Einstein (ppE) framework [497]; and the parameterization of Feynman diagrams for nonlinear graviton interactions [106]. In Section 5.3 we discuss ideas (so far rather sparse) to use the GWs from binary mergers-ringdowns to test GR.

We close these introductory comments by discussing two methodological caveats. First, GW observations are often characterized as “clean” tests of gravitational physics — whereby the “clean” emission of GWs from the bulk motion of matter (already emphasized above) is contrasted to “dirty” processes such as mass transfer, dynamical equation-of-state effects, magnetic fields, and so on. An even stronger notion of cleanness is important for the purpose of testing GR: for the best sources, the waveform signatures of alternative theories cannot be reproduced by changing the astrophysical parameters of the system — this orthogonality is quantified by the fitting factor between the GR and alternative-theory waveform families [451]. The degeneracy of the alternative-theory and source parameters would also lead to a “fundamental bias.” Fundamental bias arises from the assumption that the underlying theory in the analysis, generally taken to be GR, is the correct fundamental description for the physics being observed, which will impact the estimation of astrophysical quantities [497, 453].

Second, many of the results presented in this section rely on the Fisher-matrix formalism for evaluating the expected parameter-estimation accuracy of GW observations [449]. As described at the beginning of Section 4, the output of a GW detector is normally modeled as a linear combination of a signal, \({\bf{h}}(\vec \theta)\), and noise, \({\bf{n}}\): \({\bf{d}} = {\bf{h}}(\vec \theta) + {\bf{n}}\). If the detector noise is assumed to be Gaussian and stationary, the probability \(p({\bf{n}} = {{\bf{n}}_0})\) is given by Eq. (30). The likelihood \(p({\bf{d}}{\vert}\vec \theta ,M)\) is just the probability that the noise takes the value \({\bf{n}} = {\bf{d}} - {\bf{h}}(\vec \theta)\), which is

$$p({\bf{d}}{\vert}\vec \theta ,M) \propto \exp \left[ {- {1 \over 2}\left( {{\bf{d}} - {\bf{h}}(\vec \theta)\,{\vert}\,{\bf{d}} - {\bf{h}}(\vec \theta)} \right)} \right],$$

where (a|b) is the inner product defined in Eq. (29). Writing \({\bf{d}} = h({{\vec \theta}_0}) + {\bf{n}}\) and assuming that we are close to the true parameters \({{\vec \theta}_0}\), so that we can use the linear approximation \(h(\vec \theta) = h({{\vec \theta}_0}) + {\partial _j}h\Delta {\theta ^j} + \cdots\) with \(\Delta {\theta ^j} = {\theta ^j} - \theta _0^j\), we find that at quadratic order in \(\Delta {\theta ^j}\)

$$p({\bf{d}}{\vert}\vec \theta ,M) \propto \exp \left[ {- {1 \over 2}{\Gamma _{jk}}\left({\Delta {\theta ^j} - {{({\Gamma ^{- 1}})}^{jl}}({\bf{n}}{\vert}{\partial _l}h)} \right)\,\,\left({\Delta {\theta ^k} - {{({\Gamma ^{- 1}})}^{km}}({\bf{n}}{\vert}{\partial _m}h)} \right)} \right],$$


where

$${\Gamma _{ij}} = \left({\left. {{{\partial h} \over {\partial {\theta ^i}}}} \right\vert{{\partial h} \over {\partial {\theta ^j}}}} \right)$$

is the Fisher information matrix. Thus, to leading order the shape of the likelihood in the vicinity of its maximum is that of a multivariate Gaussian with covariance matrix \({({\Gamma ^{- 1}})^{jk}}\) (independent of n), and the variance of the one-dimensional marginalized posterior probability density of parameter \({\theta ^i}\) is approximately \({({\Gamma ^{- 1}})^{ii}}\) (no sum over i). This approximation holds in the limit of “high” signal-to-noise ratio, where the errors \(\Delta {\theta ^j}\) are small and the linear approximation is valid. The inverse Fisher matrix also arises as the Cramér-Rao lower bound on the covariance of any unbiased estimator of the waveform parameters \({\vec \theta}\). A full discussion of the various routes to the Fisher-matrix formula and its applications may be found in [449].
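For readers who wish to see the intermediate step, the Gaussian form of the likelihood can be recovered as follows. This is a sketch that uses only the linear approximation and the symmetry of the inner product. With \({\bf{d}} = h({\vec \theta _0}) + {\bf{n}}\), the residual is \({\bf{d}} - h(\vec \theta) \simeq {\bf{n}} - {\partial _j}h\,\Delta {\theta ^j}\), so that

$$\left({{\bf{d}} - h\,{\vert}\,{\bf{d}} - h} \right) \simeq ({\bf{n}}{\vert}{\bf{n}}) - 2\Delta {\theta ^j}({\bf{n}}{\vert}{\partial _j}h) + {\Gamma _{jk}}\Delta {\theta ^j}\Delta {\theta ^k}.$$

Completing the square in \(\Delta {\theta ^j}\) gives

$$\left({{\bf{d}} - h\,{\vert}\,{\bf{d}} - h} \right) \simeq {\Gamma _{jk}}\left({\Delta {\theta ^j} - {{({\Gamma ^{- 1}})}^{jl}}({\bf{n}}{\vert}{\partial _l}h)} \right)\left({\Delta {\theta ^k} - {{({\Gamma ^{- 1}})}^{km}}({\bf{n}}{\vert}{\partial _m}h)} \right) + ({\bf{n}}{\vert}{\bf{n}}) - {({\Gamma ^{- 1}})^{lm}}({\bf{n}}{\vert}{\partial _l}h)({\bf{n}}{\vert}{\partial _m}h).$$

The last two terms do not depend on \(\vec \theta\) and are absorbed into the proportionality constant. The noise therefore shifts the location of the likelihood maximum by \({({\Gamma ^{- 1}})^{jl}}({\bf{n}}{\vert}{\partial _l}h)\) but leaves its width unchanged.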

As emphasized by one of us [449], because the Fisher matrix is built with the first derivatives of waveforms with respect to source parameters, it can only “know” about the close neighborhood of the true source parameters. If the estimated errors take the waveform outside that neighborhood, then the formalism is simply inconsistent and unreliable. Higher SNRs reduce expected errors and therefore would generally make the formalism “safer,” but the meaning of “high” is problem dependent, depending on the number of parameters that need to be estimated, on their correlation, and on the strength of their effects on the waveforms.

In practice, only by carrying out a full computation of the posterior probability using, for example, Monte Carlo methods will it be known if the Fisher matrix is providing a good guide to the shape of the posterior. However, the Fisher matrix is generally much easier to compute than the full posterior, so it is widely used as a guide to the precision with which parameters of the model can be determined. In the context of testing GR, the Fisher matrix can be evaluated for an expanded waveform model that includes non-GR-correction parameters, but at a set of parameters that correspond to GR. The estimated error in the correction parameter, \(\sqrt {{{({\Gamma ^{- 1}})}^{ii}}}\) (no sum over i), can then be interpreted as the minimal size of a correction that would be detectable with a GW observation.
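The Fisher-matrix recipe is straightforward to implement numerically. The sketch below is a toy illustration rather than a GW calculation: the waveform is a sinusoid with two parameters (amplitude and phase), the noise is assumed white so that the inner product of Eq. (29) is replaced by a sum over samples divided by the noise variance, and all names and values are hypothetical. The same steps would apply to an expanded waveform family with a correction parameter evaluated at its GR value.

```python
import math


def waveform(params, times, freq=0.01):
    """Toy signal: amplitude and phase are the two free parameters."""
    amp, phase = params
    return [amp * math.sin(2.0 * math.pi * freq * t + phase) for t in times]


def inner(a, b, sigma):
    """White-noise stand-in for the noise-weighted inner product."""
    return sum(x * y for x, y in zip(a, b)) / sigma ** 2


def fisher_matrix(params, times, sigma, eps=1e-6):
    """Gamma_ij = (d_i h | d_j h), using central finite differences."""
    derivs = []
    for j in range(len(params)):
        up = list(params)
        dn = list(params)
        up[j] += eps
        dn[j] -= eps
        h_up, h_dn = waveform(up, times), waveform(dn, times)
        derivs.append([(u - d) / (2.0 * eps) for u, d in zip(h_up, h_dn)])
    return [[inner(di, dj, sigma) for dj in derivs] for di in derivs]


def invert_2x2(m):
    """Explicit inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


times = [float(i) for i in range(2000)]
true_params = [0.5, 0.3]  # amplitude, phase (assumed "true" values)
sigma = 1.0
gamma = fisher_matrix(true_params, times, sigma)
cov = invert_2x2(gamma)
h0 = waveform(true_params, times)
snr = math.sqrt(inner(h0, h0, sigma))
err_amp, err_phase = math.sqrt(cov[0][0]), math.sqrt(cov[1][1])
print(err_amp, err_phase, 1.0 / snr)
```

In this toy model the predicted phase error equals the inverse of the SNR, which illustrates the general statement that higher SNRs shrink the expected errors and bring them closer to the regime where the linear approximation is trustworthy.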

5.1 The “classic tests” of general relativity with gravitational waves

As Will points out [469, ch. 10], virtually any Lorentz-invariant metric theory of gravity must predict gravitational radiation, but alternative theories will differ in its properties. Will identifies three main properties that can be measured with GW detectors. These are the polarization, speed, and emission multipolarity (monopole, dipole, quadrupole, etc.) of GWs in GR. In this paper, we broaden the scope of the third to include changes to the loss of energy to GWs in inspiraling systems.

In analogy to the three classic tests of GR (the perihelion of Mercury, deflection of light, and gravitational redshift) we like to refer to the verification that these properties have the predicted GR values, rather than the values predicted by alternative theories, as the “classic tests” of GR using GWs. Just as PPN tests probe weak-field, slow-motion dynamics, these tests can be seen as probing the weak-field far zone, where waves have propagated far from their sources. However, the multipolarity of GWs at emission and the energy that they carry away can be influenced by strong-field properties in the near zone where waves are generated.

5.1.1 Tests of gravitational-wave polarization

GR predicts the existence of two transverse quadrupolar polarization modes for GWs (also described as “spin-2” and “tensor” using the language of group theory), usually labeled h+ and h×. Alternative metric theories of gravity predict as many as six polarizations [469] (three transverse and three longitudinal), corresponding to the independent electric-type components of the Riemann curvature tensor, R0i0j. Schematically, these components are measured by GW detectors by monitoring the geodesic deviation of nearby reference masses. The effect of different polarization modes is best illustrated by the induced motion of a ring of test particles, as in Figure 4. The response of a standard right-angle interferometer to a scalar wave is maximal when the wave propagates along one arm; by contrast, tensor modes elicit maximal response when the wave propagates in a direction perpendicular to the plane of the detector.

Figure 4

Effect of the six possible GW polarization modes on a ring of test particles. The GW propagates in the z-direction for the upper three transverse modes, and in the x-direction for the lower three longitudinal modes. Only modes (a) and (b) are possible in GR. Image reproduced by permission from [471].
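The distortions sketched in the figure follow from the geodesic-deviation relation \(\delta {x^i} = {1 \over 2}h\,{e_{ij}}{x^j}\) for a small wave amplitude h and polarization tensor \({e_{ij}}\). The following is a minimal sketch under the assumption that every mode propagates along z (unlike the figure, which draws the longitudinal modes for propagation along x); the tensor labels and function names are illustrative only.

```python
import math

# Polarization tensors e_ij for a wave travelling along z (illustrative labels).
PLUS = [[1, 0, 0], [0, -1, 0], [0, 0, 0]]
CROSS = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
BREATHING = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
LONGITUDINAL = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
VECTOR_X = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
VECTOR_Y = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]


def displace(point, pol, h):
    """Apply delta x^i = (1/2) h e_ij x^j to a test particle at `point`."""
    return [point[i] + 0.5 * h * sum(pol[i][j] * point[j] for j in range(3))
            for i in range(3)]


def ring(n_points=8, radius=1.0):
    """Test particles on a circle in the x-y plane."""
    return [[radius * math.cos(2.0 * math.pi * k / n_points),
             radius * math.sin(2.0 * math.pi * k / n_points),
             0.0] for k in range(n_points)]


h = 0.2  # greatly exaggerated strain, for visibility only
for name, pol in [("plus", PLUS), ("cross", CROSS), ("breathing", BREATHING)]:
    moved = [displace(p, pol, h) for p in ring()]
    print(name, [round(c, 3) for c in moved[0]], [round(c, 3) for c in moved[2]])
```

The plus mode stretches the ring along x while squeezing it along y, the breathing mode stretches both axes together, and the longitudinal and vector modes leave a ring lying in the transverse plane undisturbed to this order.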

Direct detection. The use of GW polarization modes to test GR was first proposed in 1973 [160, 159]. The sensitivity of resonant and interferometric detectors, as well as Doppler-tracking and pulsar-timing measurements, to the extra modes was considered in several studies [343, 227, 412, 461, 292, 300, 324, 280, 331, 15, 118, 89, 225]. In the most general setting, the problem of disentangling the modes has eight unknowns — the time series for the six polarizations, plus two direction angles that affect the projection of the modes on the detector — but only six observables, corresponding to the R0i0j components. Thus, the problem is indeterminate, unless the source position is known from other observations (such as time-of-flight delays for a long-baseline network of detectors), or unless we restrict GWs to transverse modes on theoretical grounds.

Space-based observatories similar to LISA have either a single independent interferometric observable or three (if laser links are maintained across all three arms). Each observable measures a different admixture of polarizations. Thus, a detector with three active arms could in principle discriminate a non-GR polarization mode if the direction to the source is known, or if it can be determined from the measured signal (by means of the modulations produced by orbital motion, or by triangulation between the signals measured at the three spacecraft).

The LISA sensitivity to alternative polarization modes was assessed in [440], using the full TDI response (see Section 3.1). At frequencies larger than the inverse light-travel time along the arms, LISA would be ten times more sensitive to scalar-longitudinal and vector modes [(d) to (f) in Figure 4] than to tensor and scalar-transverse modes [(a) to (c) in Figure 4], because longitudinal effects can accumulate as the lasers travel between the spacecraft. At lower frequencies, the sensitivity to all modes is approximately the same. These results have not yet been used to work out the constraints that LISA could place on specific alternative theories using different types of sources.

In [26], a generic model was constructed for a system emitting dipole radiation in addition to quadrupole radiation. The model is similar in structure to the ppE models discussed in Section 5.2.2. It includes both the dipolar component of the waveform, at the orbital frequency, and the modifications to the GW phasing of both the quadrupole and dipole waveform components that arise from the additional energy lost into the dipole mode. In [26], the model was used to determine the constraints on dipole-radiation emission that would be possible using ground-based GW detectors; results for space-based detectors were included in a subsequent review [31]. The latter showed that eLISA would be able to place bounds of \(10^{-3}\) on the parameter α, which describes the observed amplitude of the dipole radiation relative to the quadrupole, and of \(10^{-6}\) on the parameter β, which describes the amount of binary orbital energy lost into the dipole radiation. Because β affects the phase evolution, stronger bounds would be obtained for less massive systems, for which more waveform cycles are observed in band. In both cases these bounds are comparable to those from observations with the Einstein Telescope, one order of magnitude better than those possible with Advanced LIGO, and one order of magnitude worse than what would be possible with LISA.

Solar oscillations. Finn [177] observed that solar oscillations with 5- to 10-minute periods produce gravitational strains \(\sim 10^{-26}\) at Earth, possibly within reach of space-based detectors. The detectors would measure the sun’s dynamical gravitational field in the transition region where it is turning into radiation. Finn showed that the field develops a significant phase shift relative to solar oscillations, which depends on the GW polarizations, and which could distinguish between scalar, vector, tensor, and scalar-tensor theories of gravity. The limit placed by such observations on the Brans-Dicke parameter would be weaker than current bounds from solar-system tests; on the other hand, measuring incipient GWs in the transition zone makes this a novel and possibly unique test. However, we note that Finn’s early exploration [177] predates our full understanding of the design and parameters of LISA-like missions, which are likely to be less sensitive to this signal. This problem was revisited in [140], in which the authors assessed the sensitivity of LISA to the quadrupole (l = 2) low-order normal modes (p, f and g-modes) of the sun. They estimated that the energy in these modes would have to exceed \(10^{30}\) ergs to allow a LISA detection, and that the required mode energy would be even higher for eLISA.

Galactic binaries. Among the compact galactic binaries that would be detected by a LISA-like detector, several have orbital inclinations known from optical observations. For these systems we can compute the specific linear combination(s) of polarizations that would appear in the data, which can then be checked for consistency. A single inconsistent binary may indicate an error in the determination of inclination or distance, but systematically inconsistent sources would hint at large non-tensor GW components. However, from general arguments the measurement accuracy for polarization amplitudes is ∼ 1/SNR (with SNR a few tens at most for galactic binaries), so only very large corrections to GR would be detectable in this way.

5.1.2 Tests of gravitational-wave propagation

In GR, gravitational radiation propagates at the speed of light: vg = c. The experimental validation of this prediction can be posed as a bound on the graviton mass mg, which is exactly zero in GR (see [59, 209] for a broader context). However, it may be advisable to consider mg as a purely phenomenological parameter, since certain massive-graviton theories do not recover GR predictions such as light bending, as discussed in Section 2.2.5.

Weak-field measurements in the solar system already provide bounds on mg on the basis of the massive-graviton Yukawa correction to the Newtonian potential:

$$V(r) = {M \over r}\exp (- r/{\lambda _g}),$$

where λg = h/mg is the Compton wavelength of the graviton. The corresponding GW propagation speed would be given by

$${\left({{{{v_g}} \over c}} \right)^2} = 1 - {\left({{c \over {{f_g}{\lambda _g}}}} \right)^2},$$

with fg the frequency of radiation. The best solar-system tests provide the bound \(\lambda_g > 2.8 \times 10^{12}\) km (or \(m_g < 4.4 \times 10^{-22}\) eV) [435]. By contrast, binary-pulsar dynamics only provide a bound \(m_g < 7.6 \times 10^{-20}\) eV [178]. As we discuss in this section, observations of binary GWs with LISA-like detectors could provide bounds competitive with these results, with the advantage of examining a rather different sector of gravitational physics, wave propagation. Two distinct methods have been proposed for this.
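As a quick numerical check of the correspondence between wavelength and mass bounds, the following minimal sketch converts a graviton Compton wavelength into a rest energy through m_g c² = hc/λ_g and evaluates the propagation-speed relation given above. The function names and the choice of units (km, Hz, eV) are ours, introduced only for illustration.

```python
import math

H_EV_S = 4.135667696e-15   # Planck constant [eV s]
C_KM_S = 299792.458        # speed of light [km/s]

def graviton_mass_eV(lambda_g_km):
    """Rest energy m_g c^2 [eV] for a Compton wavelength lambda_g [km]."""
    return H_EV_S * C_KM_S / lambda_g_km

def speed_ratio(f_g_hz, lambda_g_km):
    """v_g/c from (v_g/c)^2 = 1 - (c / (f_g lambda_g))^2."""
    x = C_KM_S / (f_g_hz * lambda_g_km)
    if x >= 1.0:
        # Below f = c/lambda_g the relation gives no real propagation speed.
        raise ValueError("frequency is below c/lambda_g")
    return math.sqrt(1.0 - x * x)

# Solar-system bound quoted in the text: lambda_g > 2.8e12 km.
print(graviton_mass_eV(2.8e12))
# Fractional speed deficit of a 1 mHz wave (frequency chosen for illustration).
print(1.0 - speed_ratio(1e-3, 2.8e12))
```

With these constants, a wavelength of 2.8 × 10^12 km maps to a mass close to the 4.4 × 10^−22 eV quoted above.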

Comparing the phase of GW and EM signals. This technique offers a direct comparison of the speed of GWs with the speed of a radiation assumed to be null (light itself). For the technique to work, sources must be observable in both light and GWs, and the astrophysical delays (if any) between the two signals must be well understood and modeled. The most prominent low-frequency sources for this purpose are compact galactic binaries.

Let the difference between the arrival times of GWs and EM signals be

$$\Delta t = \Delta {t_a} - (1 + z)\Delta {t_e},$$

where Δta arises from propagation (the very effect we wish to measure), and Δte from different emission mechanisms or geometries. Here \(z \simeq D_L H_0/c\) is the redshift of the source, with \(D_L\) the luminosity distance and \(H_0\) the value of the Hubble parameter. In terms of vg,

$${\epsilon _g} = 1 - {{{v_g}} \over c} = 1 \times {10^{- 11}}\left({{{1\,\,{\rm{kpc}}} \over {{D_L}}}} \right)\left({{{\Delta {t_a}} \over {1\,\,{\rm{s}}}}} \right),$$

which we relate to mg using the total relativistic energy,

$${\epsilon _g} \equiv 1 - {{{v_g}} \over c} = {1 \over 2}{\left({{{{m_g}{c^2}} \over {h{f_g}}}} \right)^2},$$

where fg is the GW frequency.
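The two relations above can be chained: a measured (or bounded) propagation delay gives ϵg, which in turn gives a graviton mass. The sketch below does this under the assumption ϵg ≪ 1; the function names and the example numbers are hypothetical and are not taken from the cited analyses.

```python
import math

H_EV_S = 4.135667696e-15        # Planck constant [eV s]
C_M_S = 299792458.0             # speed of light [m/s]
KPC_M = 3.0856775814913673e19   # one kiloparsec [m]

def epsilon_from_delay(delta_t_a_s, d_l_kpc):
    """epsilon_g = 1 - v_g/c ~ c * dt_a / D_L, valid for epsilon_g << 1."""
    return C_M_S * delta_t_a_s / (d_l_kpc * KPC_M)

def mass_from_epsilon(eps_g, f_g_hz):
    """Invert eps_g = (1/2) (m_g c^2 / (h f_g))^2 for m_g c^2 in eV."""
    return H_EV_S * f_g_hz * math.sqrt(2.0 * eps_g)

# Illustrative values only: a 1 s delay from a source at 1 kpc, observed at 1 mHz.
eps = epsilon_from_delay(1.0, 1.0)
print(eps, mass_from_epsilon(eps, 1e-3))
```

For a delay of 1 s at 1 kpc the first function reproduces the prefactor of about 10^−11 in the expression for ϵg.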

The measurement of Δt has been considered repeatedly in the literature [279, 139, 128]. The main difficulty lies with modeling the emission delay Δte: consider for instance AM CVn binaries, where a low-mass helium donor has expanded to fill its Roche lobe and is spilling mass onto a white-dwarf primary. The EM signal from these systems is greatly affected by the light emitted from the overflow stream impacting the accretion disk, and the light curve oscillates as the system orbits, alternately flashing the impact point toward and away from the observer. The times of maximum emission can be taken as reference for the EM phase, but how are they related to GW emission?

To evaluate this Δte, one may observe the compact binary at two epochs, ideally at opposite points across the Earth’s orbit [279, 139]. Under the assumption that Δte is constant, differencing the total Δt measured at the two epochs leaves a measure of Δta alone. However, the subtraction reduces Δta to what can be accumulated across the diameter of the Earth’s orbit, rather than across the entire distance to the binary. As a consequence, the strongest bound from a known LISA verification binary would be \(\lambda_g > 3 \times 10^{13}\) km (\(m_g < 4.6 \times 10^{-23}\) eV).

Alternatively, one may concentrate on eclipsing compact binaries, where the light curve varies due to the mutual eclipses of the binary components, allowing the orientation geometry of the system to be precisely determined as a function of time, and yielding an accurate measure of Δte. In this case the measured Δta is accumulated over the entire distance to the source. Only one eclipsing binary that would be observable with LISA-like detectors is currently known [230], but an analysis of their statistically-expected population suggests that LISA would obtain a bound \(\lambda_g > 2 \times 10^{14}\) km (\(m_g < 6 \times 10^{-24}\) eV) [128].

The reader may question whether it is appropriate to compare gravitons to photons, when the current bound on the putative mass of the photon is as high as \(m_\gamma < 2 \times 10^{-16}\) eV. However, the much higher frequency of optical photons compared to low-frequency gravitons leads to \(\epsilon_\gamma < 3 \times 10^{-33}\), much smaller than \(\epsilon_g < 10^{-8}\) (for solar-system tests) [279], so a comparison based on speeds is indeed appropriate.
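This comparison is easy to reproduce at the order-of-magnitude level with the relation ϵ = ½ (mc²/hf)². In the sketch below the optical frequency (6 × 10^14 Hz) and the GW frequency (1 mHz) are our assumptions, chosen as representative values; the mass bounds are the ones quoted in the text.

```python
H_EV_S = 4.135667696e-15  # Planck constant [eV s]

def speed_deficit(mass_eV, freq_hz):
    """epsilon = 1 - v/c ~ (1/2) (m c^2 / (h f))^2, valid when m c^2 << h f."""
    return 0.5 * (mass_eV / (H_EV_S * freq_hz)) ** 2

# Photon: mass bound from the text, assumed optical frequency.
eps_photon = speed_deficit(2e-16, 6e14)
# Graviton: solar-system mass bound from the text, assumed mHz frequency.
eps_graviton = speed_deficit(4.4e-22, 1e-3)

print(eps_photon, eps_graviton)
```

The photon deficit comes out more than twenty orders of magnitude below the graviton one, which is why light can be treated as effectively null in this test.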

A related test using pulsar-timing observations would compare the GW-induced phase delays accumulated by photons traveling to Earth from different pulsars [281]: the delays depend on the graviton speed through a geometric factor that alters the expected Hellings-Downs correlation [228] that GWs will produce in the timing of pulsars located at different positions on the sky.

It might also be possible to observe simultaneous EM and GW signals from MBH mergers, using the approximate position of the source known from pre-merger GW observations to guide a follow-up campaign in the EM spectrum [267]. However, the nature of possible EM counterparts is extremely uncertain, so differences between the GW and EM phasing could be explained by uncertainties in the modeling of the EM signal. Therefore, it is unlikely that constraints from these systems will be competitive with galactic-binary constraints, or with the constraints from GW dispersion discussed in the following subsection.

Measuring the dispersion of gravitational-wave chirps. The chirping signals emitted by inspiraling binaries contain a range of frequency components: if the graviton has mass, the components propagate at different speeds, again given by Eq. (44). This effect can be modeled in the templates used to search for binary signals by including a λg dependence in the waveform phasing [470]. In the frequency-domain representation, the propagation effect appears as a “dephasing” term \(-\beta(\pi{\mathcal M}_c f)^{-1}\), where \(\beta = {\pi ^2}D{{\mathcal M}_c}/[\lambda _g^2(1 + z)]\), with \({\mathcal M}_c\) the binary chirp mass, f the GW frequency, D the source distance, and z the source redshift. By comparison, the leading-order term in the post-Newtonian expansion of the phasing is \((3/128)(\pi{\mathcal M}_c f)^{-5/3}\), while the λg correction contributes the same power of orbital frequency as the “1PN” term.
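To give a feeling for the size of this effect, the sketch below evaluates β and the resulting dephasing in geometric units (G = c = 1, with masses, distances, and wavelengths all converted to seconds). We take the dephasing to scale as 1/f, the same power of frequency as the 1PN term. The function names are ours, and we gloss over the exact cosmological definition of the distance D, so the numbers are indicative only.

```python
import math

T_SUN_S = 4.925491e-6             # G M_sun / c^3 [s]
C_KM_S = 299792.458               # speed of light [km/s]
GPC_KM = 3.0856775814913673e22    # one gigaparsec [km]

def beta_graviton(chirp_mass_msun, distance_gpc, lambda_g_km, z):
    """beta = pi^2 D M_c / (lambda_g^2 (1 + z)), all lengths expressed in seconds."""
    mc = chirp_mass_msun * T_SUN_S
    d = distance_gpc * GPC_KM / C_KM_S
    lam = lambda_g_km / C_KM_S
    return math.pi ** 2 * d * mc / (lam ** 2 * (1.0 + z))

def dephasing(f_hz, chirp_mass_msun, distance_gpc, lambda_g_km, z):
    """Massive-graviton phase term -beta / (pi M_c f), in radians."""
    u = math.pi * chirp_mass_msun * T_SUN_S * f_hz
    return -beta_graviton(chirp_mass_msun, distance_gpc, lambda_g_km, z) / u

# Illustrative system: chirp mass 1e6 M_sun, D = 3 Gpc, z = 1, lambda_g = 1e16 km.
print(dephasing(1e-3, 1e6, 3.0, 1e16, 1.0))
```

For these assumed values the dephasing at 1 mHz is a few tenths of a radian, which is roughly the level at which a matched-filter analysis starts to notice a phasing error.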

For space-based detectors, the best chirp-dispersion bounds will come from massive-black-hole systems; they improve slightly with the total binary mass and with better low-frequency (\(10^{-5}\)–\(10^{-4}\) Hz) sensitivity. However, the expected bounds depend strongly on which other physical effects (such as spin-induced precessions, orbital eccentricity, higher waveform harmonics, the merger-ringdown phase) will be relevant in the detected systems. As a result, a variety of predictions have appeared in the literature [473, 70, 71, 32, 424, 485, 425, 259, 244]. Bounds as strong as \(\lambda_g >\) a few \(\times\, 10^{16}\) km seem possible, and would be strengthened by analyzing full catalogs of binary detections at once [78].

Instead of the chirping signals from inspiraling binaries, Jones [253] proposes a test of the GW dispersion relation using the waves from eccentric galactic binaries, which are emitted at multiple harmonics of the orbital frequencies; if at least one galactic binary has sufficient eccentricity, Jones claims sensitivity comparable to the chirp-dephasing measurements. Mirshekari et al. [314] extend the graviton-mass formalism to more general modified-gravity theories that predict violations of Lorentz invariance and modified dispersion relations for GW modes, given by

$${E^2} = {p^2}{c^2} + m_g^2{c^4} + {\mathbb A}{p^\alpha}{c^\alpha};$$

both mg and \(\mathbb A\) can be constrained together, given the α corresponding to specific theories, by inspiral-binary observations with ground and space-based detectors.
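The effect of the extra term on propagation can be seen by differentiating the dispersion relation to obtain the group velocity, v_g = dE/dp. The sketch below does this in units with c = 1, with p, m_g, and E measured in the same energy unit; the function names are ours and the parameter values are arbitrary illustrations.

```python
import math

def energy(p, m_g, A, alpha):
    """E from E^2 = p^2 + m_g^2 + A p^alpha (units with c = 1)."""
    return math.sqrt(p ** 2 + m_g ** 2 + A * p ** alpha)

def group_velocity(p, m_g, A, alpha):
    """v_g / c = dE/dp for the modified dispersion relation (requires p > 0)."""
    return (2.0 * p + alpha * A * p ** (alpha - 1.0)) / (2.0 * energy(p, m_g, A, alpha))

# Massless, Lorentz-invariant case: v_g = c.
print(group_velocity(1.0, 0.0, 0.0, 0.0))
# Pure graviton mass: subluminal propagation.
print(group_velocity(1.0, 0.1, 0.0, 0.0))
# Lorentz-violating term with alpha = 3 and A > 0 (arbitrary values).
print(group_velocity(1.0, 0.0, 0.01, 3.0))
```

Depending on the sign of the coefficient and on α, the extra term can either slow the waves further or partly offset the effect of the mass, which is why the two parameters have to be constrained jointly.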

Parity violations. In GR parity is a conserved quantity, so left and right-circular polarized gravitational radiation propagates alike. Many attempts to formulate a quantum theory of gravity require the addition of a parity-violating Chern-Simons (CS) term to the Einstein-Hilbert action [14, 7, 363]:

$${S_{{\rm{CS}}}} = {1 \over {64\pi}}\int {\theta \,R^{\ast}R\,{{\rm{d}}^4}x,} \quad \,\,\,{\rm{where}}\quad R^{\ast}R = {1 \over 2}{R_{\alpha \beta \gamma \delta}}{\epsilon ^{\alpha \beta \mu \nu}}{R^{\gamma \delta}}_{\mu \nu};$$

here Rαβγδ is the Riemann tensor, ϵαβμν is the Levi-Civita tensor density, and θ is a (possibly) position-dependent function that describes the coupling of the CS field to spacetime. This correction creates a difference in the propagation equations for the left- and right-circular GW polarizations, resulting in their amplitude birefringence: one circularly-polarized state is amplified through propagation, while the other is attenuated.

This effect is potentially observable with LISA-like detectors for MBH-binary inspirals at cosmological distances [6] (see also [491]), where the amplitude birefringence generates an apparent precession of the orbital plane of the binary. The CS correction accumulates with distance, and is larger for sources at higher redshifts. Orbital-plane precession will also arise from general-relativistic spin-orbit coupling, but the scaling of the precession with frequency is different, so the two effects can be distinguished, at least in principle.

For an equal-mass binary with redshifted masses of \(10^6\,M_\odot\) that is observed plane-on at a redshift z = 15, LISA could constrain the integrated CS contribution at the level of \(10^{-19}\) [6]. This is several orders of magnitude better than solar-system experiments, which furthermore can only provide local constraints. Thus, LISA-like detectors may provide some hints as to the very quantum nature of gravity.

5.1.3 The quadrupole formula and loss of energy to gravitational waves

In theories that do not satisfy the strong equivalence principle, the internal gravitational binding energies of bodies can create a difference between the inertial dipole moment (i.e., the linear momentum, which is conserved) and the GW-generating gravitational dipole moment. Thus, alternative theories of gravity generally admit dipole radiation, but it is forbidden in GR, where the two moments are identical. Dipole radiation would be given at leading order by [471]

$${h_d}\sim{1 \over {{D_L}}}{{\rm{d}} \over {{\rm{d}}t}}(m_1^G{{\bf{x}}_1} + m_2^G{{\bf{x}}_2})\sim{{{\mu ^I}{\bf{v}}} \over {{D_L}}}\left({{{m_1^G} \over {m_1^I}} - {{m_2^G} \over {m_2^I}}} \right),$$

where x1,2 are the positions of the binary components, v their relative velocity, \(m_{1,2}^I\) and \(m_{1,2}^G\) are their inertial and gravitational masses respectively, μI is the inertial reduced mass, and DL is the luminosity distance to the observer.

For relativistic objects such as neutron stars (NS), the gravitational binding energy can be considerable and so can be the resulting loss of energy to dipolar GWs. Indeed, the experimental result that the orbital decay of the binary pulsar PSR1913+16 [293] adhered closely to GR’s quadrupole-formula prediction was sufficient to definitely falsify GR alternatives such as bimetric and “stratified” theories [469]. (Amusingly, certain theories even predict that dipole radiation carries away negative energy from a binary [469].) Thus it is factually correct to state that the indirect detection of GWs has already provided a strong test of GR.

By contrast, the binary pulsar could not falsify scalar-tensor theories in this way, because these are “close” to GR. For instance, although dipole radiation is predicted by Brans-Dicke theory and changes the progression of orbital decay, the coupling parameter ωBD can be adjusted to approximate GR results to any desired accuracy. GR is reproduced for ωBD → ∞, so experimental bounds on Brans-Dicke are lower bounds. The Hulse-Taylor binary pulsar does provide a bound on ωBD, but one that is not competitive with solar-system tests, among which the best comes from the Doppler tracking of the Cassini spacecraft, which sets ωBD > 40000 [471, 81]. However, other binary systems containing pulsars are known that provide constraints competitive with those from the solar system. The best constraints on scalar-tensor gravity (and also TeVeS gravity) come from the pulsar-white-dwarf binary J1738+0333 [186], which provides the limit ωBD > 25000.

LISA-like detectors can constrain ωBD by looking for dipole-radiation-induced modifications in the GW phasing of binary inspirals (monopole radiation is also present, but suppressed relative to the dipole), as long as at least one of the binary components is not a black hole: because of the no-hair theorem, black holes cannot sustain the scalar field that would lead to differing \(m^G\) and \(m^I\) (as was recently confirmed in full numerical-relativity simulations [226]). This restriction can be circumvented by imposing non-asymptotically-flat boundary conditions for the black hole [237]: if the scalar field varies slowly far from the black hole (as a function of either time or space), then the black hole can support a scalar field. This scenario was investigated numerically in [75], which found that accelerated single black holes and black-hole binaries would emit scalar radiation, in the latter case at twice the orbital frequency. If the asymptotic scalar-field gradient that supports the black-hole scalar hair is cosmological in origin, this effect will be negligible, but the possibility does exist in general. Except for these considerations, the canonical source for detecting this effect is the inspiral of a neutron star into a relatively low-mass central black hole, although the number of detections of such systems is likely to be very low [192].

Early studies [397, 473], based on simplified models of the waveforms and of the LISA sensitivity, estimated that for a \(1.4\,M_\odot\) neutron star inspiraling into a MBH, at fixed SNR = 10, the ωBD bounds would scale as

$${\omega _{{\rm{BD}}}} > 2 \times {10^4}{\left({{{\mathcal S} \over {0.3}}} \right)^2}\left({{{100} \over {\Delta {\Phi _D}}}} \right){\left({{T \over {1\,\,{\rm{yr}}}}} \right)^{7/8}}{\left({{{{{10}^4}{M_ \odot}} \over {{M_ \bullet}}}} \right)^{3/4}},$$

where \({\mathcal S}\) (the “sensitivity”) is a measure of the difference between the neutron-star and MBH self-gravitational binding energies per unit rest mass; \(\Delta\Phi_D\) is the dipole contribution to the GW phasing; T is the time of observation; and \(M_\bullet\) is the MBH mass. However, this estimate is reduced by a factor of ten or more when more realistic waveforms are considered that include spin couplings [70, 71], spin-induced orbital precession and eccentricity [485]. Bounds can also be derived for a massive-scalar variant of Brans-Dicke theory [79], and are of order

$$\left({{{{m_s}} \over {\sqrt {{\omega _{{\rm{BD}}}}}}}} \right)\left({{\rho \over {10}}} \right) \lesssim {10^{- 19}}\,\,{\rm{eV,}}$$

(where \(m_s\) is the mass of the scalar and ρ the detection SNR) for the intermediate-mass-ratio inspiral of a NS into a black hole with mass \(\lesssim 10^3\,M_\odot\).
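The two scaling relations above are simple enough to evaluate directly. The sketch below encodes them as written, with the fiducial values of the text as defaults; the function and argument names are ours. Note that it returns the simplified early estimate for ωBD, which the more realistic waveform studies cited above reduce by a factor of ten or more.

```python
import math

def omega_bd_bound(sensitivity=0.3, delta_phi_d=100.0, t_obs_yr=1.0, m_bh_msun=1e4):
    """Simplified early estimate of the lower bound on omega_BD (fixed SNR = 10)."""
    return (2e4 * (sensitivity / 0.3) ** 2 * (100.0 / delta_phi_d)
            * t_obs_yr ** (7.0 / 8.0) * (1e4 / m_bh_msun) ** (3.0 / 4.0))

def scalar_mass_bound_eV(omega_bd, snr=10.0):
    """Upper bound on m_s from (m_s / sqrt(omega_BD)) (rho / 10) <~ 1e-19 eV."""
    return 1e-19 * math.sqrt(omega_bd) * 10.0 / snr

# Fiducial case, then a tenfold heavier central black hole.
print(omega_bd_bound())
print(omega_bd_bound(m_bh_msun=1e5))
print(scalar_mass_bound_eV(omega_bd_bound()))
```

The steep dependence on the central mass illustrates why relatively low-mass central black holes are the preferred targets for this test.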

These results were obtained using only the leading-order correction from the scalar radiation. In [495] the authors extended this calculation to all post-Newtonian orders, but in the extreme-mass-ratio limit, by using the Teukolsky formalism. The conclusion that constraints on massless scalar-tensor theories from GW observations will, in general, be weaker than those from solar-system observations was unchanged. The reason is that scalar-tensor theories are weak-field (infrared) corrections to GR and are therefore largest in the weak field, so the leading-order correction captures the majority of the effect. Massive scalar-tensor theories were also considered in [110, 495]. In those theories, the primary observable consequence is the possible existence of “floating orbits”, at which the scalar flux undergoes a “super-radiant resonance”: GWs scatter off the central, massive body and emerge with more energy (extracted from the spin of the central body), then transfer that energy to the small orbiting body, increasing its orbital energy and temporarily balancing the GW flux. The transition of an EMRI through such a floating orbit is many orders of magnitude slower than the normal EMRI inspiral and can last more than a Hubble time. If an EMRI consistent with GR is observed, then the EMRI not only did not pass through such a floating orbit during the observation, but also could not have encountered one earlier, since it would then not have reached the millihertz band. Therefore, an observation of a single EMRI can constrain the massive scalar-tensor parameter space to many orders of magnitude greater precision than current solar-system observations.

Other modifications to the inspiral phasing. A number of other suggestions have been made for low-frequency GW tests of GR that do not quite fit a “modified energy-loss” description. For instance, dynamical Chern-Simons theory introduces nonlinear modifications in the binary binding energy and dissipative corrections at the same PN order [426, 483] that could be observed in the late inspiral, constraining the characteristic Chern-Simons length scale to \(\xi^{1/4} \lesssim 10^5\)–\(10^6\) km [487], comparable to current solar-system constraints [13] (advanced ground-based detectors could do even better, placing bounds of ≲ 10–100 km).

Corrections to the inspiral phasing will also arise if the spacetime outside the central object is not described by the Kerr metric or if additional energy is lost into scalar or other forms of radiation. This has been considered for various alternative theories of gravity; we discuss these results in detail in Section 6.2.6.

GW tails, which are due to the propagation of gravitational radiation on the curved background of the emitting binary, appear at a relative 1.5PN order (\(c^{-3}\)) beyond the leading-order quadrupole radiation, and their observation would test the nonlinear nature of GR [88]. (This would be a null test of GR, since tails are included in the “standard” post-Newtonian inspiral phasing; see also the PN-coefficient tests discussed in Section 5.2.1.)

Promoting Newton’s constant, G, to a function of time modifies both a binary’s binding energy and GW luminosity, and therefore its phasing. A three-year observation of a \(10^4\)–\(10^5\,M_\odot\) inspiral would constrain \(\dot G/G\) to \(\sim 10^{-11}\,{\rm yr}^{-1}\) [498]. The infinite Randall-Sundrum braneworld model [373] may predict an enormous increase in the Hawking radiation emitted by black holes [164, 436]. The resulting progressive mass loss may be observed as an outspiral effect in the quasi-monochromatic radiation of galactic black-hole binaries, or as a correction to the inspiral phasing of a black-hole binary [484]; it would also affect the rate of EMRI events [306, 484]. The constraints on the size of extra dimensions coming from observations with LISA will, in general, be worse than those derivable from tabletop experiments. However, DECIGO observations of BH-NS binary mergers would be able to place a constraint about ten times better than tabletop experiments, assuming a detection rate of \(\sim 10^5\) binaries per year [484].

5.2 Tests of general relativity with phenomenological inspiral template families

As discussed above, quantitative tests of GR against modified theories of gravity evaluate how well the measured signals are fit by alternative waveform families, or (more commonly) by waveform families that extend GR predictions by including one or more modified-gravity parameters, such as ωBD for Brans-Dicke theory. To set up these tests we need to work within the alternative theory to derive sufficiently accurate descriptions of source dynamics, GW emission, and GW propagation. An alternative approach is to operate directly at the level of the waveforms by introducing phenomenological corrections to GR predictions: for instance, by modifying specific coefficients, or by adding extra terms.

This section discusses the first attempts to do so. So far these have concentrated on post-Newtonian waveforms [84] for circular, adiabatic inspirals, as described by the stationary-phase approximation in the frequency domain:

$$\tilde h(f) = A{f^{- 7/6}}\exp \left[ {i\Psi (f) + i{\pi \over 4}} \right],$$

where f is the GW frequency; A is the GW amplitude, given by geometrical projection factors × \({\mathcal M}_c^{5/6}/{D_L}\) (with \({{\mathcal M}_c} = {({m_1}{m_2})^{3/5}}/{({m_1} + {m_2})^{1/5}}\) the chirp mass and DL the luminosity distance); and for simplicity we omit the nontrivial response of space-based detectors, as well as the PN amplitude corrections. The phasing Ψ(f) is expanded as

$$\Psi (f) = 2\pi f{t_c} + {\Phi _c} + \sum\limits_{k \in {\mathbb Z}} {\left[ {{\psi _k} + \psi _k^{\log}\log f} \right]{f^{(k - 5)/3}}}.$$

For binaries with negligible component spins, the post-Newtonian phasing coefficients ψk are currently known up to k = 7 (3.5 PN order), and in GR they are all functions of the two masses m1 and m2 alone (although \({\psi _1} = \psi _{0 - 4}^{\log} = \psi _7^{\log} = 0\), and ψ5 is completely degenerate with Φc, so it is usually omitted) [86, 87, 85, 30].
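As an illustration of how the stationary-phase waveform is assembled from a set of phasing coefficients, the following minimal sketch evaluates it at a single frequency. The coefficients are passed in as dictionaries keyed by k; only the leading-order GR value, ψ0 = (3/128)(πℳc)^−5/3 in units with G = c = 1, is built in, and the function names are ours. The detector response and PN amplitude corrections are omitted, as in the text.

```python
import cmath
import math

def phasing(f, t_c, phi_c, psi, psi_log=None):
    """Psi(f) = 2 pi f t_c + Phi_c + sum_k [psi_k + psi_k^log log f] f^((k-5)/3)."""
    total = 2.0 * math.pi * f * t_c + phi_c
    for k, coeff in psi.items():
        total += coeff * f ** ((k - 5) / 3.0)
    for k, coeff in (psi_log or {}).items():
        total += coeff * math.log(f) * f ** ((k - 5) / 3.0)
    return total

def h_tilde(f, amplitude, t_c, phi_c, psi, psi_log=None):
    """Stationary-phase waveform A f^(-7/6) exp[i Psi(f) + i pi/4]."""
    phase = phasing(f, t_c, phi_c, psi, psi_log) + math.pi / 4.0
    return amplitude * f ** (-7.0 / 6.0) * cmath.exp(1j * phase)

def leading_order_psi0(chirp_mass_s):
    """Leading-order GR coefficient, with the chirp mass expressed in seconds."""
    return 3.0 / 128.0 * (math.pi * chirp_mass_s) ** (-5.0 / 3.0)

# Example: chirp mass of 4 s (roughly 8e5 solar masses), evaluated at 1 mHz.
psi = {0: leading_order_psi0(4.0)}
print(h_tilde(1e-3, 1.0, 0.0, 0.0, psi))
```

Keeping the coefficients in a dictionary makes it straightforward to treat individual ψk as free parameters, which is the starting point of the tests described in the next section.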

5.2.1 Modifying the PN phasing coefficients

Arun et al. [28] propose a test of GR based on estimating all the ψk simultaneously from the measured waveform as if they were free parameters, in analogy to the post-Keplerian formalism [293, Section 4.5]. The value and error estimated for each ψk, together with its PN functional form as a function of m1 and m2, determines a region in the m1–m2 plane. If GR is correct, all the regions must intersect near the true masses, as shown in Figure 5. The extent of the intersection provides a measure of how precisely GR is verified by a GW observation. A Fisher-matrix analysis [28] suggests that, for systems at the optimistic distance of 3 Gpc, LISA could measure ψ0 to ∼ 0.1% and ψ2 and ψ3 to 10%, but that the fractional error on higher-order terms would be at best ∼ 1.

Figure 5: Estimating all the binary-inspiral phasing coefficients ψk of Eq. (51) yields differently shaped regions in the m1–m2 plane, which must intersect near true mass values if GR is correct. Image reproduced by permission from [27], copyright by APS.

However, this setup may understate the power of this kind of test, since most of the estimation uncertainty in the ψk arises from their mutual degeneracy (that is, from the fact that it is possible to vary the value of a subset of the ψk without appreciably modifying the waveform). This degeneracy should not impact the degree to which the data is deemed consistent with GR. In a follow-up paper [27], Arun et al. propose a revised test whereby the masses are determined from ψ0 and ψ2, while the other ψk (as well as \(\psi _5^{\log}\) and \(\psi _6^{\log}\)) are individually estimated and checked for consistency with GR. In this case, even for sources at z = 1 (∼ 7 Gpc), all the parameters can be constrained to 1% (a few % for ψ4, 0.1% for ψ3), at least for favorable mass combinations. Performing parameter estimation for the eigenvectors of the ψk Fisher matrix [342] indicates which combinations of coefficients can be tested more accurately for GR violations.
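The logic of the revised test can be sketched in a few lines: invert ψ0 and ψ2 for the masses, predict another coefficient from those masses, and compare it with its independently measured value. The code below does this for ψ3 only. The PN expressions for ψ2 and ψ3 are the standard nonspinning ones (they are not written out in this review), we work in units with G = c = 1 and masses in seconds, and all function names are hypothetical. It is a toy consistency check, not the Fisher-matrix analysis of [27].

```python
import math

A2, B2 = 3715.0 / 756.0, 55.0 / 9.0   # 1PN phasing constants (nonspinning)

def gr_psi(mchirp, eta):
    """GR values of psi_0, psi_2, psi_3 for chirp mass [s] and symmetric mass ratio."""
    u = math.pi * mchirp
    return {0: 3.0 / 128.0 * u ** (-5.0 / 3.0),
            2: 3.0 / 128.0 * eta ** (-0.4) * (A2 + B2 * eta) / u,
            3: 3.0 / 128.0 * (-16.0 * math.pi) * eta ** (-0.6) * u ** (-2.0 / 3.0)}

def masses_from_psi0_psi2(psi0, psi2):
    """Recover (chirp mass, eta) from psi_0 and psi_2 by bisection on eta."""
    mchirp = (128.0 * psi0 / 3.0) ** (-0.6) / math.pi
    target = psi2 * math.pi * mchirp * 128.0 / 3.0

    def g(eta):
        # Monotonically decreasing in eta on (0, 1/4].
        return eta ** (-0.4) * (A2 + B2 * eta)

    lo, hi = 1e-6, 0.25
    if not g(hi) * (1.0 - 1e-12) <= target <= g(lo):
        raise ValueError("psi_2 is not compatible with any physical mass ratio")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > target:
            lo = mid
        else:
            hi = mid
    return mchirp, 0.5 * (lo + hi)

def psi3_residual(psi0, psi2, psi3_measured, sigma3):
    """Deviation of the measured psi_3 from its GR prediction, in units of sigma3."""
    mchirp, eta = masses_from_psi0_psi2(psi0, psi2)
    return (psi3_measured - gr_psi(mchirp, eta)[3]) / sigma3
```

A residual of order unity or less would count as consistent with GR in this toy setting, while a large residual would flag a possible violation (or an unmodeled systematic).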

However, it is not clear what significance with regard to testing GR should be ascribed to the accuracy of measuring the ψk, since we do not know at what level we could expect deviations to appear. By contrast, if we were to find that, say, the nσ regions in the m1–m2 plane do not intersect, we could make the statistically-meaningful statement that GR appears to be violated at the nσ level.

Del Pozzo et al. [148] and Li et al. [284, 285] propose a more satisfying formulation for these tests, based on Bayesian model selection [211], which compares the Bayesian evidence, given the observed data, for the pure-GR scenario against the alternative-gravity scenarios in which one or more of the ψk are modified. The issue of significance discussed above reappears in this context as the inherent arbitrariness in choosing prior probabilities for the ψk, but Del Pozzo et al. argue that this does not affect the efficacy of the model-comparison test in detecting GR violations. (For a comprehensive discussion of model selection in the context of GW detection, rather than GR tests, see also [456, 457, 291]. For more recent applications of this formalism to ground-based detectors, see [315].)
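The role played by the prior can be seen in a one-parameter toy problem, which is our own simplification and not the analysis of [148, 284, 285]: suppose that a single phasing shift δψ is measured as d with Gaussian error σ, that GR predicts δψ = 0, and that the alternative hypothesis assigns δψ a flat prior on [−W, W]. The evidence ratio then has a closed form.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bayes_factor_alt_vs_gr(d, sigma, prior_half_width):
    """Evidence ratio Z_alt / Z_GR for one measured phasing shift d (toy model)."""
    z_gr = normal_pdf(d, 0.0, sigma)            # GR: shift fixed to zero
    s = sigma * math.sqrt(2.0)
    w = prior_half_width
    # Likelihood marginalized over a flat prior on [-w, w].
    z_alt = (math.erf((w - d) / s) + math.erf((w + d) / s)) / (4.0 * w)
    return z_alt / z_gr

print(bayes_factor_alt_vs_gr(0.0, 1.0, 10.0))    # data agree with GR
print(bayes_factor_alt_vs_gr(0.0, 1.0, 100.0))   # same data, wider prior
print(bayes_factor_alt_vs_gr(5.0, 1.0, 10.0))    # five-sigma shift
```

Widening the prior lowers the evidence ratio in proportion when the data agree with GR, but a shift of several σ still yields a large ratio in favor of the alternative, in line with the argument that the prior choice does not spoil the ability to detect violations.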

5.2.2 The parameterized post-Einstein framework

In [497], Yunes and Pretorius propose a similar but more general approach, labeling it the “parameterized post-Einsteinian” (ppE) framework. For adiabatic inspirals, they propose enhancing the stationary-phase inspiral signal with extra powers of GW frequency:

$${\tilde h^{{\rm{ppE}}}}(f) = {\tilde h^{{\rm{GR}}}}(f) \times (1 + \alpha {(\pi {\mathcal M}f)^a})\exp \left[ {i\beta {{(\pi {\mathcal M}f)}^b}} \right],$$

where \({{\tilde h}^{{\rm{GR}}}}\) is given in Eqs. (50) and (51). While the initial suggestion in [497] is to consider a, b ∈ ℝ, there are analytical arguments why a and b should be restricted to values \(a = \bar a/3\) and \(b = \bar b/3\), with integer \(\bar a\) and \(\bar b\) [120], which reproduces Arun’s PN-coefficient scheme for \(\bar b \ge - 5\). Nevertheless, this representation can reproduce the leading-order effects of several alternative theories of gravity (see Table 2).
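In code, the ppE modification is a one-line wrapper around any GR frequency-domain waveform. The sketch below applies it at a single frequency; we assume that ℳ in Eq. (52) is the chirp mass, expressed in seconds (G = c = 1), and the function name and example values are ours.

```python
import cmath
import math

def ppe_waveform(h_gr, f, chirp_mass_s, alpha=0.0, a=0.0, beta=0.0, b=0.0):
    """Apply ppE amplitude and phase corrections to a GR waveform value h_gr at frequency f."""
    u = math.pi * chirp_mass_s * f
    return h_gr * (1.0 + alpha * u ** a) * cmath.exp(1j * beta * u ** b)

# Example: a pure phase correction with b = -7/3 and a small, arbitrary beta.
h_gr = 1.0 + 2.0j
h_ppe = ppe_waveform(h_gr, 1e-3, 4.0, beta=1e-6, b=-7.0 / 3.0)
print(cmath.phase(h_ppe / h_gr))
```

Setting α = β = 0 returns the GR waveform unchanged, so GR sits inside the ppE family as a special case; this nesting is what makes the model-selection formulation discussed below possible.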

Table 2 Leading-order effects of alternative theories of gravity, as represented in the ppE framework [Eq. (52)]. For GR α = β = 0. This table is copied from [134], except for the two entries labeled with an asterisk. The quadratic curvature ppE exponent given in [134] was b = −1/3, coming from the conservative dynamics. However, it was shown in [483] that the dissipative correction is larger, giving the value b = −7/3 quoted above. The dynamical Chern-Simons ppE exponent given in [134] was b = 4/3, which was derived using the slow-rotation metric accurate to linear order in the spin [496]. At quadratic order in the spin [488], the corrections to both conservative and dissipative dynamics occur at lower post-Newtonian order, giving b = 1/3 [487].

In [497], Yunes and Pretorius are motivated by the possibility of detecting GR violations, but also by the “fundamental bias” that would be incurred in estimating GW-source parameters using GR waveforms when modified GR is instead correct. In [134], Cornish et al. reformulate the detection of GR violations described by ppE as a Bayesian model-selection problem, similar to the PN-coefficient tests discussed in Section 5.2.1. Figure 6 shows the β bounds, for various fixed b, that could be set with LISA observations of \(m_{1,2} \sim 10^6\,M_\odot\) binary inspirals at z = 1 and 3. For b corresponding to modifications in higher-order PN terms (which require strong-field, nonlinear gravity conditions to become evident), the bounds provided by LISA-like detectors become more competitive with respect to solar-system and binary-pulsar results (where weak-field conditions prevail).

Figure 6: Constraints on phasing corrections in the ppE framework, as determined from LISA observations of \(\sim 10^6\,M_\odot\) massive-black-hole inspirals at z = 1 and z = 3. The figure also includes the β bounds derived from pulsar PSR J0737-3039 [492], the solar-system bound on the graviton mass [435], and PN-coefficient bounds derived as described in Section 5.2.1. The spike at b = 0 corresponds to the degeneracy between the ppE correction and the initial GW-phase parameter. (Adapted from [134].)

A ppE-like model including dipole radiation in addition to quadrupole radiation but no other modifications to the waveform phasing was described in [26] and was discussed in Section 5.1.1 above. The full ppE framework was extended to include all additional polarization states and higher waveform harmonics in [120]. The final form was motivated by considering Brans-Dicke theory, Lightman-Lee theory and Rosen’s theory. In the most general form, Eq. (52) is modified to

$$\begin{array}{*{20}c} {{{\tilde h}^{{\rm{ppE}}}}(f) = {{\tilde h}^{{\rm{GR}}}}(f) \times \exp \left[ {i\beta {{(\pi {\mathcal M}f)}^b}} \right]\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad} \\ {+ ({\alpha _ +}{F_ +} + {\alpha _ \times}{F_ \times} + {\alpha _b}{F_b} + {\alpha _L}{F_L} + {\alpha _{{\rm{sn}}}}{F_{{\rm{sn}}}} + {\alpha _{{\rm{se}}}}{F_{{\rm{se}}}})\quad \quad \quad \quad \quad \quad \quad} \\ {\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \times {{(\pi {\mathcal M}f)}^a}\exp \left[ {- i\Psi _{{\rm{GR}}}^{(2)} + i\beta {{(\pi {\mathcal M}f)}^b}} \right]} \\ {+ ({\alpha _ +}{F_ +} + {\alpha _ \times}{F_ \times} + {\alpha _b}{F_b} + {\alpha _L}{F_L} + {\alpha _{{\rm{sn}}}}{F_{{\rm{sn}}}} + {\alpha _{{\rm{se}}}}{F_{{\rm{se}}}})\,\,\,\quad \quad \quad \quad \quad \quad} \\ {\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \times {{(2\pi {\mathcal M}f)}^c}{\eta ^{{1 \over 5}}}\exp \left[ {- i\Psi _{{\rm{GR}}}^{(1)} + i\delta {{(2\pi {\mathcal M}f)}^b}} \right],\,\,} \\ \end{array}$$

in which \(\Psi _{{\rm{GR}}}^{(l)}\) is the GR phase of the l-th waveform harmonic, \(\eta = m_1 m_2/(m_1 + m_2)^2\) is the symmetric mass ratio, and \(F_A\) is the detector response to a GW in polarization mode A. The ppE parameters are \((\{\alpha_A\}, \{\gamma_A\}, a, \beta, b, c, \gamma, d)\).
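As a rough illustration of the leading term only, the sketch below multiplies a GR frequency-domain sample by exp[iβ(πℳf)^b] and tabulates the phase shift at a few frequencies. It is a minimal sketch, not the scheme of [120]: the function names, the chirp mass, the value of β, and the frequency grid are all illustrative assumptions, and the extra polarization and harmonic terms are omitted.

```python
import cmath
import math

MSUN_SECONDS = 4.925491e-6  # G*M_sun/c^3 in seconds, so that M*f is dimensionless


def ppe_phase_shift(f_hz, chirp_mass_s, beta, b):
    """Phase correction beta * (pi * M * f)**b of the leading ppE term."""
    return beta * (math.pi * chirp_mass_s * f_hz) ** b


def apply_leading_ppe(h_gr, f_hz, chirp_mass_s, beta, b):
    """Multiply a GR frequency-domain sample by exp[i * beta * (pi*M*f)**b]."""
    return h_gr * cmath.exp(1j * ppe_phase_shift(f_hz, chirp_mass_s, beta, b))


# Illustrative (assumed) values: a 1e6 solar-mass chirp mass and beta = 0.3.
chirp_mass = 1.0e6 * MSUN_SECONDS
freqs = [1.0e-4, 3.0e-4, 1.0e-3]  # Hz, inside the low-frequency band
beta = 0.3

# b = 0 gives the same shift at every frequency; b = -1 does not.
shifts_b0 = [ppe_phase_shift(f, chirp_mass, beta, 0.0) for f in freqs]
shifts_bm1 = [ppe_phase_shift(f, chirp_mass, beta, -1.0) for f in freqs]

print("b = 0 :", shifts_b0)
print("b = -1:", shifts_bm1)
```

For b = 0 the correction is a frequency-independent phase offset, so it cannot be separated from the initial GW phase; this is the degeneracy behind the spike at b = 0 noted in the figure caption. The correction changes only the phase of each sample, never its modulus.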

The authors of [120] considered two further variants of this scheme. One variant restricted the coefficients in the expansion so that they were not all independent, but were related to one another via energy conservation. The second variant included this interdependence of the parameters, and also accounted for modified propagation effects by introducing additional “phase-difference” parameters into the second and third terms. As yet, this fully extended ppE scheme has not been used to explore the constraints that will be possible with space-based detectors.

An analysis using a waveform model with higher harmonics and spin precession, but not alternative polarization states, was carried out in [244]. Its authors considered modifications to a subset of the phase and amplitude parameters only, which corresponded to certain post-Newtonian orders and could therefore also be interpreted in terms of modifications to the PN phase coefficients as discussed in Section 5.2.1. The estimated bounds derived using this more complete waveform model were typically one to two orders of magnitude better than previous estimates for high-mass systems, but basically the same for low-mass systems. This is unsurprising, since the effects of spin precession and higher harmonics will only be important late in the inspiral. High-mass systems generate lower-frequency GWs and are therefore only observable for the final stages of inspiral, merger and plunge. Therefore, late-time corrections are proportionally more important for those systems. For high-mass systems, the authors of [244] estimated that LISA would be able to measure deviations in the phasing parameters to a precision \(\Delta\Psi_n \sim 0.1, 10, 50, 500, 1000\) for n = −1, 0, 1, 3/2, 2 respectively, where n denotes the post-Newtonian order, with \(\Delta\Psi_n\) the coefficient of \(f^{2n/3 - 5/3}\) in the waveform phase. Using the same model, they also estimated that LISA could place a bound of \(\lambda_g > 1 \times 10^{16}\) km on the graviton Compton wavelength when allowing for correlations between the different phase-modification parameters \(\Delta\Psi_n\). This was discussed in Section 5.1.2.

An extension of the ppE framework to EMRI systems requires a model in which orbits can be both eccentric and inclined. To develop this, Vigeland et al. [458] derive a set of near-Kerr spacetime metrics that satisfy a set of conditions, including the existence of a Carter-constant-like third integral of the motion, as well as asymptotic flatness. The solutions, which were previously found in [65], are restricted to a physically interesting subset by setting to zero any metric coefficients not required to reproduce known black-hole solutions in modified gravity, and by applying the peeling theorem (i.e., by requiring that the mass and spin of the black hole not be renormalized by the perturbation).

The existence of a third integral is not a requirement for black-hole solutions, but in general its absence allows ergodic behavior in the orbits. This is discussed as a potential observable signature for deviations from GR in Section 6.2.5. However, data-analysis pipelines designed for GR waveforms may be insensitive to such qualitatively different systems. Therefore the existence of a third integral is a practical assumption for interpretation once a GR-like EMRI has been observed.

In [201], Gair and Yunes construct gravitational waveforms for EMRIs occurring in the metrics of [458], based on the analytic kludge model constructed for GR EMRIs [46]. The waveforms provide a ppE-like model for EMRIs that can be used in the same way as the circular ppE framework. Parameter-estimation results with these ppE-EMRI models have not yet appeared in the literature.

5.2.3 Other approaches

In [451], Vallisneri provides a unified model-comparison performance analysis of all modified-GR tests that is valid for sufficiently-loud signals, and that yields the detection SNR required for a statistically-significant detection of GR violations as a simple function of the fitting factor FF between the GR and modified-GR waveform families. The FF measures the extent to which one can reabsorb modified-GR effects by varying standard-GR parameters from their true values. Vallisneri’s analysis is valid in the limit of large SNR, and may not be applicable to all realistic scenarios with finite SNRs.

An alternative to modifying frequency-domain inspiral waveforms is offered by Cannella et al. [106, 105]. They propose tests based on the effective-field-theory approach to binary dynamics [208], which expands the Hilbert+point-mass action as a set of Feynman diagrams. In this framework, GR corrections can be introduced by displacing the coefficients of interaction vertices from their GR values. For instance, multiplying the three-graviton vertex by a factor (1+β3) affects the conservative dynamics of the theory in a manner similar to the PPN parameter β, but also has consequences on radiation. A similar modification to the four-graviton vertex (parameterized by β4) yields effects at the second post-Newtonian order, so it has no analog in PPN. Cannella et al. argue that GR-violating values of β3 and β4 would not be detectable with GW signals, but they would instead generate small systematic errors in the estimation of standard binary parameters. However, a thorough analysis of the detectability of such deviations has not been carried out, so this conclusion may be modified in the future.

5.3 Beyond the binary inspiral

According to GR, black-hole mergers are the most energetically luminous events in the universe, with \(L \sim 10^{23}\,L_\odot \sim 10^{56}\) erg/s, regardless of mass: at their climax, they outshine the combined power output of all the stars in the visible universe. Second-generation ground-based GW interferometers are expected to yield the first detections of black-hole mergers [1], but only with rather modest SNRs. By contrast, LISA-like GW detectors would observe the mergers of heavier black holes, with SNRs as high as hundreds or more throughout the universe, offering very accurate measurements of the merger waveforms. Massive-black-hole coalescences may feature significant spins and eccentricity, further enriching the merger phenomenology [80, 380].

The powerful merger events correspond to very relativistic velocities and very strong gravitational fields, so much that the PN expansion of the field equations cannot be applied, and we must resort to very complex and costly numerical simulations [117]. This makes it challenging to encode the effects of plausible GR modifications in the signal model. The first ppE paper [497] makes such an attempt on the basis of a very crude model of merger-ringdown signals, which would probably be insufficient even to phase-match the GR signals themselves. Broad efforts are currently under way to build phenomenological full-waveform (inspiral-merger-ringdown) models [4, 344, 438]; these involve tunable parameters that are adjusted to match the waveforms produced by numerical relativity. Such parameters could also be used to encode non-GR effects in the merger-ringdown. However, at this time designing such extensions in a principled way seems daunting.

A simpler approach, proposed by Hughes and Menou [243], involves the golden binaries for which system parameters can be estimated from both inspiral and ringdown GWs. The former encode the parameters of the binary, while the latter encode the parameters of the final black hole formed in the merger. The functional relation of the two sets of parameters can then be compared with the predictions of numerical relativity, providing a null test of the strong-field regime of GR.

Hughes and Menou focus on measuring the remnant’s mass deficit, which equals the total energy carried away by GWs, so their definition of golden binaries selects those in which the mass deficit can be estimated to better than 5%. For LISA, these systems tend to have component masses between a few \(10^{5}\,M_\odot\) and a few \(10^{6}\,M_\odot\), and to be found at z ∼ 2–3, making up 1–10% of the total merger rate depending on black-hole population models. The estimates of [243] are based on rather simple waveform models that omit a range of physical effects, so they could be seen as conservative, given that increased waveform complexity tends to improve parameter-estimation accuracy. A more complete analysis was carried out in [295], but in the context of ground-based GW detectors rather than space-based detectors.

6 Tests of the Nature and Structure of Black Holes

It has become apparent over the last few decades that the centers of most galaxies harbor massive, dark compact objects with masses in the range \(10^{6}\)–\(10^{9}\,M_\odot\). It is clear that these objects play a very important role in the evolution of galaxies. This is exemplified by the very tight measured correlation (the M-σ relation) between the mass of the central dark objects and the velocity dispersion of stars in the central spheroid [174, 441]. It is generally accepted that the central dark objects are black holes described by the Kerr metric, but there is presently no definitive proof of that assumption. The alternatives to the black-hole interpretation include dense star clusters, supermassive stars, magnetoids, boson stars, and fermion balls. Support for the black-hole interpretation has arisen as a result of both observational and theoretical work. A short review of the evidence may be found in [115], although we summarize some key details in Section 6.1.

As described in Section 2.3, the theoretical basis for the belief that these objects are Kerr black holes has arisen from proofs that singularities inevitably form during gravitational collapse [341, 351, 224] and that the Kerr solution is the unique stationary and axisymmetric black-hole solution in GR [248, 114, 379]. The uniqueness of the Kerr solution is sometimes referred to as the “no-hair” theorem, since the solution is characterized by just two parameters, the black-hole mass M and angular momentum (per unit mass) a.

The field of any vacuum, axisymmetric spacetime in GR can be characterized by a sequence of mass and current multipole moments, which we denote as \(M_l\) and \(S_l\), respectively [204, 217]. For Kerr spacetimes these multipole moments are all determined by the mass and spin via

$${M_l} + {\rm{i}}{S_l} = M{({\rm{i}}a)^l},$$

so these spacetimes require no additional independent parameters or “hair”. The proof of the uniqueness theorem relies on various assumptions beyond the validity of the Einstein equations, so a demonstration of the non-Kerr nature of astrophysical black holes could reveal exotic physics within GR. It might also indicate the presence of material in the spacetime outside the black-hole horizon, or a deviation from GR in the true theory of gravity. In this section we discuss the potential of space-based low-frequency GW detectors to probe the structure of massive compact objects and the possible interpretation of these results. Short reviews of the prospects for testing relativity with measurements of black-hole “hair” can be found in [73, 191].

6.1 Current observational status

The observational evidence for the presence of black holes in the centers of galaxies has come mainly from the studies of Active Galactic Nuclei (AGN). These are known to be extremely energetic and also compact: typical luminosities of \(10^{46}\ \mathrm{erg\,s^{-1}}\) are produced in regions less than \(10^{16}\) m across [273]. The inferred AGN efficiency of ∼ 10% is much greater than the typical efficiencies of nuclear fusion processes (∼ 1%), implying the need for a very deep relativistic potential. X-ray observations show variability on timescales of less than an hour, while observations of iron lines indicate the presence of gas moving at speeds of several thousand km per second [273]. Radio observations of water maser discs are consistent with Keplerian motion around very compact central objects. In the spiral galaxy NGC 4258 VLBA observations have indicated a disc with an inner (detected) edge at ∼ 0.1 pc, around an object of mass \(3.6 \times 10^{7}\,M_\odot\) [316]. Such compactness cannot be realized by a stellar cluster. In addition, about 10% of AGNs are associated with jets, which move at highly relativistic velocities and persist for millions of years. This requires a relativistic potential that has a preferred axis that is stable over very long timescales. AGNs are also remarkably similar over several decades of mass, which favors the black-hole explanation, again because Kerr black holes are characterized by just M and a.

In the Milky Way, evidence for the presence of a black hole coincident with the Sgr A* radio source has come from observations of stellar motions. These are completely consistent with Keplerian motion around a point source of mass \(M \sim 4 \times 10^{6}\,M_\odot\) [206, 205]. One star, S2, has been observed for one complete orbit and from this it has been possible to put a limit of 0.066 on the extended fraction of the central mass that could be contained between pericenter and apocenter of the orbit of S2. At pericenter S2 was ∼ 100 AU from the central object, which provides a fairly tight constraint on its compactness.

Electromagnetic observations can rule out stellar clusters as the explanation for the massive central objects, but some of the exotic alternatives remain. X-ray emission comes from the very inner regions of accretion discs, but uncertainties in the radius from which the emission is coming and in the mass and spin of the central object limit their utility for probing the structure of the central object [372]. It is also possible to construct very compact boson star spacetimes [261] that could not be ruled out from electromagnetic observations alone. The same applies to spacetimes with a naked singularity. By contrast, GW observations will probe the spacetime structure as the object proceeds through the inspiral and then passes the innermost stable orbit and plunges into the horizon of the central object, if a horizon exists. We discuss the prospects for using GW observations to probe black-hole structure in the following Sections 6.2 and 6.3.

6.2 Tests of black-hole structure using EMRIs

6.2.1 Testing the “no-hair” property

Equation (54) tells us that a Kerr black hole is uniquely characterized by two parameters. If we can measure three multipole moments of the spacetime, we can check that they are consistent with Eq. (54). If they are not, then the object cannot be a Kerr black hole. Boson stars will typically have more independent multipole moments. In a certain class of models of rotating boson stars, the boson star can be uniquely characterized by three multipole moments [389, 73], so a LISA measurement of four multipole moments could also exclude these models as an explanation of the data.
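The consistency check described above can be sketched in a few lines. This is a minimal illustration in units with G = c = 1; the function names and the numerical values are hypothetical, and a real measurement would of course carry uncertainties that this sketch ignores.

```python
def kerr_multipole(mass, a, l):
    """Return (M_l, S_l) for Kerr from M_l + i S_l = M (i a)^l."""
    z = mass * (1j * a) ** l
    return z.real, z.imag


def no_hair_residual(m0, s1, m2):
    """Measured quadrupole minus the Kerr value implied by mass and spin.

    For Kerr, M_0 = M and S_1 = M a, so Eq. (54) requires M_2 = -S_1**2 / M_0.
    A residual consistent with zero means the three moments pass the test.
    """
    return m2 - (-s1 ** 2 / m0)


# Hypothetical example: unit mass, spin parameter a = 0.5.
for l in range(5):
    print(l, kerr_multipole(1.0, 0.5, l))

print(no_hair_residual(1.0, 0.5, -0.25))  # Kerr quadrupole
print(no_hair_residual(1.0, 0.5, -0.30))  # quadrupole with an excess
```

Two measured moments fix M and a; the third is then a prediction rather than a free parameter, which is why three moments are the minimum needed for a test.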

GWs from EMRIs are complicated superpositions of different frequency components, found at harmonics of the three fundamental frequencies of the orbit: the orbital frequency and the frequencies of precession of the perihelion and of the orbital plane [154]. This complex structure encodes detailed information about the spacetime in which the GWs are generated. The details of this encoding were first worked out by Ryan [387]. If the spacetime is assumed to be stationary and axisymmetric, it can be shown that the Einstein equations reduce to a single equation, the Ernst equation, for a complex scalar function, the Ernst potential [165]. By using the Ernst potential and expressions due to Fodor et al. [183] that relate this potential to the multipole moments of the spacetime, Ryan was able to study the properties of orbits in vacuum and axisymmetric spacetimes that possess an arbitrary set of mass and current multipole moments. Circular and equatorial orbits do not show perihelion or orbital plane precession. However, if such an orbit is given a small radial or vertical perturbation, it will undergo small oscillations at frequencies (the “epicyclic” frequencies) that correspond to the perihelion or orbital plane precession frequencies of nearly circular and nearly equatorial orbits respectively. These frequencies can be readily computed. For the arbitrary stationary axisymmetric spacetimes considered in [387] one finds

$$\begin{aligned}
\frac{\Omega_r}{\Omega} = {} & 3(M\Omega)^{2/3} - 4\frac{S_1}{M^2}(M\Omega) + \left(\frac{9}{2} - \frac{3}{2}\frac{M_2}{M^3}\right)(M\Omega)^{4/3} - 10\frac{S_1}{M^2}(M\Omega)^{5/3} \\
& + \left(\frac{27}{2} - 2\frac{S_1^2}{M^4} - \frac{21}{2}\frac{M_2}{M^3}\right)(M\Omega)^{2} + \left(-48\frac{S_1}{M^2} - 5\frac{S_1 M_2}{M^5} + 9\frac{S_3}{M^4}\right)(M\Omega)^{7/3} + \cdots, \\
\frac{\Omega_z}{\Omega} = {} & 2\frac{S_1}{M^2}(M\Omega) + \frac{3}{2}\frac{M_2}{M^3}(M\Omega)^{4/3} + \left(7\frac{S_1^2}{M^4} + 3\frac{M_2}{M^3}\right)(M\Omega)^{2} \\
& + \left(11\frac{S_1 M_2}{M^5} - 6\frac{S_3}{M^4}\right)(M\Omega)^{7/3} + \cdots,
\end{aligned}$$

where Ω is the angular (ϕ) frequency of the circular orbit being perturbed, \(\Omega_r\) and \(\Omega_z\) are the perihelion and orbital-plane precession frequencies, and \(M_l\) and \(S_l\) denote the mass and current multipole moments of the spacetime metric, as in Eq. (54). The primary conclusion from Eqs. (55) and (56) is that the various multipole moments enter the different terms in this expansion at different orders in Ω. The precession frequencies and orbital frequency could be extracted from GW observations, so these expansions are, in principle, observable. We can use this information to “map” the spacetime structure near the central object and verify that the multipole moments are consistent with the no-hair property (54) that we expect for a Kerr black hole. This technique is sometimes termed “bothrodesy” or “holiodesy” (see footnote 5) by analogy with geodesy, in which observations of the motion of satellites are used to probe the gravitational field of the Earth.
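A quick numerical sanity check of these expansions is to insert the Kerr moments from Eq. (54) and compare with the exact precession rates of a prograde, nearly circular, nearly equatorial Kerr orbit. The sketch below does this in units with G = c = 1. The closed-form Kerr epicyclic frequencies used for the comparison are the standard textbook expressions and are an assumption brought in from outside this review; all function names are hypothetical.

```python
import math


def kerr_moments(mass, a):
    """Lowest Kerr moments (S_1, M_2, S_3) from M_l + i S_l = M (i a)^l."""
    return mass * a, -mass * a ** 2, -mass * a ** 3


def ryan_precession(m_omega, mass, s1, m2, s3):
    """Series for (Omega_r/Omega, Omega_z/Omega), truncated at (M Omega)^(7/3)."""
    v = m_omega ** (1.0 / 3.0)
    q1, q2, q3 = s1 / mass ** 2, m2 / mass ** 3, s3 / mass ** 4
    radial = (3 * v ** 2 - 4 * q1 * v ** 3 + (4.5 - 1.5 * q2) * v ** 4
              - 10 * q1 * v ** 5
              + (13.5 - 2 * q1 ** 2 - 10.5 * q2) * v ** 6
              + (-48 * q1 - 5 * q1 * q2 + 9 * q3) * v ** 7)
    vertical = (2 * q1 * v ** 3 + 1.5 * q2 * v ** 4
                + (7 * q1 ** 2 + 3 * q2) * v ** 6
                + (11 * q1 * q2 - 6 * q3) * v ** 7)
    return radial, vertical


def kerr_exact_precession(m_omega, chi):
    """Exact Kerr precession ratios for a prograde orbit, chi = a/M.

    Uses M*Omega = 1/(x**1.5 + chi) with x = r/M and the standard
    radial and vertical epicyclic frequencies (assumed, not from the text).
    """
    x = (1.0 / m_omega - chi) ** (2.0 / 3.0)
    kappa = math.sqrt(1 - 6 / x + 8 * chi / x ** 1.5 - 3 * chi ** 2 / x ** 2)
    theta = math.sqrt(1 - 4 * chi / x ** 1.5 + 3 * chi ** 2 / x ** 2)
    return 1 - kappa, 1 - theta


m_omega, chi = 1.0e-3, 0.5  # illustrative values: v = 0.1, r of order 100 M
s1, m2, s3 = kerr_moments(1.0, chi)
series = ryan_precession(m_omega, 1.0, s1, m2, s3)
exact = kerr_exact_precession(m_omega, chi)
bumpy = ryan_precession(m_omega, 1.0, s1, 1.01 * m2, s3)  # 1% excess quadrupole

print("series:", series)
print("exact :", exact)
print("bumpy :", bumpy)
```

The last line shows how a small excess quadrupole shifts both precession rates, with the leading change in \(\Omega_z/\Omega\) entering at order \((M\Omega)^{4/3}\), while the spin already fixes the lower-order term.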

The multipole moments are also encoded in the total orbital energy lost as the orbital frequency changes by a unit logarithmic interval

$$\begin{aligned}
\frac{\Delta E(f)}{\mu} = {} & \frac{1}{3}(M\Omega)^{2/3} - \frac{1}{2}(M\Omega)^{4/3} + \frac{20}{9}\frac{S_1}{M^2}(M\Omega)^{5/3} + \left(-\frac{27}{8} + \frac{M_2}{M^3}\right)(M\Omega)^{2} \\
& + \frac{28}{3}\frac{S_1}{M^2}(M\Omega)^{7/3} + \left(-\frac{225}{16} + \frac{80}{27}\frac{S_1^2}{M^4} + \frac{70}{9}\frac{M_2}{M^3}\right)(M\Omega)^{8/3} \\
& + \left(\frac{81}{2}\frac{S_1}{M^2} + 6\frac{S_1 M_2}{M^5} - 6\frac{S_3}{M^4}\right)(M\Omega)^{3} + \cdots
\end{aligned}$$

A more powerful observable than the three discussed so far is the number of cycles that a trajectory spends near a particular frequency

$$\Delta {\mathcal N}(f) = {{{f^2}} \over {{\rm{d}}f/{\rm{d}}t}} = {f^2}{{{\rm{d}}E/{\rm{d}}f} \over {{\rm{d}}E/{\rm{d}}t}},$$

but this is not as clean an observable as the precession frequencies, since it requires computing the rate of energy loss to GWs in an arbitrary spacetime. Ryan used this formalism in conjunction with a post-Newtonian waveform model to estimate LISA’s capability to measure the spacetime multipoles [388]. He considered nearly circular and nearly equatorial inspirals, and found that LISA’s ability to determine the spacetime multipoles degraded as more multipoles were included in the waveform model. The typical errors that Ryan found are in Table 3. The conclusion was that LISA would be able to make moderately accurate measurements of the lowest three multipole moments, but probably no more.
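To make the structure of this observable concrete, the sketch below evaluates it at leading order only, combining the first term of \(\Delta E/\mu\) with the leading quadrupole energy flux for a circular orbit in the test-mass limit, \({\rm d}E/{\rm d}t = (32/5)(\mu/M)^2 (M\Omega)^{10/3}\). The flux formula is the standard quadrupole result and is an assumption not taken from the text; the choice f = Ω/π (the dominant GW frequency), the function name, and the input values are likewise illustrative. Multipole corrections are neglected.

```python
import math


def cycles_near_frequency(m_omega, mass_ratio):
    """Leading-order Delta N = f^2 / (df/dt) with f = Omega/pi, in G = c = 1.

    mass_ratio is mu/M. Only the first term of Delta E/mu and the leading
    quadrupole flux are kept (assumed standard expressions).
    """
    v = m_omega ** (1.0 / 3.0)
    delta_e = v ** 2 / 3.0                       # Delta E / mu, first term
    flux = 32.0 / 5.0 * mass_ratio * v ** 10     # M * (dE/dt) / mu
    log_rate = flux / delta_e                    # M * d(ln f)/dt
    return (m_omega / math.pi) / log_rate


# Illustrative values: mu/M = 1e-5 at M*Omega = 1e-3.
n_cycles = cycles_near_frequency(1.0e-3, 1.0e-5)
print(n_cycles)
```

At this order the result reduces to \(\Delta{\mathcal N} = (5/96\pi)(M/\mu)(M\Omega)^{-5/3}\), so the number of cycles grows in inverse proportion to the mass ratio. This large cycle count is what makes the observable powerful, while the dependence on the flux is what makes it less clean than the precession frequencies.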

Table 3: Accuracy with which a LISA observation could determine the multipole moments of a spacetime; the accuracy decreases as more multipoles are included in the model. Taken from Ryan [388]. The third column indicates the highest multipole included in the particular model. Results are shown for two typical cases, a \(10\,M_\odot + 10^{5}\,M_\odot\) inspiral and a \(10\,M_\odot + 10^{6}\,M_\odot\) inspiral; in both cases the SNR of the inspiral is 10.

Ryan’s analysis was restricted to circular and equatorial orbits, but a counting argument suggests that spacetime mapping should still be possible for generic orbits [283]. One complication is that the evolution of the orbital elements must also be inferred from the observation, which spoils the nice form of the expansions (55) and (56) [195], since all the multipole moments now enter at each order of the expansion. However, this would also be true if the expansions for circular-equatorial orbits were rewritten as an expansion in some initial frequency, \(\Omega_0\), which would more closely represent a band-limited observation. In practice, the lowest-order multipole dominates the lowest term in the expansion and so on, which is what makes spacetime mapping feasible.

The Ryan formalism neatly illustrates why spacetime mapping with EMRIs is possible, but it is not a very practical scheme. We expect that the massive central objects are indeed Kerr black holes and so we really want to consider what imprint small deviations from Kerr will leave on the emitted GWs. A multipole-moment expansion is poorly suited to this, as the Kerr metric has an infinite number of nonzero multipoles. Several authors have instead constructed spacetimes given by Schwarzschild or Kerr plus a small deviation, and have examined the properties of geodesics in those spacetimes.

Collins and Hughes [125] considered a static deviation from the Schwarzschild metric. This was constructed by writing the metric in Weyl coordinates and adding a quadrupole perturbation to the potential (in these coordinates, the potential equation reduces to the flat-space Laplacian for the (ρ, z) cylindrical coordinates, which facilitates writing solutions). They considered two types of quadrupole perturbation: a torus around the black hole, and the addition of a point mass at each pole. In the second case, the spacetime necessarily contains line singularities running from the point masses either to infinity or to the black-hole horizon, which are needed to support the point masses. The solutions are perturbative, in that the authors kept only the terms that are linear in the deviation from Schwarzschild. Collins and Hughes explored the properties of orbits in these spacetimes by comparing the precession and orbital frequencies of equatorial orbits in the spacetime to orbits with the same orbital parameters in Kerr. They found that there were measurable differences in the perihelion precession in the strong field: for instance, at a radius of ∼ 20 M for a 2% perturbation of the black hole, the trajectory would accumulate one radian of dephasing in ∼ 1000 orbits. Collins and Hughes coined the term “bumpy black hole” to describe spacetimes of this type.

Glampedakis and Babak [207] took a different approach to studying deviations from Kerr. Starting from the Hartle-Thorne metric [220, 221], which is an exact solution to Einstein’s equations describing the spacetime outside of a slowly, rigidly rotating axisymmetric object, the authors constructed a spacetime with metric of the form \({g_{\mu \nu}} = g_{\mu \nu}^{{\rm{Kerr}}} + \varepsilon {h_{\mu \nu}} + O(\delta {M_{l \ge 3}},\,\delta {S_{l \ge 2}})\), working perturbatively and keeping only the \(\delta M_2\) perturbation in the quadrupole mass moment. They termed the resulting spacetime a “quasi-Kerr” solution. A comparison of the frequencies of eccentric equatorial geodesics in the quasi-Kerr spacetime to the same geodesics in the Kerr spacetime indicated that it would take only ∼ 100 cycles to accumulate a π/2 phase shift for a ∼ 1% deviation from Kerr. They also computed waveform overlaps and found that, over the radiation-reaction timescale, the overlap of the waveforms for an orbit with a semilatus rectum of ten geometric radii (\(GM/c^2\)) was ∼ 20%, 50%, 90%, 98% for inspirals with mass ratio \(\mu/M = 10^{-6}, 10^{-5}, 10^{-4}, 10^{-3}\) respectively.

A third approach to analyzing deviations from the Kerr spacetime was considered by Gair et al. [195], who studied geodesic motion in a family of exact spacetimes due to Manko and Novikov [303], which include the Kerr spacetime for a specific choice of parameters. By using exact solutions of the Einstein field equations, they obtained solutions that were valid everywhere, in contrast to the perturbative solutions considered in [125, 207], which break down near the central object. However, this scheme offers less control over the multipole moments, since it is possible to choose the lowest multipole that differs from Kerr, but then the higher multipoles must also change. Gair et al. [195] studied observable properties of the orbits, including the variation of the precession frequencies of nearly circular and nearly equatorial orbits as a function of orbital frequency, and the loss of the third integral of the motion (see Section 6.2.5).

These three papers outlined different ways to approach the problem of spacetime mapping in practice. However, none of the analyses were complete as they did not consider inspirals. Collins and Hughes and Glampedakis and Babak also ignored waveform confusion by assuming that the orbital elements were the same between the orbits under consideration. Glampedakis and Babak did discuss the importance of confusion and the role of the inspiral evolution in breaking such degeneracies, but no inspiral results were included in the published paper. Observationally, the correct orbits to compare will be those with the same frequencies since we have no way to determine the orbital radius or eccentricity directly. This was the approach adopted in [195]. Assessing the confusion problem is relatively easy, but including inspiral is very hard in general, since the presence of excess multipole moments in the spacetime leads to changes in the rates of energy and angular momentum loss, which must also be included in the analysis. Progress can be made in the presence of small deviations by including only the leading-order contributions to the radiation reaction from the multipole moments. This is an open area of research, although Barack and Cutler [47] carried out a preliminary assessment using post-Newtonian EMRI waveforms [46] augmented with the leading-order contribution of an excess quadrupole moment to the precession and inspiral rates. The resulting waveforms were an improvement in comparison to Ryan’s analysis [388], since they included orbital eccentricity and inclination, and were filtered through an approximation of the LISA response. Barack and Cutler performed a Fisher-matrix analysis of parameter-estimation uncertainties, and hence correctly accounted for the confusion issue. 
For this simple model, they found that a single LISA observation of the inspiral of a \(10\,M_\odot\) black hole into a \(10^{5.5}\,M_\odot\), \(10^{6}\,M_\odot\) or \(10^{6.5}\,M_\odot\) black hole could measure the deviation from Kerr of the quadrupole moment to an accuracy of \(\Delta(M_2/M^3) \sim 10^{-4}, 10^{-3}, 10^{-2}\), respectively, while simultaneously measuring the mass and spin of the central object to fractional accuracies of \(10^{-4}\). This suggests that a LISA-like observatory would be able to perform high-precision tests of the no-hair property of massive compact objects in galactic centers. To put these numbers in perspective, a boson star may have a quadrupole moment that is ∼ 100 times that of a Kerr black hole with the same mass and spin [389], so it could easily be excluded.

6.2.2 Probing the nature of the central object

During an EMRI, the inspiraling object interacts gravitationally with the horizon of the central black hole. This can be thought of either as a tidal interaction, in which the gravitational field of the small body raises a tide on the horizon that is dragged around through the orbit, leading to dissipation of energy, or as energy being lost by GWs falling into the black hole. Fang and Lovelace [170] explored the nature of the tidal-coupling interaction by perturbing a Schwarzschild black hole with a distant orbiting moon. They found that the time-dependent piece of the perturbation affected the orbit in an unambiguous way: a time-varying quadrupole moment is induced on the black-hole horizon that is proportional to the time derivative of the moon’s tidal field. This quadrupole perturbation extracts energy and angular momentum from the orbit at the same rate that energy and angular momentum enter the horizon. However, the effect of the time-independent piece of the perturbation remained ambiguous. Working in the Regge-Wheeler gauge, Fang and Lovelace found that this piece vanished, in contrast to a previous result by Suen [432], who used a different gauge. This ambiguity leads to an ambiguity in the phase of the induced quadrupole moment as measured in a local asymptotic rest frame (LARF), although the phase of the bulge relative to the orbiting moon is well defined (using a spacelike connection between the moon and the black hole, Fang and Lovelace found that the horizon shear led the horizon tidal field by an angle of 4MΩ, where Ω is the angular velocity of the moon). The ambiguity of interpretation in the LARF makes it impossible to define the polarizability of the horizon or the phase shift of the tidal bulge in a body-independent way. Fang and Lovelace left open the possibility of developing a body-independent language to describe the response of the central object to tidal coupling, but as yet this has not happened.

Although the nature of the response of the central object to tidal coupling may be difficult to characterize from GW observations, the total energy lost to tidal interactions is a good observable. Ryan’s original theorem [387] ignored tidal coupling, but it was later generalized by Li and Lovelace [283]. They found that the GWs propagating to infinity depended only very weakly on the inner boundary conditions (i.e., on the nature of the central object). This means that the spacetime’s multipole structure can be inferred from the outgoing radiation field in the usual way, and hence the expected rate of orbital energy loss, assuming no energy loss into the central body, can be calculated. The rate of inspiral measures the actual rate of orbital energy loss, and the difference then gives the rate at which energy is lost to the central object, which is a direct measure of the tidal coupling. Li and Lovelace estimated that the ratio of the change in energy radiated to infinity due to the inner boundary condition to the energy in tidal coupling scales with the orbital velocity as \(v^5\). Therefore, it should be possible to simultaneously determine the spacetime structure and the tidal coupling through low-frequency GW observations.

Information about the central object can also come from the transition to plunge at the end of the inspiral. In a black-hole system, we expect GW emission to cut off sharply as the orbit reaches the innermost stable orbit and then plunges rapidly through the horizon. If there is no horizon in the system, the orbit may instead enter a phase where it passes into and out of the material of the central object. This was explored for boson-star models in [261]: Kesden et al. found that persistent radiation after the apparent innermost orbit could be a clear signature of the presence of a supermassive, horizonless central object in the spacetime. This analysis did not treat gravitational radiation or the interaction of the inspiraling body with the central object accurately, but it does illustrate a possible way to identify non-black-hole central objects. Something similar might happen if the spacetime were to contain a naked singularity rather than a black hole [195]: in principle, the nature of the emitted waveform after “plunge” would encode information about the exact nature of the central object. However, this has not yet been investigated. Naked-singularity spacetimes may have very-high-redshift surfaces rather than horizons: these spacetimes would be observationally indistinguishable from black holes, unless the inspiraling object happened to move inside the high-redshift surface and then emerged, and the two epochs of radiation could be connected observationally.

Another example of an object that can be arbitrarily close to a Schwarzschild black hole in the exterior but lack a horizon is a gravastar [346]. These are constructed by matching a de Sitter spacetime interior onto a Schwarzschild exterior through a thin shell of matter, whose radius can be made arbitrarily close to the Schwarzschild horizon. It was shown in [346] that the oscillation modes of such a gravastar have quasinormal frequencies that are completely different from those of a Schwarzschild black hole. Therefore the absence of a horizon would be apparent if ringdown radiation was observed from such a system. In addition, the tidal perturbations that arise during the inspiral of a compact object into a gravastar during an EMRI [347] can resonantly excite polar oscillations of the gravastar as the orbital frequency passes through certain values over the course of the inspiral. The excitation of these modes generates peaks in the GW-emission spectrum at frequencies that are characteristic of the gravastar, and can also show signatures of the microscopic surface of the gravastar. This process would be apparent both in the amplitude of the detected GWs and in the rate of inspiral inferred from the gravitational-waveform phase, since the rate of inspiral will change significantly in the vicinity of each resonance due to the additional energy radiated in the excited quasi-normal modes. Although the gravastar model used in [346, 347] may not be physically relevant, this work illustrates the more general fact that if the horizon of a black hole is replaced by some kind of membrane, then the modes of that membrane will inevitably be excited during an inspiral and these modes will typically be different to those of a black hole.

6.2.3 Astrophysical perturbations: the influence of matter

A change in the inspiral trajectory need not be caused only by differences in the central object, but might arise due to the presence of material in the spacetime, close to the black hole but external to the event horizon. Such material could influence the inspiral trajectory, and hence the emitted GWs, in two distinct ways: the gravitational field of the matter could modify the multipole moments of the spacetime and hence the orbit, as discussed above; and, if the orbital path intersected the material, hydrodynamic drag on the object could be sufficient to alter the orbit.

The influence of the gravitational field of external material was considered in [53]: Barausse et al. constructed a model spacetime that included both a black hole and an external torus of material very close to the central black hole. They examined the properties of orbits in two systems: one with a torus of comparable mass and spin to the central black hole (spacetime “A”), and one with a torus of low mass, but much greater angular momentum than the central black hole (spacetime “B”). Their comparisons were based on computing equatorial geodesics and then “kludge” gravitational waveforms in the spacetime and in a corresponding Kerr spacetime, and then evaluating the waveform overlap. Orbits were identified by matching the radial and azimuthal frequencies between the orbits in the two spacetimes in two ways: altering the orbital parameters, while setting the mass and spin to be the same in the corresponding Kerr spacetime; or altering the mass and spin, while keeping the orbital parameters the same.

This approach identified a confusion problem: over much of the parameter space the overlaps were very high, particularly when the second approach was adopted. Overlaps were lower in spacetime “B” than in spacetime “A,” and overlaps for “internal” orbits between the black hole and the torus were particularly low. This work suggested that it would not be possible to distinguish between such a spacetime and a pure Kerr spacetime in low-frequency GW EMRI observations. However, it did not consider inspiraling orbits. The need to constantly adjust the orbital parameters in order to maintain equality of frequencies would lead to a difference in the evolution of the orbit between the two spacetimes, which might break the waveform degeneracies. The torus model was also not physical, being much more compact than one would expect for AGN discs.

The effect of hydrodynamic drag on an EMRI was first considered in [326] and was found to be negligibly small for systems likely to be of interest to space-based gravitational-wave detectors. The problem was revisited in [52]: Barausse and Rezzolla considered a spacetime containing a Kerr black hole surrounded by a non-self-gravitating torus with constant specific angular momentum. The hydrodynamic drag consists of a short-range part that arises from accretion, and a long-range part that arises from the gravitational interaction of the body with the density perturbations it causes in the disc. The accretion onto the small object was modeled as Bondi-Hoyle-Lyttleton accretion (see Footnote 6) and the long-range force using collisional dynamical-friction results from the literature [264, 50]. The effect of the hydrodynamical drag on the orbital evolution was computed for geodesic orbits and compared to the orbital evolution from radiation reaction for a variety of torus models, varying in mass and outer radius.

The conclusion was that, for realistic outer radii for the torus, \(r_{\rm out} > 10^4 M\), the effect of hydrodynamic drag on the orbital radius and eccentricity was always small compared to radiation reaction. However, the relative importance increases further from the central black hole. The hydrodynamic drag has a greater relative effect on the orbital inclination, and tends to cause orbits to become more prograde, which is opposite to the effect of radiation reaction. For \(r_{\rm out} = 10^5 M\), the hydrodynamic drag becomes significant at \(r \approx 35 M\), but this radius is smaller for a more compact torus. Thus, this effect will only be important for LISA if we observe a system with a very compact accretion torus, or for systems of low central mass. For the latter, the GWs are detectable from orbits at larger radii where hydrodynamic drag can be important. However, the SNR of such events will be low, so we are unlikely to see many of them [192]. Thus, although it seems that this effect is also marginal, this conclusion is based on considering geodesic orbits, and the possible secular build-up of a drag signature over the inspiral has yet to be examined.

The influence of an accretion disc on the evolution of an EMRI embedded within it was explored in [493, 268]. One of the channels that has been suggested to produce EMRIs is the formation of stars in a disc, followed by the capture of the compact stellar remnants left after the evolution of those stars [18, 16]. The migration of such an EMRI through the accretion disc could potentially leave a measurable imprint on the GW signal. The literature distinguishes between two types of migration. In Type I migration, which generally occurs for lower-mass objects, the disc persists in the vicinity of the object throughout the inspiral. The object excites density waves in the disc, which exert a torque on the object. In general, the torque from material exterior to the orbit is greater than that from material interior to the orbit, which causes the object to lose angular momentum and spiral inward on a timescale that is short relative to the lifetime of the disc. In Type II migration, a gap opens in the disc in the vicinity of the inspiraling object. Material enters the gap on the disc’s accretion timescale, driving the object and gap inwards on that timescale.

Yunes, Kocsis et al. [493, 268] considered migration in geometrically-thin and radiatively-efficient discs, in which thermal energy is radiated on a much shorter timescale than the timescale over which the material moves inward, so the disc can remain thin. Such discs can be described by the Shakura-Sunyaev α-disc model [409], in which the viscous stress in the disc is proportional to the total pressure at each point in the disc. These discs are known to be unstable to linear perturbations [286, 410, 83, 358]. The alternative β-disc model, where stress is proportional to the gas pressure only, is stable to perturbations [391]. Both disc models were considered for EMRI migration. Yunes, Kocsis et al. showed that, over a year of observation, Type I migration could lead to ∼ 0.01/10 radian dephasings in an EMRI signal for both α-discs and β-discs, while Type II migration could lead to dephasings of \(\sim 10/10^3\) radians. The effects are larger for β-discs, since these can support higher surface density. For more massive central black holes of \(\sim 10^6\,M_\odot\) the dominant contribution is from Bondi-Hoyle accretion (see Footnote 6 for a description), while for less massive black holes of \(\sim 10^5\,M_\odot\) the dominant contribution is from the migration. These dephasings were computed for the same system parameters (apart from maximizing over time and phase shifts), so they do not account for possible parameter correlations, but the authors argue that the migration dephasing decouples from GW parameters as the effect becomes weaker with decreasing orbital radius, while relativistic effects become stronger. The relative number of EMRIs that will be produced in discs rather than from other channels is not well understood. However, it will be straightforward to identify such EMRIs, which will be circular and equatorial, to look for and constrain effects of this type.

Another important question that has not yet been addressed is how to distinguish the effect of external material from a difference in the structure of the central object. If the orbit does not intersect the material, such identification would come from the variation in the effect over the inspiral — if the change in the multipole structure comes from material, then at some stage the object would pass inside the matter, and the qualitative effect on the inspiral would be different from that of a central object with an unusual multipole structure. If the orbit does intersect the material, then the spacetime-mapping analysis described above no longer applies, since the Geroch-Hansen multipole decomposition [204, 217] applies to vacuum spacetimes only. If this decomposition could be generalized to nonvacuum axisymmetric spacetimes, then low-frequency GW observation could potentially recover not only the spacetime metric but also the structure of the external material. It would then be possible to verify that this matter obeys the various energy conditions (see Section 2.3). This is an open area of research at present.

An independent indicator of the presence of material in the spacetime would come from the observation of an electromagnetic counterpart to an EMRI event. For instance, if an inspiraling black hole was moving through an accretion disc, there might be emission from the material that was accreted onto the inspiraling object or from shocks formed in the disc. Again, this has not been explored, although it is likely, given the poor sky-position determination of EMRI events in GW observations [46], that it would not be possible to conclusively identify such a weak electromagnetic signature in coincidence with a GW event.

The presence of exotic matter outside a black hole, in the form of a cloud of axions, was discussed in [34]. The presence of large numbers of light axions would be one consequence of extra dimensions in string theory, so the detection of an axion cloud would provide strong evidence in support of their existence. The axion cloud would modify the motion of an inspiraling black hole in a similar way to regular matter, although estimates for the precision of measurements possible with future GW detectors have not yet been carried out. The passage of an EMRI through the cloud could also lead to its disruption, which may rule out the cloud as a possible explanation for any observed deviations, but further theoretical work is required to properly quantify these processes.

The existence of axion clouds can also have other observable GW signatures. The axions in the cloud exist in different quantum energy states. If multiple states with the same orbital quantum number \(l\) and magnetic quantum number \(m\) but different principal quantum numbers \(n\) are occupied, transitions between these states can generate GWs with characteristic strain

$$h\sim{10^{- 22}}{\alpha ^2}{({\epsilon _0}{\epsilon _1})^{1/2}}\left({{{10\,{\rm{Mpc}}} \over r}} \right)\,\,\left({{{{M_{{\rm{BH}}}}} \over {2\,{M_ \odot}}}} \right)$$

for a black hole of mass \(M_{\rm BH}\) at a distance \(r\). The pre-factor \(\alpha^2 (\epsilon_0 \epsilon_1)^{1/2}\) depends on the axion masses and coupling. The axions can also undergo annihilations, which generate GWs with very similar characteristic strains

$$h\sim{10^{- 22}}{\alpha ^7}\epsilon \left({{{10\,{\rm{Mpc}}} \over r}} \right)\,\,\left({{{{M_{{\rm{BH}}}}} \over {2\,{M_ \odot}}}} \right).$$

In both cases the frequency depends on the black-hole mass in the usual way, \(f \sim 1/M_{\rm BH}\), with typical values for \(2\,M_\odot\) black holes of \(100\,\alpha^3\) Hz and \(30\,\alpha\) kHz respectively. Both eLISA and LIGO could place interesting constraints on the axion parameter space through (non)detections of these events [34]. Finally, the self-interactions in the axion cloud could eventually lead to the collapse of the cloud in a “bosenova” explosion [269, 489], which would generate GWs with strain

$$h\sim{10^{- 17}}\left({{\epsilon \over {{{10}^{- 4}}}}} \right)\,\,\left({{{100\,{\rm{Mpc}}} \over r}} \right)\,\,\left({{{{M_{{\rm{BH}}}}} \over {{{10}^8}\,{M_ \odot}}}} \right)$$

and frequency \(\sim c^3/(G M_{\rm BH})\). These could also be an interesting source for low-frequency GW detectors. Recent calculations [489] suggest that a bosenova from the Milky Way black hole at Sagittarius A* would be marginally detectable by LISA. The bosenova explosion comes about due to a super-radiant interaction between the cloud of particles and the central black hole, which extracts rotational energy from the black hole and transfers it to the cloud of particles. Recent results on these super-radiant instabilities can be found in [479].
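The three strain scalings quoted above are simple power laws, so they can be evaluated directly. The following is a minimal sketch that only transcribes those order-of-magnitude expressions; the function names are hypothetical, and the efficiency-like factors (`eps0`, `eps1`, `eps`) and the coupling `alpha` are treated as free inputs because their values are not specified in this section.

```python
import math


def transition_strain(alpha, eps0, eps1, r_mpc, m_bh_msun):
    """Level-transition strain: 1e-22 alpha^2 sqrt(eps0 eps1) (10 Mpc/r) (M_BH / 2 Msun)."""
    return 1e-22 * alpha**2 * math.sqrt(eps0 * eps1) * (10.0 / r_mpc) * (m_bh_msun / 2.0)


def annihilation_strain(alpha, eps, r_mpc, m_bh_msun):
    """Annihilation strain: 1e-22 alpha^7 eps (10 Mpc/r) (M_BH / 2 Msun)."""
    return 1e-22 * alpha**7 * eps * (10.0 / r_mpc) * (m_bh_msun / 2.0)


def bosenova_strain(eps, r_mpc, m_bh_msun):
    """Bosenova strain: 1e-17 (eps / 1e-4) (100 Mpc/r) (M_BH / 1e8 Msun)."""
    return 1e-17 * (eps / 1e-4) * (100.0 / r_mpc) * (m_bh_msun / 1e8)


def transition_frequency_hz(alpha, m_bh_msun):
    """Quoted as 100 alpha^3 Hz at 2 Msun, rescaled with f ~ 1/M_BH."""
    return 100.0 * alpha**3 * (2.0 / m_bh_msun)


def annihilation_frequency_hz(alpha, m_bh_msun):
    """Quoted as 30 alpha kHz at 2 Msun, rescaled with f ~ 1/M_BH."""
    return 30.0e3 * alpha * (2.0 / m_bh_msun)
```

At the reference values (unit pre-factors, a \(2\,M_\odot\) black hole at 10 Mpc) the first two functions return the quoted normalization of \(10^{-22}\); any other inputs simply rescale it.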

In [297] the observable signatures of the presence of a cloud of bosons outside a massive compact object were considered. It was shown that the motion of a particle through the cloud would be dominated by boson accretion rather than by gravitational radiation reaction. During this accretion-dominated phase, the frequency and amplitude of the gravitational-wave emission are nearly constant in the late stages of inspiral. The authors also considered inspirals exterior to the boson cloud, and found that resonances could occur when the orbital frequency matched the characteristic frequency associated with the characteristic mass of the bosonic particles. These resonances could lead to significant, detectable deviations in the phase of the emitted GWs.

6.2.4 Astrophysical perturbations: distant objects

Perturbations to the EMRI trajectory could also arise from the gravitational influence of distant objects, such as other stars or a second MBH. At present, it is thought to be very unlikely that a second star or compact object would be present in the spacetime sufficiently close to the EMRI to leave a measurable imprint on the trajectory [18], although detailed calculations for this scenario have not been carried out. However, if the MBH that was the host of the EMRI was in a binary with a second MBH, this could perturb the trajectory by a detectable amount [494]. The primary observable effect is a Doppler shift of the GW signal, which arises due to the acceleration of the center of mass of the EMRI system relative to the observer. It was estimated that, for typical EMRI systems, the presence of a second black hole within a few tenths of a parsec would lead to a measurable imprint in the signal. The frequency scaling of the Doppler effect differs from the scaling of the post-Newtonian terms in the unaccelerated waveform, which suggests that this effect will be distinguishable in GW observations [494]. The magnitude of the leading-order Doppler effect scales as \(M_{\rm sec}/r^2\), where \(M_{\rm sec}\) is the mass of the perturbing black hole, and \(r\) is its distance. If the second object is within a few hundredths of a parsec, higher-order time derivatives could also be measured from the GW observations. These scale differently with \(M_{\rm sec}\) and \(r\), which would in principle allow the mass and distance of the perturber to be measured from the EMRI data [494].
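The \(M_{\rm sec}/r^2\) scaling can be illustrated with a back-of-the-envelope Newtonian estimate. The sketch below is not the calculation of [494]: it assumes a constant acceleration directed entirely along the line of sight and a monochromatic signal of frequency \(f\), for which the accumulated phase drift after a time \(T\) is \(\pi f (a/c) T^2\). The function names and the example values (a \(10^6\,M_\odot\) perturber at 0.1 pc, a 3 mHz signal) are illustrative assumptions.

```python
import math

G_MSUN = 1.32712e20   # G times one solar mass, in m^3 s^-2
C = 2.99792458e8      # speed of light, in m s^-1
PARSEC = 3.0857e16    # one parsec, in m
YEAR = 3.15576e7      # one Julian year, in s


def perturber_acceleration(m_sec_msun, r_pc):
    """Newtonian acceleration G M_sec / r^2 of the EMRI center of mass, in m s^-2."""
    return G_MSUN * m_sec_msun / (r_pc * PARSEC) ** 2


def doppler_phase_drift(f_gw_hz, m_sec_msun, r_pc, t_obs_s):
    """Phase drift pi f (a/c) T^2 for a constant line-of-sight acceleration (assumed)."""
    a_over_c = perturber_acceleration(m_sec_msun, r_pc) / C
    return math.pi * f_gw_hz * a_over_c * t_obs_s**2


# Illustrative values only: 1e6 Msun perturber at 0.1 pc, 3 mHz signal, one year.
dphi = doppler_phase_drift(3e-3, 1e6, 0.1, YEAR)
```

Under these assumptions the drift is a few tenths of a radian per year of observation, and it falls by a factor of four when the perturber distance is doubled.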

The probability that a second black hole would be within a tenth of a parsec of a system containing an EMRI is difficult to assess. At redshifts z < 1, at any given time a few percent of Milky Way-like galaxies will be involved in a merger, which suggests an upper limit of a few percent of EMRIs that could have perturbing companions. However, there is uncertainty as to how long the black-hole binary will spend at radii of a few tenths of a parsec following a merger, and it is plausible that the presence of a second black hole would increase the EMRI rate by perturbing stars onto orbits that pass close to the other black hole, introducing an observational bias in favor of these systems [494]. Given these uncertainties, the possibility of a distant perturber will have to be accounted for in the analysis, and, if it is observed in some systems, LISA-like detectors could indirectly inform us of the processes that drive MBH mergers.

The gravitational influence of a second stellar-mass black hole on the evolution of an EMRI was considered in [17]. It was shown that a second black hole with a semi-major axis of \(\lesssim 10^{-5}\) pc could influence the orbital parameters of an EMRI ongoing in the same galaxy and already in the low-frequency GW band. This influence is chaotic, leading to an unpredictable evolution of the orbital parameters. It was estimated that in 1% of EMRIs a second black hole could be within the required distance. However, the timescale of the chaotic motion was \(\sim 10^9\) s, which is much longer than a typical low-frequency observation, and the system considered in [17] had an orbital period of \(6 \times 10^3\) s, which would probably not yet be detectable by LISA-like detectors. On the timescale of a GW mission, the effect would probably manifest itself as a linear drift in the orbital parameters, and therefore it is unlikely that this effect would prevent the detection of an EMRI signal. A full analysis of the effect on parameter estimation has not yet been carried out.

6.2.5 Properties of the phase space of orbits

Loss of the third integral. It was demonstrated by Carter [113] that the Kerr metric has a complete set of integrals: in addition to the energy and angular momentum that arise as conserved quantities in any stationary and axisymmetric spacetime, geodesics in the Kerr metric conserve the Carter constant, \(Q\). This is the analog of the third isolating integral found in some classical axisymmetric systems, and has been shown to arise due to the existence of a Killing tensor of the spacetime [463]. In the Schwarzschild limit, \(Q\) is the sum of the squared angular momentum components in the two equatorial directions. The Kerr metric is one of a very special class of metrics that have this property. In fact, Carter [113] demonstrated that it was the only axisymmetric metric not containing a gravitomagnetic monopole for which both the Hamilton-Jacobi and Schrödinger equations were separable. The special nature of the Kerr metric was emphasized by Will [472], who demonstrated that, in Newtonian gravity, motion about a body with an arbitrary set of multipole moments \(M_l\) possesses a third integral of the motion only if the multipole moments obey the conditions

$${M_{2l + 1}} = 0,\quad \,\,{M_{2l}} = m{({M_2}/m)^l}\quad {\rm{for}}\,{\rm{every}}\,l,$$

which are precisely the conditions satisfied by the mass moments of a Kerr black hole, Eq. (54).
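As a quick numerical illustration of this statement, the sketch below generates the Kerr mass moments from the standard relation \(M_l + iS_l = m(ia)^l\) (assumed here to be the content of Eq. (54), which lies outside this section) and checks them against the conditions above. The function names are hypothetical.

```python
def kerr_mass_moments(m, a, l_max):
    """Mass moments M_0..M_lmax of Kerr, taken as the real part of m (i a)^l."""
    return [(m * (1j * a) ** l).real for l in range(l_max + 1)]


def satisfies_will_condition(moments, tol=1e-12):
    """Check M_{2l+1} = 0 and M_{2l} = m (M_2 / m)^l for every supplied moment."""
    m = moments[0]
    m2 = moments[2]
    for order, value in enumerate(moments):
        if order % 2 == 1:
            expected = 0.0                           # odd mass moments must vanish
        else:
            expected = m * (m2 / m) ** (order // 2)  # even moments fixed by m and M_2
        if abs(value - expected) > tol * max(1.0, abs(expected)):
            return False
    return True


kerr = kerr_mass_moments(1.0, 0.7, 8)
bumpy = list(kerr)
bumpy[4] += 0.1  # an ad hoc excess hexadecapole, purely for illustration
```

Here `satisfies_will_condition(kerr)` holds, while the ad hoc deformation in `bumpy` violates the condition at \(l = 4\).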

The separability of the Kerr metric aids the analysis of inspirals in that spacetime, but it also suggests another potential observable that would show a deviation from the Kerr metric in a GW observation. The specialness of the third integral in the Kerr spacetime suggests that if a spacetime differed from the Kerr solution, even by a small amount, the third integral might vanish, which would potentially lead to the existence of chaotic orbits. If such orbits were observed it would be a clear “smoking gun” for a deviation from the Kerr metric. The existence of chaotic orbits in various spacetimes has been explored by several authors. The standard approach is via construction of a Poincaré map: a geodesic is computed in cylindrical coordinates (ρ, z, ϕ, θ), and the values of ρ and \({\dot \rho}\) recorded every time the particle intersects a specified plane z = constant. If the resulting plot of all these points on a (ρ, \({\dot \rho}\)) plane yields a closed curve, then a third integral exists; otherwise the orbit is chaotic. This is illustrated in Figure 7.

Figure 7: Poincaré map for a regular orbit (left panel) and a chaotic orbit (right panel) in the Manko-Novikov spacetime. Image reproduced by permission from [195], copyright by APS.
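The Poincaré-map construction described above can be sketched in a few lines. The example below is a Newtonian stand-in, not a relativistic geodesic calculation: it integrates motion in the meridional plane of an axisymmetric potential with a monopole `M` and a mass quadrupole `M2` (both illustrative choices), at fixed axial angular momentum `Lz`, and records (ρ, ρ̇) at each upward crossing of the plane z = 0. The fixed-step RK4 integrator and all names are assumptions made for this illustration.

```python
import math


def acceleration(rho, z, M, M2, Lz):
    """Minus the gradient of -M/r - M2 (2z^2 - rho^2)/(2 r^5) + Lz^2/(2 rho^2)."""
    r2 = rho * rho + z * z
    r3 = r2 ** 1.5
    r5 = r3 * r2
    r7 = r5 * r2
    A = 2.0 * z * z - rho * rho
    a_rho = -(M * rho / r3 + M2 * rho / r5 + 2.5 * M2 * A * rho / r7) + Lz**2 / rho**3
    a_z = -(M * z / r3 - 2.0 * M2 * z / r5 + 2.5 * M2 * A * z / r7)
    return a_rho, a_z


def energy(state, M, M2, Lz):
    """Conserved energy per unit mass for the state (rho, z, v_rho, v_z)."""
    rho, z, v_rho, v_z = state
    r2 = rho * rho + z * z
    phi = -M / math.sqrt(r2) - M2 * (2.0 * z * z - rho * rho) / (2.0 * r2**2.5)
    return 0.5 * (v_rho**2 + v_z**2) + Lz**2 / (2.0 * rho**2) + phi


def rk4_step(state, dt, M, M2, Lz):
    """Advance (rho, z, v_rho, v_z) by one classical Runge-Kutta step."""
    def deriv(s):
        a_rho, a_z = acceleration(s[0], s[1], M, M2, Lz)
        return (s[2], s[3], a_rho, a_z)

    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = deriv(state)
    k2 = deriv(shifted(state, k1, dt / 2.0))
    k3 = deriv(shifted(state, k2, dt / 2.0))
    k4 = deriv(shifted(state, k3, dt))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def poincare_section(state, dt, n_steps, M=1.0, M2=-2.0, Lz=math.sqrt(10.0)):
    """Collect (rho, v_rho) at upward crossings of z = 0, by linear interpolation."""
    points = []
    for _ in range(n_steps):
        new = rk4_step(state, dt, M, M2, Lz)
        if state[1] < 0.0 <= new[1]:
            w = -state[1] / (new[1] - state[1])
            points.append((state[0] + w * (new[0] - state[0]),
                           state[2] + w * (new[2] - state[2])))
        state = new
    return points, state


start = (10.0, 0.0, 0.02, 0.1)  # illustrative bound orbit: rho, z, v_rho, v_z
points, end = poincare_section(start, dt=0.5, n_steps=20000)
```

Plotting `points` in the (ρ, ρ̇) plane gives the surface of section for this orbit; points that trace out a closed curve indicate an effective third integral, whereas a scattered, area-filling set would indicate chaos.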

Chaotic motion has been found by Sota et al. for orbits in the Zipoy-Voorhees-Weyl and Curzon spacetimes [417]; by Letelier and Viera for orbits around a Schwarzschild black hole perturbed by GWs [282]; by Gueron and Letelier for orbits in a black-hole spacetime with a dipolar halo [213], and in prolate Erez-Rosen bumpy spacetimes [214]; and by Dubeibe et al. for some oblate spacetimes that are deformed generalizations of the Tomimatsu-Sato spacetime [157]. None of these examples represented systems that were small deviations from the Kerr metric. The only investigation to date of chaotic orbits in the context of LISA was by Gair et al. [195], who explored geodesic motion in a family of spacetimes due to Manko and Novikov [303] that had arbitrary multipole moments, but which included Kerr as a special case. Gair et al. [195] considered orbits in a family of spacetimes parameterized by a single “excess quadrupole moment” parameter, such that q = 0 represented the Kerr solution. They found that, while the majority of orbits in these spacetimes possessed an apparent third integral, chaotic orbits existed very close to the central object for arbitrarily small oblate deformations of the Kerr solution. As the spacetime was deformed away from Kerr, a second allowed region for bound geodesic motion was found to appear close to the central black hole, in addition to the allowed region present in the Kerr metric. Chaotic orbits were found only in this additional bound region. Gair et al. [195] concluded that this chaotic region would probably be inaccessible to an object that was initially captured at a large distance from the central object. This analysis was revisited in [294] but the conclusions in that paper were the same. The only difference was that the authors in [294] identified a region of stable motion within the inner region that contains chaotic orbits. 
The chaotic orbit shown in the right-hand panel of Figure 7 appears to pass in and out of the region of stability, which should not happen, so this might be a numerical artifact. However, the existence of chaotic orbits and the probable inaccessibility of these orbits to inspirals was confirmed by [294].

Brink [94] also explored integrability in arbitrary stationary, axisymmetric and vacuum spacetimes, concentrating on regions where the Poincaré maps indicated the presence of an effective third integral. Brink hypothesized that some spacetimes might admit an integral of the motion that was quartic in the momentum, in contrast to the Carter constant, which is quadratic. This hypothesis is as yet unproven. There is also a potential conflict with the example given in [195]. The Kolmogorov, Arnold, and Moser (KAM) theorem indicates that when a Hamiltonian system with a complete set of integrals is weakly perturbed, the phase-space motion will either be confined to the neighborhoods of the invariant tori, or the motion will be chaotic [434]. Thus, if there is a region of the spacetime where chaotic motion exists, there cannot be another region where a third invariant exists. A mathematical demonstration that the orbits can possess an approximate invariant, while technically being chaotic, is lacking at present, although this does appear to be the case from the numerical calculations [195]. It is also not entirely clear that the perturbation can be regarded as “small” everywhere, since the change in multipole moments necessarily changes the horizon structure and so the perturbation is infinitely large at certain points.

A discussion of the reason for the existence of chaotic geodesics in some spacetimes was given by Sota et al. [417]. They suggested that it would arise either from a change in sign of the eigenvalues of the Weyl tensor at a point, which would lead to a local “instability,” or from the existence of homoclinic orbits. The latter explanation applied only to non-reflection-symmetric spacetimes, while most spacetimes of astrophysical interest should be reflection symmetric. The Weyl-tensor analysis has not been carried out for the Manko-Novikov family of spacetimes [195]. One other proposed explanation was the existence of a region of closed timelike curves, which was found to touch the region in which chaotic motion was identified [195].

In conclusion, it seems unlikely that chaotic geodesics will be found in nature, but if they were identified we would know immediately that the spacetime was not Kerr. However, detecting chaotic motion from a GW observation is challenging. One possibility would be to observe the transition from regular to chaotic motion in a time-frequency analysis: the regular motion would be characterized by a few well-distinguished peaks in a Fourier transform of the signal, while chaotic motion would show a much broader band structure [195]. However, it is not clear that it would be possible to distinguish the chaotic phase from detector noise, and hence there would be no way to identify an inspiral that “ends” by entering a chaotic phase as opposed to one which ends at plunge into a black hole. If an orbit passed into a chaotic phase and then back into a regular phase we might see a signal turn “on” and “off” repeatedly. However, the chaotic motion would randomize the phase at the start of the regular motion, so to detect such a signal we would need each regular phase to be long enough that they were individually resolvable by matched filtering. This would require extreme fine tuning of the system parameters [195].

Persistent resonances. Eccentric and inclined EMRIs will generically pass through transient resonances at which the radial and polar frequencies become commensurate. For EMRIs in GR, these resonances will be isolated, and the transition through resonance will proceed on the usual radiation-reaction timescale but with a temporary modification in the energy flux on resonance [181]. However, according to the Poincaré-Birkhoff theorem, perturbing an integrable system causes a Birkhoff chain of islands to appear wherever the frequencies of the system are at resonance. Therefore, in a perturbed Kerr spacetime, another observable consequence would be that the EMRI frequencies could remain on resonance for many more cycles, providing another “smoking gun” for a deviation from Kerr [24, 294]. Detection of a persistent resonance in a matched-filtering search will require a modification of the search pipeline, but it should be considerably more straightforward than detection of chaos, as the signal will be coherent and could therefore be identified using time-frequency methods, or a phenomenological waveform model. However, this has not yet been studied in any detail.
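To make the notion of commensurate frequencies concrete, the following sketch finds the nearest low-order resonance to a given pair of orbital frequencies. It is only an illustration of the commensurability condition, not a search method from the cited works; the function name, the cap `max_order` on the denominator, and the assumption that the radial frequency is the smaller of the two are all choices made here.

```python
from fractions import Fraction


def nearest_low_order_resonance(omega_r, omega_theta, max_order=5):
    """Return (p, q, detuning) with omega_r / omega_theta close to p / q and q <= max_order.

    A vanishing detuning means q * omega_r = p * omega_theta, i.e. the two
    frequencies are commensurate.
    """
    ratio = omega_r / omega_theta
    best = Fraction(ratio).limit_denominator(max_order)
    return best.numerator, best.denominator, ratio - float(best)


p, q, detuning = nearest_low_order_resonance(2.0, 3.0)
```

Tracking how slowly `detuning` changes along an inspiral is one simple way to express the difference between a transient passage through resonance and a persistent one.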

As mentioned in Section 5.1.3, in massive scalar-tensor theories, a different type of persistent resonance can occur, in which a super-radiant scalar flux balances the GW flux [110, 495]. Such resonances can last a significant fraction of a Hubble time and so observing a single EMRI offers significant constraining power on the space of massive scalar-tensor theories. This is not a test of black-hole structure, since the resonance is between the scalar and gravitational fluxes, rather than in the geodesics of the central object, but the observable effects are similar so we mention it here.

6.2.6 Black holes in alternative theories

Kerr black holes. The Ryan mapping approach uses observations of precession frequencies as functions of the orbital frequency to extract the spacetime metric from GWs. However, even if the metric is found to be Kerr, this is not enough to verify GR. It was pointed out by Psaltis et al. [372] that the Kerr metric is a solution to the field equations for several alternative theories of gravity. Essentially, since the Kerr metric has vanishing Ricci tensor, \(R_{\mu\nu} = 0\), any theory in which the vacuum field equations depend only on \(R_{\mu\nu}\) will also admit Kerr as a solution. Allowing for a nonzero cosmological constant, \(\Lambda\), a black-hole solution in GR satisfies

$${R_{\mu \nu}} = {R \over 4}{g_{\mu \nu}},\quad R = 4\Lambda ,\quad {\rm{and}}\quad {R_{;\mu}} = 0,$$

in which \(R\) denotes the Ricci scalar and \(g_{\mu\nu}\) is the spacetime metric [372]. Psaltis et al. discussed four different alternative theories, already described in Section 2.1:

  • f(R) gravity in the metric formalism. If f(R) is expanded as a Taylor series \(a_0 + R + a_2 R^2 + \cdots\), there are three possible cases. (i) If \(a_0 = 0\), the Kerr solution, which corresponds to \(R = 0\), is always a solution to the equations of motion. (ii) If the Taylor series terminates at \(a_2\), all constant-curvature solutions of GR (with any Λ) remain exact solutions of the f(R) theory. (iii) If \(a_0 \neq 0\) and the series does not terminate at \(a_2\), constant-curvature solutions of GR will be solutions of the f(R) theory with different values of the curvature. The difference in curvature will be small, however.

  • f(R) gravity in the Palatini formalism. In this case, any constant-curvature solution of GR is also a solution to these equations, with the same Christoffel symbols. This is unsurprising, as it is known that Palatini f(R) gravity reduces to GR in vacuum [54].

  • General quadratic gravity. For any black-hole solution in GR, both of the additional tensors that appear in the field equations (18) vanish, and the field equations reduce to those of GR. Hence all black-hole solutions of GR are solutions in this theory.

  • Vector-tensor gravity. In this case, we find once again that all constant-curvature solutions of GR are solutions to the equations, but with a shifted value of the curvature that depends on the vector field strength: \(R = 16\Lambda/(4 + (4\omega + 3\eta)K^2)\).
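The three cases in the first item of the list above can be checked with a short calculation. This is a sketch that assumes the standard vacuum field equations of metric f(R) gravity, which are not restated in this section, and the identification \(a_0 = -2\Lambda\) for the GR limit \(f = R - 2\Lambda\). For a constant-curvature metric with \(R_{\mu\nu} = (R/4)g_{\mu\nu}\), all derivatives of \(f'(R)\) vanish and the field equations reduce to the algebraic condition

$$f'(R)\,R - 2f(R) = 0.$$

For the truncated series \(f(R) = a_0 + R + a_2R^2\) this becomes

$$(1 + 2{a_2}R)R - 2({a_0} + R + {a_2}{R^2}) = - R - 2{a_0} = 0,$$

so \(R = -2a_0 = 4\Lambda\) for any value of \(a_2\), which is case (ii). Each higher-order term \(a_nR^n\) contributes \((n - 2)a_nR^n\) to the left-hand side; these contributions all vanish at \(R = 0\), so \(R = 0\) remains a solution whenever \(a_0 = 0\), which is case (i). With \(a_0 \neq 0\) and, for example, a cubic term, the condition becomes

$$- R - 2{a_0} + {a_3}{R^3} = 0,$$

whose root is shifted away from \(R = -2a_0\) by an amount of order \(a_3a_0^3\), which is case (iii).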

The action for general quadratic gravity introduced by Stein and Yunes [504] also admits the Kerr solution, but only in the non-dynamical version of the theory in which the functions are constants. In that case the field equations are once again satisfied by spacetimes with \(R_{ab} = 0\), and so the vacuum solutions of GR are solutions to the field equations in these theories as well. In the dynamical version of the theory, the Riemann tensor enters the field equations explicitly and so \(R_{ab} = 0\) is no longer sufficient to satisfy them. We will discuss those black-hole solutions in the following subsection.

Although it is true that all of these theories admit the Kerr metric as a solution, this does not mean that we have no way to distinguish between them via GW observations. This was not discussed in [372], but is argued in a comment on that paper by Barausse and Sotiriou [54]. First, the uniqueness theorems of GR [351, 224, 248, 114, 379] do not necessarily apply in these alternative theories. In other words, just because the Kerr metric is a solution does not mean that we would expect it to form as a result of gravitational collapse. This is an equally important consideration when asking what we would expect to see in the universe, although this argument can be sidestepped by suitable fine tuning. In f(R) gravity, the metric of a spherically-symmetric body is not the Schwarzschild metric, but has a Yukawa correction [109]. The constraints that LISA could place on such a deviation from the Kerr solution were investigated in [66]. Expanding f(R) to quadratic order (\(R + a_2R^2/2\)), it was found that EMRI observations could place a bound \(|a_2| \lesssim 10^{17}\ {\rm m}^2\), about an order of magnitude better than the bound from observations of planetary precession in the solar system, \(|a_2| < 1.2 \times 10^{18}\ {\rm m}^2\). However, the bound from the Eöt-Wash laboratory experiments is many orders of magnitude better, \(|a_2| < 2 \times 10^{-9}\ {\rm m}^2\) [116, 66].

The GW constraints will be obtained in a very different curvature regime and could be of interest if something like the “chameleon mechanism” is invoked. The chameleon mechanism was introduced to allow f(R) models to explain cosmological acceleration without violating laboratory and solar-system constraints [262]. It is a nonlinear effect that could arise when the curvature is very different from the background value, e.g., in the vicinity of matter. If the matter density is high the scalar degree of freedom in f(R) gravity acquires an effective mass, which means that the effective coupling to matter becomes much smaller than the bare coupling, which is relevant on cosmological scales. Therefore the bare coupling could be much higher than inferred from laboratory constraints, allowing the theories to explain cosmological acceleration (see [146] for a full description of the mechanism and complete references). In a similar way, the effective coupling in the vicinity of a compact object could in principle be different from that in the laboratory and so the weak constraints from gravitational-wave observations are still interesting because they probe a different curvature scale.

Subsequent to the publication of [66], it has been shown that the end state of gravitational collapse in f(R) theories (and scalar-tensor theories) is not the point-mass limit of an extended body, but is in fact the Kerr solution [420]. Therefore the results in [66] do not apply to black holes in f(R) gravity, but would apply for an exotic horizonless compact object if one existed. The constraints that low-frequency GW observations could place on this metric can also be considered to be constraints on a strawman Yukawa-like deviation from the Kerr solution, without reference to a specific theory in which such corrections arise.

Another consideration is that the spacetime metric is not the only GW observable. GWs are generated by perturbations of the spacetime and hence depend on the full dynamical sector of the theory. Therefore the response to perturbations will be different if the field equations are different, even if the unperturbed metric is the same. In [54] it was demonstrated that, for metric f(R) gravity, linearizing about the Minkowski spacetime gives rise to massive graviton modes in addition to the standard transverse-traceless modes of GR. These cannot be zeroed out by a gauge transformation. To date, no one has considered the generation of such massive modes in a binary system nor the observational consequences. However, if these modes are generated, there would be two natural ways to find them: a direct GW detection (the modes have different polarization states and propagation velocities), and an indirect detection, as the binary system would inspiral faster than expected due to loss of additional energy in the extra modes. It is not clear whether the latter will be distinguishable from tidal interaction effects, and quantitative estimates of the power of tests that would be possible with LISA-like detectors have not yet been done.

A second example of an alternative theory in which a spacetime metric is the same as a GR solution but behaves differently perturbatively is dynamical Chern-Simons (CS) gravity. The nonrotating-black-hole solution in that theory is Schwarzschild, as in GR, but it was shown in [348] that there are 7PN and 6PN corrections respectively to the scalar and gravitational energy flux radiated to infinity by a circular EMRI. This could lead to a 1-cycle difference in waveform phasing over a year of inspiral if the CS coupling parameter |ζ| ≳ 0.1. In [483] post-Newtonian results were obtained for the emission from spinning black-hole binaries in a general quadratic-gravity theory. One model within the general class considered corresponds to dynamical CS gravity and for that case the results were found to be consistent with the perturbative calculations in [348].

In interpreting GW observations, it will be necessary to verify both the static and dynamic aspects of the theory. The Ryan mapping algorithm is the natural way to start, but if a metric is found to be consistent with Kerr it will then be necessary to verify that the observed GWs and energy loss are also in agreement before concluding that we are observing a system consistent with a Kerr black hole in GR.

Non-Kerr black-hole solutions. Certain alternative theories of gravity do not admit the Kerr metric as a solution: these include the dynamical version of general quadratic gravity described by Eq. (19). In general, it is difficult to solve the field equations in alternative theories, so few analytic black-hole solutions are known outside of GR. Some solutions are known, but only under certain approximations. Solutions for slow rotation are known in dynamical CS gravity, which is a special case of Eq. (19) with α1 = α2 = α3 = 0, to leading order in spin [496] and now to quadratic order [488]. Slow-rotation solutions are also known in Einstein-Dilaton-Gauss-Bonnet (EDGB) gravity [311], in which gravity is coupled to dilaton and axion fields. In the weak-coupling limit, spherically symmetric and stationary solutions to the general quadratic-gravity theory described by Eq. (19) take the same form as the solutions to EDGB gravity and were derived in [504]. Solutions to this same class of theories are also known for arbitrary coupling, but only under the slow rotation approximation, and these were given in [349].

In dynamical CS gravity, it was shown that the metric describing a slowly rotating black hole at linear order in the spin [496] differs from the slow-rotation limit of the Kerr metric beginning at the fourth multipole, l = 4 [416]. It was also shown that the equations describing perturbations of this metric in dynamical CS gravity coincide with those of GR at leading order in spin and coupling parameter, with corrections due to the sourcing of metric perturbations by the scalar field entering at higher order. It was therefore argued that the deviations from GR will manifest themselves primarily through differences in the geodesic orbits in the spacetime. This is completely analogous to spacetime mapping within GR, as described in Section 6.2.1, so the prospects for detection of such deviations with LISA-like detectors are comparable. The energy-momentum tensor of the GWs was also shown to take the same form in terms of the metric perturbation as it does in GR, so the leading-order correction to the GW energy flux is determined by these modifications to the conservative dynamics. Energy balance also applies in this context and can be used to relate the adiabatic evolution of an orbit to the flux of energy and angular momentum at infinity. However, energy and angular momentum are also carried to infinity by the scalar field, and these corrections to the evolution were not estimated in [416]. Considering non-inspiraling geodesic orbits, Sopuerta and Yunes [416] estimated that IMRI or EMRI events observed by LISA would be able to put constraints on the coupling parameter ξ of dynamical CS gravity of \({\xi ^{1 \over 4}} \lesssim {10^2}\) km. A more complete study that included inspiral and used a Fisher matrix analysis to account for parameter correlations is described in [103, 104]. There it was found that LISA observations of EMRIs could place a bound \({\xi ^{{1 \over 4}}} < {10^4}\) km.
This is somewhat weaker than the bound estimated for IMRIs in [416], but better than the estimate for EMRIs in that paper, so IMRI systems (involving a \(\sim 10\,M_\odot\) object inspiraling into a \(\sim 10^3\,M_\odot\) object) may be able to place even more stringent constraints than originally estimated, if they were observed. The EMRI bound \({\xi ^{{1 \over 4}}} < {10^4}\) km is four orders of magnitude better than the best solar-system bound, which is based on data from Gravity Probe B and LAGEOS satellites [169, 122].

In [496] the bound from current binary-pulsar observations was estimated to be four orders of magnitude better than that in the solar system and hence comparable to the expected GW result. However, this bound was based on an upper limit on the rate of precession. It was argued in [13] that an upper bound could not place a constraint on the CS deviation since the sign of the CS contribution is opposite to that in GR, and so a bound could only be placed if a lower bound on the precession lying below the GR value could be found. Therefore the solar-system bound is the best current constraint, which GW observations will improve by several orders of magnitude. The Chern-Simons black-hole metric has recently been derived to quadratic order in the spin [488], so the results discussed here can now be extended to this higher-order metric. This second-order solution has several interesting properties: in particular, the Petrov type (see footnote 7) is type I, whereas it was type D in the linear-spin case; furthermore, there is no second-order Killing tensor, which means orbits do not have a third Carter-constant-like integral of the motion. This could have important observable consequences for GWs from EMRIs, as discussed in Section 6.2.5. The analogous calculation to [103], the evolution of an EMRI in the strong-field region of the metric, has not yet been carried out for the second-order CS metric. However, the GW emission from quasi-circular binaries in this theory has been determined without any restriction on the spin magnitude or coupling, but in the post-Newtonian limit [487], i.e., with a restriction on the magnitude of the velocity. This waveform model was constructed using energy balance and expressions for the gravitational-wave energy flux emitted in general quadratic-gravity theories that were derived in [483]. The model was used to estimate possible GW constraints on CS gravity.
These were given in Section 5.1.3 and are two orders of magnitude better than the EMRI constraints estimated using the linear-spin metric [103]. These constraints were derived in a very different mass-ratio regime (a binary with mass ratio of 1:2), but the fact that they are so much better is probably not surprising, since the quadratic-in-spin corrections to the metric enter at a lower post-Newtonian order than the linear-spin corrections. Therefore it is likely that the bounds that EMRIs could place on CS gravity are much stronger than quoted here. This possibility, and possible qualitative signatures of the loss of the third integral in the higher-order CS metrics, should be further investigated.

In the small-coupling limit, it can be shown that the energy-momentum tensor of GWs in the general quadratic-gravity theory [Eq. (19)] also follows the same quadrupole formula as in GR [426]. Using this result, a post-Newtonian expression for the energy flux emitted from a quasi-circular binary in these theories was obtained in [483]. Corrections to the energy flux come from changes to the conservative dynamics, i.e., to the orbits that test particles follow in the background metric, from energy lost in scalar field radiation and from the contribution to the GW energy-momentum integral from metric perturbations sourced by the scalar field. The authors found that, for these theories, the scalar dipole radiation dominates the correction to the energy flux and enters at a post-Newtonian order of “−1”, i.e., at a power of v/c one order before the GW energy flux of GR. There are also 0PN and 2PN corrections to the flux from the scalar-metric interaction and the modification to the conservative dynamics.
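To make the post-Newtonian bookkeeping concrete, the sketch below evaluates the velocity scaling of a correction that enters at a given PN order relative to the leading GR flux. It assumes the usual convention that an nPN term carries a relative factor (v/c)^(2n); the function name `relative_pn_factor` is hypothetical, and the theory-dependent coupling constants that multiply each term (and that are small in the weak-coupling limit) are deliberately left out.

```python
def relative_pn_factor(v_over_c, pn_order):
    """Velocity scaling of an n-PN term relative to the leading GR flux.

    Assumes the standard counting in which each PN order corresponds to
    two powers of v/c. Coupling-constant prefactors are not included.
    """
    return v_over_c ** (2 * pn_order)


v = 0.1  # illustrative orbital velocity in units of c

for order in (-1, 0, 2):
    print(f"{order:+d}PN term scales as {relative_pn_factor(v, order):.3g}")
```

For v/c = 0.1 the "−1PN" dipole term is enhanced by a factor of 100 relative to the 0PN term, which is why it dominates the correction at the low velocities typical of early inspiral.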

The observability of these waveform differences with space-based gravitational detectors has, so far, only been directly estimated for the specific case of Einstein-Dilaton-Gauss-Bonnet gravity [481]. In that case, the constraints derivable with space-based GW detectors were compared to those that can be obtained from observations of low-mass X-ray binaries. Some of these are observed to have orbital decay rates that are larger than predicted by radiation reaction in GR. Assuming this excess orbital-decay rate arises from additional scalar radiation emitted from the binary, it is possible to obtain a bound six times stronger than that derivable from solar-system experiments (using the binary A0620-00) of \(\sqrt {|\alpha |} < 1.9 \times {10^5}\) cm. eLISA would be able to place a bound that is slightly stronger (a factor of two) than this, while combining multiple DECIGO observations would yield a bound three orders of magnitude better [481].

For other theories in this general quadratic class, GW bounds have not yet been directly assessed. However, the “post-Einsteinian” parameters have been calculated for circular-equatorial inspirals in this class of theories. The parameters characterizing the modification to the GR waveform in the ppE framework are defined through the expression

$$\tilde h = {\vert}{\tilde h_{{\rm{GR}}}}{\vert}(1 + \alpha {\eta ^c}{u^a})\exp [{\rm{i}}\Psi _{{\rm{GW}}}^{{\rm{GR}}}(1 + \beta {\eta ^d}{u^b})],$$

in which \(\eta = m_1 m_2/(m_1 + m_2)^2\) is the symmetric mass ratio of a binary with component masses \(m_1\) and \(m_2\), \(u = \pi {\mathcal M} f\), \({\mathcal M} = \eta^{3/5}(m_1 + m_2)\) is the chirp mass, and \({{\tilde h}_{{\rm{GR}}}}\) and \(\Psi _{{\rm{GW}}}^{{\rm{GR}}}\) are the general-relativistic waveform and waveform phase respectively. Under the weak-coupling approximation, the metric of non-spinning black holes in these general quadratic-gravity theories depends only on a parameter \(\zeta \propto \alpha _3^2\). For mergers of such black holes, the ppE parameters were found to be α = (5/6)ζ, β = (50/3)ζ, a = 4/3 = b, c = −4/5 = d [504]. More recently, the ppE corrections to the waveform phase (which are parametrized by b and β) were found without making a weak-coupling approximation and for binaries with spinning components. The exponent depends on whether the black holes are spinning and on which coefficients of the theory are allowed to be non-zero. The authors considered the cases of odd parity (i.e., including the Pontryagin term, α4 ≠ 0) and even parity (i.e., including the other terms quadratic in the curvature, α1, α2, α3 ≠ 0) separately. The exponent of the ppE phase correction was found to be b = −7/3 for spinning black holes in even-parity theories, b = −1/3 for spinning black holes in odd-parity theories, and b = 3 for non-spinning black holes [483]. The authors also obtained equations, which we do not present here, relating the amplitude of the ppE phase correction, β, to the coupling constants of the theory. If a GW observation is used to place bounds on these ppE parameters, these expressions can be used to translate the bound into a constraint on this set of alternative theories of gravity.
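As an illustration of how the ppE parametrization acts on a frequency-domain waveform, the following minimal sketch applies the amplitude and phase corrections defined above to a user-supplied GR amplitude and phase. It takes the phase exponent to be b, as in the surrounding text. The function names, the keyword defaults, and the choice of units (masses in solar masses, converted to seconds with G = c = 1) are assumptions made for this example only; the GR amplitude and phase must come from a separate waveform model.

```python
import numpy as np

MSUN_SEC = 4.925491e-6  # G * M_sun / c^3 in seconds


def symmetric_mass_ratio(m1, m2):
    """eta = m1 m2 / (m1 + m2)^2."""
    return m1 * m2 / (m1 + m2) ** 2


def chirp_mass(m1, m2):
    """Chirp mass eta^(3/5) (m1 + m2), in the same units as m1 and m2."""
    return symmetric_mass_ratio(m1, m2) ** 0.6 * (m1 + m2)


def ppe_waveform(h_gr_amp, psi_gr, f, m1, m2,
                 alpha=0.0, a=0.0, c=0.0, beta=0.0, b=0.0, d=0.0):
    """Apply ppE amplitude and phase corrections to a GR waveform.

    h_gr_amp, psi_gr : GR amplitude and phase sampled at frequencies f (Hz).
    m1, m2           : component masses in solar masses.
    The defaults (alpha = beta = 0) return the uncorrected GR waveform.
    """
    eta = symmetric_mass_ratio(m1, m2)
    # Dimensionless frequency variable u = pi * M_chirp * f.
    u = np.pi * chirp_mass(m1, m2) * MSUN_SEC * np.asarray(f, dtype=float)
    amplitude = np.abs(h_gr_amp) * (1.0 + alpha * eta ** c * u ** a)
    phase = np.asarray(psi_gr) * (1.0 + beta * eta ** d * u ** b)
    return amplitude * np.exp(1j * phase)
```

Setting `alpha` and `beta` from a specific theory (for example the weak-coupling values quoted above, for some chosen ζ) then gives the modified waveform on the same frequency grid.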

As discussed in Section 5.2.2, the authors of [458] considered more general possible forms for black-hole solutions without specifying a particular alternative theory. By imposing the existence of a Carter-like third integral of the motion, asymptotic flatness, and the “peeling theorem” (see footnote 8), and by setting as many metric components to zero as possible while still recovering all known modified gravity spacetime metrics, they obtained a family of generic modified black-hole solutions. It was subsequently realized that all solutions admitting a Carter constant had been found in [65], but [458] identifies the physically-realistic subset of these general solutions. This approach is suited to the construction of eccentric, inclined inspirals, and in [201] the GWs for generic EMRIs occurring in these metrics were constructed by extending the “analytic-kludge” framework [46]. As yet, these waveforms have not been used to assess the ability of a space-based detector to constrain such deviations from the Kerr solution, but work in this area is ongoing.

6.2.7 Interpretation of observations

The previous sections have identified some of the possible causes of deviations in the structure of the central object from the Kerr metric, as well as some observational consequences of these deviations. Nevertheless, interpreting GW observations correctly is a nontrivial challenge. Our working assumption is that the massive compact objects that occupy the centers of most galaxies are indeed Kerr black holes. Therefore it is reasonable to design an approach to spacetime mapping that looks for inspirals into Kerr black holes and quantifies any deviations from such inspirals that may be present. One such approach to spacetime mapping was described in [191]. The starting point is to assume that GR is correct, and that the source’s spacetime is vacuum and axisymmetric. The multipole moments can be extracted from the precession frequencies via Ryan’s algorithm, and then the expected GW emission and inspiral rate for such a spacetime in GR can be computed.

If the observations are consistent there is no evidence to contradict the initial assumptions, but we can still do several tests, by asking the following questions: (i) Does the “no-hair” property hold for the multipoles? If there is a deviation, the spacetime must either contain closed-timelike curves, or lack a horizon. (ii) Does the emission cut off at plunge or does the radiation persist? (iii) Is the tidal interaction consistent with a Kerr black hole? If the radiation still cuts off at a plunge and the multipole structure is not consistent with the Kerr metric, it might indicate the presence of a naked singularity, which would be a violation of the cosmic-censorship hypothesis.

If the observations are not consistent with GR, the first assumption to relax would be that the spacetime was vacuum. In principle, it might then be possible to deduce both the multipole moments of the spacetime and the energy-momentum tensor of the matter distribution from the observations. The GW could then be computed for such a spacetime in GR, and checked for consistency with the observations. If the observations are consistent, similar questions can be asked: (i) Does the emission cut off at plunge or does the radiation persist? (ii) Is the tidal interaction consistent with a Kerr black hole? (iii) Does the matter distribution obey the strong, null, and weak energy conditions, or have we identified an exotic matter distribution? Only if the observations are inconsistent will there be evidence of a failure in GR itself.

The main complication to this approach is that the detection of EMRIs will rely on matched filtering using waveform templates. If a system differs significantly from a Kerr inspiral, then it may not be picked out of the data stream. An alternative approach was discussed by Brink [93]. She suggested that the spacetime-mapping problem could be thought of as analogous to inverse-scattering problems in quantum mechanics, where the potential to be determined is the Ernst potential [165] (see discussion in Section 6.2.1), but the technical details of such a method have not yet been worked out. Brink also suggested an approach with a similar philosophy to what we discuss above: assuming that the instantaneous geodesic is triperiodic (i.e., that it has a complete set of integrals), there is a good chance it will have a high overlap with a Kerr geodesic. Thus, each “snapshot” along the inspiral defines a point in the Kerr geodesic space, and so matched filtering with Kerr snapshots can pick out an inspiral trajectory. The inspiral trajectory in the Kerr spacetime can be computed, and we can check to see if the observed inspiral deviates from the predicted one. This approach has two drawbacks: first, it relies on the GWs from a geodesic in an arbitrary spacetime having a high overlap with those from at least one Kerr geodesic; second, the interpretation of deviations in the trajectory relies on being able to compute GW emission in alternative metrics in GR, or in alternative theories of gravity. However, these will be problems common to all spacetime-mapping algorithms. One approach to addressing the first of these problems is to search for EMRIs using a template-free approach. This could be done by searching for individual harmonics of the EMRI signal separately. One method would be to search for tracks of excess power in the time-frequency domain [194, 200, 467].
However, excess-power algorithms lose information on the phase of the waveform, which is the most important observable for carrying out tests of GR. More recently, a method that attempts to match the phase of individual harmonics using a Taylor expansion with a small number of coefficients was described in [464], which built on ideas used to search for GR EMRIs in the Mock LISA Data Challenges [35]. While the method appears promising, it must be kept in mind that so far these approaches have only been used on highly simplified datasets containing a single EMRI source. Issues related to confusion between the harmonics of the multiple sources expected in real data have not yet been properly explored. Nonetheless, these techniques warrant further investigation.

In summary, it is clear that low-frequency GW detectors will be able to check that an EMRI is consistent with Kerr, which can be interpreted as a statement on how much the spacetime could differ from the Kerr metric [18, 47] and still be consistent with our search templates. What is more difficult is to say exactly what LISA will tell us if we observe something that does differ from the Kerr metric. More work is still needed in this area.

6.2.8 Extreme-mass-ratio bursts

Another class of low-frequency GW sources closely connected to EMRIs is extreme-mass-ratio bursts (EMRBs) [385]. These are the precursors to EMRIs: in the standard EMRI formation channel, compact objects are perturbed onto orbits that pass very close to the central black hole, emitting bursts of GWs as they do, which can lead to capture if sufficient orbital energy and angular momentum are lost to GWs. Furthermore, just after capture, EMRI compact objects proceed on very eccentric orbits that radiate in bursts near pericenter rather than continuously. Similar bursts will be generated by “failed” EMRIs, that is, objects that only encounter the central black hole a few times before being perturbed onto an unbound or plunging orbit.

For a sufficiently nearby system (in the Virgo cluster or closer [385]), these bursts could be individually detected by LISA-like detectors. The low SNR of such events compared to EMRIs is partially outweighed by the significantly larger number of compact objects undergoing “flybys” at a given time, so the detection rate could be as high as \(18\ {\rm yr}^{-1}\), of which \(15\ {\rm yr}^{-1}\) would originate in the Milky Way [385]. These rate estimates employed several simplifying assumptions. A more careful analysis predicted only ∼ 1 event per year [236], and a proper analysis of our ability to identify such bursts in the presence of signal confusion has not been carried out. Therefore it is unclear at present whether any of these events would be identified in the data of LISA-like detectors.

It has been suggested [502] that EMRBs could be used to test black-hole structure in a similar way to EMRIs. This assertion was based on comparing approximate burst waveforms coming from spinning and nonspinning central black holes with the same system parameters. In [68], an analysis of parameter estimation for EMRBs accounting for parameter correlations was performed, which concluded that a space-based detector like LISA could measure many of the parameters of the system to moderate precision, including the mass and spin, provided that the periapse of the burst orbit was within \(\sim 10\,GM/c^2\). This analysis did not consider sky position or eccentricity to be free parameters, taking the former to be the center of the Milky Way, and the latter to be maximal, and it treated the distance and compact-object mass as a single amplitude parameter m/D. The small object mass cannot be separately determined because there is no significant orbital evolution over the duration of one burst. The analysis did not consider constraints on GR through measurements of other multipoles, but it was found that the mass and spin could both be measured to high accuracy for orbits that pass sufficiently close to the black hole. Therefore it may also be possible to place some kind of constraint on GR deviations using these systems. However, the event rate for such relativistic systems is very low [236] so we will have to be lucky to see a useful event.

Useful burst events might also be seen from nearby galaxies other than our own [67], which are thought to have black holes in the appropriate mass range. Other unknown black holes in the local universe could also provide sources, but for such extra-galactic systems, the sky position will be unknown and so a single burst will not provide enough information to allow the determination of all the system parameters. If an EMRB was observed to “repeat” several times over the lifetime of a GW mission, then more information could be derived about the system. However, there are serious practical issues with associating multiple, well-separated bursts with the same source. The inspiral orbit might also be perturbed between pericenter passages by encounters with other stars. The prospects for doing useful tests of black-hole structure with EMRBs therefore seem rather bleak. Moreover, any tests that can be carried out will inevitably be much weaker than those based on EMRIs, as an EMRB event will generate several waveform cycles rather than the several hundred thousand from an EMRI.

6.3 Tests of black-hole structure using ringdown radiation: black-hole spectroscopy

When a black hole is perturbed, it rapidly settles back down to a stationary Kerr state via the radiation of exponentially damped sinusoidal GWs known as quasi-normal modes (QNMs). For a Kerr black hole in GR, the QNM frequencies and damping times are uniquely determined by the mass and spin of the black hole. Observations of two or more QNMs from the same system will therefore test that the end-state of the merger is a Kerr black hole, if they provide consistent estimates for the mass and spin of the central object. QNMs will be excited during the merger of supermassive black holes and will dominate the late-stage radiation, so LISA-like detectors should observe QNM radiation with relatively high SNRs [182, 155, 77]. In fact, the SNR from the QNMs alone may be comparable to the total SNR in the inspiral [182]. This is illustrated in Figure 8, which shows the SNR with which LISA would detect the inspiral and ringdown for equal-mass, nonspinning MBH binaries as a function of black-hole mass. The ringdown dominates the SNR for redshifted masses greater than \(3 \times 10^6\,M_\odot\).

Figure 8

Comparative SNRs, as a function of redshifted black-hole mass (1 + z)M, for the last year of inspiral of an equal-mass MBH binary and for the ringdown after the merger of the system. The method used to generate this figure follows that of [182], updated to use a modern LISA sensitivity curve [277] with a low-frequency cutoff of \(f = 1 \times 10^{-4}\) Hz. The redshift is set to z = 1, at which the luminosity distance is \(D_L = 6.6\) Gpc using WMAP 7-year parameters [271].

Reviews of the theory of quasi-normal modes can be found in Kokkotas and Schmidt [270], in Nollert [337] and in Berti, Cardoso and Starinets [76], but we will briefly summarize the relevant results here. At late times, the gravitational radiation from a perturbed black hole can be written as a sum of exponentially-damped sinusoids. It is known that QNMs do not form a complete set in a mathematical sense, but numerical results confirm that the late-time behavior is well described by such an expansion [77]. The angular dependence can be decomposed into a sum of spheroidal harmonics of spin weight 2, which are labeled in analogy to standard spherical harmonics. For each (l, m), there are infinitely many resonant QNMs, which can be labeled in order of decreasing damping time by a third parameter n. The waveform expansion takes the form

$${h_ +} + {\rm{i}}{h_ \times} = {M \over r}\sum\limits_{lmn} {{{\mathcal A}_{lmn}}{{\rm{e}}^{{\rm{i}}({\omega _{lmn}}t + {\phi _{lmn}})}}{{\rm{e}}^{- t/{\tau _{lmn}}}}{S_{lmn}}} ,$$

where \(\omega_{lmn}\), \(\tau_{lmn}\), \({{\mathcal A}_{lmn}}\), and \(\phi_{lmn}\) are the frequency, damping time, amplitude, and initial phase of the mode, and \(S_{lmn}\) is a complex number obtained by evaluating the spheroidal harmonics for the particular orientation and mode frequency. The frequency and damping time, \(\omega_{lmn}\) and \(\tau_{lmn}\), are determined solely by the mass and spin of the central black hole and are the key observables. The damping time can also be considered as the reciprocal of the imaginary part of a complex mode frequency. The amplitude and phase depend on the exact nature of the perturbation that excited the QNMs, so they are less useful. One complication is that the mode numbers that are excited [i.e., the (l, m, n) parameters] are not known a priori. In the first analysis of SNR from QNMs [182], it was assumed that only the l = m = 2, n = 0 mode was excited, which simplifies analysis of the observations. However, this simplifying assumption is not needed in general.
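The ringdown expansion above is straightforward to evaluate numerically. The sketch below sums a list of modes, each supplied as a dictionary; the function names and dictionary keys (`amplitude`, `phase`, `omega`, `tau`, `S`) are illustrative choices for this example, and the per-mode values must be provided by the user (they are not computed from the black-hole mass and spin here). It also shows the complex-frequency form, in which the damping time is the reciprocal of the imaginary part.

```python
import numpy as np


def complex_frequency(omega, tau):
    """Complex mode frequency omega + i / tau, so that
    exp(i * w * t) = exp(i * omega * t) * exp(-t / tau)."""
    return omega + 1j / tau


def ringdown_strain(t, modes, mass_over_distance=1.0):
    """Evaluate h_plus + i h_cross as a sum of damped sinusoids.

    t     : array of times measured from the start of the ringdown.
    modes : list of dicts with keys 'amplitude', 'phase', 'omega', 'tau'
            and 'S' (the complex spheroidal-harmonic factor).
    """
    t = np.asarray(t, dtype=float)
    h = np.zeros_like(t, dtype=complex)
    for mode in modes:
        h += (mode["amplitude"] * mode["S"]
              * np.exp(1j * (mode["omega"] * t + mode["phase"]))
              * np.exp(-t / mode["tau"]))
    return mass_over_distance * h


# Example with one made-up mode (values are placeholders, not physical fits).
example_mode = {"amplitude": 0.4, "phase": 0.0, "omega": 1.5, "tau": 2.0, "S": 1.0 + 0.0j}
example_strain = ringdown_strain(np.linspace(0.0, 10.0, 5), [example_mode])
```

Because only the frequency and damping time of each mode carry information about the mass and spin, an analysis would typically fit `omega` and `tau` per mode and treat the amplitudes and phases as nuisance parameters.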

The use of QNMs as a probe of black-hole structure was first proposed by Dreyer et al. [155], who coined the phrase “black-hole spectroscopy” for this technique. A measurement of \(\omega_{lmn}\) and \(\tau_{lmn}\) for a single QNM will give several discrete solutions for the black-hole mass and spin, corresponding to different choices for the mode numbers. A measurement of a second QNM will give another discrete set of values, and in general these will not be the same as the first set except for one combination, which gives the true values [155]. This not only determines the black-hole parameters, but provides a test that the central object is a Kerr black hole, since the relationship between black-hole parameters and frequencies depends on that assumption. If the object was something else, we would find no consistent combination of mass and spin parameters to arise from the analysis. Dreyer et al. [155] considered a frequentist approach to analyzing the QNM problem. The observable is a set of complex QNM frequencies. We denote a set of choices for the N modes to which the frequencies correspond by \({\mathcal Q}\). Each \({\mathcal Q}\) defines a two-dimensional surface in the 2N + 2 dimensional space, parameterized by (a, M). The probability distribution for the observed frequencies, ω, may be denoted by \(P(\omega |a,\,M,\,{\mathcal Q})\) and can be used to define a confidence interval by determining the p0 such that

$$\int\nolimits_{\{\omega :P(\omega {\vert}a,M,{\mathcal Q}) > {p_0}\}} \,\,P(\omega {\vert}a,M,{\mathcal Q}){\rm{d}}{\omega ^{2N}}.$$

Dreyer et al. [155] deemed that an observation was inconsistent with GR if the actual observed frequencies lay outside such confidence intervals for all possible choices of a, M and \({\mathcal Q}\). Using a toy model for a non-black-hole source, and assuming that two QNMs were observed, drawn from four possible excited modes, they estimated that, with a false alarm probability of 1%, the false dismissal rate would be 60% if the SNR in the weakest mode was 10, falling to 10% at an SNR of 100. Assuming a reasonable amount of energy deposition into the QNMs, they estimated that this would be satisfied by all LISA MBH merger sources, so the prospects for such tests are good.
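The step from a measured frequency and damping time to a mass and spin can be sketched for the l = m = 2, n = 0 mode. The example below is not the method of [155]: it assumes Echeverria's approximate analytic fit for this mode, f ≈ [1 − 0.63(1 − a)^{3/10}]/(2πM) and Q = πfτ ≈ 2(1 − a)^{−9/20}, which is accurate only at the few-percent level, and all function names are hypothetical. A second mode would need its own fit (not given here); the final helper simply checks whether two such (mass, spin) estimates agree within a chosen tolerance.

```python
import math

MSUN_SEC = 4.925491e-6  # G * M_sun / c^3 in seconds


def qnm_220(mass_msun, spin):
    """Approximate frequency (Hz) and damping time (s) of the l=m=2, n=0 mode.

    Uses Echeverria's fit; an assumption of this sketch, not an exact result.
    """
    m = mass_msun * MSUN_SEC
    f = (1.0 - 0.63 * (1.0 - spin) ** 0.3) / (2.0 * math.pi * m)
    quality = 2.0 * (1.0 - spin) ** (-0.45)
    tau = quality / (math.pi * f)
    return f, tau


def mass_spin_from_220(f, tau):
    """Invert the fit: recover (mass in M_sun, dimensionless spin) from f, tau."""
    quality = math.pi * f * tau
    spin = 1.0 - (2.0 / quality) ** (1.0 / 0.45)
    m = (1.0 - 0.63 * (1.0 - spin) ** 0.3) / (2.0 * math.pi * f)
    return m / MSUN_SEC, spin


def estimates_agree(est_a, est_b, rel_tol_mass, abs_tol_spin):
    """True if two (mass, spin) estimates are consistent within the tolerances."""
    (m_a, a_a), (m_b, a_b) = est_a, est_b
    return (abs(m_a - m_b) <= rel_tol_mass * max(m_a, m_b)
            and abs(a_a - a_b) <= abs_tol_spin)


f_obs, tau_obs = qnm_220(1.0e6, 0.7)
print(mass_spin_from_220(f_obs, tau_obs))
```

In a real analysis the tolerances would be set by the measurement errors on each mode, and every candidate mode assignment would be tried in turn, with the Kerr hypothesis rejected only if no assignment yields a consistent (M, a).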

Berti et al. [77] took a more detailed look at QNMs in general. They computed SNRs that improved on those in [182] by using an up-to-date detector noise curve and including black-hole spin. For a source of mass \(10^5\,M_\odot < M < 10^7\,M_\odot\) at a distance of 3 Gpc, they found SNRs between 10 and \(2 \times 10^4\) for an excitation energy of 3% of the rest mass, and between 1 and 3000 for an excitation energy of 0.1%. There is little dependence of the SNR on the central black-hole spin. Putting the source at higher z shifts the sensitive range of masses in the usual way, but even at redshift z = 10 the maximum SNR was found to be several hundred. Berti et al. [77] also looked at parameter-estimation accuracy from observations of one QNM (with known mode numbers) and found that LISA observations would measure the mass and spin to fractional accuracies in the range \(10^{-2}\) to \(10^{-5}\) for sources at D = 3 Gpc and for the masses as quoted above.

Berti et al. [77] also performed a Fisher-matrix analysis of a two-mode problem, which accounted for all correlations between the modes. They considered both “pseudo-orthonormal” modes with different angular dependence (l′ ≠ l or m′ ≠ m) and “overtones,” that is, modes with the same angular dependence (l′ = l and m′ = m). In both cases, the accuracy of determination of the mass and spin was comparable to the results quoted for a single mode, although the parameter space was simplified significantly to just five parameters: a, M, two mode amplitudes \({{\mathcal A}_1},\,{{\mathcal A}_2}\) and a common initial phase ϕ. Berti et al. define two distinct QNMs to have resolvable frequencies if the difference between the frequencies is larger than the measurement error in each of them (and similarly for the damping times). By recasting the two-mode model in terms of the two mode frequencies and damping times rather than a and M, resolvability can be examined by looking at the Fisher-matrix errors. Berti et al. [77] considered two cases (the resolvability of either frequencies or damping times, and the resolvability of both), and they computed the source SNR needed in each case. When the two modes are overtones, the SNR required to resolve “either” was ∼ 100, the exact value depending on the black-hole spin and the (l, m) numbers of the mode, but the SNR required to resolve “both” was ∼ 1000. When the two modes are “pseudo-orthonormal,” the SNR required to resolve “either” is ∼ a few if the modes differ in l, and ∼ a few tens if the modes have the same l but differ in m and the central black-hole spin is a > 0.4. There is a mode degeneracy, which means that the required SNR blows up when l = l′ and a → 0. The SNRs required to resolve “both” modes are in the range 100–10 000.
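The resolvability criterion can be turned into a critical SNR with a few lines of arithmetic. The sketch below assumes that Fisher-matrix errors scale as 1/ρ (the usual high-SNR behavior), so errors known at a reference SNR can be rescaled; the function names and all input numbers are hypothetical and purely illustrative, not values from [77].

```python
def critical_snr(x1, x2, sigma1, sigma2, rho_ref):
    """SNR at which |x1 - x2| equals the larger of the two errors.

    sigma1 and sigma2 are the errors at the reference SNR rho_ref;
    ASSUMPTION: they scale as 1/rho.
    """
    return rho_ref * max(sigma1, sigma2) / abs(x1 - x2)


def resolvability(freqs, taus, sigma_f, sigma_tau, rho_ref):
    """Critical SNRs for resolving either, or both, of frequency and damping time."""
    rho_f = critical_snr(freqs[0], freqs[1], sigma_f[0], sigma_f[1], rho_ref)
    rho_tau = critical_snr(taus[0], taus[1], sigma_tau[0], sigma_tau[1], rho_ref)
    return {"either": min(rho_f, rho_tau), "both": max(rho_f, rho_tau)}


# Illustrative inputs only (arbitrary units), with errors quoted at rho_ref = 10.
example = resolvability(
    freqs=(10.0, 15.0), taus=(100.0, 90.0),
    sigma_f=(0.5, 1.0), sigma_tau=(20.0, 50.0),
    rho_ref=10.0,
)
```

In this toy case the frequencies separate at a much lower SNR than the damping times, which mirrors the qualitative gap between the “either” and “both” thresholds reported above.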

Berti et al. [77] conclude that even under pessimistic assumptions about the amount of energy radiated in ringdown radiation, it should be possible to resolve one QNM and either the damping time or frequency of a second QNM, provided that the first overtone radiates \(\sim 10^{-2}\) of the total ringdown energy. This will provide enough for a test of the no-hair property of the central object. A stronger test would come from detecting frequencies and damping times for both QNMs, but this would require ringdown SNRs ∼ 1000, which is rather unlikely. The main uncertainty in this analysis was in the excitation coefficients for the various modes, but numerical relativity simulations have now provided some information on the excitation of ringdown modes in a merger. Berti et al. revisited the QNM problem in [72], using numerical relativity results presented in [74]. This paper revised downward the SNR required to detect either the frequency or damping time of a second mode, which is sufficient for a test of the no-hair theorem, to \(\rho_{\rm crit} \approx 30\) for any binary with mass ratio greater than ∼ 1.25. A significant fraction of LISA events should satisfy this criterion (see for example [69]).

In [255], the authors used numerical-relativity simulations of nonspinning black-hole binaries, with a variety of mass ratios ranging from 1:1 to 1:11, to compute the amplitude to which several different ringdown modes were excited, and hence an estimate of the SNR with which LISA would be able to observe them. For all the mass ratios considered, the (3, 3) mode was found to radiate between 2% and 20% of the energy of the (2, 2) mode, and for mass ratios more unequal than 1:2 the (4, 4) mode also radiated more than 1% of the energy of the (2, 2) mode. For sources at a redshift of z = 1, they estimated a total ringdown SNR between 1000 and 20 000 for black-hole masses in the range \(10^6\)–\(10^8\,M_\odot\) and mass ratios from 1:1 to 1:25, with individual SNRs in the (2, 2), (2, 1), and (3, 3) modes between several hundred and 10 000. The mode SNRs do not add in quadrature, but these SNRs more than meet the requirements of [77] for LISA to be able to carry out black-hole spectroscopy. The SNRs realizable with eLISA will be a few factors lower and the systems will tend to have lower masses, for which the intrinsic SNRs are also smaller. Therefore constraints from eLISA will be weaker and it is possible that eLISA will not be able to resolve sufficient separate modes, but this has not yet been explored.

In [210], Gossan et al. applied the results of [255] to explore the practicality of using QNM radiation to test relativity. The authors considered mergers with mass ratios of 1:2 and constructed a waveform model comprising four ringdown modes. They used Bayesian methods and approached the problem of testing relativity in two ways: (i) determining the parameters of each ringdown mode separately and checking for consistency; (ii) comparing the Bayesian evidence of the GR model to a non-GR model constructed by allowing for deviations of the mode parameters from the GR predictions. The analysis was carried out for eLISA and for the Einstein Telescope, the proposed third-generation ground-based detector. Gossan et al. showed that method (i) could reveal inconsistencies of 1% in the QNM frequency for events of mass \(10^8\,M_\odot\) at a distance of 50 Gpc and inconsistencies of 10% for systems of mass \(10^6\,M_\odot\) at 6 Gpc. Better constraints will come from systems at smaller distances, which will be observed with higher SNR. Using method (ii), in the case of a signal from a \(10^6\,M_\odot\) black hole, 2%, 6%, and 10% deviations in the frequency of the dominant mode would be identifiable at distances of approximately 1 Gpc, 5 Gpc, and 8 Gpc, respectively. Deviations of 2%, 6% and 10% in the damping time of the dominant mode would only be detectable at distances of 200 Mpc, 700 Mpc, and 1.2 Gpc, respectively. For a more massive system, of \(10^8\,M_\odot\), a 2% deviation in the frequency/damping time of the dominant mode would be detectable out to 35/25 Gpc, while deviations of 5% or more would be detectable at greater than 50 Gpc. Such massive systems are very rare, however, and the choice of a 1:2 mass ratio means that the QNM radiation is stronger than would be expected for typical eLISA events, which are likely to have larger mass ratios. Therefore this study was limited in extent, but these preliminary results suggest that QNMs could place interesting constraints on GR modifications for the strongest signals detected by eLISA.

Space-based QNM observations could be used directly to put constraints on alternative theories of gravity, but this will require a calculation of the QNMs in those alternative theories. It was shown in [501] that the equations governing black-hole perturbations in dynamical CS gravity were different from those in GR, with the consequence that the QNM spectrum would also be different. The authors did not compute QNMs, but QNM frequencies for non-spinning black holes in dynamical CS gravity were computed in [320]. The QNM spectrum was found to be different, due to the coupling of the black hole oscillations to the scalar field, but the detectability of these deviations by a gravitational-wave detector was not estimated. Therefore, it is not yet clear at what level ringdown radiation will be able to constrain the CS coupling parameter.

If a LISA-like observatory observes an MBH inspiral and does not detect QNM ringing from the merger remnant where radiation would be expected, this might indicate a violation of the cosmic-censorship hypothesis. QNM ringing arises as a result of GWs becoming trapped near the horizon of the black hole. If a horizon were absent, for instance if a super-extremal (a/M > 1) Kerr metric were formed, we would not expect to observe the QNM radiation. This would be evidence for the existence of a naked singularity, and a counterexample to cosmic censorship, although not an indication of a problem with GR.
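
For reference, the link between a/M > 1 and the absence of a horizon follows from the standard expression for the Kerr horizon radii in Boyer–Lindquist coordinates (geometric units, G = c = 1). This is textbook material rather than a result quoted from the works reviewed above:

$$r_\pm = M \pm \sqrt{M^2 - a^2}.$$

The roots are real only for a ≤ M. For a/M > 1 the square root is imaginary, so no horizon exists and the singularity is not hidden from distant observers.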

6.4 Prospects from gravitational-wave and other observations

Although there are still many open questions, current research clearly indicates that low-frequency GW detectors will be able to make strong statements about the structure of the massive compact objects in the centers of galaxies by observing EMRIs and by detecting ringdown radiation following supermassive black-hole mergers. At the very least, it will be possible to test that an observation is consistent with an inspiral into a Kerr black hole, and to state quantitatively how large a deviation from Kerr could have been masked by instrumental noise. If a system differed significantly from Kerr, we would identify the deviation and should be able to quantify its nature. How well we would be able to distinguish between different types of deviation (e.g., external matter vs. a different multipole structure of the central object) is still not totally clear, but there are ongoing efforts in this direction.

Another potential issue relates to modeling EMRIs. The tests of black-hole structure outlined in this section will rely on the detection of small differences between the observed GWs and the GWs expected in GR. This will of course rely on having reliable EMRI waveforms within GR. These waveforms can be computed using black-hole perturbation theory, by evaluating the gravitational self-force, but despite extensive work in this area the calculation of the self-force is not yet complete even at first order in the mass ratio. The first-order gravitational self-force has been computed for circular [48] and generic orbits in the Schwarzschild spacetime [49] and the scalar self-force is known for circular equatorial orbits in the Kerr spacetime [466]. Recently, the first self-force-driven inspirals have been computed, under an adiabatic approximation for the gravitational self-force [465], and self-consistently for the scalar self-force [151]. EMRI models will require knowledge of the gravitational self-force for generic orbits in Kerr, which is still not available at first order. It has also been recognized that the radiative part of the second-order self-force will be required to derive the EMRI waveform phase to sufficient accuracy [238]. A formalism for the evaluation of the second-order self-force has been developed [366], but not yet implemented. Complete reviews of the current status of the self-force program can be found in [45, 362]. Additional complications in EMRI modeling arise from the need to accurately follow the transition of the inspiral through transient resonances [181]. Although many of these issues will have been resolved by the time data from a space-based gravitational-wave interferometer is available, they must be borne in mind in the interpretation of future results on tests of GR.

Low-frequency GW detectors will provide information on black-hole structure to a much higher precision than can be achieved by other techniques. It has been suggested that observations of IMRIs with Advanced LIGO will have some power to do similar tests [95]. This data will be available ten or more years prior to any space-based experiment. However, IMRI-based tests will be much weaker, since many fewer cycles of the GWs will be detected in a single observation due to the larger mass ratio (\(\eta \sim 10^{-2}\)–\(10^{-1}\)); furthermore, event rates are highly uncertain [302] and accurate modeling of IMRIs is far behind EMRI modeling [239, 240]. An IMRI observation would be able to detect a deviation from Kerr of fractional order unity in the quadrupole moment of a source [95], compared to the \(\sim 10^{-3}\) fractional accuracies achievable with LISA EMRIs [47].

Observations in the electromagnetic spectrum can also probe black-hole structure. Psaltis [371] provides a thorough review of tests of strong-field gravity using electromagnetic observations. The most relevant tests of black-hole structure are as follows.

  • Horizon detection. X-ray interferometers (e.g., the Black Hole Imager) will soon have the angular resolution to directly image the horizon of extragalactic black holes at distances of ∼ 1 Mpc [371]. This is already almost possible with sub-mm/infrared observations. The accretion flow for the Milky Way black hole, Sgr A*, can already be imaged directly down to its innermost edge. However, interpretation of the observations is complicated by the need to simultaneously constrain the disc properties. Observations at multiple wavelengths will be required to fit out the disc and directly image the shadow of the black hole on the disc.

    Evidence for the existence of horizons also comes from observations of quiescent X-ray binaries. An X-ray binary typically comprises a star filling its Roche lobe and transferring matter to a compact-object companion. That compact object can either be a neutron star or a black hole. If the object is a neutron star then most of the gravitational potential energy of the accreting matter has to be radiated away, but for black holes a significant amount of this energy can be advected through the black-hole horizon and is therefore not radiated to infinity [371]. Among systems in the same state of mass transfer, those containing neutron stars are expected to be systematically more luminous than those containing black holes, as has been observed in our galaxy.

    Electromagnetic observations will really indicate the existence of a high-redshift surface in the system, and not necessarily an actual horizon. If a system contained a naked singularity with a high-redshift surface, but not a true event horizon, this would not be evident from the electromagnetic observations alone. By probing the multipole structure and verifying consistency with the no-hair theorem, LISA-like detectors will go much further.

  • ISCO determination. The highest-temperature emission from a disc comes from the innermost stable circular orbit (ISCO), and the flux at that temperature is proportional to the ISCO radius squared. This allows the determination of the spin of the black hole. If the inferred spin was found to exceed one, this might indicate failure in the black-hole model. Such spin determinations have typical errors of ∼ 10%, much greater than LISA’s expected errors of ∼ 0.01% [46].

    Indirect inference of the location of the inner edge of the accretion disc could also come from the interferometric observations mentioned above in the context of horizon detection. In [152] the authors use radio interferometry to resolve the base of the jet in the galaxy M87, finding it to be 5.5 ± 0.4 Schwarzschild radii. The jet-base radius is interpreted as being greater than or equal to the radius of the innermost edge, so this system provides evidence for prograde accretion onto a spinning black hole. Observations of jets in other nearby galaxies may follow in the future, but the distance to which these structures could be resolved is quite small.

  • Accretion-disc mapping. Particles in the accretion disc around a black hole move on circular geodesics of the metric. If the orbits in the disc could be mapped, this would allow spacetime mapping along the lines of Ryan’s theorem [387]. Two possible probes of accretion-disc structure have been identified: quasi-periodic oscillation (QPO) pairs and iron Kα lines. QPOs have been used to constrain possible values of black-hole masses and spins, but there are uncertainties in the radius at which they originate, and in the resonance that gives rise to the pairs of lines. Iron lines show broadening due to differential gravitational redshift and Doppler shift at different points in the disc, but again their interpretation depends on unknown details of the disc geometry. In principle, the time variability of an iron line after a single flare event could constrain both the geometry and map the disc. Future observations with instruments such as IXO/ATHENA will have the time-resolution to attempt this [371].
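
The ISCO-based spin determination in the list above can be sketched numerically. The example below assumes the standard Kerr expression for the ISCO radius of equatorial orbits (the Bardeen, Press and Teukolsky formula, which is not quoted in this review), with radii in units of M and a signed dimensionless spin, negative values denoting retrograde orbits. The function names are hypothetical. A radius outside the Kerr range is rejected, in the spirit of the remark that an inferred spin exceeding one might indicate a failure of the black-hole model.

```python
import math


def r_isco(a):
    """ISCO radius in units of M for dimensionless spin a in [-1, 1] (a < 0: retrograde)."""
    z1 = 1 + (1 - a * a) ** (1 / 3) * ((1 + a) ** (1 / 3) + (1 - a) ** (1 / 3))
    z2 = math.sqrt(3 * a * a + z1 * z1)
    # Guard against tiny negative values from rounding near a = 0.
    root = math.sqrt(max(0.0, (3 - z1) * (3 + z1 + 2 * z2)))
    return 3 + z2 - math.copysign(root, a)


def spin_from_isco(radius, iterations=80):
    """Invert r_isco by bisection; r_isco decreases monotonically from 9 to 1."""
    if not 1.0 <= radius <= 9.0:
        raise ValueError("no Kerr spin with |a| <= 1 gives this ISCO radius")
    lo, hi = -1.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if r_isco(mid) > radius:
            lo = mid  # radius too large at mid, so the spin must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


recovered_spin = spin_from_isco(r_isco(0.7))
```

Because the ISCO radius varies steeply with spin near a = 1, a given fractional error on the measured radius translates into a spin error that depends strongly on the spin itself; the sketch ignores measurement errors entirely.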

Electromagnetic observations are generally hampered by a lack of knowledge of the physics of the material that is generating the radiation. GW systems are, by contrast, very “clean” since the same theory describes the spacetime and the radiation generation. While there is some hope that future electromagnetic observations will perform crude spacetime mapping, LISA EMRI observations will improve any previous constraints by orders of magnitude.

7 Discussion

Although the bibliography of this review lists more than 500 references, fundamental tests of gravity remain the least explored sector of GW astronomy. It is also one where new discoveries may be hardest to come by, even though any such discovery would be momentous. Space-based gravitational-wave detectors will improve certain existing constraints on alternative theories; even more important, they will provide novel and unique opportunities to precisely characterize the poorly explored nonlinear, dynamical sector of gravitation, as well as the properties of gravitational radiation fields. These measurements will provide a definite confirmation that Einstein’s theory, born from insight and beauty, applies in the most extreme regimes; or they may expose new phenomena, leading to new models for gravitational physics.

The research that we have reviewed spans a broad assortment of GW observations, which probe many disparate aspects of gravitational phenomenology. Some of these investigations have yet to be refined to the same level of formal rigor as other, more astrophysical applications of GW astronomy. Nevertheless, they paint an exciting picture of the expected fundamental-physics payoff of space-based detectors. We expect that this will remain an active and intriguing research area for many years: thus, it is appropriate that this review should be living, and we welcome the suggestions of our readers in improving it and keeping it up to date.