1 Introduction

Two of the most tantalizing mysteries of modern astrophysics are known as the dark matter and dark energy problems. These problems come from the discrepancies between, on one side, the observations of galactic and extragalactic systems (as well as the observable Universe itself in the case of dark energy) by astronomical means, and on the other side, the predictions of general relativity from the observed amount of matter-energy in these systems. In short, what astronomical observations are telling us is that the dynamics of galactic and extragalactic systems, as well as the expansion of the Universe itself, do not correspond to the observed mass-energy as they should if our understanding of gravity is complete. Thus, this indicates either (i) the presence of unseen (and yet unknown) mass-energy, or (ii) a failure of our theory of gravity, or (iii) both.

The third case is a priori the most plausible, as there are good reasons for there being more particles than those of the standard model of particle physics [257] (actually, even in the case of baryons, we suspect that a lot of them have not yet been seen and, thus, literally make up unseen mass, in the form of “missing baryons”), and as there is a priori no reason that general relativity should be valid over a wide range of scales, where it has never been tested [45], and where the need for a dark sector actually prevents the theory from being tested until this sector has been detected by other means than gravity itself (Footnote 1). However, either of the first two cases could be the dominant explanation of the discrepancies in a given class of astronomical systems (or even in all astronomical systems), and this is actually testable.

For instance, as far as (ii) is concerned, if the mass discrepancies in a class of systems are mostly caused by some subtle change in gravitational physics, then there should be a clear signature of a single, universal force law at work in this whole class of systems. If instead there is a distinct dark matter component in these, the kinematics of any given system should then depend on the particular distribution of both dark and luminous mass. This distribution would vary from system to system, depending on their environment and past history of formation, and should, in principle, not result in anything like an apparent universal force law (Footnote 2).

Over the years, there have been a large variety of such attempts to alter the theory of gravity in order to remove the need for dark matter and/or dark energy. In the case of dark energy, there is some wiggle room, but in the case of dark matter, most of these alternative gravity attempts fail very quickly, and for a simple reason: once a force law is specified, it must fit all relevant kinematic data in a given class of systems, with the mass distribution specified by the visible matter only. This is a tall order with essentially zero wiggle room: at most one particular force law can work. However, among all these attempts, there is one survivor: the Modified Newtonian Dynamics (MOND) hypothesized by Milgrom almost 30 years ago [294, 295, 293] seems to come close to satisfying the criterion of a universal force law in a whole class of systems, namely galaxies. This success implies a unique relationship between the distribution of baryons and the gravitational field in galaxies and is extremely hard to understand within the present dominant paradigm of the concordance cosmological model, hypothesizing that general relativity is correct on every relevant scale in cosmology including galactic scales, and that the dark sector in galaxies is made of non-baryonic dissipationless and collisionless particles. Even if such particles are detected directly in the near to far future, the success of MOND on galaxy scales as a phenomenological law, as well as the associated appearance of a universal critical acceleration constant \(a_0 \simeq 10^{-10}\,{\rm m\,s^{-2}}\) in various, seemingly unrelated, aspects of galaxy dynamics, will still have to be explained and understood by any successful model of galaxy formation and evolution. Previous reviews of various aspects of MOND, at an observational and theoretical level, can be found in [34, 81, 100, 151, 279, 311, 318, 401, 407, 429]. 
A website dedicated to this topic is also maintained, with all the relevant literature as well as introductory level articles [263] (see also [238]).

Here, we first review the basics of the dark matter problem (Section 2) as well as the basic ingredients of the present-day concordance model of cosmology (Section 3). We then point out a few outstanding challenges for this model (Section 4), both from the point of view of unobserved predictions of the model, and from the point of view of unpredicted observations (all uncannily involving a common acceleration constant \(a_0\)). Up to that point, the challenges presented are purely based on observations, and are fully independent of any alternative theoretical framework (Footnote 3). We then show that, surprisingly, many of these puzzling observations can be summarized within one single empirical law, Milgrom’s law (Section 5), which can be most easily (although not necessarily uniquely) interpreted as the effect of a single universal force law resulting from a modification of Newtonian dynamics (MOND) in the weak-acceleration regime \(a < a_0\), for which we present the current observational successes and problems (Section 6). We then summarize the various attempts currently made to embed this modification in a generally-covariant relativistic theory of gravity (Section 7) and how such theories allow new predictions on gravitational lensing (Section 8) and cosmology (Section 9). We finally draw conclusions in Section 10.

2 The Missing Mass Problem in a Nutshell

There exists overwhelming evidence for mass discrepancies in the Universe from multiple independent observations. This evidence involves the dynamics of extragalactic systems: the motions of stars and gas in galaxies and clusters of galaxies. Further evidence is provided by gravitational lensing, the temperature of hot, X-ray emitting gas in clusters of galaxies, the large scale structure of the Universe, and the gravitating mass density of the Universe itself (Figure 1). For an exhaustive historical review of the problem, we refer the reader to [394].

Figure 1

Summary of the empirical roots of the missing mass problem (below line) and the generic possibilities for its solution (above line). Illustrated lines of evidence include the approximate flatness of the rotation curves of spiral galaxies, gravitational lensing in a cluster of galaxies, and the growth of large-scale structure from an initially very-nearly-homogeneous early Universe. Other historically-important lines of evidence include the Oort discrepancy, the need to stabilize galactic disks, motions of galaxies within clusters of galaxies and the hydrodynamics of hot, X-ray emitting gas therein, and the apparent excess of gravitating mass density over the mass density of baryons permitted by Big-Bang nucleosynthesis. From these many distinct problems grow several possible solutions. Generically, the observed discrepancies either imply the existence of dark matter, or the necessity to modify dynamical laws. Dark matter could, in principle, be any combination of non-luminous baryons and/or some non-baryonic form of mass-like neutrinos (hot dark matter) or some new particle, whose mass makes it dynamically cold or perhaps warm. Alternatively, the observed discrepancies might point to the need to modify the equation of gravity that is employed to infer the existence of dark matter, or perhaps some other fundamental dynamical assumption like the equivalence of inertial mass and gravitational charge. Many specific ideas of each of these types have been considered over the years. Note that none of these ideas are mutually exclusive, and that some form or the other of dark matter could happily cohabit with a modification of the gravitational law, or could even be itself the cause of an effective modification of the gravitational law. Question marks on some tree branches represent the fruit of ideas yet to be had. 
Perhaps these might also address the dark energy problem, with the most satisfactory result being a theory that would simultaneously explain the acceleration scale in the dark matter problem as well as the accelerating expansion of the Universe, and explain the coincidence of scales between these two problems, a coincidence exhibited in Section 4.1.

The data leave no doubt that when the law of gravity as currently known is applied to extragalactic systems, it fails if only the observed stars and gas are included as sources in the stress-energy tensor. This leads to a stark choice: either the Universe is pervaded by some unseen form of mass — dark matter — or the dynamical laws that lead to this inference require revision. Though the mass discrepancy problem is now well established [394, 465], such a dramatic assertion warrants a brief review of the evidence.

Historically, the first indications of the modern missing mass problem came in the 1930s shortly after galaxies were recognized to be extragalactic in nature. Oort [342] noted that the sum of the observed stars in the vicinity of the sun fell short of explaining the vertical motions of stars in the disk of the Milky Way. The luminous matter did not provide a sufficient restoring force for the observed stellar vertical oscillations. This became known as the Oort discrepancy. Around the same time, Zwicky [518] reported that the velocity dispersion of galaxies in clusters of galaxies was far too high for these objects to remain bound for a substantial fraction of cosmic time. The Oort discrepancy was approximately a factor of two in amplitude, and confined to the Galactic disk — it required local dark matter, not necessarily the quasi-spherical halo we now envision. It was long considered a serious problem, but has now largely (though perhaps not fully) gone away [194, 240]. The discrepancy Zwicky reported was less subtle, as the required dark mass outweighed the visible stars by a factor of at least 100. This result was apparently not taken seriously at the time.

One of the first indications of the need for dark matter in modern times came from the stability of galactic disks. Stars in spiral galaxies like the Milky Way are predominantly on approximately circular orbits, with relatively few on highly eccentric orbits [132]. The small velocity dispersion of stars relative to their circular velocities makes galactic disks dynamically cold. Early simulations [343] revealed that cold, self-gravitating disks were subject to severe instabilities. In order to prevent the rapid, self-destructive growth of these instabilities, and hence preserve the existence of spiral galaxies over a sizable fraction of a Hubble time, it was found to be necessary to embed the disk in a quasi-spherical potential well — a role that could be played by a halo of dark matter, as first proposed in 1973 by Ostriker & Peebles [343].

Perhaps the most persuasive piece of evidence was then provided, notably through the seminal works of Bosma and Rubin, by establishing that the rotation curves of spiral galaxies are approximately flat [67, 370]. A system obeying Newton’s law of gravity should have a rotation curve that, like the Solar system, declines in a Keplerian manner once the bulk of the mass is enclosed: \(V_c \propto r^{-1/2}\). Instead, observations indicated that spiral galaxy rotation curves tended to remain approximately flat with increasing radius: \(V_c \sim\) constant. This was shown to happen over and over and over again [370] with the approximate flatness of the rotation curve persisting to the largest radii observable [67], well beyond where the details of each galaxy’s mass distribution mattered, so that Keplerian behavior should have been observed. Again, a quasi-spherical halo of dark matter as proposed by Ostriker and Peebles was implicated.
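To make the contrast between a Keplerian decline and a flat rotation curve concrete, the short Python sketch below evaluates the point-mass circular speed \(V_c = \sqrt{GM/r}\) at increasing radii. The enclosed mass of 10^11 solar masses and the chosen radii are illustrative assumptions, not values taken from the cited observations, and the function name `keplerian_speed` is ours.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
KPC = 3.086e19     # kiloparsec, m


def keplerian_speed(enclosed_mass_msun: float, radius_kpc: float) -> float:
    """Circular speed in km/s around a point mass: V_c = sqrt(G M / r)."""
    mass_kg = enclosed_mass_msun * M_SUN
    radius_m = radius_kpc * KPC
    return math.sqrt(G * mass_kg / radius_m) / 1e3


# Assumed, illustrative numbers: all of the mass lies inside 10 kpc.
MASS_MSUN = 1.0e11
radii_kpc = [10.0, 20.0, 40.0, 80.0]

keplerian = [keplerian_speed(MASS_MSUN, r) for r in radii_kpc]
# A flat curve simply keeps the innermost speed at every radius.
flat = [keplerian[0]] * len(radii_kpc)

for r, vk, vf in zip(radii_kpc, keplerian, flat):
    print(f"r = {r:5.1f} kpc   Keplerian: {vk:6.1f} km/s   flat: {vf:6.1f} km/s")
```

In this sketch the Keplerian speed halves for every fourfold increase in radius, whereas the flat curve does not change. The growing gap between the two columns is the mass discrepancy that the observations reveal.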

Other types of galaxies exhibit mass discrepancies as well. Perhaps most notable are the dwarf spheroidal galaxies that are satellites of the Milky Way [427, 477] and of Andromeda [217]. These satellites are tiny by galaxy standards, possessing only millions, or in the case of the ultrafaint dwarfs, thousands, of individual stars. They are close enough that the line-of-sight velocities of individual stars can be measured, providing for a precise measurement of the system’s velocity dispersion. The mass inferred from these motions (roughly, \(M \sim r\sigma^2/G\)) greatly exceeds the mass visible in luminous stars. Indeed, these dim satellite galaxies exhibit some of the largest mass discrepancies observed. In contrast, bright giant elliptical galaxies (often composed of much more than the \(\sim 10^{11}\) stars of the Milky Way) exhibit remarkably modest and hard to detect mass discrepancies [367]. Thus, it is inferred that fainter galaxies are progressively more dark-matter dominated than bright ones. However, as we shall expand on in Section 4.3, the primary correlation is not with luminosity, but with surface brightness: the lower the surface brightness of a system, the larger its mass discrepancy [279].
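As a rough illustration of how a velocity dispersion translates into a mass, the Python sketch below applies the order-of-magnitude estimate \(M \sim r\sigma^2/G\) to a hypothetical dwarf spheroidal. The radius, velocity dispersion, and stellar mass are assumed values chosen only to show the arithmetic; they do not describe any particular galaxy from the cited works.

```python
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
PC = 3.086e16      # parsec, m


def dynamical_mass_msun(radius_pc: float, sigma_kms: float) -> float:
    """Order-of-magnitude dynamical mass M ~ r * sigma^2 / G, in solar masses."""
    radius_m = radius_pc * PC
    sigma_ms = sigma_kms * 1e3
    return radius_m * sigma_ms**2 / G / M_SUN


# Hypothetical dwarf spheroidal (illustrative values, not from the text).
RADIUS_PC = 300.0
SIGMA_KMS = 10.0
STELLAR_MASS_MSUN = 5.0e5

m_dyn = dynamical_mass_msun(RADIUS_PC, SIGMA_KMS)
discrepancy = m_dyn / STELLAR_MASS_MSUN

print(f"dynamical mass   ~ {m_dyn:.1e} Msun")
print(f"mass discrepancy ~ {discrepancy:.0f}")
```

With these assumed inputs the dynamical mass is several million solar masses, an order of magnitude above the assumed stellar mass. Because the estimate scales as \(\sigma^2\), even a modest velocity dispersion in a faint system implies a large discrepancy.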

On larger scales, groups and clusters of galaxies also show mass discrepancies, just as individual galaxies do. One of the earliest lines of evidence comes from the “timing argument” in the Local Group [213]. Presumably the material that was to become the Milky Way and Andromeda (M31) was initially expanding apart with the general Hubble expansion. Currently they are approaching one another at \(\sim 100\,{\rm km\,s^{-1}}\). In order for the Milky Way and M31 to have overcome the initial expansion and fallen back towards one another, there must be a greater-than-average gravitating mass between the two. To arrive at their present separation with the observed blueshifted line of sight velocity after a Hubble time requires a dynamical mass-to-light ratio M/L > 80. This greatly exceeds the mass-to-light ratio of the stars themselves, which is of order unity in Solar units [42] (the Sun is a fairly average star, so averaged over many stars each Solar mass produces roughly one Solar luminosity).

Rich clusters of galaxies are rare structures containing dozens or even hundreds of bright galaxies. These objects exhibit mass discrepancies in several distinct ways. Measurements of the redshifts of individual cluster members give velocity dispersions in the vicinity of \(1000\,{\rm km\,s^{-1}}\), typically implying dynamical mass-to-light ratios in excess of 100 [24]. The actual mass discrepancy is not this large, as most of the detected baryonic mass in clusters is in a diffuse intracluster gas rather than in the stars in the galaxies (something Zwicky was not aware of back in 1933). This gas is heated to the virial temperature and emits X-rays. Mapping the temperature and emission of this X-ray gas provides another probe of the cluster mass through the equation of hydrostatic equilibrium. In order to hold the gas in the clusters at the observed temperatures, the dark matter must outweigh the gas by a factor of ∼ 8 [175]. Furthermore, some clusters are observed to gravitationally lens background galaxies (Figure 1). Once again, mass above and beyond that observed is required to explain this phenomenon [227]. Thus, three independent methods all imply the need for about the same amount of dark matter in clusters of galaxies.

In addition to the abundant evidence for mass discrepancies in the dynamics of extragalactic systems, there are also strong motivations for dark matter in cosmology. Two observations are particularly important: (i) the small baryonic mass density \(\Omega_b\) inferred from Big-Bang nucleosynthesis (BBN) (and from the measured Hubble parameter), and (ii) the growth of large scale structure by a factor of \(\sim 10^5\) from the surface of last scattering of the cosmic microwave background at redshift z ∼ 1000 until present-day z = 0, implying \(\Omega_m > \Omega_b\). Together, these observations imply not only the need for dark matter, but for some exotic new form of non-baryonic cold dark matter. Indeed, observational estimates of the gravitating mass density of the Universe \(\Omega_m\), measured, for instance, from peculiar galaxy (or large-scale) velocity fields, have, for several decades, persistently returned values in the range \(1/4 < \Omega_m < 1/3\) [116]. While shy of the value needed for a flat Universe, this mass density is well in excess of the baryon density inferred from BBN. The observed abundances of the light isotopes deuterium, helium, and lithium are consistent with having been produced in the first few minutes after the Big Bang if the baryon density is just a few percent of the critical value: \(\Omega_b < 0.05\) [480, 107]. Thus, \(\Omega_m > \Omega_b\). Consequently, we do not just need dark matter, we need the dark matter to be non-baryonic.

Another early Universe constraint is provided by the Cosmic Microwave Background (CMB). The small (microKelvin) amplitude of the temperature fluctuations at the time of baryon-photon decoupling (z ∼ 1000) indicates that the Universe was initially very homogeneous, roughly to one part in \(10^5\). The Universe today (z = 0) is very inhomogeneous, at least on “small” scales of less than ∼ 100 Mpc (\(\sim 3 \times 10^8\) ly), with huge density contrasts between planets, stars, galaxies, clusters, and empty intergalactic space. The only attractive long-range force acting on the entire Universe, that can make such structures, is gravity. In a rich-get-richer while the poor-get-poorer process, the small initial over-densities attract more mass and grow into structures like galaxies while under-dense regions become less dense, leading to voids. The catch is that gravity is rather weak, so this process takes a long time. If the baryon density from BBN is all we have to work with, we can only obtain a growth factor of \(\sim 10^2\) in a Hubble time [424], orders of magnitude short of the observed \(10^5\). The solution is to boost the growth rate with extra invisible mass displaying larger density fluctuations: dark matter. In order not to make the same mark on the CMB that baryons would, this dark matter must not interact with the photons. So, in effect, the density fluctuations in the dark matter can already be very large at the epoch of baryon-photon decoupling, especially if the dark matter is cold (i.e., with effectively zero Jeans length). The baryons fall into the already deep dark matter potential wells only after that, once released from their electromagnetic link to the photon bath. Before decoupling, the fluctuations in the baryon-photon fluid did not grow but were oscillating in the form of acoustic waves, maintaining the same amplitude as when they entered the horizon; actually they were even slightly diffusion-damped. 
In principle, at baryon-photon decoupling, CMB fluctuations on smaller angular scales, having entered the horizon earlier, would have been damped with respect to those on larger scales (Silk damping). Nevertheless, the presence of decoupled non-baryonic dark matter would provide a net forcing term countering the damping of the oscillations at recombination, meaning that the second and third acoustic peaks of the CMB could then be of equal amplitude rather than exhibiting a damping tail. The actual observation of a high third-peak in the CMB angular power spectrum is another piece of compelling evidence for non-baryonic dark matter (see, e.g., [229]). Both BBN and the CMB thus drive us to consider a form of mass that is non-baryonic and which does not interact electromagnetically. Moreover, in order to form structure (see Section 3.2), the mass must be dynamically cold (i.e., moving much slower than the speed of light when it decouples from the photon bath), and is known as cold dark matter (CDM).

Now, in addition to CDM, modern cosmology also requires something even more mysterious, dubbed dark energy. The fact that the baryon fraction in clusters of galaxies was such that \(\Omega_m\) was implied to be much smaller than 1 (the value needed for a flat Euclidean Universe favored by inflationary models), as well as tensions between the measured Hubble parameter and independent estimates of the age of the Universe, led Ostriker & Steinhardt [344] to propose in 1995 a “concordance model of cosmology” or ΛCDM model, where a cosmological constant Λ (supposed to represent vacuum energy or dark energy) provided the major contribution to the Universe’s energy density. Three years later, the observations of Type Ia supernovae (SNIa) [351, 365], indicating late-time acceleration of the Universe’s expansion, led most people to accept this model. This concordance model has since been refined and calibrated through subsequent large-scale observations of the CMB and of the matter power spectrum, to lead to the favored cosmological model prevailing today (see Section 3). However, as we shall see, curious coincidences of scales between the dark matter and dark energy sectors (see Section 4.1) have prompted the question of whether these two sectors are really physically independent, and the existence of dark energy itself has led to a renewed interest in modified gravity theories as a possible alternative to this exotic fluid [100].

3 A Brief Overview of the ΛCDM Cosmological Model

General relativity provides a clear and compelling cosmology, the Friedmann-Lemaître-Robertson-Walker (FLRW) model. The expansion of the Universe discovered by Hubble and Slipher found a natural explanation (Footnote 4) in this context. The picture of a hot Big-Bang cosmology that emerged from this model famously predicted the existence of the 3 degree CMB and the abundances of the light isotopes via BBN.

Within the FLRW framework, we are inexorably driven to infer the existence of both non-baryonic cold dark matter and a non-zero cosmological constant as discussed in Section 2. The resulting concordance ΛCDM model, first proposed in 1995 by Ostriker and Steinhardt [344], is encouraged by a wealth of observations: the consistency of the Hubble parameter with the ages of the oldest stars [344], the consistency between the dynamical mass density of the Universe, that of baryons from BBN (see also discussion in Section 9.2), and the baryon fraction of clusters [486], as well as the power spectrum of density perturbations [103, 452]. A prediction of the concordance model is that the expansion rate of the Universe should be accelerating; this was confirmed by observations of high redshift Type Ia supernovae [351, 365]. Another successful prediction was the scale of the baryonic acoustic oscillation [134]. Perhaps the most emphatic support for ΛCDM comes from fits to the acoustic power spectrum of temperature fluctuations in the CMB [229].

For a brief review of the basics and successes of the concordance cosmological model we refer the reader to, e.g., [87, 349] and all references therein. We note that, while most of the cosmological probes in the above list are not uniquely fit by the ΛCDM model on their own, when they are taken together they provide a remarkably tight set of constraints. The success of this now favoured cosmological model on large scales is, thus, remarkable indeed, as there was a priori no reason that such a parameterized cosmology could explain all these completely independent data sets with such outstanding consistency.

In this model, the Hubble constant is \(H_0 = 70\,{\rm km\,s^{-1}\,Mpc^{-1}}\) (i.e., h = 0.7), the amplitude of density fluctuations within a top-hat sphere of \(8h^{-1}\) Mpc is \(\sigma_8 = 0.8\), the optical depth to reionization is τ = 0.08, and the spectral index measuring how fluctuations change with scale is \(n_s = 0.97\). The price we pay for the outstanding success of the model is new physics in the form of a dark sector. This dark sector makes up 95% of the mass-energy content of the Universe in ΛCDM: it is composed separately of a dark energy sector and a cold dark matter sector, which we briefly describe below.

3.1 Dark Energy (Λ)

In ΛCDM, dark energy is a non-vanishing vacuum energy represented by the cosmological constant Λ in the field equations of general relativity. Einstein’s cosmological constant is equivalent to vacuum energy with equation of state p/ρ = w = −1. In principle, the equation of state could be merely close to, but not exactly, w = −1. In this case, the dark energy could evolve and clump, depending on the value of w and its evolution. However, to date, there is no compelling observational reason to require any form of dark energy more complex than the simple cosmological constant introduced by Einstein.

The various observational datasets discussed above constrain the ratio of the dark energy density to the critical density to be \({\Omega _\Lambda} = \Lambda/3H_0^2 = 0.73\), where \(H_0\) is Hubble’s constant and \(\Lambda\) is expressed in \({\rm s^{-2}}\). This value, together with the matter density \(\Omega_m\) (see below), leads to a total \(\Omega = \Omega_\Lambda + \Omega_m = 1\), i.e., a spatially-flat Euclidean geometry in the Robertson-Walker sense that is nicely consistent with the expectations of inflation. It is important to stress that this model relies on the cosmological principle, i.e., that our observational location in the Universe is not special, and on the fact that on large scales, the Universe is isotropic and homogeneous. For possible challenges to these assumptions and their consequences, we refer the reader to, e.g., [83, 487, 488].
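The relation between the dark energy density parameter and the cosmological constant can be inverted to obtain a numerical value of Λ in \({\rm s^{-2}}\). The Python sketch below does this with the parameter values quoted in this section (a Hubble constant of 70 km/s/Mpc and a dark energy density parameter of 0.73); the megaparsec conversion factor and the function names are ours, and the result should be read as an order-of-magnitude check rather than a fitted value.

```python
KM_PER_MPC = 3.0857e19   # kilometres in one megaparsec


def hubble_si(h0_kms_mpc: float) -> float:
    """Convert a Hubble constant from km/s/Mpc to s^-1."""
    return h0_kms_mpc / KM_PER_MPC


def lambda_from_omega(omega_lambda: float, h0_kms_mpc: float) -> float:
    """Invert Omega_Lambda = Lambda / (3 H0^2) to get Lambda in s^-2."""
    return 3.0 * omega_lambda * hubble_si(h0_kms_mpc) ** 2


H0_KMS_MPC = 70.0     # value quoted in the text
OMEGA_LAMBDA = 0.73   # value quoted in the text

h0 = hubble_si(H0_KMS_MPC)
lam = lambda_from_omega(OMEGA_LAMBDA, H0_KMS_MPC)

print(f"H0     = {h0:.3e} s^-1")
print(f"Lambda = {lam:.3e} s^-2")
```

With these inputs Λ comes out at the level of 10^-35 per second squared, which illustrates how small the cosmological constant is in everyday units.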

3.2 Cold Dark Matter (CDM)

In ΛCDM, dark matter is assumed to be made of non-baryonic dissipationless massive particles [48], the “cold dark matter” (CDM). This dark matter outweighs the baryons that participate in BBN by about 5:1. The density of baryons from the CMB is Ωb = 0.046, grossly consistent with BBN [229]. This is a small fraction of the critical density; with the non-baryonic dark matter the total matter density is Ωm = Ωcdm + Ωb = 0.27.
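The density budget quoted above is simple arithmetic, which the following Python sketch makes explicit. It uses only the parameter values given in this section (total matter 0.27, baryons 0.046, dark energy 0.73); the variable names are ours.

```python
OMEGA_M = 0.27        # total matter density parameter (from the text)
OMEGA_B = 0.046       # baryon density parameter (from the text)
OMEGA_LAMBDA = 0.73   # dark energy density parameter (from the text)

# Cold dark matter is what remains of the matter budget after baryons.
omega_cdm = OMEGA_M - OMEGA_B

# Ratio of dark to baryonic matter, quoted as "about 5:1".
dm_to_baryon = omega_cdm / OMEGA_B

# Fraction of the total mass-energy content that sits in the dark sector.
omega_total = OMEGA_M + OMEGA_LAMBDA
dark_fraction = (omega_cdm + OMEGA_LAMBDA) / omega_total

print(f"Omega_cdm           = {omega_cdm:.3f}")
print(f"dark matter/baryons = {dm_to_baryon:.2f}")
print(f"dark sector share   = {dark_fraction:.1%}")
```

The same numbers account for both the roughly 5:1 ratio of dark to baryonic matter and the statement that the dark sector makes up about 95% of the mass-energy content.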

The “cold” in cold dark matter means that CDM moves slowly so that it is non-relativistic when it decouples from photons. This allows it to condense and begin to form structure, while the baryons are still electromagnetically coupled to the photon fluid. After recombination, when protons and electrons first combine to form neutral atoms so that the cross-section for interaction with the photon bath suddenly drops, the baryons can fall into the potential wells already established by the dark matter, leading to a hierarchical scenario of structure formation with the repeated merger of smaller CDM clumps to form ever larger clumps.

Particle candidates for the CDM must be massive, non-baryonic, and immune to electromagnetic interactions. The currently preferred CDM candidates are Weakly Interacting Massive Particles (WIMPs, [46, 47, 48]) that condensed from the thermal bath of the early Universe. These should have masses on the order of about 100 GeV so that (i) the free-streaming length is small enough to create small-scale structures as observed (e.g., dwarf galaxies), and (ii) that thermal relics with cross-sections typical for weak nuclear reactions account for the right amount of matter density \(\Omega_m\) (see, e.g., Eq. 28 of [48]). This last point is known as the WIMP miracle (Footnote 5).

For lighter particle candidates (e.g., ordinary neutrinos or light sterile neutrinos), the damping scale becomes too large. For instance, a hot dark matter (HDM) particle candidate with mass of a few to 15 eV would have a free-streaming length of about 100 Mpc, leading to too little power at the small-scale end of the matter power spectrum. The existence of galaxies at redshift z ∼ 6 implies that the coherence length should have been smaller than 100 kpc or so, meaning that even warm dark matter (WDM) particles with masses between 1 and 10 keV are close to being ruled out as well (see, e.g., [348]). Thus, ΛCDM presently remains the state-of-the-art in cosmology, although some of the challenges listed in Section 4 are leading to a slow drift of the standard concordance model from CDM to WDM [252]. This drift, however, brings along its own problems, and fails to address most of the current observational challenges summarized in Section 4, which might perhaps point to a more radical alternative to the model.

4 Some Challenges for the ΛCDM Model

The great concordance of independent cosmological observables from Gpc to Mpc scales lends a certain air of inevitability to the ΛCDM model. If we accept these observables as sufficient to prove the model, then any discrepancy appears as trivia that will inevitably be explained away. If instead we require a higher standard, such as positive laboratory evidence for the dark sectors, then ΛCDM appears as a yet unproven hypothesis that relies heavily on two potentially fictitious invisible entities. Thus, an important test of ΛCDM as a scientific hypothesis is the existence of dark matter. By this we mean not just unseen mass, but specifically CDM: some novel form of particle with the right microscopic properties and correct cosmic mass density. Searches for WIMPs are now rather mature and not particularly encouraging. Direct detection experiments have as yet no positive detections, and have now excluded [19] the bulk of the parameter space (interaction cross-section and particle mass) where WIMPs were expected to reside. Indirect detection searches, through the observation of γ-rays produced by the self-annihilation (Footnote 6) of WIMPs in the galactic halo and in nearby satellite galaxies, have similarly returned null results [6, 84, 172] at interestingly restrictive levels. For the most-plausible minimally-supersymmetric models, particle colliders should already have produced evidence for WIMPs [2, 1, 23]. The right model need not be minimal. It is always possible to construct a more complicated model that manages to evade all experimental constraints. Indeed, it is readily possible to imagine dark matter candidates that do not interact at all with the rest of the Universe except through gravity. Though logically possible, such dark matter candidates are profoundly unsatisfactory in that they could not be detected in the laboratory: their hypothesized existence could neither be confirmed nor falsified.

Apart from this current non-detection of CDM candidates, there also exist prominent observational challenges for the ΛCDM model, which might point towards the necessity of an alternative model (or, at the very least, an improved one). These challenges are that (i) some of the parameters of the model appear fine-tuned (Section 4.1); (ii) as far as galaxy formation and evolution are concerned (mainly processes happening on kpc scales, where predictions are more difficult to make because baryon physics should play a more prominent role), many predictions that have been made were not successful (Section 4.2); and (iii) a number of observations on these galactic scales exhibit regularities that are fully unexpected in any CDM context without a substantial amount of fine-tuning in terms of baryon feedback (Section 4.3).

4.1 Coincidences

What is generally considered as the biggest problem for the ΛCDM model is that it requires a large and still unexplained fine-tuning to reduce by 120 orders of magnitude the theoretical expectation of the vacuum energy to yield the observed cosmological-constant value, and, even more importantly, that it faces a coincidence problem to explain why the dark energy density \(\Omega_\Lambda\) is precisely of the same order of magnitude as the other cosmological components today (Footnote 7). This uncanny coincidence is generally seen as evidence for some yet-to-be-discovered underlying cosmological mechanism ruling the evolution of dark energy (such as quintessence or generalized additional fluid components, see, e.g., [106]). But it could also indicate that the effect attributed to dark energy is rather due to a breakdown of general relativity (GR) on the largest scales [158].

Then, as we shall see in more detail in Section 4.3, another coincidence, which is central to this whole review, is the appearance of a characteristic scale, dubbed \(a_0\), in the behavior of the dark matter sector, a scale with units of acceleration. This acceleration scale appears in various seemingly unrelated galactic scaling relations, mostly unpredicted by the ΛCDM model (see Section 4.3). The value of this scale is \(a_0 \simeq 10^{-10}\,{\rm m\,s^{-2}}\), which yields, in natural units (Footnote 8), \(a_0 \sim H_0\) (or, more precisely, \(a_0 \approx cH_0/2\pi\)). It is perhaps even more meaningful [51, 298, 304] to note that, in these same units:

$$a_0^2 \sim \Lambda ,$$
(1)

where Λ is the currently-favored value of the cosmological constant (Footnote 9). Whether these numerical coincidences are physically relevant or just true (insignificant) coincidences remains an open question, closely related to the nature of the dark sector, which we are going to elaborate on in Sections 5 to 10. But, at this stage, it is in any case striking that the dark matter and dark energy sectors do have such a common scale. This coincidence of scales, together with the coincidence of energy densities at redshift zero, might perhaps be a strong indication that one should cease to consider dark energy as an additional component physically independent from the dark matter sector [7], and/or cease to consider that GR correctly describes gravity on the largest scales and in extremely weak gravitational fields, in order to perhaps address the two above coincidence problems at the same time.
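
These two coincidences can be checked at the order-of-magnitude level with a few lines of arithmetic. The sketch below is illustrative only. The inputs are assumptions made here rather than values fixed by this section: \(H_0 = 70\) km/s/Mpc, \(a_0 = 1.2 \times 10^{-10}\,{\rm m\,s^{-2}}\), and a flat-universe estimate \(\Lambda = 3\Omega_\Lambda H_0^2/c^2\) with \(\Omega_\Lambda = 0.73\). The factor \(2\pi\) in the second ratio is carried over by analogy with the first and is likewise an assumption.

```python
import math

# Physical constants (SI)
C = 2.99792458e8        # speed of light, m/s
MPC = 3.0857e22         # one megaparsec, m

# Assumed inputs (illustrative, not taken from this section)
H0 = 70.0e3 / MPC       # Hubble constant: 70 km/s/Mpc expressed in s^-1
OMEGA_LAMBDA = 0.73     # assumed dark energy density parameter
A0 = 1.2e-10            # acceleration scale, m/s^2


def a0_over_hubble_scale(a0=A0, h0=H0):
    """Ratio of a0 to the acceleration c*H0/(2*pi)."""
    return a0 / (C * h0 / (2.0 * math.pi))


def a0_over_lambda_scale(a0=A0, h0=H0, omega_lambda=OMEGA_LAMBDA):
    """Ratio of a0 to c^2*sqrt(Lambda)/(2*pi), with Lambda = 3*Omega_Lambda*H0^2/c^2."""
    lam = 3.0 * omega_lambda * h0**2 / C**2   # cosmological constant, m^-2
    return a0 / (C**2 * math.sqrt(lam) / (2.0 * math.pi))


if __name__ == "__main__":
    print(f"a0 / (c H0 / 2pi)             = {a0_over_hubble_scale():.2f}")
    print(f"a0 / (c^2 sqrt(Lambda) / 2pi) = {a0_over_lambda_scale():.2f}")
```

With these inputs both ratios come out of order unity, which is all that the symbol \(\sim\) in Eq. 1 claims.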

Finally, let us note that the existence of the \(a_0\)-scale is actually not the only dark-matter-related coincidence, as there is also, in principle, absolutely no reason why the mechanism leading to the baryon asymmetry (between baryonic matter and antimatter) would simultaneously leave both the baryon and dark matter densities with a similar order of magnitude (\(\Omega_{\rm dm}/\Omega_b = 5\)). If the effects we attribute to dark matter are actually also due to a breakdown of GR on cosmological scales, then such a coincidence might perhaps appear more natural as the baryons would then be the actual source of the effect attributed to the dark matter sector.

4.2 Unobserved predictions

Apart from the above puzzling coincidences, the concordance ΛCDM model also has a few more concrete empirical challenges to address, in the sense of having made a few predictions in contradiction with observations (with the caveat in mind that the model itself is not always that predictive on small scales). These include the following non-exhaustive list:

  1.

    The bulk flow challenge. Peculiar velocities of galaxy clusters are predicted to be on the order of 200 km/s in the ΛCDM model. These can actually be measured by studying the fluctuations in the CMB generated by the scattering of the CMB photons by the hot X-ray-emitting gas inside clusters (the kinematic SZ effect). This yields an observed coherent bulk flow of order 1000 km/s (5 times more than predicted) on scales out to at least 400 Mpc [221]. This bulk flow challenge appears not only in SZ studies but also in galaxy studies [483]. A related problem is the collision velocity larger than 3100 km/s for the merging bullet cluster 1E0657-56 at z = 0.3, much too high to be accounted for by ΛCDM [249, 455]. These observations would seem to indicate that the attractive force between DM particles is enhanced compared to what ΛCDM predicts, and changing CDM into WDM would not solve the problem.

  2.

    The high-z clusters challenge. Observation of even a single massive cluster at high redshift can falsify ΛCDM [331]. In this respect the existence of the galaxy cluster XMMU J2235.3-2557 [368] with a mass of \(\sim 4 \times 10^{14}\,M_\odot\) at z = 1.4, even though not sufficient to rule out the model, is very surprising and could indicate that structure formation is actually taking place earlier and faster than in ΛCDM (see also [420] on the Shapley supercluster and the Sloan Great Wall).

  3.

    The Local Void challenge. The Local Volume is composed of 562 known galaxies at distances smaller than 8 Mpc from the center of the Local Group, and the region known as the “Local Void” hosts only 3 of them. This is much less than the expected ∼ 20 for a typical similar void in ΛCDM [350]. What is more, in the Local Volume, large luminous galaxies are over-represented by a factor of 6 in the underdense regions, exactly opposite to what is expected from ΛCDM. This could mean that the Local Volume is just a statistical anomaly, but it could also point, in line with the two previous challenges, towards more rapid structure formation, allowing sparse regions to more quickly form large galaxies cleaning their environment, making the galaxies larger and the voids emptier at early times [350].

  4.

    The missing satellites challenge. It has long been known that the model predicts an overabundance of dark subhalos orbiting Milky-Way-sized galaxies compared to the observed number of satellite galaxies around the Milky Way [329]. This is a different problem from the above-predicted overabundance of small galaxies in voids. It has subsequently been suggested that stellar feedback and heating processes limit baryonic growth, that re-ionisation prevents low-mass dark halos from forming stars, and that tidal forces from the host halo limit growth of the dark-matter sub-halos and lead to their truncation. This important theoretical effort has led recent semi-analytic models to predict a reduced number of ∼ 100 to 600 faint satellites rather than the original thousands. Moreover, during the past 15 years 13 “new” and mostly ultra-faint satellite galaxies have been found in addition to the 11 previously-known classical bright ones. Since these new galaxies have been largely discovered with the Sloan Digital Sky Survey (SDSS), and since this survey covered only one fifth of the sky, it has been argued that the problem was solved. However, there are actually still missing satellites at both the low-mass and high-mass ends of the mass function predicted by “ΛCDM+re-ionisation” semi-analytic models. This is best illustrated in Figure 2 of [239], which shows the cumulative distribution for the predicted and observationally-derived masses within the central 300 pc of Milky Way satellites. A lot of low-mass satellites are still missing, and the most massive predicted subhaloes are also incompatible with hosting any of the known Milky Way satellites [73, 75, 74]. This is the modern version of the missing satellites challenge. An obvious but rather discomforting way out would be to simply state that the Milky Way must be a statistical outlier, but this is contradicted by the study of [447] on the abundance of bright satellites around Milky Way-like galaxies in SDSS. Another solution would be to change from CDM to WDM [252] (it is actually one of the only listed challenges that such a change would probably immediately solve).

  5.

    The satellites phase-space correlation challenge. In addition to the above challenge, the distribution of dark subhalos around the Galaxy is also predicted by ΛCDM to be isotropic, or quasi-isotropic. However, the Milky Way satellites are currently observed to be correlated in phase-space: they lie within a seemingly rotation-supported disk [239]. Young halo globular clusters define the same disk, and streams of stars and gas, tracing the orbits of the objects from which they are stripped, preferentially lie in this disk, too [347]. Since SDSS covered only one fifth of the sky, it will be interesting to see whether future surveys such as Pan-STARRS will confirm this state of affairs. Whether or not this phase-space correlation is unique to the Milky Way should also be carefully checked, the evidence in M31 being currently much less convincing, with a richer and more complex satellite population [289]. But in any case, the current distribution of satellites around the Milky Way is statistically incompatible with the predictions of ΛCDM at a very high level of confidence, even when taking into account the observational bias from SDSS [239]. While this might perhaps have been explained by the infall of a small group of galaxies that would have retained correlated orbits, this solution is ruled out by the fact that no nearby groups are observed to be anywhere near as spatially small as the disk of satellites [290]. Another solution might be that most Milky Way satellites are actually not primordial galaxies but old tidal dwarf galaxies created in an early major merger event, accounting for their presently-correlated phase-space distribution [346]. Note in passing that if only one or two long-lived tidal dwarfs are created in each gas-dissipational galaxy encounter, they could probably account for most of the dwarf galaxy population in the Universe, leaving no room for small CDM subhalos to create galaxies, which would transform the missing satellites challenge into a missing satellites catastrophe [239].

  6.

    The cusp-core challenge. Another long-standing problem of ΛCDM is the fact that the simulations of the collapse of CDM halos lead to a density distribution as a function of radius, ρ(r), which is well fitted by a smooth function asymptoting to a central cusp with slope \(d\ln\rho/d\ln r = -1\) in the central parts [126, 332], while observations clearly point towards large constant density cores in the central parts [118, 169, 479]. Even though the latest simulations [333] rather point towards Einasto [133] profiles with \(d\ln\rho/d\ln r \propto -r^{1/n}\) (with n slightly varying with halo mass, and n ∼ 6 for a Milky Way-sized halo, meaning that the slope is zero only very close to the nucleus [177], and is still ∼ −1 at 200 pc from the center), fitting such profiles to observed galactic kinematical data such as rotation curves [88] leads to values of n that are much smaller than simulated values (meaning that they have much larger cores), which is another way of restating the old cusp problem of ΛCDM. Note that a change from CDM to WDM could solve the problem in dwarf galaxies, by leading to the formation of small cores, but certainly not in large galaxies where large cores are needed from observations. Thus, one has to rely on baryon feedback to erase the cusp from all galaxies. But this is not easily done, as the adiabatic cooling of baryons in the center of dark matter halos should lead to an even more concentrated dark matter distribution. A possibility would be that angular momentum transfer from a rotating stellar bar destroys dark-matter cusps: however, significant cusp destruction requires substantially more angular momentum than is realistically available in stellar bars [89, 286]. Note also that not all galaxies are barred (e.g., M33 is not). The state-of-the-art solution nowadays is to enforce strong supernovae outflows that move large amounts of low-angular-momentum gas from the central parts and that “pull” on the central dark matter concentration to create a core [176], but this is still a highly fine-tuned process, which fails to address the baryon fraction problem (see challenge 10 below).

  7.

    The angular momentum challenge. As a consequence of the merger history of galaxy disks in a hierarchical formation scenario, as well as of the associated transfer of angular momentum from the baryonic disk to the dark halo, the specific angular momentum of the baryons ends up being much too small in simulated disks, which in turn end up much smaller than the observed ones [4]. Similarly, elliptical systems end up too concentrated as well. Addressing this challenge within the standard paradigm essentially relies on forming disks through late-time quiescent gas accretion from large-scale filaments, with far fewer late-time mergers than presently predicted in ΛCDM.

  8.

    The pure disk challenge. Related to the previous challenge, large bulgeless thin disk galaxies are extremely difficult to produce in simulations. This is because major mergers, at any time in the galaxy formation process, typically create bulges, so bulgeless galaxies would represent the quiescent tail of a distribution of merger histories for galaxies of the Local Volume. However, these bulgeless disk galaxies represent more than half of large galaxies (with Vc > 150 km/s) in the Local Volume [178, 231]. Solving this problem would rely, e.g., on suppressing central spheroid formation for mergers with mass ratios lower than 30% [228].

  9.

    The stability challenge. Round CDM halos tend to stabilize very low surface density disks against the formation of bars and spirals, due to a lack of disk self-gravity [291]. The observation [282] of Low Surface Brightness (LSB) disk galaxies with strong bars and spirals is thus challenging in the absence of a significant disk component of dark matter. What is more, in the absence of such a disk DM component, the lack of disk self-gravity prevents the creation of very-large razor-thin LSB disks, but these are observed [222, 260]. In the standard context, these observations would tend to point towards an additional disk DM component, either a CDM-one linked to in-plane accretion of satellites or a baryonic one in the form of molecular gas.

  10.

    The missing baryons challenge(s). As mentioned above, constraints from the CMB imply \(\Omega_m = 0.27\) and \(\Omega_b = 0.046\). However, our inventory of known baryons in the local Universe, summing over all observed stars, gas, etc., comes up short of the total. For example, [42] estimate that the sum of stars and cold gas is only ∼ 5% of \(\Omega_b\). While there now seems to be a good chance that many of the missing baryons are in the form of highly ionized gas in the warm-hot intergalactic medium (WHIM), we are still far from being able to give a confident account of where all the baryons reside. Indeed, there could be multiple distinct reservoirs in addition to the WHIM, each comparable to the mass in stars, within the current uncertainties. But there is another missing baryons challenge, namely the halo-by-halo missing baryons. Indeed, each CDM halo can, to a first approximation, be thought of as a microcosm of the whole. As such, one would naively expect each halo to have the same baryon fraction as the whole Universe, \(f_b = \Omega_b/\Omega_m = 0.17\). On the scale of clusters of galaxies, this is approximately true (but still systematically low), but for individual galaxies, observations depart from this in a systematic way which we have yet to understand, and which has nothing to do with the truncation radius. The ratio of the galaxy-detected baryon fraction over the cosmological one, \(f_d\), is plotted as a function of the potential well of the systems in Figure 2 [284]. There is a clear correlation, less massive objects being much more dark-matter dominated than massive ones. This correlation is a priori not predicted at all by ΛCDM, at least not with the correct shape [273]. This missing baryons challenge is actually closely related to the baryonic Tully-Fisher relation, which we expand on in Section 4.3.1.

Figure 2

The fraction of the expected baryons that are detected as a function of potential-well depth (bottom axis) and mass (top). Measurements are referenced to the radius \(R_{500}\), where the enclosed density is 500 times the cosmic mean [284]. The detected baryon fraction is \(f_d = M_b/(0.17\,M_{500})\), where \(M_b\) is the detected baryonic mass, 0.17 is the universal baryon fraction [229], and \(M_{500}\) is the dynamical mass (baryonic + dark mass) enclosed by \(R_{500}\). Each point is a bin representing many objects. Gray triangles represent galaxy clusters, which come close to containing the cosmic fraction. The detected baryon fraction declines systematically for smaller systems. Dark-blue circles represent star-dominated spiral galaxies. Light-blue circles represent gas-dominated disk galaxies. Orange squares represent Local Group dwarf satellites for which the baryon content can be less than 1% of the cosmic value. Where these missing baryons reside is one of the challenges currently faced by ΛCDM.
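
The two fractions used in this caption are simple ratios, and a minimal sketch makes the bookkeeping explicit. The density parameters are those quoted in the text; the function names and the example masses are hypothetical and serve only as illustration.

```python
OMEGA_M = 0.27    # matter density parameter quoted in the text
OMEGA_B = 0.046   # baryon density parameter quoted in the text


def cosmic_baryon_fraction(omega_b=OMEGA_B, omega_m=OMEGA_M):
    """Universal baryon fraction f_b = Omega_b / Omega_m."""
    return omega_b / omega_m


def detected_baryon_fraction(m_b, m_500, f_b=0.17):
    """Detected fraction f_d = M_b / (f_b * M_500); both masses in the same units."""
    if m_500 <= 0 or f_b <= 0:
        raise ValueError("m_500 and f_b must be positive")
    return m_b / (f_b * m_500)


if __name__ == "__main__":
    print(f"f_b = {cosmic_baryon_fraction():.3f}")
    # Hypothetical system: 1e10 solar masses of detected baryons,
    # 3e11 solar masses of dynamical mass within R_500.
    print(f"f_d = {detected_baryon_fraction(1.0e10, 3.0e11):.2f}")
```

A value of \(f_d\) near unity means that essentially all of the expected baryons are accounted for, while \(f_d \ll 1\) corresponds to the low-mass end of Figure 2.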

Let us note, however, that while challenges 1 to 3 are not real smoking guns yet for the ΛCDM model, challenges 4 to 10 are concerned with processes happening on kpc scales, for which it is fair to consider that the model is not very predictive because the baryon physics should play a more important role, and this is hard to take into account rigorously. Still, it is not sufficient to qualitatively invoke handwavy baryon physics to avoid confronting predictions of ΛCDM with observations. It is also mandatory to show that the feedback from the baryons, which is needed to solve the observational problems, is what would quantitatively happen in a physical galaxy. This, presently, is not yet the case for the aforementioned challenges. That said, these challenges are “model-dependent problems”, in the sense of being failed predictions of a given model, and would not have appeared a priori surprising without the standard concordance model at hand. This means that subtly changing some parameters of the model (like, e.g., swapping CDM for WDM, making DM more self-interacting, etc.) might help solve at least a few of them. But what is even more challenging is a set of observations that appear surprising independently of any specific dark matter model, as they involve a fine-tuned relation between the distribution of visible and dark matter. These are what we call hereafter “unpredicted observations”.

4.3 Unpredicted observations

There are several important examples of systematic relations between the dynamics of galaxies (in theory presumed to be dominated by dark matter) and their baryonic content. These relations are fully empirical, and as such must be explained by any viable theory. As we shall see, they inevitably involve a critical acceleration scale, or equivalently, a critical surface density of baryonic matter.

4.3.1 Baryonic Tully-Fisher relation

One of the strongest correlations in extragalactic astronomy is the Tully-Fisher relation [467]. Originally identified as an empirical relation between a galaxy’s luminosity and its HI line-width, it has been widely employed as a distance indicator. Though extensively studied for decades, the physical basis of the relation remains unclear.

Luminosity and line-width are readily accessible observational quantities. The optical luminosity of a galaxy is a proxy for its stellar mass, and the HI line-width is a proxy for its rotation velocity. The quality of the correlation improves as more accurate indicators of these quantities are employed. For example, resolved rotation curves, where the flat portion of the rotation curve Vf or the maximum peak velocity Vp can be measured, give relations that are tighter than those utilizing only line-width information [108]. Similarly, the scatter declines as we shift from optical luminosities to those in the near-infrared [475] as the latter are expected to give a more reliable mapping of starlight to stellar mass [42].

It was then realized [322, 157, 283] that a more fundamental relation was that between the total observed baryonic mass and the rotation velocity. In most bright galaxies, the stars harbor the majority of the detected baryonic mass, so luminosity suffices as a proxy for mass. The next-most-important known reservoir of baryons is the neutral atomic hydrogen (HI) of the interstellar medium. As studies have probed down the mass spectrum to lower mass, more slowly rotating systems, a higher preponderance of gas rich galaxies is found. The luminous Tully-Fisher relation breaks down [283, 272], but a tight relation persists if instead of luminosity, the detected baryonic mass \(M_b = M_* + M_g\) is used [283, 475, 42, 272, 353, 31, 445, 462, 276]. This is the Baryonic Tully-Fisher Relation (BTFR), plotted in Figure 3.

Figure 3

The Baryonic Tully-Fisher (mass-rotation velocity) relation for galaxies with well-measured outer velocities \(V_f\). The baryonic mass is the combination of observed stars and gas: \(M_b = M_* + M_g\). Galaxies have been selected that have well observed, extended rotation curves from 21 cm interferometric observations providing a good measure of the outer, flat rotation velocity. The dark blue points are galaxies with \(M_* > M_g\) [272]. The light blue points have \(M_* < M_g\) [276] and are generally less precise in velocity, but more accurate in the sense that the result is insensitive to possible systematics in the stellar mass-to-light ratio. For a detailed discussion of the stellar mass-to-light ratios used here, see [272, 276]. The dotted line has slope 4 corresponding to a constant acceleration parameter, \(1.2 \times 10^{-10}\,{\rm m\,s^{-2}}\). The dashed line has slope 3 as expected in ΛCDM with the normalization expected if all of the baryons associated with dark matter halos are detected. The difference between these two lines is the origin of the variation in the detected baryon fraction in Figure 2.

The luminous Tully-Fisher relation extends over about two decades in luminosity. Recent work extending the relation to low mass, typically LSB and gas rich galaxies [31, 445, 462] extends the dynamic range of the BTFR to five decades in baryonic mass. Over this range, the BTFR has remarkably little intrinsic scatter (consistent with zero given the observational errors) and is well described as a power law, or equivalently, as a straight line in log-log space:

$$\log {M_b} = \alpha \log {V_f} - \log \beta$$
(2)

with slope α = 4 [272, 445, 276]. This slope is consistent with a constant acceleration scale \(a = V_f^4/(G{M_b})\) such that (Footnote 10) the normalization constant is β = Ga.

The acceleration scale \(a \approx 10^{-10}\,{\rm m\,s^{-2}} \sim \Lambda^{1/2}\) (Eq. 1) is thus present in the data. Figure 4 shows the distribution of this acceleration, \(V_f^4/(G{M_b})\), around the best fit line in Figure 3, strongly peaked around \(\sim 2 \times 10^{-62}\) in natural units. As we shall see, this acceleration scale arises empirically in a variety of distinct situations involving the mass discrepancy problem.
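
As an illustration of Eq. 2 with slope α = 4 and β = Ga, the following sketch evaluates \(M_b = V_f^4/(Ga)\) and converts the acceleration to natural units of \(c^4/(Gm_P)\). It assumes \(a = 1.2 \times 10^{-10}\,{\rm m\,s^{-2}}\), the value quoted for the dotted line in Figure 3; the function names and the example velocity are hypothetical.

```python
import math

# Physical constants (SI)
G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
C = 2.99792458e8         # speed of light, m/s
HBAR = 1.054571817e-34   # reduced Planck constant, J s
M_SUN = 1.989e30         # solar mass, kg

A = 1.2e-10              # acceleration parameter of the dotted line in Figure 3, m/s^2


def btfr_mass(v_f_kms, a=A):
    """Baryonic mass in solar masses for a flat velocity in km/s, M_b = V_f^4 / (G a)."""
    v = v_f_kms * 1.0e3
    return v**4 / (G * a) / M_SUN


def btfr_velocity(m_b_msun, a=A):
    """Inverse relation: flat velocity in km/s for a baryonic mass in solar masses."""
    return (G * a * m_b_msun * M_SUN) ** 0.25 / 1.0e3


def acceleration_in_natural_units(a=A):
    """Express a in units of c^4 / (G m_P), where m_P = sqrt(hbar c / G) is the Planck mass."""
    m_planck = math.sqrt(HBAR * C / G)
    return a / (C**4 / (G * m_planck))


if __name__ == "__main__":
    print(f"M_b(V_f = 200 km/s) = {btfr_mass(200.0):.2e} M_sun")
    print(f"a in natural units  = {acceleration_in_natural_units():.2e}")
```

Because the relation has slope 4, halving the velocity lowers the baryonic mass by a factor of 16, which is how five decades in mass are spanned by little more than one decade in velocity.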

Figure 4

Histogram of the accelerations \(a = V_f^4/(G{M_b})\) in \({\rm m\,s^{-2}}\) (bottom axis) and in natural units [\(c^4/(Gm_P)\), where \(m_P\) is the Planck mass] (top axis) for galaxies with well measured \(V_f\). The data are peaked around a characteristic value of \(\sim 10^{-10}\,{\rm m\,s^{-2}}\) (\(\sim 2 \times 10^{-62}\) in natural units).

A BTFR of the observed form does not arise naturally in ΛCDM. The naive expectation is \(\alpha = 3\) and \(\beta = 10f_V^3G{H_0}\) [446] (Footnote 11), where \(H_0\) is the Hubble constant and \(f_V\) is a factor of order unity (currently estimated to be ≈ 1.3 [361]) that relates the observed \(V_f\) to the circular velocity of the potential at the virial radius (Footnote 12). This modest fudge factor is necessary because ΛCDM does not explicitly predict either axis of the observed BTFR. Rather, there is a relationship between total (baryonic plus dark) mass and rotation velocity at very large radii. This simple scaling fails (dashed line in Figure 3), obliging us to introduce an additional fudge factor \(f_d\) [273, 284] that relates the detected baryonic mass to the total mass of baryons available in a halo. This mismatch drives the variation in the detected baryon fraction \(f_d\) seen in Figure 2. A constant \(f_d\) is excluded by the difference between the observed and predicted slopes; \(f_d\) must vary with \(V_f\), or M, or the gravitational potential Φ.
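
To see why a constant \(f_d\) cannot reconcile the two slopes, one can divide the observed relation by the naive ΛCDM one. The sketch below rests on an interpretation that is not spelled out in this section: that \(\beta = 10f_V^3GH_0\) normalizes the total halo mass, \(M_{\rm tot} = V_f^3/(10f_V^3GH_0)\), and that the baryonic mass available is \(f_bM_{\rm tot}\) with \(f_b = 0.17\). The values \(H_0 = 70\) km/s/Mpc and \(a = 1.2 \times 10^{-10}\,{\rm m\,s^{-2}}\) are also assumed. Under these assumptions \(f_d = 10f_V^3H_0V_f/(f_b\,a)\), which is linear in \(V_f\).

```python
# Physical constants and assumed parameters (SI)
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
MPC = 3.0857e22        # one megaparsec, m
H0 = 70.0e3 / MPC      # assumed Hubble constant, s^-1
F_B = 0.17             # universal baryon fraction quoted in the text
F_V = 1.3              # velocity fudge factor quoted in the text
A = 1.2e-10            # observed BTFR acceleration parameter, m/s^2


def observed_baryonic_mass(v_f):
    """Observed BTFR (slope 4): M_b = V_f^4 / (G a), with v_f in m/s, result in kg."""
    return v_f**4 / (G * A)


def available_baryonic_mass(v_f):
    """Naive LCDM scaling (slope 3), ASSUMING beta normalizes the total halo mass."""
    m_tot = v_f**3 / (10.0 * F_V**3 * G * H0)
    return F_B * m_tot


def implied_detected_fraction(v_f_kms):
    """f_d implied by dividing the observed relation by the assumed LCDM one."""
    v = v_f_kms * 1.0e3
    return observed_baryonic_mass(v) / available_baryonic_mass(v)


if __name__ == "__main__":
    for v_kms in (25.0, 50.0, 100.0, 200.0):
        print(f"V_f = {v_kms:5.0f} km/s -> f_d = {implied_detected_fraction(v_kms):.3f}")
```

The absolute normalization depends on the assumptions above, but the proportionality \(f_d \propto V_f\) follows directly from the difference between slopes 4 and 3, which is the qualitative trend seen in Figure 2.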

This brings us to the first fine-tuning problem posed by the data. There is essentially zero intrinsic scatter in the BTFR [276], while the detected baryon fraction fd could, in principle, obtain any value between zero and unity. Somehow galaxies must “know” what the circular velocity of the halo they reside in is so that they can make observable the correct fraction of baryons.

Quantitatively, in the ΛCDM picture, the baryonic mass plotted in the BTFR (Figure 3) is \(M_b = M_* + M_g\) while the total baryonic mass available in a halo is \(f_bM_{\rm tot}\). The difference between these quantities implies a reservoir of dark baryons in some undetected form, \(M_{\rm other}\). It is commonly speculated that the undetected baryons could be in a hard-to-detect hot, diffuse, ionized phase mixed in with the dark matter halo (and extending to comparable radius), or that the missing baryons have been entirely blown away by winds from supernovae. For the purposes of this argument, it does not matter which form the dark baryons take. All that matters is that a substantial mass of them is required so that [283]

$${f_d} = {{{M_b}} \over {{f_b}{M_{{\rm{tot}}}}}} = {{{M_\ast} + {M_g}} \over {{M_\ast} + {M_g} + {M_{{\rm{other}}}}}}.$$
(3)

Since there is negligible intrinsic scatter in the observed BTFR, there must be effectively zero scatter in \(f_d\). By inspection of Eq. 3, it is apparent that small scatter in \(f_d\) can only be obtained naturally in the limits \(M_* + M_g \gg M_{\rm other}\) so that \(f_d \rightarrow 1\), or \(M_* + M_g \ll M_{\rm other}\) so that \(f_d \rightarrow 0\). Neither of these limits applies. We require not only an appreciable mass in dark baryons \(M_{\rm other}\), but we need the fractional mass of these missing baryons to vary in lockstep with the observed rotation velocity \(V_f\). Put another way, for any given galaxy, we know not only how many baryons we see, but also how many we do not see: a remarkable feat of non-observation.

Another remarkable fact about the BTFR is that it shows no residuals with variations in the distribution of baryons [517, 443, 109, 271]. Figure 5 shows deviations from the BTFR as a function of the characteristic baryonic surface density of the galaxies, as defined in [271], i.e., \({\Sigma _b} = 0.75{M_b}/R_p^2\) where \(R_p\) is the radius at which the rotation curve \(V_b(r)\) of baryons peaks. Over several decades in surface density, the BTFR is completely insensitive to variations in the mass distribution of the baryons. This is odd because, a priori, \(V^2 \sim M/R\), and thus \(V^4 \sim M\Sigma\). Yet the BTFR is \({M_b} \sim V_f^4\) with no dependence on Σ. This brings us to a second fine-tuning problem. For some time, it was thought [156] that spiral galaxies all had very nearly the same surface brightness (a condition formerly known as “Freeman’s Law”). If this were indeed the case, the observed BTFR would naturally follow from the constancy of Σ. However, there do exist many LSB galaxies [264] that violate the constancy of surface brightness implied in Freeman’s Law. Thus, one would expect them to deviate systematically from the Tully-Fisher relation, with lower surface brightness galaxies having lower rotation velocities at a given mass. Yet they do not. Thus, one must fine-tune the mass surface density of the dark matter to precisely make up for that of the baryons [279]. As the surface density of baryons declines, that of the dark matter must increase just so as to fill in the difference (Figure 6 [271]). The relevant quantity is the dynamical surface density enclosed within the radius where the velocity is measured. The latter matters little along the flat portion of the rotation curve, but the former is the sum of dark and baryonic matter.
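
The size of the residual one would naively expect can be made explicit with a short worked scaling. This is a sketch that assumes a single characteristic radius R and a characteristic surface density \(\Sigma \sim M/R^2\), dropping geometric factors of order unity:

$$V^2 \sim {{GM} \over R},\quad \Sigma \sim {M \over {{R^2}}}\quad \Rightarrow \quad V^4 \sim {{{G^2}{M^2}} \over {{R^2}}} = {G^2}M\Sigma .$$

At fixed mass this gives \(V \propto \Sigma^{1/4}\), i.e., \(\delta \log V = 0.25\,\delta \log \Sigma\). A galaxy with ten times lower surface density would then be expected to rotate at about \(10^{-1/4} \approx 0.56\) times the velocity of its higher surface density counterpart of the same mass, a shift that the data in Figure 5 do not show.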

Figure 5

Residuals (\(\delta \log V_f\)) from the baryonic Tully-Fisher relation as a function of a galaxy’s characteristic baryonic surface density (\({\Sigma _b} = 0.75{M_b}/R_p^2\) [271], \(R_p\) being the radius at which the contribution of baryons to the rotation curve peaks). Color differentiates between star (dark blue) and gas (light blue) dominated galaxies as in Figure 3, but not all galaxies there have sufficient data (especially of \(R_p\)) to plot here. Stellar masses have been estimated with stellar population synthesis models [42]. More accurate data, with uncertainty on rotation velocity less than 5%, are shown as larger points; less accurate data are shown as smaller points. The rotation velocity of galaxies shows no dependence on the distribution of baryons as measured by \(\Sigma_b\) or \(R_p\). This is puzzling in the conventional context, where \(V^2 = GM/r\) should lead to a strong systematic residual [109].

Figure 6

The fractional contribution to the total velocity \(V_p\) at the radius \(R_p\) where the contribution of the baryons peaks, for both baryons (\(V_b/V_p\), top) and dark matter (\(V_{\rm dm}/V_p\), bottom). Points as per Figure 5. As the baryonic surface density increases, the contribution of the baryons to the total gravitating mass increases. The dark matter contribution declines in compensation, maintaining a see-saw balance that manages to leave no residual in the BTFR (Figure 5). The absolute amplitude of \(V_b\) and \(V_{\rm dm}\) depends on the choice of stellar mass estimator, but the fine-tuning between them must persist for any choice of \(M_*/L\).

One might be able to avoid fine-tuning if all galaxies are dark-matter dominated [109]. In the limit \(\Sigma_{\rm dm} \gg \Sigma_b\), the dynamics are entirely dark-matter dominated and the distribution of the baryons is irrelevant. There is some systematic uncertainty in the mass-to-light ratios of stellar populations [42], making such an approach a priori tenable. In effect, we return to the interpretation of Σ ∼ constant originally made by [3] in the context of Freeman’s Law, but now we invoke a constant surface density of CDM rather than of baryons. But as we will see, such an interpretation, i.e., that \(\Sigma_b \ll \Sigma_{\rm dm}\) in all disk galaxies, is flatly contradicted by other observations (e.g., Figure 9 and Figure 13).

The Tully-Fisher relation is remarkably persistent. Originally posited for bright spirals, it applies to galaxies that one would naively expect to deviate from it. This includes low-luminosity, gas-dominated irregular galaxies [445, 462, 276], LSB galaxies of all luminosities [517, 443], and even tidal dwarfs formed in the collision of larger galaxies [165]. Such tidal dwarfs may be especially important in this context (see also Section 6.5.4). Galactic collisions should be very effective at segregating dark and baryonic matter. The rotating gas disks of galaxies that provide the fodder for tidal tails and the tidal dwarfs that form within them initially have nearly circular, coplanar orbits. In contrast, the dark-matter particles are on predominantly radial orbits in a quasi-spherical distribution. This difference in phase space leads to tidal tails that themselves contain very little dark matter [72]. When tidal dwarfs form from tidal debris, they should be largely devoid (Footnote 13) of dark matter. Nevertheless, tidal dwarfs do appear to contain dark matter [72] and obey the BTFR [165].

The critical acceleration scale of Eq. 1 also appears in non-rotating galaxies. Elliptical galaxies are three-dimensional stellar systems supported more by random motions than organized rotation. First of all, in such systems of measured velocity dispersion σ, the typical acceleration \(\sigma^2/R\) is also on the order of \(a_0\) within a factor of a few, where R is the effective radius of the system [401]. Moreover, they obey an analogous relation to the Tully-Fisher one, known as the Faber-Jackson relation (Figure 7). In bulk, the data for these star-dominated galaxies follow the relation \(\sigma^4/(GM_*) \propto a_0\) (dotted line in Figure 7). This is not strictly analogous to the flat part of the rotation curves of spiral galaxies, the dispersion typically being measured at smaller radii, where the equivalent circular velocity curve is often falling [367, 323], or in a temporary plateau before falling again (see also Section 6.6.1). Indeed, unlike the case in spiral galaxies, where the distribution of stars is irrelevant, it clearly does matter in elliptical galaxies (the Faber-Jackson relation is just one projection of the “fundamental plane” of elliptical galaxies [85]). This is comforting: at small radii in dense stellar systems where the baryonic mass of stars is clearly important, the data behave as Newton predicts.
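
The two quantities discussed for pressure-supported systems are easy to evaluate. The sketch below computes \(\sigma^2/R\) and \(\sigma^4/(GM_*)\) in units of \(a_0\) for a hypothetical elliptical galaxy; the input values (σ = 200 km/s, R = 5 kpc, \(M_* = 10^{11}\,M_\odot\)) and \(a_0 = 1.2 \times 10^{-10}\,{\rm m\,s^{-2}}\) are assumptions chosen for illustration, not data from the cited samples.

```python
# Physical constants (SI)
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # solar mass, kg
KPC = 3.0857e19      # one kiloparsec, m

A0 = 1.2e-10         # assumed acceleration scale, m/s^2


def typical_acceleration(sigma_kms, r_kpc):
    """Characteristic acceleration sigma^2 / R, in units of a0."""
    sigma = sigma_kms * 1.0e3
    return sigma**2 / (r_kpc * KPC) / A0


def faber_jackson_parameter(sigma_kms, m_star_msun):
    """Acceleration parameter sigma^4 / (G M_*), in units of a0."""
    sigma = sigma_kms * 1.0e3
    return sigma**4 / (G * m_star_msun * M_SUN) / A0


if __name__ == "__main__":
    # Hypothetical elliptical galaxy (illustrative values only)
    print(f"sigma^2/R      = {typical_acceleration(200.0, 5.0):.2f} a0")
    print(f"sigma^4/(G M*) = {faber_jackson_parameter(200.0, 1.0e11):.2f} a0")
```

For these inputs the characteristic acceleration comes out at roughly twice \(a_0\), in line with the statement that it lies within a factor of a few of that scale.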

Figure 7

The Faber-Jackson relation for spheroidal galaxies, including both elliptical galaxies (red squares, [85, 232]) and Local Group dwarf satellites [285] (orange squares are satellites of the Milky Way; pink squares are satellites of M31). In analogy with the Tully-Fisher relation for spiral galaxies, spheroidal galaxies follow a relation between stellar mass and line of sight velocity dispersion (σ). The dotted line represents a constant value of the acceleration parameter \(\sigma^4/(GM_*)\). Note, however, that this relation is different from the BTFR because it applies to the bulk velocity dispersion while the BTFR applies to the asymptotic circular velocity. In the context of Milgrom’s law (Section 5) the Faber-Jackson relation is predicted only when relying on assumptions such as isothermality, isotropy, and the slope of the baryonic density distribution (see 3rd law of motion in Section 5.2). In addition, not all pressure-supported systems are in the weak-acceleration regime. So, in the context of Milgrom’s law, deviations from the weak-field regime, from isothermality and from isotropy, as well as variations in the baryonic density distribution slope, would explain the scatter in this relation.

The acceleration scale a0 is clearly imprinted on the data for local galaxies. This is an empirical statement that might not hold at all times, perhaps evolving over cosmic time or evaporating altogether. Substantial efforts have been made to investigate the Tully-Fisher relation to high redshift. To date, there is no persuasive evidence of evolution in the zero point of the BTFR out to z = 0.6 [356, 357] and perhaps even to z = 1 [485]. One must exercise caution in interpreting such results given the difficulty inherent in peering many Gyr back in cosmic time. Nonetheless, it appears that the scale a0 remains present in the data and has not obviously changed over the more recent half of the age of the Universe.

4.3.2 The role of surface density

The Freeman limit [156] is the maximum central surface brightness in the distribution of galaxy surface brightnesses. Originally thought to be a universal surface brightness, it has since become clear that instead galaxies exist over a wide range in surface brightness [264]. In the absence of a perverse and fine-tuned anti-correlation between surface brightness and stellar mass-to-light ratio [517], this implies a comparable range in baryonic surface density (Figure 8).

Figure 8

Size and surface density. The characteristic surface density of baryons as defined in Figure 5 is plotted against their dynamical scale length Rp in the left panel. The dark-blue points are star-dominated galaxies and the light-blue ones gas-dominated. High characteristic surface densities at low Rp in the left panel are typical of bulge-dominated galaxies. The stellar disk component of most spiral galaxies is well approximated by the exponential disk with \(\Sigma (R) = {\Sigma _0}{e^{- R/{R_d}}}\). This disk-only central surface density and the exponential scale length of the stellar disk are plotted in the right panel. Galaxies exist over a wide range in both size and surface density. There is a maximum surface density threshold (sometimes referred to as Freeman’s limit) above which disks become very rare [264]. This is presumably a stability effect, as purely Newtonian disks are unstable [343, 415]. Stable disks only appear below a critical surface density Σ† ≈ a0/G [299, 77].

An upper limit to the surface brightness distribution is interesting in the context of disk stability. Recall that dynamically cold, purely Newtonian disks are subject to potentially-self-destructive instabilities, one cure being to embed them in the potential wells of spherical dark-matter halos [343]. While the proper criterion for stability is much debated [131, 415], it is clear that the dark matter halo moderates the growth of instabilities and that the ratio of halo to disk self gravity is a relevant quantity. The more self-gravitating a disk is, the more likely it is to suffer undamped growth of instabilities. But, in principle, galaxies with a baryonic disk and a dark matter halo are totally scalable: if a galaxy model has a certain dynamics, and one multiplies all densities by any (positive) constant (and also scales the velocities appropriately) one gets another galaxy with exactly the same dynamics (with scaled time scales). So if one is stable, so is the other. In turn, the mere fact that there might be an upper limit to Σb is a priori surprising, and even more so that there might be a coincidence of this upper limit with the acceleration scale a0 identified dynamically.

The scale Σ† = a0/G is clearly present in the data (Figure 8). Selection effects make high-surface-brightness (HSB) galaxies easy to detect and hence discover, but their intrinsic numbers appear to decline exponentially when the central surface density of the stellar disk Σ0 > Σ† [264]. It seems natural to associate the dynamical scale a0 with the disk stability scale since they are numerically indistinguishable and both arise in the context of the mass discrepancy. However, there is no reason to expect this in ΛCDM, which predicts denser dark matter halos than observed [280, 169, 167, 241, 243, 478, 118]. Such dense dark matter halos could stabilize much higher density disks than are observed to exist. Lacking a clear mechanism to specify this scale, it is introduced into models by hand [115].
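To give a feel for the size of this surface density scale, the following minimal sketch converts a0/G into solar masses per square parsec. The numerical value adopted for a0 (1.2 × 10⁻¹⁰ m s⁻²) and the function name are assumptions made for illustration only; the text above specifies only the order of magnitude of a0.

```python
# Illustrative constants. The value of A0 is an assumed, commonly quoted
# figure and is not taken from the text above.
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
A0 = 1.2e-10         # assumed acceleration scale a0, m s^-2
M_SUN = 1.989e30     # solar mass, kg
PARSEC = 3.0857e16   # one parsec, m


def critical_surface_density(a0=A0):
    """Return a0/G expressed in solar masses per square parsec."""
    sigma_si = a0 / G                     # kg m^-2
    return sigma_si * PARSEC**2 / M_SUN   # Msun pc^-2


sigma_dagger = critical_surface_density()
print(f"a0/G is roughly {sigma_dagger:.0f} Msun/pc^2")
```

With these assumed inputs the scale comes out at several hundred solar masses per square parsec, and it changes linearly with the adopted a0.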

Poisson’s equation provides a direct relation between the force per unit mass (centripetal acceleration in the case of circular orbits in disk galaxies), the gradient of the potential, and the surface density of gravitating mass. If there is no dark matter, the observed surface density of baryons must correlate perfectly with the dynamical acceleration. If, on the other hand, dark matter dominates the dynamics of a system, as we might infer from Figure 5 [279, 109], then there is no reason to expect a correlation between acceleration and the dynamically-insignificant baryons. Figure 9 shows the dynamical acceleration as a function of baryonic surface density in disk galaxies. The acceleration ap = Vp²/Rp is measured at the radius Rp, where the rotation curve Vb(r) of baryons peaks. Given the systematic variation of rotation curve shape [376, 495], the specific choice of radii is unimportant. Nevertheless, this radius is advocated by [109] since this maximizes the possibility of perceiving the baryonic contribution in the plot of Figure 5. That this contribution is not present leads to the inference that Σb ≪ Σdm in all disk galaxies [109]. This is directly contradicted by Figure 9, which shows a clear correlation between ap and Σb. The higher the surface density of baryons, the higher the observed acceleration. The slope of the relation is not unity, ap ∝ Σb, as we would expect in the absence of a mass discrepancy, but rather \({a_p} \propto \Sigma _b^{1/2}\). To simultaneously explain Figure 5 and Figure 9, there must be a strong fine-tuning between dark and baryonic surface densities (i.e., Figure 6), a sort of repulsion between them, a repulsion which is however contradicted by the correlations between baryonic and dark matter bumps and wiggles in rotation curves (see Section 4.3.4).

Figure 9

The dynamical acceleration \({a_p} = V_p^2/{R_p}\) in units of a0 plotted against the characteristic baryonic surface density [275]. Points as per Figure 5. The dotted line shows the relation ap = GΣb that would be obtained if the visible baryons sufficed to explain the observed velocities in Newtonian dynamics. Though the data do not follow this line, they do show a correlation \(({a_p} \propto \Sigma _b^{1/2})\). This clearly indicates a dynamical role for the baryons, in contradiction to the simplest interpretation [109] of Figure 5 that dark matter completely dominates the dynamics.
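The two quantities compared in Figure 9 can be computed as in the sketch below, which expresses both the measured acceleration Vp²/Rp and the Newtonian expectation GΣb in units of a0. The adopted value of a0, the function names, and the example galaxy (200 km/s at 10 kpc, 100 solar masses per square parsec) are hypothetical and are not data from the figure.

```python
# Illustrative constants. A0 is an assumed value, not taken from the text.
G = 6.674e-11        # m^3 kg^-1 s^-2
A0 = 1.2e-10         # assumed a0, m s^-2
M_SUN = 1.989e30     # kg
PARSEC = 3.0857e16   # m
KPC = 1.0e3 * PARSEC


def dynamical_acceleration(v_p_kms, r_p_kpc):
    """a_p = V_p^2 / R_p, returned in units of a0."""
    v = v_p_kms * 1.0e3      # km/s to m/s
    r = r_p_kpc * KPC        # kpc to m
    return v**2 / r / A0


def newtonian_acceleration(sigma_b_msun_pc2):
    """G * Sigma_b, returned in units of a0 (the dotted-line expectation)."""
    sigma_si = sigma_b_msun_pc2 * M_SUN / PARSEC**2   # kg m^-2
    return G * sigma_si / A0


# Hypothetical galaxy, chosen only to exercise the functions.
a_obs = dynamical_acceleration(200.0, 10.0)
a_newton = newtonian_acceleration(100.0)
print(f"a_p/a0 = {a_obs:.2f}, G*Sigma_b/a0 = {a_newton:.2f}")
```

A point lying on the dotted line of Figure 9 would have the two numbers equal; a point above it has more acceleration than its visible baryons supply in Newtonian dynamics.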

4.3.3 Mass discrepancy-acceleration relation

So far we have discussed total quantities. For the BTFR, we use the total observed mass of a galaxy and its characteristic rotation velocity. Similarly, the dynamical acceleration-baryonic surface density relation uses a single characteristic value for each galaxy. These are not the only ways in which the “magical” acceleration constant a0 appears in the data. In general, the mass discrepancy only appears at very low accelerations a < a0 and not (much) above a0. Equivalently, the need for dark matter only becomes clear at very low baryonic surface densities Σ < Σ† = a0/G. Indeed, the amplitude of the mass discrepancy in galaxies anti-correlates with acceleration [270].

Reference [270] examined the role of various possible scales, as well as the effects of different stellar mass-to-light ratio estimators, on the mass discrepancy problem. The amplitude of the mass discrepancy, as measured by (V/Vb)², the squared ratio of the observed velocity to that predicted by the observed baryons, depends on the choice of estimator for stellar M*/L. However, for any plausible (non-zero) M*/L, the amplitude of the mass discrepancy correlates with acceleration (Figure 10) and baryonic surface density, as originally noted in [382, 266, 406]. It does not correlate with radius and only weakly with orbital frequency (Footnote 14).

Figure 10

The mass discrepancy in spiral galaxies. The mass discrepancy is defined [270] as the ratio \({V^2}/V_b^2\) where V is the observed velocity and Vb is the velocity attributable to visible baryonic matter. The ratio of squared velocities is equivalent to the ratio of total-to-baryonic enclosed mass for spherical systems. No dark matter is required when V = Vb, only when V > Vb. Many hundreds of individual resolved measurements along the rotation curves of nearly one hundred spiral galaxies are plotted. The top panel plots the mass discrepancy as a function of radius. No particular linear scale is favored. Some galaxies exhibit mass discrepancies at small radii while others do not appear to need dark matter until quite large radii. The middle panel plots the mass discrepancy as a function of centripetal acceleration a = V²/r, while the bottom panel plots it against the acceleration \({g_N} = V_b^2/r\) predicted by Newton from the observed baryonic surface density Σb. Note that the correlation appears a little better with gN because the data are stretched out over a wider range in gN than in a. Note also that systematics on the stellar mass-to-light ratios can make this relation slightly more blurred than shown here, but the relation is nevertheless always present irrespective of the assumptions on stellar mass-to-light ratios [270]. Thus, there is a clear organization: the amplitude of the mass discrepancy increases systematically with decreasing acceleration and baryonic surface density.
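The three quantities plotted in Figure 10 follow directly from a single resolved rotation-curve measurement. The sketch below is a minimal illustration with a hypothetical data point (V = 100 km/s, Vb = 50 km/s at r = 10 kpc); the function names are not from the source.

```python
PARSEC = 3.0857e16   # m
KPC = 1.0e3 * PARSEC


def mass_discrepancy(v_obs, v_bar):
    """D = V^2 / V_b^2 (velocities in the same units)."""
    return (v_obs / v_bar) ** 2


def accelerations(v_obs_kms, v_bar_kms, r_kpc):
    """Return (a, g_N) in m s^-2: a = V^2/r and g_N = V_b^2/r."""
    r = r_kpc * KPC
    a = (v_obs_kms * 1.0e3) ** 2 / r
    g_n = (v_bar_kms * 1.0e3) ** 2 / r
    return a, g_n


# Hypothetical measurement, used only to exercise the definitions.
d = mass_discrepancy(100.0, 50.0)
a, g_n = accelerations(100.0, 50.0, 10.0)
print(f"D = {d:.1f}, a = {a:.2e} m/s^2, g_N = {g_n:.2e} m/s^2")
```

Because a = D gN by construction, the middle and bottom panels carry the same information, with the gN axis stretched relative to the a axis as the caption notes.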

There is no reason in the dark matter picture why the mass discrepancy should correlate with any physical scale. Some systems might happen to contain lots of dark matter; others very little. In order to make a prediction with a dark matter model, it is necessary to model the formation of the dark matter halo, the condensation of gas within it, the formation of stars therefrom, and any feedback processes whereby the formation of some stars either enables or suppresses the formation of further stars. This complicated sequence of events is challenging to model. Baryonic “gastrophysics” is particularly difficult, and has thus far precluded the emergence of a clear prediction for galaxy dynamics from ΛCDM.

ΛCDM does make a prediction for the distribution of mass in baryonless dark matter halos: the NFW halo [332, 333]. These are remarkable for being scale free. Small halos have a profile similar to large halos. No feature stands out that marks a unique physical scale as observed. Galaxies do not resemble pure NFW halos [416], even when dark matter dominates the dynamics as in LSB galaxies [241, 243, 118]. The inference in ΛCDM is that gastrophysics, especially the energetic feedback from stellar winds and supernova explosions, plays a critical role in sculpting observed galaxies. This role is not restricted to the minority baryonic constituents; it must also affect the majority dark matter [176]. Simulations incorporating these effects in a quasi-realistic way are extremely expensive computationally, so a comprehensive survey of the plausible parameter space occupied by such models has yet to be made. We have no reason to expect that a particular physical scale will generically emerge as the result of baryonic gastrophysics. Indeed, feedback from star formation is inherently a random process. While it is certainly possible for simple laws to emerge from complicated physics (e.g., the fact that SNIa are standard candles despite the complicated physics involved), the more common situation is for chaos to beget chaos. Therefore, it seems unnatural to imagine feedback processes leading to the orderly behavior that is observed (Figure 10); nor is it obvious how they would implicate any particular physical scale. Indeed, the dark matter halos formed in ΛCDM simulations [332, 333] provide an initial condition with greater scatter than the final observed one [280, 478], so we must imagine that the chaotic processes of feedback not only impart order, but do so in a way that cancels out some of the scatter in the initial conditions.

In any case, and whatever the reason for it, a physical scale is clearly present in the data: a0 (Eq. 1). At high accelerations a ≫ a0, there is no indication of the need for dark matter. Below this acceleration, the mass discrepancy appears. It cannot be emphasized enough that the role played by a0 in the BTFR and its role as a transition acceleration have no intrinsic link with each other: they are fully independent. There is nothing in ΛCDM that stipulates that these two relations (the existence of a transition acceleration and the BTFR) should exist at all, and even less that these should harbour an identical acceleration scale.

Thus, it is important to realize not only that the relevant dynamical scale is one of acceleration, not size, but also that the mass discrepancy appears only at extremely low accelerations. Just as galaxies are much bigger than the Solar system, so too are the centripetal accelerations experienced by stars orbiting within a galaxy much smaller than those experienced by planets in the Solar system. Many of the precise tests of gravity that have been made in the Solar system do not explore the relevant regime of physical parameter space. This is emphasized in Figure 11, which extends the mass discrepancy-acceleration relation to Solar system scales. Many decades in acceleration separate the Solar system from galaxies. Aside from the possible exception of the Pioneer anomaly, there is no hint of a discrepancy in the Solar system: V = Vb. Even the Pioneer anomaly (Footnote 15) is well removed from the regime where the mass discrepancy manifests in galaxies, and is itself much too subtle to be perceptible in Figure 11. Indeed, to within a factor of ∼ 2, no system exhibits a mass discrepancy at accelerations a ≫ a0.

Figure 11

The mass-discrepancy-acceleration relation from Figure 10 extended to solar-system scales (each planet is labelled). This illustrates the large gulf in scale between galaxies and the Solar system where high precision tests are possible. The need for dark matter only appears at very low accelerations.

The systematic increase in the amplitude of the mass discrepancy with decreasing acceleration and baryonic surface density has a remarkable implication. Even though the observed velocity is not correctly predicted by the observed baryons, it is predictable from them. Independent of any theory, we can simply fit a function D(GΣ) to describe the variation of the discrepancy (V/Vb)² with baryonic surface density [270]. We can then apply it to any new system we encounter to predict \(V = {D^{1/2}}{V_b}\). In effect, D boosts the velocity already predicted by the observed baryons. While this is a purely empirical exercise with no underlying theory, it is quite remarkable that the distribution of dark matter required in a galaxy is entirely predictable from the distribution of its luminous mass (see also [167]). In the conventional picture, dark matter outweighs baryonic matter by a factor of five, and more in individual galaxies given the halo-by-halo missing baryon problem (Figure 2), but apparently the baryonic tail wags the dark matter dog. And it does so again through the acceleration scale a0. Indeed, at very low accelerations, the mass discrepancy is precisely defined by the inverse of the square-root of the gravitational acceleration generated by the baryons in units of a0. This actually asymptotically leads to the BTFR.
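The last step can be made explicit with a short worked equation. This is only a sketch under two assumptions: the low-acceleration form of the discrepancy stated above, \(D = {({g_N}/{a_0})^{- 1/2}}\) with \({g_N} = V_b^2/r\), and, as a simplification introduced here, a baryonic mass \({M_b}\) (a symbol not used in the text) that is effectively point-like at large radius, so that \(V_b^2 = G{M_b}/r\):

$${V^2} = D\,V_b^2 = \sqrt {{{{a_0}} \over {{g_N}}}} \,V_b^2 = \sqrt {{a_0}r} \,{V_b}\quad \Rightarrow \quad {V^4} = {a_0}\,r\,V_b^2 = {a_0}G{M_b}.$$

The radius drops out, so the predicted velocity is constant at large radii and scales as the fourth root of the baryonic mass, which is the form of the BTFR.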

So, up to now, we have seen five roles of a0 in galaxy dynamics. (i) It defines the zero point of the Tully-Fisher relation, (ii) it appears as the characteristic acceleration at the effective radius of spheroidal systems, (iii) it defines the Freeman limit for the maximum surface density of pure disks, (iv) it appears as a transition-acceleration above which no dark matter is needed, and below which it appears, and (v) it defines the amplitude of the mass-discrepancy in the weak-field regime (this last point is not a fully independent role as it leads to the Tully-Fisher relation). Let us finally note that there is yet another role played by a0, which is that it defines the central surface density of all dark matter halos as being on the order of a0/(2πG) [129, 167, 313].

4.3.4 Renzo’s rule

The relation between dynamical and baryonic surface densities appears as a global scaling relation in disk galaxies (Figure 9) and as a local correspondence within each galaxy (Figure 10). When all galaxies are plotted together as in Figure 10, this connection appears as a single smooth function D(a). This does not suffice to illustrate that individual galaxies have features in their baryon distribution that are reflected in their dynamics. While the above correlations could be interpreted as a sort of repulsion between dark and baryonic matter, the following rather indicates closer-than-natural attraction.

Figure 12 shows the spiral galaxy NGC 6946. Two multi-color images of the stellar component are given. The optical bands provide a (nearly) true color picture of the galaxy, which is perceptibly redder near the center and becomes progressively more blue further out. This is typical of spiral galaxies and reflects real differences in stellar content: the stars towards the center tend to be older and more dominated by the light of red giants, while those further out are younger on average so the light has a greater fractional contribution from bright-but-short-lived main sequence stars. The near-infrared bands [209] give a more faithful map of stellar mass, and are less affected by dust obscuration. Radio synthesis imaging of the 21 cm emission from the hydrogen spin-flip transition maps the atomic gas in the interstellar medium, which typically extends to rather larger radii than the stars.

Figure 12

The spiral galaxy NGC 6946 as it appears in the optical (color composite from the BVR bands, left; image obtained by SSM with Rachel Kuzio de Naray using the Kitt Peak 2.1 m telescope), near-infrared (JHK bands, middle [209]), and in atomic gas (21 cm radiation, right [481]). The images are shown at the same physical scale, illustrating how the atomic gas typically extends to greater radii than the stars. Images like these are used to construct mass models representing the observed distribution of baryonic mass.

Surface density profiles of galaxies are constructed by fitting ellipses to images like those illustrated in Figure 12. The ellipses provide an axisymmetric representation of the variation of surface brightness with radius. This is shown in the top panels of Figure 13 for NGC 6946 (Figure 12) and the nearby, gas rich, LSB galaxy NGC 1560. The K-band light distribution is thought to give the most reliable mapping of observed light to stellar mass [42], and has been used to trace the run of stellar surface density in Figure 13. The sharp feature at the center is a small bulge component visible as the red central region in Figure 12. The bulge contains only 4% of the K-band light. The remainder is the stellar disk; a straight line fit to the data outside the central bulge region gives the parameters of the exponential disk approximation, Σ0 and Rd. Similarly, the surface density of atomic gas is traced by the 21 cm emission, with a correction for the cosmic abundance of helium — the detected hydrogen represents 75% of the gas mass believed to be present, with most of the rest being helium, in accordance with BBN.

Figure 13

Surface density profiles (top) and rotation curves (bottom) of two galaxies: the HSB spiral NGC 6946 (Figure 12, left) and the LSB galaxy NGC 1560 (right). The surface density of stars (blue circles) is estimated by azimuthal averaging in ellipses fit to the K-band (2.2µm) light distribution. Similarly, the gas surface density (green circles) is estimated by applying the same procedure to the 21 cm image. Note the different scale between LSB and HSB galaxies. Also note features like the central bulge of NGC 6946, which corresponds to a sharp increase in stellar surface density at small radius. In the lower panels, the observed rotation curves (data points) are shown together with the baryonic mass models (lines) constructed from the observed distribution of baryons. Velocity data for NGC 6946 include both HI data that define the outer, flat portion of the rotation curve [66] and Hα data from two independent observations [54, 114] that define the shape of the inner rotation curve. Velocity data for NGC 1560 come from two independent interferometric HI observations [28, 163]. Baryonic mass models are constructed from the surface density profiles by numerical solution of the Poisson equation using GIPSY [472]. The dashed blue line is the stellar disk, the red dot-dashed line is the central bulge, and the green dotted line is the gas. The solid black line is the sum of all baryonic components. This provides a decent match to the rotation curve at small radii in the HSB galaxy, but fails to explain the flat portion of the rotation curve at large radii. This discrepancy, and its systematic ubiquity in spiral galaxies, ranks as one of the primary motivations for dark matter. Note that the mass discrepancy is large at all radii in the LSB galaxy.

Mass models (bottom panels of Figure 13) are constructed from the surface density profiles by numerical solution of the Poisson equation [52, 472]. No approximations (like sphericity or an exponential disk) are made at this step. The disks are assumed to be thin, with radial scale length exceeding their vertical scale by 8:1, as is typical of edge-on disks [236]. Consequently, the computed rotation curves (various broken lines in Figure 13) are not smooth, but reflect the variations in the observed surface density profiles of the various components. The sum (in quadrature) leads to the total baryonic rotation curve Vb(r) (the solid lines in Figure 13): this is what would be observed if no dark matter were implicated. Instead, the observed rotation (data points in Figure 13) exceeds that predicted by Vb(r): this is the mass discrepancy.
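The quadrature sum mentioned above can be sketched as follows. This covers only the final combination step, not the numerical Poisson solution that produces each component curve. The function names and the component velocities are hypothetical, and the sketch assumes every component contributes a positive squared velocity at each radius.

```python
import math


def total_baryonic_velocity(v_disk, v_bulge, v_gas):
    """Combine component circular velocities at one radius in quadrature."""
    return math.sqrt(v_disk**2 + v_bulge**2 + v_gas**2)


def baryonic_rotation_curve(disk, bulge, gas):
    """Apply the quadrature sum radius by radius to three component curves."""
    return [total_baryonic_velocity(d, b, g)
            for d, b, g in zip(disk, bulge, gas)]


# Hypothetical component curves in km/s at three radii (not real data).
disk = [60.0, 120.0, 140.0]
bulge = [80.0, 40.0, 20.0]
gas = [5.0, 20.0, 40.0]
v_b = baryonic_rotation_curve(disk, bulge, gas)
print([round(v, 1) for v in v_b])
```

Velocities add in quadrature because it is the squared velocities, each proportional to the radial force from its component, that add linearly.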

It is often merely stated that flat rotation curves require dark matter. But there is considerably more information in rotation curve data than asymptotic flatness. For example, it is common that the rotation curve in the inner parts of HSB galaxies like NGC 6946 is well described by the baryons alone. The data are often consistent with a very low density of dark matter at small radii with baryons providing the bulk of the gravitating mass. This condition is referred to as maximum disk [471], and also runs contrary to our inferences of dark matter dominance from Figure 5 [414]. More generally, features in the baryonic rotation curve Vb(r) often correspond to features in the total rotation Vc(r).

Perhaps the most succinct empirical statement of the detailed connection between baryons and dynamics has been given by Renzo Sancisi, and is known as Renzo’s rule [379]: “For any feature in the luminosity profile there is a corresponding feature in the rotation curve.” Both galaxies shown in Figure 13 illustrate this statement. In the inner region of NGC 6946, the small but compact bulge component causes a sharp feature in Vb(r) that declines rapidly before the rotation curve rises again, as mass from the disk begins to contribute. The up-down-up morphology predicted by the observed distribution of the baryons is observed in high resolution observations [54, 114]. A dark matter halo with a monotonically-varying density profile cannot produce such a morphology; the stellar bulge must be the dominant mass component at small radii in this galaxy.

A surprising aspect of Renzo’s rule is that it applies to LSB galaxies as well as those of high surface brightness. That the baryons should have some dynamical impact where their surface density is highest is natural, though there is no reason to demand that they become competitive with dark matter. What is distinctly unnatural is for the baryons to have a perceptible impact where dark matter must clearly dominate. NGC 1560 provides an example where they appear to do just that. The gas distribution in this galaxy shows a substantial kink in its surface density profile [28] (recently confirmed by [163]) that has a distinct impact on Vb(r). This occurs at a radius where V ≫ Vb, so dark matter should be dominant. A spherical dark matter halo with particles on randomly oriented, highly radial orbits cannot support the same sort of structure as seen in the gas disk, and the spherical geometry, unlike a disk geometry, would smear the effect on the local acceleration. And yet the wiggle in the baryonic rotation curve is reflected in the total, as per Renzo’s rule (Footnote 16).

One inference that might be made from these observations is that the dark matter is baryonic. This is unacceptable from a cosmological perspective, but it is possible to have a multiplicity of dark matter components. That is, we could have baryonic dark matter in the disks of galaxies in addition to a halo of non-baryonic cold dark matter. It is often possible to scale up the atomic gas component to fit the total rotation [193]. That implies a component of mass that is traced by the atomic gas, presumably some other dynamically cold gas component, that outweighs the observed hydrogen by a factor of six to ten [193]. One hypothesis for such a component is very cold molecular gas [352]. It is difficult to exclude such a possibility, though it also appears to be hard to sustain in LSB galaxies [292]. Dynamically, one might expect the extra mass to destabilize the LSB disk. One also returns to a fine-tuning between baryonic surface density and mass-to-light ratio. In order to maintain the balance observed in Figure 5, relatively more dark molecular gas will be required in LSB galaxies so as to maintain a constant surface density of gravitating mass, but given the interactions at hand, this might be at least a bit more promising than explaining it with CDM halos.

As a matter of fact, LSB galaxies play a critical role in testing many of the existing models for dark matter. This happens in part because they were appreciated as an important population of galaxies only after many relevant hypotheses were established, and thus provide good tests of their a priori expectations. Observationally, we infer that LSB disks exhibit large mass discrepancies down to small radii [119]. Conventionally, this means that dark matter completely dominates their dynamics: the surface density of baryons in these systems is never high enough to be relevant. Nevertheless, the observed distribution of baryons suffices to predict the total rotation [279, 120]. Once again, the baryonic tail wags the dark matter dog, with the observations of the minority baryonic component sufficing to predict the distribution of the dominant dark matter. Note that, conversely, nothing is “observable” about the dark matter, in present-day simulations, that predicts the distribution of baryons.

Thus, we see that there are many observations, mostly on galaxy scales, that are unpredicted, and perhaps unpredictable, in the standard dark matter context. They mostly involve a unique relationship between the distribution of baryons and the gravitational field, as well as an acceleration constant a0 on the order of the square-root of the cosmological constant, and they represent the most significant challenges to the current ΛCDM model.

5 Milgrom’s Empirical Law and “Kepler Laws” of Galactic Dynamics

Up to this point in this review, the challenges that we have presented have been purely based on observations, and fully independent of any alternative theoretical framework. However, at this point, it would obviously be a step forward if at least some of these puzzling observations could be summarized and empirically unified in some way, as such a unifying process is largely what physics is concerned with, rather than simply exposing a jigsaw of apparently unrelated empirical observations. And such an empirical unification is actually feasible for many of the unpredicted observations presented in the previous Section 4.3, and goes back to a rather old idea of the Israeli physicist Mordehai Milgrom.

Almost 30 years ago, back in 1983 (and thus before most of the aforementioned observations had been carried out), simply prompted by the question of whether the missing mass problem could perhaps reflect a breakdown of Newtonian dynamics in galaxies, Milgrom [293] devised a formula linking the Newtonian gravitational acceleration gN to the true gravitational acceleration g in galaxies. Such attempts to rectify the mass discrepancy by gravitational means often begin by noting that galaxies are much larger than the solar system. It is easy to imagine that at some suitably large scale, let’s say on the order of 1 kpc, there is a transition from the usual dynamics applicable in the comparatively-tiny solar system to some more general theory that applies on the scale of galaxies in order to explain the mass discrepancy problem. If so, we would expect the mass discrepancy to manifest itself at a particular length scale in all systems. However, as already noted, there is no universal length scale apparent in the data (Figure 10) [382, 266, 406, 279, 270]. The mass discrepancy appears already at small radii in some galaxies; in others there is no apparent need for dark matter until very large radii. This now observationally excludes all hypotheses that simply alter the force law at a linear length-scale.

5.1 Milgrom’s law and the dielectric analogy

Before such precise data were available, Milgrom [293] already noted that other scales were also possible, and that one that is as unique to galaxies as size is acceleration. The typical centripetal acceleration of a star in a galaxy is of order ∼ 10⁻¹⁰ m s⁻². This is eleven orders of magnitude less than the surface gravity of the Earth. As we have seen in Section 4, this acceleration constant appears “miraculously” in very different scaling relations that should not, in principle, be related to each other (Footnote 17). This observational evidence for the universal appearance of a0 ≃ 10⁻¹⁰ m s⁻² in galactic scaling relations was not at all evident back in 1983. What Milgrom [293] then hypothesized was a modification of Newtonian dynamics below this acceleration constant a0, appropriate to the tiny accelerations encountered in galaxies (Footnote 18). This new constant a0 would then play a role similar to that of the Planck constant h in quantum physics or the speed of light c in special relativity. For large acceleration (or force per unit mass), F/m = g ≫ a0, everything would be normal and Newtonian, i.e., g = gN. Or, put differently, formally taking a0 → 0 should make the theory tend to standard physics, just like recovering classical mechanics for h → 0. On the other hand, formally taking a0 → ∞ (and G → 0), or equivalently, in the limit of small accelerations g ≪ a0, the modification would apply in the form:

$$g = \sqrt {{g_N}{a_0}} ,$$
(4)

where g = |g| is the true gravitational acceleration, and gN = |gN| the Newtonian one as calculated from the observed distribution of visible matter. Note that this limit follows naturally from the scale-invariance symmetry of the equations of motion under transformations (t, r) → (λt, λr) [315]. This particular modification was only suggested in 1983 by the asymptotic flatness of rotation curves and the slope of the Tully-Fisher relation. It is indeed trivial to see that the desired behavior follows from equation (4). For a test particle in circular motion around a point mass M, equilibrium between the radial component of the force and the centripetal acceleration yields \(V_c^2/r = {g_N} = GM/{r^2}\). In the weak-acceleration limit this becomes

$${{V_c^2} \over r} = \sqrt {{{GM{a_0}} \over {{r^2}}}} .$$
(5)

The terms involving the radius r cancel, simplifying to

$${V_c}^4(r) = V_f^4 = {a_0}GM.$$
(6)

The circular velocity no longer depends on radius, asymptoting to a constant Vf that depends only on the mass of the central object and fundamental constants. The equation above is the equivalent of the observed baryonic Tully-Fisher relation. It is often wrongly stated that Milgrom’s formula was constructed in an ad hoc way in order to reproduce galaxy rotation curves, while this statement is only true of these two observations: (i) the asymptotic flatness of the rotation curves, and (ii) the slope of the baryonic Tully-Fisher relation (but note that, at the time, it was not clear at all that this slope would hold, nor that the Tully-Fisher relation would correlate with baryonic mass rather than luminosity, and even less clear that it would hold over orders of magnitude in mass). All the other successes of Milgrom’s formula related to the phenomenology of galaxy rotation curves were pure predictions of the formula made before the observational evidence. The predictions that are encapsulated in this simple formula can be thought of as sort of “Kepler-like laws” of galactic dynamics. These various laws only make sense once they are unified within their parent formula, exactly as Kepler’s laws only make sense once they are unified under Newton’s law.
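As a rough numerical illustration of Eqs. 5 and 6, the following sketch evaluates the circular velocity around a point mass in the weak-acceleration limit. The value a0 = 1.2 × 10⁻¹⁰ m s⁻² and the function names are assumptions made for this example only; the text above only fixes the order of magnitude of a0.

```python
import math

G_SI = 6.674e-11   # gravitational constant [m^3 kg^-1 s^-2]
M_SUN = 1.989e30   # solar mass [kg]
A0_SI = 1.2e-10    # ASSUMED value of a0 [m s^-2]; the text only gives ~1e-10


def flat_velocity_kms(m_baryon_msun, a0=A0_SI):
    """Asymptotic velocity V_f = (G M a0)^(1/4) of Eq. 6, in km/s."""
    return (G_SI * m_baryon_msun * M_SUN * a0) ** 0.25 / 1.0e3


def deep_mond_vc_kms(m_baryon_msun, r_m, a0=A0_SI):
    """Circular velocity from Eq. 5, V_c^2 / r = sqrt(g_N a0), in km/s."""
    g_newton = G_SI * m_baryon_msun * M_SUN / r_m**2
    return math.sqrt(r_m * math.sqrt(g_newton * a0)) / 1.0e3


if __name__ == "__main__":
    kpc = 3.0857e19  # metres per kiloparsec
    for r_kpc in (50.0, 100.0, 200.0):
        print(r_kpc, deep_mond_vc_kms(1.0e11, r_kpc * kpc))
    print("V_f:", flat_velocity_kms(1.0e11))
```

Under these assumptions a point mass of 10¹¹ solar masses gives a flat velocity close to 200 km/s at every radius, and multiplying the mass by 16 doubles the velocity, which is the slope-4 scaling of Eq. 6.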

In order to ensure a smooth transition between the two regimes g ≫ a0 and g ≪ a0, Milgrom’s law is written in the following way:

$$\mu \left({{g \over {{a_0}}}} \right){\bf{g}} = {{\bf{g}}_{\bf{N}}},$$
(7)

where the interpolating function

$$\mu (x) \rightarrow 1\;{\rm{for}}\;x \gg 1\;{\rm{and}}\;\mu (x) \rightarrow x\;{\rm{for}}\;x \ll 1.$$
(8)

Written like this, the analogy between Milgrom’s law and Coulomb’s law in a dielectric medium is clear, as noted in [56]. Indeed, inside a dielectric medium, the amplitude of the electric field E generated by an external point charge Q located at a distance r obeys the following equation:

$$\mu (E)E = {Q \over {4\pi {\epsilon _0}{r^2}}},$$
(9)

where μ is the relative permittivity of the medium, and can depend on E. In the case of a gravitational field generated by a point mass M, it is then clear that Milgrom’s interpolating function plays the role of “gravitational permittivity”. Since it is smaller than 1, it makes the gravitational field stronger than Newtonian (rather than weaker, as is the case for the electric field in a dielectric medium, where μ > 1). In other words, the gravitational susceptibility coefficient χ (such that μ = 1 + χ) is negative, which is correct for a force law where like masses attract rather than repel [56]. This dielectric analogy has been explicitly used in devising a theory [60] where Milgrom’s law arises from the existence of a “gravitationally polarizable” medium (see Section 7).

Of course, inverting the above relation, Milgrom’s law can also be written as

$${\bf{g}} = \nu \left({{{{g_N}} \over {{a_0}}}} \right){{\bf{g}}_{\bf{N}}},$$
(10)

where

$$\nu (y) \rightarrow 1\;{\rm{for}}\;y \gg 1\;{\rm{and}}\;\nu (y) \rightarrow {y^{- 1/2}}{\rm{for}}\;y \ll 1.$$
(11)
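To make the inversion between Eqs. 7 and 10 concrete, here is a minimal sketch for one particular choice of interpolating function. The form μ(x) = x/(1 + x) is an assumption picked for illustration because it satisfies the limits of Eq. 8 and can be inverted in closed form; nothing in the text above singles it out, and the function names are hypothetical.

```python
import math


def mu_simple(x):
    """ASSUMED interpolating function: mu -> 1 for x >> 1, mu -> x for x << 1."""
    return x / (1.0 + x)


def nu_simple(y):
    """Matching inverse function: solving mu(x) x = y for x gives
    x^2 - y x - y = 0, hence nu(y) = x / y = (1 + sqrt(1 + 4 / y)) / 2."""
    return 0.5 * (1.0 + math.sqrt(1.0 + 4.0 / y))


def true_acceleration(g_newton, a0):
    """Eq. 10: g = nu(g_N / a0) g_N (magnitudes only)."""
    return nu_simple(g_newton / a0) * g_newton


if __name__ == "__main__":
    a0 = 1.0  # work in units of a0
    for g_n in (1.0e-4, 1.0, 1.0e4):
        g = true_acceleration(g_n, a0)
        # Feeding g back into Eq. 7 should return the Newtonian value.
        print(g_n, g, mu_simple(g / a0) * g)
```

With this choice, ν(y) tends to 1 for y ≫ 1 and to y^(−1/2) for y ≪ 1, as required by Eq. 11.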

However, as we shall see in Section 6, in order for g to remain a conservative force field, these expressions (Eqs. 7 and 10) cannot be rigorous outside of highly symmetrical situations. Nevertheless, it allows one to make numerous very general predictions for galactic systems, or, in other words, to derive “Kepler-like laws” of galactic dynamics, unified under the banner of Milgrom’s law. As we shall see, many of the observations unpredicted by ΛCDM on galaxy scales naturally ensue from this very simple law. However, even though Milgrom originally devised this as a modification of dynamics, this law is a priori nothing more than an algorithm, which allows one to calculate the distribution of force in an astronomical object from the observed distribution of baryonic matter. Its success would simply mean that the observed gravitational field in galaxies is mimicking a universal force law generated by the baryons alone, meaning that (i) either the force law itself is modified, or that (ii) there exists an intimate connection between the distribution of baryons and dark matter in galaxies.

It was suggested, for instance [218], that such a relation might arise naturally in the CDM context, if halos possess a one-parameter density profile that leads to a characteristic acceleration profile that is only weakly dependent upon the mass of the halo. Then, with a fixed collapse factor for the baryonic material, the transition from dominance of dark over baryonic matter occurs at a universal acceleration, which, by numerical coincidence, is on the order of cH0 and thus of a0 (see also [411]). While, still today, it remains to be seen whether this scenario would quantitatively hold in numerical simulations, it was noted by Milgrom [306] that this scenario only explained the role of a0 as a transition radius between baryon and dark matter dominance in HSB galaxies, precluding altogether the existence of LSB galaxies where dark matter dominates everywhere. The real challenge for ΛCDM is rather to explain all the different roles played by a0 in galaxy dynamics, different roles that can all be summarized within the single law proposed by Milgrom, just like Kepler’s laws are unified under Newton’s law. We list these Kepler-like laws of galactic dynamics hereafter, and relate each of them with the unpredicted observations of Section 4, keeping in mind that these were mostly a priori predictions of Milgrom’s law, made before the data were as good as today, not “postdictions” like we are used to in modern cosmology.

5.2 Galactic Kepler-like laws of motion

  1.

    Asymptotic flatness of rotation curves. The rotation curves of galaxies are asymptotically flat, even though this flatness is not always attained at the last observed point (see point hereafter about the shapes of rotation curves as a function of baryonic surface density). What is more, Milgrom’s law can be thought of as including the total acceleration with respect to a preferred frame, which can lead to the prediction of asymptotically-falling rotation curves for a galaxy embedded in a large external gravitational field (see Section 6.3).

  2.

    Ga0 defining the zero-point of the baryonic Tully-Fisher relation. The plateau of a rotation curve is \(V_f = (GMa_0)^{1/4}\). The true Tully-Fisher relation is predicted to be a relation between this asymptotic velocity and baryonic mass, not luminosity. Milgrom’s law yields immediately the slope (precisely 4) and zero-point of this baryonic Tully-Fisher law. The observational baryonic Tully-Fisher relation should thus be consistent with zero scatter around this prediction of Milgrom’s law (the dotted line of Figure 3). And indeed it is. All rotationally-supported systems in the weak acceleration limit should fall on this relation, irrespective of their formation mechanism and history, meaning that completely isolated galaxies or tidal dwarf galaxies formed in interaction events all behave as every other galaxy in this respect.

  3.

    Ga0 defining the zero-point of the Faber-Jackson relation. For quasi-isothermal systems [296], such as elliptical galaxies, the bulk velocity dispersion depends only on the total baryonic mass via \(\sigma^4 \propto GMa_0\). Indeed, since the equation of hydrostatic equilibrium for an isotropic isothermal system in the weak field regime reads \(d(\sigma^2\rho)/dr = -\rho\,(GMa_0)^{1/2}/r\), one has \(\sigma^4 = \alpha^{-2} \times GMa_0\) where \(\alpha = d\ln\rho/d\ln r\). This underlies the Faber-Jackson relation for elliptical galaxies (Figure 7), which is, however, not predicted by Milgrom’s law to be as tight and precise (because it relies, e.g., on isothermality and on the slope of the density distribution) as the BTFR.

  4.

    Mass discrepancy defined by the inverse of the acceleration in units of a0. Or alternatively, defined by the inverse of the square-root of the gravitational acceleration generated by the baryons in units of a0. The mass discrepancy is precisely equal to this in the very-low-acceleration regime, and leads to the baryonic Tully-Fisher relation. In the low-acceleration limit, gN/g = g/a0, so in the CDM language, inside the virial radius of any system whose virial radius is in the weak acceleration regime (well below a0), the baryon fraction is given by the acceleration in units of a0. If we adopt a rough relation \({M_{500}} \simeq 1.5 \times {10^5}\,{M_ \odot} \times V_c^3\,({\rm{km/s}})^{- 3}\), we get that the acceleration at R500, and thus the system baryon fraction predicted by Milgrom’s formula, is \(M_b/M_{500} = a_{500}/a_0 \simeq 4 \times 10^{-4} \times V_c\,({\rm km/s})^{-1}\). Divided by the cosmological baryon fraction, this explains the trend for fd = Mb/(0.17 M500) with potential \((\Phi = V_c^2)\) in Figure 2, thereby naturally explaining the halo-by-halo missing baryon challenge in galaxies. No baryons are actually missing; rather, we infer their existence because the natural scaling between mass and circular velocity \({M_{500}} \propto V_c^3\) in ΛCDM differs by a factor of Vc from the observed scaling \({M_b} \propto V_c^4\).

  5.

    a0 as the characteristic acceleration at the effective radius of isothermal spheres. As a corollary to the Faber-Jackson relation for isothermal spheres, let us note that the baryonic isothermal sphere would not require any dark matter up to the point where the internal gravity falls below a0, and would thus resemble a purely baryonic Newtonian isothermal sphere up to that point. But at larger distances, in the presence of the added force due to Milgrom’s law, the baryonic isothermal sphere would fall [296] as \(r^{-4}\), thereby making the radius at which the gravitational acceleration is a0 the effective baryonic radius of the system, and explaining why, at this radius R in quasi-isothermal systems, the typical acceleration \(\sigma^2/R\) is almost always observed to be on the order of a0. Of course, this is valid for systems where such a transition radius does exist, but going to very-LSB systems, if the internal gravity is everywhere below a0, one can then have typical accelerations as low as one wishes.

  6.

    a0/G as a critical mean surface density for stability. Disks with mean surface density \(\langle\Sigma\rangle \le \Sigma_\dagger = a_0/G\) have added stability. Most of the disk is then in the weak-acceleration regime, where accelerations scale as \(a \propto \sqrt M\), instead of \(a \propto M\). Thus, δa/a = (1/2)δM/M instead of δa/a = δM/M, leading to a weaker response to small mass perturbations [299]. This explains the Freeman limit (Figure 8).

  7.

    a0 as a transition acceleration. The mass discrepancy in galaxies always appears (transition from baryon dominance to dark matter dominance) when \(V_c^2/R \sim {a_0}\), yielding a clear mass-discrepancy acceleration relation (Figure 10). This, again, is the case for every single rotationally-supported system irrespective of its formation mechanism and history. For HSB galaxies, where there exist two distinct regions where \(V_c^2/R > {a_0}\) in the inner parts and \(V_c^2/R < {a_0}\) in the outer parts, locally measured mass-to-light ratios should show no indication of hidden mass in the inner parts, but rise beyond the radius where \(V_c^2/R \approx {a_0}\) (Figure 14). Note that this is the only role of a0 that the scenario of [218] was poorly trying to address (forgetting, e.g., about the existence of LSB galaxies).

  8.

    a0/G as a transition central surface density. The acceleration a0 defines the transition from HSB galaxies to LSB galaxies: baryons dominate in the inner parts of galaxies whose central surface density is higher than some critical value on the order of \(\Sigma_\dagger = a_0/G\), while in galaxies whose central surface density is much smaller (LSB galaxies), DM dominates everywhere, and the magnitude of the mass discrepancy is given by the inverse of the acceleration in units of a0; see (5). Thus, the mass discrepancy appears at smaller radii and is more severe in galaxies of lower baryonic surface densities (Figure 14). The shapes of rotation curves are predicted to depend on surface density: HSB galaxies are predicted to have rotation curves that rise steeply, then become flat, or even fall somewhat to the not-yet-reached asymptotic flat velocity, while LSB galaxies are predicted to have rotation curves that rise slowly to the asymptotic flat velocity. This is precisely what is observed (Figure 15), and is in accordance [162] with the more complex empirical parametrization of observed rotation curves that has been proposed in [376]. Finally, the total (baryons+DM) acceleration is predicted to decline with the mean baryonic surface density of galaxies, exactly as observed (Figure 16), in the form \(a \propto \Sigma _b^{1/2}\) (see also Figure 9).

  9.

    a0/G as the central surface density of dark halos. Provided they are mostly in the Newtonian regime, galaxies are predicted to be embedded in dark halos (whether real or virtual, i.e., “phantom” dark matter) with a central surface density on the order of a0/(2πG), as observed (Footnote 19). LSBs should have a halo surface density scaling as the square-root of the baryonic surface density, in a much more compressed range than for the HSB ones, explaining the consistency of observed data with a constant central surface density of dark matter [167, 313].

  10.

    Features in the baryonic distribution imply features in the rotation curve. Because a small variation in gN will be directly translated into a similar one in g, Renzo’s rule (Section 4.3.4) is explained naturally.
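The arithmetic behind the fourth law above (the baryon fraction inside R500) can be checked with a short sketch. It combines Eq. 6 with the rough relation for M500 quoted in that item; the value a0 = 1.2 × 10⁻¹⁰ m s⁻² and all function names are assumptions made for this example.

```python
G_SI = 6.674e-11   # gravitational constant [m^3 kg^-1 s^-2]
M_SUN = 1.989e30   # solar mass [kg]
A0_SI = 1.2e-10    # ASSUMED value of a0 [m s^-2]
KM_S = 1.0e3       # metres per second in one km/s


def baryonic_mass_msun(vc_kms, a0=A0_SI):
    """Baryonic mass from Eq. 6, M_b = V^4 / (G a0), in solar masses."""
    return (vc_kms * KM_S) ** 4 / (G_SI * a0) / M_SUN


def m500_msun(vc_kms):
    """Rough relation quoted in the text: M_500 ~ 1.5e5 M_sun (V_c / km/s)^3."""
    return 1.5e5 * vc_kms**3


def detected_baryon_fraction(vc_kms, f_cosmic=0.17):
    """f_d = M_b / (f_cosmic M_500), with f_cosmic = 0.17 as in the text."""
    return baryonic_mass_msun(vc_kms) / (f_cosmic * m500_msun(vc_kms))


if __name__ == "__main__":
    # Coefficient of V_c in M_b / M_500; the text quotes roughly 4e-4.
    print(baryonic_mass_msun(1.0) / m500_msun(1.0))
    for vc in (50.0, 100.0, 200.0):
        print(vc, detected_baryon_fraction(vc))
```

With the assumed a0, the coefficient comes out close to 4 × 10⁻⁴ per km/s, and the detected fraction grows linearly with Vc, as stated in the text.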

Figure 14

The mass discrepancy (as in Figure 10) as a function of radius in observed spiral galaxies. The curves for individual galaxies (lines) are color-coded by their characteristic baryonic surface density (as in Figure 5). In order to be completely empirical and fully independent of any assumption such as maximum disk, stellar masses have been estimated with population synthesis models [42]. The amplitude of the mass discrepancy is initially small in high-surface-density galaxies, and grows only slowly at large radii. As the baryonic surface densities of galaxies decline, the mass discrepancy becomes more severe and appears at smaller radii. This trend confirms one of the predictions of Milgrom’s law [294].

Figure 15

The shapes of observed rotation curves depend on baryonic surface density (color coding as per Figure 14). High-surface-density galaxies have rotation curves that rise steeply then become flat, or even fall somewhat to the asymptotic flat velocity. Low-surface-density galaxies have rotation curves that rise slowly to the asymptotic flat velocity. This trend confirms one of the predictions of Milgrom’s law [294].

Figure 16

Centripetal acceleration as a function of radius and surface density (color coding as per Figure 14). The critical acceleration a0 is denoted by the dotted line. Milgrom’s formula predicts that acceleration should decline with baryonic surface density, as observed. Moreover, high-surface-density galaxies transition from the Newtonian regime at small radii to the weak-field regime at large radii, whereas low-surface-density galaxies fall entirely in the regime of low acceleration a < a0, as anticipated by Milgrom [294].

As a conclusion, all the apparently independent roles that the characteristic acceleration a0 plays in the unpredicted observations of Section 4.3 (see end of Section 4.3.3 for a summary), as well as Renzo’s rule (Section 4.3.4), have been elegantly unified by the single law proposed by Milgrom [293] in 1983 as a unique scaling relation between the gravitational field generated by observed baryons and the total observed gravitational force in galaxies.

6 Milgrom’s Law as a Modification of Classical Dynamics: MOND

Thus, it appears that many puzzling observations that are difficult to understand in the ΛCDM context (and/or require an extreme fine-tuning of the DM distribution) are well summarized by a single heuristic law. Therefore, it would appear natural that this law derives from a universal force law, and would reflect a modification of dynamics rather than the addition of massive particles interacting (almost) only gravitationally with baryonic matter (Footnote 20). However, blindly applying Eq. 7 to a set of massive bodies directly leads to serious problems [150, 293] such as the non-conservation of momentum. In a two-body configuration, as the implied force is not symmetric in the two masses, Newton’s third law (action and reaction principle) does not hold, so the momentum is not conserved. Consider a translationally invariant isolated system of two such masses m1 and m2 small enough to be in the very weak acceleration limit, and placed at rest on the x-axis. The amplitude of the Newtonian force is then \(F_N = Gm_1m_2/(x_2 - x_1)^2\), and blindly applying Eq. 7 would lead to individual accelerations \(\vert {\bf a}_i\vert = \sqrt {F_N a_0/m_i}\). This then immediately leads to

$$\dot p = \sqrt {{a_0}{F_N}} (\sqrt {{m_1}} - \sqrt {{m_2}}) \neq 0\;{\rm{if}}\;{m_1} \neq {m_2},$$
(12)

meaning that for different masses, the momentum of this isolated system is not conserved. This means that Eq. 7 cannot truly represent a universal force law. If Eq. 7 is to be more than just a heuristic law summarizing how dark matter is arranged in galaxies with respect to baryonic matter, it must then be an approximation (valid only in highly symmetric configurations) of a more general force law deriving from an action and a variational principle. Such theories at the classical level can be classified under the acronym MOND, for Modified Newtonian Dynamics (Footnote 21). In this section, we sketch how to devise such theories at the classical level, and list detailed tests of these theories at all astrophysical scales.
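The two-body argument leading to Eq. 12 can be reproduced in a few lines. This is only a sketch of the naive prescription criticized above, in arbitrary units; the function names are hypothetical.

```python
import math


def naive_accelerations(m1, m2, separation, G, a0):
    """Deep-MOND accelerations obtained by blindly applying Eq. 7 to each body:
    |a_i| = sqrt(F_N a0 / m_i), with F_N the Newtonian two-body force."""
    f_newton = G * m1 * m2 / separation**2
    return math.sqrt(f_newton * a0 / m1), math.sqrt(f_newton * a0 / m2)


def naive_momentum_rate(m1, m2, separation, G, a0):
    """Net dp/dt of the pair; the two bodies accelerate towards each other,
    so their contributions enter with opposite signs (Eq. 12)."""
    a1, a2 = naive_accelerations(m1, m2, separation, G, a0)
    return m1 * a1 - m2 * a2


if __name__ == "__main__":
    print(naive_momentum_rate(1.0, 1.0, 1.0, 1.0, 1.0))  # equal masses
    print(naive_momentum_rate(4.0, 1.0, 1.0, 1.0, 1.0))  # unequal masses
```

The rate vanishes only for equal masses. For m1 = 4 and m2 = 1 (with G = a0 = 1 and unit separation) it equals √(a0 FN)(√m1 − √m2) = 2, so the centre of mass of an isolated pair would spontaneously accelerate.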

6.1 Modified inertia or modified gravity: Non-relativistic actions

If one wants to modify dynamics in order to reproduce Milgrom’s heuristic law while still benefiting from usual conservation laws such as the conservation of momentum, one can start from the action at the classical level. Clearly such theories are only toy-models until they become the weak-field limit of a relativistic theory (see Section 7), but they are useful both as targets for such relativistic theories, and as internally consistent models allowing one to make predictions at the classical level (i.e., neither in the relativistic nor in the quantum regime).

A set of particles of mass \(m_i\) moving in a gravitational field generated by the matter density distribution \(\rho = \sum_i m_i\,\delta({\bf x} - {\bf x}_i)\) and described by the Newtonian potential ΦN has the following action (Footnote 22):

$${S_N} = {S_{{\rm{kin}}}} + {S_{{\rm{in}}}} + {S_{{\rm{grav}}}} = \int {{{\rho {{\rm{v}}^2}} \over 2}{d^3}x\,dt -} \int {\rho {\Phi _N}{d^3}x\,dt} - \int {{{|\nabla {\Phi _N}{|^2}} \over {8\pi G}}} {d^3}x\,dt.$$
(13)

Varying this action with respect to configuration space coordinates yields the equations of motion \(d^2{\bf x}/dt^2 = -\nabla\Phi_N\), while varying it with respect to the potential leads to the Poisson equation \(\nabla^2\Phi_N = 4\pi G\rho\). Modifying the first (kinetic) term is generally referred to as “modified inertia” and modifying the last term as “modified gravity” (Footnote 23).

6.1.1 Modified inertia

The first possibility, modified inertia, has been investigated by Milgrom [300, 321], who constructed modified kinetic actions (the first term Skin in Eq. 13; see Footnote 24) that are functionals depending on the trajectory of the particle as well as on the acceleration constant a0. By construction, the gravitational potential is then still determined from the Newtonian Poisson equation, but the particle equation of motion becomes, instead of Newton’s second law,

$${\bf{A}}[\{{\bf{x}}(t)\} ,{a_0}] = - \nabla {\Phi _N},$$
(14)

where A is a functional of the whole trajectory {x(t)}, with the dimensions of acceleration. The Newtonian and MOND limits correspond to \([a_0 \rightarrow 0,\; {\bf A} \rightarrow d^2{\bf x}/dt^2]\) and \([a_0 \rightarrow \infty,\; {\bf A}[\{{\bf x}(t)\}, a_0] \rightarrow a_0^{-1}{\bf Q}(\{{\bf x}(t)\})]\), where Q has dimensions of acceleration squared.

Milgrom [300] investigated theories of this vein and rigorously showed that they always had to be time-nonlocal (see also Section 7.10) to be Galilean invariant (Footnote 25). Interestingly, he also showed that quantities such as energy and momentum had to be redefined, but then obeyed conservation laws: this even leads to a generalized virial relation for bound trajectories, and in turn to an important and robust prediction for circular orbits in an axisymmetric potential, shared by all such theories. Eq. 14 becomes for such trajectories:

$$\mu \left({{{V_c^2} \over {R{a_0}}}} \right){{V_c^2} \over R} = - {{\partial {\Phi _N}} \over {\partial R}},$$
(15)

where Vc and R are the orbital speed and radius, and μ(x) is universal for each theory, and is derived from the expression of the action specialized to circular trajectories. Thus, for circular trajectories, these theories recover exactly the heuristic Milgrom’s law. Interestingly, it is this law that is used to fit galaxy rotation curves, while in the modified gravity framework of MOND (see hereafter), one should actually calculate the exact predictions of the modified Poisson formulations, which can differ slightly from Milgrom’s law. However, for orbits other than circular, it becomes very difficult to make predictions in modified inertia, as the time non-locality can make the anomalous acceleration at any location depend on properties of the whole orbit. For instance, if the accelerations are small on some segments of a trajectory, MOND effects can also be felt on segments where the accelerations are high, and conversely [321]. This can give rise to different effects on bound and unbound orbits, as well as on circular and highly elliptic orbits, meaning that “predictions” of modified inertia in pressure-supported systems could differ significantly from those derived from Milgrom’s law per se. Let us finally note that testing modified inertia on Earth would require one to properly define an inertial reference frame, contrary to what has been done in [5, 179] where the laboratory itself was not an inertial frame. Proper set-ups for testing modified inertia on Earth have been described, e.g., in [201, 202]: under the circumstances described in these papers, modified inertia would inevitably predict a departure from Newtonian dynamics, even if the exact departure cannot be predicted at present, except for circular motion.

6.1.2 Bekenstein-Milgrom MOND

The idea of modified gravity is, on the one hand, to preserve the particle equation of motion by preserving the kinetic action, but, on the other hand, to change the gravitational action, and thus modify the Poisson equation. In that case, all the usual conservation laws will be preserved by construction.

A very general way to do so is to write [38]:

$${S_{{\rm{grav}}\;{\rm{BM}}}} \equiv - \int {{{a_{0}^2 F(\vert \nabla \Phi \vert ^{2}/a_0^2)} \over {8\pi G}}} {d^3}x\;dt,$$
(16)

where F can be any dimensionless function. The Lagrangian being non-quadratic in |∇Φ|, this has been dubbed by Bekenstein & Milgrom [38] Aquadratic Lagrangian theory (AQUAL). Varying the action with respect to Φ then leads to a non-linear generalization of the Newtonian Poisson equation (Footnote 26):

$$\nabla .\left[ {\mu \left({{{\vert \nabla \Phi \vert} \over {{a_0}}}} \right)\nabla \Phi} \right] = 4\pi G\rho$$
(17)

where μ(x) = F′(z) and z = x². In order to recover the μ-function behavior of Milgrom’s law (Eq. 7), i.e., μ(x) → 1 for x ≫ 1 and μ(x) → x for x ≪ 1, one needs to choose:

$$F(z) \rightarrow z\;{\rm{for}}\;z \gg 1\;{\rm{and}}\;F(z) \rightarrow {2 \over 3}{z^{3/2}}\;{\rm{for}}\;z \ll 1.$$
(18)
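For reference, the step from Eq. 16 to Eqs. 17 and 18 can be spelled out as follows. This is a standard variational sketch rather than a quotation from [38]; it assumes that Φ replaces ΦN in the interaction term of Eq. 13 and that surface terms vanish at infinity. With \(z = \vert\nabla\Phi\vert^2/a_0^2\), varying the potential gives

$$\delta ({S_{{\rm{in}}}} + {S_{{\rm{grav}}\;{\rm{BM}}}}) = - \int {\left[ {\rho \,\delta \Phi + {{F'(z)} \over {4\pi G}}\nabla \Phi \cdot \nabla \delta \Phi } \right]{d^3}x\,dt} = \int {\left\{ {{{\nabla \cdot [F'(z)\nabla \Phi ]} \over {4\pi G}} - \rho } \right\}\delta \Phi \;{d^3}x\,dt} ,$$

which vanishes for arbitrary δΦ only if Eq. 17 holds with μ(x) = F′(z). Conversely, F follows from a given μ by integration,

$$F(z) = \int {\mu (\sqrt z)\,dz} ,$$

so that μ = 1 gives F = z and μ = x = √z gives F = (2/3)z^{3/2}, up to additive constants that do not affect the field equation.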

The general solution of the boundary value problem for Eq. 17 leads to the following relation between the acceleration g = −∇Φ and the Newtonian one, gN = −∇ΦN:

$$\mu \left({{g \over {{a_0}}}} \right){\bf{g}} = {{\bf{g}}_N} + {\bf{S}},$$
(19)

where g = |g|, and S is a solenoidal vector field with no net flow across any closed surface (i.e., a curl field S = ∇ × A such that ∇.S = 0). Thus, it is equivalent to Milgrom’s law (Eq. 7) up to a curl field correction, and is precisely equal to Milgrom’s law in highly symmetric one-dimensional systems, such as spherically-symmetric systems or flattened systems for which the isopotentials are locally spherically symmetric. For instance, the Kuzmin disk [52] is an example of a flattened axisymmetric configuration for which Milgrom’s law is precisely valid, as its Newtonian potential \({\Phi _N} = - GM/\sqrt {{R^2} + {{(b + \vert z\vert)}^2}}\) is equivalent on both sides of the disk to that of a point mass above or below the disk respectively.

In vacuum and at very large distances from a body of mass M, the isopotentials always tend to become spherical and the curl field tends to zero, while the gravitational acceleration falls well below a0 (a regime known as the “deep-MOND” regime), so that:

$$\Phi (r) \sim \sqrt {GM{a_0}} \ln (r).$$
(20)

An important point, demonstrated by Bekenstein & Milgrom [38], is that a system with a low center-of-mass acceleration, with respect to a larger (more massive) system, sees the motion of its constituents combine to give a MOND motion for the center-of-mass even if it is made up of constituents whose internal accelerations are above a0 (for instance a compact globular cluster moving in an outer galaxy). The center-of-mass acceleration is independent of the internal structure of the system (if the mass of the system is small), namely the Weak Equivalence Principle is satisfied.

In a modified gravity theory, any time-independent system must still satisfy the virial theorem:

$$2K + W = 0.$$
(21)

where \(K = M\langle v^2\rangle/2\) is the total kinetic energy of the system, \(M = \sum_i m_i\) being the total mass of the system, \(\langle v^2\rangle\) the second moment of the velocity distribution, and \(W = - \int {\rho\, {\bf{x}} \cdot \nabla \Phi \,{d^3}x}\) is the “virial”, proportional to the total potential energy. Milgrom [301, 302] showed that, in Bekenstein-Milgrom MOND, the virial is given by:

$$W = - {2 \over 3}\sqrt {G{M^3}{a_0}} - {1 \over {4\pi G}}\int {\left[ {{3 \over 2}a_0^2F(\vert \nabla \Phi \vert ^{2}/a_0^2) - \mu (\vert \nabla \Phi \vert /{a_0})\vert \nabla \Phi \vert ^{2}} \right]} \;{d^3}x.$$
(22)

For a system entirely in the extremely weak field limit (the “deep-MOND” limit x = g/a0 ≪ 1) where μ(x) = x and \(F(z) = (2/3)z^{3/2}\), the second term vanishes and we get \(W = (- 2/3)\sqrt {G{M^3}{a_0}}\) (see [301] for the specific conditions for this to be valid). In this case, we can get an analytic expression for the two-body force under the approximation that the two bodies are very far apart compared to their internal sizes [301, 509, 511]. Since the kinetic energy K = Korb + Kint can be separated into the orbital energy \({K_{{\rm{orb}}}} = {m_1}{m_2}\upsilon _{{\rm{rel}}}^2/(2M)\) and the internal energy of the bodies \({K_{{\mathop{\rm int}}}} = \sum (1/3)\sqrt {Gm_i^3{a_0}}\), we get from the scalar virial theorem of a stationary system:

$${{{m_1}{m_2}v_{{\rm{rel}}}^2} \over M} = {2 \over 3}\left[ {\sqrt {G{M^3}{a_0}} - \sum\limits_i {\sqrt {Gm_i^3{a_0}}}} \right].$$
(23)

We can then assume an approximately circular velocity such that the two-body force (satisfying the action and reaction principle) can be written analytically in the deep-MOND limit as:

$${F_{2{\rm{body}}}} = {{{m_1}{m_2}} \over {{m_1} + {m_2}}}{{v_{{\rm{rel}}}^2} \over r} = {2 \over 3}\left[ {{{({m_1} + {m_2})}^{3/2}} - m_1^{3/2} - m_2^{3/2}} \right]{{\sqrt {G{a_0}}} \over r}.$$
(24)
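As a quick sanity check of Eqs. 23 and 24, the sketch below evaluates the deep-MOND two-body force in arbitrary units and verifies two properties that follow from the formula: symmetry under exchange of the masses, and the test-particle limit m2 ≪ m1, where the force per unit mass reduces to √(G m1 a0)/r. Function names are hypothetical.

```python
import math


def two_body_force_deep_mond(m1, m2, r, G, a0):
    """Eq. 24: deep-MOND force between two well-separated bodies."""
    bracket = (m1 + m2) ** 1.5 - m1**1.5 - m2**1.5
    return (2.0 / 3.0) * bracket * math.sqrt(G * a0) / r


def relative_speed_squared(m1, m2, G, a0):
    """v_rel^2 from the virial relation of Eq. 23 (independent of separation)."""
    m_tot = m1 + m2
    rhs = (2.0 / 3.0) * (
        math.sqrt(G * m_tot**3 * a0)
        - math.sqrt(G * m1**3 * a0)
        - math.sqrt(G * m2**3 * a0)
    )
    return rhs * m_tot / (m1 * m2)


if __name__ == "__main__":
    G, a0, r = 1.0, 1.0, 10.0
    print(two_body_force_deep_mond(2.0, 5.0, r, G, a0))
    print(two_body_force_deep_mond(5.0, 2.0, r, G, a0))   # same value
    # Test-particle limit: force per unit mass vs sqrt(G m1 a0) / r
    print(two_body_force_deep_mond(1.0, 1e-8, r, G, a0) / 1e-8, math.sqrt(G * a0) / r)
```

Unlike the naive prescription of Eq. 12, this force is symmetric in the two masses by construction, so action and reaction balance.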

Eq. 24 is not valid for N-body configurations, for which the Bekenstein-Milgrom (BM) modified Poisson equation (Eq. 17) must be solved numerically (apart from highly-symmetric N-body configurations). This equation is a non-linear elliptic partial differential equation. It can be solved numerically using various methods [50, 77, 96, 147, 250, 457]. One of them [77, 457] is to use a multigrid algorithm to solve the discrete form of Eq. 17 (see also Figure 17):

$$\begin{array}{rl}4\pi G{\rho _{i,j,k}} = \big[ & ({\Phi _{i + 1,j,k}} - {\Phi _{i,j,k}}){\mu _{{M_1}}} - ({\Phi _{i,j,k}} - {\Phi _{i - 1,j,k}}){\mu _{{L_1}}}\\ + & ({\Phi _{i,j + 1,k}} - {\Phi _{i,j,k}}){\mu _{{M_2}}} - ({\Phi _{i,j,k}} - {\Phi _{i,j - 1,k}}){\mu _{{L_2}}}\\ + & ({\Phi _{i,j,k + 1}} - {\Phi _{i,j,k}}){\mu _{{M_3}}} - ({\Phi _{i,j,k}} - {\Phi _{i,j,k - 1}}){\mu _{{L_3}}}\big]/{h^2}\end{array}$$
(25)

where

  • \(\rho_{i,j,k}\) is the density discretized on a grid of step h,

  • \(\Phi_{i,j,k}\) is the MOND potential discretized on the same grid of step h,

  • \(\mu_{M_1}\) and \(\mu_{L_1}\) are the values of μ(x) at points M1 and L1 corresponding to (i + 1/2, j, k) and (i − 1/2, j, k) respectively (Figure 17).

The gradient components (∂/∂x, ∂/∂y, ∂/∂z) entering μ(x) are approximated, in the case of \(\mu_{M_l}\), by \(([\Phi (B) - \Phi (A)]/h,[\Phi (I) + \Phi (H) - \Phi (K) - \Phi (J)]/(4h),[\Phi (C) + \Phi (D) - \Phi (E) - \Phi (F)]/(4h))\) (see Figure 17).

Figure 17

Discretisation scheme of the BM modified Poisson equation (Eq. 17) and of the phantom dark matter derivation in QUMOND. The node (i,j,k) corresponds to A on the upper panel. The gradient components in μ(x) (for Eq. 25) and ν(y) (for Eq. 35) are estimated at the Li and Mi points. Image courtesy of Tiret, reproduced by permission from [457], copyright by ESO.

In [457] the Gauss-Seidel relaxation with red and black ordering is used to solve this discretized equation, with the boundary condition for the Dirichlet problem given by Eq. 20 at large radii. It is obvious that subsequently devising an evolving N-body code for this theory can only be done using particle-mesh techniques rather than the gridless multipole expansion treecode schemes widely used in standard gravity.
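The right-hand side of Eq. 25 can be sketched with NumPy as follows. This is not the code of [457]: it only evaluates the discrete operator, without multigrid or Gauss-Seidel relaxation. Several things are assumptions for this example: the identification of the figure's points (A = node, B = its forward neighbour, and I, H, K, J, C, D, E, F the transverse neighbours of A and B), the interpolating function μ(x) = x/(1 + x), and the use of `np.roll`, which wraps around the grid so that only nodes at least two cells from the edge are meaningful.

```python
import numpy as np


def mu_simple(x):
    """ASSUMED interpolating function (mu -> 1 for x >> 1, mu -> x for x << 1)."""
    return x / (1.0 + x)


def grad_norm_at_half_step(phi, axis, h):
    """|grad phi| at the midpoint between each node and its +1 neighbour along
    `axis` (the points M_1, M_2, M_3 of Eq. 25 for axis = 0, 1, 2)."""
    fwd = np.roll(phi, -1, axis=axis)
    # Component along `axis`: [Phi(B) - Phi(A)] / h
    comps = [(fwd - phi) / h]
    for t in range(3):
        if t == axis:
            continue
        # Transverse components: centred differences at A and B, averaged,
        # i.e. four-point combinations divided by 4h (assumed point labelling).
        d_here = np.roll(phi, -1, axis=t) - np.roll(phi, 1, axis=t)
        d_next = np.roll(fwd, -1, axis=t) - np.roll(fwd, 1, axis=t)
        comps.append((d_here + d_next) / (4.0 * h))
    return np.sqrt(sum(c**2 for c in comps))


def bm_operator(phi, h, a0, mu=mu_simple):
    """Right-hand side of Eq. 25; equals 4 pi G rho for a solution phi."""
    out = np.zeros_like(phi)
    for axis in range(3):
        fwd = np.roll(phi, -1, axis=axis)
        mu_m = mu(grad_norm_at_half_step(phi, axis, h) / a0)  # mu at i + 1/2
        flux = mu_m * (fwd - phi)
        # mu_L at i - 1/2 is mu_M of the previous node, hence the shifted flux.
        out += flux - np.roll(flux, 1, axis=axis)
    return out / h**2
```

For μ ≡ 1 this reduces to the usual seven-point Laplacian, and for a uniform field the operator vanishes away from the edges. A relaxation scheme such as the one used in [457] would iterate on Φ until `bm_operator(phi, h, a0)` matches 4πGρ at every node, with boundary values taken from Eq. 20.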

Finally, let us note that it could be imagined that MOND, given some of its observational problems (developed in Section 6.6), is incomplete and needs a new scale in addition to a0. There are several ways to implement such an idea, but for instance, Bekenstein [36] proposed in this vein a generalization of the AQUAL formalism by adding a velocity scale s0, in order to allow for effective variations of the acceleration constant as a function of the deepness of the potential, namely:

$${S_{{\rm{grav}}\;{\rm{Bek}}}} \equiv - {1 \over {8\pi G}}\int {a_0^2{e^{- 2\Phi /s_0^2}}F(\vert \nabla \Phi \vert ^{2}{e^{2\Phi /s_0^2}}/a_0^2){d^3}x\;dt,}$$
(26)

leading to

$$\nabla .\left[ {\mu \left({{{\vert \nabla \Phi \vert} \over {{a_{0{\rm{eff}}}}}}} \right)\nabla \Phi} \right] - {{\vert \nabla \Phi \vert ^{2}} \over {s_0^2}}\mu \left({{{\vert \nabla \Phi \vert} \over {{a_{0{\rm{eff}}}}}}} \right) + {{a_{0{\rm{eff}}}^2} \over {s_0^2}}F\left({{{\vert \nabla \Phi \vert ^{2}} \over {a_{0{\rm{eff}}}^2}}} \right) = 4\pi G\rho ,$$
(27)

where \({a_{0{\rm{eff}}}} = {a_0}{e^{- \Phi/s_0^2}}\). Interestingly, with this “modified MOND”, Gauss’ theorem (or Newton’s second theorem) would no longer be valid in spherical symmetry. A suitable choice of s0 (e.g., on the order of \(10^3\) km/s; see [36]) could affect the dynamics of galaxy clusters (by boosting the modification with an effectively higher value of a0) compared to the previous MOND equation, while keeping less massive systems such as galaxies typically unaffected compared to usual MOND. Other (lower) values of s0 could allow (modulo a renormalization of a0) for a stronger modification in galaxy clusters as well as a milder modification in subgalactic systems such as globular clusters, which, as we shall soon see, could be interesting from a phenomenological point of view (see Section 6.6). However, the possibility of too strong a modification should be carefully investigated, as well as, in a relativistic (see Section 7) version of the theory, the consequences on the dynamics of a scalar-field with a similar action.
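To give a feeling for the size of the effect, the following sketch evaluates the effective acceleration constant for two potential depths. The depths used, Φ ≈ −(200 km/s)² for a galaxy-like system and Φ ≈ −(1000 km/s)² for a cluster-like one, are hypothetical round numbers chosen for illustration and are not taken from [36]; only s0 ∼ 10³ km/s is quoted in the text.

```python
import math


def a0_effective(phi_kms2, s0_kms, a0=1.0):
    """a0_eff = a0 exp(-Phi / s0^2), with Phi in (km/s)^2 and s0 in km/s.
    The default a0 = 1 returns the boost factor a0_eff / a0."""
    return a0 * math.exp(-phi_kms2 / s0_kms**2)


if __name__ == "__main__":
    s0 = 1.0e3                 # km/s, order of magnitude quoted in the text
    phi_galaxy = -(200.0**2)   # HYPOTHETICAL potential depth, (km/s)^2
    phi_cluster = -(1000.0**2)  # HYPOTHETICAL potential depth, (km/s)^2
    print("galaxy boost :", a0_effective(phi_galaxy, s0))
    print("cluster boost:", a0_effective(phi_cluster, s0))
```

With these illustrative inputs the boost is a few percent for the galaxy-like potential and a factor of e for the cluster-like one, in line with the qualitative statement above.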

6.1.3 QUMOND

Another way [319] of modifying gravity in order to reproduce Milgrom’s law is to still keep the “matter action” unchanged, \(S_{\rm kin} + S_{\rm in} = \int \rho (v^2/2 - \Phi)\,d^3x\,dt\), thus ensuring that varying the action of a test particle with respect to the particle degrees of freedom leads to \(d^2{\bf x}/dt^2 = -\nabla \Phi\), but to invoke an auxiliary acceleration field \({\bf g}_N = -\nabla \Phi_N\) in the gravitational action instead of invoking an aquadratic Lagrangian in |∇Φ|. The addition of such an auxiliary field can of course be done without modifying Newtonian gravity, by writing the Newtonian gravitational action in the following wayFootnote 27:

$${S_{{\rm{grav}}\;{\rm{N}}}} = - {1 \over {8\pi G}}\int {(2\nabla \Phi {.}{{\bf{g}}_N} - {\bf{g}}_N^2){d^3}x\;dt.}$$
(28)

It gives, after variation over gN (or over ΦN): gN = −∇Φ. And after variation of the full action over Φ: −∇.gN = 4πGρ, i.e., Newtonian gravity. One can then introduce a MONDian modification of gravity by modifying this action in the following way, replacing \({\bf{g}}_N^2\) by a non-linear function of it and assuming that it derives from an auxiliary potential gN = −∇ΦN, so that the new degree of freedom is this new potential:

$${S_{{\rm{grav}}\;{\rm{QUMOND}}}} \equiv - {1 \over {8\pi G}}\int {[2\nabla \Phi .\nabla {\Phi _N} - a_0^2Q(\vert \nabla {\Phi _N}\vert ^{2}/a_0^2)]{d^3}x\;dt.}$$
(29)

Varying the total action with respect to Φ yields: \(\nabla^2 \Phi_N = 4\pi G\rho\). And varying it with respect to the auxiliary (Newtonian) potential ΦN yields:

$${\nabla ^2}\Phi = \nabla .\left[ {\nu \left({{{\vert \nabla {\Phi _N}\vert} \over {{a_0}}}} \right)\nabla {\Phi _N}} \right]$$
(30)

where ν(y) = Q′(z) and \(z = y^2\). Thus, the theory requires one only to solve the Newtonian linear Poisson equation twice, with only one non-linear step in calculating the rhs term of Eq. 30. For this reason, it is called the quasi-linear formulation of MOND (QUMOND). In order to recover the ν-function behavior of Milgrom’s law (Eq. 10), i.e., ν(y) → 1 for y ≫ 1 and \(\nu(y) \rightarrow y^{-1/2}\) for y ≪ 1, one needs to choose:

$$Q(z) \rightarrow z\;{\rm{for}}\;z \gg 1\;{\rm{and}}\;Q(z) \rightarrow {4 \over 3}{z^{3/4}}\;{\rm{for}}\;z \ll 1.$$
(31)
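As a quick consistency check of Eq. 31 (a worked step added here for illustration, using only ν(y) = Q′(z) and \(z = y^2\) from above), differentiating the two asymptotic forms of Q gives back the required limits of the ν-function:

$$Q'(z) = {d \over {dz}}\left[ {{4 \over 3}{z^{3/4}}} \right] = {z^{- 1/4}} = {y^{- 1/2}}\;\;{\rm{for}}\;z \ll 1,\qquad Q'(z) = {d \over {dz}}\,z = 1\;\;{\rm{for}}\;z \gg 1.$$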

The general solution of the system of partial differential equations is equivalent to Milgrom’s law (Eq. 10) up to a curl field correction, and is precisely equal to Milgrom’s law in highly-symmetric one-dimensional systems. However, this curl-field correction is different from the one of AQUAL. This means that, outside of high symmetry, AQUAL and QUMOND cannot be precisely equivalent. An illustration of this is given in [509]: for a system with all its mass in an elliptical shell (in the sense of a squashed homogeneous spherical shell), the effective density of matter that would source the MOND force field in Newtonian gravity is uniformly zero in the void inside the shell for QUMOND, but nonzero for AQUAL.

The concept of the effective density of matter that would source the MOND force field in Newtonian gravity is extremely useful for an intuitive comprehension of the MOND effect, and/or for interpreting MOND in the dark matter language: indeed, subtracting from this effective density the baryonic density yields what is called the “phantom dark matter” distribution. In AQUAL, it requires deriving the Newtonian Poisson equation after having solved for the MOND one. On the other hand, in QUMOND, knowing the Newtonian potential yields direct access to the phantom dark matter distribution even before knowing the MOND potential. After choosing a ν-function, one defines

$$\tilde \nu (y) = \nu (y) - 1,$$
(32)

and one has, for the phantom dark matter density,

$${\rho _{{\rm{ph}}}} = {{\nabla {.}(\tilde \nu \nabla {\Phi _N})} \over {4\pi G}}.$$
(33)

This \({\tilde \nu}\)-function appears naturally in an alternative formulation of QUMOND where one writes the action as a function of an auxiliary potential Φph:

$${S_{{\rm{grav}}\;{\rm{QUMOND}}}} = - {1 \over {8\pi G}}\int {[\vert \nabla \Phi \vert ^{2} - \vert \nabla {\Phi _{{\rm{ph}}}}\vert ^{2} - a_0^2H(\vert \nabla \Phi - \nabla {\Phi _{{\rm{ph}}}}\vert ^{2}/a_0^2)]{d^3}x\;dt,}$$
(34)

leading to a potential Φph obeying a QUMOND equation with \(\tilde \nu (y) = {H{\prime}}({y^2})\) and Φ = ΦN + Φph.

Numerically, for a given Newtonian potential discretized on a grid of step h, the discretized phantom dark matter density is given on grid points (i,j,k) by (see Figure 17 and cf. Eq. 25, see also [11]):

$$\begin{array}{*{20}c}{{\rho _{{\rm{ph}}\;(i,j,k)}} =}\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\\{\left[({\Phi _{N(i + 1,j,k)}} - {\Phi _{N(i,j,k)}}){{\tilde \nu}_{{M_1}}}\right. - ({\Phi _{N(i,j,k)}} - {\Phi _{N(i - 1,j,k)}}){{\tilde \nu}_{{L_1}}}}\quad\quad\quad\quad\\{+ ({\Phi _{N(i,j + 1,k)}} - {\Phi _{N(i,j,k)}}){{\tilde \nu}_{{M_2}}} - ({\Phi _{N(i,j,k)}} - {\Phi _{N(i,j - 1,k)}}){{\tilde \nu}_{{L_2}}}}\quad\quad\quad\quad\\{+ ({\Phi _{N(i,j,k + 1)}} - {\Phi _{N(i,j,k)}}){{\tilde \nu}_{{M_3}}} -\left.({\Phi _{N(i,j,k)}} - {\Phi _{N(i,j,k - 1)}}){{\tilde \nu}_{{L_3}}}\right]/(4\pi G{h^2}){.}}\\\end{array}$$
(35)

This means that any N-body technique (e.g., treecodes or fast multipole methods) can be adapted to QUMOND (a grid being necessary as an intermediate step). Once the Newtonian potential (or force) is locally known, the phantom dark matter density can be computed and then represented by weighted particles, whose gravitational attraction can then be computed in any traditional manner. An example is given in Figure 18, where one considers a rather typical baryonic galaxy model with a small bulge and a large disk. Applying Eq. 35 (with the ν-function of Eq. 43) then yields the phantom density [253]. Interestingly, this phantom density is composed of a round “dark halo” and a flattish “dark disk” (see [305] for an extensive discussion of how such a dark disk component comes about; see also [50] and Section 6.5.2 for observational considerations). Let us note that this phantom dark matter density can be slightly separated from the baryonic density distribution in non-spherical situations [226], and that it can be negative [297, 490], contrary to normal dark matter. Finding the signature of such a local negative dark matter density could be a way of exhibiting a clear signature of MOND.

Figure 18

(a) Baryonic density of a model galaxy made of a small Plummer bulge with a mass of \(2 \times 10^8\,M_\odot\) and Plummer radius of 185 pc, and of a Miyamoto-Nagai disk of \(1.1 \times 10^{10}\,M_\odot\), a scale-length of 750 pc and a scale-height of 300 pc. (b) The derived phantom dark matter density distribution: it is composed of a spheroidal component similar to a dark matter halo, and of a thin disk-like component (Figure made by Fabian Lüghausen [253])
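The discretized phantom density of Eq. 35 can be sketched on a regular grid as follows. This is a minimal illustration, not the code used for Figure 18: the function names are hypothetical, the gradient norm at the midpoints Mi and Li is assumed to be built from the one-sided difference along the axis and from centred differences averaged over the two end nodes in the transverse directions, and the test potential is that of a point mass in arbitrary units (G = M = 1) with a0 chosen large enough that the whole grid is in the deep-MOND regime.

```python
import numpy as np

def nu_simple(y):
    """'Simple' nu-function of Eq. 43."""
    return 0.5 * (1.0 + np.sqrt(1.0 + 4.0 / y))

def _flux_divergence(phi_n, axis, h, a0, nu):
    """One pair of terms of Eq. 35: (flux at M_i - flux at L_i) / h along `axis`."""
    p = np.moveaxis(phi_n, axis, 0)
    avg = 0.5 * (p[1:] + p[:-1])                       # mean of the two end nodes
    d_par = ((p[1:] - p[:-1]) / h)[:, 1:-1, 1:-1]      # derivative along the axis
    d_t1 = ((avg[:, 2:] - avg[:, :-2]) / (2 * h))[:, :, 1:-1]
    d_t2 = ((avg[:, :, 2:] - avg[:, :, :-2]) / (2 * h))[:, 1:-1, :]
    grad_norm = np.sqrt(d_par**2 + d_t1**2 + d_t2**2)  # |grad Phi_N| at midpoints
    flux = (nu(grad_norm / a0) - 1.0) * d_par          # nu-tilde times dPhi_N/dx_i
    return np.moveaxis((flux[1:] - flux[:-1]) / h, 0, axis)

def phantom_density(phi_n, h, a0, G=1.0, nu=nu_simple):
    """Phantom dark matter density on the interior nodes of the grid (Eq. 35)."""
    div = sum(_flux_divergence(phi_n, ax, h, a0, nu) for ax in range(3))
    return div / (4.0 * np.pi * G)

# Illustrative test case (assumed values): point mass at the origin, with the
# grid offset by half a cell so that no node sits on the singularity.
n, h = 32, 1.0
x = (np.arange(n) - (n - 1) / 2) * h
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
G, M, a0 = 1.0, 1.0, 1.0e4
phi_n = -G * M / np.sqrt(X**2 + Y**2 + Z**2)
rho_ph = phantom_density(phi_n, h, a0, G)
```

In this deep-MOND test case, the result can be compared with the isothermal profile \(\sqrt{GM{a_0}}/(4\pi G{r^2})\) expected around a point mass in spherical symmetry. The returned array covers only the interior nodes, so its indices are shifted by one with respect to the input grid.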

Finally, let us note that, as shown in [319, 509], (i) a system made of high-acceleration constituents, but with a low-acceleration center-of-mass, moves according to a low-acceleration MOND law, while (ii) the virial of a system is given by

$$W = - {2 \over 3}\sqrt {G{M^3}{a_0}} - {1 \over {4\pi G}}\int {\left[ {- {3 \over 2}a_0^2Q(\vert \nabla {\Phi _N}\vert ^{2}/a_0^2) + 2\nu (\vert \nabla {\Phi _N}\vert /{a_0})\vert \nabla {\Phi _N}\vert ^{2}} \right]} {d^3}x,$$
(36)

meaning that for a system entirely in the extremely weak field limit where \(\nu(y) = y^{-1/2}\) and \(Q(z) = (4/3)z^{3/4}\), the second term vanishes and we get \(W = (- 2/3)\sqrt {G{M^3}{a_0}}\) precisely like in Bekenstein-Milgrom MOND. This means that, although the curl-field correction is in general different in AQUAL and QUMOND, the two-body force in the deep-MOND limit is the same [509].

6.2 The interpolating function

The basis of the MOND paradigm is to reproduce Milgrom’s law, Eq. 7, in highly symmetrical systems, with an interpolating function asymptotically obeying the conditions of Eq. 8, i.e., μ(x) → 1 for x ≫ 1 and μ(x) → x for x ≪ 1. Obviously, in order for the relation between g and gN to be uniquely determined, another constraint is that xμ(x) must be a monotonically increasing function of x, or equivalently

$$\mu (x) + x{\mu {\prime}}(x) > 0,$$
(37)

or equivalently

$${{d\ln \mu} \over {d\ln x}} > - 1.$$
(38)

Even though this leaves some freedom for the exact shape of the interpolating function, leading to the various families of functions hereafter, let us insist that it is already extremely surprising, from the dark matter point of view, that the MOND prescriptions for the asymptotic behavior of the interpolating function did predict all the aspects of the dynamics of galaxies listed in Section 5.

As we have seen in Section 6.1, an alternative formulation of the MOND paradigm relies on Eq. 10, based on an interpolating function

$$\nu (y) = 1/\mu (x)\;{\rm{where}}\;y = x\mu (x).$$
(39)

In that case, we also have that yν(y) must be a monotonically increasing function of y.

Finally, as we shall see in detail in Section 7, many MOND relativistic theories boil down to multifield theories where the weak-field limit can be represented by a potential Φ = Σi ϕi, where each ϕi obeys a generalized Poisson equation, the most common case being

$$\Phi = {\Phi _N} + \phi ,$$
(40)

where ΦN obeys the Newtonian Poisson equation and the scalar field ϕ (with dimensions of a potential) plays the role of the phantom dark matter potential and obeys an equation of either the type of Eq. 17 or of Eq. 30. When it obeys a QUMOND type of equation (Eq. 30), the ν-function must be replaced by the \({\tilde \nu}\)-function of Eq. 32. When it obeys a BM-like equation (Eq. 17), the classical interpolating function μ(x) acting on x = |∇Φ|/a0 must be replaced by another interpolating function \(\tilde \mu (s)\) acting on s = |∇ϕ|/a0, in order for the total potential Φ to conform to Milgrom’s lawFootnote 28. In the absence of a renormalization of the gravitational constant, the two functions are related through [145]

$$\tilde \mu (s) = (x - s){s^{- 1}}\;{\rm{where}}\;s = x[1 - \mu (x)].$$
(41)

For x ≪ 1 (the deep-MOND regime), one has s = x(1 − x) ≪ 1 and x ∼ s(1 + s), yielding \(\tilde \mu (s) \sim s\), i.e., although it is generally different, \({\tilde \mu}\) has the same low-gravity asymptotic behavior as μ.

In spherical symmetry, all these different formulations can be made equivalent by choosing equivalent interpolating functions, but the theories will typically differ slightly outside of spherical symmetry (i.e., the curl field will be slightly different). As an example, let us consider a widely-used interpolating function [141, 166, 402, 508] yielding excellent fits in the intermediate to weak gravity regime of galaxies (but not in the strong gravity regime of the Solar system), known as the “simple” μ-function (see Figure 19):

$$\mu (x) = {x \over {1 + x}}.$$
(42)

This yields \(y = x^2/(1 + x)\), and thus \(x = (y + \sqrt {{y^2} + 4y})/2\), so that ν = (1 + x)/x yields the “simple” ν-function:

$$\nu (y) = {{1 + {{(1 + 4{y^{- 1}})}^{1/2}}} \over 2}.$$
(43)

It also yields s = x[1 − μ(x)] = x/(1 + x) = μ, and hence x = s/(1 − s), yielding for the “simple” \({\tilde \mu}\)-function:

$$\tilde \mu (s) = {s \over {1 - s}}.$$
(44)
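The three “simple” functions of Eqs. 42–44 can be cross-checked numerically against the defining relations of Eq. 39 and Eq. 41. The snippet below is only an illustrative sketch; the function names and the sampling range in x are arbitrary choices made here.

```python
import numpy as np

def mu_simple(x):
    """'Simple' mu-function, Eq. 42."""
    return x / (1.0 + x)

def nu_simple(y):
    """'Simple' nu-function, Eq. 43."""
    return 0.5 * (1.0 + np.sqrt(1.0 + 4.0 / y))

def mu_tilde_simple(s):
    """'Simple' mu-tilde-function, Eq. 44 (defined for 0 < s < 1)."""
    return s / (1.0 - s)

x = np.logspace(-3, 3, 61)
mu = mu_simple(x)
y = x * mu              # Eq. 39: y = x mu(x)
s = x * (1.0 - mu)      # Eq. 41: s = x [1 - mu(x)]

# Eq. 39 requires nu(y) = 1/mu(x); Eq. 41 requires mu_tilde(s) = (x - s)/s.
nu_err = np.max(np.abs(nu_simple(y) * mu - 1.0))
mu_tilde_err = np.max(np.abs(mu_tilde_simple(s) * s / (x - s) - 1.0))
```

Both residuals should vanish up to floating-point rounding if the three expressions describe the same force law in spherical symmetry.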

A more general family of \({\tilde \mu}\)-functions is known as the α-family [15], valid for 0 ≤ α ≤ 1 and including the simple function of the α = 1 case Footnote 29:

$${\tilde \mu _\alpha}(s) = {s \over {1 - \alpha s}}$$
(45)

corresponding to the following family of μ-functions:

$${\mu _\alpha}(x) = {{2x} \over {1 + (2 - \alpha)x + {{[{{(1 - \alpha x)}^2} + 4x]}^{1/2}}}}.$$
(46)

The α = 0 case is sometimes referred to as “Bekenstein’s μ-function” (see Figure 19) as it was used in [33]. The problem here is that all these μ-functions approach 1 quite slowly, with ζ ≤ 1 in their asymptotic expansion for x → ∞, \(\mu (x) \sim 1 - A{x^{- \zeta}}\). Indeed, since s = x[1 − μ(x)], its asymptotic behavior is \(s \sim A{x^{- \zeta + 1}}\). So, if ζ > 1, s → 0 for x → ∞ as well as for x → 0, which would imply that \(x(s) = s \tilde \mu (s) + s\) would be a multivalued function, and that the gravity would be ill-defined. This is problematic because even for the extreme case ζ = 1, the anomalous acceleration does not go to zero in the strong gravity regime: there is still a constant anomalous “Pioneer-like” acceleration x[1 − μ(x)] → A, which is observationally excludedFootnote 30 from very accurate planetary ephemerides [154]. What is more, these \({\tilde \mu}\)-functions, defined only in the domain \(0 < s < {\alpha ^{- 1}}\), would need very carefully chosen boundary conditions to avoid covering values of s outside of the allowed domain when solving the Poisson equation for the scalar field.
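A short numerical sketch can make both points concrete: that Eq. 46 is indeed the μ-function associated with Eq. 45 through Eq. 41, and that the anomalous acceleration x[1 − μ(x)] saturates at a constant for α > 0. The function names and the sample values of α and x below are illustrative choices, not taken from the references.

```python
import numpy as np

def mu_alpha(x, alpha):
    """mu-function of the alpha-family, Eq. 46."""
    root = np.sqrt((1.0 - alpha * x) ** 2 + 4.0 * x)
    return 2.0 * x / (1.0 + (2.0 - alpha) * x + root)

def mu_tilde_alpha(s, alpha):
    """mu-tilde-function of the alpha-family, Eq. 45."""
    return s / (1.0 - alpha * s)

def anomalous_acceleration(x, alpha):
    """s = x [1 - mu(x)], the anomalous acceleration in units of a0."""
    return x * (1.0 - mu_alpha(x, alpha))

# Consistency of Eq. 45 and Eq. 46 through Eq. 41: mu_tilde(s) = (x - s)/s.
x = np.logspace(-3, 3, 61)
max_err = 0.0
for alpha in (0.0, 0.5, 1.0):
    s = anomalous_acceleration(x, alpha)
    err = np.max(np.abs(mu_tilde_alpha(s, alpha) * s / (x - s) - 1.0))
    max_err = max(max_err, err)

# Strong-gravity limit: for alpha > 0 the anomalous acceleration tends to the
# constant 1/alpha instead of vanishing.
s_half = anomalous_acceleration(1.0e6, 0.5)
s_one = anomalous_acceleration(1.0e6, 1.0)
```

The limiting value 1/α follows from the upper edge of the domain \(0 < s < {\alpha ^{- 1}}\) quoted above; for α = 0 there is no such bound and s keeps growing with x.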

Figure 19

Various μ-functions. Dotted green line: the α = 0 “Bekenstein” function of Eq. 46. Dashed red line: the α = n = 1 “simple” function of Eq. 46 and Eq. 49. Dot-dashed black line: the n = 2 “standard” function of Eq. 49. Solid blue line: the γ = δ = 1 μ-function corresponding to the ν-function defined in Eq. 52 and Eq. 53. The latter function closely retains the virtues of the n = 1 simple function in galaxies (x ≲ 10), but approaches 1 much more quickly and connects with the n = 2 standard function as x ≫ 10.

The way out, in order to design \({\tilde \mu}\)-functions corresponding to acceptable μ-functions in the strong gravity regime, is to renormalize the gravitational constant [145]: this means that the bare value of G in the Poisson and generalized Poisson equations ruling the bare Newtonian potential ϕN and the scalar field ϕ in Eq. 40 is different from the gravitational constant measured on Earth, GN (related to the true Newtonian potential ΦN). One can assume that the bare gravitational constant G is related to the measured one through

$${G_N} = \xi G,$$
(47)

meaning that x = y + s where \(x = \nabla \Phi/{a_0}\), \(y = \nabla {\phi _N}/{a_0} = \nabla {\Phi _N}/(\xi {a_0})\), and \({s}\tilde \mu ({s}) = y\). We then have for Milgrom’s law:

$$x\mu (x) = \xi (x - s) = \xi s\tilde \mu (s).$$
(48)

In order to recover μ(x) → 1 for x → ∞, it is straightforward to show [145] that it suffices that \(\tilde \mu ({s}) \rightarrow {{\tilde \mu}_0}\) for s → ∞, and that \(\xi = 1 + \tilde \mu _0^{- 1}\). Then, if ζ > 1 in the asymptotic expansion \(\mu (x) \sim 1 - {x^{- \zeta}}\), one has \({s} \sim {(1 + \tilde \mu _0^{- 1})^{- 1}}{x^{- \zeta + 1}} + {(1 + {{\tilde \mu}_0})^{- 1}}x\). This second linear term allows s to go to infinity for large x and thus x(s) to be single-valued. On the other hand, for the deep-MOND regime, the renormalization of G implies that \(\tilde \mu ({s}) \rightarrow {s}/\xi\) for \({s} \ll 1\).

We can then use, even in multifield theories, μ-functions quickly asymptoting to 1. For each of these functions, there is a one-parameter family of corresponding \({\tilde \mu}\)-functions (labelled by the parameter \(\tilde \mu (\infty) = {{\tilde \mu}_0}\)), obtained by inserting μ(x) into \({s} = x[1 - {\xi ^{- 1}}\mu (x)]\) and making sure that the function is increasing and thus invertible. A useful family of such μ-functions asymptoting more quickly towards 1 than the α-family is the n-family:

$${\mu _n}(x) = {x \over {{{(1 + {x^n})}^{1/n}}}}.$$
(49)

The case n = 1 is again the simple function, while the case n = 2 has been extensively used in rotation curve analysis from the very first analyses [28, 223] to this day [401], and is thus known as the “standard” μ-function (see Figure 19). The corresponding \({\tilde \mu}\)-function for n ≥ 2 has a very peculiar shape of the type shown in Figure 3 of [81] (which might be considered a fine-tuned shape, necessary to account for solar system constraints). On the other hand, the corresponding ν-function family is:

$${\nu _n}(y) = {\left[ {{{1 + {{(1 + 4{y^{- n}})}^{1/2}}} \over 2}} \right]^{1/n}}.$$
(50)
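The correspondence between Eq. 49 and Eq. 50, and the faster approach to the Newtonian regime for larger n, can be checked with a few lines of code. This is an illustrative sketch: the function names, the sampled values of n, and the evaluation point x = 100 are choices made here.

```python
import numpy as np

def mu_n(x, n):
    """mu-function of the n-family, Eq. 49."""
    return x / (1.0 + x**n) ** (1.0 / n)

def nu_n(y, n):
    """nu-function of the n-family, Eq. 50."""
    return (0.5 * (1.0 + np.sqrt(1.0 + 4.0 * y ** (-n)))) ** (1.0 / n)

# Eq. 39: nu(y) mu(x) = 1 with y = x mu(x), for several members of the family.
x = np.logspace(-2, 2, 41)
max_err = max(
    np.max(np.abs(nu_n(x * mu_n(x, n), n) * mu_n(x, n) - 1.0))
    for n in (1, 2, 4, 8)
)

# Anomalous acceleration x [1 - mu(x)] (in units of a0) deep in the Newtonian
# regime: of order unity for n = 1, but falling off roughly as 1/(2x) for n = 2.
anomalous = {n: 100.0 * (1.0 - mu_n(100.0, n)) for n in (1, 2)}
```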

As the simple μ-function (α = 1 or n = 1) fits galaxy rotation curves well (see Section 6.5.1) but is excluded in the solar system (see Section 6.4), it can be useful to define μ-functions that have a gradual transition similar to the simple function in the low to intermediate gravity regime of galaxies, but a more rapid transition towards one than the simple function. Two such families are described in [325] in terms of their ν-function:

$${\nu _\beta}(y) = {(1 - {e^{- y}})^{- 1/2}} + \beta {e^{- y}}$$
(51)

and

$${\nu _\gamma}(y) = {(1 - {e^{- {y^{\gamma /2}}}})^{- 1/\gamma}} + (1 - {\gamma ^{- 1}}){e^{- {y^{\gamma /2}}}}.$$
(52)

Finally, yet another family was suggested in [274], obtained by deleting the second term of the γ-family, and retaining the virtues of the n-family in galaxies, but approaching one more quickly in the solar system:

$${\nu _\delta}(y) = {(1 - {e^{- {y^{\delta /2}}}})^{- 1/\delta}}.$$
(53)
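The three exponential families of Eqs. 51–53 can be written down directly and compared in their two limits. The sketch below is illustrative only: the function names and the parameter values tried are arbitrary, and `np.expm1` is used merely to evaluate \(1 - {e^{- t}}\) accurately at small t.

```python
import numpy as np

def nu_beta(y, beta):
    """beta-family, Eq. 51."""
    return (-np.expm1(-y)) ** (-0.5) + beta * np.exp(-y)

def nu_gamma(y, gamma):
    """gamma-family, Eq. 52."""
    t = y ** (gamma / 2.0)
    return (-np.expm1(-t)) ** (-1.0 / gamma) + (1.0 - 1.0 / gamma) * np.exp(-t)

def nu_delta(y, delta):
    """delta-family, Eq. 53 (the gamma-family without its second term)."""
    t = y ** (delta / 2.0)
    return (-np.expm1(-t)) ** (-1.0 / delta)

y = np.logspace(-3, 2, 51)

# For gamma = delta = 1 the second term of Eq. 52 vanishes and both coincide.
same_at_one = np.allclose(nu_gamma(y, 1.0), nu_delta(y, 1.0))

# Deep-MOND limit: nu(y) * sqrt(y) -> 1 for y << 1.
y_deep = 1.0e-8
deep = [f(y_deep, p) * np.sqrt(y_deep)
        for f, p in ((nu_beta, 1.0), (nu_gamma, 2.0), (nu_delta, 2.0))]

# Newtonian limit: nu(y) -> 1 for y >> 1, exponentially fast in these families.
newton = [f(100.0, p) for f, p in ((nu_beta, 1.0), (nu_gamma, 2.0), (nu_delta, 1.0))]
```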

To be complete, it should be noted that other μ-functions considered in the literature include [304, 505] (see also Section 7.10):

$$\mu (x) = {{{{(1 + 4{x^2})}^{1/2}} - 1} \over {2x}},$$
(54)

and

$$\mu (x) = 1 - {(1 + x/3)^{- 3}}.$$
(55)

This simply shows the variety of shapes that the interpolating function of MOND can in principle takeFootnote 31. Very precise data for rotation curves, including negligible errors on the distance and on the stellar mass-to-light ratios (or, in that case, purely gaseous galaxies) should allow one to pin down its precise form, at least in the intermediate gravity regime and for “modified inertia” theories (Section 6.1.1) where Milgrom’s law is exact for circular orbits. Nowadays, galaxy data still allow some, but not much, wiggle room: they tend to favor the α = n = 1 simple function [166] or some interpolation between n = 1 and n = 2 [141], while combined data of galaxies and the solar system (see Sections 6.4 and 6.5) rather tend to favor something like the γ = δ = 1 function of Eq. 52 and Eq. 53 (which effectively interpolates between n =1 and n = 2, see Figure 19), although slightly higher exponents (i.e., γ > 1 or δ > 1) might still be needed in the weak gravity regime in order to pass solar system tests involving the external field from the galaxy [62]. Again, it should be stressed that the most salient aspect of MOND is not its precise interpolating function, but rather its successful predictions on galactic scaling relations and Kepler-like laws of galactic dynamics (Section 5.2), as well as its various beneficial effects on, e.g., disk stability (see Section 6.5), all predicted from its asymptotic form. The very concept of a pre-defined interpolating function should even in principle fully disappear once a more profound parent theory of MOND is discovered (see, e.g., [22]).

To end this section on the interpolating function, let us stress that if the μ-function asymptotes as μ(x) = x for x → 0, then the energy of the gravitational field surrounding a massive body is infinite [38]. What is more, if the \({\tilde \mu}\) function of relativistic multifield theories asymptotes in the same way to zero before going to negative values for time-evolution dominated systems (see Section 9.1), then a singular surface exists around each galaxy, on which the scalar degree of freedom does not propagate, and can therefore not provide a consistent picture of collapsed matter embedded into a cosmological background. A simple solution [145, 380] consists in assuming a modified asymptotic behavior of the μ-function, namely of the form

$$\mu (x) \sim {\varepsilon _0} + x\;{\rm{for}}\;x \ll 1.$$
(56)

In that case there is a return to a Newtonian behavior (but with a very strong renormalized gravitational constant GN/ε0) at a very low acceleration scale x ≪ ε0, and rotation curves of galaxies are only approximately flat until the galactocentric radius

$$R \sim {1 \over {{\varepsilon _0}}}\sqrt {{{{G_N}M} \over {{a_0}}}} .$$
(57)

Thus, one must have ε0 ≪ 1 to not affect the observed phenomenology in galaxies. Note that the μ-function will never go to zero, even at the center of a system. Conversely, in QUMOND and the like, one can modify the v-function in the same way:

$$\nu (y) \sim {1 \over {{\varepsilon _0} + {y^{1/2}}}}\;{\rm{for}}\;y \ll 1.$$
(58)

6.3 The external field effect

The above return to a rescaled Newtonian behavior at very large radii and in the central parts of isolated systems, in order to avoid theoretical problems with the interpolating function, would happen anyway, even with the interpolating function going to zero, for any non-isolated system in the universe (and this return to Newtonian behavior could actually happen at much lower radii) because of a very peculiar aspect of MOND: the external field effect, which appeared in its full significance already in the pristine formulation of MOND [293].

In practice, no objects are truly isolated in the Universe and this has wider and more subtle implications in MOND than in Newton-Einstein gravity. In the linear Newtonian dynamics, the internal dynamics of a subsystem (a cluster in a galaxy, or a galaxy in a galaxy cluster for instance) in the field of its mother system decouples. Namely, the internal dynamics is always the same independent of any external field (constant across the subsystem) in which the system is embedded (of course, if the external field varies across the subsystem, it manifests itself as tides). This has subsequently been built in as a fundamental principle of GR: the Strong Equivalence Principle (see Section 7). But MOND has to break this fundamental principle of GR. This is because, as it is an acceleration-based theory, what counts is the total gravitational acceleration with respect to a pre-defined frame (e.g., the CMB frameFootnote 32). Thus, the MOND effects are only observed in systems where the absolute value of the gravity both internal, g, and external, ge (from a host galaxy, or astrophysical system, or large scale structure), is less than a0. If ge < g < a0 then we have standard MOND effects. However, if the hierarchy goes as g < a0 < ge, then the system is purely NewtonianFootnote 33, and if g < ge < a0 then the system is Newtonian with a renormalized gravitational constant. Ultimately, whenever g falls below ge (which always happens at some point) the gravitational attraction falls again as \(1/r^2\). This is most easily illustrated in a thought experiment where one considers MOND effects in one dimension. In Eq. 17, one has ∇Φ = g + ge and 4πGρ = ∇.(gN + gNe), which in one dimension leads to the following revised Milgrom’s law (Eq. 7) including the external field:

$$g\mu \left({{{g + {g_e}} \over {{a_0}}}} \right) + {g_e}\left[ {\mu \left({{{g + {g_e}} \over {{a_0}}}} \right) - \mu \left({{{{g_e}} \over {{a_0}}}} \right)} \right] = {g_N},$$
(59)

such that, when g → 0, we have Newtonian gravity with a renormalized gravitational constant \({G_{{\rm{norm}}}} \approx G/[{\mu _e}(1 + {L_e})]\), where \({\mu _e} = \mu ({g_e}/{a_0})\) and \({L_e} = {(d\ln \mu /d\ln x)_{x = {g_e}/{a_0}}}\), assuming, as before, that the external field only varies on a much larger scale than the internal system. Similarly, for QUMOND (Eq. 30) in one dimension, one gets the equivalent of Eq. 10:

$$g = {g_N}\nu \left({{{{g_N} + {g_{Ne}}} \over {{a_0}}}} \right) + {g_{Ne}}\left[ {\nu \left({{{{g_N} + {g_{Ne}}} \over {{a_0}}}} \right) - \nu \left({{{{g_{Ne}}} \over {{a_0}}}} \right)} \right].$$
(60)
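Since Eq. 60 is explicit in the Newtonian accelerations, it is straightforward to evaluate. The sketch below uses the “simple” ν-function of Eq. 43 and an external field of 1.5 a0 purely as illustrative assumptions; the function names are hypothetical, and the limiting factor \({\nu _e}(1 + {L_{Ne}})\), with \({\nu _e} = \nu ({g_{Ne}}/{a_0})\) and \({L_{Ne}}\) the logarithmic slope of ν at that point, is obtained here by linearizing Eq. 60 for \({g_N} \ll {g_{Ne}}\), with the slope estimated by finite differences.

```python
import numpy as np

def nu_simple(y):
    """'Simple' nu-function of Eq. 43."""
    return 0.5 * (1.0 + np.sqrt(1.0 + 4.0 / y))

def g_qumond_1d(g_n, g_ne, a0, nu=nu_simple):
    """Internal acceleration from Eq. 60 (one-dimensional QUMOND with EFE)."""
    nu_tot = nu((g_n + g_ne) / a0)
    if g_ne == 0.0:
        return g_n * nu_tot          # isolated system: g = nu(g_N/a0) g_N
    return g_n * nu_tot + g_ne * (nu_tot - nu(g_ne / a0))

def rescaling_factor(g_ne, a0, nu=nu_simple, eps=1.0e-5):
    """nu_e (1 + L_Ne): the limit of g/g_N when g_N << g_Ne in Eq. 60."""
    y_e = g_ne / a0
    dln_nu = np.log(nu(y_e * (1.0 + eps))) - np.log(nu(y_e * (1.0 - eps)))
    l_ne = dln_nu / (np.log(1.0 + eps) - np.log(1.0 - eps))
    return nu(y_e) * (1.0 + l_ne)

a0 = 1.0                 # accelerations expressed in units of a0
g_ne = 1.5 * a0          # assumed Newtonian external field
g_n = 1.0e-6 * a0        # internal field far below the external one
ratio = g_qumond_1d(g_n, g_ne, a0) / g_n
factor = rescaling_factor(g_ne, a0)
```

In the external-field-dominated regime the ratio g/gN settles to a constant close to `factor`, i.e., to Newtonian gravity with a rescaled gravitational constant, rather than growing as in the isolated deep-MOND case.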

When dealing in the future with very extended rotation curves whose last observed point is in the extreme weak-field limit, it could be interesting, as a first-order approximation, to use the latter formulaeFootnote 34, adding the external field as an additional parameter of the MOND fit to the external parts of the rotation curve. Of course, this would only be a first-order approximation because it would neglect the three-dimensional nature of the problem and the direction of the external field.

Now, in three dimensions, the problem can be analytically solved only in the extreme case of the completely-external-field-dominated part of the system (where g ≪ ge) by considering the perturbation generated by a body of low mass m inside a uniform external field, assumed along the z-direction, \({{\bf{g}}_e} = {g_e}{{\bf{1}}_z}\). Eq. 17 can then be linearized and solved with the boundary condition that the total field equals the external one at infinity [38] to yield:

$$\Phi (x,y,z) = - {{Gm} \over {{\mu _e}\tilde r}},$$
(61)

with

$$\tilde r = r{(1 + {L_e}({x^2} + {y^2})/{r^2})^{1/2}},$$
(62)

squashing the isopotentials along the external field direction. Thus, this is the asymptotic behavior of the gravitational field in any system embedded in a constant external field. Similarly, in QUMOND (Eq. 30), one gets

$$\Phi (x,y,z) = - {{Gm{\nu _e}} \over {\tilde r}},$$
(63)

with

$$\tilde r = r/[1 + ({L_{Ne}}/2)({x^2} + {y^2})/{r^2}],$$
(64)

where \({L_{Ne}} = {(d\ln \nu /d\ln y)_{y = {g_{Ne}}/{a_0}}}\).

For the exact behavior of the MOND gravitational field in the regime where g and ge are of the same order of magnitude, one again resorts to a numerical solver, both for the BM equation case and for the QUMOND case (see Eq. 25 and Eq. 35). For the BM case, one adds the three components of the external field (no longer assumed to be in the z-direction only) in the argument of \({\mu _{{M_1}}}\), which becomes \(\{ {[(\Phi (B) - \Phi (A))/h - {g_{ex}}]^2} + {[(\Phi (I) + \Phi (H) - \Phi (K) - \Phi (J))/(4h) - {g_{ey}}]^2} + {[(\Phi (C) + \Phi (D) - \Phi (E) - \Phi (F))/(4h) - {g_{ez}}]^2}\} ^{1/2}\), and similarly for the other Mi and Li points on the grid (Figure 17). One also adds the respective component of the external field to the term estimating the force at the Mi and Li points in Eq. 25. With M1, for instance, one changes \(({\Phi _{i + 1,j,k}} - {\Phi _{i,j,k}}) \rightarrow ({\Phi _{i + 1,j,k}} - {\Phi _{i,j,k}} - h{g_{ex}})\) in the first term of Eq. 25. One then solves this discretized equation with the large radius boundary condition for the Dirichlet problem given by Eq. 61 instead of Eq. 20. Exactly the same is applicable to calculating the phantom dark matter component of QUMOND with Eq. 35, except that now the Newtonian external field is added to the terms of the equation in exactly the same way.

This external field effect (EFE) is a remarkable property of MONDian theories, and because this breaks the strong equivalence principle, it allows us to derive properties of the gravitational field in which a system is embedded from its internal dynamics (and not only from tides). For instance, the return to a Newtonian (Eq. 61 or Eq. 63) instead of a logarithmic (Eq. 20) potential at large radii is what defines the escape speed in MOND. By observationally estimating the escape speed from a system (e.g., the Milky Way escape speed from our local neighborhood; see discussion in Section 6.5.2), one can estimate the amplitude of the external field in which the system is embedded, and by measuring the shape of its isopotential contours at large radii, one can determine the direction of that external field, without resorting to tidal effects. It is also noticeable that the phantom dark matter has a tendency to become negative in “conoidal” regions perpendicular to the external field direction (see Figure 3 of [490]): with accurate-enough weak-lensing data, detecting these pockets of negative phantom densities could, in principle, be a smoking gun for MOND [490], but such an effect would be extremely sensitive to the detailed distribution of the baryonic matter. A final important remark about the EFE is that it prevents most possible MOND effects in Galactic disk open clusters or in wide binaries, apart from a possible rescaling of the gravitational constant. Indeed, for wide binaries located in the solar neighborhood, the galactic EFE (coming from the distribution of mass in our galaxy) is about 1.5 × a0. The corresponding rescaling of the gravitational constant then depends on the choice of the μ-function, but could typically account for up to a 50% increase of the effective gravitational constant. Although this is not, properly speaking, a MOND effect, it could still perhaps imply a systematic offset of mass for very-long-period binaries. 
However, any effect of the type claimed to be observed by [188] would not be expected in MOND due to the external field effect.

6.4 MOND in the solar system

The primary place to test modified gravity theories is, of course, the solar system, where general relativity has, until now, passed all the proposed tests. Detecting a deviation from Einsteinian gravity in our backyard would actually be the holy grail of modified gravity theories, in the same sense as direct detection in the lab is the holy grail of the CDM paradigm. However, MOND anomalies typically manifest themselves only in the weak-gravity regime, several orders of magnitude below the typical gravitational field exerted by the sun on, e.g., the inner planets. But in the case of modified inertia (Section 6.1.1), the anomalous acceleration at any location depends on properties of the whole orbit (non-locality), so that anomalies may appear in the motion of Solar system bodies that are on highly-eccentric trajectories taking them to large distances (e.g., long period comets or the Pioneer spacecraft), where accelerations are low [314]. Such MOND effects have been proposed as a possible mechanism for generating the Pioneer anomaly [314, 469], without affecting the motions of planets, whose orbits are fully in the high acceleration regime. On the other hand, in classical, non-relativistic modified gravity theories (Sections 6.1.2 and 6.1.3), small effects could still be observable and would primarily probe two aspects of the theory: (i) the shape of the interpolating function (Section 6.2) in the regime x ≫ 1, and (ii) the external Galactic gravitational field (Section 6.3) acting on the solar system, testing the interpolating function in the regime x ≪ 1.

If, as a first approximation, one considers the solar system as isolated, and the Sun as a point mass, the MOND effect in the inner solar system appears as an anomalous acceleration field in addition to the Newtonian one. In units of a0, the amplitude of the anomalous acceleration is given by x[1 − μ(x)], which can be constrained from the motion of the inner planets, typically their perihelion precession and the (non)-variation of Kepler’s constant [293, 391, 417]. These constraints typically exclude the whole α-family of interpolating functions (Eq. 46) that are natural for multifield theories such as TeVeS (see Section 6.2 and Section 7) because they yield x[1 − μ(x)] > 1 for x ≫ 1 while it must be smaller than 0.04 at the orbit of Mars [391]Footnote 35. Of course, this does not mean that the μ-function cannot be represented by the α-family in the intermediate gravity regime characterizing galaxies, but it must be modified in the strong gravity regimeFootnote 36. Another potential effect of MOND is anomalously strong tidal stresses in the vicinity of saddle points of the Newtonian potential, which might be tested with the LISA pathfinder [37, 49, 255, 464]. The MOND bubble can be quite big and clearly detectable, or the effect could be small and undetectable, depending on the interpolating function [255, 161].

The approximation of an isolated Solar system being incorrect, it is also important to add the effect of the external field from the galaxy. Its amplitude is typically on the order of ∼ 1.5 × a0. From there, Milgrom [314] has predicted (both analytically and numerically) a subtle anomaly in the form of a quadrupole field that may be detected in planetary and spacecraft motions (as subsequently confirmed by [62, 185]). This has been used to constrain the form of the interpolating function in the weak acceleration regime characteristic of the external field itself. Constraints have essentially been set on the n-family of μ-functions from the perihelion precession of Saturn [63, 154], namely that one must have n > 8 in order to fit these dataFootnote 37.

However, it should be noted that it is slightly inconsistent to compare the classical predictions of MOND with observational constraints obtained by a global fit of solar system orbits using a fully-relativistic first-post-Newtonian model. Although the above constraints on classical MOND models are useful guides, proper constraints can only truly be set on the various relativistic theories presented in Section 7, the first-order constraints on these theories coming from their own post-Newtonian parameters [65, 99, 173, 372, 391, 450]. What is more, and this perhaps makes all these tests unnecessary, it has recently been shown that it is possible to cancel any deviation from general relativity at small distances in most of these relativistic theories, independently of the form of the μ-function [22].

6.5 MOND in rotationally-supported stellar systems

6.5.1 Rotation curves of disk galaxies

The root and heart of MOND, as modified inertia or modified gravity, is Milgrom’s formula (Eq. 7). Up to some small corrections outside of symmetrical situations, this formula yields (once a0 and the form of the transition function μ are chosen) a unique prediction for the total effective gravity as a function of the gravity produced by the visible baryons. It is absolutely remarkable that this formula, devised 30 years ago, has been able to successfully predict an impressive number of galactic scaling relations (the “Kepler-like” laws of Section 5.2, backed by the modern data of Section 4.3) that were very imprecise and/or unobserved at the time, and which still are a puzzle to understand in the ΛCDM framework. What is more, this formula not only predicts global scaling relations successfully: we show in this section that it also predicts the shape and amplitude of galactic rotation curves at all radii with uncanny precision, and this for all disk galaxy Hubble types [168, 402]. Of course, the exact prediction of MOND depends on the exact formulation of MOND (as modified inertia or some form or other of modified gravity), but the differences are small compared to observational error bars, and even compared with the differences between various μ-functions.

In order to illustrate this, we plot in Figure 20 the theoretical rotation curve of an HSB exponential disk (see [145] for exact parameters) computed with three different formulations of MOND (Footnote 38): Milgrom’s formula (Eq. 7), representative of circular orbits in modified inertia, AQUAL (Eq. 17), and a multi-field theory (Eq. 40) representative of a whole class of relativistic theories (see Sections 7.1 to 7.4), all with the α = n = 1 “simple” μ-function of Eq. 46 and Eq. 49. One can see velocity differences of only a few percent in this case, while, in general, it has been shown that the maximum difference between formulations is on the order of 10% for any type of disk [76]. This justifies using Milgrom’s formula as a proxy for MOND predictions on rotation curves, keeping in mind that, in order to constrain MOND within the modified gravity framework, one should actually calculate predictions of the various modified Poisson formulations of Section 6.1 for each galaxy model, and for each choice of galaxy parameters [18].

Figure 20

Comparison of theoretical rotation curves for the inner parts (before the rotation curve flattens) of an HSB exponential disk [145], computed with three different formulations of MOND. Green: Milgrom’s formula; Blue: Bekenstein-Milgrom MOND (AQUAL); Red: TeVeS-like multi-field theory. Image reproduced by permission from [145], copyright by APS.

The procedure is then the following (see Section 4.3.4 for more detail). One usually assumes that light traces stellar mass (constant mass-to-light ratio, but see the counter-example M33), and one adds to this baryonic density the contribution of observed neutral hydrogen, scaled up to account for the contribution of primordial helium. The Newtonian gravitational force of baryons is then calculated via the Newtonian Poisson equation, and the MOND force is simply obtained via Eq. 7 or Eq. 10. First of all, an interpolating function must be chosen, then one can determine the value of a0 by fitting, all at once, a sample of high-quality rotation curves with small distance uncertainties and no obvious non-circular motions. Then, all individual rotation curve fits can be performed with the mass-to-light ratio of the disk as the single free parameter of the fit (Footnote 39). It turns out that using the simple interpolating function (α = n = 1, see Eqs. 46 and 49) yields a value of a0 = 1.2 × 10⁻¹⁰ m s⁻², and excellent fits to galaxy rotation curves [166]. However, as already pointed out in Sections 6.3 and 6.4, this interpolating function yields too strong a modification in the solar system, so hereafter we use the γ = δ = 1 interpolating function of Eqs. 52 and 53 (solid blue line on Figure 19), very similar to the simple interpolating function in the intermediate to weak gravity regime.
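The fitting procedure just described can be sketched in a few lines. This is only a minimal illustration under stated assumptions: it takes the ν-form of Milgrom's formula to be g = ν(g_N/a0) g_N, takes the γ = δ = 1 function to be ν(y) = 1/(1 − exp(−√y)), and replaces a proper minimizer with a grid search over the mass-to-light ratio. All function names and the input arrays are hypothetical.

```python
import math

A0 = 3703.0   # a0 = 1.2e-10 m/s^2 expressed in (km/s)^2 per kpc


def nu_simple(y):
    """nu-function equivalent to the simple mu(x) = x / (1 + x)."""
    return 0.5 + math.sqrt(0.25 + 1.0 / y)


def nu_exp(y):
    """Assumed gamma = delta = 1 function: 1 / (1 - exp(-sqrt(y)))."""
    return 1.0 / -math.expm1(-math.sqrt(y))


def mond_velocity(r, v_star, v_gas, ml, nu=nu_exp):
    """MOND circular velocity (km/s) at radius r (kpc).

    v_star is the Newtonian stellar contribution for M/L = 1 and v_gas
    the gas contribution (already scaled up for helium), both in km/s.
    """
    v_newton_sq = ml * v_star**2 + v_gas**2
    g_newton = v_newton_sq / r               # Newtonian acceleration
    g_mond = nu(g_newton / A0) * g_newton    # Milgrom's formula, nu-form
    return math.sqrt(g_mond * r)


def fit_mass_to_light(radii, v_obs, v_err, v_star, v_gas, grid):
    """Return the M/L value on `grid` that minimises chi^2."""
    def chi2(ml):
        return sum(
            ((vo - mond_velocity(r, vs, vg, ml)) / ve) ** 2
            for r, vo, ve, vs, vg in zip(radii, v_obs, v_err, v_star, v_gas)
        )
    return min(grid, key=chi2)
```

With a0 and the interpolating function held fixed, the stellar mass-to-light ratio is the only quantity varied, which is what makes the comparison with the data so constraining.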

Figure 21 shows two examples of detailed MOND fits to rotation curves of Figure 13. The black line represents the Newtonian contribution of stars and gas and the blue line is the MOND fit, the only free parameter being the stellar mass-to-light ratio (Footnote 40). Not only does MOND predict the general trend for LSB and HSB galaxies, it also predicts the observed rotation curves in great detail. This procedure has been carried out for 78 nearby galaxies (all galaxy rotation curves to which the authors have access), and the residuals between the observed and predicted velocities, at every point in all these galaxies (thus about two thousand individual measurements), are plotted in Figure 23. As an illustration of the variety and richness of rotation curves fitted by MOND, as well as of the range of magnitude of the discrepancies covered, we display in Figure 24 fits to rotation curves of extremely massive HSB early-type disk galaxies [402] with Vf up to 400 km/s, and in Figure 25 fits to very low mass LSB galaxies [324] with Vf down to 15 km/s. In the latter, gas-rich, small galaxies, the detailed fits are insensitive to the exact form of the interpolating function (Section 6.2) and to the stellar mass-to-light ratio [168, 324]. We then display in Figure 26 eight fits for representative galaxies from the latest high-resolution THINGS survey [166, 481], and in Figure 27 six fits of yet other LSB galaxies (as these provide strong tests of MOND and depend less on the exact form of the interpolating function than HSB ones) from [120], updated with high resolution Hα data [242, 241]. The overall results for the whole 78 nearby galaxies (Figure 23) are globally very impressive, although there are a few outliers among the 2000 measurements. These are but a few trees outlying from a very clear forest. It is actually only as the quality of the data declines [384] that one begins to notice small disparities.
These are sometimes attributable to external disturbances that invalidate the assumption of equilibrium [403], non-circular motions or bad observational resolution. For targets that are intrinsically difficult to observe, minor problems become more common [120, 448]. These typically have to do with the challenges inherent in combining disparate astronomical data sets (e.g., rotation curves measured independently at optical and radio wavelengths) and constraining the inclinations. A single individual galaxy that can be considered as a bit problematic is NGC 3198 [68, 166], but this could simply be due to a problem with the potentially too high Cepheids-based distance (reddening problem mentioned in [254]). Indeed, the adopted distance plays an important role in the MOND fitting procedure, as the value of the centripetal acceleration \(V_c^2/R\) depends on the distance through the conversion of the observed angular radius in arcsec into the physical radius R in kpc. Note that other galaxies such as NGC 2841 had historically posed problems for MOND but that these have largely gone away with modern data (see [166] and Figure 26).
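The distance dependence mentioned above can be made explicit with a short scaling argument. This is a sketch that assumes baryonic masses are derived from observed fluxes, so that they scale as the square of the adopted distance D; the symbols θ (observed angular radius), \(g_{\rm obs}\) and \(M_b\) are introduced here for illustration only.

```latex
R = D\,\theta, \qquad
g_{\rm obs} = \frac{V_c^2}{R} = \frac{V_c^2}{D\,\theta} \propto D^{-1},
\qquad
g_N \sim \frac{G M_b}{R^2} \propto \frac{D^{2}}{D^{2}} = D^{0}.
```

Under these assumptions, an overestimated distance lowers the observed centripetal acceleration while leaving the Newtonian baryonic acceleration unchanged, which shifts where a galaxy sits relative to a0 and hence changes the fit.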

Figure 21

Examples of detailed MOND rotation curve fits of the HSB and LSB galaxies of Figure 13 (NGC 6946 on the left and NGC 1560 on the right). The black line represents the Newtonian contribution of stars and gas as determined by numerical solution of the Newtonian Poisson equation for the observed light distribution, as per Figure 13. The blue line is the MOND fit with the γ = δ = 1 function of Eq. 52 and Eq. 53, the only free parameter being the stellar mass-to-light ratio. In the K-band, the best fit value is 0.37 M⊙/L⊙ for NGC 6946 and 0.18 M⊙/L⊙ for NGC 1560. In practice, the best fit mass-to-light ratio can co-vary with the distance to the galaxy and a0; here a0 is held fixed (1.2 × 10⁻¹⁰ m s⁻²) and the distance has been held fixed to the best observed value (5.9 Mpc for NGC 6946 [220] and 3.45 Mpc for NGC 1560 [219]). Milgrom’s formula provides an effective mapping between the rotation curve predicted by the observed baryons and the observed rotation, including the bumps and wiggles.

Figure 22

The rotation curve [124] and MOND fit [384] of the Local Group spiral M33 assuming a constant stellar mass-to-light ratio (top panel). While the overall shape is a good match, there is a slight mismatch at ∼ 3 kpc and above 7 kpc. The observed color gradient implies a slight variation in the mass-to-light ratio, in the sense that the stars at small radii are slightly redder and heavier than those at large radii. Applying stellar population models [42] to the observed color gradient produces a slight adjustment of the Newtonian mass model. The dotted line in the lower panel reiterates the constant M/L model from the top panel, while the solid line has been corrected for the observed color gradient. This slight adjustment to the baryonic mass distribution considerably improves the fit.

Figure 23

Residuals of MOND fits to the rotation curves of 78 nearby galaxies (all data to which authors have access) including about two thousand individual resolved measurements. Data for 21 galaxies are either new or improved in terms of spatial resolution and velocity accuracy over those in [401]. More accurate points are illustrated with larger symbols. The histogram of residuals is plotted on the right panel, and is well fitted by a Gaussian of width Δv/v ∼ 0.04. The bulk of the more accurate data are in good accord with MOND. There are a few deviant points, mostly at small radii where non-circular motions are ubiquitous and observational resolution (beam smearing) can be a challenge. These are but a few trees outlying from a very clear forest.

Figure 24

Examples of MOND fits (blue lines, using Eq. 53 with δ = 1) to two massive galaxies [402]. With baryonic masses in excess of 10¹¹ M⊙, these are among the most massive, rapidly rotating disk galaxies known. Stars dominate the mass, and Newtonian dynamics suffices to explain the innermost regions because of the high acceleration, but the mass discrepancy becomes apparent as the Keplerian decline (black lines) falls well below the data at the enormous radii spanned by these giant disks (the diameter of UGC 2487 spans half a million lightyears).

Figure 25

Examples of MOND fits (blue lines) to two dwarf galaxies [324]. The data for DDO 210 come from [29], and those for UGC 11583 (also known as KK98 250) are from [30] augmented with high resolution data from [281, 242]. The high gas content of these galaxies makes them strong tests of MOND, as the one fit parameter (the mass-to-light ratio of the stars) has only a minor impact on the fit. What is more, as they are deep in the MOND regime, the exact form of the interpolating function (Section 6.2) also has little impact on the fits, making them the cleanest tests of MOND, with essentially no wiggle room. Note that, with a mass of only a few million solar masses (comparable in mass to the largest globular clusters), the Local Group dwarf DDO 210 is the smallest galaxy known to show clear rotation (Vf ∼ 15 km/s). It is the lowest point in Figure 3.
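As a quick consistency check on the numbers quoted for DDO 210, the sketch below evaluates the deep-MOND baryonic Tully-Fisher prediction, M_b = Vf⁴/(G a0), for a given asymptotic velocity. The function name is hypothetical, and the value a0 = 1.2 × 10⁻¹⁰ m s⁻² is assumed.

```python
G = 6.674e-11      # gravitational constant in m^3 kg^-1 s^-2
A0 = 1.2e-10       # MOND acceleration scale in m/s^2 (assumed value)
M_SUN = 1.989e30   # solar mass in kg


def btf_mass(v_flat_kms):
    """Baryonic mass in solar masses from M_b = V_f^4 / (G a0)."""
    v = v_flat_kms * 1.0e3          # convert km/s to m/s
    return v**4 / (G * A0) / M_SUN


for v_flat in (15.0, 400.0):
    print(f"Vf = {v_flat:5.0f} km/s -> M_b ~ {btf_mass(v_flat):.1e} Msun")
```

For Vf ∼ 15 km/s this gives a few million solar masses, in line with the mass quoted above for DDO 210.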

Figure 26

MOND rotation curve fits for representative galaxies from the THINGS survey [121, 166, 481]. Galaxies are chosen to illustrate a broad range of mass, from Mb ∼ 3 × 10⁸ M⊙ to ∼ 3 × 10¹¹ M⊙. All galaxies have high resolution interferometric 21 cm data for the gas and 3.6 μm photometry for mapping the stars. The Newtonian baryonic mass model is shown as a black line and the MOND fit as a blue line (as in Figure 21). The fits use the interpolating function of Eq. 53 with δ = 1.

Figure 27

MOND rotation curve fits for LSB galaxies [120] updated with high resolution Hα data [242, 241] and using Eq. 53 with δ = 1. LSB galaxies are important tests of MOND because their low surface densities (Σ ≪ a0/G) place them well into the MOND regime everywhere, and the exact form of the interpolating function is rather unimportant. Their baryonic mass models fall well short of explaining the observed rotation at any but the smallest radii in Newtonian dynamics, and MOND nevertheless provides the necessary additional force everywhere (lines as per Figure 21).

We finally note that what makes all these rotation curve fits really impressive is that either (i) stellar mass-to-light ratios are unimportant (in the case of gas-rich galaxies), yielding excellent fits with essentially zero free parameters (apart from some wiggle room on the distance), or (ii) stellar mass-to-light ratios are important, and their best-fit values, obtained on purely dynamical grounds assuming MOND, vary with galaxy color as one would expect on purely astrophysical grounds from stellar population synthesis models [42]. There is absolutely nothing built into MOND that would require that redder galaxies should have higher stellar mass-to-light ratios in the B-band, but this is what the rotation curve fits require. This is shown on Figure 28, where the best-fit mass-to-light ratio in the B-band is plotted against B − V color index (left panel), and the same for the K-band (right panel).

Figure 28

A comparison of the mass-to-light ratios obtained from MOND rotation curve fits (points) with the independent expectations of stellar population synthesis models (lines) [42]. The mass-to-light ratio in the optical (blue B-band, left) and near-infrared (2.2 µm K-band, right) are shown as a function of B − V color (the ratio of blue to green light). The one free parameter of MOND rotation curve fits reproduces the normalization, slope, and scatter expected from what we know about stars. Not all galaxies illustrated here have both B and K-band data. Some have neither, instead having photometry in some other bandpass (e.g., V or R or I).

6.5.2 The Milky Way

Our own Milky Way galaxy (an HSB galaxy) is a unique laboratory within which present and future surveys will allow us to perform many precision tests of MOND (at a level of precision that might even discriminate between the various versions of MOND described in Section 6.1) that are not feasible with external galaxies. However, concerning the rotation curve, the test is, at present, not the most conclusive, as the outer rotation curve of the Milky Way is paradoxically much less precisely known than that of external galaxies (the forthcoming Gaia mission should allow improvement to this situation, although the rotation curve will not be measured directly). Nevertheless, past studies of the inner rotation curve of the Milky Way [141, 142, 274], measured with the tangent point method, compared to the baryonic content of the inner Galaxy [53, 155], have shown full agreement between the rotation curve and MOND, assuming, as usual, the simple interpolating function (α = n = 1 in Eqs. 46 and 49) or the γ = δ =1 interpolating function (Eqs. 52 and 53). The inverse problem was also tackled, i.e., deriving the surface density of the inner Milky Way disk from its rotation curve (see Figure 29): this exercise [274] led to a derived surface density fully consistent with star count data, and also even reproducing the details of bumps and wiggles in the surface brightness (Renzo’s rule, Section 4.3.4), while being fully consistent with the (somewhat imprecise) constraints on the outer rotation curve of the galaxy [494].

Figure 29

The mass distribution of the Milky Way disk (left) inferred from fitting in MOND the observed bumps and wiggles in the rotation curve of the galaxy (right) [274]. The Newtonian contributions of the stellar and gas disk are shown as dashed and dotted lines as per Figure 13. The resulting model is consistent with independent star count data [155] and compares favorably to constraints on the rotation curve at radii beyond those included in the fit [494]. The prominent feature at R ≈ 6 kpc corresponds to the Centaurus spiral arm.

However, especially with the advent of present and future astrometric and spectroscopic surveys, the Milky Way offers a unique opportunity to test many other predictions of MOND. These include the effect of the “phantom dark disk” (see Figure 18) on vertical velocity dispersions and on the tilt of the stellar velocity ellipsoid, the precise shape of tidal streams around the galaxy, or the effects of the external gravitational field in which the Milky Way is embedded on fundamental parameters such as the local escape speed. However, all these predictions can vary slightly depending on the exact formulation of MOND (mainly Bekenstein-Milgrom MOND, QUMOND, or multi-field theories, the predictions being anyway difficult to make in modified inertia versions of MOND when non-circular orbits are considered). Most of the predictions made until today and reviewed hereafter have been using the Bekenstein-Milgrom version of MOND (Eq. 17).

Based on the baryonic distribution from, e.g., the Besançon model of the Milky Way [366], one can compute the MOND gravitational field of the Galaxy by solving the BM-equation (Eq. 17). This has been done in [490]. Then one can apply the Newtonian Poisson equation to it, in order to recover the density distribution that would have yielded this potential within Newtonian dynamics [50, 140]. In this context, as already shown (Figure 18), MOND predicts a disk of “phantom dark matter” allowing one to clearly differentiate it from a Newtonian model with a dark halo:

  (i) By measuring the force perpendicular to the galactic plane: at the solar radius, MOND predicts a 60 percent enhancement of the dynamic surface density at 1.1 kpc above the plane compared to the baryonic surface density, a value in agreement with current data (Table 1, see also [339]). The enhancement would become more apparent at large galactocentric radii where the stellar disk mass density becomes negligible.

  (ii) By determining dynamically the scale length of the disk mass density distribution. This scale length is a factor ∼ 1.25 larger than the scale length of the visible stellar disk if Bekenstein-Milgrom MOND applies. Such a test could be applied with existing RAVE data [423], but the accuracy of available proper motions still limits the possibility to explore the gravitational forces too far from the solar neighborhood.

  (iii) By measuring the velocity ellipsoid tilt angle within the meridional galactic plane. This tilt is different within the MOND and Newton+dark halo cases in the inner part of the Galactic disk. The tilt of about 6 degrees at z = 1 kpc at the solar radius is in agreement with the recent determination of 7.3 ± 1.8 degrees obtained by [422]. The difference between MOND and a Newtonian model with a spherical halo becomes significant at z = 2 kpc. Interestingly, recent data [328] on the tilt of the velocity ellipsoid at these heights clearly favor the MOND prediction [50].
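The “phantom dark matter” underlying these three tests can be written compactly. The expressions below are a sketch of the definition implied by the procedure above (apply the Newtonian Poisson equation to the MOND potential Φ and subtract the baryons); the symbols \(\rho_{\rm ph}\), \(\rho_{\rm b}\) and \(\Sigma_{\rm dyn}\) are notation introduced here for illustration.

```latex
\rho_{\rm ph} = \frac{\nabla^2 \Phi}{4\pi G} - \rho_{\rm b},
\qquad
\Sigma_{\rm dyn}(z) = \int_{-z}^{z} \left(\rho_{\rm b} + \rho_{\rm ph}\right)\,\mathrm{d}z'.
```

In this notation, test (i) compares \(\Sigma_{\rm dyn}\) at z = 1.1 kpc with the baryonic surface density alone.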

Table 1 Values predicted from the Besançon model of the Milky Way in MOND as seen by a Newtonist (i.e., in terms of phantom dark matter contributions) compared to current observational constraints in the Milky Way, for the local dynamical surface density and the tilt of the stellar velocity ellipsoid [50]. Predictions for a round dark halo without a dark disk are also compatible with the current constraints, though [194, 422]. The tilt at z =2 kpc should be more discriminating.

Such tests of MOND could be applied with the first release of future Gaia data. To fix ideas on the current local constraints, the predictions of the Besançon MOND model are compared with the relevant observations in Table 1. However, let us note that these predictions are extremely dependent on the baryonic content of the model [53, 155, 366], so that testing MOND at the precision available in the Milky Way heavily relies on star counts, stellar population synthesis, census of the gaseous content (including molecular gas), and inhomogeneities in the baryonic distribution (clusters, gas clouds).

Another test of the predictions of MOND for the gravitational potential of the Milky Way is the thickness of the HI layer as a function of position in the disk (see Section 6.5.3): it has been found [378] that Bekenstein-Milgrom MOND and its phantom disk successfully account for the most recent and accurate flaring of the HI layer beyond 17 kpc from the center, but that they slightly underpredict the scale-height in the region between 10 and 15 kpc. This could indicate that the local stellar surface density in this region should be slightly smaller than usually assumed, in order for MOND to predict a less massive phantom disk and hence a thicker HI layer. Another explanation for this discrepancy would rely on non-gravitational phenomena, namely ordered and small-scale magnetic fields and cosmic rays contributing to support the disk.

Yet another test would be the comparison of the observed Sagittarius stream [198, 248] with the predictions made for a disrupting galaxy satellite in the MOND potential of the Milky Way. Basic comparisons of the stream with the orbit of a point mass have shown accordance at the zeroth order [358]. In reality, such an analysis is not straightforward because streams do not delineate orbits, and because of the non-linearity of MOND. However, combining a MOND N-body code with a Bayesian technique [474] in order to efficiently explore the parameter space, it should be possible to rigorously test MOND with such data in the near future, including for external galaxies, which will lead to an exciting battery of new observational tests of MOND.

Finally, a last test of MOND in the Milky Way involves the external field effect of Section 6.3. As explained there, the return to a Newtonian (Eq. 61 or Eq. 63) instead of a logarithmic (Eq. 20) potential at large radii defines the escape speed in MOND. By observationally estimating the escape speed from a system (e.g., the Milky Way escape speed from our local neighborhood), one can estimate the amplitude of the external field in which the system is embedded. With simple analytical arguments, it was found [144] that with an external field of 0.01 a0, the local escape speed at the Sun’s radius was about 550 km/s, exactly as observed (within the observational error range [433]). This was later confirmed by rigorous modeling in the context of Bekenstein-Milgrom MOND and with the Besançon baryonic model of the Milky Way [492]. This value of the external field, 10⁻² × a0, corresponds to the order of magnitude of the gravitational field exerted by Large Scale Structure, estimated from the acceleration endured by the Local Group during a Hubble time in order to attain a peculiar velocity of 600 km/s.
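The order-of-magnitude estimate at the end of this paragraph can be reproduced with a few lines of arithmetic. The sketch assumes a constant acceleration acting over a Hubble time 1/H0, with H0 = 70 km/s/Mpc and a0 = 1.2 × 10⁻¹⁰ m s⁻² as illustrative values; the function name is hypothetical.

```python
KM = 1.0e3          # metres per kilometre
MPC = 3.0857e22     # metres per megaparsec
A0 = 1.2e-10        # MOND acceleration scale in m/s^2 (assumed value)


def external_field_in_a0(v_pec_kms=600.0, h0_kms_mpc=70.0):
    """Mean acceleration needed to reach v_pec in a Hubble time, in units of a0."""
    hubble_time = MPC / (h0_kms_mpc * KM)       # 1/H0 in seconds
    acceleration = v_pec_kms * KM / hubble_time
    return acceleration / A0


print(f"g_ext ~ {external_field_in_a0():.3f} a0")
```

With these inputs the result is of order 10⁻² a0, matching the order of magnitude quoted in the text.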

6.5.3 Disk stability and interacting galaxies

A lot of questions in galaxy dynamics require using N-body codes. This is notably necessary for studying stability of galaxy disks, the formation of bars and spirals, or highly time-varying configurations such as galaxy mergers. As we have seen in Section 6.1.2, the BM modified Poisson equation (Eq. 17) can be solved numerically using various methods [50, 77, 96, 147, 250, 457]. Such a Poisson solver can then be used in particle-mesh N-body codes. More general codes based on QUMOND (Section 6.1.3) are currently under development.

The main results obtained via these simulations are the following (the comparison with observations will be discussed below):

  (i) LSB disks are more unstable regarding bar and spiral instabilities in MOND than in the Newton+spherical halo equivalent case,

  (ii) Bars always tend to appear more quickly in MOND than in the Newton+spherical halo equivalent, and are not slowed down by dynamical friction, leading to fast bars,

  (iii) LSB disks can be both very thin and extended in MOND thanks to the effect of the “phantom disk”, and vertical velocity dispersions level off at 8 km/s, instead of 2 km/s for Newtonian disks,

  (iv) Warps can be created in apparently isolated galaxies from the external field effect of large scale structure in MOND,

  (v) Merging time-scales are longer in MOND for interacting galaxies,

  (vi) Reproducing interacting systems such as the Antennae requires relatively fine-tuned initial conditions in MOND, but the resulting galaxy is more extended and thus closer to observations, thanks to the absence of angular momentum transfer to the dark halo.

Concerning the first point (i), Brada & Milgrom [77] investigated the important problem of stability of disk galaxies. They demonstrated that MOND, as anticipated [299], has an effect similar to a dark halo in stabilizing a rotationally-supported disk, thereby explaining the upper limit in surface density seen in the data (Section 4.3.2), and also showing how it damps the growth-rate of bar-forming modes in the weak gravitational field regime. In a comparison of MOND disks with the equivalent Newtonian+halo counterpart (with identical rotation curves), they found that, as the surface density of the disk decreases, the growth-rate of the bar-forming mode decreases similarly in both cases. However, in the limit of very low surface densities, typical of LSB galaxies, the MOND growth rate stops decreasing, contrary to the Newton+dark halo case (Figure 30). This could provide a solution to the stability challenge of Section 4.2, as observed LSBs do exhibit bars and spirals, which would require an ad hoc dark component within the self-gravitating disk of the Newtonian system. One can also see on this figure that if the surface density is typical of intermediate HSB galaxies, the bar systematically forms quicker in MOND.

Figure 30

The scaled growth-rate of the m = 2 instability in Newtonian disks with a dark halo (dotted line) and MONDian disks (solid line) as a function of disk mass. In the MOND case, as the disk mass decreases, the surface density decreases and the disk sinks deeper into the MOND regime. However, at very low masses the growth-rate saturates. In the equivalent Newtonian case, the rotation curve is maintained at the MOND level by supplementing the force with a round stabilizing dark halo, which causes the growth-rate to crash [77, 401]. An ad-hoc dark disk could help maintain the growth rate in the dark matter context. Image reproduced by permission from [401].

This was confirmed in recent simulations [104, 457], where it was additionally found that (ii) the bar is sustained longer, and is not slowed down by dynamical friction against the dark halo, which leads to fast bars, consistent with the observed fast bars in disk galaxies (measured through the position of resonances). However, when gas inflow and external gas accretion are included, a larger range of situations are met regarding pattern speeds in MOND, all compatible with observations [458]. Since the bar pattern speed has a tendency to stay constant, the resonances remain at the same positions, and particles are trapped on these orbits more easily than in the Newtonian case, which leads to the formation of rings and pseudo-rings as observed (see Figure 31 and Figure 32). All these results have been shown to be independent of the exact choice of interpolating μ-function [458].

Figure 31

(a) The galaxy ESO 509-98. (b) The galaxy NGC 1543. These are two examples of galaxies that exhibit clear ring and pseudo-ring structures. Image courtesy of Tiret, reproduced by permission from [458], copyright by ESO.

Figure 32

Simulations of ESO 509-98 and NGC 1543 in MOND, to be compared with Figure 31. Rings and pseudo-ring structures are well reproduced with modified gravity. Image courtesy of Tiret, reproduced by permission from [458], copyright by ESO.

What is more, (iii) LSB disks can be both very thin and extended in MOND thanks to the stabilizing effect of the “phantom disk”, and vertical velocity dispersions level off at 8 km/s, as typically observed [25, 241], instead of 2 km/s for Newtonian disks with Σ = 1 M⊙ pc⁻² (depending on the thickness of the disk). However, the observed value is usually attributed to non-gravitational phenomena. Note that [279] utilized this fact to predict that conventional analyses of LSB disks would infer abnormally high mass-to-light ratios for their stellar populations, a prediction that was subsequently confirmed [159, 371]. But let us also note that this stabilizing effect of the phantom disk, leading to very thin stellar and gaseous layers, could even be too strong in the region between 10 and 15 kpc from the galactic center in the Milky Way (see Section 6.5.2), and in external galaxies [497], even though, as said, non-gravitational effects such as ordered and small-scale magnetic fields and cosmic rays could significantly contribute to the prediction in these regions.

Via these simulations, it has also been shown (iv) that the external field effect of MOND (Section 6.3) offers a mechanism other than the relatively weak effect of tides in inducing and maintaining warps [79]. It was demonstrated that a satellite at the position and with the mass of the Magellanic clouds can produce a warp in the plane of the galaxy with the right amplitude and form [79], and even more importantly, that isolated galaxies could be affected by the external field of large scale structure, inducing a differential precession over the disk, in turn causing a warp [104]. This could provide a new explanation for the puzzle of isolated warped galaxies.

Interactions and mergers of galaxies are (v) very important in the cosmological context of galaxy formation (see also Section 9.2). It has been found [95] from analytical arguments that dynamical friction should be much more efficient in MOND, for instance for bar slowing down or mergers occurring more quickly. But simulations display exactly the opposite effect, in the sense of bars not slowing down and merger time-scales being much larger in MOND [338, 459]. Concerning bars, Nipoti [335] found that they were indeed slowed down more in MOND, as predicted analytically [95], but this is because their bars were unrealistically small compared to observed ones. In reality, the bar takes up a significant fraction of the baryonic mass, and the reservoir of particles to interact with, assumed infinite in the case of the analytic treatment [95], is in reality insufficient to affect the bar pattern speed in MOND. Concerning long merging time-scales, an important constraint from this would be that, in a MONDian cosmology, there should perhaps be fewer mergers, but longer ones than in ΛCDM, in order to keep the total observed amount of interacting galaxies unchanged. This is indeed what is expected (see Section 9.2). What is more, the long merging time-scales would imply that compact galaxy groups do not evolve statistically over more than a crossing time. In contrast, in the Newtonian+dark halo case, the merging time scale would be about one crossing time because of dynamical friction, such that compact galaxy groups ought to undergo significant merging over a crossing time, contrary to what is observed [239]. Let us also note that, in MOND, many passages in binary galaxies will happen before the final merging, with a starburst triggered at each passage, meaning that the number of observed starbursts as a function of redshift cannot be used as an estimate of the number of mergers [104].

Finally, (vi) at a more detailed level, the Antennae system, the prototype of a major merger, has been shown to be nicely reproducible in MOND [459]. This is illustrated in Figure 33. On the contrary, while it is well established that CDM models can result in nice tidal tails, it turns out to be difficult to simultaneously match the narrow morphology of many observed tidal tails with rotation curves of the systems from which they come [130]. In MOND, reproducing the Antennae requires relatively fine-tuned initial conditions, but the resulting tidal tails are narrow and the galaxy is more extended and thus closer to observations than with CDM, thanks to the absence of angular momentum transfer to the dark halo (solution to the angular momentum challenge of Section 4.2).

Figure 33

Simulation of the Antennae with MOND (right, [459]) compared to the observations (left, [190]). In the observations, the gas is represented in blue and the stars in green. In the simulation the gas is in blue and the stars are in yellow/red. Image courtesy of Tiret, reproduced by permission from [459], copyright by ASP.

6.5.4 Tidal dwarf galaxies

As seen in, e.g., Figure 33, left panel, major mergers between spiral galaxies are frequently observed with dwarf galaxies at the extremity of their tidal tails, called Tidal Dwarf Galaxies (TDG). These young objects are formed through gravitational instabilities within the tidal tails, leading to local collapse of gas and star formation. These objects are very common in interacting systems: in some cases dozens of such condensations are seen in the tidal tails, with a few having a mass typical of other dwarf galaxies in the Universe. However, in the ΛCDM model, these objects are difficult to form, and require very extended dark matter distributions [71]. In MOND simulations [459, 104], the exchange of angular momentum occurs within the disks, whose sizes are inflated. For this reason, it is much easier with MOND to form TDGs in extended tidal tails.

What is more, in the ΛCDM context, these objects are not expected to drag CDM around them, the reason being that these objects are formed out of the material in the tidal tails, itself made of the dynamically cold, rotating, material in the progenitor disk galaxies. In these disks, the local ratio of dark matter to baryons is close to zero. For this reason, the ΛCDM prediction is that these objects should not exhibit a mass discrepancy problem. However, the first ever measurement of the rotation curve of three TDGs in the NGC 5291 ring system (Figure 34) has revealed the presence of dark matter in these three objects [72]. A solution to explain this in the standard picture could then be to resort to dark baryons in the form of cold molecular gas in the disks of the progenitor galaxies. However, it is very surprising that a very different kind of dark matter, in this case baryonic dark matter, would conspire to assemble itself precisely in the right way such as to put the three TDGs (see Section 4.3.1) on the baryonic Tully-Fisher relation (when this baryonic dark matter is not taken into account in the baryonic budget of the BTF). Another possibility, not resorting to baryonic dark matter, would be that, by chance, the three TDGs have been observed precisely edge-on. However, if we simply consider the most natural inclination coming from the geometry of the ring (i = 45°, see [72]), and apply Milgrom’s formula to the visible matter distribution with zero free parameters [165, 309], one gets very reasonable curves (Figure 35). Playing around a little bit with the inclinations allows perfect fits to these rotation curves [165], while the influence of the external field effect has been shown not to significantly change the result. Therefore, we can conclude that ΛCDM has severe problems with these objects, while MOND does exceedingly well in explaining their observed rotation curves.

Figure 34

The NGC 5291 system [72]. VLA atomic hydrogen 21-cm map (blue) superimposed on an optical image (white). The UV emission observed by GALEX (red) traces dense star-forming concentrations. The most massive of these objects are rotating with the projected spin axis as indicated by dashed arrows. The three most massive ones are denoted as NGC5291N, NGC5291S, and NGC5291W. Image courtesy of Bournaud, reproduced by permission from [72].

Figure 35

Rotation curves of the three TDGs in the NGC 5291 system. In red: ΛCDM prediction (with no additional cold molecular gas), with the associated uncertainties. In black: MOND prediction with the associated uncertainties (prediction with zero free parameter, “simple” μ-function assumed). Image reproduced by permission from [165], copyright by ESO.

However, the observations of only three TDGs are, of course, not enough, from a statistical point of view, in order for this result to be as robust as needed. Many other TDGs should be observed to randomize the uncertainties, and consolidate (or invalidate) this potentially extremely important result, that could allow one to really discriminate between Milgrom’s law being either a consequence of some fundamental aspect of gravity (or of the nature of dark matter), or simply a mere recipe for how CDM organizes itself inside spiral galaxies. As a summary, since the internal dynamics of tidal dwarfs should not be affected by CDM, they cannot obey Milgrom’s law for a statistically-significant sample of TDGs if Milgrom’s law is only linked to the way CDM assembles itself in galaxies. Thus, observations of the internal dynamics of TDGs should be one of the observational priorities of the coming years in order to settle this debate.

Finally, let us note that it has been suggested [239], as a possible solution to the satellites phase-space correlation problem of Section 4.2, that most dwarf satellites of the Milky Way could have been formed tidally, thereby being old tidal dwarf galaxies. They would then naturally appear in closely related planes, explaining the observed disk-of-satellites. While this scenario would lead to a missing satellites catastrophe in ΛCDM (see Section 4.2), it could actually make sense in a MONDian Universe (see Section 9.2).

6.6 MOND in pressure-supported stellar systems

We have already outlined (Section 5.2) how Milgrom’s formula accounts for general scaling relations of pressure-supported systems such as the Faber-Jackson relation (Figure 7 and see [395]), and that isothermal systems have a finite mass in MOND with the density at large radii falling approximately as \(r^{-4}\) [296]. Note also that, in order to match the observed fundamental plane, MOND models must actually deviate somewhat from being strictly isothermal and isotropic: a radial orbit anisotropy in the outer regions is needed [388, 86]. Here we concentrate on slightly more detailed predictions and scaling relations. In general, these detailed predictions are less obvious to make than in rotationally-supported systems, precisely because of the new degree of freedom introduced by the anisotropy of the velocity distribution, very difficult to constrain observationally (as higher-order moments than the velocity dispersions would be needed to constrain it). As we shall see, the successes of MOND are in general a bit less impressive in pressure-supported systems than in rotationally-supported ones, and even in some cases really problematic (e.g., in the case of galaxy clusters, see Section 6.6.4). Whether this is due to the fact that predictions are less obvious to make, or whether this truly reflects a breakdown of Milgrom’s formula for these objects (or the fact that certain theoretical versions of MOND would explicitly deviate from Milgrom’s formula in pressure-supported systems, see Section 6.1.1) remains unclear.

6.6.1 Elliptical galaxies

Luminous elliptical galaxies are dense bodies of old stars with very little gas and typically large internal accelerations. The age of the stellar populations suggests they formed early and all the gas has been used to form stars. To form early, one might expect the presence of a massive dark-matter halo, but the study of, e.g., [367] showed that actually, there is very little evidence for dark matter within the effective radius, and even several effective radii, in ellipticals. On the other hand, these are very-HSB objects and would thus not be expected to show a large mass discrepancy within the bright optical object in MOND. And indeed, the results of [367] were shown to be in perfect agreement with MOND predictions, assuming very reasonable anisotropy profiles [323]. On the theoretical side, it was also importantly shown that triaxial elliptical galaxies can be reproduced using the Schwarzschild orbit superposition technique [482], and that these models are stable [493]Footnote 41.

Interestingly, some observational studies circumvented the mass-anisotropy degeneracy by constructing non-parametric models of observed elliptical galaxies, from which equivalent circular velocity curves, radial profiles of mass-to-light ratio, and anisotropy profiles, as well as high-order moments, could be computed [171]. Thanks to these studies, it was, e.g., shown [171] that, although not much dark matter is needed, the equivalent circular velocity curves (see also [484] where the rotation curve could be measured directly) tend to become flat at much larger accelerations than in thin exponential disk galaxies. This would seem to contradict the MOND prescription, for which flat circular velocities typically occur well below the acceleration threshold a0, but not at accelerations on the order of a few times a0 as in ellipticals. However, as shown in [363], if one assumes the simple interpolating function (α = n = 1 in Eq. 46 and Eq. 49), known to yield excellent fits to spiral galaxy rotation curves (see Section 6.5.1), one finds that MONDian galaxies exhibit a flattening of their circular velocity curve at high accelerations if they can be described by a Jaffe profile [208] in the region where the circular velocity is constant. Since this flattening at high accelerations is not possible for exponential profiles, it is remarkable that such flattenings of circular velocity curves at high accelerations are only observed in elliptical galaxies. What is more, [171], as well as [454], derived from their models scaling relations for the configuration space and phase-space densities of dark matter in ellipticals, and these DM scaling relations have been shown [363] to be in very good agreement with the MOND predictions on “phantom DM” (Eq. 33) scaling relations. This is displayed on Figure 36. 
Of course, some of these galaxies are residing in clusters, and the external field effect (see Section 6.3) could modify the predictions, but this was shown to be negligible for most of the analyzed sample, because the galaxies are far away from the cluster center [363]. Note that when closer to the center of galaxy clusters, interesting behaviors such as lopsidedness caused by the external field effect could allow new tests of MOND in the near future [491]. However, this would require modelling both the orbit of the galaxy in the cluster to take into account time-variations of the external field, as well as a precise estimate of the external field from the cluster itself, which can be tricky as the whole cluster should be modelled at once due to the non-linearity of MOND [113, 259].
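The flattening argument can be illustrated numerically. The sketch below assumes the simple interpolating function in its μ form, μ(x) = x/(1 + x), for which Milgrom's formula g μ(g/a0) = g_N inverts in closed form, and a Jaffe profile with enclosed mass M(r) = M r/(r + r_j). The numerical values of G and a0, the function names, and the example mass and scale radius are illustrative assumptions, not values taken from [363].

```python
import math

G = 4.30091e-6   # gravitational constant in kpc (km/s)^2 / Msun
A0 = 3.7e3       # assumed a0 in (km/s)^2 / kpc (roughly 1.2e-10 m/s^2)


def mond_acceleration(g_newton, a0=A0):
    """Solve g * mu(g/a0) = g_N for the simple function mu(x) = x / (1 + x).

    This is the positive root of g^2 - g_N * g - g_N * a0 = 0.
    """
    return 0.5 * g_newton + math.sqrt(0.25 * g_newton ** 2 + g_newton * a0)


def jaffe_enclosed_mass(r, m_total, r_j):
    """Mass enclosed within radius r for a Jaffe profile of scale radius r_j."""
    return m_total * r / (r + r_j)


def newtonian_acceleration(r, m_total, r_j):
    """Newtonian acceleration generated by the Jaffe profile at radius r."""
    return G * jaffe_enclosed_mass(r, m_total, r_j) / r ** 2


def circular_velocity(r, m_total, r_j, a0=A0):
    """MOND circular velocity (km/s) at radius r (kpc)."""
    g = mond_acceleration(newtonian_acceleration(r, m_total, r_j), a0)
    return math.sqrt(r * g)


if __name__ == "__main__":
    # Hypothetical elliptical: 2e11 Msun of baryons, Jaffe radius 20 kpc.
    m_total, r_j = 2.0e11, 20.0
    for r in (1.0, 2.0, 4.0, 8.0):
        g_n = newtonian_acceleration(r, m_total, r_j)
        v_newton = math.sqrt(r * g_n)
        v_mond = circular_velocity(r, m_total, r_j)
        print(f"r = {r:4.1f} kpc  g_N/a0 = {g_n / A0:5.2f}  "
              f"v_Newton = {v_newton:6.1f}  v_MOND = {v_mond:6.1f} km/s")
```

With these hypothetical parameters the Newtonian accelerations at 1 to 8 kpc are of order one to ten times a0, and the MOND circular velocity varies only slowly over that range while the Newtonian one declines.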

Figure 36

MOND phantom dark matter scaling relations in ellipticals. The circles display central density ρ0, and central phase space density f of the phantom dark halos predicted by MOND for different masses of baryonic Hernquist profiles (with scale-radius rh related to the effective radius by Reff = 1.815 rh). The dotted lines are the scaling relations of [171], and the dashed lines those of [454], which exhibit a very large observational scatter in good agreement with the MOND prediction [363]. Image reproduced by permission from [363], copyright by ESO.

At a more detailed level, precise full line-of-sight velocity dispersion profiles of individual ellipticals, typically measured with tracers such as PNe or globular-cluster populations, have been reproduced by solving Jeans equation in spherical symmetry:

$${{d{\sigma ^2}} \over {dr}} + {\sigma ^2}{{(2\beta + \alpha)} \over r} = - g(r),$$
(65)

where σ is the radial velocity dispersion, α = d ln ρ/d ln r is the slope of the tracer density ρ, and \(\beta = 1 - (\sigma _\theta ^2 + \sigma _\phi ^2)/2{\sigma ^2}\) is the velocity anisotropy. Note that on the left-hand side, one uses the density and the velocity dispersion of the tracers only, which can be different from the density producing the gravity on the right-hand side, if a specific population of tracers such as globular clusters is used. When the global kinematics of a galaxy is analyzed, we do expect in MOND that the gravity on the right-hand side of Eq. 65 is generated by the observed mass distribution, so both should be fit simultaneously: Figure 37 (provided by [399]) shows an example. In general, it was found that field galaxies all fit very naturally with MOND [461, 410] (see also [484]). On the other hand, the MOND modification has been found to slightly underpredict the velocity dispersions in large elliptical galaxies at the very center of galaxy clusters [364], which is just the small-scale equivalent of the problem of MOND in clusters, pointing towards missing baryons (see Section 6.6.4).
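As a minimal sketch of how Eq. 65 can be solved in practice, the code below integrates it inward in ln r with a fourth-order Runge-Kutta scheme. The function name, the outer boundary condition σ² = 0 at the outermost radius, and the test case are assumptions made for illustration; in an actual MOND fit, g(r) would be the MOND acceleration generated by the observed baryons. The test case uses g = v0²/r (a flat rotation curve), a power-law tracer with α = −γ, and constant β, for which Eq. 65 has the constant solution σ² = v0²/(γ − 2β).

```python
import math


def solve_jeans(g, alpha, beta, r_out, r_in, n_steps=2000):
    """Integrate the spherical Jeans equation (Eq. 65) from r_out to r_in.

    g, alpha and beta are callables of radius r giving the gravitational
    acceleration, the logarithmic slope of the tracer density, and the
    velocity anisotropy. The radial dispersion squared is assumed to
    vanish at r_out. Returns a list of (r, sigma_r^2) pairs.
    """
    def rhs(u, s):
        # With u = ln r, Eq. 65 becomes ds/du = -r g(r) - s (2 beta + alpha).
        r = math.exp(u)
        return -r * g(r) - s * (2.0 * beta(r) + alpha(r))

    u = math.log(r_out)
    h = (math.log(r_in) - u) / n_steps   # negative step: inward integration
    s = 0.0                              # assumed outer boundary condition
    profile = [(r_out, s)]
    for _ in range(n_steps):
        k1 = rhs(u, s)
        k2 = rhs(u + 0.5 * h, s + 0.5 * h * k1)
        k3 = rhs(u + 0.5 * h, s + 0.5 * h * k2)
        k4 = rhs(u + h, s + h * k3)
        s += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        u += h
        profile.append((math.exp(u), s))
    return profile


if __name__ == "__main__":
    v0, gamma = 200.0, 3.0   # hypothetical flat velocity (km/s) and tracer slope
    profile = solve_jeans(
        g=lambda r: v0 ** 2 / r,
        alpha=lambda r: -gamma,
        beta=lambda r: 0.0,
        r_out=1000.0,
        r_in=10.0,
    )
    r_inner, sigma2 = profile[-1]
    print(f"sigma_r at r = {r_inner:.1f}: {math.sqrt(sigma2):.2f} km/s "
          f"(analytic: {v0 / math.sqrt(gamma):.2f} km/s)")
```

Integrating inward is a deliberate choice: the homogeneous solution scales as a positive power of r in this test case, so any error introduced by the outer boundary condition decays toward small radii.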

Figure 37

The surface brightness (a) and velocity dispersion (b) profiles of the elliptical galaxy NGC 7507 [375] fitted by MOND (lines [399]). Elliptical galaxies can be approximated in MOND as high-order polytropes with some radial orbit anisotropy [388]. This particular case has a polytropic index of 14 with anisotropy of the Osipkov-Merritt form with an anisotropy radius of 5 kpc and maximum anisotropy β = 0.75 at large radii [399]. The stellar mass-to-light ratio is \(\Upsilon _\ast^B = 3.03{M_ \odot}/{L_ \odot}\). This simple model captures the gross properties of both the surface brightness and velocity dispersion profiles. The galaxy is well-fitted by MOND, contrary to the claim of [375].

On the other hand, [225] used satellite galaxies of ellipticals to test MOND at distances of several hundred kpc. They used the stacked SDSS satellites to generate a pair of mock galaxy groups with reasonably precise line-of-sight velocity dispersions as a function of radius across the group. When these systems were first analysed by [225], they claimed that MOND was excluded by 10σ, but this was only for models that had constant velocity anisotropy. It was then found [14] that with varying anisotropy profiles similar to those found in simulations of the formation of ellipticals by dissipationless collapse in MOND [337], excellent fits to the line-of-sight velocity dispersions of both mock galaxies could be found. This can be taken as strong evidence that MOND describes the dynamics in the surroundings of relatively isolated ellipticals very well.

Finally, let us note an intriguing possibility in a MONDian universe (see also Section 9.2). While massive ellipticals would form at z ≈ 10 [393] from monolithic dissipationless collapse [337], dwarf ellipticals could be more difficult to form. A possibility to form those would then be that tidal dwarf galaxies would be formed and survive more easily (see Section 6.5.4) in major mergers, and could then evolve to lead to the population of dwarf ellipticals seen today, thereby providing a natural explanation for the observed density-morphology relation [239] (more dwarf ellipticals in denser environments).

6.6.2 Dwarf spheroidal galaxies

Dwarf spheroidal (dSph) satellites of the Milky Way [427, 477] exhibit some of the largest mass discrepancies observed in the universe. In this sense, they are extremely interesting objects in which to test MOND. Observationally, let us note that there are essentially two classes of objects in the galactic stellar halo: globular clusters (see Section 6.6.3) and dSph galaxies. These overlap in baryonic mass, but not in surface brightness, nor in age or uniformity of the stellar populations. The globular clusters are generally composed of old stellar populations, they are HSB objects and mostly exhibit no mass discrepancy problem, as expected for HSB objects in MOND. The dSphs, on the contrary, generally contain slightly younger stellar populations covering a range of ages, they are extreme LSB objects and exhibit, as said before, an extreme mass discrepancy, as generically expected from MOND. So, contrary to the case of ΛCDM where different formation scenarios have to be invoked (see Section 6.6.3), the different mass discrepancies in these objects find a natural explanation in MOND.

At a more detailed level, MOND should also be able to fit the whole velocity dispersion profiles, and not only give the right ballpark prediction. This analysis has recently been possible for the eight “classical” dSph around the Milky Way [477]. Solving the Jeans equation (Eq. 65), it was found [8] that the four most massive and distant dwarf galaxies (Fornax, Sculptor, Leo I and Leo II) have typical stellar mass-to-light ratios, exactly within the expected range. Assuming equilibrium, two of the other four (smallest and most nearby) dSphs have mass-to-light ratios that are a bit higher than expected (Carina and Ursa Minor), and two have very high ones (Sextans and Draco). For all these dSphs, there is a remarkable correlation between the stellar M/L inferred from MOND and the ages of their stellar populations [189]. Concerning the high inferred stellar M/L, note that it has been shown [78] that a dSph will begin to suffer tidal disruption at distances from the Milky Way that are 4–7 times larger in MOND than in CDM; Sextans and Draco could thus actually be partly tidally disrupted in MOND. And indeed, after subjecting the five dSphs with published data to an interloper removal algorithm [418], it was found that Sextans was probably littered with unbound stars, which inflated the computed M/L, while Draco’s projected distance-l.o.s. velocity diagram actually looks as out-of-equilibrium as that of Sextans. Ursa Minor, on the other hand, is the typical example of an out-of-equilibrium system, elongated and showing evidence of tidal tails. In the end, only Carina has a suspiciously high M/L (> 4; see [418]).

What is more, there is a possibility that, in a MONDian Universe, dSphs are not primordial objects but have been tidally formed in a major merger (see Section 9.2 as a solution to the phase-space correlation challenge of Section 4.2). In addition to the MOND effect, it would be possible that these objects never really reach a stable equilibrium [237], and exhibit an artificially high M/L ratio. This is even more true for the recently discovered “ultra-faint” dwarf spheroidals, which are also, due to their extremely low density, very much prone to tidal heating in MOND. Indeed, at face value, if these ultrafaints are equilibrium objects, their velocity dispersions are much too high compared to what MOND predicts, and rule out MOND straightforwardly. However, unless this is due to systematic errors linked with the smallness of the velocity dispersion to measure (one must distinguish between \(\sigma \approx 2\,{\rm km\,s^{-1}}\) and \(\sigma \approx 5\,{\rm km\,s^{-1}}\)), and/or to high intrinsic stellar M/L ratios related to stochastic effects linked with the small number of stars [186], it was also found [285] that these objects are all close to filling their MONDian tidal radii, and that their stars can complete only a few orbits for every orbit of the satellite itself around the Milky Way (see Figure 38). As Brada & Milgrom [78] have shown, it then comes as no surprise that they are displaying out-of-equilibrium dynamics in MOND (and even more so in the case of a tidal formation scenario [237]).
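To give a sense of the numbers involved, the sketch below uses the standard deep-MOND result for an isolated, isotropic system in equilibrium, σ⁴ = (4/81) G M a0, with σ the line-of-sight velocity dispersion. This relation is not written out in this section and is assumed here; it ignores the external field of the Milky Way and any departure from equilibrium, both of which matter for real satellites. The value of a0, the function name, and the example stellar masses are illustrative assumptions.

```python
G = 4.30091e-6   # gravitational constant in kpc (km/s)^2 / Msun
A0 = 3.7e3       # assumed a0 in (km/s)^2 / kpc (roughly 1.2e-10 m/s^2)


def deep_mond_sigma(mass_msun, a0=A0):
    """Line-of-sight velocity dispersion (km/s) of an isolated, isotropic
    system deep in the MOND regime: sigma^4 = (4/81) G M a0."""
    return (4.0 / 81.0 * G * mass_msun * a0) ** 0.25


if __name__ == "__main__":
    # Hypothetical stellar masses spanning ultrafaint to classical dwarfs.
    for mass in (1.0e3, 1.0e4, 1.0e6, 2.0e7):
        print(f"M = {mass:8.1e} Msun -> sigma = {deep_mond_sigma(mass):5.2f} km/s")
```

Under these assumptions, systems of a few thousand solar masses are predicted to have dispersions of order 1 to 2 km s⁻¹, which is why telling 2 km s⁻¹ from 5 km s⁻¹ observationally is decisive for the ultrafaint dwarfs.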

Figure 38

The characteristic acceleration, in units of a0, in the smallest galaxies known: the dwarf satellites of the Milky Way (orange squares) and M31 (pink squares) [285]. The classical dwarfs, with thousands of velocity measurements of individual stars [477], are largely consistent with MOND. The more recently discovered “ultrafaint” dwarfs, tiny systems with only a handful of stars [427], typically are not, in the sense that their measured velocity dispersions and accelerations are too high. This could be due to systematic uncertainties in the data [230], as we must distinguish between \(\sigma \approx 2\,{\rm km\,s^{-1}}\) and \(\sigma \approx 5\,{\rm km\,s^{-1}}\). Nevertheless, there may be a good physical reason for the non-compliance of the ultrafaint galaxies in the context of MOND. The deviation of these objects only occurs in systems where the stars are close to filling their MONDian tidal radii: the left panel shows the half light radius relative to the tidal radius. Such systems may not be in equilibrium. Brada & Milgrom [78] note that systems will no longer respond adiabatically to the influence of their host galaxy when a star in a satellite galaxy can complete only a few orbits for every orbit the satellite makes about its host. The deviant dwarfs are in this regime (right panel).

6.6.3 Star clusters

Star clusters come in two types: open clusters and globular clusters. Most observed open clusters are in the inner parts of the Milky Way disk, and for that reason, the prediction of MOND is that their internal dynamics is Newtonian [293] with, perhaps, a slightly renormalized gravitational constant and slightly squashed isopotentials, due to the external field effect (Section 6.3). Therefore, the possibility of distinguishing Newtonian dynamics from MOND in these objects would require extreme precision. On the other hand, globular clusters are mostly HSB halo objects (see Section 6.6.2), and are consequently predicted to be Newtonian, and most of those that are fluffy enough to display MONDian behavior are close enough to the Galactic disk to be affected by the external field effect (Section 6.3), and so are Newtonian, too. Interestingly, MOND thus provides a natural explanation for the dichotomy between dwarf spheroidals and globular clusters. In ΛCDM, this dichotomy is rather explained by the formation history [235, 397]: globular clusters are supposedly formed in primordial disk-bound supermassive molecular clouds with high baryon-to-dark matter ratio, and later become more spheroidal due to subsequent mergers. In MOND, it is, of course, not implied that the two classes of objects have necessarily the same formation history, but the different dynamics are qualitatively explained by MOND itself, not by the different formation scenarios.

However, there exist a few globular clusters (roughly, fewer than ∼ 10 compared to the total number of ∼ 150) both fluffy enough to display typical internal accelerations well below a0, and far away enough from the galactic plane to be more or less immune from the external field effect [27, 182, 181, 436]. Thus, these should, in principle, display a MONDian mass discrepancy. They include, e.g., Pal 14 and Pal 3, or the large fluffy globular cluster NGC 2419. Pal 3 is interesting, because it indeed tends to display a larger-than-Newtonian global velocity dispersion, broadly in agreement with the MOND prediction (Baumgardt & Kroupa, private communication). However, it is difficult to draw too strong a conclusion from this (e.g., on excluding Newtonian dynamics), since there are not many stars observed, and one or two outliers would be sufficient to make the dispersion grow artificially, while a slightly-higher-than-usual mass-to-light ratio could reconcile Newtonian dynamics with the data. Other clusters such as NGC 1851 and NGC 1904 apparently display the same MONDian behavior [408] (see also [187]). On the other hand, Pal 14 displays exactly the opposite behavior: the measured velocity dispersion is Newtonian [212], but again the number of observed stars is too small to draw a statistically significant conclusion [164], and it is still possible to reconcile the data with MOND assuming a slightly low stellar mass-to-light ratio [437]. Note that if the cluster is on a highly eccentric orbit, the external gravitational field could vary very rapidly both in amplitude and direction, and it is possible that the cluster could take some time to accommodate this, still displaying a Newtonian signature in its kinematics after a sudden decrease of the external field.

NGC 2419 is an interesting case, because it allows not only for a measure of the global velocity dispersion, but also of the detailed velocity dispersion profile [199]. And, again, like in the case of Pal 14 (but contrary to Pal 3), it displays Newtonian behavior. More precisely, it was found, solving the Jeans equation (Eq. 65), that the best MOND fit, although not extremely bad in itself, was 350 times less likely than the best Newtonian fit without DM [199, 200]. However, the stability [336] of this best MOND fit has not been checked in detail. These results are heavily debated as they rely on the small quoted measurement errors on the surface density, and even a slight rotation of only the outer parts of this system near the plane of the sky (which would not show up in the velocity data) would make a considerable difference in the right direction for MOND [398]. Nevertheless, these observations, together with the results on Pal 14, although not ruling out any theory, are not a resounding success for MOND. They could perhaps indicate that globular clusters are generically on highly eccentric orbits, and out of equilibrium due to this (although the effect would have to be opposite to that prevailing in ultra-faint dwarfs, where the departure from equilibrium would boost the velocity dispersion instead of decreasing it). A stronger view on these results could indicate that MOND as formulated today is an incomplete paradigm (see, e.g., Eq. 27), or that MOND is an effect due to the fundamental nature of the DM fluid in galaxies (see Sections 7.6 and 7.9), which is absent from globular clusters. Concerning NGC 2419, it is perhaps useful to remind oneself that it is very plausibly not a globular cluster. It is part of the Virgo stream and is thus most probably the remaining nucleus of a disrupting satellite galaxy in the halo of the Milky Way, on a generically-highly-eccentric orbit.
Detailed N-body simulations of such an event, and of the internal dynamics of the remaining nucleus, would thus be the key to confront MOND with observations in this object. All in all, the situation regarding MOND and the internal dynamics of globular clusters remains unclear.

On the other hand, it has been noted that MOND seems to overpredict the Roche lobe volume of globular clusters [499, 500, 512]. Again, the fact that globular clusters could generically be on highly eccentric orbits could come to the rescue here. What is more, it was shown that, in MOND, globular clusters can have a cutoff radius, which is unrelated to the tidal radius when non-isothermal [397]. In general, the cutoff radii of dwarf spheroidals, which have comparable baryonic masses, are larger than those of the globular clusters, meaning that those may well extend to their tidal radii because of a possibly different formation history than globular clusters.

Finally, a last issue for MOND related to globular clusters [335, 377] is the existence of five such objects surrounding the Fornax dwarf spheroidal galaxy. Indeed, under similar environmental conditions, dynamical friction occurs on significantly shorter timescales in MOND than in standard dynamics [95], which could cause the globular clusters to spiral in and merge within at most 2 Gyr [377]. However, this strongly depends on the orbits of the globular clusters, and, in particular, on their initial radius [10], which can allow the orbits to survive for a Hubble time in MOND.

6.6.4 Galaxy groups and clusters

As pointed out earlier (3rd Kepler-like law of Section 5.2), it is a natural consequence of Milgrom’s law that, at the effective baryonic radius of the system, the typical acceleration \(\sigma^2/R\) is always observed to be on the order of a0, thereby naturally explaining the linear relation between size and temperature for galaxy clusters [327, 392]. However, one of the main predictions of Milgrom’s formula is the baryonic Tully-Fisher relation (circular velocity vs. baryonic mass, Figure 3), and its equivalent for isotropic pressure-supported systems, the Faber-Jackson relation (stellar velocity dispersion vs. baryonic mass, Figure 7), both for their slope and normalization. For systems such as galaxy clusters, where the hot intra-cluster gas is the major baryonic component, this relation can also be translated into a “gas temperature vs. baryonic mass” relation, \(M_b \propto T^2\), plotted on Figure 39, as the line \(\log (M_b/M_\odot) = 2\log (T/{\rm keV}) + 12.9\) (note that this differs slightly from [389] where solar metallicity gas is assumed). Note on this figure that observations are closer to the MOND predicted slope than to the conventional prediction of \(M \propto T^{3/2}\) in ΛCDM, without the need to invoke preheating (a need that may arise as an artifact of the mismatch in slopes).
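The two scaling relations can be compared with a few lines of code. The sketch below evaluates the MOND line log(M_b/M_⊙) = 2 log(T/keV) + 12.9 quoted above and contrasts the slope of 2 with the ΛCDM slope of 3/2; no ΛCDM normalization is given in this section, so only mass ratios are computed for that case. The function names and the example cluster are hypothetical.

```python
import math

MOND_SLOPE = 2.0        # M_b proportional to T^2
MOND_INTERCEPT = 12.9   # log10(M_b / Msun) at T = 1 keV
LCDM_SLOPE = 1.5        # M proportional to T^(3/2)


def mond_log_baryonic_mass(t_kev):
    """log10 of the baryonic mass (Msun) predicted by MOND at temperature T (keV)."""
    return MOND_SLOPE * math.log10(t_kev) + MOND_INTERCEPT


def mass_ratio(t_ratio, slope):
    """Factor by which the mass changes when T changes by t_ratio."""
    return t_ratio ** slope


def residual_mass_factor(observed_log_mass, t_kev):
    """Ratio of the MOND-predicted baryonic mass to the observed one."""
    return 10.0 ** (mond_log_baryonic_mass(t_kev) - observed_log_mass)


if __name__ == "__main__":
    for t in (1.0, 3.0, 10.0):
        print(f"T = {t:4.1f} keV -> log10(M_b/Msun) = {mond_log_baryonic_mass(t):.2f}")
    print("Doubling T: MOND x", mass_ratio(2.0, MOND_SLOPE),
          " LCDM x", round(mass_ratio(2.0, LCDM_SLOPE), 2))
    # Hypothetical cluster with T = 5 keV and observed log10(M_b/Msun) = 14.0
    print("Residual factor:", round(residual_mass_factor(14.0, 5.0), 2))
```

A residual factor above one for a given cluster means that MOND requires more baryonic mass than is observed at that temperature.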

Figure 39

The baryonic mass-X-ray temperature relation for rich clusters (gray triangles [359, 389]) and groups of galaxies (green triangles [12]). The solid line indicates the prediction of MOND: the data are reasonably consistent with the slope (\(M \propto T^2\)), but not with the normalization. This is the residual missing baryon problem in MOND: there should be roughly twice as much mass (on average) as observed. Also shown is the scaling relation expected in ΛCDM (dashed line [137]). This is in better (if not perfect) agreement with the normalization of the data for rich clusters, but not the slope. The difference is sometimes attributed to preheating of the gas [496], which might also occur in MOND.

So, interestingly, the data are still reasonably consistent with the slope predicted by MOND [383], but not with the normalization. There is roughly a factor of two of residual missing mass in these objects [170, 354, 387, 389, 392, 453]. This conclusion, reached from applying the hydrostatic equilibrium equation to the temperature profile of the X-ray emitting gas of these objects, has also been reached for low mass X-ray emitting groups [12]. This is essentially because, contrary to the case of galaxies, there is observationally a need for “Newtonian” missing mass in the central partsFootnote 42 of clusters, where the observed acceleration is usually slightly larger than a0, meaning that the MOND prescription is not enough to explain the observed discrepancy between visible and dynamical mass there. For this reason, the residual missing mass in MOND is essentially concentrated in the central parts of clusters, where the ratio of MOND dynamical mass to observed baryonic mass reaches a value of 10, to then only decrease to a value of roughly 2 in the very outer parts, where almost no residual mass is present. The profile of this residual mass would thus consist of a large constant density core of about 100–200 kpc in size (depending on the size of the group/cluster in question), followed by a sharp cutoff.

The need for this residual missing mass in MOND might be taken in one of the five following ways:

  (i) Practical falsification of MOND,

  (ii) Evidence for missing baryons in the central parts of clusters,

  (iii) Evidence for non-baryonic dark matter (existing or exotic),

  (iv) Evidence that MOND is an incomplete paradigm,

  (v) Evidence for the effect of additional fields in the parent relativistic theories of MOND, not included in Milgrom’s formula.

If (i) is correct, one still needs to explain the success of MOND on galaxy scales with ΛCDM. Such an explanation has yet to be offered. Thus, tempting as case (i) is, it is worth giving a closer inspection to the four other possibilities.

The second case (ii) would be most in line with the elegant absence of need for any non-baryonic mass in MOND (however, see the “dark fields” invoked in Section 7). It has happened before that most of the baryonic mass was in an unobserved component. From the 1930s when Zwicky first discovered the missing mass problem in clusters till the 1980s, it was widely presumed that the stars in the observed galaxies represented the bulk of baryonic mass in clusters. Only after the introduction of MOND (in 1983) did it become widely appreciated that the diffuse X-ray emitting intracluster gas (the ICM) greatly outweighed the stars. That is to say, some of the missing mass problem in clusters was due to optically dark baryons — instead of the enormous mass discrepancies implied by cluster dynamical mass to optical light ratios in excess of 100 [24], the ratio of dark to baryonic mass is only ∼ 8 conventionally [175, 278]. So we should not be too hasty in presuming we now have a complete census of baryons in clusters. Indeed, in the global baryon inventory of the universe, ∼ 30% of the baryons produced during BBN are missing (Figure 40), and presumably reside in some, as yet undetected, (dark) form. It is estimated [160, 421] that the observed baryons in clusters only account for about 4% of those produced during BBN (Figure 40). This is much less than the 30% of baryons that are still missing. Consequently, only a modest fraction of the dark baryons need to reside in clusters to solve the problem of missing mass in the central regions of clusters in MOND. It should be highlighted that this missing mass only appears in MOND for systems with a high abundance of ionised gas and X-ray emission. Indeed, for even smaller galaxy groups, devoid of gas, the MOND predictions for the velocity dispersions of individual galaxies are again perfectly in line with the observations [303, 307]. 
It is then no stretch of the imagination to surmise that these gas-rich systems, where the residual missing-mass problem appears, have equal quantities of molecular hydrogen or other molecules. Milgrom [310] has, e.g., proposed that the missing mass in MOND could entirely be in the form of cold, dense gas clouds. There is an extensive literature discussing searches for cold gas in the cores of galaxy clusters, but what is usually meant there is quite different from what is meant here, since those searches consisted in trying to find the signature of diffuse cold molecular gas at a temperature of ∼ 30 K. The proposition of Milgrom [310] rather relies on the work of Pfenniger & Combes [352], where dense gas clouds with a temperature of only a few Kelvin (∼ 3 K), solar-system size, and of a Jupiter mass, were considered to be possible candidates for both galactic and extragalactic dark matter. These clouds would behave in a collisionless way, just like stars. However, since the dark mass considered in the context of MOND cannot be present in galaxies, it is not subject to the galactic constraints on such gas clouds. Note that the total sky covering factor of such clouds in the core of the clusters would be on the order of only \(10^{-4}\), so that they would only occult a minor fraction of the X-rays emitted by the hot gas (and it would be a rather constant fraction). For the same reason, the chances of a given quasar having light absorbed by them are very small. Still, [310] notes that these clouds could be probed through X-ray flashes coming out of individual collisions between them. Of course, this speculative idea also raises a number of questions, the most serious one being how these clumps form and stabilize, and why they form only in clusters, X-ray emitting groups and some ellipticals at the center of these groups and clusters, but not in individual spiral galaxies. 
As noted above, the fact that missing mass in MOND is necessarily associated with an abundance of ionised gas could be a hint at a formation and stabilization process somehow linked with the presence of hot gas and X-ray emission themselves. Then, there is the issue of knowing whether the cloud formation would be prior to or posterior to the cluster formation. We note that a rather late formation mechanism could help increase the metal abundance, solving the problem of small-scale variations of metallicity in clusters when the clouds are destroyed [330]. Milgrom [310] also noted that these clouds could alleviate the cooling flow conundrum, because whatever destroys them (e.g., cloud-cloud collisions and dynamical friction between the clouds and the hot gas) is conducive to heating the core gas, and thus preventing it from cooling too quickly. Such a heating source would not be transient and would be quite isotropic, contrary to AGN heating.

Figure 40

The baryon budget in the low redshift universe adopted from [421]. The census of baryons includes the detected Warm-Hot Intergalactic Medium (WHIM), the Lyman-α forest, stars in galaxies, detected cold gas in galaxies (atomic HI and molecular H2), other gas associated with galaxies (the Circumgalactic Medium, CGM), and the Intracluster Medium (ICM) of groups and clusters of galaxies. The sum of known baryons falls short of the density of baryons expected from BBN: ∼ 30% are missing. These missing baryons presumably exist in some as yet undetected (i.e., dark) form. If a fraction of these dark baryons reside in clusters (an amount roughly comparable to that in the ICM) it would suffice to explain the residual mass discrepancy problem MOND suffers in galaxy clusters.

Another possibility (iii) would be that this residual missing mass in clusters is in the form of non-baryonic matter. There is one obviously existing form of such matter: neutrinos. If \({m_\nu} \approx \sqrt {\Delta {m^2}}\) [434], then the neutrino mass is too small to be of interest in this context. But there is nothing that prevents it from being larger (note that the “cosmological” constraints from structure formation in the ΛCDM context obviously do not apply in MOND). Actual model-independent experimental limits on the electron neutrino mass from the Mainz/Troitsk experiments, counting the highest energy electrons in the β-decay of Tritium [234] are mν < 2.2 eV. Interestingly, the KATRIN experiment (the KArlsruhe TRItium Neutrino experiment, under construction) will be able to falsify these 2 eV electron neutrinos at 95% confidence. If the neutrino mass is substantially larger than the mass differences, then all types have about the same mass, and the cosmological density of three left-handed neutrinos and their antiparticles [392] would be

$${\Omega _\nu} = 0.062\,{m_\nu},$$
(66)

where mν is the mass of a single neutrino type in eV. If one assumes that clusters of galaxies respect the baryon-neutrino cosmological ratio, and that the MOND missing mass is mostly made of neutrinos as suggested by [389, 392], then the mass of neutrinos must indeed be around 2 eV. Combined with the effect of additional degrees of freedom in relativistic MOND theories (Section 7), it has been shown that the CMB anisotropies could also be reproduced (see Section 9.2 and [430]), while this hot dark matter would obviously free-stream out of spiral galaxies and would thus not perturb the MOND fits of Section 6.5.1. The main limit on the neutrino ability to condense in clusters comes from the Tremaine-Gunn limit [463], stating that the phase space density must be preserved during collapse. This is a density level half the quantum mechanical degeneracy level in phase-space:

$${f_{\max}} = {1 \over 2}\sum\limits_{i = 1}^{i = 6} {{{m_{{\nu _i}}^4} \over {{h^3}}}.}$$
(67)

Converting this into configuration space, the maximum density for a cluster of a given temperature, T, is defined for a given mass of one neutrino type as [463]:

$${{\rho _\nu ^{\max}} \over {7 \times {{10}^{- 5}}{M_ \odot}\;{\rm{p}}{{\rm{c}}^{- 3}}}} = {\left({{T \over {1\;{\rm{keV}}}}} \right)^{1.5}}{\left({{{{m_\nu}} \over {2\;{\rm{eV}}}}} \right)^4}.$$
(68)
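As a quick numerical illustration of Eqs. 66 and 68, the following minimal sketch evaluates the cosmological neutrino density and the Tremaine-Gunn maximum density. The function names and the sample values of the mass and temperature are illustrative assumptions, not quantities taken from the cited works.

```python
def omega_nu(m_nu_ev):
    """Eq. 66: cosmological density of three neutrino types (plus
    antiparticles) sharing the same mass m_nu, given in eV."""
    return 0.062 * m_nu_ev


def rho_nu_max(t_kev, m_nu_ev):
    """Eq. 68: Tremaine-Gunn maximum neutrino density in M_sun / pc^3
    for a cluster of temperature T (keV) and neutrino mass m_nu (eV)."""
    return 7e-5 * t_kev ** 1.5 * (m_nu_ev / 2.0) ** 4


if __name__ == "__main__":
    # Sample values (assumed for illustration only).
    for m in (2.0, 10.0):
        print(f"m_nu = {m} eV: Omega_nu = {omega_nu(m):.3f}")
        for t in (1.0, 4.0):
            print(f"  T = {t} keV: rho_max = {rho_nu_max(t, m):.2e} Msun/pc^3")
```

At fixed temperature the limit scales as \(m_\nu^4\), so raising the mass from 2 eV to 10 eV increases the maximum density by a factor of \(5^4 = 625\).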

Assuming the temperature of the neutrino fluid as being equal (due to violent relaxation) to the mean emission weighted temperature of the gas, Sanders [389] showed that such 2 eV neutrinos at the limit of experimental detection could indeed account for the bulk of the dynamical mass in his sample of galaxy clusters of T > 4 keV (see also Section 8.3 for gravitational lensing constraints). This has the great advantage of naturally reproducing the proportionality of the electron density in the cores of clusters to \(T^{3/2}\), as observed in [392]. However, looking at the central region of low-temperature X-ray emitting galaxy groups, it was found [12] that the needed central density of missing mass far exceeded this limit by a factor of several hundred. One would need one neutrino species with m ∼ 10 eV to reach the required densities. One exotic possibility is then the idea of right-handed eV-scale sterile neutrinos [13]: as strange as this sounds, this mass for sterile neutrinos could also provide a good fit to the CMB acoustic peaks (see Section 9.2). This could indeed sound like the strangest and most complicated universe possible, combining true non-baryonic (hot) dark matter with a modification of gravity, but if this is what it takes to simultaneously explain the Kepler-like laws of galactic dynamics and the extragalactic evidence for dark matter, it is useful to remember both that there are good reasons for there being more particles than those of the standard model of particle physics and that there is no reason that general relativity should be valid over a wide range of scales where it has never been tested. 
In any case, experiments that can address the existence of such a ∼ 10 eV-scale sterile neutrino would thus be very interesting, as this kind of particle could provide the dark matter candidate only in a modified gravity framework, since such a hot dark matter particle would be unable to form small structures and to provide the dark matter that would be needed in galaxies.

Yet another possibility (iv) would be that MOND is incomplete, and that a new scale should be introduced, in order to effectively enhance the value of a0 in galaxy clusters, while lowering it to its preferred value in galaxies. There are several ways to implement such an idea. For instance, Bekenstein [36] proposed adding a second scale in order to allow for effective variations of the acceleration constant as a function of the deepness of the potential (Eq. 27). This idea should be investigated more in the future, but it is not clear that such a simple rescaling of a0 would account for the exact spatial distribution of the residual missing mass in MOND clusters, especially in cases where it is displaced from the baryonic distribution (see Section 8.3). However, as even Gauss’ theorem would not be valid anymore in spherical symmetry, the high non-linearity might provide non-intuitive results, and it would thus clearly be worth investigating this suggestion in more detail, as well as developing similar ideas with other additional scales in the future (such as, for instance, the baryonic matter density; see [82, 143] and Section 7.6).

Finally, as we shall see in Section 7, parent relativistic theories of MOND often require additional degrees of freedom in the form of “dark fields”, which can nevertheless be globally subdominant to the baryon density, and thus do not necessarily act precisely as true “dark matter”. Thus, the last possibility (v) is that these fields, which are obviously not included in Milgrom’s formula, are responsible for the cluster missing mass in MOND. Examples of such fields are the vector fields of TeVeS (Section 7.4) and Generalized Einstein-Aether theories (Section 7.7). It has been shown (see Section 9.2) that the growth of the spatial part of the vector perturbation in the course of cosmological evolution can successfully seed the growth of baryonic structures, just as dark matter does. If these seeds persist, it was shown [112] that they could behave in very much the same way as a dark matter halo in relatively unrelaxed galaxy clusters. However, it remains to be seen whether the spatially-concentrated distribution of missing mass in MOND would be naturally reproduced in all clusters. In other relativistic versions of MOND (see, e.g., Sections 7.6 and 7.9), the “dark fields” are truly massive and can be thought of as true dark matter (although more complex than simple collisionless dark matter), whose energy density outweighs the baryonic one, and could provide the missing mass in clusters. However, again, it is not obvious that the centrally-concentrated distribution of residual missing mass in clusters would be naturally reproduced. All in all, there is no obviously satisfactory explanation for the problem of residual missing mass in the center of galaxy clusters, which remains one of the most serious problems facing MOND.

7 Relativistic MOND Theories

In Section 6, we have considered the classical theories of MOND and their predictions in a vast number of astrophysical systems. However, as already stated at the beginning of Section 6, these classical theories are only toy-models until they become the weak-field limit of a relativistic theory (with invariant physical laws under differentiable coordinate transformations), i.e., an extension of general relativity (GR) rather than an extension of Newtonian dynamics. Here, we list the various existing relativistic theories boiling down to MOND in the quasi-static weak-field limit. It is useful to restate here that the motivation for developing such theories is not to get rid of dark matter but to explain the Kepler-like laws of galactic dynamics predicted by Milgrom’s law (see Section 5). As we shall see, many of these theories include new fields, so that dark matter is often effectively replaced by “dark fields” (although, contrary to dark matter, their energy density can be subdominant to the baryonic one; note that, even more importantly, in a static configuration these dark fields are fully determined by the baryons, contrary to the traditional dark matter particles, which may, in principle, be present independent of baryons).

These theories are great advances because they enable us to calculate the effects of gravitational lensing and the cosmological evolution of the universe in MOND, which are beyond the capabilities of classical theories. However, as we shall see, many of these relativistic theories still have their limitations, ranging from true theoretical or observational problems to more aesthetic problems, such as the arbitrary introduction of an interpolating function (Section 6.2) or the absence of an understanding of the \(\Lambda \sim a_0^2\) coincidence. What is more, the new fields introduced in these theories have no counterpart yet in microphysics, meaning that these theories are, at best, only effective. So, despite the existing effective relativistic theories presented here, the quest for a more profound relativistic formulation of MOND continues. Excellent reviews of existing theories can also be found in, e.g., [34, 35, 81, 100, 136, 183, 318, 429, 431].

The heart of GR is the equivalence principle(s), in its weak (WEP), Einstein (EEP) and strong (SEP) form. The WEP states the universality of free fall, while the EEP states that one recovers special relativity in the freely falling frame of the WEP. These equivalence principles are obtained by assuming that all known matter fields are universally and minimally coupled to one single metric tensor, the physical metric. It is perfectly fine to keep these principles in MOND, although certain versions can involve another type of (dark) matter not following the same geodesics as the known matter, and thus effectively violating the WEP. Additionally, note that the local Lorentz invariance of special relativity could be spontaneously violated in MOND theories. The SEP, on the other hand, states that all laws of physics, including gravitation itself, are fully independent of velocity and location in spacetime. This is obtained in GR by making the physical metric itself obey the Einstein-Hilbert action. This principle has to be broken in MOND (see also Section 6.3). We now recall how GR connects with Newtonian dynamics in the weak-field limit, which is actually the regime in which the modification must be set in order to account for the MOND phenomenology of the ultra-weak-field limit. The action of GR is written as the sum of the matter action and the Einstein-Hilbert (gravitational) actionFootnote 43:

$${S_{{\rm{GR}}}} \equiv {S_{{\rm{matter}}}}[{\rm{matter}},{g_{\mu \nu}}] + {{{c^4}} \over {16\pi G}}\int {{d^4}x\sqrt {- g} R,}$$
(69)

where g denotes the determinant of the metric tensor \(g_{\mu\nu}\) with (−, +, +, +) signatureFootnote 44, and \(R = R_{\mu\nu}g^{\mu\nu}\) is its scalar curvature, \(R_{\mu\nu}\) being the Ricci tensor (involving second derivatives of the metric). The matter action is a functional of the matter fields, depending on them and their first derivatives. For instance, the matter action of a free point particle \(S_{\rm pp}\) reads:

$${S_{{\rm{pp}}}} \equiv - \int {mcds = - \int {mc\sqrt {- {g_{\mu \nu}}(x){v^\mu}{v^\nu}} dt,}}$$
(70)

depending on the positions x and on their time-derivatives \(v^\mu\). Varying the matter action with respect to (w.r.t.) the matter-field degrees of freedom yields the equations of motion, i.e., the geodesic equation in the case of a point particle:

$${{{d^2}{x^\mu}} \over {d{\tau ^2}}} = - \Gamma _{\alpha \beta}^\mu {{d{x^\alpha}} \over {d\tau}}{{d{x^\beta}} \over {d\tau}},$$
(71)

where the proper time τ = s is approximately equal to ct for slowly moving non-relativistic particles, and \(\Gamma _{\alpha \beta}^\mu\) is the Christoffel symbol involving first derivatives of the metric. On the other hand, varying the total action w.r.t. the metric yields Einstein’s field equations:

$${R_{\mu \nu}} - {1 \over 2}R{g_{\mu \nu}} = {{8\pi G} \over {{c^4}}}{T_{\mu \nu}},$$
(72)

where \(T_{\mu\nu}\) is the stress-energy tensor defined as the variation of the Lagrangian density of the matter fields over the metric.

In the static weak-field limit, the metric is written as (up to third-order corrections in \(1/c^3\))Footnote 45:

$${g_{0i}} = {g_{i0}} = 0,\quad {g_{00}}\underbrace = _{{\rm{Taylor}}} - 1 - {{2\Phi} \over {{c^2}}},\quad {g_{ij}}\underbrace = _{{\rm{Taylor}}}\left({1 + {{2\Psi} \over {{c^2}}}} \right){\delta _{ij}},$$
(73)

where, in GR,

$$\Phi = {\Phi _N}\;{\rm{and}}\;\Psi = - {\Phi _N},$$
(74)

and \(\Phi_N\) is the Newtonian gravitational potential. From the (0,0) components of the weak-field metric, one gets back Newton’s second law for massive particles \({d^2}{x^i}/d{t^2} = - \Gamma _{00}^i = - \partial {\Phi _N}/\partial {x^i}\) from the geodesic equation (Eq. 71). On the other hand, Einstein’s equations (Eq. 72) give back the Newtonian Poisson equation \(\nabla^2 \Phi_N = 4\pi G\rho\). Thus, the metric plays the role of the gravitational potential, and the Christoffel symbol plays the role of acceleration. Note, however, that while timelike geodesics are determined by the (0, 0) component of the metric, this is not the case for null geodesics. While the gravitational redshift for light-rays is solely governed by the \(g_{00}\) component of the metric too, the deflection of light is, on the other hand, also governed by the \(g_{ij}\) components (more specifically by Φ − Ψ in the weak-field limit). This means that, in order for the anomalous effects of any modified gravity theory on lensing and dynamics to correspond to a similarFootnote 46 amount of “missing mass” in GR, it is crucial that Ψ ≃ −Φ in Eq. 73.
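The dependence of light deflection on Φ − Ψ can be made concrete with a point mass. The sketch below assumes \(\Phi = -GM/r\) and parametrizes the spatial potential as \(\Psi = -\gamma \Phi\), where γ is an illustrative parameter introduced here (γ = 1 corresponds to GR, Eq. 74). The standard weak-field deflection angle is then \((1 + \gamma)\,2GM/(c^2 b)\) for impact parameter b. The numerical constants are rounded reference values.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8        # speed of light, m/s (rounded)
M_SUN = 1.989e30   # solar mass, kg (rounded)
R_SUN = 6.957e8    # solar radius, m (rounded)


def deflection_angle(mass_kg, impact_m, gamma=1.0):
    """Weak-field deflection angle (radians) of a light ray passing a
    point mass, assuming Phi = -G M / r and Psi = -gamma * Phi in the
    metric of Eq. 73, so that Phi - Psi = (1 + gamma) * Phi."""
    return (1.0 + gamma) * 2.0 * G * mass_kg / (C ** 2 * impact_m)


def to_arcsec(angle_rad):
    """Convert radians to arcseconds."""
    return math.degrees(angle_rad) * 3600.0


if __name__ == "__main__":
    # GR case (Psi = -Phi): light grazing the solar limb.
    print(to_arcsec(deflection_angle(M_SUN, R_SUN, gamma=1.0)))
    # Psi = 0: only g_00 is perturbed, and the deflection is halved.
    print(to_arcsec(deflection_angle(M_SUN, R_SUN, gamma=0.0)))
```

With γ = 1 this reproduces the familiar value of about 1.75 arcseconds at the solar limb. A theory that modifies only \(g_{00}\) would yield half the GR deflection for the same dynamical mass, which is why Ψ ≃ −Φ is needed for lensing and dynamics to agree.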

7.1 Scalar-tensor k-essence

MOND is an acceleration-based modification of gravity in the ultra-weak-field limit, but since the Christoffel symbol, playing the role of acceleration in GR, is not a tensor, it is, in principle, not possible to make a general relativistic theory depend on it. Another natural way to account for the departure from Newtonian gravity in the weak-field limit and to account for the violation of the SEP inherent to the external field effect is to resort to a scalar-tensor theory, as first proposed by [38]. The added scalar field can play the role of an auxiliary potential, and its gradient then has the dimensions of acceleration and can be used to enforce the acceleration-based modification of MOND.

The relativistic theory of [38] depends on two fields, an “Einstein metric” \({{\tilde g}_{\mu \nu}}\) and a scalar field ϕ. The physical metric \(g_{\mu\nu}\) entering the matter action is then given by a conformal transformation of the Einstein metricFootnote 47 through an exponential coupling function:

$${g_{\mu \nu}} \equiv {e^{2\phi}}{\tilde g_{\mu \nu}}.$$
(75)

In order to recover the MOND dynamics, the Einstein-Hilbert action (involving the Einstein metric) remains unchanged \(\left({\int {{d^4}x\sqrt {- \tilde g} \tilde R}} \right)\), and the dimensionless scalar field is given a k-essence action, with no potential and a non-linear, aquadratic, kinetic termFootnote 48 inspired by the AQUAL action of Eq. 16:

$${S_\phi} \equiv - {{{c^4}} \over {2{k^2}{l^2}G}}\int {{d^4}x\sqrt {- \tilde g} f(X),}$$
(76)

where k is a dimensionless constant, l is a length-scale, \(X = k{l^2}{{\tilde g}^{\mu \nu}}{\phi _{,\mu}}{\phi _{,\nu}}\), and f(X) is the “MOND function”. Since the action of the scalar field is similar to that of the potential in the Bekenstein-Milgrom version of classical MOND, this relativistic version is known as the Relativistic Aquadratic Lagrangian theory, RAQUAL.

Varying the action w.r.t. the scalar field yields, in a static configuration, the following modified Poisson’s equation for the scalar field:

$${c^2}\nabla .[\nabla \phi {f{\prime}}(k{l^2}\vert \nabla \phi \vert ^{2})] = kG\rho ,$$
(77)

and the (0, 0) component of the physical metric is given by \({g_{00}} = - {e^{2({\Phi _N} + {c^2}\phi)/{c^2}}}\), leading us precisely to the situation of Eq. 40 in the weak-field, with \(\Phi = \Phi_N + c^2\phi\), with

$$s = ({c^2}/{a_0})\vert \nabla \phi \vert = {(X{c^4}/k{l^2}a_0^2)^{1/2}}$$
(78)

and

$$\tilde \mu (s) = (4\pi /k){f{\prime}}(X),$$
(79)

whose finely tuned relation with the μ-function of Milgrom’s law is extensively described in Section 6.2. We note that the standard choice for X ≪ 1 is \(f'(X) \sim (X/3)^{1/2}\), meaning that in order to recover \(\tilde \mu ({s}) = {s}/\xi\) for small s, where \(\xi = G_N/G\) (see Section 6.2), one must define the length-scale as

$$l \equiv ({c^2}\sqrt {3k})/(4\pi \xi {a_0}){.}$$
(80)
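To give a feeling for the magnitude of the length-scale in Eq. 80, the following sketch evaluates it numerically. The value of \(a_0\), the sample value of k, and the choice ξ = 1 are assumptions made purely for illustration. The function name is likewise hypothetical.

```python
import math

C = 2.998e8     # speed of light, m/s (rounded)
A0 = 1.2e-10    # assumed value of Milgrom's constant a_0, m/s^2


def raqual_length_scale(k, xi=1.0, a0=A0):
    """Eq. 80: l = c^2 * sqrt(3 k) / (4 pi xi a_0), in metres."""
    return C ** 2 * math.sqrt(3.0 * k) / (4.0 * math.pi * xi * a0)


if __name__ == "__main__":
    k_sample = 0.03  # illustrative dimensionless constant (assumption)
    print(f"c^2/a_0 = {C ** 2 / A0:.3e} m")
    print(f"l(k={k_sample}) = {raqual_length_scale(k_sample):.3e} m")
```

Since \(c^2/a_0\) is of order \(10^{26}\)–\(10^{27}\) m, the length-scale l is of cosmological size for any k of order unity or somewhat smaller. It scales as \(\sqrt{k}\) and inversely with \(a_0\).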

It was immediately realized [38] that a k-essence theory such as RAQUAL can exhibit superluminal propagations whenever f″(X) > 0 [80]. Although it does not threaten causality [80], one has to check that the Cauchy problem is still well-posed for the field equations. It has been shown [80, 360] that it requires the otherwise free function f to satisfy the following properties, ∀X:

$${f{\prime}}(X) > 0$$
(81)
$${f{\prime}}(X) + 2X{f{\prime\prime}}(X) > 0,$$
(82)

which is the equivalent of the constraints of Eq. 37 on Milgrom’s μ-function.
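The two conditions of Eqs. 81 and 82 are easy to test numerically for a candidate function. The sketch below is a minimal check on a finite grid of positive X only, so it cannot prove the conditions for all X. The helper names and the two sample functions are illustrative assumptions. The first sample is the deep-MOND form \(f'(X) = (X/3)^{1/2}\) quoted above, for which \(f' + 2Xf'' = 2(X/3)^{1/2} > 0\).

```python
import math


def satisfies_conditions(f_prime, f_second, xs):
    """Check Eq. 81 (f' > 0) and Eq. 82 (f' + 2 X f'' > 0) on the
    sample points xs. Returns True only if both hold at every point."""
    return all(
        f_prime(x) > 0.0 and f_prime(x) + 2.0 * x * f_second(x) > 0.0
        for x in xs
    )


# Deep-MOND form f'(X) = sqrt(X/3), valid for small positive X.
def fp_deep(x):
    return math.sqrt(x / 3.0)


def fpp_deep(x):
    return 1.0 / (2.0 * math.sqrt(3.0 * x))


# Counterexample (hypothetical): f'(X) = X^-2 violates Eq. 82,
# since f' + 2 X f'' = X^-2 - 4 X^-2 < 0.
def fp_bad(x):
    return x ** -2


def fpp_bad(x):
    return -2.0 * x ** -3


if __name__ == "__main__":
    grid = [10.0 ** (e / 4.0) for e in range(-24, 1)]  # 1e-6 ... 1
    print(satisfies_conditions(fp_deep, fpp_deep, grid))
    print(satisfies_conditions(fp_bad, fpp_bad, grid))
```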

However, another problem was immediately realized at an observational level [38, 40]. Because of the conformal transformation of Eq. 75, one has that Ψ ≠ −Φ in the RAQUAL equivalent of Eq. 73. In other words, as it is well-known that gravitational lensing is insensitive to conformal rescalings of the metric, apart from the contribution of the stress-energy of the scalar field to the source of the Einstein metric [40, 81], the “non-Newtonian” effects of the theory respectively on lensing and dynamics do not at all correspond to similar amounts of “missing mass”. This is also considered a generic problem with any local pure metric formulation of MOND [441].

7.2 Stratified theory

A solution to the above gravitational lensing problem due to the conformal rescaling of the metric in RAQUAL has been presented in [385]. Inspired by “stratified” theories of gravity [334], Sanders [385] suggested, in addition to the scalar field ϕ of RAQUAL, the use of a non-dynamical timelike vector field \(U_\mu = (-1,0,0,0)\) with unit-norm \(U^2 = -1\) (in terms of the Einstein metric), in order to enforce a disformal relation between the Einstein and physical metrics:

$${g_{\mu \nu}} \equiv {e^{- 2\phi}}{\tilde g_{\mu \nu}} - 2\sinh (2\phi){U_\mu}{U_\nu}.$$
(83)

The second term only affects the \(g_{00}\) component, and it then appears immediately that Ψ = −Φ in the weak-field limit (rhs terms of Eq. 73), and the problem of lensing is cured. However, the prescription that a 4-vector points in the time direction is not a covariant one, and the theory should involve strong preferred frame effects. These can now be fully suppressed, as can any deviation from GR at small distances, with an appropriate additional “Galileon” term in addition to the asymptotic deep-MOND k-essence term in the action of the scalar field [22] (the other advantage being that the interpolating function then does not have to be inserted by hand). In any case, endowing the vector field with covariant dynamics of its own has been the next logical step in developing relativistic MOND theories.
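The effect of the disformal relation of Eq. 83 can be checked directly in the simplest setting. The sketch below assumes a flat Einstein metric diag(−1, 1, 1, 1), thereby neglecting the Newtonian part of the potential, and takes the vector \(U_\mu = (-1,0,0,0)\). It then evaluates the diagonal components of the physical metric. Analytically one finds \(g_{00} = -e^{-2\phi} - 2\sinh(2\phi) = -e^{2\phi}\) and \(g_{ii} = e^{-2\phi}\), i.e., opposite-sign perturbations in the time and space parts, as required for Ψ = −Φ.

```python
import math


def physical_metric_diagonal(phi):
    """Diagonal of the physical metric of Eq. 83, assuming a flat
    Einstein metric diag(-1, 1, 1, 1) and U_mu = (-1, 0, 0, 0).
    Off-diagonal components vanish under these assumptions."""
    einstein = [-1.0, 1.0, 1.0, 1.0]
    u = [-1.0, 0.0, 0.0, 0.0]
    return [
        math.exp(-2.0 * phi) * einstein[m]
        - 2.0 * math.sinh(2.0 * phi) * u[m] * u[m]
        for m in range(4)
    ]


if __name__ == "__main__":
    phi = 1e-6  # small dimensionless scalar field (illustrative value)
    g = physical_metric_diagonal(phi)
    # To first order: g_00 ~ -(1 + 2 phi), g_ii ~ 1 - 2 phi.
    print(g[0] + 1.0, g[1] - 1.0)
```

Comparing with Eq. 73, the scalar field contributes \(c^2\phi\) to Φ and \(-c^2\phi\) to Ψ under these assumptions, whereas a purely conformal factor would shift both metric perturbations with the same sign.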

7.3 Original Tensor-Vector-Scalar theory

The idea of the Tensor-Vector-Scalar theory of Bekenstein [33], dubbed TeVeS, is to keep the disformal relation of Eq. 83 between the Einstein metric \({{\tilde g}_{\mu \nu}}\) and the physical metric \({g_{\mu \nu}}\) to which matter fields couple, but to replace the above non-dynamical vector field by a dynamical vector field \(U_\mu\) with an action (K being a dimensionless constant):

$${S_U} \equiv - {{{c^4}} \over {16\pi G}}\int {{d^4}x\sqrt {- \tilde g}} \left[ {{K \over 2}{{\tilde g}^{\alpha \beta}}{{\tilde g}^{\mu \nu}}{U_{[\alpha ,\mu ]}}{U_{[\beta ,\nu ]}} - \lambda ({{\tilde g}^{\mu \nu}}{U_\mu}{U_\nu} + 1)} \right],$$
(84)

akin to that of the electromagnetic 4-potential vector field (\(U_{[\mu ,\nu ]}\) playing the role of the Faraday tensor), but without the coupling term to the 4-current, and with a constraint term forcing the unit norm \({U^\mu}{U_\mu} = {{\tilde g}^{\mu \nu}}{U_\mu}{U_\nu} = - 1\) (λ being a Lagrange multiplier function, to be determined as the equations are solved). The first term in the integrand takes care of approximately aligning \(U_\mu\) with the 4-velocity of matter (when simultaneously solving for (i) the Einstein-like equation of the Einstein metric \({{\tilde g}_{\mu \nu}}\), and for (ii) the vector equation obtained by varying the total action with respect to \(U_\mu\)).

Finally, the k-essence action for the scalar field is kept as in RAQUAL (Eq. 76), but with

$${X_{{\rm{TeVeS}}}} = k{l^2}({\tilde g^{\mu \nu}} - {U^\mu}{U^\nu}){\phi _{,\mu}}{\phi _{,\nu}}.$$
(85)

Contrary to RAQUAL, this scalar field exhibits no superluminal propagation modes. However, [81] noted that such superluminal propagation might have to be re-introduced in order to avoid excessive Cherenkov radiation and suppression of high-energy cosmic rays (see also [320]).

The static weak-field limit equation for the scalar field is precisely the same as Eq. 77, and the scalar field enters the static weak field metric Eq. 73 as \(\Phi = -\Psi = \Xi \Phi_N + c^2\phi\), meaning that lensing and dynamics are compatible, with Ξ being a factor depending on K and on the cosmological value of the scalar field (see Eq. 58 of [33]). This can be normalized to yield Ξ = 1 at redshift zero. Again, all the relations between the free function and Milgrom’s μ-function can be found in Section 6.2 (see also [145, 431]).

This theory has played a true historical role as a proof of concept that it was possible to construct a fully relativistic theory both enhancing dynamics and lensing in a coherent way and reproducing the MOND phenomenology for static configurations with the dynamical 4-vector pointing in the time direction. However, the question remained whether these static configurations would be stable. What is more, although a classical HamiltonianFootnote 49 unbounded from below in flat spacetime would not necessarily be a concern at the classical level (and even less if the model is only “phenomenological”), it would inevitably become a worry for the existence of a stable quantum vacuum (see however [196]). And indeed, it was shown in [98] that models with such “Maxwellian” vector fields having a TeVeS-like Lagrange multiplier constraint in their action have a corresponding Hamiltonian density that can be made arbitrarily large and negative (see also Section IV.A of [81]). What is more, even at the classical level, it has been shown that spherically-symmetric solutions of TeVeS are heavily unstable [412, 413], and that this type of vector field causes caustic singularities [105], in the sense that the integral curves of the vector are timelike geodesics meeting each other when falling into gravity potential wells. Thus, another form was needed for the action of the TeVeS vector field.

7.4 Generalized Tensor-Vector-Scalar theory

The generalization of TeVeS was proposed by Skordis [428]. Inspired by the fact that Einstein-Aether theories [206, 207] also present instabilities when the unit-norm vector field is “Maxwellian” as above, it was simply proposed to use a more general Lagrangian density for the vector field, akin to that of Einstein-Aether theories:

$${S_U} \equiv - {{{c^4}} \over {16\pi G}}\int {{d^4}x\sqrt {- \tilde g}} \;[{K^{\alpha \beta \mu \nu}}{U_{\beta ,\alpha}}{U_{\nu ,\mu}} - \lambda ({\tilde g^{\mu \nu}}{U_\mu}{U_\nu} + 1)],$$
(86)

where

$${K^{\alpha \beta \mu \nu}} = {c_1}{\tilde g^{\alpha \mu}}{\tilde g^{\beta \nu}} + {c_2}{\tilde g^{\alpha \beta}}{\tilde g^{\mu \nu}} + {c_3}{\tilde g^{\alpha \nu}}{\tilde g^{\beta \mu}} + {c_4}{U^\alpha}{U^\mu}{\tilde g^{\beta \nu}}$$
(87)

for a set of constants \(c_1, c_2, c_3, c_4\). Interestingly, spherically-symmetric solutions depend only on the combination \(c_1 - c_4\), not on \(c_2\) and \(c_3\), which can, in principle, be chosen to avoid the instabilities of the original TeVeS theory. Of course, the original unstable theory is also included in this generalization through a specific combination of the four \(c_i\) (see, e.g., [431]).

Thus, this generalized version is the current “working version” of what is now called TeVeS: a tensor-vector-scalar theory with an Einstein-like metric, an Einstein-Aether-like unit-norm vector field, and a k-essence-like scalar field, all related to the physical metric through Eq. 83. It has been extensively studied, both in its original and generalized form. It has for instance been shown that, contrary to many gravity theories with a scalar sector, the theory evidences no cosmological evolution of the Newtonian gravitational constant and only minor evolution of Milgrom’s constant \(a_0\) [39, 145]. However, the fact that the latter is still put in by hand through the length-scale of the theory \(l \sim c^2/a_0\), and has no dynamical connection with the Hubble or cosmological constant is perhaps a serious conceptual shortcoming, together with the free function put by hand in the action of the scalar field (but see [22] for a possible solution to the latter shortcoming). The relations between this free function and Milgrom’s μ can be found in [145, 431] (see also Section 6.2), the detailed structure of null and timelike geodesics of the theory in [431], the analysis of the parametrized post-Newtonian coefficients (including the preferred-frame parameters quantifying the local breaking of Lorentz invariance) in [173, 372, 391, 450], solutions for black holes and neutron stars in [244, 245, 246, 247, 374, 438, 439], and gravitational waves in [214, 215, 216, 373]. It is important to remember that TeVeS is not equivalent to GR in the strong regime, which is why it can be tested there, e.g., with binary pulsars or with the atomic spectral lines from the surface of stars [122], or other very strong field effectsFootnote 50. However, these effects can always generically be suppressed (at the price of introducing a Galileon type term in the action [22]), and such tests would never test MOND as a paradigm. 
It is by testing gravity in the weak field regime that MOND can really be put to the test.

Finally, let us note that TeVeS (and its generalization) has been shown to be expressible (in the “matter frame”) only in terms of the physical metric \(g_{\mu\nu}\) and the vector field \(U_\mu\) [513], the scalar field being eliminated from the equations through the “unit-norm” constraint in terms of the Einstein metric \({{\tilde g}^{\mu \nu}}{U_\mu}{U_\nu} = - 1\), leading to \({g^{\mu \nu}}{U_\mu}{U_\nu} = - {e^{- 2\phi}}\). In this form, TeVeS is sometimes thought of as GR with an additional “dark fluid” described by a vector field [503].

7.5 Bi-Scalar-Tensor-Vector theory

In TeVeS [33], the “MOND function” \(f({X_{{\rm{TeVeS}}}})\) of Eq. 76, where \({X_{{\rm{TeVeS}}}} \sim ({{\tilde g}^{\mu \nu}} - {U^\mu}{U^\nu})\phi {,_\mu}\phi {,_\nu}\), could also be expressed as a potential V of a non-dynamical scalar field q, i.e., a scalar action for TeVeS of the form:

$${S_\phi} \propto - \int {{d^4}x\sqrt {- \tilde g}} \left[ {{1 \over 2}{q^2}{X_{{\rm{TeVeS}}}} + V(q)} \right].$$
(88)

After variation of the action w.r.t. this non-dynamical field, one gets qX = −V′(q), and variation w.r.t. ϕ yields the usual BM Poisson equation for ϕ (Eq. 17), with \({q^2} \propto \tilde \mu (\sqrt X)\). Inspired by an older theory (Phase Coupling Gravity [32, 381]) devised in a partially successful attempt to eliminate superluminal propagation from RAQUAL (but plagued with the same gravitational lensing problem as RAQUAL, and with additional instabilities), Sanders [390] proposed to make this field dynamical by adding a kinetic term \({{\tilde g}^{\mu \nu}}q{,_\mu}q{,_\nu}\) in the action, leading to the following very general action for the scalar fields ϕ and q:

$${S_{(\phi \;q)}} \propto - \int {{d^4}x\sqrt {- \tilde g}} \left[ {{1 \over 2}({{\tilde g}^{\mu \nu}}q{,_\mu}q{,_\nu} + H(q)({{\tilde g}^{\mu \nu}} + {U^\mu}{U^\nu})\phi {,_\mu}\phi {,_\nu}) - F(q){U^\mu}{U^\nu}\phi {,_\mu}\phi {,_\nu} + V(q)} \right].$$
(89)

In this theory (dubbed BSTV for bi-scalar-tensor-vector theory), the physical metric has the same form as in TeVeS, meaning that ϕ is the matter-coupling scalar field, while q only influences the strength of that coupling. A remarkable achievement of the theory is that the quasi-static field equation for ϕ can be obtained only in a cosmological context, and thereby naturally explains the connection between \(a_0\) and \(H_0\) [390]. What is more, oscillations of the q field around its expectation value can be considered as massive dark matter, allowing an explanation of the peaks of the angular power spectrum of the Cosmic Microwave Background [390]. Unfortunately, various instabilities and a Hamiltonian unbounded from below have been evidenced in Section IV.A of [81], thus most likely ruling out this theory, at least in its present form.

7.6 Non-minimal scalar-tensor formalism

As a consequence of the inability of RAQUAL (the scalar-tensor k-essence of Section 7.1) to enhance gravitational lensing, all other attempts reviewed so far (Sections 7.2 to 7.5) have been plagued with an aesthetically unpleasant growth of additional fields and free parameters. This has led Bruneton & Esposito-Farèse [81] to consider models with fewer additional fields. They first considered pure metric theories in which matter is not only coupled to the metric but also non-minimally to its curvature (Eqs. 5.1 and 5.2 of [81]). While they showed that such models can indeed reproduce the MOND dynamics, they also concluded that they are generically unstable if locality is to be preserved (but see Section 7.10). They then considered models in which at most one scalar field is added, without any additional vector field, but where this field is coupled non-minimally to matter, in the sense that the matter-coupling depends on the scalar field itself but also on its first derivatives. In other words, the gradient of the scalar field is replacing the dynamical vector field of TeVeS. The simple scalar field action is just the normal action of a massive scalar field:

$${S_\phi} = - {{{c^4}} \over {8\pi G{l^2}}}\int {{d^4}x\sqrt {- \tilde g}} [X + 2V(\phi)],$$
(90)

with \(X = {l^2}{{\tilde g}^{\mu \nu}}\phi {,_\mu}\phi {,_\nu}\) and \(V(\phi) = {l^2}{m^2}{\phi ^2}/2\). The physical metric \({g_{\mu \nu}}\) is then disformally related to the Einstein metric through (see Eq. 5.11 of [81]):

$${g_{\mu \nu}} \equiv {A^2}{\tilde g_{\mu \nu}} + B\phi _{,\mu}\phi _{,\nu},$$
(91)

with the functionals

$$A(\phi ,X) = {e^{\eta \phi}} - \phi h(Y)Y/\eta ,\;B(\phi ,X) = - 4\phi {\eta ^{- 1}}Y/X,$$
(92)

where \(Y = {(\eta {a_0})^{1/2}}{c^{- 1}}{X^{- 1/4}}\). The free function h(Y) is the “MOND function” playing the role of Milgrom’s μ. An alternative formulation of the model is obtained by separating the matter action into a normal matter action and an “interaction term” between the scalar field, the metric and the matter fields [82]. Considering the massive scalar field as a dark matter fluid, this model can thus be interpreted as a non-standard baryon-dark-matter interaction leading to the MOND behavior. If the scalar mass is small enough, it is a pure MOND theory, but if it is higher, it can lead to a “DM+MOND” behavior, especially noteworthy in regions of high gravity such as the center of galaxy clusters (see Section 6.6.4 and discussions in [82]). Let us note that, while this theory exhibits superluminal propagation outside of matter, this is, in principle, not a problem for causality [80]. It has also been possible to study the behavior of the theory within matter, e.g., within the dilute HI gas inside galaxy disks (an analysis that is mostly too difficult to perform in the other models reviewed so far): this revealed a deadly problem, i.e., that the Cauchy problem becomes ill-posed and the solutions to the field equations ill-defined. A possible solution was proposed in [82], namely to make the matter coupling (or, equivalently, the baryon-scalar DM interaction) depend on the local density of matterFootnote 51: this can also lead to an interesting phenomenology, where only gas-rich systems behave according to Milgrom’s law, while others would behave in a CDM way [143]. A lot remains to be studied within this framework.

7.7 Generalized Einstein-Aether theories

All theories reviewed so far are best expressed in the “Einstein frame”, and involve a form for the physical metric to which matter couples (a form expressed as a function of the Einstein metric and of the other additional fields). However, the work of [513] has shown that, for instance, TeVeS (Sections 7.3 and 7.4) is expressible as a pure Tensor-Vector theory in the matter frame, and that the physical metric then both satisfies the Einstein-Hilbert action and couples minimally to the matter fields, just like in GR. The modification of gravity in TeVeS thus comes only from the coupling of the physical metric to the vector field. The idea of Zlosnik et al. [514] was then that a similar, but simpler, modification of gravity could be obtained by devising a simple tensor-vector theory in the matter frame, with no a priori assumption on the geometry of the physical metric. Starting from the extensively studied Einstein-Aether theories [206, 207], with a vector action of the type of Eq. 86, the idea is to make the k-essence free function f(X) (the “MOND function” of Eq. 76) act directly on the vector field rather than on an additional scalar field. This leads to vector k-essence, or Generalized Einstein-Aether (GEA) theories (also called non-canonical Einstein-Aether theories), in which the Einstein-Hilbert and matter actions remain as in GR, but with an additional unit-norm vector field with the following action [431, 514]:

$${S_U} \equiv - {{{c^4}} \over {16\pi G{l^2}}}\int {{d^4}x\sqrt {- g}} \;[f({X_{{\rm{gea}}}}) - {l^2}\lambda ({g^{\mu \nu}}{U_\mu}{U_\nu} + 1)],$$
(93)

where (see Eq. 87, replacing \({{\tilde g}^{\mu \nu}}\) by \({g^{\mu \nu}}\))

$${X_{{\rm{gea}}}} = {l^2}{K^{\alpha \beta \mu \nu}}{U_{\beta ,\alpha}}{U_{\nu ,\mu}}.$$
(94)

The unit-norm constraint fixes the vector field in terms of the metric, and from there we have that, in the weak-field limit, \({X_{{\rm{gea}}}} \propto - \vert \nabla \Phi {\vert ^2}\), with Φ defined as in Eq. 73. The Einstein equation in the weak-field limit then yields a BM type of Poisson equation (Eq. 17) for the full gravitational potential Φ, with μ = f′ + (1 − f′)/(1 − C/2) and C = c1c4 [431]. In the deep-MOND limit, the usual choice for f is of the type \(f({X_{{\rm{gea}}}}) \propto {(- {X_{{\rm{gea}}}})^{3/2}} + 2{X_{{\rm{gea}}}}/C\), and the length-scale must be fixed as:

$$l \equiv {{(2 - C){c^2}} \over {3/2{C^{3/2}}{a_0}}}.$$
(95)

Let us note that this weak-field limit of GEA theories is different from that of RAQUAL or TeVeS, where only the scalar field ϕ obeys a BM-like equation governed by an interpolating function \(\tilde \mu ({s})\), and where the total potential is given by Eq. 40.

The remarkable feature of GEA theories, which allows for the desired enhancement of gravitational lensing without any a priori assumption on the form of the physical metric, is that, writing the metric as in Eq. 73, it can be shown [431] that in the limit \({X_{{\rm{gea}}}} \rightarrow 0\) the action of Eq. 93 is only a function of ϒ = Φ + Ψ and is thus invariant under disformal transformations [Φ → Φ + β(r); Ψ → Ψ − β(r)], of the type of Eq. 83. These GEA theories are currently extensively studied, mostly in a cosmological context (see Section 9), but also for their parametrized post-Newtonian coefficients in the solar system [65] or for black hole solutions [451].

Interestingly, it has been shown that these vector field theories (TeVeS, BSTV, GEA) are all part of a broad class of theories studied in [183]. Other phenomenologically-interesting theories exist within this class, such as, for instance, the VΛ models considered by Zhao & Li [502, 506, 510] with a dynamical norm vector field, whose norm obeys a potential (giving it a mass) and has a non-quadratic kinetic term à la RAQUAL, in order to try to reproduce both the MOND phenomenology and the accelerated expansion of the universe, while interpreting the vector field as a fluid of neutrinos with varying mass [504, 505]. This has the advantage of giving a microphysical meaning to the vector field. Such vector fields have also been argued to arise naturally from dimensional reduction of higher-dimensional gravity theories [34, 261], or, more generally, to be necessary because quantum gravity could need a preferred rest frame [206] in order to protect the theory against instabilities when allowing for higher derivatives to make the theory renormalizable (e.g., in Horava gravity [64, 195]). Inspired by this possible need for a preferred rest frame in quantum gravity, relativistic MOND theories boiling down to particular cases of GEA theories in which the vector field is hypersurface-orthogonal have, for instance, been proposed in [61, 396].

7.8 Bimetric theories

In the previous theories, the acceleration-dependence of MOND enters the equations through a free “MOND function” f (X) acting either on the contracted gradient of an added scalar field, with dimensions of acceleration (Eq. 85), or on a scalar formed with the first derivatives of a vector field (Eq. 94) with a unit-norm constraint relating it to the gradient of the potential in the physical metric. The “MOND function” could not act directly on the Christoffel symbol because this is not a tensor, and such a theory would thus violate general covariance. However, if there is more than one metric entering gravitation, the difference between the associated Christoffel symbols is a tensor, and one can construct from it a scalar with dimensions of acceleration, on which the “MOND function” can act. Such theories in which there are two dynamical rank-2 symmetric tensor fields are called bimetric theories [204, 205, 369]. Milgrom [312, 317] proposed to construct a whole parametrized class of bimetric MOND theories (dubbed BIMOND), involving an auxiliary metric, with various phenomenological behaviors in the weak-field limit, ranging from Bekenstein-Milgrom MOND to QUMOND as well as a mix of both (see [318]). As one example (parameters α = −β = − 1 in the general class of BIMOND theories, for which we refer the reader to the review [318]), the auxiliary metric \({{\hat g}_{\mu \nu}}\) can, e.g., be introduced precisely in the same way as the auxiliary potential Φph in the QUMOND classical action of Eq. 34:

$$S \equiv {S_{\rm{m}}}[{\rm{matter}},{g_{\mu \nu}}] + {S_{\rm{m}}}[{\rm{twin}}\;{\rm{matter}},{\hat g_{\mu \nu}}] + {{{c^4}} \over {16\pi G}}\int {{d^4}x\sqrt {- g}} \;[R - \hat R - 2{l^{- 2}}f({X_{{\rm{BIMOND}}}})],$$
(96)
$${X_{{\rm{BIMOND}}}} = {l^2}{g^{\mu \nu}}(C_{\mu \beta}^\alpha C_{\nu \alpha}^\beta - C_{\mu \nu}^\alpha C_{\beta \alpha}^\beta),$$
(97)

where \(C_{\mu \nu}^\alpha = \Gamma _{\mu \nu}^\alpha - \hat \Gamma _{\mu \nu}^\alpha\). The MONDian modification of gravity is thus introduced through the interaction between the spacetime on which matter lives and the auxiliary spacetime (on which some “twin matter” might live). This modification is acceleration-based since the interaction involves the difference of Christoffel symbols, playing the role of acceleration. By varying the action w.r.t. both metrics, we obtain two sets of Einstein-like equations, which boil down in the static weak-field limit to \(\hat \Phi = - \hat \Psi\) and Φ = −Ψ in Eq. 73 (so this yields the correct amount of gravitational lensing for normal photons w.r.t. the “matter metric” \({g_{\mu \nu}}\)), as well as the following generalized Poisson equations:

$${\nabla ^2}\Phi = 4\pi G\rho + \nabla .[{f{\prime}}(\vert \nabla (\Phi - \hat \Phi)\vert ^{2}/a_0^2)\nabla (\Phi - \hat \Phi)]\;{\rm{and}}\;{\nabla ^2}\hat \Phi = 4\pi G\hat \rho + \nabla .[{f{\prime}}(\vert \nabla (\Phi - \hat \Phi)\vert ^{2}/a_0^2)\nabla (\Phi - \hat \Phi)]$$
(98)

or, equivalently,

$${\nabla ^2}(\Phi - \hat \Phi) = 4\pi G(\rho - \hat \rho)\;{\rm{and}}\;{\nabla ^2}\Phi = 4\pi G\rho + \nabla .[{f{\prime}}(\vert \nabla (\Phi - \hat \Phi)\vert ^{2}/a_0^2)\nabla (\Phi - \hat \Phi)].$$
(99)

This is equivalent to QUMOND (Eq. 30) if the matter and twin matter are well separated (which is natural if they repel each other), the function f playing the role of H in Eq. 34, with \({f{\prime}}({X_{{\rm{BIMOND}}}}) \rightarrow 0\) for \({X_{{\rm{BIMOND}}}} \gg 1\) and \({f{\prime}}({X_{{\rm{BIMOND}}}}) \rightarrow X_{{\rm{BIMOND}}}^{- 1/4}\) for \({X_{{\rm{BIMOND}}}} \ll 1\). Note that the existence of this putative twin matter is far from being necessary (putting \(\hat \rho = 0\) everywhere yields exactly QUMOND), but it might be suggested by the existence of the auxiliary metric within the theory. Again, it is mandatory to stress that the formulation of BIMOND sketched above is actually far from unique and can be suitably parametrized to yield a whole class of BIMOND theories with various phenomenological behaviors [312, 317, 318]. For instance, in matter-twin matter symmetric versions of BIMOND (α = β = 1, see [318]), and within a fully symmetric matter-twin matter system, a cosmological constant is given by the zero-point of the MOND function, naturally on the order of one, thereby naturally leading to \(\Lambda \sim a_0^2\) for the large-scale universe. Matter and twin matter would not interact at all in the high-acceleration regime, and would repel each other in the MOND regime (i.e., when the acceleration difference of the two sectors is small compared to a0), thereby possibly playing a crucial role in the universe expansion and structure formation [316].
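As a rough numerical illustration of the QUMOND limit of Eq. 99, the following sketch assumes spherical symmetry and no twin matter (\(\hat \rho = 0\)), so that \(\Phi - \hat \Phi\) reduces to the Newtonian potential and the second equation integrates to \(\vert \nabla \Phi \vert = {g_N}[1 + {f{\prime}}(g_N^2/a_0^2)]\). The specific form of f′ used here is only an assumption, chosen because it has the two limits quoted above; the function names and the numerical value of a0 are likewise illustrative.

```python
import math

A0 = 1.2e-10  # m/s^2; assumed illustrative value of Milgrom's constant a0


def f_prime(x):
    """Assumed interpolating form f'(X) = sqrt(1/4 + X**-1/2) - 1/2.

    It tends to X**(-1/4) for X << 1 and to 0 for X >> 1, as required above.
    """
    return math.sqrt(0.25 + x ** -0.5) - 0.5


def bimond_acceleration(g_newton, a0=A0):
    """|grad Phi| for a spherical source with rho_hat = 0 (QUMOND limit).

    g_newton is the (positive) Newtonian acceleration at the radius of interest.
    """
    x = (g_newton / a0) ** 2  # argument of f' in Eq. 99
    return g_newton * (1.0 + f_prime(x))
```

With this choice, `bimond_acceleration` approaches the Newtonian value when g_N ≫ a0 and the deep-MOND value (g_N a0)^{1/2} when g_N ≪ a0.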

This promising broad class of theories should be carefully investigated theoretically in the future, notably regarding the possible existence of ghost modes [69]. At a more speculative level, this class of theories can be interpreted as a modification of gravity arising from the interaction between a pair of membranes: matter lives on one membrane, twin matter on the other, each membrane having its own standard elasticity but coupled to the other one. The way the shape of the membrane is affected by matter then depends on the combined elasticity properties of the double membrane, but matter response depends only on the shape of its home membrane. Interestingly, bimetric theories have also been advocated [256] to be a useful ingredient for the renormalizability of quantum gravity (although the theories considered there involve only metric interactions, not derivatives as in BIMOND).

7.9 Dipolar dark matter

As we have seen, many relativistic MOND theories do invoke the existence of new “dark fields” (scalar or vector fields), which, if massive, can even sometimes truly be thought of as “dark matter” enjoying non-standard interactions with baryonsFootnote 52 (Section 7.6 and [82]). The bimetric version of MOND (Section 7.8) also invokes the existence of a new type of matter, the “twin matter”. This clearly shows that, contrary to common misconceptions, MOND is not necessarily about “getting rid of dark matter” but rather about reproducing the success of Milgrom’s law in galaxies. It might require adding new fields, but the key point is that these fields, very massive or not, would not behave simply as collisionless particles.

In a series of papers, Blanchet & Le Tiec [55, 56, 57, 58, 59, 60] have pushed further the idea that the MOND phenomenology could arise from the fundamental properties of a form of dark matter itself, by suggesting that dark matter could carry a space-likeFootnote 53 four-vector gravitational dipole moment ξμ, following the analogy between Milgrom’s law and Coulomb’s law in a dielectric medium proposed by [56] (see Eq. 9) or between the Bekenstein-Milgrom modified Poisson equation and Gauss’ law in terms of free charge density (see Eq. 17). The dark matter medium is described as a fluid with mass current Jμ = ρuμ (where ρ is the equivalent of the mass density of the atoms in a dielectric medium, i.e., it is the ordinary mass density of a pressureless perfect fluid, and uμ is the four-velocity of the fluidFootnote 54) endowed with the dipole moment vector ξμ (which will affect the total density in addition to the above mass density ρ), with the following action [60]:

$${S_{{\rm{DM}}}} \equiv \int {{d^4}x\sqrt {- g}} [{c^2}({J_\mu}{\dot \xi ^\mu} - \rho) - W(P)],$$
(100)

where P is the norm of the projection, perpendicular to the four-velocity, of the polarization field Pμ = ρξμ (not the norm of the polarization field itselfFootnote 55), and where the dot denotes the covariant proper time derivative. The specific dynamics of this dark matter fluid will thus arise from the coupling between the current and the dipolar field (analogous to the coupling to an external polarization field in electromagnetism), as well as from the internal non-gravitational force acting on the dipolar dark particles and characterized by the potential W(P). Let us note that the normal matter action and the gravitational Einstein-Hilbert action are just the same as in GR.

The equations of motion of the dark matter fluid are then obtained by varying the action w.r.t. the dipole moment variable ξμ and w.r.t. the current Jμ, boiling down in the non-relativistic limit to:

$${{d{\bf{v}}} \over {dt}} = {\bf{g}} - {\bf{f}},$$
(101)
$${{{d^2}\xi} \over {d{t^2}}} = {\bf{f}} + {1 \over \rho}\nabla [W(P) - P{W{\prime}}(P)] + ({\bf{P}}\nabla){\bf{g}},$$
(102)

where \({\bf{v}}\) is the ordinary velocity of the fluid, \({\bf{g}} = - \nabla \Phi\) is the gravitational field, and \({\bf{f}} = - ({\bf{P}}/P){W{\prime}}/\rho\) is the internal non-gravitational force field making the motion of the dark particles non-geodesic. What is more, the Poisson equation in the weak-field limit is recovered as:

$$- \nabla {.}({\bf{g}} - 4\pi {\bf{P}}) = 4\pi G({\rho _b} + \rho){.}$$
(103)

In order to then reproduce the MOND phenomenology in galaxies, the next step is the “weak-clustering hypothesis”, namely the fact that, in galaxies, the dark matter fluid does not cluster much (\(\rho \ll {\rho _b}\)) and is essentially at rest (\({\bf{v}} = 0\)) because the internal force of the fluid precisely balances the gravitational force in such a way that the polarization field \({\bf{P}}\) is precisely aligned with the gravitational one \({\bf{g}}\), and g ∝ −W′(P). The potential W thus plays the role of the “MOND function”, and, e.g., choosing to determine it up to third order in expansion as

$$W(P) \propto \Lambda /(8\pi) + 2\pi {P^2} + 16{\pi ^2}{P^3}/(3{a_0}) + \mathcal{O}({P^4})$$
(104)

then yields the desired MOND behavior in Eq. 103, with the n = 1 “simple” μ-function (see Eqs. 42 and 49).

This model has many advantages. The monopolar density of the dipolar atoms ρ will play the role of CDM in the early universe, while the minimum of the potential W(P) naturally adds a cosmological constant term, thus making the theory precisely equivalent to the ΛCDM model for expansion and large scale structure formation. The dark matter fluid behaves like a perfect fluid with zero pressure at first-order cosmological perturbation around a FLRW background and thus reproduces CMB anisotropies. Let us also note that, if the potential W(P) defining the internal force of the dipolar medium is to come from a fundamental theory at the microscopic level, one expects that the dimensionless coefficients in the expansion all be of order unity after rescaling by \(a_0^2\), thus naturally leading to the coincidence \(\Lambda \sim a_0^2\).

However, while the weak clustering hypothesis and stationarity of the dark matter fluid in galaxies are supported by an exact and stable solution in spherical symmetry [58], it remains to be seen whether such a configuration would be a natural outcome of structure formation within this model. The presence of this stationary DM fluid being necessary to reproduce Milgrom’s law in stellar systems, this theory loses a bit of the initial predictability of MOND, and inherits a bit of the flexibility of CDM, inherent to invoking the presence of a DM fluid. This DM fluid could, e.g., be absent from some systems such as the globular clusters Pal 14 or NGC 2419 (see Section 6.6.3), thereby naturally explaining their apparent Newtonian behavior. However, the weak clustering hypothesis in itself might be problematic for explaining the missing mass in galaxy clusters, due to the fact that the MOND missing mass is essentially concentrated in the central parts of these objects (see Section 6.6.4).

7.10 Non-local theories and other ideas

All the models so far somehow invoke the existence of new “dark fields”, notably because for local pure metric theories, the Hamiltonian is generically unbounded from below if the action depends on a finite number of derivatives [81, 136, 441]. A somewhat provocative solution would thus be to consider non-local theories. A non-local action could, e.g., arise as an effective action due to quantum corrections from super-horizon gravitons [440]. Deffayet, Esposito-Farèse & Woodard [123] have notably exhibited the form that a pure metric theory of MOND could take in order to yield MONDian dynamics and MONDian lensing for a static, spherically-symmetric baryonic source.

In such a static spherically-symmetric geometry, the Einstein-Hilbert action of Eq. 69 can be rewritten in the weak-field expansion as [123]:

$${S_{{\rm{EH}}}} = {\rm{surface}}\;{\rm{term}} + {{{c^4}} \over {16\pi G}}\int {{d^4}x[ - ra{b{\prime}} + {a^2}/2 + \mathcal{O}({a^3},{b^3})],}$$
(105)

where (1 + a) and −(1 + b) are the \({g_{rr}}\) and \({g_{00}}\) components of the static weak-field metric, respectively. The MOND modification of this action must yield \(a = r{b{\prime}} = 2{(GM{a_0})^{1/2}}/{c^2}\) as a solution in the deep-MOND limit, where the first equality ensures that lensing and dynamics are consistent. This leads to the following tentative action in the ultra-weak-field limit [123]:

$${S_{{\rm{MOND}}}} \sim {{{c^4}} \over {16\pi G}}\int {{d^4}x\left[ {l{r^2}\left({{\alpha \over 3}\left({{b{\prime}} - {a \over r}} \right) - {{{b\prime^{3}}} \over 6} + \mathcal{O}({a^4},{b^4})} \right)} \right]} ,$$
(106)

where \(l = {c^2}/{a_0}\) and α is an arbitrary constant. While it is impossible to express this form of the action as a local functional of a general metric, Deffayet et al. [123] showed that it was entirely possible to do so in a non-local model, making use of the non-local inverse d’Alembertian and of a TeVeS-like vector field, introduced not as an additional “dark field”, but as a non-local functional of the metric itself (by, e.g., normalizing the gradient of the volume of the past light-cone). A whole class of such models is constructible, and a few examples are given in [123], for which stability analyses are still needed, though.

As already mentioned in Section 6.1.1, this non-locality was also inherent to classical toy models of “modified inertia”. In GR, this would mean making the matter action of a point particle (Eq. 70) depend on all derivatives of its position, but such models are very difficult to construct [300] and no fully-fledged theory exists along these lines. However, a few interesting heuristic ideas have been proposed in this context. For instance, Milgrom [304] proposed that the inertial force in Newton’s second law could be defined to be proportional to the difference between the Unruh temperature and the Gibbons-Hawking one. It is indeed well known that, in Minkowski spacetime, an accelerated observer sees the vacuum as a thermal bath with a temperature proportional to the observer’s acceleration, \({T_U} = ah/(4{\pi ^2}kc)\) [110, 470], where h is the Planck constant and k the Boltzmann constant. On the other hand, a uniformly accelerated observer in de Sitter spacetime (curved with a positive cosmological constant Λ) sees a non-linear combination of that vacuum radiation and of the Gibbons-Hawking radiation (with temperature \({T_{{\rm{GH}}}} = {(\Lambda /3)^{1/2}}h/(4{\pi ^2}k)\) [174]) due to the cosmological horizon in the presence of a positive Λ. Namely, the Unruh temperature of the radiation seen by such an accelerated observer in de Sitter spacetime is [174] \({T_U} = {({a^2} + {c^2}\Lambda /3)^{1/2}}h/(4{\pi ^2}kc)\). The idea of Milgrom [304] is then to define the right-hand side of Newton’s second law (in norm) as being proportional to the difference between the two temperatures:

$$\vert {\bf{F}}\vert = {{4{\pi ^2}mkc} \over h}({T_U} - {T_{{\rm{GH}}}}),$$
(107)

which trivially leads to \({\bf{F}} = m\mu (a/{a_0}){\bf{a}}\) with \({a_0} = 2c{(\Lambda /3)^{1/2}}\) (which is, however, observationally too large by a factor 2π) and the interpolating function μ(x) having the exact form of Eq. 54. In short, observers experiencing a very small acceleration would see an Unruh radiation with a small temperature close to the Gibbons-Hawking one, meaning that the inertial resistance defined by the difference between the two radiation temperatures would be smaller than in Newtonian dynamics, and thus the corresponding acceleration would be larger. However, no relativistic version (if at all possible) of this approach has been developed yet: a few difficulties arise due to the direction of the acceleration, or from the fact that stars in galaxies are free-falling objects along geodesics, and not accelerated by a non-gravitational force, as in the case of basic Unruh radiation. It was interestingly noted [308] that the de Sitter spacetime could be seen as a 4-dimensional pseudosphere embedded in a 5-dimensional flat Minkowski space, and that the acceleration of a uniformly accelerated observer in this flat space would be exactly \({a_5} = {({a^2} + {c^2}\Lambda /3)^{1/2}}\). Then, MOND could arise from symmetry arguments in this 5-dimensional space similar to those leading to special relativity in Minkowski space [308]. Interestingly, arguments very similar to this whole vacuum radiation approach have also recently been made in the context of entropic gravity [191, 192, 224, 476]. Finally, another interesting idea to get MOND dynamics has been the tentative modification of special relativity, making the Planck length and the length \(l = {\Lambda ^{- 1/2}} \sim {c^2}/{a_0}\) two new invariants, in addition to the speed of light, an attempt known as Triply Special Relativity [233]. In any case, despite all these attempts, there is still no fully-fledged theory of MOND derived from first principles, and the quest for such a formulation of MOND continues.
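Returning to Eq. 107, the interpolating behavior it implies can be checked numerically. The sketch below is only an illustration: it introduces the shorthand a_Λ = c(Λ/3)^{1/2} (a name not used in the text), for which Eq. 107 with the two temperatures quoted above gives |F| = m[(a² + a_Λ²)^{1/2} − a_Λ]. Writing x = a/a_Λ, this is |F| = m a μ(x) with μ(x) = (1 + x^{−2})^{1/2} − x^{−1}, which is evaluated here in an algebraically equivalent form that avoids cancellation at small x.

```python
import math


def mu_vacuum(x):
    """mu(x) = sqrt(1 + x**-2) - 1/x, rewritten as x / (1 + sqrt(1 + x**2)).

    x is the acceleration in units of a_Lambda = c * sqrt(Lambda / 3)
    (a_Lambda is a shorthand introduced for this sketch).
    """
    return x / (1.0 + math.sqrt(1.0 + x * x))


def inertial_force(a, m, a_lambda):
    """|F| = m * (sqrt(a**2 + a_lambda**2) - a_lambda), in a stable form."""
    return m * a * a / (a_lambda + math.hypot(a, a_lambda))
```

For x ≫ 1 the function tends to 1 (Newtonian inertia), while for x ≪ 1 it tends to x/2, so that |F| → m a²/(2a_Λ) at low accelerations.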

8 Gravitational Lensing in Relativistic MOND

The viable MOND theories from Section 7, although still mostly effective, have the great advantage of proving that constructing relativistic MOND theories is possible, and that it is thus possible to calculate from them the effects of gravitational lensing. But the non-uniqueness of the theories means that there is not really a unique prediction for gravitational lensing, especially in heavily-time-dependent configurations, or when the predictions of the theories for the expansion history of the universe deviate from the concordance model. As we have seen, some theories also deviate slightly from classical MOND predictions for dynamics of quasi-static systems, due to the presence of massive dark fields, and the same would of course happen for gravitational lensing. However, at the zeroth order, and in static weak-field configurations, we can make predictions for all theories whose expansion history would be similar to that of ΛCDM (see Section 9.1) and whose static weak-field limit is represented by a physical metricFootnote 56 with Ψ = −Φ in Eq. 73 (Φ obeying Eq. 17). In this case, the way the light propagates on the null geodesics of this metric is exactly the same in all these theories once Φ is known. What differs from GR is only the relation between Φ and the underlying mass distribution of the lens.

8.1 Strong lensing by galaxies

When multiple images of a background source are produced by a gravitational lens, one talks about strong lensing. In that case, most of the light bending occurs within a small range around the lens compared to the lens-source distance Dls and the observer-source distance Ds (where the distances are the usual angular-diameter distances in cosmology). In this thin-lens approximationFootnote 57, the resulting deflection angle can be written as:

$$\alpha = {2 \over {{c^2}}}\int\nolimits_{- \infty}^\infty {{\nabla _ \bot}\Phi dz,}$$
(108)

where Φ = −Ψ is the non-relativistic gravitational potential of Eq. 73 (obeying a MONDian Poisson equation), and ∇⊥ denotes the two-dimensional gradient operator perpendicular to light propagation. The lens equation then relates the observed two-dimensional angular position of the source in the lens plane θ to its original angular position in the source plane β through:

$$\theta = \beta + {{{D_{ls}}} \over {{D_s}}}\alpha ,$$
(109)

where it appears clearly that the expansion history will play an important role in converting redshifts to distances. It is also convenient to make the deflection angle α derive from a deflection potential ϒ in the lens-plane:

$$\Upsilon (\theta) = {{2{D_{ls}}} \over {{c^2}{D_s}{D_l}}}\int\nolimits_{- \infty}^\infty {\Phi ({D_l}\theta ,z)dz} .$$
(110)
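As an illustration of Eq. 108, the following sketch integrates the deflection angle numerically for a point-mass lens. It assumes spherical symmetry, so that the BM equation reduces to μ(g/a0)g = g_N, and it adopts the simple μ-function μ(x) = x/(1 + x) as one possible choice; the function names, the value of a0 and the midpoint-rule integration are all choices made for this example only.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8    # speed of light, m/s
A0 = 1.2e-10   # m/s^2; assumed illustrative value of a0


def mond_gravity(g_newton, a0=A0):
    """Solve mu(g/a0) * g = g_N for the simple mu(x) = x / (1 + x)."""
    return 0.5 * g_newton + math.sqrt(0.25 * g_newton ** 2 + g_newton * a0)


def deflection_angle(mass, b, a0=A0, n=20000):
    """Deflection angle (radians) of Eq. 108 at impact parameter b (metres).

    Substituting z = b * tan(t) turns the line-of-sight integral of
    grad_perp(Phi) = g(r) * b / r into b * integral of g(b / cos t) / cos t dt
    over (-pi/2, pi/2), evaluated here with the midpoint rule.
    """
    dt = math.pi / n
    total = 0.0
    for i in range(n):
        t = -0.5 * math.pi + (i + 0.5) * dt
        cos_t = math.cos(t)
        r = b / cos_t
        total += mond_gravity(G * mass / r ** 2, a0) / cos_t
    return 2.0 * b * total * dt / C ** 2
```

In the Newtonian limit (a0 → 0) this integral gives 4GM/(c²b), while in the deep-MOND limit the same integral becomes independent of the impact parameter and equals 2π(GMa0)^{1/2}/c².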

If a source is much smaller than the angular scale on which the lens properties change, the lens equation (Eq. 109) can locally be linearized as:

$$\beta (\theta) = {\beta _0} + \mathcal{A}(\theta)(\theta - {\theta _0}),$$
(111)

where the inverse magnification matrix is

$$\mathcal{A}(\theta) = {{\partial \beta} \over {\partial \theta}},\;{\rm{where}}\;{\mathcal{A}_{11}} = 1 - \kappa - {\gamma _1},\;{\mathcal{A}_{12}} = {\mathcal{A}_{21}} = - {\gamma _2},\;{\mathcal{A}_{22}} = 1 - \kappa + {\gamma _1}$$
(112)

The convergence κ is directly given by the Laplacian of the deflection potential ϒ:

$$\kappa = {1 \over 2}{\nabla ^2}\Upsilon.$$
(113)

The Einstein radius is the radius in the lens plane within which the mean convergence is \(\langle \kappa \rangle = 1\). The existence of a region where κ is of that order is sufficient to produce multiple images and is the definition of strong lensing. On the other hand, the shear components γ1, γ2 are given by

$${\gamma _1} = {1 \over 2}\left({{{{\partial ^2}\Upsilon} \over {\partial \theta _1^2}} - {{{\partial ^2}\Upsilon} \over {\partial \theta _2^2}}} \right),\;\;\;{\gamma _2} = {{{\partial ^2}\Upsilon} \over {\partial {\theta _1}\partial {\theta _2}}}.$$
(114)

Due to Liouville’s theorem, gravitational lensing preserves the surface brightness, but it changes the apparent solid angle of a source. The resulting flux ratio between image and source can be expressed in terms of the magnification M,

$${M^{- 1}} = {(1 - \kappa)^2} - \gamma _1^2 - \gamma _2^2.$$
(115)
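Eqs. 113 to 115 can be evaluated for any deflection potential by finite differences, as in the following sketch. The helper names and the step size are assumptions of this example, and the test potential, θ_E² ln|θ| for a Newtonian point-mass lens with Einstein radius θ_E, is used only as a convenient check, not as a MOND prediction.

```python
import math


def second_derivatives(potential, t1, t2, h=1e-4):
    """Central finite differences of the deflection potential at (t1, t2)."""
    p = potential
    d11 = (p(t1 + h, t2) - 2.0 * p(t1, t2) + p(t1 - h, t2)) / h ** 2
    d22 = (p(t1, t2 + h) - 2.0 * p(t1, t2) + p(t1, t2 - h)) / h ** 2
    d12 = (p(t1 + h, t2 + h) - p(t1 + h, t2 - h)
           - p(t1 - h, t2 + h) + p(t1 - h, t2 - h)) / (4.0 * h ** 2)
    return d11, d22, d12


def lensing_quantities(potential, t1, t2):
    """Return (kappa, gamma1, gamma2, M) following Eqs. 113, 114 and 115.

    Raises ZeroDivisionError exactly on a critical curve, where 1/M = 0.
    """
    d11, d22, d12 = second_derivatives(potential, t1, t2)
    kappa = 0.5 * (d11 + d22)
    gamma1 = 0.5 * (d11 - d22)
    gamma2 = d12
    inv_m = (1.0 - kappa) ** 2 - gamma1 ** 2 - gamma2 ** 2
    return kappa, gamma1, gamma2, 1.0 / inv_m


def point_lens_potential(t1, t2, theta_e=1.0):
    """Test case only: Newtonian point-mass lens, theta_E**2 * ln|theta|."""
    return theta_e ** 2 * math.log(math.hypot(t1, t2))
```

For this test potential the convergence vanishes away from the lens, and at twice the Einstein radius the sketch returns a magnification close to 16/15.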

The flux ratio between two images A and B is \({f_{AB}} = {M_A}/{M_B}\). Let us finally note that (i) the time-delay between the different images can be deduced directly from the lensing potential and depends on the Hubble constant and convergence at the Einstein radius, and that (ii) points in the lens plane where \({M^{- 1}} = 0\) (infinite magnification) form closed curves called the critical curves. Their corresponding curves located in the source plane are called caustics. The location of the source with respect to caustics determines the number of images, a source outside of the outermost caustic producing only one image, while each caustic crossing changes the number of images by two. Spherically-symmetric models of galaxy lenses can never produce observed quadruple-imaged systems because the innermost caustic of spherical models degenerates into a point.

As outlined above, what differs from GR in all the relativistic MOND theories is the relation between the non-relativistic potential Φ and the underlying mass distribution of the lens ρ. However, different theories yield slightly different relations between Φ and ρ in the weak-field limit (see especially Sections 6.1 and 6.2). For instance, while GEA theories (Section 7.7) boil down to Eq. 17 in the static weak-field limit, TeVeS (Section 7.4) leads to the situation of Eq. 40, and BIMOND (Section 7.8) to Eq. 30. However, like in the case of rotation curves (see Figure 20), the differences are only minor outside of spherical symmetry (and null in spherical symmetry), and the global picture can be obtained by assuming a relation given by the BM equation (Eq. 17).

The first studies of strong lensing by galaxies in relativistic MOND theories [93, 501, 507] made use of the CfA-Arizona Space Telescope Lens Survey (CASTLES) and made a one-parameter-fit of the lens mass to the observed size of the Einstein radius, both for point-mass models and for Hernquist spheres (with observed core radius). Zhao et al. [507] also compared the predicted and observed flux ratios fab. They used the α = 0 μ-function of Eq. 46, and concluded that reasonably good fits could be obtained with a lens mass corresponding to the expected baryonic mass of the lens. Shan et al. [419] then improved the modelling method by considering analytic non-spherical models with locally-spherically-symmetric isopotentials on both sides of the symmetry plane z = 0, implying no curl field correction (S = 0) in Eq. 19. The MOND non-relativistic potential Φ can then analytically be written, and using Eq. 108, one can analytically compute the two components α1 and α2 of the deflection angle vector α as a function of the three parameters of the model, namely the lens-mass and two scale-lengths controlling the extent and flattening of the lens (see Eq. 18 of [419]). Using the lens equation (Eq. 109), one can then trace back light-rays for each observed image to the source plane and fit the lens parameters as well as its inclination, in order for the source position to be the same for each image. The quality of the fit is thus quantified by the squared sum of the source position differences. This notably allowed [419] to fit in MOND the famous quadruple-imaged system Q2237+030 known as the Einstein cross (see Figure 41), a quasar gravitationally lensed by an isolated bulge-disk galaxy [197]. However, for three other quadruple-imaged systems of the CASTLES survey, the fits were less successful mostly because of the intrinsic limitations of the analytic model of Shan et al.[419] at reproducing at the same time both a large Einstein radius and a large shear. 
What is more, the model does not take into account the effects of the environment in the form of an external shear, which is also often needed in GR to fit quadruple-imaged systems. For 10 isolated double-imaged systems in the CASTLES survey, the fits were much more successful (see Footnote 58). However, for non-isolated systems, especially for those lenses residing in groups or clusters, the need for an external shear might be coupled to a need for dark mass on galaxy group scales (see Section 6.6.4 and Section 8.3).
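The source-plane fitting procedure described above can be illustrated with a minimal sketch. It assumes a simple point-mass deflection law in scaled angular units as a stand-in (this is not the analytic MOND model of [419]), and it measures the fit quality as the summed squared offsets of the back-traced source positions from their mean, which is only one possible way of quantifying the source position differences. All function names are hypothetical.

```python
import math

def point_mass_deflection(theta, theta_e):
    """Stand-in deflection law (assumption, not the model of [419]):
    alpha = theta_E^2 * theta / |theta|^2, in scaled angular units."""
    x, y = theta
    r2 = x * x + y * y
    return (theta_e ** 2 * x / r2, theta_e ** 2 * y / r2)

def back_trace(theta, deflection):
    """Lens equation: source position beta = theta - alpha(theta)."""
    ax, ay = deflection(theta)
    return (theta[0] - ax, theta[1] - ay)

def source_plane_scatter(images, deflection):
    """Sum of squared offsets of the back-traced sources from their mean.
    It vanishes when all images map back to one and the same source."""
    sources = [back_trace(t, deflection) for t in images]
    n = len(sources)
    mx = sum(s[0] for s in sources) / n
    my = sum(s[1] for s in sources) / n
    return sum((s[0] - mx) ** 2 + (s[1] - my) ** 2 for s in sources)

def point_mass_images(beta, theta_e):
    """Two images of a source at (beta, 0) behind a point-mass lens."""
    root = math.sqrt(beta ** 2 + 4.0 * theta_e ** 2)
    return [((beta + root) / 2.0, 0.0), ((beta - root) / 2.0, 0.0)]

# Mock "observed" images generated with theta_E = 1, then evaluated
# with the correct and with a wrong lens parameter.
images = point_mass_images(0.5, 1.0)
good = source_plane_scatter(images, lambda t: point_mass_deflection(t, 1.0))
bad = source_plane_scatter(images, lambda t: point_mass_deflection(t, 1.2))
print(good, bad)
```

In an actual fit, the scatter would be minimized over the lens parameters (mass, scale-lengths, inclination), with the deflection law supplied by the chosen lens model.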

Figure 41

(a) The four images of the quasar Q2237+030 (known as the Einstein cross), gravitationally lensed by an isolated bulge-disk galaxy known as Huchra’s lens [197]. Credit: ESA’s faint object camera on HST. (b) The empty squares denote the four observed positions of the images, and the filled square denotes the MOND-fit unique position of the source [419]. The critical curves for which \(M^{-1} = 0\) in the lens plane are displayed in black, and their corresponding caustics in the source plane in red. Image reproduced by permission from [419].

Because all the above models used the Bekenstein μ-function (α = 0 in Eq. 46), and because this function tends to slightly underpredict stellar mass-to-light ratios in galaxy rotation curve fits [145], it was claimed that this was a sign of a MOND missing mass problem in galaxy lenses [152, 153, 262]. While such a missing mass is indeed possible, and even corroborated by some dynamical studies [364] of galaxies residing inside clusters (i.e., the small-scale equivalent of the problem of MOND in clusters), for isolated systems with well-constrained stellar mass-to-light ratio, the use of the simple μ-function (α = 1 in Eq. 46) has, on the contrary, been shown to yield perfectly acceptable fits [94] in accordance with the lensing fundamental plane [400].

Finally, the probability distribution of the angular separation of the two images in a sample of lensed quasars has been investigated by Chen [90, 91]. This important question has proven somewhat troublesome for the ΛCDM paradigm, but is well explained by relativistic MOND theories [90].

8.2 Weak lensing by galaxies

A gravitational lens not only produces multiple images close to caustics, but also weakly distorted images (arclets) of other background sources. The weak and noisy signals from several individual arclets (not necessarily detected by eye, but rather numerically exploited with the help of image analysis) can be averaged by statistical techniques to get the shear components γ1 and γ2 in Eq. 114 from the mean ellipticity of the images. One can then get the convergence κ from the azimuthal average of the tangential component of the shear. This is what is known as weak lensing. In the case of galaxy-galaxy weak lensing, since the gravitational distortions induced by an individual lens are too small to be detected, one has to resort to the study of the ensemble averaged signal around a large number of lenses. This has been investigated in the context of MOND for a sample of relatively-isolated galaxy-lenses, stacked by luminosity ranges [456]. The derived MOND masses were obtained by fitting a point mass model to the lensing data within a distance of 200 kpc from the lens. While the MOND masses are perfectly compatible with the baryonic masses in all galaxies less luminous than \(10^{11}\,L_\odot\), it was found that the required MOND mass-to-light ratios tended to be slightly too high (M/L ≃ 10) for the most massive and luminous galaxies (\(L > 10^{11}\,L_\odot\)). However, this whole result is dictated by only one data point: the first point strongly “pulls up” the fitted curve, so that all the other data points lie below the “best fit”. Thus, the mass-to-light ratios could easily be scaled down by a factor of two, making these galaxies in perfect agreement with MOND. But it is also worth noting that due to the very large distances probed, the presence of some weakly-clustering residual mass (hot dark matter, or some sort of “dark field” in the relativistic MOND theories) could start playing a role at these distances. 
While ordinary neutrinos are still too weakly clustering, a slightly more massive fermion such as a 10 eV-scale sterile neutrino could cluster on these scales, and, of course, baryonic dark matter in the form of dense molecular gas clouds could also be present around these very massive objects (see Section 6.6.4).
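As an illustration of the stacking step described above, the following minimal sketch rotates the shear components (γ1, γ2) of background sources into tangential and cross components relative to the lens centre, and then averages the tangential part. It assumes the common sign convention γt = −(γ1 cos 2φ + γ2 sin 2φ), with φ the position angle of the source as seen from the lens; the function names are hypothetical and no real survey pipeline is implied.

```python
import math

def tangential_cross_shear(gamma1, gamma2, phi):
    """Rotate (gamma1, gamma2) into tangential and cross components.
    phi is the position angle (radians) of the source relative to the lens.
    Sign convention (assumption): gamma_t > 0 for tangential alignment."""
    c, s = math.cos(2.0 * phi), math.sin(2.0 * phi)
    gamma_t = -(gamma1 * c + gamma2 * s)
    gamma_x = gamma1 * s - gamma2 * c
    return gamma_t, gamma_x

def mean_tangential_shear(samples):
    """Average gamma_t over (gamma1, gamma2, phi) tuples in one radial bin."""
    values = [tangential_cross_shear(g1, g2, phi)[0] for g1, g2, phi in samples]
    return sum(values) / len(values)

# Mock sources in one annulus, all carrying a purely tangential shear of 0.05.
amplitude = 0.05
angles = [2.0 * math.pi * k / 8.0 for k in range(8)]
samples = [(-amplitude * math.cos(2.0 * p), -amplitude * math.sin(2.0 * p), p)
           for p in angles]
print(mean_tangential_shear(samples))
```

For a purely tangential pattern the cross component vanishes, which is why it is often used as a null test for systematics in such analyses.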

Also related to weak lensing, it is important to recall that the “phantom dark matter” of MOND (Eq. 33) can sometimes become negative in cones perpendicular to the direction of the external gravitational field in which a system is embedded: with accurate enough weak-lensing data, detecting these pockets of negative phantom densities around a sample of non-isolated galaxies could, in principle, be a smoking gun for MOND [490], but such an effect would be extremely sensitive to the detailed distribution of the baryonic matter, and finding a sample of galaxies with similar gravitational environments would also be extremely difficult.

8.3 Strong and weak lensing by galaxy clusters

Gravitational lensing is a complementary technique to the hydrostatic equilibrium of the X-ray emitting gas (Section 6.6.4) to probe the mass distribution of galaxy clusters. Since clusters are the most recently formed structures, they could be slightly out of equilibrium, which makes gravitational lensing extremely interesting as this technique is fully independent from the relaxed or unrelaxed nature of the lens. A famous example of such a clearly unrelaxed object is the cluster 1E0657-56, known as the Bullet Cluster (Figure 42). It is actually a pair of clusters which collided at high speed (> 3100 km/s) at z = 0.3. In the collision, the dissipational hot X-ray emitting gas which dominates the baryonic matter was separated from the negligible and collisionless galaxies and any presumed collisionless dark matter. Using background galaxies to map the shear field, the convergence map of the cluster was provided by [102], a convergence very conspicuously centered where the collisionless dark matter should be (see Footnote 59). It would appear difficult to reconstruct such a configuration merely by modifying gravity, but the non-linearity of MOND does not guarantee that the convergence from a two-center baryonic distribution would indeed be centered on the two centers. Indeed, while the linear relation between the matter density and the gravitational potential implies that the convergence parameter is a direct measurement of the projected surface density in the weak-field limit of GR, this is not the case anymore in MOND due to the non-linearity of the modified Poisson equation. Actually, it has been shown that, in MOND, it is possible to have a non-zero convergence along a line of sight where there is zero projected matter [15]. 
What is more, the gravitational environment might play an important role on the internal gravitational field too [113, 259], and the additional degrees of freedom of the various relativistic theories might play a non-negligible role, especially in non-static situations [112]. Neglecting possible effects of the gravitational environment and non-trivial features of the additional fields of the relativistic theories out of equilibrium, i.e., simply assuming that the physical metric is given by Ψ = −Φ in Eq. 73, and that Φ obeys Eq. 17, a MOND model of the bullet cluster was produced [17], in which a parametrized potential was fitted to the convergence map to then determine the underlying mass distribution from Eq. 17. The result is displayed in Figure 43, and exactly the same conclusion was reached by going from the baryonic density to the convergence map [147]. The main conclusions are that (i) the amount of residual missing mass needed to account for the convergence map of the bullet cluster is the same as in all other clusters (Section 6.6.4 and [449]), but that (ii) if it is made of dark baryons, they must be in a collisionless form, since the residual missing mass is centered on the collisionless galaxies and not on the dissipational hot gas. The dense molecular gas clouds proposed by Milgrom [310] (see discussion in Section 6.6.4) satisfy this criterion, and would mostly behave like individual stars. Like in most clusters with T > 4 keV, ordinary neutrinos with a 2 eV mass would be broadly sufficient to account for the missing mass deduced from weak lensing (and, obviously, heavier exotic hot dark matter particles such as 10 eV sterile neutrinos would do the job too).

Figure 42

The bullet cluster 1E0657-56. The hot gas stripped from both subclusters after the collision is colored red-yellow. The green and white curves are the isocontours of the lensing convergence parameter κ (Eq. 113). The two peaks of κ do not coincide with those of the gas, which makes up most of the baryonic mass, but are skewed in the direction of the galaxies. The white bar corresponds to 200 kpc. Image courtesy of Clowe, reproduced by permission from [102], copyright by AAS.

Figure 43

A MOND model of the bullet cluster [17]. The fitted κ-map (solid black lines) is overplotted on the convergence map of [102] (dotted red lines). The four centers of the parametrized potential used are the red stars. Also overplotted (blue dashed line) are two contours of surface density. Note slight distortions compared to the contours of κ. The green shaded region corresponds to the clustering of 2 eV neutrinos. Inset: The surface density of the gas in the model of the bullet cluster. Image reproduced by permission from [17], copyright by AAS.

For TeVeS (Section 7.4) and GEA (Section 7.7), the growth of the spatial part of the vector perturbation in the course of cosmological evolution can successfully seed the growth of baryonic structures, just as dark matter does, and it is possible to reconstruct the gravitational field of the bullet cluster without any extra matter but with a substantial contribution from the vector field. However, why the dynamical evolution of the vector field perturbations would lead to precisely such a configuration remains unclear. Similarly, the massive scalar field of Section 7.6 or the monopolar part of the dipolar DM of Section 7.9 could, in principle, provide the off-centered missing mass too, but again, why they would appear distributed as they do remains unclear, especially in the case of dipolar DM, which is supposed to cluster only very weakly, and, in principle, not to appear as densely clustered. Whether the twin matter of BIMOND (Section 7.8) could help providing the right convergence map also remains to be seen, while for non-local models (Section 7.10), there is a strong dependence on the past light-cone, meaning that recently-disturbed systems, such as the Bullet, may be far from the static MOND limit (but in that case, it would not be clear why all the other clusters from Section 6.6.4 exhibit the same amount of residual missing mass). So, while the bullet cluster clearly does not represent the MOND-killer that it was supposed to be, explaining its convergence map remains an outstanding challenge for all MOND theories. However, the bullet cluster also represents an outstanding challenge to ΛCDM (see Section 4.2), due to its high collision speed [249]. In that respect, MOND is much more promising [16].

On the other hand, a comprehensive weak lensing mass reconstruction of the rich galaxy cluster Cl0024+17 at z = 0.4 [211] has been argued to have revealed the first dark matter structure that is offset from both the gas and galaxies in a cluster. This structure is ringlike, located between r ∼ 60″ and r ∼ 85″. It was, again, argued to be the result of a collision of two massive clusters 1–2 Gyr in the past, but this time along the line-of-sight. It has also been argued [211] that this offset was hard to explain in MOND. Assuming that this ringlike structure is real and not caused by instrumental bias or spurious effects in the weak lensing analysis (due, e.g., to the unification of strong and weak-lensing or to the use of spherical/circular priors), and that cluster stars and galaxies do not make up a high fraction of the mass in the ring (which would be too faint to observe anyway), it has been shown that, for certain interpolating functions with a sharp transition, this is actually natural in MOND [325]. A peak in the phantom dark matter distribution generically appears close to the transition radius of MOND \(r_t = (GM/a_0)^{1/2}\), especially when most of the mass of the system is well-contained inside this radius (which is the case for the cluster Cl0024+17). This means that the ring in Cl0024+17 could be the first manifestation of this pure MOND phenomenon, and thus be a resounding success for MOND in galaxy clusters. However, the sharpness of this phantom dark matter peak strongly depends on the choice of the μ-function, and for some popular ones (such as the “simple” μ-function) the ring cannot be adequately reproduced by this pure MOND phenomenon. In this case, a collisional scenario would be needed in MOND too, in order to explain the feature as a peak of cluster dark matter. Indeed, we already know that there is a mass discrepancy in MOND clusters, and we know that this dark matter must be in collisionless form (e.g., neutrinos or dense clumps of cold gas). 
So the results of the simulation with purely collisionless dark particles [211] would surely be very similar in MOND gravity. Again, it was shown that the density of missing mass was compatible with 2 eV ordinary neutrinos, like in most clusters with T > 4 keV [139]. Finally, let us note that strong lensing was also recently used as a robust probe of the matter distribution on scales of 100 kpc in galaxy clusters, especially in the cluster Abell 2390 [149]. A residual missing mass was again found, compatible with the densities provided by fermionic hot dark matter candidates only for masses of ∼ 10 eV and heavier. All in all, the problem posed by gravitational lensing from galaxy clusters is thus very similar to the one posed by the temperature profiles of their X-ray emitting gas (Section 6.6.4), and remains one of the two main current problems of MOND, together with its problem at reproducing the CMB anisotropies (see Section 9.2).
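To give a feeling for the scale of the MOND transition radius (GM/a0)^(1/2) mentioned above, the following minimal sketch evaluates it numerically. The value a0 ≈ 1.2 × 10⁻¹⁰ m/s² and the example masses are illustrative assumptions made here, not values fitted to Cl0024+17 or to any other system.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
A0 = 1.2e-10         # MOND acceleration scale in m/s^2 (assumed value)
M_SUN = 1.989e30     # solar mass, kg
KPC = 3.0857e19      # one kiloparsec, m

def transition_radius_kpc(mass_msun, a0=A0):
    """Radius at which the Newtonian acceleration GM/r^2 equals a0."""
    return math.sqrt(G * mass_msun * M_SUN / a0) / KPC

# Illustrative masses only: a galaxy-scale and a cluster-scale baryonic mass.
for mass in (1e11, 1e14):
    print(f"M = {mass:.0e} Msun -> r_t = {transition_radius_kpc(mass):.1f} kpc")
```

Since the radius scales as the square root of the mass, a galaxy-scale mass gives a transition radius of order 10 kpc, while a cluster-scale mass gives one of a few hundred kpc.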

Finally, let us note in passing that another (non-lensing) test of relativistic MOND theories in galaxy clusters has been performed by analysing the gravitational redshifts of galaxies in 7800 galaxy clusters [489], which were originally found to be difficult to reconcile with MOND: however, this original analysis assumed a distribution of residual missing mass in MOND by simply scaling down the Newtonian dynamical mass represented by a NFW halo by a factor 0.8, and the analysis confused the interpolating functions μ(x) and \(\tilde \mu (x)\) (see Section 6.2). A subsequent analysis [41] showed that these gravitational redshifts were in accordance with relativistic MOND when the correct residual mass and acceptable μ-functions were used.

8.4 Weak lensing by large-scale structure

The weak-lensing method can also be applied on larger scales, i.e., mapping the shear-field induced by large-scale structures. On these scales, the metric of the expanding-universe-forming-structure is well represented by a Newtonianly perturbed FLRW metric:

$${g_{0i}} = {g_{i0}} = 0,\;\;{g_{00}} = - (1 + 2\Phi),\;\;{g_{ij}} = a{(t)^2}(1 + 2\Psi){\delta _{ij}},$$
(116)

where a(t) is the scale factor. Like in the static weak-field case (Eq. 73), Φ is the non-relativistic potential in units of \(c^2\), but the equality Ψ = −Φ in Eq. 73 does not necessarily imply the equality in Eq. 116. In GR, this equality is actually respected for both cases (apart from perturbations around a FLRW background sourced by anisotropic stress), but the relativistic MOND theories, which have been constructed in order to yield the equality for the static weak-field limit in Eq. 73, do not harbor this equality in the perturbed FLRW case, and the quantity Φ + Ψ is referred to as the gravitational slip. For instance, in the TeVeS (Sections 7.3 and 7.4) and GEA (Section 7.7) theories, based on unit-norm vector fields, the equality is broken due to the growth of vector perturbations in the course of cosmological evolution (see, e.g., [128] and Section 9.2).

Like in the static case, weak gravitational lensing from large-scale structure will actually depend on Φ − Ψ, whereas galaxy clustering will arise only from the non-relativistic potential Φ. By combining information on the matter overdensity at a given redshift (obtained by measuring the peculiar velocity field) and on the weak lensing maps, Zhang et al. [498] proposed a clever method to observationally estimate Φ − Ψ. This allowed Reyes et al. [362] to use luminous red galaxies in the SDSS survey to exclude one model from the original TeVeS theory (Section 7.3) with the original f(X) function of [33], thus explicitly showing how such measurements could be a possible future smoking-gun for all theories based on dynamical vector fields. But note that other MOND theories such as BIMOND would not be affected by such measurements.
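The respective roles of the two potentials can be summarized schematically. The following is only a restatement of the relations described in the text, in the sign convention of Eq. 116, and not an additional result:

$$\text{clustering} \leftrightarrow \Phi, \qquad \text{lensing} \leftrightarrow \Phi - \Psi, \qquad \text{slip} \equiv \Phi + \Psi; \qquad \Psi = - \Phi \;\Rightarrow\; \Phi - \Psi = 2\Phi, \;\; \Phi + \Psi = 0.$$

A non-zero slip therefore shows up as a mismatch between the potential inferred from lensing and twice the potential inferred from clustering and peculiar velocities.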

However, let us finally note a caveat in the interpretation of the weak lensing shear map in the context of relativistic MOND. While intercluster filaments negligibly contribute to the weak lensing signal in GR, a single filament inclined by π/4 from the line of sight can cause substantial distortion of background sources pointing toward the filament’s axis in relativistic MOND theories [148]. Since galaxies are generally embedded in filaments or are projected on such structures, this contribution should be taken into account when interpreting weak lensing data. This additional difficulty for interpreting weak-lensing data in MOND applies not only to filaments, but more generally to all low-density structures such as sheets and voids.

9 MOND and Cosmology

9.1 Expansion history

A viable theory of modified gravity, including dark fields or not, should not only be able to reproduce observations in quasi-stationary galactic and extragalactic systems, but also to reproduce all of the major probes of observational cosmology, including (i) the Hubble diagram out to large z, (ii) the anisotropies in the cosmic microwave background (CMB), and (iii) the matter power spectrum on large scales. The first requires a detailed knowledge of FLRW cosmology, and the last two a knowledge of cosmological perturbations on a FLRW background.

Concerning the first point, the FLRW solutions have been extensively studied for TeVeS (Sections 7.3 and 7.4, see, e.g., [70]) and GEA (Section 7.7, see, e.g., [515]) theories, for BIMOND (Section 7.8, see, e.g., [101]), and for theories based on dipolar dark matter (Section 7.9, see, e.g., [60]). In the latter case, the theory [58, 60] has been shown to be strictly equivalent to ΛCDM out to first-order cosmological perturbations (but very different in the galaxy formation regime), together with a natural explanation for \(\Lambda \sim a_0^2\). For the other theories, it has been shown that the contribution of the extra fields to the overall expansion is subdominant to the baryonic mass and does not affect the overall expansion [151]. Such theories can predict an extremely wide range of cosmological behavior, ranging from accelerated expansion to contraction on a finite time scale [70]. The key point is that the expansion history mainly depends on the form of the “MOND function” f(X) for the unconstrained domain X < 0 in any of these theories.

For instance, in TeVeS, \(X \propto (\nabla \phi)^2 > 0\) in static configurations (see Eq. 85), and \(X \propto -2(\partial \phi/\partial t)^2\) in evolving homogeneous and isotropic configurations such as the expanding universe. The form of f(X) is clearly constrained from the MOND phenomenology only for X > 0, meaning that a lot of freedom exists for X < 0. Exactly the same is true in GEA and BIMOND theories [101]. For instance, Bekenstein [33] originally proposed for TeVeS an f′-function (corresponding to \({\tilde \mu}\), see Eq. 79) with a discontinuity at X = 0 (the B04 function on Figure 44) not enabling galaxies to collapse continuously out of the Hubble expansion. Afterwards, Zhao & Famaey proposed an improved “mirror-function” f′(X) such that the corresponding \({\tilde \mu}\)-function reproduces the simple μ-function (α = 1 in Eq. 46) for X > 0, and f(X) = f(−X) for the cosmological regime X < 0 (see Figure 44), leading to an acceptable expansion history. However, when connecting a static galaxy to the expanding universe, the limit \(\tilde \mu (0) = 0\) would predict the existence of a singular surface around each galaxy on which the scalar degree of freedom does not propagate, meaning that it is better to reconnect the two sides at \(\tilde \mu (0) = \varepsilon\) (see Section 6.2). In addition, the integration constant f(0) can play the role of the cosmological constant [184] to drive accelerated expansion, but even some f(0) = 0 models can drive late-time acceleration [125], which is not surprising since k-essence scalar fields were also introduced to address the dark energy problem. In the case of BIMOND (see Section 7.8), a symmetric matter-twin matter early universe yields a cosmological constant through the zero-point of the MOND function, thereby naturally leading to \(\Lambda \sim a_0^2\).

Figure 44

In solid blue, the Zhao-Famaey [508] \(\tilde \mu ({s})\)-function (Eq. 79) of TeVeS (Sections 7.3 and 7.4), compared to the original Bekenstein one (dashed green) with a discontinuity at s = 0 [33]. The ZF function provides a more natural transition from static systems (the positive side) to cosmology (the negative side).

All in all, with the additional freedom of a hypothetical dark component in the matter sector, in the form of, e.g., ordinary or sterile neutrinos, playing with the form of f(X) for X < 0 in TeVeS, GEA and BIMOND always allows one to reproduce an expansion history and a Hubble diagram almost precisely identical to ΛCDM, justifying the assumption made in Section 8 of an expansion history for gravitational lensing in relativistic MOND. However, it is important to note that MOND theories are not providing a unique prediction on this.

9.2 Large-scale structure and Cosmic Microwave Background

Modified gravity theories should, of course, not only produce a reasonable Hubble expansion but also reproduce the observed anisotropies in the CMB, and the matter power spectrum. Taken at face value, these require not only dark matter, but non-baryonic cold dark matter. Any alternative theory must account for these, just as dark matter models need to explain galaxy scale phenomenology.

Using the hypothesis that the universe is filled with some form of cold dark matter, it is possible to simultaneously fit observations of the CMB [229] and provide an elegant picture for the growth of large scale structure [444]. Thus, an obvious question is how MOND fares with these subjects. Of course, as we have seen, there is no unique existing MOND theory (Section 7), and the basic theory underlying MOND as a paradigm is probably yet to be found. Nevertheless, we can make a few general considerations about how any MOND theory should behave, and then look in more detail at specific predictions from existing relativistic theories. The general picture is that, in some ways MOND does surprisingly well, in others it clearly gives no real unique prediction as yet, and in still others it appears to fail outright.

If one alters the force law as envisioned by MOND, the effective long-range force becomes stronger. Though details will, of course, depend on the specific relativistic theory, we can speculate about the consequences of a MOND-like force in cosmology. Note, however, that most of what follows cannot be rigorously justified at the moment for lack of a compelling unique underlying theory. But, obviously, because of the stronger force, dynamical measures of the cosmic mass density will be overestimated, just as in galaxies. Applying MOND to the peculiar motions of galaxies yields Ωm ≈ Ωb [279]. There are large uncertainties in estimating the extragalactic peculiar acceleration field, so this merely shows that MOND might alleviate the need for non-baryonic dark matter inferred conventionally from Ωm > Ωb.

The stronger effective gravitational attraction of MOND would change the growth rate of perturbations. Instead of adding dark mass to speed the growth of structure, we now rely on the modified force law to do the work. While it is obvious that MOND will form structures more rapidly than conventional gravity with the same source perturbation, we immediately encounter a challenge posed by the non-linear nature of the theory, precluding an easy linear perturbation analysis. One can nevertheless sketch a naive overview of how structure might form under the influence of MOND. The following picture emerges from numerical calculations of particles interacting under MOND in an assumed background [386, 341, 226, 250], and is thus obviously slightly (or very) different from the various relativistic MOND theories of Section 7 and from those yet to be found, especially from those MONDian theories involving the existence of some form of dark matter (twin matter, dipolar dark matter, etc.). In the early universe, perturbations cannot grow because the baryons are coupled to the photon fluid. The mass density is lower, so matter domination occurs later than in ΛCDM. Consequently, MOND structure formation initially has to lag behind ΛCDM at very high redshift (z > 200). However, as the influence of the photon field declines and perturbations begin to enter the MOND regime, structure formation rapidly speeds up. Large galaxies may form by z ≈ 10 and clusters by z ≈ 2 [11, 386], considerably earlier than in ΛCDM. By z = 0, the voids have become more empty than in ΛCDM, but otherwise simulations (of collisionless particles, which is, of course, not the best representation of the baryon fluid) show the same qualitative features of the cosmic web [226, 250]. This similarity is not surprising since MOND is a subtle alteration of the force law. The chief difference is in the timing of when structures of a given mass appear, it being easier to assemble a large mass early in MOND. 
This means that MOND is promising in addressing many of the challenges of Section 4.2, namely the high-z clusters challenge [11] and Local Void challenge, as well as the bulk flow challenge and high collisional velocity of the bullet cluster [16, 251], again due to the much-larger-than-Newtonian MOND force in the structure formation context. What is more, it could allow large massive galaxies to form early (z ≈ 10) from monolithic dissipationless collapse [393], with well-defined relationships between the mass, radius and velocity dispersion. Consequently, there would be fewer mergers than in ΛCDM at intermediate redshifts, in accordance with constraints from interacting galaxies (see Section 6.5.3), which could explain the observed abundance of large thin bulgeless disks unaffected by major mergers (see Section 4.2), and in those rare mergers between large spirals, tidal dwarf galaxies would be formed and survive more easily (see Section 6.5.4). This could lead to the intriguing possibility that most dwarf galaxies are not primordial but have been formed tidally in these encounters [239]. These populations of satellite galaxies, associated with globular clusters that formed along with them, would naturally appear in (more than one) closely related planes (because a gas-rich galaxy pair undergoes many close encounters in MOND before merging, see Section 6.5.3), thereby perhaps providing a natural solution to the Milky Way satellites phase-space correlation problem of Section 4.2. What is more, the density-morphology relation for dwarf ellipticals (more dE galaxies in denser environments [239]), observed in the field, in galaxy groups and in galaxy clusters could also find a natural explanation.

Actually, the chief problem seems not to be forming structure in MOND, but the danger of over-producing it [341, 401]. The amplitude of the power spectrum is well measured at z = 1091 in the CMB and at z ≈ 0 by surveys like the Sloan Digital Sky Survey. Simulations normalized to the CMB overproduce the structure at z = 0 by a factor of ∼ 2. Given the uncertainty in the parent relativistic theory and hence the appropriate form of the expansion history, this seems remarkably close. Given the non-linear nature of the theory, MOND could easily have been wrong by many orders of magnitude in this context. Nevertheless, it may be necessary to somehow damp the growth of structure at late times [401]. In this regard, a laboratory measurement of the ordinary neutrino mass might be relevant. Conventional structure cannot form in ΛCDM if mν > 0.2 eV [229]. In contrast, some modest damping from a non-trivial neutrino mass might be desirable in MOND, and is also relevant to the CMB and clusters of galaxies (see Section 6.6.4).

In addition to mapping the growth factor as a function of redshift, one would also like to predict the power spectrum of mass fluctuations as a function of scale at a given epoch. It is certainly possible to match the power spectrum of galaxies at z = 0 [401], but because of MOND’s non-linearity and the uncertainty in the background cosmology, it is rather harder to know if such a match faithfully represents a viable theory. Indeed, a natural prediction of baryon-dominated cosmologies is the presence of strong baryon acoustic oscillations in the matter power spectrum at z = 0 [267, 127]. Dodelson [127] portrays this as a problem, but as already pointed out in [267], the non-linearity of MOND can lead to mode mixing that washes out the initially strong signal by z = 0. A more interesting test would be provided by the galaxy power spectrum at high redshift (z ∼ 5). This is a challenging observation, as one needs both a large survey volume and high resolution in k-space. The latter requirement arises because the predicted features in the power spectrum are very sharp. The window functions necessarily employed in the analysis of large scale structure data are typically wider than the predicted features. Convolution of the predicted power spectrum with the SDSS analysis procedure [326] shows that essentially all the predicted features wash out, with the possible exception of the strongest feature on the largest scale. This means that the BAO signal detected by SDSS and consistent with ΛCDM [135] could also be interpreted as a confirmation of the prediction [267] of such features (see Footnote 60) in MOND. However, there is no definitive requirement that the BAO appears at the same scale as observed, or that it survives at all. In relativistic theories such as TeVeS (Sections 7.3 and 7.4), damping of the baryonic oscillations can be taken care of by parameters of the theory such as in original TeVeS (Eq. 84, see Figure 3 of [430]) or the ci coefficients in generalized TeVeS (Eq. 87). In any case, as in standard cosmology, the angular power spectrum of the CMB should be a cleaner probe.

A first attempt to address the CMB was made before the existence of relativistic theories with a simple ansatz [265]: just as MOND returns precisely Newton in high accelerations, so any parent theory should contain GR (almost exactly, although this is not precisely the case for, e.g., TeVeS) in the appropriate strong-field limit. An obvious first assumption is that MOND effects do not yet appear in the very early universe, so that pure GR suffices for calculations concerning the CMB. The chief difference between ΛCDM and a MONDian cosmology is then just the presence or absence of non-baryonic cold dark matter. With this ansatz, we can make one robust prediction: the shape of the acoustic power spectrum should follow pure baryonic diffusion damping. There is no net forcing term, as provided by the extra degree of freedom of non-baryonic cold dark matter. Thus, with nothing but baryons, each acoustic peak should be lower than the previous one [425] as part of a simple damping tail (Figure 45). In contrast, there must be evidence of forcing present in a power spectrum where CDM outweighs the baryons.

Figure 45

The acoustic power spectrum of the cosmic microwave background as observed by WMAP [229] together with the a priori predictions of ΛCDM (red line) and no-CDM (blue line) as they existed in 1999 [265] prior to observation of the acoustic peaks. ΛCDM correctly predicted the position of the first peak (the geometry is very nearly flat) but over-predicted the amplitude of both the second and third peak. The most favorable a priori case is shown; other plausible ΛCDM parameters [468] predicted an even larger second peak. The most important parameter adjustment necessary to obtain an a posteriori fit is an increase in the baryon density Ωb, above what had previously been expected from BBN. In contrast, the no-CDM model ansatz made as a proxy for MOND successfully predicted the correct amplitude ratio of the first to second peak with no parameter adjustment [268, 269]. The no-CDM model was subsequently shown to under-predict the amplitude of the third peak [442].

The densities of both the baryons and the non-baryonic cold dark matter are critical to the shape of the acoustic power spectrum. For a given baryon density, models with CDM will have a larger second peak than models without it. Similarly, the third peak is always lower than the second in purely baryonic models, while it can be either higher or lower in CDM models, depending on the mix of each type of mass. Moreover, both parameters were well constrained prior to observation of the CMB [468]: Ωb from BBN [480] and Ωm from a variety of methods [116]. Therefore, it seemed like a straightforward exercise to predict the difference one should observe. The most robust prediction that could be made was the ratio of the amplitude of the first to second acoustic peak [265]. For the range of baryon and dark matter densities allowed at the time, ΛCDM predicted a range in this ratio anywhere from 1.5 to 1.9. That is, the first peak should be almost but not quite twice as large as the second, with the precise value containing the information necessary to much better constrain both density parameters. For the same baryon densities allowed by BBN but no dark matter, the models fell in a distinct and much narrower range: 2.2 to 2.6, with the most plausible value being 2.4. The second peak is smaller (so the ratio of first to second higher) because there is no driving term to counteract baryonic damping. In this limit, the small range of relative peak heights follows directly from the narrow range in Ωb from BBN.

The BOOMERanG experiment [117] provided the first data capable of testing this prediction, and was in good agreement with the no-CDM prediction [268]. This result was subsequently confirmed by WMAP, which measured a ratio 2.34 ± 0.09 [345]. This is in good quantitative agreement with the prediction of the no-CDM ansatz, and outside the range first expected in ΛCDM. ΛCDM can nevertheless provide a good fit to the CMB power spectrum. The chief parameter adjustment required to obtain a fit is the baryon density, which must be increased: this is the reason for the near doubling of the long-standing value Ωbh2 = 0.0125 [480] to the more recent Ωbh2 = 0.02249 [229].
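
As a rough numerical check of the comparison above, the following sketch tests the quoted WMAP ratio against the two a priori ranges given in the text. The function and variable names are purely illustrative, and expressing the offset in units of the quoted uncertainty assumes a Gaussian error, which the source does not state explicitly.

```python
# A priori ranges for the ratio of first to second acoustic peak amplitude,
# as quoted in the text.
LCDM_RANGE = (1.5, 1.9)
NO_CDM_RANGE = (2.2, 2.6)

# WMAP measurement quoted in the text: 2.34 +/- 0.09.
WMAP_RATIO = 2.34
WMAP_SIGMA = 0.09


def offset_in_sigma(value, sigma, interval):
    """Distance from `value` to the nearest edge of `interval`, in units of sigma.

    Returns 0.0 when the value lies inside the interval.
    (Illustrative helper; treating sigma as Gaussian is an assumption.)
    """
    low, high = interval
    if value < low:
        return (low - value) / sigma
    if value > high:
        return (value - high) / sigma
    return 0.0


lcdm_offset = offset_in_sigma(WMAP_RATIO, WMAP_SIGMA, LCDM_RANGE)
no_cdm_offset = offset_in_sigma(WMAP_RATIO, WMAP_SIGMA, NO_CDM_RANGE)

# Increase in baryon density needed for the LCDM fit, from the two values
# of Omega_b h^2 quoted in the text.
baryon_increase = 0.02249 / 0.0125

print(f"offset from LCDM range:   {lcdm_offset:.1f} sigma")
print(f"offset from no-CDM range: {no_cdm_offset:.1f} sigma")
print(f"baryon density increase:  x{baryon_increase:.2f}")
```

Under these assumptions the measured ratio sits inside the no-CDM range and several quoted standard deviations above the upper edge of the original ΛCDM range, while the baryon density rises by a factor of about 1.8.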

A critical question is whether the baryon density required by ΛCDM is consistent with the independently-measured abundances of the light isotopes. This question is explored in Figure 46. Historically, no isotope suggested a value Ωbh2 > 0.02 prior to fits to the CMB requiring such a high value. This is an important fact to bear in mind, since cosmology has a long tradition of confirmation biasFootnote 61. More recent measurements of deuterium and helium are consistent with the high baryon density required by ΛCDM fits to the CMB. Lithium persistently suggests a lower baryon density, consistent with pre-CMB values. If we are convinced of the correctness of ΛCDM, then it is easy to dismiss this as some peculiarity of stars: if exposed to the high temperatures in the cores of stars by turbulent mixing, lithium might be depleted from its primordial value. If we are skeptical of ΛCDM, then it is no surprise that measurements of the primordial lithium abundance return the same value now as they did before. From the perspective of the no-dark matter MOND view, the CMB, lithium, deuterium, and helium all give a consistent baryon density given the uncertainties.

Figure 46

Estimates of the baryon density Ωbh2 [where h = H0/(100 km s−1 Mpc−1)] over time (updated [273] from [269]). BBN was already a well-established field prior to 1995; earlier contributions are summarized by compilations (green ovals [480, 107]) that gave the long-lived standard value Ωbh2 = 0.0125 [480]. More recent estimates from individual isotopes are shown as triangles (2H), squares (4He), diamonds (3He), and stars (7Li). Estimates of the baryon density based on analyses of the cosmic microwave background are shown by circles (dark blue for ΛCDM; light blue for no-CDM). No measurement of any isotope suggested a value greater than Ωbh2 = 0.02 prior to observation of the acoustic peaks in the microwave background (dotted lines), which might be seen as a possible illustration of confirmation bias. Fitting the acoustic peaks in ΛCDM requires Ωbh2 > 0.02. More recent measurements of 2H and 4He have migrated towards the ΛCDM CMB value, while 7Li remains persistently problematic [111]. It has been suggested that turbulent mixing might result in the depletion of primordial lithium necessary to reconcile lithium with the CMB (upward pointing arrow [287]), while others [405] argue that this would merely reconcile some discrepant stars with the bulk of the data defining the Spite plateau, which persists in giving a 7Li abundance discrepant from the ΛCDM CMB value. In contrast, the amplitude of the second peak of the microwave background is consistent with no-CDM and Ωbh2 = 0.014 ± 0.005 [269]. Consequently, from the perspective of MOND, the CMB, lithium, deuterium, and helium all give a consistent baryon density given the uncertainties.

However, the no-CDM ansatz must fail at some point. It could fail outright if the parent MOND theory deviates substantially from GR in the early universe. However, the more obvious [265] points of failure are rather due to the anticipated early structure formation in MOND discussed above. This should lead, in a true MOND theory, to early re-ionization of the universe and an enhancement of the integrated Sachs-Wolfe effect. Evidence for both these effects is present in the WMAP data [269]. Indeed, it turns out to be rather easy, and perhaps too easy, to enhance the integrated Sachs-Wolfe (ISW) effect in theories like TeVeS or GEA [430, 516]. Nevertheless, early re-ionization is an especially natural consequence of MOND structure formation that was predicted a priori [265]. In contrast, structure is expected to build up more slowly in ΛCDM such that obtaining the observed early re-ionization implies that the earliest objects to collapse were ∼ 50 times as efficient at converting mass to ionizing photons as are collapsed objects at the present time [435].

One prediction of the no-CDM ansatz that should not obviously fail is that the third peak should be smaller than the second peak of the acoustic power spectrum of the CMB. In a universe governed by MOND rather than cold dark matter, there is no obvious non-baryonic mass that is decoupled from the photon-baryon fluid. Therefore, it is a strong expectation that we observe only baryonic damping in the power spectrum, and each peak should be smaller in amplitude than the previous one. Contrary to this expectation, WMAP observes the third peak to be nearly equal in amplitude to the second [442, 229]. This approximate equality of the second and third peaks falsifies the simple no-CDM ansatz.

The PLANCK mission should soon report a new and much higher resolution measurement of the CMB acoustic power spectrum. It is conceivableFootnote 62 that improved data will reveal a different power spectrum. A third peak as low as that expected in the no-CDM ansatz would be one of the few observations capable of clearly falsifying the existence of cosmic non-baryonic dark matter. A more likely result is basic confirmation of existing observations with only minor tweaks to the exact power spectrum. Such a result would have little impact on the discussion here as it would simply confirm the need for some degrees of freedom in relativistic MOND theories that can play a role analogous to CDM. However, the uncertainties on the best fit cosmological parameters may become negligibly small. Precise as current data are, cosmology (with the exception of BBN) is still far from being over-constrained. Hopefully, PLANCK data will be sufficiently accurate that they either agree or clearly do not agreeFootnote 63 with a host of other observations.

Presuming nothing substantial changes in the CMB data, we must understand the net forcing term in the acoustic oscillations leading to a high third peak. This might be taken in one of three ways:

  (i) Practical falsification of MOND,

  (ii) Proof of the existence of some form of non-baryonic matter particles,

  (iii) An indication of some necessary additional freedom in relativistic parent theories of MOND, playing the role of the non-baryonic mass in the CMBFootnote 64.

Tempting as the first case (i) is [432], we cannot know whether the CMB falsifies MOND until we have exhaustively explored the predictions of relativistic parent theories (Section 7). The possibility of true non-baryonic mass (ii) seems inelegant, although a modification of gravity and the existence of non-baryonic dark matter are not mutually exclusive concepts. What is more, there is one obviously existing form of non-baryonic mass that may be relevant on cosmic scales: neutrinos. If \({m_\nu} \approx \sqrt {\Delta {m^2}}\) [434], then the neutrino mass is too small to be of interest in this context. However, as discussed above, a modest neutrino mass may help to prevent MOND from over-predicting the growth of structure. Independently, a mass mν ≈ 1 eV to 2 eV for the three neutrino species provides a good match to the width of the acoustic peaks of the CMB [269], which are otherwise too wide in a purely baryonic universe. Note that it also provides a match to the missing mass in galaxy clusters of T > 4 keV (see Section 6.6.4). However, this neutrino mass is inadequate to explain the relatively high third peak in the no-DM ansatz. Obtaining a match to that instead requires a neutrino mass (for only one species) of ∼ 10 eV [9]. Such a large mass violates experimental constraints on the ordinary neutrino mass [234], but it may be possible to have a sterile neutrino with a mass in that ballpark [26]. As strange as this sounds, it provides a good fit to the CMB (Figure 47), and it may provide the unseen mass in all clusters and groups (see Section 6.6.4 [13, 11]). Experiments that can address the existence of such a particle would thus be very interesting [288], although in the meantime it is perhaps best to view it merely as the encapsulation of our ignorance about cosmology in modified gravity theories, much as dark energy currently plays the same role in conventional cosmology. The fit of Figure 47 [9] is at least a proof of concept that cold DM is definitely not required by the CMB alone.
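
To give a feel for the mass scales just discussed, the sketch below converts neutrino masses into a cosmological density using the standard relation Ω_ν h² = Σm_ν / 93.14 eV for neutrinos that were once in thermal equilibrium. That relation and its coefficient are assumptions brought in for illustration and are not quoted in the text; in particular, applying it to a sterile neutrino presumes that the particle was fully thermalized, which depends on how it is produced. The function name is hypothetical.

```python
# Standard relation for relic neutrinos that were once in thermal equilibrium:
#   Omega_nu * h^2 = (sum of masses in eV) / 93.14
# The coefficient is an assumption of this sketch, not a value from the text.
EV_PER_UNIT_DENSITY = 93.14

# Baryon density from LCDM fits to the CMB, as quoted in the text.
OMEGA_B_H2 = 0.02249


def omega_nu_h2(masses_ev):
    """Density parameter (times h^2) for a list of neutrino masses in eV."""
    return sum(masses_ev) / EV_PER_UNIT_DENSITY


# Three ordinary species at the upper end of the 1-2 eV range in the text.
three_ordinary = omega_nu_h2([2.0, 2.0, 2.0])

# One sterile species at the 11 eV of the fit shown in Figure 47
# (assumed fully thermalized, see the caveat above).
one_sterile = omega_nu_h2([11.0])

print(f"three 2 eV neutrinos:  Omega h^2 = {three_ordinary:.3f}")
print(f"one 11 eV neutrino:    Omega h^2 = {one_sterile:.3f}")
print(f"sterile / baryons:     {one_sterile / OMEGA_B_H2:.1f}")
```

Under these assumptions a single 11 eV species would outweigh the baryons by a factor of a few, which is the regime in which a net forcing term in the acoustic oscillations becomes plausible.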

Figure 47

CMB data as measured by the WMAP satellite year five data release (filled circles) and the ACBAR 2008 data release (triangles). Dashed line: ΛCDM fit. Solid line: HDM fit with a sterile neutrino of mass 11 eV. Image courtesy of Angus, reproduced by permission from [9].

Perhaps the most intriguing possibility is (iii), that the height of the third peak is providing a glimpse of some new aspect of modified gravity theories. As we have seen, generalizations of GR seeking to incorporate MONDian phenomenology must, perforce, introduce either non-locality (Section 7.10), or new degrees of freedom in local theories. It is at least conceivable that these new degrees of freedom result in the net driving of the acoustic oscillations that is implied by the departure from pure baryonic damping. For instance, Dodelson & Liguori [128] have shown that in TeVeS (Sections 7.3 and 7.4) or GEA (Section 7.7) theories, based on unit-norm vector fields, the growth of the spatial part of the vector perturbation in the course of cosmological evolution acts as an additional seed akin to non-baryonic dark matterFootnote 65 (but unlike dark matter, its energy density is subdominant to the baryonic mass). It has been shown that, with the help of this effect prior to baryon-photon decoupling, it is actually possibleFootnote 66 to produce as high a third peak as the second one in TeVeS and GEA theories without non-baryonic dark matter, but at the cost of leading to unacceptably high temperature anisotropies in the CMB on large angular scales, due to an over-enhanced ISW effect [430, 516]. Indeed, when making the effect of the growth of the perturbed vector modes large, one also generates [151, 409, 498] a large gravitational slip (see Section 8.4) in the perturbed FLRW metric (Eq. 116), which in turn leads to enhanced ISWFootnote 67. For this reason, acceptable fits to the CMB in TeVeS or GEA still need to appeal to non-baryonic mass [430]. In this case, ordinary neutrinos within their model-independent mass-limit [234] are sufficient, thoughFootnote 68. However, the gravitational slip could soon make it possible to exclude at least some of these models from combined information on the matter overdensity and weak lensing [362, 498]. An important caveat, however, is that all of the above arguments are based on adiabatic initial conditionsFootnote 69. While initial isocurvature perturbations are basically ruled out in the GR context, this is not necessarily true for modified gravity theories, so that correlated mixtures of adiabatic and isocurvature modes could perhaps lower the ISW effect and/or raise the third peak [429].

Of course, when the additional “dark fields” of relativistic MOND theories are truly massive (as is the case in some theories), they can be thought of as true “dark matter”, whose energy density outweighs the baryonic one in the early universe: this is the case for the second scalar field of BSTV (Section 7.5), the scalar field of Section 7.6, and of course the dipolar dark matter of Section 7.9. In all these cases, reproducing the acoustic peaks of the CMB is, by construction, not a problem at all (nor is erasing the baryon acoustic oscillations in the matter power spectrum, contrary to [127]), while the MOND phenomenology is still nicely recovered in galaxies. In the case of BIMOND (Section 7.8), the possible appeal to twin matter could also have important consequences on the growth of structure [316] and, of course, on the CMB acoustic peaks too, although the latter analysis is still lacking. In an initially matter-twin matter symmetric universe, if the initial quantum fluctuations are not identical in the two sectors, matter and twin matter would still segregate efficiently, since density differences grow much faster than the sum [316]. The inhomogeneities of the two matter types would then develop, eventually, into mutually avoiding cosmic webs, and the tensors coming from the variation of the interaction term between the two metrics with respect to the matter metric can then act precisely as the energy-momentum tensor of cosmological dark matter [316], besides its contribution to the cosmological constant (see Section 9.1). Finally, the most thought-provoking and interesting possibility would perhaps be to explain all these cosmological observations through non-local effects (Section 7.10). In any case, it is likely that MOND will not be making truly clear predictions regarding cosmology until a more profound theory, based on first principles and underlying the MOND paradigm, is found.

10 Summary and Discussion

In this review, after briefly presenting the currently favored ΛCDM model of cosmology (which clearly works overwhelmingly well on large scales despite its slightly inelegant mixture of currently unknown elements, Sections 2 and 3), we reviewed the few most outstanding challenges that this model is still facing (Section 4), which will have to be addressed one way or the other in the coming years. These include coincidences at z = 0 between the scale of the energy density in dark energy, dark matter, and baryonic matter, as well as a common natural scale for the behavior of the dark matter and dark energy sectors. What is more, as far as galaxy formation is concerned, many predictions made by the model (keeping in mind that baryon physics could modify these predictions) were ruled out by observations: these include many observations indicating that structure formation should take place earlier than predicted, the low number of observed satellites around the Milky Way (especially the missing satellites at the low and high mass ends of the mass function), the phase-space correlation of satellite galaxies of the Milky Way as opposed to their predicted isotropic distribution, the apparent presence of constant DM density cores in the central parts of galaxies instead of the predicted cuspy dark halos, the over-abundance of large bulgeless thin disk galaxies that are extremely difficult to produce in simulations, or the presence of spiral arms in disks that should be immune to such instabilities. But even more challenging is the appearance (Figure 48) of an acceleration constant a0 ≃ 10−10 m s−2 (i.e., the common scale of the dark matter and dark energy sectors as a0 ∼ Λ1/2 in natural units) in many unrelated scaling relations for DM and baryons in galaxies. These scaling relations involve a possibly devastating amount of fine-tuning for all collisionless dark matter models (Section 4.3), and can all be summarized by Milgrom’s empirical formula (Section 5), meaning that the observed gravitational field in galaxies is mimicking a universal force law generated by the baryons alone.

Figure 48

The acceleration parameter \(\sim V_f^4/(G{M_b})\) of extragalactic systems, spanning ten decades in baryonic mass Mb. X-ray emitting galaxy groups and clusters are visibly offset from smaller systems, but by a remarkably modest amount over such a long baseline. The characteristic acceleration scale \({a_0} \sim \sqrt \Lambda\) is in the data, irrespective of the interpretation. And it actually plays various other independent roles in observed galaxy phenomenology. This is natural in MOND (see Section 5.2), but not in ΛCDM (see Section 4.3).

With inert, collisionless and dissipationless DM, making Milgrom’s law emerge requires a huge, and perhaps even unreasonable, amount of fine-tuning in the expected feedback from the baryons. Indeed, the relation between the distribution of baryons and DM should depend on the various different histories of formation, intrinsic evolution, and interaction with the environment of the various different galaxies, whereas Milgrom’s law provides a successful unique and history-independent relation. Given this puzzle, the central idea of Modified Newtonian Dynamics (MOND) is rather to explore the possibility that the force law is indeed effectively modified (Section 6). The main motivation for studying MOND is thus a fully empiricist one, as it is driven by the observed phenomenology on galaxy scales, and not by an aesthetic wish of getting rid of DM. The corollary is that it is not a problem for a theory designed to reproduce the uncanny successes of the MOND phenomenology to replace CDM by “dark fields” (see Section 7) or more exotic forms of DM, different from simple collisionless DM particles, contrary to the common belief that this would be against the spirit of the MOND paradigm (although it is true that it would be more elegant to avoid too many additional degrees of freedom). It is perhaps more important that, if MOND is correct in the sense of the acceleration a0 being a truly fundamental quantity, the strong equivalence principle cannot hold anymore, and local Lorentz invariance could perhaps be spontaneously violated too.

At this juncture, it is worthwhile to summarize the general predictions of MOND, as a paradigm, and their observational tests (Table 2). As a mathematical description of the effective force law, MOND works remarkably well in individual galaxies. As a modified gravity theory (at the classical level), it makes some predictions that are both unique and challenging to reproduce in the context of the ΛCDM paradigm. However, MOND faces sharp challenges, particularly with cosmology and in rich clusters of galaxies, which will not be conclusively addressed without a viable parent theory (Section 7), based on first principles and underlying the MOND paradigm (if such a theory exists at all). In any case, in his series of papers introducing the idea in 1983, Milgrom [294] made a few very explicit predictions, which we quote hereafter, and compare with modern observational data (see also the Kepler-like laws of galactic dynamics in Section 5.2):

  • “Velocity curves calculated with the modified dynamics on the basis of the observed mass in galaxies should agree with the observed curves.”

    It is now well established that MOND provides good fits to the rotation curves of galaxies (Figure 23 [401, 166]), including bumps and wiggles associated with a baryonic counterpart (Figure 21, Kepler-like law no. 10 in Section 5.2). These fits are obtained with a single free parameter per galaxy, the mass-to-light ratio of the stars. What makes them most impressive is that the best-fit mass-to-light ratios, obtained on purely dynamical grounds assuming MOND, vary with galaxy color exactly as one would expect from stellar population synthesis models [42], which are based on astronomers’ detailed understanding of stars. Note that the rotation curves of galaxies are predicted to be asymptotically flat, even though this flatness is not always attained at the last observed point (see Kepler-like law no. 1 in Section 5.2, and last explicit prediction hereafter).

  • “The relation between the asymptotic velocity and the mass of the galaxy is an absolute one.”

    This is the Baryonic Tully-Fisher relation with \({a_0}G{M_b} = V_f^4\) (see Kepler-like law no. 2 in Section 5.2). It appears to hold quite generally [272], even for galaxies that we would conventionally expect to deviate from it [165, 279, 276].

  • “Analysis of the z-dynamics in disk galaxies using the modified dynamics should yield surface densities, which agree with the observed ones.”

    This states that, in addition to the radial force giving the rotation curve, the motions of stars perpendicular to the disk must also follow from the source baryons (see Section 6.5.3). This proves to be a remarkably challenging observation, and such data for external galaxies are difficult to obtain [44]. To make matters still more difficult, the radial acceleration usually dominates the vertical \(({V^2}/r \gg \sigma _z^2/z)\). This has the consequence that the distinction between MOND and conventional dynamics is not pronounced in regions that are well observed, becoming pronounced only at rather low baryonic surface densities [279]. The vertical velocity dispersion in low surface density regions (see Section 6.5.3) is typically ∼ 8 km/s [25, 241]. This exceeds the nominal Newtonian expectation (typically ∼ 2 km/s for Σ = 1 M⊙ pc−2, depending on the thickness of the disk), and is more in accordance with MOND. However, it would require a considerably more detailed analysis to consider this a test, let alone a success, of MOND. The Milky Way (Section 6.5.2) may provide an excellent test for this prediction [50, 378] as more precise data become available.

  • “Effects of the modification are predicted to be particularly strong in (LSB) dwarf galaxies.”

    The dwarf spheroidal satellite galaxies of the Milky Way have very low surface densities of stars, so (see Kepler-like law no. 8 in Section 5.2) are far into the MOND regime. As expected, these systems exhibit large mass discrepancies [477, 427]. Detailed fits to the better observed “classical” dwarfs [8] are satisfactory in most cases (see Section 6.6.2). The “ultrafaint” dwarfs appear more problematic [285], in the sense that their velocity dispersions are higher than expected. This might be an indication of the MOND-specific external field effect (see Section 6.3 and [78]), as the field of the Milky Way dominates the internal fields of the ultrafaint dwarfs. If so, these objects are not in dynamical equilibrium, which considerably complicates their analysis.

  • Locally-measured mass-to-light ratios should show no indication of hidden mass when \(V_c^2/R \gg {a_0}\), but rise beyond the radius where \(V_c^2/R \approx {a_0}\).

    We have paraphrased this prediction for brevity (see also Kepler-like law no. 7 in Section 5.2). The test of this prediction is shown in Figures 10, 11, and 14. The predicted effect is obvious in the data with population synthesis mass-to-light ratios for the stars [42], or with dynamical mass-to-light ratios [279] that make no assumption about stellar mass. In HSB spirals, there is no obvious need for dark matter in the inner regions, with the mass discrepancy only appearing at large radii as the acceleration drops below a0 (Figure 10).

  • “Disk galaxies with low surface brightness provide particularly strong tests.”

    Low surface brightness means low stellar surface density, which in turn means low acceleration. LSB galaxies are thus predicted to be well into the modified regime (see also Kepler-like law no. 8 in Section 5.2). This was a strong prediction, because few bona-fide examples of such objects were known at the time. Indeed, in 1983, when these predictions were published, it was widely thought that nearly all disk galaxies shared a common high surface brightness. One specific consequence of MOND for LSB galaxies is that they should lie on the same BTFR, with the same normalization, as HSB spirals. This was subsequently observed to be the case [517, 443]. There is no systematic deviation from the BTFR with surface brightness (Figure 5), contrary to what is naturally expected in conventional dynamics [279, 109]. Another consequence of low surface density is that the acceleration is low (< a0) everywhere. As a result, the mass discrepancy appears at a smaller radius in LSB galaxies, and is larger in amplitude than in HSB galaxies. This effect was subsequently observed (Figure 14 [279]).

  • “We predict a correlation between the value of the average surface density of a galaxy and the steepness with which the rotational velocity rises to its asymptotic value.”

    MOND does not simply make rotation curves flat. It predicts that HSB galaxies have rotation curves that rise rapidly before becoming flat, and may even fall towards asymptotic flatness. In contrast, LSB galaxies should have slowly rising rotation curves that only gradually approach asymptotic flatness (see also Kepler-like law no. 8 in Section 5.2). Both morphologies are observed (Figure 15). The expected connection between dynamical acceleration and the surface density of the source baryons is illustrated in Figures 9 and 16.
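
Two of the predictions listed above lend themselves to a quick numerical illustration: the Baryonic Tully-Fisher relation, a0 G Mb = Vf^4, and the radius at which V^2/R drops to a0. The sketch below is only indicative. It assumes a0 = 1.2 × 10^-10 m s^-2 (the text quotes a0 ≃ 10^-10 m s^-2), uses rounded physical constants, and treats the rotation curve as exactly flat at the transition radius; the function names are hypothetical.

```python
# Rounded physical constants (SI units).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
KPC = 3.0857e19    # kiloparsec, m

# Assumed value of the MOND acceleration constant, m s^-2.
# The text only quotes a0 ~ 1e-10 m s^-2.
A0 = 1.2e-10


def btfr_mass_msun(v_flat_kms):
    """Baryonic mass (solar masses) from a0 * G * M_b = V_f^4."""
    v = v_flat_kms * 1.0e3  # km/s -> m/s
    return v**4 / (A0 * G) / M_SUN


def transition_radius_kpc(v_circ_kms):
    """Radius (kpc) at which V^2 / R equals a0, for a flat rotation curve."""
    v = v_circ_kms * 1.0e3  # km/s -> m/s
    return v**2 / A0 / KPC


for v_kms in (50.0, 100.0, 200.0):
    mass = btfr_mass_msun(v_kms)
    radius = transition_radius_kpc(v_kms)
    print(f"V_f = {v_kms:5.0f} km/s  ->  M_b ~ {mass:.2e} M_sun, "
          f"R(a0) ~ {radius:.1f} kpc")
```

With these assumed numbers, a galaxy with a flat rotation velocity of 200 km/s corresponds to a baryonic mass of order 10^11 solar masses, and halving the velocity lowers the mass by a factor of 16, reflecting the fourth-power dependence.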

Table 2 Observational tests of MOND.

The original predictions listed above cover many situations, but not all. Indeed, once one writes a specific force law, its application must be completely general. Such a hypothesis is readily subject to falsification, provided sufficiently accurate data to test it, a perpetual challenge for astronomy. Table 2 summarizes the tests discussed here. By and large, tests of MOND involving rotationally-supported disk galaxies are quite positive, as largely detailed above (see Section 6.5). By construction, there is no cusp problem (solution to challenge no. 6 of Section 4.2), and no missing baryons problem (solution to challenge no. 10 of Section 4.2), as the way the dynamical mass-to-light ratio systematically varies with the circular velocity is a direct consequence of Milgrom’s law (Kepler-like law no. 4 of Section 5.2). There does appear to be a relation between the quality of the data and the ease with which a MOND fit to the rotation curve is obtained, in the sense that fits are most readily obtained with the best data [28]. As the quality of the data declines [384], one begins to notice small disparities. These are sometimes attributable to external disturbances that invalidate the assumption of equilibrium [403]. For targets that are intrinsically difficult to observe, minor problems become more common [120, 448]. These typically have to do with the challenges inherent in combining disparate astronomical data sets (e.g., rotation curves measured independently at optical and radio wavelengths) and constraining the inclinations of LSB galaxies (bear in mind that all velocities require a sin(i) correction to project the observed velocity into the plane of the disk, and mass in MOND scales as the fourth power of velocity). Given the intrinsic difficulties of astronomical observations, it is remarkable that the success rate of MOND fits is as high as it is: of the 78 galaxies that have been studied in detail (see Section 6.5.1), only a few cases (most notably NGC 3198 [68, 166]) appear to pose challenges. Given the predictive and quantitative success of the majority of the fits, it would seem unwise to ignore the forest and focus only on the outlying trees.

One rotationally-supported system that is very familiar to us is the solar system (see Section 6.4). The solar system is many orders of magnitude removed from the MOND regime (Figure 11), so no strong effects are predicted. However, it is, of course, possible to obtain exquisitely precise data in the solar system, so it is conceivable that some subtle effect may be observable [391]. Indeed, the lack of such effects on the inner planets already appears to exclude some slowly-varying interpolation functions [62]. Other tests may yet prove possible [37, 314], but, as they are strong-field gravity tests by nature, they all depend strongly on the parent relativistic theory (Section 7) and how it converges towards GR [22]. So, in Table 2, we list the status of solar-system tests as unclear, depending on the parent relativistic theory.

An important aspect of galactic disks is their stability (see Section 6.5.3). Indeed, the need to stabilize disks was one of the early motivations for invoking dark matter [343]. MOND appears able to provide the requisite stability [77]. Indeed, it gives good reason [299] for the observed maximum in the distribution of disk galaxy surface densities at ∼ Σ = a0/G (Freeman’s limit: Figure 8 and Kepler-like law no. 6 in Section 5.2). Disks with surface densities below this threshold are in the low acceleration limit and can be stabilized by MOND. Higher-surface-density disks would be purely in the Newtonian regime and subject to the usual instabilities. Going beyond the amount of stability required for existence, another positive aspect of MOND is that it does not over-stabilize disks. Features like bars and spiral arms are a natural result of disk self-gravity. Conventionally, large halo-to-disk mass ratios suppress the growth of such features, especially in LSB galaxies [291]. Yet such features are presentFootnote 70. The suppression is not as great in MOND [77], and numerical simulations appear to do a good job of reproducing the range of observed morphologies of spiral galaxies (solution to challenge no. 9 of Section 4.2, see [458]). Bars tend to appear more quickly and are fast, while warps can also be naturally produced (Section 6.5.3). There appears to be no reason why this should not extend to thin and bulgeless disks, whose ubiquity poses a challenge to galaxy formation models in ΛCDM. This particular point of creating large bulgeless disks (challenge no. 8 of Section 4.2) can actually be solved thanks to early structure formation followed by a low galaxy-interaction rate in MONDian cosmology (see Section 9.2), but this definitely warrants further investigation, so we mark this case as merely promising in Table 2.
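
The surface density scale a0/G associated with Freeman’s limit can be put into conventional astronomical units with a short calculation. The sketch below assumes a0 = 1.2 × 10^-10 m s^-2 (the text quotes a0 ≃ 10^-10 m s^-2) and rounded constants, so the result should be read as an order-of-magnitude figure; all names are illustrative.

```python
# Rounded physical constants (SI units).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
PC = 3.0857e16     # parsec, m

# Assumed value of the MOND acceleration constant, m s^-2.
A0 = 1.2e-10

# Characteristic surface density a0 / G, first in kg m^-2 ...
sigma_si = A0 / G

# ... then converted to solar masses per square parsec.
sigma_msun_pc2 = sigma_si * PC**2 / M_SUN

print(f"a0/G = {sigma_si:.2f} kg m^-2 = {sigma_msun_pc2:.0f} M_sun pc^-2")
```

Under these assumptions the threshold comes out at several hundred solar masses per square parsec, far above the Σ = 1 M⊙ pc−2 typical of the low surface density regions mentioned earlier, which is why such regions sit deep in the low-acceleration regime.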

Interacting galaxies are, by definition, non-stationary systems in which the customary assumption of equilibrium does not generally hold. This renders direct tests of MOND difficult. However, it is worth investigating whether commonly observed morphologies (e.g., tidal tails) are even possible in MOND. Initially, this seemed to pose a fundamental difficulty [279], as dark matter halos play a critical role in absorbing the orbital energy and angular momentum that it is necessary to shed if passing galaxies are to not only collide, but stick and merge. Nevertheless, recent numerical simulations appear to do a nice job of reproducing observed morphologies [459]. This is no trivial feat. While it is well established that dark matter models can result in nice tidal tails, it turns out to be difficult to simultaneously match the narrow morphology of many observed tidal tails with rotation curves of the systems from which they come [130]. Narrow tidal tails appear to be natural in MOND, as well as more extended resulting galaxies, thanks to the absence of angular momentum transfer to the dark halo (solution to challenge no. 7 of Section 4.2). Additionally, tidal dwarfs that form in these tails clearly have characteristics closer to those observed (see Section 6.5.4) than those from dark matter simulations [165, 309].

Spheroidal systems also provide tests of MOND (Section 6.6). Unlike the case of disk galaxies, where orbits are coplanar and nearly circular so that the centripetal acceleration can be equated with the gravitational force, the orbits in spheroidal systems are generally eccentric and randomly oriented. This introduces an unknown geometrical factor usually subsumed into a parameter that characterizes the anisotropy of the orbits. Accepting this, MOND appears to perform well in the classical dwarf spheroidal galaxies, but implies that the ultrafaint dwarfs are out of equilibrium (see Section 6.6.2). For small systems like the ultrafaint dwarfs and star clusters (Section 6.6.3) within the Milky Way, the external field effect (Section 6.3) can be quite important. This means that star clusters generally exhibit Newtonian behavior by virtue of being embedded in the larger galaxy. Deviations from purely Newtonian behavior are predicted to be subtle and are fodder for considerable debate [199, 397], rendering the present status unclear (Table 2). At the opposite extreme of giant elliptical galaxies (Section 6.6.1), the data accord well with MOND [323]. Indeed, bright elliptical galaxies are sufficiently dense that their inner regions are well into the Newtonian regime. In the MONDian context, this is the reason that it has historically been difficult to find clear evidence for mass discrepancies in these systems. The apparent need for dark matter does not occur until radii where the accelerations become low. That only spheroidal stellar systems appear to exist at surface densities in excess of Σ is the corollary of Freeman’s limit: such dense systems could not exist as stable disks, so must perforce become elliptical galaxies, regardless of the formation mechanism that made them so dense. That populations of elliptical galaxies should obey the Faber-Jackson relation (Kepler-like law no. 3 in Section 5.2, Figure 7) is also very natural to MOND [383, 395].

The largest gravitationally-bound systems are also spheroidal systems: rich clusters of galaxies. The situation here is quite problematic for MOND (Section 6.6.4). The dynamical mass ascertained by applying MOND routinely exceeds the observed baryonic mass by a factor of 2 to 3. In effect, MOND requires additional dark matter in galaxy clusters. The need to invoke unseen mass is most unpleasant for a theory that otherwise appears to be a viable alternative to the existence of unseen mass. However, one should remember that the present-day motivation for studying MOND is driven by the observed phenomenology on galaxy scales, summarized above, and not by an aesthetic wish to get rid of DM. What is more, parent relativistic theories of MOND might well involve additional degrees of freedom in the form of “dark fields”. But in any case, one must be careful not to conflate the rather limited missing mass problem that MOND suffers in clusters with the non-baryonic collisionless cold dark matter required by cosmology. There is really nothing about the cluster data that requires the excess mass to be non-baryonic, as long as it behaves in a collisionless way. There could, for instance, be baryonic mass in some compact non-luminous form (see Section 6.6.4 for an extensive discussion). This might seem to us unlikely, but it does have historical precedent. When Zwicky [518] first identified the dark matter problem in clusters, the mass discrepancy was of order 100. That is, unseen mass outweighed the visible stars by two orders of magnitude. It was only decades later that it was recognized that baryons residing in a hot intracluster gas greatly outweighed those in stars. In effect, there were (at least) two missing mass problems in clusters. One was the hot gas, which reduces the conventional discrepancy from a factor of ∼ 100 to a factor of ∼ 8 [175] in Newtonian gravity. From this perspective, the remaining factor of two in MOND seems modest.
Rich clusters of galaxies are rare objects, so the total required mass density can readily be accommodated within the baryon budget of BBN. Indeed, according to BBN, there must still be a lot of unidentified baryons lurking somewhere in the universe. But the excess dark mass in clusters need not be baryonic, even in MOND. Massive ordinary neutrinos [389, 392] and light sterile neutrinos [9, 13] have been suggested as possible forms of dark matter that might provide an explanation for the missing mass in clusters. Both are non-baryonic, but as they are hot DM particle candidates, neither can constitute the cosmological non-baryonic cold dark matter. At this juncture, all we can say for certain is that we do not know what the composition of the unseen mass is. It could even just be evidence for the effect of additional “dark fields” in the parent relativistic formulation of MOND, such as massive scalar fields, vector fields, dipolar dark matter, or even subtle non-local effects (see Section 7).

There are other aspects of cluster observations that are more in line with MOND’s predictions. Clusters obey a mass-temperature relation that parallels the \(M \propto T^2 \propto V^4\) prediction of MOND (Figures 39 and 48) more closely than the conventional \(M \propto T^{3/2}\) expectation in ΛCDM, without the need to invoke preheating (a need that may arise as an artifact of the mismatch in slopes). Indeed, Figure 48 shows clearly both the failing of MOND in the offset in characteristic acceleration between clusters and lower mass systems, and its successful prediction of the slope (a horizontal line in this figure). A further test, which may be important, is the peculiar and bulk velocity of clusters. For example, the collision velocity of the bullet cluster is so large (Footnote 71) as to be highly improbable in ΛCDM (occurring with a probability of \(\sim 10^{-10}\) [249]). In contrast, large collision velocities are natural to MOND [16]. Similarly, the large scale peculiar velocity of clusters is observed to be ∼ 1000 km/s [221], well in excess of the expected ∼ 200 km/s. Ongoing simulations with MOND [11] show some promise to produce large peculiar velocities for clusters. In general, one would expect high speed collisions to be more ubiquitous in MOND than in ΛCDM.
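The two mass-temperature slopes compared above follow from a short scaling argument. The sketch below rests on simplifying assumptions that are not spelled out in this section: the gas is isothermal and in equilibrium, so that \(T \propto V^2\); the deep-MOND relation \(V^4 = G M a_0\) holds for the MOND case; and the conventional case assumes virial equilibrium at a fixed mean density \(\bar{\rho}\).

```latex
% MOND (low-acceleration limit), with T \propto V^2:
V^4 = G M a_0
\quad\Longrightarrow\quad
M = \frac{V^4}{G a_0} \propto T^2 .

% Conventional case: virial equilibrium at fixed mean density \bar{\rho}:
V^2 \simeq \frac{G M}{R}, \qquad
M \propto \bar{\rho}\, R^3 \;\Rightarrow\; R \propto M^{1/3}
\quad\Longrightarrow\quad
T \propto V^2 \propto M^{2/3}
\quad\Longrightarrow\quad
M \propto T^{3/2} .
```

In this simplified picture the difference in slope comes entirely from the extra condition needed in the Newtonian case to fix the radius, whereas in the deep-MOND limit the mass is set by the velocity alone.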

An important line of evidence for mass discrepancies in the universe is gravitational lensing in excess of that expected from the observed mass of lens systems. Lensing is an intrinsically relativistic effect that requires a generally covariant theory to properly address. This necessarily goes beyond MOND itself into specific hypotheses for its parent theory (Section 7), so it is somewhat different from the tests discussed above. Broadly speaking, tests involving strong gravitational lensing fare tolerably well (Section 8.1), whereas weak lensing tests, which are sensitive to larger-scale mass distributions, are more problematic (Sections 8.2, 8.3, and 8.4) or simply run into the usual missing mass problem of MOND in clusters. Note that relativistic MOND theories produce the same amount of weak lensing as is required from dynamics, so this is not the problem. The problem is simply that some tests seem to require more dark matter than the effect of MOND provides.

On larger (cosmological) scales, MOND, as a modification of classical (non-covariant) dynamics, is simply unsatisfactory or mute. MOND itself has no cosmology, providing analogs for neither the Friedmann equation for the dynamics of the universe, nor the Robertson-Walker metric for its geometry. For these, one must appeal to specific hypotheses for the relativistic parent theory of MOND (Section 7), which is far from unique, and theoretically not really satisfactory, as none of the present candidates emerges from first principles. At this juncture, it is not clear whether a compelling candidate cosmology will ever emerge. But on the other hand, there is nothing about MOND as a paradigm that contradicts per se the empirical pillars of the hot big bang: Hubble expansion, BBN, and the relic radiation field (Section 9). The formation of large scale structure is one of the strengths of conventional theory, which can be approached with linear perturbation theory. This leads to good fits of the power spectrum, both at early times (z ≈ 1000 in the cosmic microwave background) and at late times (the z = 0 galaxy power spectrum [452]). In contrast, the formation of structure in MOND is intrinsically non-linear. Therefore, it is unclear whether MOND-motivated relativistic theories will inevitably match the observed galaxy power spectrum, a possible problem being how to damp the baryon acoustic oscillations [127, 430]. At this stage, a unique prediction does not exist. Nevertheless, there are two aspects of structure formation in MOND that appear to be fairly generic and distinct from ΛCDM. The stronger effective long range force in MOND speeds the growth rate, but has less mass to operate with as a source. Consequently, radiation domination persists longer and structure formation is initially inhibited (at redshifts of hundreds). Once structure begins to form, the non-linearity of MOND causes it to proceed more rapidly than in GR with CDM. 
Three observable consequences would be (i) the earlier emergence of large objects like galaxies and clusters in the cosmic web (as well as the associated low interaction rate at smaller redshifts) providing a possible solution to challenge no. 2 of Section 4.2 [11], (ii) the more efficient evacuation of large voids (possible solution to challenge no. 3 of Section 4.2), and (iii) larger peculiar (and collisional [16]) velocities of galaxy clusters (solution to challenge no. 1 of Section 4.2). However, the potential downside to rapid structure formation in MOND is that it may overproduce structure by redshift zero [341, 250].

The final entries in Table 2 regard the cosmic microwave background, discussed in more detail in Section 9.2. The third peak of the acoustic power spectrum of the CMB poses perhaps the most severe challenge to a MONDian interpretation of cosmology. The amplitude of the third peak measured by WMAP is larger than expected in a universe composed solely of baryons [442]. This implies some substance that does not oscillate with the baryons. Cold dark matter fits this bill nicely. In the context of MOND, we must invoke some other massive substance (i.e., non-baryonic dark matter such as light sterile neutrinos [9]) that plays the role of CDM, or rely on additional degrees of freedom in the relativistic parent theory of MOND (see Section 7) that would have the same net result (see the extensive discussion in Section 9.2), or even combine non-baryonic dark matter with these additional degrees of freedom [430]. While these are real possibilities, none of them is particularly appealing, any more than it is appealing to invoke CDM with complex fine-tuned feedback to explain rotation curves that apparently require only baryons as a source.

The missing baryon problem that MOND suffers in rich clusters of galaxies and the third peak of the acoustic power spectrum of the CMB are thus the most serious challenges presently facing MOND. But even so, the interpretation of the acoustic power spectrum is not entirely clear cut. Though there is no detailed fit to the power spectrum in MOND (unless we invoke 10 eV-scale sterile neutrinos [9]), MOND did motivate the prediction [265] of two aspects of the CMB that were surprising in ΛCDM (see Section 9.2). The amplitude ratio of the first-to-second peak in the acoustic power spectrum was outside the bounds expected ahead of time by ΛCDM for the baryon density from BBN as it was then known. In contrast, the first:second acoustic peak ratio that is now well measured agrees well with the quantitative value predicted in advance for the case of the absence of cold dark matter [268, 269]. Similarly, the rapid formation of structure expected in MOND leads naturally to an earlier epoch of re-ionization than had been anticipated in ΛCDM [265, 269]. Thus, while the amplitude of the third peak is clearly problematic and poses a severe challenge to any MOND-inspired theory, the overall interpretation of the CMB is debatable. While the existence of non-baryonic cold dark matter is indeed the most obvious explanation of the third peak, it is not at all obvious that straightforward CDM (in the form of rather simple massive inert collisionless particles) is uniquely required.

Science is, in principle, about theories or models that are falsifiable, and thus that are presently either falsified or not. But in practice it does not (and cannot) really work that way: if a model that was making good predictions up to a certain point suddenly does not work anymore (i.e., does not fit some new data), one obviously first tries to adjust it to make it fit the observations rather than throwing it away immediately. This is what one calls the requisite “compensatory adjustments” of the theory (or of the model): Popper himself drew attention to these limitations of falsification in The Logic of Scientific Discovery [355]. In the case of the ΛCDM model of cosmology, which is mostly valid on large scales, the current main trend is to find the “compensatory adjustments” to the model to make it fit in galaxies, mainly by changing (or mixing) the mass(es) of the dark matter particles, and/or through artificially fine-tuned baryonic feedback in order to reproduce the success of MOND. Incidentally, exactly the same is true for MOND, but for the opposite scales: MOND works remarkably well in galaxies but apparently needs compensatory adjustments on larger scales to effectively replace CDM. Now does that mean that falsification is impossible? That all models are equal? Surely not. In the end, a theory or a model is really falsified once there are too many compensatory adjustments (needed in order to fit too many discrepant data), or once these become too twisted (like Tycho Brahe’s geocentric model for the solar system). But there is obviously no truly quantitative way of ascertaining such global falsification. How one chooses to weigh the evidence presented in this review necessarily informs one’s opinion of the relative merits of ΛCDM and MOND. 
If one is most familiar with cosmology and large scale structure, ΛCDM is the obvious choice, and it must seem rather odd that anyone would consider an alternative as peculiar as MOND, needing rather bizarre adjustments to match observations on large scales. But if one is more concerned with precision dynamics and the observed phenomenology in a wide swath of galaxy data, it seems just as strange to invoke non-baryonic cold dark matter together with fine-tuned feedback to explain the appearance of a single effective force law that appears to act with only the observed baryons as a source. Perhaps the most important requirement before one throws away any model is to have a “simpler” model at hand that still reproduces the successes of the earlier favored model but also naturally explains the discrepant data. In that sense, right now, it is absolutely fair to say that there is no alternative that really does better overall than ΛCDM and that Ockham’s razor would favor. However, it would probably be a mistake to persistently ignore the fine-tuning problems for dark matter and the related uncanny successes of the MOND paradigm on galaxy scales, as they could very plausibly point to a better theory yet to be found. It is also important to bear in mind that MOND, as a paradigm or as a modification of Newtonian dynamics, is not itself generally covariant. Attempts to construct relativistic theories that contain MOND in the appropriate limit (Section 7) are correlated but distinct efforts, and one must be careful not to conflate the two. For example, some theories, like TeVeS (Section 7.4), might make predictions that are distinct from GR in the strong-field regime. Should future tests falsify these distinctive predictions of TeVeS while confirming those of GR, this would perhaps falsify TeVeS as a viable parent theory for MOND, but would have no bearing on the MONDian phenomenology observed in the weak-field regime, nor indeed on the viability of MOND itself.
It would perhaps simply indicate the need to continue to search for a deeper theory. It would, for instance, be extremely alluring if one could manage to find a physical connection between the dark energy sector and the possible breakdown of standard dynamics in the weak-field limit, since both phenomena would then simply reflect discrepancies with the predictions of GR when \(\Lambda \sim a_0^2\) is set to zero (see, e.g., Section 7.10). Of course, it is perfectly conceivable that such a deep theory does not exist, and that the apparent MONDian behavior of galaxies will be explained through small compensatory adjustments of the current ΛCDM paradigm, but one has yet to demonstrate how this will occur, and it will inevitably involve a substantial amount of fine-tuning that will have to be explained naturally. In any case, the existence of a characteristic acceleration a0 (Figure 48) playing various different roles in many seemingly-independent galactic scaling relations (see Sections 4.3 and 5.2) is by now an empirically established fact, and it is thus mandatory for any successful model of galaxy formation and evolution to explain it. The future of this field of research might thus still be full of exciting surprises for astronomers, cosmologists, and theoretical physicists.