When confronting predictions from general relativity with astronomical observations we usually model light propagation in terms of rays, and for most applications the influence of a medium on the light rays can be neglected. It is generally assumed by relativists, as a matter of course, that then light rays can be identified with lightlike geodesics of the spacetime metric. On the other hand, we know that light is an electromagnetic phenomenon. Therefore, at a more fundamental level light propagation should be modelled in terms of wavelike solutions to Maxwell’s equations on the spacetime manifold. For consistency, it is of crucial relevance to make sure that the description of light in terms of rays is a mathematically well-defined approximation of a description in terms of Maxwell fields. The 1967 paper by Jürgen Ehlers that we revisit here was the first publication where this approximation was rigorously worked out on an arbitrary general-relativistic spacetime.

It is true that many later textbooks on general relativity contain a section that treats the transition from Maxwell’s equations to ray optics. Nonetheless, it is worthwhile to study Ehlers’ original article. It combines conceptual clarity with mathematical rigour in a way that is characteristic of all publications by this author, and it goes beyond the textbook presentations in several respects:

  (a) Whereas most textbooks restrict to Maxwell’s equations in vacuum, Ehlers allows for a medium in arbitrary (subluminal, of course) motion with a scalar permittivity \(\varepsilon >0\) and a scalar permeability \(\mu >0\) that may both depend on the spacetime point. In the vacuum case one has \(\varepsilon =1\) and \(\mu = 1\).

  (b) He derives not only the lowest-order (“geometric optics”) approximation, as most textbooks do, but an infinite hierarchy of equations that can be solved iteratively.

  (c) He performs the entire approximation procedure in terms of the electromagnetic field tensor, rather than in terms of the electromagnetic potential. In this way all results are manifestly gauge-invariant.

The basic hypotheses are outlined in Section 1: On a four-dimensional manifold with an unspecified spacetime metric \(g_{ab}\) of Lorentzian signature, the source-free Maxwell equations are assumed to hold in a medium of the above-mentioned type. With the help of the four-velocity of the medium \(u^a\) and the index of refraction \(n=\sqrt{\varepsilon \mu }\), one can construct the so-called “optical metric” \(\overline{g}{}_{ab}= g_{ab} +(1-n^{-2}) u_au_b\) which is again of Lorentzian signature. This is a standard procedure which goes back to the German physicist Walter Gordon, who introduced the optical metric as early as 1923 in a paper that is quoted as Ref. 9 in the Ehlers paper. Less common is the way in which Ehlers constructs a complex-valued antisymmetric second-rank tensor field \(G_{ab}\), depending on \(u^a\), \(\varepsilon \) and \(\mu \), out of the electromagnetic field tensor and its Hodge dual. Then the entire system (6) of Maxwell’s equations, rewritten as a system of first-order partial differential equations for \(G_{ab}\), reduces to a single complex tensor equation (8) with four components. Here it is important to keep in mind that from Eqns. (6) onwards the optical metric is used for all mathematical operations such as raising and lowering indices, forming the Hodge dual and covariant differentiation. There is a remark, a few lines below Eqns. (6), that is rather non-trivial: Ehlers claims that “similarly to the vacuum Maxwell equations, the Eqns. (6) are conformally invariant. This means that \(\ldots \) Eqns. (6) require not the Riemannian metric \(g_{ab}\) itself but only the conformal structure it defines” on the spacetime manifold. In the vacuum case this is obviously true. In the case of a medium, however, the statement requires a careful analysis.
If one conformally transforms the spacetime metric, \(g_{ab} \mapsto e^{2f} g_{ab}\), one has to rescale the four-velocity of the medium, \(u^a \mapsto e^{-f} u^a\), while \(\varepsilon \) and \(\mu \) are naturally left unchanged. As at this stage the spacetime metric is used for lowering indices, we have \(u_a \mapsto e^f u_a\). Then the optical metric (3) undergoes the same conformal transformation as the spacetime metric, \(\overline{g}{}_{ab} \mapsto e^{2f} \overline{g}{}_{ab}\), and the Maxwell equations (6) indeed remain unchanged.
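
The transformation behaviour of the optical metric can be checked line by line. The following is only a sketch, assuming the signature \((-,+,+,+)\) and the normalisation \(g_{ab}u^au^b=-1\); both are inferred from the sign in the definition of the optical metric and are not stated explicitly in the text above.

\[
\begin{aligned}
u^a &\mapsto e^{-f} u^a , \\
u_a = g_{ab} u^b &\mapsto e^{2f} g_{ab}\, e^{-f} u^b = e^{f} u_a , \\
\overline{g}{}_{ab} = g_{ab} + (1-n^{-2})\, u_a u_b &\mapsto e^{2f} g_{ab} + (1-n^{-2})\, e^{f} u_a \, e^{f} u_b = e^{2f}\, \overline{g}{}_{ab} .
\end{aligned}
\]

The rescaling of \(u^a\) is what keeps the normalisation intact, since \(g_{ab}u^au^b \mapsto e^{2f} e^{-2f} g_{ab}u^au^b = -1\). Under the same assumptions, at a point where \(g_{ab}=\mathrm{diag}(-1,1,1,1)\) and the medium is at rest, \(u^a=(1,0,0,0)\), the optical metric takes the form \(\overline{g}{}_{ab}=\mathrm{diag}(-n^{-2},1,1,1)\). This is Lorentzian for every \(n>0\), and its null cone describes propagation at speed \(1/n\) relative to the medium, in units where the vacuum speed of light is 1.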

Section 2 is entitled “Locally approximate plane waves”. This term characterises a special series ansatz, given in equation (9), which from now on plays a central role for the entire paper. It is worthwhile to look at this ansatz very carefully. It gives a family of electromagnetic field tensors, depending on a parameter \(\varepsilon \). In a footnote Ehlers alerts the reader that in all that follows \(\varepsilon \) denotes this parameter; the permittivity will not explicitly appear anymore. \(\varepsilon \) is what is often called a “book-keeping parameter”: It has no direct physical meaning by itself and is just introduced as a mathematical tool for comparing orders of terms. Ansatz (9) consists of two series, each with an amplitude that is a power series in \(\varepsilon \) with coefficients \(K_{\nu }\) and \(L_{\nu }\), respectively, the first one with a phase \(S(x)/\varepsilon \) and the second one with a phase \(-S(x)/\varepsilon \) where S(x) is a scalar function. Ehlers does not give it a particular name, but in many other publications it is called the “eikonal function”. This ansatz describes, indeed, a locally approximate plane wave in the following sense: On a sufficiently small spacetime region we may consider the gradient of S, which is explicitly required to be non-zero, as approximately constant. If it were exactly constant, the series would describe exactly a plane wave (in the chosen coordinates), with wave covector \(\partial _a S/\varepsilon \). (Note that for a scalar function S the partial derivative \(\partial _a S\) is the same as the covariant derivative \(\nabla _a S\), for any linear connection.) That is why we may say that, on a sufficiently small spacetime region, the series describes to within a good approximation a plane wave with wave covector \(\partial _a S/\varepsilon \). 
Then an observer at the event x with four-velocity \(V^a (x) \) associates with this wave a frequency \(\omega (x) = -V^a (x) \partial _a S (x)/\varepsilon \). In this sense, small values of \(\varepsilon \) correspond to high frequencies. Physicists will immediately recognise that the series ansatz (9) resembles the Wentzel–Kramers–Brillouin (WKB) series which is discussed in all textbooks on quantum mechanics. However, there are several important differences. Firstly, the WKB series refers to solutions to the Schrödinger equation which is an equation for a scalar wave function \(\psi \); here we have an equation for a second-rank tensor field. Secondly, in the WKB series the expansion parameter is the Planck constant \(\hbar \) which occurs explicitly in the Schrödinger equation; here the expansion is made with respect to a book-keeping parameter that is introduced by hand. Thirdly, the WKB expansion is usually done only with one series, with S(x) in the phase, and not with a second one with \(-S(x)\) in the phase. Again, this difference has its origin in the fact that the Schrödinger wave function is a scalar whereas the electromagnetic field is a tensorial quantity: As Ehlers emphasises in a brief remark, even for light propagation in vacuum it is necessary to consider both series if one wants to take all polarisation states into account.
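
Schematically, the structure of the two-series ansatz described above may be written as follows. This is a sketch only: the field symbol \(\Phi _{ab}\), the index placement on the coefficients and the precise factors (for instance, powers of \(i\) accompanying \(\varepsilon \)) are assumptions made for illustration and need not agree with the normalisation of Ehlers’ equation (9).

\[
\Phi _{ab}(x,\varepsilon ) = e^{\,iS(x)/\varepsilon } \sum _{\nu =0}^{\infty } \varepsilon ^{\nu } K_{\nu \, ab}(x) \; + \; e^{-iS(x)/\varepsilon } \sum _{\nu =0}^{\infty } \varepsilon ^{\nu } L_{\nu \, ab}(x)
\]

If \(\partial _a S = p_a\) is constant and only the \(\nu =0\) terms are present with constant amplitudes, this expression reduces to a superposition of two exact plane waves with phases \(\pm p_a x^a/\varepsilon \), up to a constant phase shift. The frequency formula then reads \(\omega = -V^a p_a/\varepsilon \), so halving \(\varepsilon \) doubles the frequency measured by every observer.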

As is characteristic of all his papers, Ehlers now very precisely states the mathematical conditions he is going to use in the following: He assumes that the series (9) converges pointwise, and that its derivatives with respect to the coordinates converge uniformly on the considered spacetime region. We have to come back later to the relevance of these assumptions. Then the procedure is the following: By inserting the ansatz (9) into Maxwell’s equations and comparing order by order the coefficients of powers of \(\varepsilon \) one gets a hierarchy of equations that can be solved iteratively. The lowest non-trivial order, which is known as the “geometric optics approximation”, tells us that the eikonal function S(x) has to satisfy the “eikonal equation” (11); moreover, at each order \(\nu \) we get algebraic conditions and differential equations, often called “transport equations”, for the amplitudes \(K_{\nu }\) and \(L_{\nu }\). The eikonal equation says that the integral curves of the vector field \(k^a= \overline{g}{}^{ab} \partial _b S\) are lightlike geodesics of the optical metric; these are the “rays” associated with the considered series solution to Maxwell’s equations. In the analogy to the WKB series, the rays correspond to the classical trajectories and the eikonal equation corresponds to the Hamilton-Jacobi equation. If the rays and the amplitudes up to a certain order are known, at the next order the transport equations give us ordinary differential equations along each ray which have a unique solution once appropriate initial conditions are given on a hypersurface transverse to the rays, see Fig. 1 in the Ehlers paper. Here saying that the initial conditions have to be “appropriate” refers to the fact that there are also algebraic conditions on the amplitudes to be satisfied. The latter determine the allowed polarisation states.
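
The step from the eikonal equation to the geodesic property of the rays can be made explicit. The following sketch assumes that the eikonal equation (11) has the form \(\overline{g}{}^{ab} \partial _a S \, \partial _b S = 0\), that \(\overline{\nabla }\) denotes the Levi-Civita connection of the optical metric, and that indices are moved with the optical metric; the notation \(k_a = \partial _a S\) is introduced here for illustration.

\[
\begin{aligned}
k^b \, \overline{\nabla }_b k_a &= k^b \, \overline{\nabla }_b \overline{\nabla }_a S = k^b \, \overline{\nabla }_a \overline{\nabla }_b S = k^b \, \overline{\nabla }_a k_b \\
&= \tfrac{1}{2} \, \partial _a \big ( \overline{g}{}^{bc} k_b k_c \big ) = 0 .
\end{aligned}
\]

The second equality uses that the connection is torsion-free, so that second covariant derivatives of a scalar commute; the step to the second line uses metric compatibility, and the last equality is the eikonal equation. Hence the integral curves of \(k^a\) are affinely parametrised geodesics of the optical metric, and they are lightlike because \(\overline{g}{}_{ab} k^a k^b = 0\). This is the same computation that, in the Hamilton-Jacobi picture, leads from the Hamiltonian \(H = \tfrac{1}{2} \overline{g}{}^{ab} p_a p_b\) with \(p_a = \partial _a S\) to its classical trajectories.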

Section 3 gives some additional information on the geometric optics approximation. In particular it is shown that, to within this approximation, the energy flux of the electromagnetic field is aligned with the rays. This justifies the claim that for sufficiently small \(\varepsilon \), i.e., for sufficiently high frequencies, light propagates along lightlike geodesics of the optical metric, which in vacuum reduces to the spacetime metric.

What remains to be discussed is the relevance of the convergence condition. Ehlers makes a short but important remark in the last paragraph of Section 2: He says that, even if the series that has been constructed by iteratively solving the hierarchy of equations does not converge, one may expect that it gives a good approximation to exact Maxwell fields if \(\varepsilon \) is small. Actually, it is known that in almost all cases the series does not converge. Of course, there are special cases where the series breaks off after finitely many terms which is true, e.g., for exact plane waves on Minkowski spacetime. However, with the exception of such trivial cases it is very difficult to find examples where the convergence condition is satisfied. Nonetheless, as anticipated by Ehlers’ remark, the method always gives good approximations for small \(\varepsilon \) in the following sense. If we have solved the hierarchy of equations up to a certain order \(\nu \) and break off the series after this order, we get what may be called a \(\nu \)th order asymptotic solution of Maxwell’s equations that depends on the parameter \(\varepsilon \). Then there is a one-parameter family of exact Maxwell fields, also depending on \(\varepsilon \), such that the difference between the asymptotic solution and the exact solution goes to zero in the high-frequency limit \(\varepsilon \rightarrow 0\). This was proven, for a class of media that may be anisotropic and are thus more general than the ones considered in Ehlers’ article, in Section 2.7 of Perlick [1].

In view of applications to astrophysics, it is relevant to generalise the results of Ehlers’ 1967 paper to the case of a dispersive medium, i.e., to the case that the propagation of light depends on the frequency. In almost all applications where the influence of a medium on the rays is not negligible, the medium is a plasma and dispersion has to be taken into account. For a pressure-less (“cold”) magnetised electron-ion plasma, this was worked out by Ehlers together with Reinhard Breuer in 1980 [2, 3]. As far as the methodology is concerned, the most important new feature is that one now needs two book-keeping parameters. The high-frequency limit corresponds to sending both parameters to zero in a certain synchronised way.

Finally, it should be mentioned that the transition from Maxwell’s equations to geometric optics can also be performed in a completely different way: Instead of analysing approximate plane wave solutions in the high-frequency limit one may consider the propagation of discontinuities of solutions to Maxwell’s equations. In this approach, which goes back to the French mathematician Jacques Hadamard, the eikonal equation comes about as the equation for hypersurfaces on which discontinuities in the first derivatives of the field tensor may occur. In the mathematical literature such hypersurfaces are called “characteristic hypersurfaces” and the eikonal equation is known as the “characteristic equation”. By considering discontinuities of higher order one gets a hierarchy of equations, quite analogous to what one gets in the asymptotic series approach. For Maxwell’s equations on a spacetime manifold, details of this approach can be found, e.g., in the book by Hehl and Obukhov [4]. From a mathematical point of view it is a matter of taste which of the two approaches one prefers. With regard to physical interpretation, however, the use of approximate plane waves is advantageous: As demonstrated in the Ehlers paper, it directly associates with the rays the energy flux of electromagnetic waves of sufficiently high frequency.