Causality for nonlocal phenomena

Drawing from the theory of optimal transport we propose a rigorous notion of a causal relation for Borel probability measures on a given spacetime. To prepare the ground, we explore the borderland between causality, topology and measure theory. We provide various characterisations of the proposed causal relation, which turn out to be equivalent if the underlying spacetime has a sufficiently robust causal structure. We also present the notion of the 'Lorentz-Wasserstein distance' and study its basic properties. Finally, we discuss how various results on causality in quantum theory, aggregated around Hegerfeldt's theorem, fit into our framework.


Introduction
The notion of a space, understood as a set of points, provides an indispensable framework for every physical theory. But, regardless of the physical system that is being modelled, the space itself is not directly observable. Indeed, any measuring apparatus can provide information about the localisation only up to a finite resolution. In the relativistic context, it means that the event is an idealised concept, which is not accessible to any observer.
Apart from the 'practical' obstructions for measuring position, there exist also fundamental ones due to the quantum effects manifest at small scales. Although non-relativistic quantum mechanics does not impose any a priori restrictions on the accuracy of the position measurement, in quantum field theory a suitable 'position operator' is always nonlocal (see for instance [10,16,41]). Moreover, an attempt to perform a very accurate measurement of localisation in spacetime would require the use of signals of very short wavelength, resulting in an extreme concentration of energy. The latter would eventually lead to black hole formation and the desired information would become trapped [13,14].
It is generally believed that any physical theory should be causal, i.e. that no information can be transmitted with the speed exceeding the velocity of light. Indeed, despite some controversies (compare for example [2] and [26]), no evidence of a physical process that would involve superluminal signalling was found in any system (see for instance [47]), even at the level of Planck scale [1].
In relativity theory it is straightforward to implement the postulate of causality as the Lorentzian metric induces a precise notion of causal curves. Although Einstein's equations admit spacetime solutions with closed causal curves, they are usually discarded as unphysical [23].
On the other hand, the status of causality in quantum theory is much more subtle because of its nonlocal nature. Hegerfeldt's theorem [24] (see also [25,27,28]) implies that during the evolution of a generic quantum system driven by a Hamiltonian bounded from below, an initially localised state immediately develops infinite tails. However, whereas initial localisation implies the breakdown of Einstein causality, the use of nonlocal states does not guarantee a subluminal evolution. In fact, the results of Hegerfeldt suggest [25] that acausal evolution is a feature of the quantum system and not of the state. In other words, if a system permits superluminal propagation, one could use nonlocal states to achieve faster-than-light communication.
In quantum field theory the nonlocality is even more prevailing, but it does not allow for communication between spacelike separated regions of spacetime [15]. There is thus strong evidence that quantum theory, despite its inherent nonlocality, conforms to the principle of causality [36]. In fact, the request of no faster-than-light signalling is often used as a guiding principle to restrain the admissible quantum theories [5,21] and their possible extensions [34]. In quantum field theory it is reflected by promoting the principle of microscopic causality to one of the axioms [22,40].
However, the study of causation in quantum theory (and other nonlocal theories) is far from being complete. One of the stumbling blocks is the lack of a suitable notion of causality for nonlocal objects, like wave functions. To properly investigate Einstein's principle, one needs to disentangle nonlocality from the potential causality violation, as for instance the interference fringes can travel superluminally, but cannot be utilised to send information [8]. Also, to our knowledge, in the study of causality in quantum systems time has been treated as an external parameter, whereas the most riveting consequences of Einstein causality, in particular the existence of horizons, manifest themselves in curved spacetimes.
The aim of this paper is to provide a rigorous notion of a causal relation between probability measures on a given spacetime. These can be utilised to model classical spread objects, for instance charge or energy density, as well as quantum probabilities obtained via the 'modulus square principle' from wave functions. Moreover, one can make use of probability measures to take into account experimental errors, as the measurement of any physical object's spacetime localisation would effectively be vitiated by an error resulting from the apparatus' imperfection.
We allow the probability measures to be spread also in the timelike direction, as typical states in quantum field theory extend over the whole spacetime [11]. We work in a generally covariant framework, hence our definitions and results apply to any curved spacetime with a sufficiently rich causal structure.
The paper is organised as follows: In Section 2 we recall some basic notions in topology, measure theory and causality, to make the paper self-contained and accessible to a broad range of researchers. Section 3 contains the first result on the verge of causality and measurability, which establishes the foundations for the developed theory. The main concepts and results of the paper are aggregated in Section 4. We start off with a 'dual' definition of the causal relation, based on the notion of causal functions [32, Definition 2.3], proposed in [17] in a much wider context of noncommutative geometry. In several steps we show that it encapsulates an intuitive notion of causality for nonlocal objects: each infinitesimal part of the probability density should travel along a future-directed causal curve.
At each step we keep the causality conditions imposed on the underlying spacetime as low as possible. At the same time we provide several characterisations of causality for probability measures, which illustrate the concept and provide tools for concrete computations.
Motivated by the main result, we put forward in Section 4.2 a definition of a causal relation between the probability measures, valid on any spacetime, and study its properties. In particular, we demonstrate that the proposed relation is a partial order in the space of Borel probability measures on a given spacetime M, even with a relatively poor causal structure.
Finally, drawing from the theory of optimal transport adapted to the relativistic setting we propose in Section 5 a notion of the 'Lorentz-Wasserstein' distance in the space of measures.
We conclude, in Section 6, with an outlook into the possible future developments and applications. In particular, we briefly discuss the potential use of the presented results in the study of causality in quantum theory. We also address the interrelation of probability measures with states on C * -algebras. In this way we provide a link with the notion of 'causality in the space of states' proposed originally in [17] in the framework of noncommutative geometry.
The space of continuous, continuous and bounded, continuous and compactly supported real-valued functions on a topological space M will be respectively denoted by C(M), C_b(M), C_c(M). Analogous spaces of smooth functions will be respectively denoted by $C^\infty(M)$, $C^\infty_b(M)$, $C^\infty_c(M)$. If M is Hausdorff, then each of its compact subsets is closed. If M is second-countable, that is if M has a countable base, then the notions of compactness and sequential compactness coincide.
A Hausdorff space M is called locally compact iff each of its points has a precompact neighbourhood.
M is called separable iff there exists a countable subset {a_n}_{n∈N} ⊆ M dense in M. Every open subspace of a separable space is itself separable. M is called (completely) metrisable iff there exists a (complete) metric ρ : M² → R_{≥0} inducing its topology. Fixing a metric allows one to talk about balls. By B(x, ε) := {y ∈ M | ρ(x, y) < ε} we denote an open ball centred at x ∈ M of radius ε > 0. By $\overline{B}(x, ε)$ we denote its closure. Finally, M is called Polish iff it is separable and completely metrisable.
In the following, we are going to work with spacetimes (see Section 2.

Measure theory
Let M be a topological space. The σ-algebra of Borel sets B(M) is the smallest family of subsets of M containing the open sets, which is closed under complements and countable unions (and hence also under countable intersections). If M is a Hausdorff space then, in particular, its σ-compact subsets are Borel. A function f : M 1 → M 2 between topological spaces is called Borel iff f −1 (V ) ∈ B(M 1 ) for any V ∈ B(M 2 ). Every continuous (or even semi-continuous) real-valued function is Borel, but not vice versa.
A Borel probability measure on M is a function µ : B(M) → [0, 1] which is countably additive and satisfies µ(M) = 1. A Borel set whose measure µ is zero is called µ-null. The pair (M, µ) is called a probability space. The set of Borel probability measures on M will be denoted by P(M).

Causality theory
For a detailed exposition of causality theory the reader is referred to [6,31,33,35].
Recall that a spacetime is a connected time-oriented Lorentzian manifold. Causality theory introduces and studies certain binary relations between points (i.e. events) of a given spacetime M. Namely, for any p, q ∈ M, we say that p causally (chronologically, horismotically) precedes q, which is denoted by p ⪯ q (resp. p ≪ q, p → q), iff there exists a piecewise smooth future-directed causal (resp. timelike, null) curve γ : [0, 1] → M from p to q, i.e. γ(0) = p and γ(1) = q.
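As a concrete illustration, the relations ⪯ and ≪ can be written down explicitly in the simplest possible setting. The sketch below assumes the (1+1)-dimensional Minkowski spacetime with coordinates (t, x) and units in which c = 1; this choice, as well as the function names, is an assumption made for illustration only and is not part of the general framework.

```python
from typing import Tuple

# An event in (1+1)-dimensional Minkowski spacetime: (t, x), with c = 1.
# This flat model is an illustrative assumption, not the general setting.
Event = Tuple[float, float]


def causally_precedes(p: Event, q: Event) -> bool:
    """Return True iff p ⪯ q, i.e. q lies in the closed future cone of p."""
    dt = q[0] - p[0]
    dx = q[1] - p[1]
    return dt >= abs(dx)


def chronologically_precedes(p: Event, q: Event) -> bool:
    """Return True iff p ≪ q, i.e. q lies in the open future cone of p."""
    dt = q[0] - p[0]
    dx = q[1] - p[1]
    return dt > abs(dx)


origin = (0.0, 0.0)
print(causally_precedes(origin, (1.0, 1.0)))         # point on the light cone
print(chronologically_precedes(origin, (1.0, 1.0)))  # not timelike related
print(causally_precedes(origin, origin))             # reflexivity of ⪯
```

In this model ⪯ is reflexive while ≪ is not, in agreement with the general properties of the two relations.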
Clearly the relations ⪯ and ≪ are transitive and ⪯ is also reflexive. Moreover ([33, Chapter 14, Corollary 1]),

$$p \ll r \preceq q \ \text{ or } \ p \preceq r \ll q \quad \Longrightarrow \quad p \ll q. \tag{5}$$

To denote ⪯ (resp. ≪, →) understood as a subset of M² it is customary to use the symbol J⁺ (resp. I⁺, E⁺). I⁺ is open and equal to int J⁺, and so the causal structure of M is completely determined by the relation ⪯ and the topology of M. Moreover, $\overline{I^+} = \overline{J^+}$.
For any X ⊆ M one defines

$$J^+(X) := \{ q \in M \mid \exists\, p \in X \ \ p \preceq q \}, \qquad J^-(X) := \{ p \in M \mid \exists\, q \in X \ \ p \preceq q \}, \tag{6}$$

and analogously for I^±(X) and E^±(X). If X is a singleton, one simply writes J^±(p) instead of J^±({p}). Notice that $J^\pm(X) = \bigcup_{p \in X} J^\pm(p)$.
Let now U ⊆ M be an open subset of M. One defines ⪯_U to be the causal precedence relation on U treated as a spacetime in its own right. By analogy with J⁺, we denote J⁺_U := {(p, q) ∈ U² | p ⪯_U q}. Notice that J⁺_U ⊆ J⁺ ∩ U², but not necessarily vice versa, because p ⪯_U q requires a piecewise smooth future-directed causal curve from p to q not only to exist, but also to be contained in U.
A subset F ⊆ M is called a future set iff J⁺(F) = F. Similarly, a subset P ⊆ M is called a past set iff J⁻(P) = P. Usually it is required that future and past sets be open by definition. However, if we drop this assumption, future and past sets behave more naturally under set-theoretical operations.
Proof: The statement is proven by the following chain of equivalences: Notice that only the inclusion '⊆' is nontrivial in the definition of a future (past) set.
If X α 's are past sets, simply replace J + with J − in the previous sentence.
We have thus shown that a union of the family of future (past) sets is a future (past) set. To obtain an analogous result for the intersection, one simply uses Proposition 1 and de Morgan's laws.
A function f : M → R is called:
• a causal function iff it is non-decreasing along every future-directed causal curve;
• a generalised time function iff it is increasing along every future-directed causal curve;
• a time function iff it is a continuous generalised time function;
• a temporal function iff it is a smooth function with past-directed timelike gradient.
Each of the above properties is stronger than the preceding one.
'iii) ⇒ i)' Assume f is not causal, i.e. there exist p, q ∈ M such that p ⪯ q but f(p) > f(q). We claim that f⁻¹([f(p), +∞)) is not a future set. Indeed, were it a future set, then, since it clearly contains p, it would contain q as well. But this would mean that f(q) ≥ f(p), in contradiction with the assumption.
On the other hand, future sets can be characterised by means of their indicator function being causal.
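Corollary 1. Let M be a spacetime. F ⊆ M is a future set iff its indicator function $\mathbb{1}_F$ is causal.
Proof: Observe that for any α ∈ R the set $\mathbb{1}_F^{-1}([\alpha, +\infty))$ equals M if α ≤ 0, F if 0 < α ≤ 1, and ∅ if α > 1. Since M and ∅ are trivially future sets, the equivalence 'i) ⇔ iii)' from Proposition 3 immediately yields the desired equivalence.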
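Volume functions are causal and semi-continuous and hence Borel.
For any p, q ∈ M let Ĉ(p, q) denote the set of piecewise smooth future-directed causal curves from p to q. The Lorentzian distance (or time separation) is the map d : M² → [0, +∞] defined by

$$d(p,q) := \begin{cases} \sup\{ L(\gamma) \mid \gamma \in \hat{C}(p,q) \} & \text{if } \hat{C}(p,q) \neq \emptyset, \\ 0 & \text{otherwise,} \end{cases}$$

where L(γ) denotes the length of the curve γ. Its basic properties include:
i) For any p, q ∈ M, d(p, q) > 0 ⇔ p ≪ q.
ii) The reverse triangle inequality holds. Namely, for any p, q, r ∈ M, p ⪯ r ⪯ q ⇒ d(p, r) + d(r, q) ≤ d(p, q).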
v) The map d is lower semi-continuous [33, Chapter 14, Lemma 17] and hence Borel.
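Properties i) and ii) can be checked by hand in flat spacetime. The following minimal sketch assumes the (1+1)-dimensional Minkowski spacetime with c = 1, where the supremum in the definition of d is attained on the straight segment and d(p, q) = √(Δt² − Δx²) for p ⪯ q. The model and the function names are assumptions made for illustration.

```python
import math
from typing import Tuple

Event = Tuple[float, float]  # (t, x) in 1+1 Minkowski spacetime, c = 1 (assumed model)


def causally_precedes(p: Event, q: Event) -> bool:
    """p ⪯ q in Minkowski spacetime."""
    return (q[0] - p[0]) >= abs(q[1] - p[1])


def lorentzian_distance(p: Event, q: Event) -> float:
    """Time separation d(p, q): proper time along the straight segment, or 0 if p ⪯ q fails."""
    if not causally_precedes(p, q):
        return 0.0
    dt = q[0] - p[0]
    dx = q[1] - p[1]
    # max() guards against a tiny negative value produced by rounding.
    return math.sqrt(max(0.0, dt * dt - dx * dx))


p, r, q = (0.0, 0.0), (1.0, 0.0), (2.0, 0.5)
# Reverse triangle inequality for p ⪯ r ⪯ q.
print(lorentzian_distance(p, r) + lorentzian_distance(r, q) <= lorentzian_distance(p, q))
# Null separated events have zero Lorentzian distance (property i).
print(lorentzian_distance((0.0, 0.0), (1.0, 1.0)))
```

The inequality is reversed with respect to the Riemannian case because a detour through r shortens, rather than lengthens, the elapsed proper time.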
The causal ladder is a hierarchy of spacetimes according to strictly increasing requirements on their causal properties [6]. The rungs of this ladder, from the top to the bottom, read:
Globally hyperbolic ⇒ Causally simple ⇒ Causally continuous ⇒ Stably causal ⇒ Strongly causal ⇒ Distinguishing ⇒ Causal ⇒ Chronological.
Each level of the hierarchy can be defined in many equivalent ways. Below we present only those definitions, characterisations and properties which we use in the paper. For a complete review of the causal hierarchy, consult [31, Section 3].
M is chronological iff it satisfies one of the following equivalent conditions:
i) $p \not\ll p$ for all p ∈ M.
ii) No timelike loop exists.
iii) Any volume function is increasing along every future-directed timelike curve.
iv) d(p, p) = 0 for all p ∈ M.
M is causal iff it satisfies one of the following equivalent conditions:
i) The relation ⪯ is a partial order, meaning that in addition to being reflexive and transitive, it is also antisymmetric.

ii) No causal loop exists.
M is future (past) distinguishing iff it satisfies one of the following equivalent conditions: i) For any p, q ∈ M, the equality I + (p) = I + (q) (resp. I − (p) = I − (q)) implies that p = q.
ii) Any future (past) volume function is a generalised time function [6,Proposition 3.24].
M is distinguishing iff it is both future and past distinguishing. M is strongly causal iff the family {I + (p) ∩ I − (q) | p, q ∈ M} is a base of the standard manifold topology of M. It is stably causal iff it admits a time function or, equivalently, iff it admits a temporal function [7]. It is causally continuous iff any volume function is a time function.
M is causally simple iff it is causal and satisfies one of the following equivalent conditions:
i) J^±(p) are closed sets for every p ∈ M.
ii) J⁺ is a closed subset of M².
Before providing a definition of the top level of the causal hierarchy, recall that a curve γ : (a, b) → M with −∞ ≤ a < b ≤ +∞ is called extendible iff it has a continuous extension onto [a, b) or onto (a, b]. Otherwise such a curve is called inextendible. Recall also that a Cauchy hypersurface is a subset S ⊆ M which is met exactly once by any inextendible timelike curve. Any such S is a closed achronal (i.e. S² ∩ I⁺ = ∅) topological hypersurface, met by every inextendible causal curve [33, Chapter 14, Lemma 29]. However, such an S need not be acausal (i.e. S² ∩ J⁺ might be nonempty).
M is globally hyperbolic iff it satisfies one of the following equivalent conditions: i) M is causal and the sets J + (p) ∩ J − (q) are compact for all p, q ∈ M; ii) M admits a smooth temporal function T , the level sets of which are (smooth spacelike) Cauchy hypersurfaces [7].
In a globally hyperbolic spacetime the Lorentzian distance d is finite-valued and continuous. Moreover, for every (p, q) ∈ J + there exists a causal geodesic γ of length d(p, q) [33,Chapter 14].
On the σ-compactness of J⁺
The purpose of this section is to prove the following theorem.
Theorem 4. Let M be a spacetime. Then J + ⊆ M 2 is a σ-compact set.
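Let us note here that this property is automatic in causally simple spacetimes. Indeed, let (K_n)_{n∈N} be an exhaustion of M by compact sets and notice that $J^+ = \bigcup_{m,n \in \mathbb{N}} J^+ \cap (K_m \times K_n)$, where $J^+ \cap (K_m \times K_n)$ is compact for any m, n ∈ N, because J⁺ is closed in a causally simple spacetime.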
In the proof of Theorem 4, however, we shall make no assumptions on the causal properties of M.
Theorem 4 implies that J⁺ is Borel for any spacetime. As we shall see, it also implies that J^±(X) is Borel for any closed X ⊆ M. Moreover, these statements remain true if we replace J^± with E^±.
Theorem 4 is thus settled in the overlap of causality theory, topology and measure theory. Whereas the interplay between the causal and topological properties of spacetimes is relatively well understood, the question of Borelness of causal futures, a fundamental one from the point of view of any conceivable measure-theoretical extension of causality theory, has never been addressed, to the authors' best knowledge.
We recall the notion of simple convex sets (called also simple regions) [35, Section 1]. Loosely speaking, they are small patches of the spacetime M with 'nice' topological, differential and causal properties, and which constitute a countable cover of the entire spacetime.
Concretely, let M be a spacetime. Then for any p ∈ M there exists a star-shaped neighbourhood Q ⊆ T_pM containing the zero vector and such that the exponential map exp_p restricted to Q is a diffeomorphism. The image of this diffeomorphism exp_p(Q) is called a normal neighbourhood of p. Every event has a neighbourhood U which is a normal neighbourhood of any p ∈ U. Such U is called convex. If U ⊆ M is convex, then it is open and for any p, q ∈ U there exists precisely one geodesic from p to q which is contained in U. From the point of view of causality theory, the following property of convex sets will be crucial: if U ⊆ M is convex, then J⁺_U is a closed subset of U² [33, Lemma 14.2]. Finally, a convex set N is called simple iff it is precompact and its closure $\overline{N}$ is contained in another convex set U.
Any spacetime M can be covered with a family of simple convex sets [35,Proposition 1.13]. This cover can be chosen countable, because every spacetime is a Lindelöf space.
Proof of Theorem 4: Fix a countable, locally finite family of simple convex sets {N_i}_{i∈N} covering M. Let also {U_i}_{i∈N} be a family of convex sets such that $\overline{N_i} \subseteq U_i$ for all i ∈ N, which exists by the very definition of a simple convex set.
We introduce a couple more definitions.
that is the set containing all these pairs of points from N i which can be connected by a piecewise smooth future-directed causal curve contained in U i . For any X ⊆ M define, by analogy with (6), This is the set of all those pairs of points (p, q) ∈ N i 1 × N i 2 , which can be connected by a concatenation of two piecewise smooth future-directed causal curves, first of which is contained in U i 1 , while the other in U i 2 , and the concatenation point r must lie in the compact set N i 1 ∩ N i 2 . As above, we additionally define, for any X ⊆ M, and (9) .
It is crucial to understand what these sets contain (see Figure 1). Namely, J + is the set of all those pairs of points (p, q) ∈ N i 1 × N in which can be connected by a concatenation of n − 1 piecewise smooth future-directed causal curves, each being of the type discussed after the definition of J + (N i 1 ,N i 2 ) . The curves' concatenation points must lie in We now claim and shall prove inductively that Let us first prove the base case n = 2. Let {a m } m∈N be a dense subset of N i 1 ∩ N i 2 , which exists by separability of . The piecewise smooth curve from p to q shown is assumed causal and future-directed.
which would already mean that J + (N i 1 ,N i 2 ) is a closed subset of N i 1 × N i 2 (and hence also a compact subset of U i 1 × U i 2 ), because finite unions of closed sets are closed and so are any intersections of closed sets.
Indeed, to prove the inclusion '⊆', assume (p, q) it is possible to find m ∈ F k such that r ∈ B(a m , 1 k ). One can thus simply take p m := r =: q m .
On the other hand, to show the inclusion '⊇', let us assume that (p, q) We can thus construct the sequence {a m k } k∈N , which, being contained in the compact set We now invoke the fact that J + are closed subsets of U 2 i 1 and of U 2 i 2 , respectively. It implies that which completes the proof of (11) and of the base case of the induction.
We now move to the proof of the inductive step, which essentially goes along the same lines as the proof of the base case.
The assumption is that J The induction hypothesis states that J + Let {a m } m∈N denote now a dense subset of N in , and hence also a dense subset of N in . Similarly as before, for each k ∈ N consider the family {B(a m , 1 k )} m∈N covering N in , and take its finite subcover {B(a m , 1 k )} m∈F k . We now claim that which would already mean that J + (10)) and that

by the induction assumption and definitions
(by the base case and definitions (9)). To show the inclusion '⊆' in (12) One can thus simply take p m := r =: q m .
On the other hand, to show the inclusion '⊇', let us assume that (p, q) We can thus construct the sequence (a m k ) k∈N , which, being contained in the compact set N in , has a subsequence (a m k l ) l∈N convergent to some a ∞ ∈ N in . Analogously as before, we argue that also the sequences (p m k ), (q m k ) have subsequences converging to a ∞ . We now invoke the induction assumption and definitions (10), which together imply . On the other hand, invoking the base case and definitions (9), we also have that . This completes the proof of (12) and of the entire induction.
Altogether, we can thus write that Bearing the above in mind, the σ-compactness of J + will be proven if we show that In order to show the inclusion '⊆', take any (p, q) ∈ J + and let γ : [0, 1] → M be a piecewise smooth future-directed causal curve from p to q.
Consider the inverse images γ⁻¹(N_i), i ∈ N. By continuity of γ, they are all open subsets of [0, 1], however they might be disconnected (i.e. they need not be intervals). Nevertheless, every γ⁻¹(N_i) is a union of its connected components, which are all open subintervals of [0, 1].
Let us thus consider the family of all connected components of all γ⁻¹(N_i)'s, i ∈ N. This family is a cover of [0, 1] and, since the latter is a compact space, we can take its finite subcover I := {I_1, I_2, . . . , I_n}, where each of the intervals I_j (j = 1, . . . , n) is a connected component of some (possibly not unique) γ⁻¹(N_{i_j}). Therefore γ(I_j) ⊆ N_{i_j} for all j = 1, . . . , n and, by the continuity of γ, $\gamma(\overline{I_j}) \subseteq \overline{N_{i_j}}$. Without loss of generality, we can assume that $I_{j_1} \not\subseteq I_{j_2}$ for all $j_1 \neq j_2$. Bearing this in mind, we can rewrite I either as {[0, 1]} (the trivial cover) or, if n > 1, as

$$\mathcal{I} = \{ [0, b_1), (a_2, b_2), \ldots, (a_{n-1}, b_{n-1}), (a_n, 1] \},$$

where 0 < a_2 < a_3 < . . . < a_n < 1. Notice also that b_j > a_{j+1} for j = 1, . . . , n − 1, because otherwise such I would not be a cover. Consequently, γ splits into n consecutive pieces, the j-th of which is contained in N_{i_j}, and so $(p, q) \in J^+_{(N_{i_1}, N_{i_2}, \ldots, N_{i_n})}$.
In order to show the other inclusion '⊇' in (14), notice simply that a concatenation of finitely many piecewise smooth future-directed causal curves is itself a piecewise smooth future-directed causal curve. Therefore, if $(p, q) \in J^+_{(N_{i_1}, N_{i_2}, \ldots, N_{i_n})}$, then (p, q) ∈ J⁺.
Proof: On the strength of (14), we have that . . , i n ∈ N), and hence a compact subset of M 2 .
Corollary 3. Let M be a spacetime and let X ⊆ M be a countable union of closed sets. Then J ± (X ) and E ± (X ) are σ-compact subsets of M.
Observe that, by (14), For any m, n ∈ N and any i 1 , Since pr 2 is a continuous map, the projection of a compact set is itself compact and we obtain that J + (X ) is σ-compact.
The proof for J − (X ) is completely analogous. Moreover, by the previous corollary, replacing J ± with E ± in the above proof yields the desired result for the horismotical futures and pasts.
The final corollary shows that the volume functions can be defined by means of causal futures instead of the chronological ones.
Proof: By the previous corollary, E ± (p) and J ± (p) are Borel sets for any p ∈ M and so the expressions η(E ± (p)) and η(J ± (p)) are well defined. Since it is true that where we have used the second condition in the definition of an admissible measure. Therefore, t − (p) = η(J − (p)). The proof for t + is analogous.

Causality for probability measures
The aim of this section is to extend the causal precedence relation ⪯ onto the space of measures P(M) on a given spacetime M. We begin by invoking a certain characterisation of causality between events.
Let C(M) denote the set of smooth bounded causal functions on the spacetime M.
Theorem 5. Let M be a globally hyperbolic spacetime. For any p, q ∈ M the following conditions are equivalent:
1⋄ For all f ∈ C(M), f(p) ≤ f(q).
2⋄ p ⪯ q.
The proof, based on a result by Besnard [9], can be found in [17, Proposition 10] (see also [30]). Actually, as we shall see later, the above characterisation is valid also in causally simple spacetimes (cf. Corollary 6).
As an important side note, observe that Theorem 5 exactly mirrors the definition of a causal function. Indeed, the latter can be written symbolically as

$$f \text{ is causal} \quad \Longleftrightarrow \quad \forall\, p, q \in M \ \ [\, p \preceq q \ \Rightarrow \ f(p) \leq f(q) \,],$$

whereas Theorem 5 in fact says that

$$p \preceq q \quad \Longleftrightarrow \quad \forall\, f \in \mathcal{C}(M) \ \ f(p) \leq f(q).$$

Therefore, instead of using ⪯ to define what a causal function is, one can come up with an abstract, suitably structured set C of 'smooth bounded causal functions' and define ⪯ through C using the analogue of Theorem 5. This was done by Eckstein and Franco in [17] in the very general context of noncommutative geometry.
Condition 1⋄ provides a 'dual' definition of the causal precedence, which actually suggests how ⪯ could be extended onto P(M).
Definition 1. Let M be a globally hyperbolic spacetime. For any µ, ν ∈ P(M) we say that µ causally precedes ν (symbolically µ ⪯ ν) iff

$$\forall\, f \in \mathcal{C}(M) \qquad \int_M f \, d\mu \leq \int_M f \, d\nu.$$

In [17] it is proven (in a much more general context) that the above defined relation is in fact a partial order. This definition, however, has two shortcomings. Firstly, it is well motivated only on globally hyperbolic spacetimes. Secondly, the intuitive notion of causality for spread objects, as phrased in the introduction, is not directly visible in Definition 1.
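To get a feeling for the 'dual' definition, one can evaluate it on finitely supported measures. The sketch below is only a necessary test under stated assumptions: it works in the (1+1)-dimensional Minkowski spacetime (c = 1) and checks the inequality of integrals on a small hand-picked family of smooth bounded causal functions of the form f(t, x) = arctan(t − a·x − s) with |a| ≤ 1. The family, the helper names and the example measures are hypothetical choices made for illustration; passing the test does not prove that µ ⪯ ν, while failing it does disprove it.

```python
import math
from typing import Callable, List, Sequence, Tuple

Event = Tuple[float, float]            # (t, x) in 1+1 Minkowski spacetime (assumed model)
Measure = List[Tuple[Event, float]]    # finitely supported measure: (event, weight) pairs
TestFunction = Callable[[Event], float]


def integral(f: TestFunction, measure: Measure) -> float:
    """Integral of f with respect to a finitely supported measure."""
    return sum(weight * f(point) for point, weight in measure)


def causal_test_functions(slopes: Sequence[float], shifts: Sequence[float]) -> List[TestFunction]:
    """Functions arctan(t - a*x - s); each is smooth, bounded and causal when |a| <= 1."""
    functions = []
    for a in slopes:
        if abs(a) > 1:
            raise ValueError("the slope must satisfy |a| <= 1 for the function to be causal")
        for s in shifts:
            functions.append(lambda p, a=a, s=s: math.atan(p[0] - a * p[1] - s))
    return functions


def passes_dual_test(mu: Measure, nu: Measure, functions: List[TestFunction], tol: float = 1e-12) -> bool:
    """Check the inequality of integrals from Definition 1 on the given family only."""
    return all(integral(f, mu) <= integral(f, nu) + tol for f in functions)


family = causal_test_functions(slopes=(-1.0, -0.5, 0.0, 0.5, 1.0), shifts=(-2.0, -1.0, 0.0, 1.0, 2.0))
mu = [((0.0, 0.0), 1.0)]                              # Dirac measure at the origin
nu = [((1.0, -0.5), 0.5), ((1.0, 0.5), 0.5)]          # mass split inside the future cone
nu_spacelike = [((1.0, 2.0), 1.0)]                    # mass moved outside the future cone

print(passes_dual_test(mu, nu, family))
print(passes_dual_test(mu, nu_spacelike, family))
```

The second measure fails already on the null coordinate t − x, which decreases when passing from the origin to the spacelike separated event (1, 2).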

Characterisations of the causal relation
In the following, we provide various conditions which are equivalent to the above definition of a causal relation between measures. Moreover, in some of the implications the assumption on global hyperbolicity of M can be relaxed.
The first result states that if C(M) is sufficiently rich, one can abandon the smoothness requirement.
Theorem 6. Let M be a stably causal spacetime. For any µ, ν ∈ P(M) the following conditions are equivalent:
1• For all f ∈ C(M), $\int_M f \, d\mu \leq \int_M f \, d\nu$.
2• For all causal f ∈ C_b(M), $\int_M f \, d\mu \leq \int_M f \, d\nu$.
Proof: (1• ⇒ 2•) Relying on [12, Corollary 5.4 and the subsequent comments] we use the fact that in stably causal spacetimes any time function can be uniformly approximated by a smooth time (or even temporal) function.
Using the stable causality, fix a temporal function T : M → R. For any ε > 0, the function f + ε arctan T is a time function which clearly approximates f uniformly. By the above mentioned corollary, this function in turn can be approximated by a smooth time function f ε such that Clearly To obtain 2 • it now remains to observe that for any measure η ∈ P(M) it is true that lim Indeed, for any η ∈ P(M) and ε > 0 one has where we have used (17).
The next result characterises the relation between measures in terms of open future sets. Proof: Fix an open future set F ⊆ M and let η be an admissible measure on M. For any λ ∈ (0, 1] construct a new admissible measure η λ := λη + (1 − λ)η( · ∩ F ) and consider the associated past volume function t − λ defined via Because M is causally continuous, t − λ is a time function for any λ ∈ (0, 1]. Now, for every n ∈ N define an increasing function ϕ n ∈ C ∞ b (R) by The sequence of functions (ϕ n ) is pointwise convergent to the indicator function of R >0 . Moreover, also (ϕ n • t − λ ) is a bounded time function for every n ∈ N and λ ∈ (0, 1]. By 2 • , this means that Since the functions ϕ n are bounded and continuous, we can invoke Lebesgue's dominated convergence theorem and first take λ → 0 + , obtaining and then take n → +∞, which yields It is now crucial to notice that the function p → η(F ∩ I − (p)) is positive on F and zero on M \ F . These observations follow from the definition of an admissible measure and of F , and together with the above inequality of integrals they imply that 6 Such a function exists because causal continuity implies stable causality. In fact, in the proof of (3 • ⇒ 2 • ) we only need M be stably causal.

By 3 • , we obtain the following inequality of integrals
It is not difficult to realise that ∀ p ∈ M [∀n ∈ N s n (p) < f ε (p)] and lim n→+∞ s n (p) = f ε (p).
The third and the most important result concerns causally simple spacetimes. We show that condition 3 • extends to different kinds of future sets. Moreover, we introduce a condition that uses the existential quantifier.
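For finitely supported measures, an existential condition of this kind can be tested directly. The sketch below rests on an assumption that is standard in optimal transport but is not spelled out in the surrounding text: that the existential condition asks for a coupling (joint distribution) of µ and ν concentrated on J⁺. For two uniform measures on n points each, such a coupling exists iff there is a perfect matching along causally related pairs (by the Birkhoff-von Neumann theorem), which is found here with a simple augmenting-path search. The Minkowski model and all names are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Event = Tuple[float, float]  # (t, x) in 1+1 Minkowski spacetime, c = 1 (assumed model)


def causally_precedes(p: Event, q: Event) -> bool:
    """p ⪯ q in Minkowski spacetime."""
    return (q[0] - p[0]) >= abs(q[1] - p[1])


def causal_matching(sources: Sequence[Event], targets: Sequence[Event]) -> Optional[List[int]]:
    """Return a list m with m[j] = index of the source sent to target j,
    using only pairs with source ⪯ target, or None if no such perfect matching exists."""
    n = len(sources)
    if n != len(targets):
        raise ValueError("uniform measures must have the same number of atoms")
    # allowed[i] lists the targets lying in the causal future of source i.
    allowed = [[j for j in range(n) if causally_precedes(p, targets[j])] for p in sources]
    source_of_target = [-1] * n

    def try_assign(i: int, seen: Set[int]) -> bool:
        # Augmenting-path step: place source i, re-routing earlier sources if needed.
        for j in allowed[i]:
            if j in seen:
                continue
            seen.add(j)
            if source_of_target[j] == -1 or try_assign(source_of_target[j], seen):
                source_of_target[j] = i
                return True
        return False

    for i in range(n):
        if not try_assign(i, set()):
            return None
    return source_of_target


print(causal_matching([(0.0, 0.0), (0.0, 2.0)], [(1.0, 2.5), (1.0, 0.5)]))
print(causal_matching([(0.0, 0.0), (0.0, 0.0)], [(1.0, 0.0), (1.0, 5.0)]))
```

A returned matching is an explicit 'transport plan' in which every atom of µ travels to an event in its causal future; `None` signals that no such plan exists for the given point clouds.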
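Suppose then that $q \in \bigcap_{n=1}^{\infty} \bigcup_{x \in K} J^+\!\left(B\!\left(x, \tfrac{1}{n}\right)\right)$, which means that for every n ∈ N there exist x_n ∈ K and p_n ∈ B(x_n, 1/n) such that p_n ⪯ q.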
Since K is compact, the sequence (x_n) has a convergent subsequence (x_{n_k}), lim_{k→+∞} x_{n_k} = x_∞ ∈ K. Notice that also the subsequence (p_{n_k}) converges to x_∞. But because J⁺ is a closed set in the case of a causally simple spacetime, the fact that p_{n_k} ⪯ q for every k ∈ N implies that x_∞ ⪯ q and therefore q ∈ J⁺(K). By 3 • we know that Since for all n ∈ N it is true that , therefore, by (1), where we have also used (25) and (26), thus proving 4 • .
(4• ⇒ 5•) Let F ⊆ M be any Borel future set. For any K ⊆ F it is then true that K ⊆ J⁺(K) ⊆ F. Therefore

$$\mu(K) \leq \mu(J^+(K)) \leq \mu(F).$$

In the above chain of inequalities let us take the supremum over all compact K ⊆ F. Using the tightness of µ (see (4)), we have

$$\mu(F) = \sup\{ \mu(K) \mid K \subseteq F,\ K \text{ compact} \} \leq \sup\{ \mu(J^+(K)) \mid K \subseteq F,\ K \text{ compact} \} \leq \mu(F),$$

and so µ(F) = sup{µ(J⁺(K)) | K ⊆ F, K compact} and similarly for the measure ν. As we can see, in order to obtain 5• from 4• it is enough to take the supremum over all compact K ⊆ F.
(2• ⇒ 6•) In the first step of the proof we will show that 6• holds for all nonnegative ϕ, ψ ∈ C_b(M) with ϕ compactly supported. Namely, for such functions we will show that the condition

$$\forall\, p, q \in M \qquad p \preceq q \ \Rightarrow \ \varphi(p) \leq \psi(q) \tag{27}$$

implies the inequality of integrals

$$\int_M \varphi \, d\mu \leq \int_M \psi \, d\nu. \tag{28}$$

Then, in the second step, we will demonstrate that the assumptions of nonnegativity of ϕ, ψ and of the compactness of supp ϕ can in fact be abandoned.
Define a function $\hat\varphi : M \to \mathbb{R}$ via $\hat\varphi(p) := \max_{x \preceq p} \varphi(x)$. The function $\hat\varphi$ is well defined, because for every p ∈ M the function ϕ, being continuous, attains its maximum over the compact set J⁻(p) ∩ supp ϕ (here we are using the fact that in causally simple spacetimes J^±(p) are closed sets for all p ∈ M). Moreover, $\hat\varphi$ satisfies

$$\forall\, p_1, p_2, q \in M \qquad p_1 \preceq p_2 \preceq q \ \Rightarrow \ \varphi(p_1) \leq \hat\varphi(p_2) \leq \psi(q). \tag{29}$$

Indeed, the first inequality follows directly from the very definition of $\hat\varphi$. In order to obtain the second inequality, notice first that by (27) we have ϕ(p_2) ≤ ψ(q). By transitivity of the relation ⪯, this inequality holds also if we replace p_2 with any x ⪯ p_2. Hence $\hat\varphi(p_2) = \max_{x \preceq p_2} \varphi(x) \leq \psi(q)$ and (29) is proven.
The function $\hat\varphi$ is obviously nonnegative, bounded and, by transitivity of ⪯, it is causal. We claim that it is also continuous, which amounts to showing that $\hat\varphi^{-1}((\alpha, \beta))$ is an open set for any α < β.
We have thus shown that $\hat\varphi$ ∈ C_b(M). By 2• we have that

$$\int_M \hat\varphi \, d\mu \leq \int_M \hat\varphi \, d\nu. \tag{30}$$

But from (30) we readily obtain (28), because

$$\int_M \varphi \, d\mu \leq \int_M \hat\varphi \, d\mu \leq \int_M \hat\varphi \, d\nu \leq \int_M \psi \, d\nu,$$

where the first and the last inequalities follow from (29) and the middle one is exactly (30). Thus, we have already proven 6• under the assumption that ϕ is compactly supported and both ϕ and ψ are nonnegative.
Let us now take any ϕ, ψ ∈ C_b(M) satisfying (27).
Let (K n ) n∈N be an exhaustion of M by compact sets. Using Urysohn's lemma for LCH spaces (Theorem 1), we construct a sequence (θ n ) n∈N ⊆ C c (M) of functions such that, for any n ∈ N, θ n | Kn ≡ 1 and 0 ≤ θ n ≤ 1.
Notice that (for every n ∈ N) the function θ n ϕ m is compactly supported and, together with ψ m , they are nonnegative and satisfy (27), because for all p, q ∈ M such that p q one has On the strength of the previous part of the proof, it is then true that By the very definition, θ n ≤ 1 for every n and, since (K n ) n∈N exhausts M, we have that θ n → 1 pointwise. By Lebesgue's dominated convergence theorem we can pass with n → +∞ in (31)  Theorem 9. (Kantorovich duality) Let (X 1 , µ 1 ) and (X 2 , µ 2 ) be two Polish probability spaces and let c : X 1 × X 2 → R ≥0 ∪ {+∞} be a lower semi-continuous function. Then where (8 • ⇒ 4 • ) Let T : M → R be a smooth temporal function whose every level set is a Cauchy hypersurface.
Take any compact subset $K \subseteq M$. Let $T_0$ denote the minimal value attained on $K$ by the function $T$. For any $n \in \mathbb{N}_0$ define the level set $S_n := T^{-1}(T_0 + n)$. Every $S_n$ is a smooth spacelike Cauchy hypersurface. Now, for any $n \in \mathbb{N}_0$ consider the set
\[ \Sigma_n := \partial J^+(S_n \cup K). \]
Figure 3: The construction of the $\Sigma_n$'s.
We claim that for every $n \in \mathbb{N}_0$, $\Sigma_n$ is a Cauchy hypersurface and that
\[ J^+(\Sigma_n) = J^+(S_n \cup K). \tag{34} \]
Indeed, observe first that $J^+(S_n \cup K)$ is a future set. By [33, Chapter 14, Corollary 27], $\Sigma_n$ is therefore a closed achronal topological hypersurface. Let $\gamma$ be any inextendible timelike curve. It crosses the Cauchy hypersurfaces $S_n$ (which is contained in $J^+(S_n \cup K)$) and $S_0$ (the past of which, $I^-(S_0)$, is disjoint from $J^+(S_n \cup K)$), therefore it must cross the boundary $\partial J^+(S_n \cup K) = \Sigma_n$. Since the latter is achronal, it is met by $\gamma$ exactly once and therefore $\Sigma_n$ is a Cauchy hypersurface. In order to obtain (34), we prove the following lemma.
Lemma 1. Let $M$ be a spacetime and let $F \subseteq M$ be a closed future set such that $F \subseteq J^+(X)$ for some achronal set $X$. Then $J^+(\partial F) = F$.
Proof: '$\subseteq$' Because $F$ is closed, it contains its boundary: $\partial F \subseteq F$. Hence
\[ J^+(\partial F) \subseteq J^+(F) \subseteq F, \]
because $F$ is a future set. '$\supseteq$' Take $q \in F$. By assumption, there exist $x \in X$ and a future-directed causal curve $\gamma$ connecting $x$ with $q$.
Notice, first, that $x \notin F \setminus \partial F = \mathrm{int}\, F$. Indeed, if $x$ belonged to $\mathrm{int}\, F$, which is an open subset of $F$, there would exist $x' \in F$ such that $x' \ll x$. But since $F \subseteq J^+(X)$, there would exist $x'' \in X$ such that $x'' \preceq x'$. Altogether, by (5) we would obtain that $x'' \ll x$, in contradiction with the achronality of $X$. Therefore, either $x \in \partial F$ or $x \in M \setminus F$.
If $x \in \partial F$, then $q \in J^+(\partial F)$ and the proof is complete.
On the other hand, if $x \in M \setminus F$, then the curve $\gamma$ must cross $\partial F$ at some point $p$. Of course, $p \preceq q$ and hence also in this case $q \in J^+(\partial F)$.
Notice now that $J^+(S_n \cup K) = J^+(S_n) \cup J^+(K)$ is in fact a closed${}^{9}$ future set such that $J^+(S_n \cup K) \subseteq J^+(S_0)$. On the strength of Lemma 1, we obtain (34).
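By $8^\bullet$, because $\Sigma_n$ is a Cauchy hypersurface for any $n \in \mathbb{N}_0$, we can write that
\[ \mu(J^+(\Sigma_n)) \leq \nu(J^+(\Sigma_n)). \tag{35} \]
Observe that the sequence $(J^+(\Sigma_n))_{n \in \mathbb{N}_0}$ is decreasing, because for all $n \in \mathbb{N}_0$
\[ J^+(\Sigma_{n+1}) = J^+(S_{n+1}) \cup J^+(K) \subseteq J^+(S_n) \cup J^+(K) = J^+(\Sigma_n), \]
where we have used (34) and the very definition of the $S_n$'s. Property (2) allows us to pass with $n \to +\infty$ in (35) and write that
\[ \mu\Big( \bigcap_{n \in \mathbb{N}_0} J^+(\Sigma_n) \Big) \leq \nu\Big( \bigcap_{n \in \mathbb{N}_0} J^+(\Sigma_n) \Big). \tag{36} \]
The countable intersection appearing above can be easily shown to be equal to $J^+(K)$. Indeed, one has
\[ \bigcap_{n \in \mathbb{N}_0} J^+(\Sigma_n) = \bigcap_{n \in \mathbb{N}_0} \big( J^+(S_n) \cup J^+(K) \big) = J^+(K) \cup \bigcap_{n \in \mathbb{N}_0} J^+(S_n) = J^+(K), \]
where the last equality holds because the sets $J^+(S_n) \subseteq T^{-1}([T_0 + n, +\infty))$ have empty intersection. Therefore, (36) yields (21) and the proof of $4^\bullet$ is complete.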
We have thus provided 8 different characterisations of a causal relation between probability measures, which are equivalent if the underlying spacetime is globally hyperbolic. Some of the implications hold under lower causality conditions, as demonstrated in Theorems 6-8. Let us now discuss other implications not covered in the proofs.
Remark 1. Let us first stress that the formulation of conditions $3^\bullet$-$5^\bullet$ using the future of a set is just a matter of convention and one could equally well employ the pasts. Concretely, a straightforward application of time inversion (note that such an operation changes the relation $\preceq$ into the opposite one) shows that conditions $3^\bullet$, $4^\bullet$, $5^\bullet$ are (in any spacetime $M$) equivalent to the following conditions, respectively: [...]
$5'^\bullet$ For every Borel past set $P \subseteq M$, $\mu(P) \geq \nu(P)$.
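The set-based characterisation used in the proof above (the inequality $\mu(J^+(K)) \leq \nu(J^+(K))$ for compact $K$) can be probed by brute force in a toy setting. The following is a minimal sketch, not part of the original argument: it assumes 1+1 dimensional Minkowski spacetime with $c = 1$, events encoded as `(t, x)` tuples, and finitely supported measures stored as dictionaries; the names `precedes`, `future_mass` and `find_violation` are hypothetical helpers introduced here. For finitely supported measures it is enough to test subsets of the supports, since replacing $K$ by $\mathrm{supp}\, \mu \cap J^+(K)$ does not decrease the $\mu$-mass of the future and does not increase the $\nu$-mass.

```python
from itertools import combinations

def precedes(p, q):
    """True if event p causally precedes q in 1+1 Minkowski spacetime (c = 1).

    Events are (t, x) tuples; this encoding is an assumption of the sketch.
    """
    return q[0] - p[0] >= abs(q[1] - p[1])

def future_mass(measure, K):
    """Mass that a finitely supported measure (dict: event -> weight) gives to J+(K)."""
    return sum(w for q, w in measure.items() if any(precedes(k, q) for k in K))

def find_violation(mu, nu, tol=1e-12):
    """Search for a finite set K with mu(J+(K)) > nu(J+(K)); return it, or None.

    Only subsets of supp(mu) | supp(nu) are tried, which suffices for
    finitely supported measures (see the remark in the lead-in).
    """
    points = sorted(set(mu) | set(nu))
    for size in range(1, len(points) + 1):
        for K in combinations(points, size):
            if future_mass(mu, K) > future_mass(nu, K) + tol:
                return K
    return None

mu = {(0, 0): 1.0}
nu_lightlike = {(1, -1): 0.5, (1, 1): 0.5}   # both events lie on the future light cone of (0, 0)
nu_spacelike = {(1, -1): 0.5, (1, 2): 0.5}   # (1, 2) is spacelike separated from (0, 0)

print(find_violation(mu, nu_lightlike))   # no violating K exists
print(find_violation(mu, nu_spacelike))   # K = {(0, 0)}: mu gives mass 1, nu only 0.5
```

In the second case half of the mass of `nu_spacelike` sits outside the future of the origin, so the inequality already fails for the one-point set $K = \{(0, 0)\}$.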

Basic properties of the causal relation between measures
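In the previous subsection we have shown that for any spacetime $M$ the condition $7^\bullet$ not only implies all of the others listed in Theorems 6, 7, 8 and 10, but also the more general ones $2'^\bullet$ and $6'^\bullet$. This encourages us to promote the condition $7^\bullet$ to a definition of the causal precedence relation on $P(M)$ for any spacetime $M$.
Definition 2. Let $M$ be a spacetime. For any $\mu, \nu \in P(M)$ we say that $\mu$ causally precedes $\nu$ (symbolically $\mu \preceq \nu$) iff there exists $\omega \in P(M^2)$ such that
i) $(\mathrm{pr}_1)_* \omega = \mu$ and $(\mathrm{pr}_2)_* \omega = \nu$,
ii) $\omega(J^+) = 1$.
Such an $\omega$ will be called a causal coupling of $\mu$ and $\nu$.
Observe that $\omega(J^+)$ is well-defined because, by Theorem 4, $J^+$ is $\sigma$-compact, and hence Borel, for any spacetime $M$.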

Remark 5. In the case of causally simple spacetimes $J^+ \subseteq M^2$ is closed and therefore, by the very definition of the support of a measure (see the last paragraph of Section 2.2), condition ii) in Definition 2 is equivalent to the inclusion $\mathrm{supp}\, \omega \subseteq J^+$. However, without the assumption of causal simplicity this is no longer true.
The term 'coupling (of measures $\mu$ and $\nu$)' comes from optimal transport theory [43], where it describes any $\omega \in P(M^2)$ with property i) of the above definition. The set of such couplings, denoted $\Pi(\mu, \nu)$, has already appeared above in the context of the Kantorovich duality (Theorem 9). Such a coupling (or a transference plan, as it is also called) can be regarded as an instruction on how to 'reconfigure' a fixed amount of 'mass' distributed over $M$ according to the measure $\mu$ so that it becomes distributed according to the measure $\nu$. This 'reconfiguration' involves transporting the (possibly infinitesimal) portions of 'mass' between points of $M$, and a coupling $\omega \in \Pi(\mu, \nu) \subseteq P(M^2)$ describes precisely what amount of 'mass' is transported between any given pair of points.
It is, however, property ii) which ties the above definition to causality theory. It can be summarised as a requirement that the transport of 'mass' be conducted along future-directed causal curves only, which is why such couplings deserve to be called causal. The set of all causal couplings of measures $\mu$ and $\nu$ will be denoted by $\Pi_c(\mu, \nu)$.
Notice that a (causal) coupling does not specify along which (causal) curves the portions of 'mass' are transported. In fact, various families of (causal) curves can lead to the same (causal) coupling. Notice also that the 'mass' concentrated initially at some point $p \in M$ can dilute to many different points.
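To make properties i) and ii) concrete, here is a minimal sketch for finitely supported measures. It is only an illustration under assumptions introduced here: 1+1 dimensional Minkowski spacetime with $c = 1$, events encoded as `(t, x)` tuples, measures and couplings stored as dictionaries, and hypothetical helper names (`precedes`, `marginals`, `is_causal_coupling`).

```python
import math

def precedes(p, q):
    """True if event p causally precedes q in 1+1 Minkowski spacetime (c = 1)."""
    return q[0] - p[0] >= abs(q[1] - p[1])

def marginals(omega):
    """Push a coupling (dict: (p, q) -> weight) forward onto both factors."""
    first, second = {}, {}
    for (p, q), w in omega.items():
        first[p] = first.get(p, 0.0) + w
        second[q] = second.get(q, 0.0) + w
    return first, second

def same_measure(a, b, tol=1e-12):
    """Compare two finitely supported measures weight by weight."""
    return all(math.isclose(a.get(k, 0.0), b.get(k, 0.0), abs_tol=tol)
               for k in set(a) | set(b))

def is_causal_coupling(omega, mu, nu, tol=1e-12):
    """Check property i) (marginals mu, nu) and property ii) (omega(J+) = 1)."""
    first, second = marginals(omega)
    has_marginals = same_measure(first, mu, tol) and same_measure(second, nu, tol)
    causal_mass = sum(w for (p, q), w in omega.items() if precedes(p, q))
    return has_marginals and math.isclose(causal_mass, 1.0, abs_tol=tol)

# Mass concentrated at one event may dilute to several events.
mu = {(0, 0): 1.0}
nu = {(1, -1): 0.5, (1, 1): 0.5}
split = {((0, 0), (1, -1)): 0.5, ((0, 0), (1, 1)): 0.5}

# Two couplings with the same marginals: only one of them is causal.
mu2 = {(0, 0): 0.5, (0, 4): 0.5}
nu2 = {(1, 0): 0.5, (1, 4): 0.5}
straight = {((0, 0), (1, 0)): 0.5, ((0, 4), (1, 4)): 0.5}
crossed = {((0, 0), (1, 4)): 0.5, ((0, 4), (1, 0)): 0.5}

print(is_causal_coupling(split, mu, nu))
print(is_causal_coupling(straight, mu2, nu2))
print(is_causal_coupling(crossed, mu2, nu2))
```

The coupling `crossed` satisfies property i) but moves mass between spacelike separated events, so it fails property ii); the existence of `straight` is nevertheless enough for `mu2` to causally precede `nu2` in the sense of Definition 2.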
Observe that for Dirac measures $\mu = \delta_p$, $\nu = \delta_q$ Definition 2 reduces to the standard definition of the causal relation between events $p$ and $q$. This can be seen as a corollary of the following proposition.
Corollary 5. Let $M$ be a spacetime. Then for any $p, q \in M$, $p \preceq q$ iff $\delta_p \preceq \delta_q$.
Proof: By Proposition 4, the only coupling between two Dirac measures $\delta_p$, $\delta_q$ is their product measure $\omega := \delta_p \times \delta_q = \delta_{(p,q)}$. Hence, the fact that $p \preceq q$ is equivalent in this case to the requirement that $\omega(J^+) = 1$.
Corollary 6. Let $M$ be a causally simple spacetime. For any $p, q \in M$ the following conditions are equivalent [...]
Proof: It is a direct consequence of the equivalence ($1^\bullet \Leftrightarrow 7^\bullet$) in Theorem 8 and Corollary 5.
If the measure $\mu$ is compactly supported, then in the light of the above discussion it is natural to expect that the support of any $\nu$ with $\mu \preceq \nu$ should be within the future of $\mathrm{supp}\, \mu$ [45]. This intuitive condition is in fact true in causally simple spacetimes.
We now claim that if $M$ is causally simple, then this implies that $\mathrm{supp}\, \nu \subseteq J^+(\mathrm{supp}\, \mu)$. Indeed, recall that in a causally simple spacetime the causal futures of compact sets are closed. Therefore, if there existed $q \in \mathrm{supp}\, \nu$ with $q \notin J^+(\mathrm{supp}\, \mu)$, then we could take an open neighbourhood $U \ni q$ such that $\nu(U) > 0$ but $U \cap J^+(\mathrm{supp}\, \mu) = \emptyset$. But this would imply that $\nu(J^+(\mathrm{supp}\, \mu)) \leq 1 - \nu(U) < 1$, in contradiction with the first part of the proof.
Recall that the causal precedence relation $\preceq$ between events is reflexive, transitive and, iff $M$ is causal, antisymmetric. We now prove analogous results for the space of Borel probability measures on $M$ equipped with the relation $\preceq$. To this end, it will be convenient to use the diagonal function $\Delta : M \to M^2$, defined as $\Delta(p) := (p, p)$ for any $p \in M$.
Theorem 11. Let $M$ be a spacetime. The relation $\preceq$ on $P(M)$ is reflexive and transitive.
Proof: To prove reflexivity of $\preceq$, it suffices to notice that for any $\mu \in P(M)$ the pushforward measure $\Delta_* \mu$ is a causal coupling of $\mu$ with itself.
We now move to proving the transitivity of $\preceq$. Let us invoke the following standard result [43, Lemma 7.6] from optimal transport theory.
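Indeed, notice first that
\[ \omega_{123}\big( \{ (p, q, r) \in M^3 \mid p \preceq q \not\preceq r \} \big) \leq \omega_{123}\big( \{ (p, q, r) \in M^3 \mid q \not\preceq r \} \big). \]
Since $M^3$ can be decomposed into the following union of (pairwise disjoint) sets [...] therefore we obtain [...] and hence $\omega_{123}\big( \{ (p, q, r) \in M^3 \mid p \preceq q \preceq r \} \big) = 1$.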
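For finitely supported measures the gluing step behind transitivity can be carried out by hand. The sketch below is an illustration only: it assumes 1+1 dimensional Minkowski spacetime with $c = 1$, events as `(t, x)` tuples and couplings as dictionaries, and it uses one particular gluing, $\omega_{123}(p, q, r) = \omega_{12}(p, q)\, \omega_{23}(q, r) / \mu_2(q)$, which is a standard choice for discrete measures and not necessarily the construction used in [43]. The helper names `precedes` and `compose` are hypothetical.

```python
from collections import defaultdict

def precedes(p, q):
    """True if event p causally precedes q in 1+1 Minkowski spacetime (c = 1)."""
    return q[0] - p[0] >= abs(q[1] - p[1])

def compose(omega12, omega23):
    """Glue two couplings along their common marginal and project out the middle event.

    omega12 couples mu1 with mu2, omega23 couples mu2 with mu3 (dicts: pair -> weight).
    Returns a coupling of mu1 with mu3.
    """
    mu2 = defaultdict(float)
    for (_, q), w in omega12.items():
        mu2[q] += w
    omega13 = defaultdict(float)
    for (p, q), w12 in omega12.items():
        if w12 == 0.0:
            continue  # zero-weight entries carry no mass and could have mu2[q] == 0
        for (q2, r), w23 in omega23.items():
            if q2 == q:
                omega13[(p, r)] += w12 * w23 / mu2[q]
    return dict(omega13)

omega12 = {((0, 0), (1, -1)): 0.5, ((0, 0), (1, 1)): 0.5}
omega23 = {((1, -1), (2, -2)): 0.25, ((1, -1), (2, 0)): 0.25, ((1, 1), (2, 0)): 0.5}
omega13 = compose(omega12, omega23)

print(omega13)
print(all(precedes(p, r) for (p, r) in omega13))
```

Because every pair in the supports of `omega12` and `omega23` is causally related, transitivity of the relation between events guarantees that every pair in the support of `omega13` is causally related as well, which is the discrete counterpart of the statement proven above.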
A natural question arises: how robust must the causal structure of a spacetime $M$ be to render the relation $\preceq$ antisymmetric and hence a partial order? Obviously, $M$ must be at least causal (otherwise even the causal precedence relation between events fails to be antisymmetric).
We have the following partial result.
Theorem 12. Let $M$ be a spacetime with the following property: For any compact $K \subseteq M$ there exists a Borel function $\tau_K : K \to \mathbb{R}$ such that
\[ \forall\, p, q \in K \quad \big( p \preceq q \ \text{and}\ p \neq q \big) \ \Rightarrow\ \tau_K(p) < \tau_K(q). \tag{43} \]
Then, for any $\mu \in P(M)$, $\Pi_c(\mu, \mu) = \{\Delta_* \mu\}$. Moreover, the relation $\preceq$ is antisymmetric.
Remark 6. Property (43) implies that $M$ is causal. Indeed, suppose that there exist two distinct events $p, q \in M$ such that $p \preceq q \preceq p$. Taking now $K = \{p, q\}$, on the strength of (43) we would obtain that $\tau_K(p) < \tau_K(q) < \tau_K(p)$, a contradiction.
On the other hand, if $M$ is past (future) distinguishing, then any past (resp. future) volume function is a semi-continuous, and hence Borel, generalised time function (cf. Section 2.3). This obviously implies (43): for any compact $K \subseteq M$ simply define $\tau_K := \tau|_K$. However, being past or future distinguishing is not necessary for (43) to hold. Indeed, the rightmost diagram in [31, Figure 6] presents a causal, but neither future nor past distinguishing spacetime $M := \mathbb{R} \times S^1 \setminus \{(0, 0)\}$, which admits a Borel generalised time function, for instance [...] for any $x \in \mathbb{R}$ and $\theta \in S^1$, where the latter is the angular coordinate whose range is $[0, 2\pi)$, except for $x = 0$, when its range is $(0, 2\pi)$.
Before we move to the proof of Theorem 12, let us present the following lemma. [...] $= \omega(\Delta(\Delta^{-1}(U)))$.
The rightmost expression, in turn, can be further transformed either into [...] which proves the second part of the theorem. To obtain the equality $\mu = \nu$, take any Borel $V \subseteq M$ and notice, for instance, that [...] which concludes the entire proof.
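Proof of Theorem 12: Take any $\mu \in P(M)$ and let $\pi \in \Pi_c(\mu, \mu)$. By Definition 2, we have that, for any $\mu$-integrable function $f : M \to \mathbb{R}$,
\[ \int_{M^2} f(p) \, d\pi(p, q) = \int_M f \, d\mu = \int_{M^2} f(q) \, d\pi(p, q), \]
and hence
\[ \int_{J^+} \big( f(q) - f(p) \big) \, d\pi(p, q) = 0 \]
or, by noticing that the integrand vanishes on $\Delta(M)$,
\[ \int_{J^+ \setminus \Delta(M)} \big( f(q) - f(p) \big) \, d\pi(p, q) = 0. \tag{44} \]
Suppose now that $\pi(J^+ \setminus \Delta(M)) > 0$. Because $\pi$ is tight, there exists a compact set $\mathcal{K} \subseteq J^+ \setminus \Delta(M)$ with $\pi(\mathcal{K}) > 0$. Notice that $\mathcal{K} \subseteq K^2$, where $K := \mathrm{pr}_1 \mathcal{K} \cup \mathrm{pr}_2 \mathcal{K}$ is a compact subset of $M$, and so $\pi(K^2 \cap J^+ \setminus \Delta(M)) \geq \pi(\mathcal{K}) > 0$. [...] where $\tau_K$ is a function whose existence is guaranteed by property (43). The function $f_K$ is Borel and bounded, and hence $\mu$-integrable. Plugging it into (44) yields [...] But the integrand of the above integral is positive on $K^2 \cap J^+ \setminus \Delta(M)$ by the very definition of $\tau_K$, therefore the fact that the integral is zero implies that $\pi(K^2 \cap J^+ \setminus \Delta(M)) = 0$, which contradicts the earlier result. This proves that $\pi(J^+ \setminus \Delta(M)) = 0$. By property ii) from Definition 2, this in turn means that $\pi(\Delta(M)) = \pi(J^+) - \pi(J^+ \setminus \Delta(M)) = 1$.
On the strength of Lemma 3, we get that $\pi = \Delta_* \mu$.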
Notice, however, that the set $\{(p, q, r) \in M^3 \mid p \preceq q \preceq r \neq p\}$ is $\Omega$-null, because [...] Therefore, in fact, [...] $= \Omega\big( \{(p, q, r) \in M^3 \mid p \preceq q \preceq r\} \big)$ [...] But $M$ is causal (cf. Remark 6), therefore the causal precedence relation between events is antisymmetric and thus the set whose measure is evaluated in (45) [...] We can now easily obtain that [...] and so $\omega(\Delta(M)) = 1$. Invoking Lemma 3, we obtain that $\mu = \nu$.

Lorentz-Wasserstein distances
Recall that the Lorentzian distance $d : M^2 \to [0, +\infty]$ provides a physically meaningful way of measuring distances between events, in analogy with the Riemannian distance $d_R$ in the case of Riemannian manifolds. In the latter case, one can extend the notion of a distance to the space of measures on the manifold. Concretely, for any $s \geq 1$ one defines the so-called $s$-th Wasserstein distance between any two measures $\mu, \nu \in P(\mathcal{R})$ on a Riemannian manifold $\mathcal{R}$ as
\[ W_s(\mu, \nu) := \left( \inf_{\omega \in \Pi(\mu, \nu)} \int_{\mathcal{R}^2} d_R(p, q)^s \, d\omega(p, q) \right)^{1/s}. \]
For an exposition of the theory of Wasserstein distances in the context of optimal transport theory the reader is referred e.g. to [43]. We now propose the following natural definition of a distance between measures on a spacetime.
Notice that the integrals are well-defined, because $d$ is lower semi-continuous and hence Borel. Notice also that for Dirac measures $LW_s(\delta_p, \delta_q) = d(p, q)$ for any $s$.
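The remark on Dirac measures can be checked numerically in a toy setting. The sketch below rests on several assumptions introduced here and should be read with them in mind: 1+1 dimensional Minkowski spacetime with $c = 1$ and events as `(t, x)` tuples; $LW_s(\mu, \nu)$ taken to be the supremum over causal couplings $\omega$ of $\big( \int d^s \, d\omega \big)^{1/s}$, with the value $0$ when no causal coupling exists (as the proofs in this section suggest); and the standard fact that the only coupling of a Dirac measure $\delta_p$ with any $\nu$ is the product $\delta_p \times \nu$. The helper names are hypothetical.

```python
import math

def lorentzian_distance(p, q):
    """Lorentzian distance in 1+1 Minkowski spacetime; zero unless q is in J+(p)."""
    dt, dx = q[0] - p[0], q[1] - p[1]
    return math.sqrt(dt * dt - dx * dx) if dt >= abs(dx) else 0.0

def lw_from_dirac(p, nu, s=1.0):
    """LW_s(delta_p, nu) for a finitely supported nu (dict: event -> weight).

    The only coupling is delta_p x nu; it is causal iff supp(nu) lies in J+(p).
    """
    if any(w > 0 and q[0] - p[0] < abs(q[1] - p[1]) for q, w in nu.items()):
        return 0.0  # no causal coupling exists (convention assumed in this sketch)
    total = sum(w * lorentzian_distance(p, q) ** s for q, w in nu.items())
    return total ** (1.0 / s)

origin = (0, 0)
print(lw_from_dirac(origin, {(5, 3): 1.0}))                       # equals d(origin, (5, 3))
print(lw_from_dirac(origin, {(5, 3): 1.0}, s=0.5))                # the same value for another s
print(lw_from_dirac(origin, {(5, 3): 0.5, (5, -4): 0.5}))         # average of the two distances
print(lw_from_dirac(origin, {(1, 2): 1.0}))                       # spacelike target
```

For two Dirac measures the result does not depend on `s`, in agreement with $LW_s(\delta_p, \delta_q) = d(p, q)$. For general pairs of measures the supremum over causal couplings is a linear programme, which this sketch does not attempt to solve.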
Lorentz-Wasserstein distances have properties analogous to those of the Lorentzian distance (cf. Section 2.3).
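Proof: i) The implication is obvious, so we only prove the equivalence. To prove the '$\Rightarrow$' part of the equivalence, assume that $LW_s(\mu, \nu) > 0$. By the very definition of $LW_s$, this implies that there exists $\omega \in \Pi_c(\mu, \nu)$ such that
\[ \int_{M^2} d(p, q)^s \, d\omega(p, q) > 0. \]
Since $d$ vanishes outside $I^+$, this is possible only if $\omega(I^+) > 0$. To prove the '$\Leftarrow$' part, suppose there exists $\omega \in \Pi_c(\mu, \nu)$ with $\omega(I^+) > 0$, but nevertheless $LW_s(\mu, \nu) = 0$. The latter implies that $\int_{M^2} d(p, q)^s \, d\omega(p, q) = 0$. But this, in turn, means that
\[ \omega\big( \{ (p, q) \in M^2 \mid d(p, q) > 0 \} \big) = 0. \]
But $d$ is positive on $I^+$ and so the latter must be an $\omega$-null set, which contradicts the assumption that $\omega(I^+) > 0$.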
One has the inequality which is proven through the following sequence of equalities and inequalities.
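Indeed, proceeding identically as in the beginning of the proof of Theorem 12, we obtain (compare with (44)), for any $\mu$-integrable function $f$,
\[ \int_{J^+} \big( f(q) - f(p) \big) \, d\omega(p, q) = 0. \tag{49} \]
The key now is to use a past volume function $t^-$ associated to some admissible measure on $M$. Recall that $t^-$ is causal. Moreover, since $M$ is chronological, $t^-$ is increasing on any future-directed timelike curve (cf. Section 2.3). Symbolically:
\[ \forall\, p, q \in M \quad p \preceq q \ \Rightarrow\ t^-(p) \leq t^-(q), \qquad p \ll q \ \Rightarrow\ t^-(p) < t^-(q). \tag{50} \]
Substituting $f := t^-$ in (49) (recall that $t^-$ is Borel and bounded and hence $\mu$-integrable), we can write
\[ 0 = \int_{J^+ \setminus I^+} \big( t^-(q) - t^-(p) \big) \, d\omega(p, q) + \int_{I^+} \big( t^-(q) - t^-(p) \big) \, d\omega(p, q). \tag{51} \]
By the first property in (50), both integrals in (51) are nonnegative and hence they both must vanish. However, by the second property in (50), the integrand in the rightmost integral is positive on $I^+$, therefore this integral cannot vanish unless $\omega(I^+) = 0$.
The proof of '$\Leftarrow$' is straightforward. Take any $p \in M$ and notice that, by assumption,
\[ d(p, p) = LW_s(\delta_p, \delta_p) = 0. \]
But this implies (see property i) of the Lorentzian distance in Section 2.3) that $p \not\ll p$ for any $p \in M$, which means that $M$ is chronological.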
Unlike the Lorentzian distance, Lorentz-Wasserstein distances can assume infinite values even in globally hyperbolic spacetimes.
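Proof: If $\Pi_c(\mu, \nu) = \emptyset$, then trivially $LW_s(\mu, \nu) = 0 < +\infty$. Assume then that the set of causal couplings between $\mu$ and $\nu$ is nonempty and take any $\omega \in \Pi_c(\mu, \nu)$. On the strength of Proposition 4, $\omega(\mathrm{supp}\, \mu \times \mathrm{supp}\, \nu) = 1$. By assumption, the set $\mathrm{supp}\, \mu \times \mathrm{supp}\, \nu \subseteq M^2$ is compact. Moreover, by the global hyperbolicity of $M$, $d$ is a continuous map and hence it is bounded on that compact set. Therefore,
\[ \int_{M^2} d(p, q)^s \, d\omega(p, q) = \int_{\mathrm{supp}\, \mu \times \mathrm{supp}\, \nu} d(p, q)^s \, d\omega(p, q) \leq \max_{\mathrm{supp}\, \mu \times \mathrm{supp}\, \nu} d^s < +\infty, \]
and, since this bound does not depend on $\omega$, $LW_s(\mu, \nu) < +\infty$.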

Outlook
Let us briefly summarise the main results of the paper. We proposed a notion of a causal relation between probability measures on a given spacetime $M$. To give sense to Definition 2, embedded in the theory of optimal transport, we had to enter the domain on the verge of causality and measure theory. We believe that our paper paves the way to this terra incognita, which is worth exploring both from the viewpoint of mathematical relativity and for possible applications in quantum physics.
On the mathematical side, the presented theory can be developed in various directions. Firstly, one can try to lower the causality conditions imposed on the spacetime in the theorems presented in Section 4. In particular, it would be interesting to see whether the defined relation $\preceq$ on $P(M)$ is a partial order for every causal spacetime $M$, or whether the assumption (43) in Theorem 12 is a necessary one. If the latter holds, one would obtain a new rung of the causal ladder between the causal and distinguishing spacetimes.
A second path of possible development is to investigate further the notion of a Lorentzian distance in the space of probability measures on a spacetime, and the associated topological questions. In Section 5 we proposed a notion of the $s$-th Lorentz-Wasserstein distance, which is a natural generalisation of the Lorentzian distance between the events on $M$. However, in optimal transport theory there are other ways to measure distances between probability measures (see for instance [44, p. 97]). It is tempting to see how (if at all) these notions can be adapted to the spacetime framework. This directly relates to the issue of topology on $P(M)$ and its interplay with the semi-Riemannian metric on $M$.
Another potential direction of future studies, particularly interesting from the viewpoint of applications, would be to generalise the results of the present paper to signed measures. This would allow one to study the causality of both classical and quantum charge (probability) densities on spacetimes.
The applications of the developed theory in classical and quantum physics will be discussed in detail in a forthcoming paper. Let us, however, make some remarks here.
Probability measures on space(time) arise in a natural way in quantum theory from the wave functions via the 'modulus square principle'. The results of Hegerfeldt show that in a generic quantum evolution driven by a Hamiltonian bounded from below a state initially localised in space immediately develops infinite tails. If a quantum system is acausal in the sense of Hegerfeldt, then it is so in the sense of Definition 2. Indeed, a wave function is localised (that is, of compact support) if and only if the corresponding probability measure is so. Thus, if $\mu_0 \in P(\mathbb{R}^n)$ has compact support and $\mu_t \in P(\mathbb{R}^n)$ extends to infinity for any $t > 0$, then $\delta_0 \times \mu_0 \not\preceq \delta_t \times \mu_t$ as measures on the $(n + 1)$-dimensional Minkowski spacetime, on the strength of Proposition 5.
Note, however, that Proposition 5 provides only a necessary condition for a causal relation to hold, and not a sufficient one. In [25], Hegerfeldt has extended his theorem to initial states with exponentially bounded tails. He also suggested therein that a similar phenomenon resulting in the breakdown of causality should occur for states with power-like decay. This indicates that acausality is a property of the quantum system and cannot be avoided by the use of nonlocal states. Our Definition 2 opens the door to checking this conjecture in a mathematically rigorous way.
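The support criterion invoked above can be illustrated with a one-line computation. The following sketch is not derived from any quantum evolution: it assumes 1+1 dimensional Minkowski spacetime with $c = 1$, an initial measure $\mu_0$ supported in $[-a, a]$ at time $0$, and, purely as a stand-in for a state with infinite tails, a centred Gaussian $\mu_t$ of standard deviation `sigma` at time $t$. The function name `mass_outside_future` is hypothetical. The time-$t$ slice of $J^+(\{0\} \times [-a, a])$ is the interval $[-(a + t), a + t]$, so any mass of $\mu_t$ outside that interval rules out a causal relation.

```python
import math

def mass_outside_future(a, t, sigma):
    """Mass that a centred Gaussian of std `sigma` puts outside [-(a + t), a + t].

    Uses P(|X| > L) = erfc(L / (sigma * sqrt(2))) for X ~ N(0, sigma^2).
    A positive value means that the later measure is not supported in the
    causal future of the initial support.
    """
    return math.erfc((a + t) / (sigma * math.sqrt(2.0)))

for t in (0.5, 1.0, 2.0):
    # Always positive in exact arithmetic; for very large (a + t) / sigma
    # the floating-point result underflows to 0.0.
    print(t, mass_outside_future(a=1.0, t=t, sigma=1.0))
```

However small, the leaked mass is strictly positive for every $t > 0$, which is all that the necessary condition requires in order to exclude $\delta_0 \times \mu_0 \preceq \delta_t \times \mu_t$.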
It is sometimes argued (see for instance [3,4]) that Hegerfeldt's theorem implies that localised quantum states do not exist in Nature. This conclusion is, however, challenged by the results in [29], which suggest that there is no lower limit on the localisation of the electron. Moreover, the fact that a state is nonlocal does not necessarily cure the causality violation. Indeed, imagine that one has at one's disposal an initial quantum state, localised or not, which undergoes an acausal evolution, i.e. $\delta_0 \times \mu_0 \not\preceq \delta_t \times \mu_t$ for any $t > 0$. Then one could encode information in the probability density of $\mu_0$ in some compact region $K$ of space and transmit it to an observer localised outside of $J^+(\{0\} \times K)$, as follows from the condition $4^\bullet$. Such a method of signalling would have a very low efficiency, but is a priori possible; see for instance the discussion in [27] and other cited works by Hegerfeldt.
Finally, let us come back to the original motivation of our preliminary Definition 1. As stressed at the beginning of Section 4, it was inspired by the notion of 'causality in the space of states' coined in [17]. The partial order relation considered in [17] is defined on the space of states $S(A)$ of a $C^*$-algebra $A$. If the algebra $A$ is commutative then, by Gelfand duality, there exists a locally compact Hausdorff topological space $M$ such that $A \simeq C_0(M)$. Then, the Riesz-Markov representation theorem implies that $S(A) \simeq P(M)$. Hence, if $M$ is a causally simple spacetime, then the two notions of 'causality for Borel probability measures' and 'causality in the space of states' coincide.
The concept of causality in the space of states was explored [18,19] in the framework of 'almost commutative spacetimes', i.e. for $C^*$-algebras of the form $C_0(M) \otimes A_F$, with $A_F$ being a finite dimensional matrix algebra. The study therein was limited to special subclasses of states, but nevertheless yielded interesting results. The theory put forward in the present paper blazes a trail to unravel the complete causal structure of almost commutative spacetimes. Having in mind that almost commutative spacetimes are utilised to build models in particle physics [42], it is enticing to see whether the extended causal structure imposes any restrictions on probabilities that could be checked experimentally.