Causality for Nonlocal Phenomena

Drawing from the theory of optimal transport we propose a rigorous notion of a causal relation for Borel probability measures on a given spacetime. To prepare the ground, we explore the borderland between Lorentzian geometry, topology and measure theory. We provide various characterisations of the proposed causal relation, which turn out to be equivalent if the underlying spacetime has a sufficiently robust causal structure. We also present the notion of the ‘Lorentz–Wasserstein distance’ and study its basic properties. Finally, we outline the possible applications of the developed formalism in both classical and quantum physics.


Introduction
The notion of a space, understood as a set of points, provides an indispensable framework for every physical theory. But, regardless of the physical system that is being modelled, the space itself is not directly observable. Indeed, any measuring apparatus can provide information about the localisation only up to a finite resolution. In the relativistic context, it means that the event is an idealised concept, which is not accessible to any observer.
Apart from the 'practical' obstructions for measuring position, there exist also fundamental ones because of the quantum effects manifest at small scales. Although nonrelativistic quantum mechanics does not impose any a priori restrictions on the accuracy of the position measurement, in quantum field theory a suitable 'position operator' is always nonlocal (see for instance [15,24,56]). Moreover, an attempt to perform a very accurate measurement of localisation in spacetime would require the use of signals of very short wavelength, resulting in an extreme concentration of energy. The latter would eventually lead to black hole formation, and the desired information would become trapped [19,20].
It is generally believed that any physical theory should be causal, i.e. that no information can be transmitted at a speed exceeding that of light. On a flat spacetime, one can develop a theory of tachyons consistent with special relativity, but it would have undesirable physical properties, for instance the vacuum state would not be Lorentz-invariant [23]. In general relativity, although Einstein's equations admit spacetime solutions with closed causal curves, these lead to paradoxes and are usually discarded as unphysical [34]. The status of causality in quantum theory has been controversial from its dawn because of the inherent nonlocality of quantum wave functions [22]. However, careful studies in quantum information theory have proven that quantum nonlocality on its own cannot be utilised for a superluminal transfer of information [49]. In fact, the requirement of no faster-than-light signalling is often used as a guiding principle to constrain the admissible quantum theories [6,32] and their possible extensions [47]. In quantum field theory, where nonlocality is even more prevalent, the postulate of causality is promoted to one of the axioms [33,54]. Some of the approaches to quantum gravity suggest the breakdown of Einstein's causality at the Planck scale [4], but no evidence of such a phenomenon has been found so far [1].
In relativity theory it is straightforward to implement the postulate of causality, as the Lorentzian metric induces a precise notion of causal curves between point events. However, when a physical object requires a nonlocal description, the notion of a causal relation becomes hazy. This is particularly pertinent in quantum mechanics, as for instance the interference fringes resulting from quantum superposition can travel superluminally, but cannot be utilised to send information [9]. In fact, different concepts of causality in quantum theory can lead to controversial results; compare for instance [36] and [16] or [2] and [37,62]. Hence, the issue of causality for nonlocal phenomena is still a timely subject of study [3,5,9,17,32,38,47,49,56,60,62].
The aim of this paper is to provide a rigorous notion of a causal relation between probability measures on a given spacetime. These can be utilised to model various nonlocal phenomena ranging from classical dust densities in cosmology, through energy or charge distributions in relativistic continuum mechanics, to quantum probability densities arising from the 'modulus square principle' in the wave packet formalism. Moreover, one can make use of probability measures to take into account experimental errors, as the measurement of any physical object's spacetime localisation would effectively be vitiated by an error resulting from the apparatus' imperfection.
We shall work in a generally covariant framework and keep the discourse in the spirit of mathematical relativity. We allow the probability measures to spread also in the timelike direction to facilitate applications that reach beyond the standard evolutionary approach. Section 3 contains a result on the verge of causality and measure theory: we demonstrate the σ-compactness of the set $J^+ \subseteq M^2$ in any spacetime. It establishes the foundations for the formalism developed in the next sections, but it might also be of independent interest for a mathematical-relativity-oriented reader.
The main concepts and results of the paper are aggregated in Sect. 4. We start off with a 'dual' definition of the causal relation, based on the notion of causal functions [44,Definition 2.3], proposed in [25] in a much wider context of noncommutative geometry. In several steps we show that it encapsulates an intuitive notion of causality for nonlocal objects: Each infinitesimal part of the probability distribution should travel along a future-directed causal curve. At each step we keep the causality conditions imposed on the underlying spacetime as low as possible. At the same time we provide several characterisations of causality for probability measures, which illustrate the concept and provide tools for concrete computations.
We are eventually led to the theory of optimal transport adapted to the relativistic setting. The latter is a new and fast-developing area of research [10,13,55], which has found successful applications in the early universe reconstruction problem [14,28–30].
Motivated by the main result, we put forward in Sect. 4.2 a definition of a causal relation between probability measures, valid on any spacetime, and study its properties. In particular, we demonstrate that the proposed relation is a partial order in the space of Borel probability measures on a given spacetime M, even with a relatively poor causal structure.
Finally, we propose in Sect. 5 a notion of the 'Lorentz-Wasserstein' distance in the space of measures, reinforcing the established bridge between mathematical physics and the theory of optimal transport. We conclude, in Sect. 6, with an outlook into the possible future developments and applications. In particular, we briefly discuss the potential use of the presented results in the study of causality in quantum theory. We also address the interrelation of probability measures with states on C*-algebras. In this way we provide a connection with the notion of 'causality in the space of states' proposed originally in [25] in the framework of noncommutative geometry.
The spaces of continuous, continuous and bounded, continuous and compactly supported real-valued functions on a topological space M will be, respectively, denoted by C(M), $C_b(M)$, $C_c(M)$. Analogous spaces of smooth functions will be, respectively, denoted by $C^\infty(M)$, $C^\infty_b(M)$, $C^\infty_c(M)$.

Topology
Let M denote a topological space and let X ⊆ M. The closure, interior, boundary and complement of X will be, respectively, denoted by $\overline{X}$, int X, ∂X and $X^c$. An open cover of X ⊆ M is a family $\{U_\alpha\}_{\alpha\in A}$ of open subsets of M such that $\bigcup_{\alpha\in A} U_\alpha \supseteq X$. M is called Lindelöf iff each of its open covers has a countable subcover.
A subset X ⊆ M is called compact iff each of its open covers has a finite subcover. It is called sequentially compact iff every sequence in X has a subsequence convergent in X. It is called precompact (or relatively compact) iff its closure is compact. Finally, it is called σ-compact iff it is a countable union of compact subsets. In particular, M is σ-compact if and only if it admits an exhaustion by compact sets, that is a sequence $(K_n)_{n\in\mathbb{N}}$ of compact sets such that $K_n \subseteq K_{n+1}$ and $\bigcup_{n\in\mathbb{N}} K_n = M$.
To conclude this section, recall that a topological space is connected iff it is not a union of two disjoint nonempty open sets. Furthermore, it is locally connected iff it has a base of connected sets. In general, these two properties are independent from each other. Spacetimes, however, are both connected and locally connected, which has the following interesting consequence.

Lemma 1.
Let M be a connected, locally connected, second-countable LCH space. Then M admits an exhaustion by connected compact sets.
Proof. By the separability of M there exists a countable dense subset $\{a_n\}_{n\in\mathbb{N}} \subseteq M$. Let $U_n$ denote a precompact neighbourhood of $a_n$, the existence of which is guaranteed by the local compactness of M. By the local connectedness of M, for every n ∈ N there exists an open connected set $V_n$ such that $a_n \in V_n \subseteq U_n$. Note that every $V_n$ is precompact, because $\overline{V_n}$, being a closed subset of the compact set $\overline{U_n}$, is itself compact.
Thus, the family {V n } n∈N is a countable cover of M by open, precompact, connected sets. Using this cover, one can construct an exhaustion of M with the desired properties.
Concretely, define an increasing sequence $(C_n)_{n\in\mathbb{N}}$ of subsets of M recursively, adjoining to $C_n$ at each step a set $V_{i_n}$ chosen in such a way that $i_n \notin \{i_1, \ldots, i_{n-1}\}$ and $V_{i_n} \cap C_n \neq \emptyset$. By the connectedness of M, such a $V_{i_n}$ can always be found.
Indeed, suppose that for a certain n ∈ N all the sets $V_i$ with $i \notin \{i_1, \ldots, i_{n-1}\}$ are disjoint with $C_n$. But $C_n$ is a union of open sets, and since it is disjoint with $\bigcup_{i \notin \{i_1,\ldots,i_{n-1}\}} V_i$, we would be able to write M as a union of two disjoint nonempty open sets, contradicting the connectedness of M.
Obviously, every $C_n$ is connected. Since the closure of a connected set is itself connected, the sequence $(\overline{C_n})_{n\in\mathbb{N}}$ is an exhaustion of M by connected compact sets.

Measure Theory
Let M be a topological space. The σ-algebra of Borel sets B(M) is the smallest family of subsets of M containing the open sets, which is closed under complements and countable unions (and hence also under countable intersections). If M is a Hausdorff space, then, in particular, its σ-compact subsets are Borel.
A function f : M 1 → M 2 between topological spaces is called Borel iff f −1 (V ) ∈ B(M 1 ) for any V ∈ B(M 2 ). Every continuous (or even semicontinuous) real-valued function is Borel.
A Borel probability measure on M is a function $\mu : B(M) \to [0,1]$ such that $\mu(M) = 1$ and $\mu\left(\bigcup_{n=1}^{\infty} X_n\right) = \sum_{n=1}^{\infty} \mu(X_n)$ for any sequence $(X_n)_{n\in\mathbb{N}}$ of Borel sets such that $X_n \cap X_m = \emptyset$ for $n \neq m$. A Borel set, the measure μ of which is zero, is called μ-null. The set of all Borel probability measures on M is denoted by P(M). Furthermore, if M is metrisable, then every μ ∈ P(M) is regular [52, Lemma 3.4.14], i.e.
Borel probability measures with properties (3) and (4) are called Radon probability measures. Since we will be working with spacetimes (which are Polish spaces), all elements of P(M) will be Radon. For simplicity, from now on the term 'measure' will always stand for the 'Borel probability measure'. For any X ⊆ M its indicator function $\mathbb{1}_X : M \to \mathbb{R}$ is defined by $\mathbb{1}_X(p) = 1$ for p ∈ X and $\mathbb{1}_X(p) = 0$ otherwise. $\mathbb{1}_X$ is a Borel function iff X ∈ B(M).
A simple function on M is any function s : M → R, the range s(M) of which is finite. Such a function can be written in the form $s = \sum_{i=1}^{n} a_i \mathbb{1}_{X_i}$, where $\{a_1, \ldots, a_n\} = s(M)$ and $X_i := s^{-1}(a_i)$.
Any Borel function $f : M_1 \to M_2$ between topological spaces induces the pushforward map $f_* : P(M_1) \to P(M_2)$, $\mu \mapsto f_*\mu$. The latter is called a pushforward measure and is defined through $(f_*\mu)(V) := \mu(f^{-1}(V))$ for any $V \in B(M_2)$. As for the integrability, one has that $g \in L^1(M_2, f_*\mu)$ iff $g \circ f \in L^1(M_1, \mu)$.
Given two probability spaces $(M_1, \mu_1)$, $(M_2, \mu_2)$, there exists a unique measure $\mu_1 \times \mu_2 \in P(M_1 \times M_2)$, called the product measure, such that $(\mu_1 \times \mu_2)(U_1 \times U_2) = \mu_1(U_1)\mu_2(U_2)$ for any $U_i \in B(M_i)$, i = 1, 2 (cf. [51] for details).
On the other hand, given ω ∈ P(M 1 × M 2 ), its marginals are defined as (pr i ) * ω ∈ P(M i ), where pr i : M 1 × M 2 → M i (i = 1, 2) are the canonical projection maps. Obviously, the marginals of the product measure μ 1 × μ 2 are μ 1 and μ 2 ; however, usually there are many measures on M 1 × M 2 sharing the same pair of marginals.
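The remark that many measures can share the same pair of marginals is easy to see on finite sets. The following is a minimal sketch, assuming discrete measures stored as dictionaries with exact rational weights; the helper names `marginals` and `product_measure` are illustrative and do not come from the paper.

```python
from collections import defaultdict
from fractions import Fraction


def marginals(omega):
    """Return both marginals of a discrete measure omega on M1 x M2.

    omega maps pairs (x, y) to their mass; each marginal is the
    pushforward of omega along the corresponding projection.
    """
    first, second = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), mass in omega.items():
        first[x] += mass
        second[y] += mass
    return dict(first), dict(second)


def product_measure(mu1, mu2):
    """Product measure of two discrete measures."""
    return {(x, y): a * b for x, a in mu1.items() for y, b in mu2.items()}


half = Fraction(1, 2)
mu1 = {"a": half, "b": half}
mu2 = {"c": half, "d": half}

product = product_measure(mu1, mu2)                 # mass 1/4 on each of four pairs
diagonal = {("a", "c"): half, ("b", "d"): half}     # a different measure on the product
```

Both `product` and `diagonal` have marginals `mu1` and `mu2`, although they are different measures on the product space.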
Given a measure μ ∈ P(M), its support can be defined as the smallest closed set with full measure. Symbolically, $\operatorname{supp} \mu := \bigcap \{ C \subseteq M \mid C \text{ is closed and } \mu(C) = 1 \}$.

Causality Theory
For a detailed exposition of causality theory the reader is referred to [7,43,46,48].
Recall that a spacetime is a connected time-oriented Lorentzian manifold. Causality theory introduces and studies certain binary relations between points (i.e. events) of a given spacetime M. Namely, for any p, q ∈ M, we say that p causally (chronologically) precedes q, which is denoted by $p \preceq q$ (resp. $p \ll q$), iff there exists a piecewise smooth future-directed causal (resp. timelike) curve γ : [0, 1] → M from p to q, i.e. γ(0) = p and γ(1) = q. Additionally, we say that p horismotically precedes q, which is denoted by p → q, iff $p \preceq q$, but $p \not\ll q$.
Clearly, the relations $\preceq$ and $\ll$ are transitive and $\preceq$ is also reflexive. Moreover ([46, Chapter 14, Corollary 1]), for any p, q, r ∈ M, if $p \ll q \preceq r$ or $p \preceq q \ll r$, then $p \ll r$.
To denote $\preceq$ ($\ll$, →) understood as a subset of $M^2$ it is customary to use the symbol $J^+$ (resp. $I^+$, $E^+$). For any X ⊆ M one defines $J^+(X) := \{ q \in M \mid \exists\, p \in X \ \ p \preceq q \}$ and $J^-(X) := \{ p \in M \mid \exists\, q \in X \ \ p \preceq q \}$. (6) If X is a singleton, one simply writes $J^\pm(p)$ instead of $J^\pm(\{p\})$. Notice that $J^\pm(X) = \bigcup_{p\in X} J^\pm(p)$.
Let now U ⊆ M be an open subset of M. One defines $\preceq_U$ to be the causal precedence relation on U treated as a spacetime in its own right. By analogy with $J^+$, we denote $J^+_U := \{(p, q) \in U^2 \mid p \preceq_U q\}$. Notice that $J^+_U \subseteq J^+ \cap U^2$, but not necessarily vice versa, because $p \preceq_U q$ requires a piecewise smooth future-directed causal curve from p to q not only to exist, but also to be contained in U.
Analogously to (6), one defines J ± U (X ) for any subset X ⊆ M. One similarly introduces I ± (X ), Proof. The statement is proven by the following chain of equivalences:

Proposition 2.
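Let $\{X_\alpha\}_{\alpha\in A}$ be a family of future (past) subsets of M. Then, also $\bigcup_{\alpha\in A} X_\alpha$ and $\bigcap_{\alpha\in A} X_\alpha$ are future (past) subsets of M.
Proof. Assuming that all the sets $X_\alpha$ are future sets, notice that $J^+\left(\bigcup_{\alpha\in A} X_\alpha\right) = \bigcup_{\alpha\in A} J^+(X_\alpha) = \bigcup_{\alpha\in A} X_\alpha$. If the sets $X_\alpha$ are past sets, simply replace $J^+$ with $J^-$ in the previous sentence.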
We have thus shown that a union of future (past) sets is a future (past) set. To obtain an analogous result for the intersection, one simply uses Proposition 1 and de Morgan's laws.
A function f : M → R is called
• a causal function iff it is nondecreasing along every future-directed causal curve;
• a generalised time function iff it is increasing along every future-directed causal curve;
• a time function iff it is a continuous generalised time function;
• a temporal function iff it is a smooth function with past-directed timelike gradient.
Each of the above properties is stronger than the preceding one.
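To make the notion of a causal function concrete, here is a minimal sketch in 1+1 Minkowski spacetime with c = 1, where an event (t, x) causally precedes (t', x') exactly when t' − t ≥ |x' − x|. The setting, the integer grid of sample events and all function names are assumptions made for illustration. The check runs over a finite sample only, so it can refute causality of a function but cannot prove it.

```python
import itertools


def causally_precedes(p, q):
    """p causally precedes q in 1+1 Minkowski spacetime; events are (t, x)."""
    return q[0] - p[0] >= abs(q[1] - p[1])


def is_causal_on_sample(f, events):
    """Check f(p) <= f(q) for all sampled pairs with p causally preceding q."""
    return all(
        f(p) <= f(q)
        for p, q in itertools.product(events, repeat=2)
        if causally_precedes(p, q)
    )


events = [(t, x) for t in range(-2, 3) for x in range(-2, 3)]


def time_coordinate(event):
    return event[0]


def null_coordinate(event):
    return event[0] + event[1]


def space_coordinate(event):
    return event[1]
```

On this sample the coordinate t and the null coordinate t + x pass the check, whereas x fails it: for instance (0, 1) causally precedes (1, 0), but x decreases from 1 to 0.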
Causal functions can be characterised by means of future sets.
) is a future set for any a ∈ R.
We claim that f −1 ([f (p), +∞)) is not a future set. Indeed, were it a future set, then, since it clearly contains p, it would contain q as well. But this would mean that f (q) ≥ f (p), in contradiction with the assumption.
On the other hand, future sets can be characterised by means of their indicator functions.
By the equivalence '(i) ⇔ (iii)' from Proposition 3, we immediately obtain the desired result.
• for any p ∈ M the boundaries $\partial I^\pm(p)$ are η-null.
To such an η one associates the functions $t^-, t^+ : M \to \mathbb{R}$, called past and future volume functions, respectively, defined via $t^-(p) := \eta(I^-(p))$ and $t^+(p) := -\eta(I^+(p))$. Volume functions are causal and semi-continuous, and hence Borel.
For any p, q ∈ M let $\hat{C}(p, q)$ denote the set of piecewise smooth future-directed causal curves from p to q. The Lorentzian distance (or time separation) is the map $d : M^2 \to [0, +\infty]$ defined as the supremum of the lengths of the curves in $\hat{C}(p, q)$ if $\hat{C}(p, q) \neq \emptyset$, and as 0 otherwise. Its basic properties include:
(ii) The reverse triangle inequality holds. Namely, for any p, q, r ∈ M such that $p \preceq q \preceq r$, one has $d(p, r) \geq d(p, q) + d(q, r)$.
(iii) If there exists a timelike loop through p ∈ M (i.e. a piecewise smooth future-directed timelike curve from p to p), then $d(p, p) = +\infty$.
Each level of the hierarchy can be defined in many equivalent ways. Below we present only those definitions, characterisations and properties of which we make use in the paper. For a complete review of the causal hierarchy, consult [43, Section 3].
M is chronological iff it satisfies one of the following equivalent conditions:
M is causal iff it satisfies one of the following equivalent conditions:
(i) The relation $\preceq$ is a partial order; hence, in addition to being reflexive and transitive, it is also antisymmetric.

(ii) No causal loop exists.
M is future (past) distinguishing iff it satisfies one of the following equivalent conditions:
(i) For any p, q ∈ M, the equality $I^+(p) = I^+(q)$ (resp. $I^-(p) = I^-(q)$) implies that p = q.
(ii) Any future (past) volume function is a generalised time function [7, Proposition 3.24].
M is distinguishing iff it is both future and past distinguishing. M is strongly causal iff the family $\{I^+(p) \cap I^-(q) \mid p, q \in M\}$ is a base of the standard manifold topology of M. It is stably causal iff it admits a time function or, equivalently, iff it admits a temporal function [8]. It is causally continuous iff any volume function is a time function.
M is causally simple iff it is causal and satisfies one of the following equivalent conditions [43, Proposition 3.68]: Before providing a definition of the top level of the causal hierarchy, recall that a curve γ : Otherwise such a curve is called inextendible. Recall also that a Cauchy hypersurface is a subset S ⊆ M which is met exactly once by any inextendible timelike curve. Any such S is a connected, closed, achronal (i.e. S 2 ∩ I + = ∅) topological hypersurface, met by every inextendible causal curve [46, Chapter 14, Lemma 29 & Proposition 31]. However, such an S need not be acausal (i.e. S 2 ∩ J + might be nonempty).
M is globally hyperbolic iff it satisfies one of the following equivalent conditions: (i) M is causal and the sets J + (p) ∩ J − (q) are compact for all p, q ∈ M; (ii) M admits a smooth temporal function T , the level sets of which are (smooth spacelike) Cauchy hypersurfaces [8]. In a globally hyperbolic spacetime the Lorentzian distance d is finite-valued and continuous. Moreover, for every (p, q) ∈ J + there exists a causal geodesic γ from p to q of length d(p, q) [46,Chapter 14].
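As an illustration of the Lorentzian distance and of the reverse triangle inequality, consider again 1+1 Minkowski spacetime with c = 1, which is globally hyperbolic. There the distance between causally related events is realised by the straight segment, so that d(p, q) = sqrt(Δt² − Δx²) for causally related events and d(p, q) = 0 otherwise. The sketch below assumes this closed formula and an integer grid of sample events; the function names are illustrative only.

```python
import itertools
import math


def causally_precedes(p, q):
    """p causally precedes q in 1+1 Minkowski spacetime; events are (t, x)."""
    return q[0] - p[0] >= abs(q[1] - p[1])


def lorentzian_distance(p, q):
    """Length of the straight causal segment from p to q, or 0 if there is none."""
    if not causally_precedes(p, q):
        return 0.0
    dt, dx = q[0] - p[0], q[1] - p[1]
    return math.sqrt(dt * dt - dx * dx)


def reverse_triangle_holds(events, tol=1e-12):
    """Check d(p, r) >= d(p, q) + d(q, r) for all sampled causal chains p, q, r."""
    for p, q, r in itertools.product(events, repeat=3):
        if causally_precedes(p, q) and causally_precedes(q, r):
            lhs = lorentzian_distance(p, r)
            rhs = lorentzian_distance(p, q) + lorentzian_distance(q, r)
            if lhs + tol < rhs:
                return False
    return True


events = [(t, x) for t in range(0, 4) for x in range(-2, 3)]
```

The tolerance `tol` only absorbs floating-point rounding in the equality cases, such as three events lying on a common timelike line.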

On the σ-Compactness of J +
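The purpose of this section is to prove the following theorem.
Theorem 4. Let M be a spacetime. Then $J^+$ is a σ-compact subset of $M^2$.
Let us note here that this property is automatic in causally simple spacetimes. Indeed, let $(K_n)_{n\in\mathbb{N}}$ be an exhaustion of M with compact sets and notice that $J^+ = \bigcup_{n\in\mathbb{N}} \left( J^+ \cap (K_n \times K_n) \right)$, where every set in the union is compact, because $J^+$ is a closed subset of $M^2$ in a causally simple spacetime. In the proof of Theorem 4, however, we shall make no assumptions on the causal properties of M.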
Theorem 4 implies that J + is Borel for any spacetime. As we shall see, it also implies that J ± (X ) is Borel for any closed X ⊆ M. Moreover, previous statements are still true if we replace J ± with E ± .
Theorem 4 is thus settled in the overlap of causality theory, topology and measure theory. Whereas the interplay between the causal and topological properties of spacetimes is relatively well understood, the questions concerning Borelness have never been, to the authors' best knowledge, addressed in the relativistic context. The study of the interaction between causality and measure theory is, however, essential from the viewpoint of the theory developed in Sect. 4.
We recall the notion of simple convex sets (called also simple regions) [48, Section 1]. Loosely speaking, they are small patches of the spacetime M with 'nice' topological, differential and causal properties, which constitute a countable cover of the entire spacetime.
Concretely, let M be a spacetime. Then, for any p ∈ M there exists a star-shaped neighbourhood Q ⊆ T p M containing the zero vector and such that the exponential map exp p restricted to Q is a diffeomorphism. The image of this diffeomorphism exp p (Q) is called a normal neighbourhood of p. Every event has a neighbourhood U which is a normal neighbourhood of any p ∈ U . Such a U is called convex. If U ⊆ M is convex, then it is open and for any p, q ∈ U there exists precisely one geodesic from p to q which is contained in U [46, p. 129].
From the point of view of causality theory, the following property of convex sets will be crucial: if U ⊆ M is convex, then $J^+_U$ is a closed subset of $U^2$.
Finally, a convex set N is called simple iff $\overline{N}$ is compact and contained in another convex set U.
which exists by the very definition of a simple convex set.
We introduce a couple more definitions.
that is the set containing all those pairs of points from $N_i$ which can be connected by a piecewise smooth future-directed causal curve contained in $U_i$. For any X ⊆ M define, by analogy with (6),
This is the set of all those pairs of points $(p, q) \in N_{i_1} \times N_{i_2}$ which can be connected by a concatenation of two piecewise smooth future-directed causal curves, the first of which is contained in $U_{i_1}$, while the other in $U_{i_2}$, and the concatenation point r must lie in the compact set $\overline{N_{i_1}} \cap \overline{N_{i_2}}$. As above, we additionally define, for any X ⊆ M,
Finally, fix n ≥ 3 together with $i_1, i_2, \ldots, i_n \in \mathbb{N}$ and define, recursively,
where, for any X ⊆ M,
[Fig. 1. The plotted piecewise smooth curve from p to q is assumed causal and future-directed.]
It is crucial to understand what these sets contain (cf. Fig. 1). Namely, $J^+_{(N_{i_1}, N_{i_2}, \ldots, N_{i_n})}$ is the set of all those pairs of points $(p, q) \in N_{i_1} \times N_{i_n}$ which can be connected by a concatenation of n − 1 piecewise smooth future-directed causal curves, each being of the type discussed after the definition of $J^+_{(N_{i_1}, N_{i_2})}$. The curves' concatenation points must lie in $N_{i_2}, N_{i_3}, \ldots, N_{i_{n-1}}$, respectively (and in that order).
We now claim and shall prove inductively that Let us first prove the base case n = 2. Let {a m } m∈N be a dense subset of We now claim that , because finite unions of closed sets are closed and so are any intersections of closed sets.
Indeed, to prove the inclusion '⊆', assume (p, One can thus take p m := r =: q m . On the other hand, to show the inclusion '⊇', let us assume that (p, We now invoke the fact that J + which completes the proof of (11) and of the base case of the induction. We now move to the proof of the inductive step, which essentially goes along the same lines as the proof of the base case.
The assumption says that for any The induction hypothesis then reads: for any Similarly as before, for each k ∈ N consider the family {B(a m , 1 k )} m∈N covering N in , and take its finite subcover {B(a m , 1 k )} m∈F k . We now claim that which would mean that J + (N i 1 ,N i 2 ,...,N i n+1 ) is a closed subset of N i1 × N in+1 (hence also a compact subset of U i1 × U in+1 ), because we already know that J − (N i 1 ,N i 2 ,...,N in ) B(a m , 1 k ) is closed in N i1 (by the induction assumption and definitions (10)) and that J + (N in ,N i n+1 ) B(a m , 1 k ) is closed in N in+1 (by the base case and definitions (9)).
To show the inclusion '⊆' in (12), assume (p, q) ∈ N i1 × N in+1 is such that there exists r ∈ N in satisfying (p, r) ∈ J + (N i 1 ,N i 2 ,...,N in ) and (r, q) ∈ J + (N in ,N i n+1 ) . For any k ∈ N, since {B(a m , 1 k )} m∈F k covers N in , it is possible to find m ∈ F k such that r ∈ B(a m , 1 k ). One can thus take p m := r =: q m . On the other hand, to show the inclusion '⊇', let us assume that (p, q) ∈ N i1 × N in+1 are such that . We can thus construct a sequence (a m k ) k∈N , which, being contained in the compact set N in , has a subsequence (a m k l ) l∈N convergent to some a ∞ ∈ N in . Analogously as before, we argue that also the sequences (p m k ), (q m k ) have subsequences converging to a ∞ .
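By the induction assumption, we obtain that $(p, a_\infty) \in J^+_{(N_{i_1}, N_{i_2}, \ldots, N_{i_n})}$. On the other hand, invoking the base case we similarly obtain that $(a_\infty, q) \in J^+_{(N_{i_n}, N_{i_{n+1}})}$. This completes the proof of (12) and of the entire induction.
Altogether, we can thus write that for all n ∈ N and all $i_1, i_2, \ldots, i_n \in \mathbb{N}$ the set $J^+_{(N_{i_1}, N_{i_2}, \ldots, N_{i_n})}$ is compact. Bearing the above in mind, the σ-compactness of $J^+$ will be proven if we show that $J^+ = \bigcup_{n=1}^{\infty} \ \bigcup_{i_1, i_2, \ldots, i_n \in \mathbb{N}} J^+_{(N_{i_1}, N_{i_2}, \ldots, N_{i_n})}$. (14)
In order to show the inclusion '⊆', take any $(p, q) \in J^+$ and let γ : [0, 1] → M be a piecewise smooth future-directed causal curve from p to q. Since the simple convex sets $N_i$ cover M, the compact interval [0, 1] admits a finite cover $I = \{I_1, \ldots, I_n\}$ by intervals open in [0, 1], each of which is mapped by γ into one of the sets $N_i$. Without loss of generality, we can assume that $I_{j_1} \not\subseteq I_{j_2}$ for all $j_1 \neq j_2$. Bearing this in mind, we can rewrite I either as {[0, 1]} (the trivial cover) or, if n > 1, as $\{[0, b_1), (a_2, b_2), \ldots, (a_{n-1}, b_{n-1}), (a_n, 1]\}$, where $0 < a_2 < a_3 < \ldots < a_n < 1$. Notice also that $b_j > a_{j+1}$ for j = 1, . . . , n − 1, because otherwise such an I would not be a cover.
In the first (trivial) case, $\gamma([0, 1]) \subseteq N_{i_1} \subseteq \overline{N_{i_1}}$ for some $i_1 \in \mathbb{N}$ and hence $(p, q) \in J^+_{(N_{i_1})}$. In the second case, observe that $\gamma([0, b_1)) \subseteq N_{i_1}$, $\gamma((a_j, b_j)) \subseteq N_{i_j}$ for j = 2, . . . , n − 1 and $\gamma((a_n, 1]) \subseteq N_{i_n}$ for some $i_1, \ldots, i_n \in \mathbb{N}$ and hence $(p, q) \in J^+_{(N_{i_1}, \ldots, N_{i_n})}$. In either case, we obtain that $(p, q) \in \bigcup_{n=1}^{\infty} \bigcup_{i_1, i_2, \ldots, i_n \in \mathbb{N}} J^+_{(N_{i_1}, N_{i_2}, \ldots, N_{i_n})}$.
In order to show the other inclusion '⊇' in (14), notice simply that a concatenation of finitely many piecewise smooth future-directed causal curves is itself a piecewise smooth future-directed causal curve. Therefore, if $(p, q) \in J^+_{(N_{i_1}, N_{i_2}, \ldots, N_{i_n})}$, then $(p, q) \in J^+$.
Proof. On the strength of (14), we have that
Proof. By assumption, $X = \bigcup_{m=1}^{\infty} X_m$, where for any m ∈ N, $X_m \subseteq M$ is closed. Observe that, by (14), $J^+(X) = \mathrm{pr}_2\left( (X \times M) \cap J^+ \right) = \bigcup_{m=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcup_{i_1, \ldots, i_n \in \mathbb{N}} \mathrm{pr}_2\left( (X_m \times M) \cap J^+_{(N_{i_1}, \ldots, N_{i_n})} \right)$. For any m, n ∈ N and any $i_1, i_2, \ldots, i_n \in \mathbb{N}$ the set $(X_m \times M) \cap J^+_{(N_{i_1}, \ldots, N_{i_n})}$ is closed in $\overline{N_{i_1}} \times \overline{N_{i_n}}$ and hence compact in $M^2$. Since $\mathrm{pr}_2$ is a continuous map, the projection of a compact set is itself compact and we obtain that $J^+(X)$ is σ-compact.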
The proof for J − (X ) is completely analogous. Moreover, on the strength of the previous corollary, replacing J ± with E ± in the above proof yields the desired result for the horismotical futures and pasts.
The final corollary shows that the volume functions can be defined by means of causal futures/pasts instead of the chronological ones.
Proof. By the previous corollary, E ± (p) and J ± (p) are Borel sets for any p ∈ M and so the expressions η(E ± (p)) and η(J ± (p)) are well-defined. Since it is true that where we have used the second condition in the definition of an admissible measure. Therefore, t − (p) = η(J − (p)). The proof for t + is analogous.

Causality for Probability Measures
The aim of this section is to extend the causal precedence relation onto the space of measures P(M) on a given spacetime M. We begin by invoking a certain characterisation of causality between events.
The proof can be found in [25, Proposition 10]. As an important side note, observe that Theorem 5 exactly mirrors the definition of a causal function. Indeed, the latter can be written symbolically as: f is causal iff $\forall\, p, q \in M \ \ \left( p \preceq q \Rightarrow f(p) \leq f(q) \right)$, whereas Theorem 5 in fact says that $p \preceq q$ iff $f(p) \leq f(q)$ for all smooth bounded causal functions f. Therefore, instead of using $\preceq$ to define what a causal function is, one can come up with an abstract, suitably structurised set C of 'smooth bounded causal functions' and define $\preceq$ through C using the analogue of Theorem 5. This was done by Franco and Eckstein in [25] in a very general context of noncommutative geometry.
Condition 1 provides a 'dual' definition of the causal precedence, which actually suggests how $\preceq$ could be extended onto P(M).

Definition 1.
Let M be a globally hyperbolic spacetime. For any μ, ν ∈ P(M) we say that μ causally precedes ν (symbolically $\mu \preceq \nu$) iff for all smooth bounded causal functions f on M one has $\int_M f \, d\mu \leq \int_M f \, d\nu$.
In [25] it is proven (in a much more general context) that the above-defined relation is in fact a partial order. This definition, however, has two shortcomings. Firstly, it is well motivated only on globally hyperbolic spacetimes. Secondly, the intuitive notion of causality for spread objects, as phrased in the introduction, is not directly visible in Definition 1.

Characterisations of the Causal Relation
In the following, we provide various conditions which are equivalent to the above definition of a causal relation between measures. Moreover, in some of the implications the assumption on global hyperbolicity of M can be relaxed.
The first result states that if C(M) is sufficiently rich, one can abandon the smoothness requirement. Proof.
(1 • ⇒ 2 • ) Relying on [18, Corollary 5.4 and the subsequent comments] we use the fact that in stably causal spacetimes any time function can be uniformly approximated by a smooth time (or even temporal) function.
Using the stable causality, fix a temporal function T : M → R. For any ε > 0, the function f +ε arctan T is a time function which clearly approximates f uniformly. By the above-mentioned corollary, this function in turn can be approximated by a smooth time function f ε such that Clearly To obtain 2 • it now remains to observe that for any measure η ∈ P(M) it is Indeed, for any η ∈ P(M) and ε > 0 one has where we have used (17).
The next result characterises the relation between measures in terms of open future sets.
Because M is causally continuous, t − λ is a time function for any λ ∈ (0, 1]. Now, for every n ∈ N define an increasing function ϕ n ∈ C ∞ b (R) by ∀ x ∈ R ϕ n (x) := 1 2 + 1 π arctan n 2 x − n . The sequence of functions (ϕ n ) is pointwise convergent to the indicator function of R >0 . Moreover, also ϕ n • t − λ is a bounded time function for every n ∈ N and λ ∈ (0, 1]. By 2 • , this means that Since the functions ϕ n are bounded and continuous, we can invoke Lebesgue's dominated convergence theorem and first take λ → 0 + , obtaining and then take n → +∞, which yields It is now crucial to notice that the function p → η(F ∩ I − (p)) is positive on F and zero on M\F. These observations follow from the definition of an admissible measure and the fact that F is future set. Together with the above inequality of integrals they imply that For any fixed n ∈ N let us consider the following simple function By 3 • , we obtain the following inequality of integrals It is not difficult to realise that More concretely, one can show that Indeed, the very definition of F where · denotes the ceiling function. Using the fact that x − x − 1 ∈ (0, 1] for any x ∈ R, we obtain that which proves (20). Invoking now Lebesgue's dominated convergence theorem and passing with n → +∞ in (19) we obtain Invoking Lebesgue's theorem again, we pass with ε → 0 + and obtain 2 • .
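The approximating functions ϕ_n used in the argument above can be inspected numerically. The sketch below assumes that the flattened formula reads ϕ_n(x) = 1/2 + (1/π) arctan(n²x − n); the function name `phi` is illustrative. Under this reading, ϕ_n(x) tends to 1 for x > 0 and to 0 for x ≤ 0, that is, to the indicator function of the positive half-line. The convergence is only pointwise: at x = 1/n the value stays equal to 1/2 for every n.

```python
import math


def phi(n, x):
    """phi_n(x) = 1/2 + arctan(n^2 x - n) / pi, an increasing map from R into (0, 1)."""
    return 0.5 + math.atan(n * n * x - n) / math.pi


# Values at a large index approximate the indicator function of the positive half-line.
large_n = 10**6
samples = {x: phi(large_n, x) for x in (-1.0, 0.0, 0.01, 1.0)}
```

Here `samples[0.01]` and `samples[1.0]` are close to 1, while `samples[-1.0]` and `samples[0.0]` are close to 0.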
Vol. 18 (2017) Causality for Nonlocal Phenomena 3071
The third and the most important result concerns causally simple spacetimes. We show that condition 3° extends to different kinds of future sets. Moreover, we introduce a condition that uses the existential quantifier.
Hence (see Fig. 2), $J^+(K) \subseteq \bigcup_{x\in K} J^+\left(B(x, \tfrac{1}{n})\right)$ for every n ∈ N. (24) We claim that $J^+(K) = \bigcap_{n=1}^{\infty} \bigcup_{x\in K} J^+\left(B(x, \tfrac{1}{n})\right)$. By (24), it suffices to prove the inclusion '⊇'. Suppose then that $q \in \bigcap_{n=1}^{\infty} \bigcup_{x\in K} J^+\left(B(x, \tfrac{1}{n})\right)$, which means that $\forall\, n \in \mathbb{N} \ \exists\, x_n \in K \ \exists\, p_n \in B(x_n, \tfrac{1}{n}) \ \ p_n \preceq q$. Since K is compact, the sequence $(x_n)$ has a convergent subsequence $(x_{n_k})$, $\lim_{k\to+\infty} x_{n_k} = x_\infty \in K$. Notice that also the subsequence $(p_{n_k})$ converges to $x_\infty$. But because $J^+$ is a closed set in the case of a causally simple spacetime, the fact that $p_{n_k} \preceq q$ for every k ∈ N implies that $x_\infty \preceq q$ and therefore $q \in J^+(K)$.
By 3 • we know that Since for all n ∈ N, where we have also used (25) and (26), thus proving 4 • .
(4° ⇒ 5°) Let F ⊆ M be any Borel future set. For any K ⊆ F it is then true that $J^+(K) \subseteq F$. Therefore, $\mu(K) \leq \mu(J^+(K)) \leq \mu(F)$. In the above chain of inequalities let us take the supremum over all compact K ⊆ F. Using the tightness of μ (see (4)), we have $\sup \{ \mu(K) \mid K \subseteq F, \ K \text{ compact} \} = \mu(F)$ and so $\mu(F) = \sup \{ \mu(J^+(K)) \mid K \subseteq F, \ K \text{ compact} \}$ and similarly for the measure ν. As we can see, in order to obtain 5° from 4° it is enough to take the supremum over all compact K ⊆ F.
(5° ⇒ 3°) Trivial: open sets are Borel.
(2° ⇒ 6°) In the first step of the proof we will show that 6° holds for all nonnegative $\varphi, \psi \in C_b(M)$ with ϕ compactly supported. Namely, for such functions we will show that the condition $\forall\, p, q \in M \ \ p \preceq q \Rightarrow \varphi(p) \leq \psi(q)$ (27) implies the inequality of integrals $\int_M \varphi \, d\mu \leq \int_M \psi \, d\nu$. (28) Then, in the second step, we will demonstrate that the assumptions of nonnegativity of ϕ, ψ and of the compactness of supp ϕ can in fact be abandoned.
Define a function $\hat{\varphi} : M \to \mathbb{R}$ via $\hat{\varphi}(p) := \max_{x \preceq p} \varphi(x)$. The function $\hat{\varphi}$ is well-defined, because for every p ∈ M the function ϕ, being continuous, attains its maximum over the compact set $J^-(p) \cap \operatorname{supp} \varphi$. Moreover, $\hat{\varphi}$ satisfies
Indeed, the first inequality follows directly from the very definition of $\hat{\varphi}$. In order to obtain the second inequality, notice that by (27) we have $\varphi(p_2) \leq \psi(q)$. By the transitivity of the relation $\preceq$, this inequality holds also if we replace $p_2$ with any $x \preceq p_2$. Hence, $\hat{\varphi}(p_2) \leq \psi(q)$ and (29) is proven.
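The conditions phrased in terms of future sets can be tested directly for finitely supported measures. The sketch below works in 1+1 Minkowski spacetime with c = 1 and checks the inequality μ(J+(K)) ≤ ν(J+(K)) for every nonempty subset K of the support of μ. This reading of the conditions, the toy setting and all names are assumptions made for illustration. Only finitely many sets K are tested, so the check is a necessary test and not a proof of the causal relation.

```python
from fractions import Fraction
from itertools import combinations


def causally_precedes(p, q):
    """p causally precedes q in 1+1 Minkowski spacetime; events are (t, x)."""
    return q[0] - p[0] >= abs(q[1] - p[1])


def mass_of_future(measure, K):
    """Mass that a discrete measure assigns to the causal future J+(K)."""
    return sum(
        mass
        for point, mass in measure.items()
        if any(causally_precedes(k, point) for k in K)
    )


def satisfies_future_condition(mu, nu):
    """Check mu(J+(K)) <= nu(J+(K)) for all nonempty K inside the support of mu."""
    atoms = list(mu)
    for size in range(1, len(atoms) + 1):
        for K in combinations(atoms, size):
            if mass_of_future(mu, K) > mass_of_future(nu, K):
                return False
    return True


half = Fraction(1, 2)
mu = {(0, 0): half, (0, 2): half}   # two atoms on the slice t = 0
nu = {(1, 0): half, (1, 2): half}   # the same atoms moved one unit forward in time
```

For these measures `satisfies_future_condition(mu, nu)` holds, while the check with the roles of μ and ν exchanged fails already for K = {(1, 0)}.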
The function $\hat\varphi$ is obviously nonnegative, bounded, and by the transitivity of $\preceq$, it is causal. We claim that it is also continuous.
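We have thus shown that $\hat\varphi \in C_b(M)$. By $2^\bullet$ we have that
$$\int_M \hat\varphi \, d\mu \le \int_M \hat\varphi \, d\nu. \tag{30}$$
But from (30) we readily obtain (28), because
$$\int_M \varphi \, d\mu \le \int_M \hat\varphi \, d\mu \le \int_M \hat\varphi \, d\nu \le \int_M \psi \, d\nu,$$
where the first and the last inequalities follow from (29) and the middle one is exactly (30). Thus, we have already proven $6^\bullet$ under the assumption that $\varphi$ is compactly supported and both $\varphi$ and $\psi$ are nonnegative. Let us now take any $\varphi, \psi \in C_b(M)$ satisfying (27).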
Let $(K_n)_{n \in \mathbb{N}}$ be an exhaustion of $M$ by compact sets. Using Urysohn's lemma for LCH spaces (Theorem 1), we construct a sequence $(\theta_n)_{n \in \mathbb{N}} \subseteq C_c(M)$ of functions such that, for any $n \in \mathbb{N}$, $\theta_n|_{K_n} \equiv 1$ and $0 \le \theta_n \le 1$.
Notice that (for every $n \in \mathbb{N}$) the function $\theta_n \varphi_m$ is compactly supported and, together with $\psi_m$, they are nonnegative and satisfy (27), because for all $p, q \in M$ such that $p \preceq q$ one has $\theta_n(p) \varphi_m(p) \le \varphi_m(p) \le \psi_m(q)$. On the strength of the previous part of the proof, it is then true that
$$\int_M \theta_n \varphi_m \, d\mu \le \int_M \psi_m \, d\nu. \tag{31}$$
By the very definition, $\theta_n \le 1$ for every $n$ and, since $(K_n)_{n \in \mathbb{N}}$ exhausts $M$, we have that $\theta_n \to 1$ pointwise. By Lebesgue's dominated convergence theorem we can pass with $n \to +\infty$ in (31) obtaining $\int_M \varphi_m \, d\mu \le \int_M \psi_m \, d\nu$.
Theorem 9 (Kantorovich duality). Let $(X_1, \mu_1)$ and $(X_2, \mu_2)$ be two Polish probability spaces and let $c : X_1 \times X_2 \to [0, +\infty]$ be a lower semi-continuous function. Then where
Let us apply the above theorem to the setting in which $(X_1, \mu_1) := (M, \mu)$, $(X_2, \mu_2) := (M, \nu)$ and $c : M^2 \to [0, +\infty]$ is defined as $c(p, q) := 0$ if $p \preceq q$ and $c(p, q) := +\infty$ otherwise. The assumptions of Theorem 9 are met: $M$ is a Polish space (cf. Sect. 2.1), whereas the function $c$ is lower semi-continuous, because the causal simplicity of $M$ implies that $J^+$ is a closed subset of $M^2$.
Notice that in the above setting In other words, $\Psi(\mu, \nu)$ is the set of exactly those pairs of functions which satisfy the assumptions of condition $6^\bullet$. Since we assume that $6^\bullet$ holds, we obtain that and, therefore, Using Kantorovich duality (32), we thus obtain that $\min_{\pi \in \Pi(\mu, \nu)} \int_{M^2} c(p, q) \, d\pi(p, q) \le 0$.
In particular, there exists at least one $\omega \in \Pi(\mu, \nu)$ such that the integral above is finite. But, by the very definition of the function $c$, this is possible iff $\omega(M^2 \setminus J^+) = 0$ or, equivalently, iff $\omega(J^+) = 1$. Thus, we have proven the existence of a measure $\omega$ with the desired properties.
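For finitely supported measures the argument above has an elementary counterpart: with the cost equal to 0 on $J^+$ and $+\infty$ elsewhere, the minimal transport cost vanishes exactly when some coupling moves mass only between causally related pairs, which is a bipartite flow problem. The following Python sketch is an illustration only and not part of the proof. It assumes (1+1)-dimensional Minkowski spacetime with events written as (t, x) and c = 1, integer masses with equal totals (rational masses can be rescaled to integers), and the names `causally_precedes` and `find_causal_coupling` are hypothetical.

```python
from collections import deque

def causally_precedes(p, q):
    """Causal precedence in 1+1 Minkowski spacetime; events are (t, x), c = 1."""
    return q[0] - p[0] >= abs(q[1] - p[1])

def find_causal_coupling(mu, nu):
    """Return a causal coupling {(p, q): mass} of mu and nu, or None if none exists.

    mu, nu: dicts mapping events to positive integer masses with equal totals.
    Mass may only be sent along pairs (p, q) with p causally preceding q.
    """
    total = sum(mu.values())
    if total != sum(nu.values()):
        raise ValueError("measures must have equal total mass")
    src, snk = "source", "sink"
    cap = {src: {}, snk: {}}  # residual capacities of the flow network

    def add_edge(u, v, c):
        cap.setdefault(u, {})[v] = cap.get(u, {}).get(v, 0) + c
        cap.setdefault(v, {}).setdefault(u, 0)  # reverse edge stores the flow

    for p, m in mu.items():
        add_edge(src, ("mu", p), m)
    for q, m in nu.items():
        add_edge(("nu", q), snk, m)
    for p in mu:
        for q in nu:
            if causally_precedes(p, q):
                add_edge(("mu", p), ("nu", q), total)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path from source to sink.
        parent = {src: None}
        queue = deque([src])
        while queue and snk not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if snk not in parent:
            break
        path, v = [], snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push

    if flow < total:
        return None  # some mass cannot be moved causally
    return {(p, q): cap[("nu", q)][("mu", p)]
            for p in mu for q in nu
            if causally_precedes(p, q) and cap[("nu", q)][("mu", p)] > 0}
```

For instance, two units of mass at the origin can be split between the events (1, -1) and (1, 1), whereas mass sitting at (0, 3) cannot reach (1, 0), so `find_causal_coupling({(0, 0): 1, (0, 3): 1}, {(1, 0): 2})` returns `None`.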
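($7^\bullet \Rightarrow 2^\bullet$) Let $f \in C_b(M)$ be a causal function. Because the probability measures $\mu$ and $\nu$ are, respectively, left and right marginals of the joint distribution $\omega$, one can write that
$$\int_M f \, d\mu = \int_{M^2} f(p) \, d\omega(p, q) = \int_{J^+} f(p) \, d\omega(p, q) \le \int_{J^+} f(q) \, d\omega(p, q) = \int_{M^2} f(q) \, d\omega(p, q) = \int_M f \, d\nu,$$
where the inequality follows from the causality of $f$. In the integrals with respect to $\omega$ we can always switch between $M^2$ and $J^+$ because $\omega(M^2 \setminus J^+) = 0$.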
The fourth result strengthens condition $5^\bullet$ in the case of globally hyperbolic spacetimes.
Proof. ($5^\bullet \Rightarrow 9^\bullet$) Trivial.
($9^\bullet \Rightarrow 8^\bullet$) Let $S \subseteq M$ be a Cauchy hypersurface. As such, $S$ is a connected achronal closed topological hypersurface, thus in particular a locally connected, second-countable LCH space. By Lemma 1, $S$ admits an exhaustion $(C_n)_{n \in \mathbb{N}}$ by compact connected subsets. Of course, each $C_n$ regarded as a subset of $M$ is also compact, connected and achronal. By assumption, we have that $\mu(J^+(C_n)) \le \nu(J^+(C_n))$ for all $n \in \mathbb{N}$. But since $(C_n)_{n \in \mathbb{N}}$ is increasing, we also have that $J^+(C_n) \subseteq J^+(C_{n+1})$ for all $n \in \mathbb{N}$. Thus, by (1) we obtain that $\mu(J^+(S)) = \lim_{n \to +\infty} \mu(J^+(C_n)) \le \lim_{n \to +\infty} \nu(J^+(C_n)) = \nu(J^+(S))$.
Take any compact subset $K \subseteq M$. Let $T_0$ denote the minimal value attained on $K$ by the function $T$. For any $n \in \mathbb{N}_0$ define the level set $S_n := T^{-1}(T_0 + n)$. Every $S_n$ is a smooth spacelike Cauchy hypersurface. Now, for any $n \in \mathbb{N}_0$ consider the set (see Fig. 3) $\Sigma_n := \partial J^+(S_n \cup K)$. We claim that for every $n \in \mathbb{N}_0$, $\Sigma_n$ is a Cauchy hypersurface and that
$$J^+(\Sigma_n) = J^+(S_n \cup K). \tag{35}$$
Indeed, observe first that $J^+(S_n \cup K)$ is a future set. By [46, Chapter 14, Corollary 27] $\Sigma_n$ is therefore a closed achronal topological hypersurface. Let $\gamma$ be any inextendible timelike curve. It crosses the Cauchy hypersurface $S_n$ (which is contained in $J^+(S_n \cup K)$) and $S_0$ (the past of which, $I^-(S_0)$, is disjoint from $J^+(S_n \cup K)$); therefore, it must cross the boundary $\partial J^+(S_n \cup K) = \Sigma_n$. Since the latter is achronal, it is met by $\gamma$ exactly once and therefore $\Sigma_n$ is a Cauchy hypersurface.
In order to obtain (35), we prove the following lemma.

Lemma 2.
Let $M$ be a spacetime and let $F \subseteq M$ be a closed future set such that $F \subseteq J^+(X)$ for some achronal set $X$. Then, $J^+(\partial F) = F$.
Proof. '⊆' Because $F$ is closed, it contains its boundary: $\partial F \subseteq F$. Hence, $J^+(\partial F) \subseteq F$, because $F$ is a future set. '⊇' Take $q \in F$. By assumption, there exist $x \in X$ and a future-directed causal curve $\gamma$ from $x$ to $q$.
Notice first that $x \notin F \setminus \partial F = \operatorname{int} F$. Indeed, if $x$ belonged to $\operatorname{int} F$, which is an open subset of $F$, there would exist $x' \in F$ such that $x' \ll x$. But since $F \subseteq J^+(X)$, there would exist $x'' \in X$ such that $x'' \preceq x'$. Altogether, by (5) we would obtain that $x'' \ll x$, in contradiction with the achronality of $X$. Therefore, either $x \in \partial F$ or $x \in M \setminus F$.
If $x \in \partial F$, then $q \in J^+(\partial F)$ and the proof is complete.
On the other hand, if $x \in M \setminus F$, then the curve $\gamma$ must cross $\partial F$ at some point $p$. Of course, $p \preceq q$ and hence also in this case $q \in J^+(\partial F)$.
Notice now that $J^+(S_n \cup K) = J^+(S_n) \cup J^+(K)$ is in fact a closed future set such that $J^+(S_n \cup K) \subseteq J^+(S_0)$. On the strength of Lemma 2, we obtain (35).
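By $8^\bullet$, because $\Sigma_n$ is a Cauchy hypersurface for any $n \in \mathbb{N}_0$, we can write that
$$\mu(J^+(\Sigma_n)) \le \nu(J^+(\Sigma_n)). \tag{36}$$
Observe that the sequence $(J^+(\Sigma_n))_{n \in \mathbb{N}_0}$ is decreasing, because for all $n \in \mathbb{N}_0$
$$J^+(\Sigma_{n+1}) = J^+(S_{n+1}) \cup J^+(K) \subseteq J^+(S_n) \cup J^+(K) = J^+(\Sigma_n),$$
where we have used (35) and the very definition of the $S_n$'s. Property (2) allows us to pass with $n \to +\infty$ in (36) and write that
$$\mu\Big(\bigcap_{n=0}^{\infty} J^+(\Sigma_n)\Big) \le \nu\Big(\bigcap_{n=0}^{\infty} J^+(\Sigma_n)\Big). \tag{37}$$
The countable intersection appearing above can be easily shown to be equal to $J^+(K)$. Indeed, one has
$$\bigcap_{n=0}^{\infty} J^+(\Sigma_n) = J^+(K) \cup \bigcap_{n=0}^{\infty} J^+(S_n) = J^+(K).$$
Therefore, (37) yields (21) and the proof of $4^\bullet$ is complete.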
We have thus provided 9 different characterisations of a causal relation between probability measures, which are equivalent if the underlying spacetime is globally hyperbolic. Some of the implications hold under lower causality conditions, as demonstrated in Theorems 6, 7 and 8. Let us now discuss other implications not covered in the proofs.

Basic Properties of the Causal Relation Between Measures
In the previous subsection we have shown that for any spacetime $M$ the condition $7^\bullet$ not only implies all of the others listed in Theorems 6, 7, 8 and 10, but also more general ones $2^\bullet$ and $6^\bullet$. This encourages us to promote the condition $7^\bullet$ to the definition of the causal precedence relation on $\mathscr{P}(M)$ for any spacetime $M$.
Such an ω will be called a causal coupling of μ and ν.
Observe that $\omega(J^+)$ is well-defined because, by Theorem 4, $J^+$ is σ-compact, and hence Borel, for any spacetime $M$.
Remark 5. In the case of causally simple spacetimes $J^+ \subseteq M^2$ is closed and therefore, by the very definition of the support of a measure (see the last paragraph of Sect. 2.2), condition (ii) in Definition 2 is equivalent to the inclusion $\operatorname{supp} \omega \subseteq J^+$. However, without the assumption of causal simplicity this is no longer true.
The term 'coupling (of measures $\mu$ and $\nu$)' comes from the optimal transport theory [58], where it describes any $\omega \in \mathscr{P}(M^2)$ with property (i) of the above definition. The set of such couplings, denoted $\Pi(\mu, \nu)$, has already appeared above in the context of the Kantorovich duality (Theorem 9). Such a coupling (or a transference plan, as it is also called) can be regarded as an instruction on how to 'reconfigure' a fixed amount of 'mass' distributed over $M$ according to the measure $\mu$ so that it becomes distributed according to the measure $\nu$. This 'reconfiguration' involves transporting the (possibly infinitesimal) portions of 'mass' between points of $M$, and the coupling $\omega \in \Pi(\mu, \nu) \subseteq \mathscr{P}(M^2)$ describes precisely what amount of 'mass' is transported between any given pair of points.
It is, however, property (ii) which ties the above definition to causality theory. It can be summarised as a requirement that the transport of 'mass' should be conducted along future-directed causal curves only; that is why such couplings deserve to be called causal. The set of all causal couplings of measures $\mu$ and $\nu$ will be denoted by $\Pi_c(\mu, \nu)$.
Notice that a (causal) coupling does not specify along which (causal) curves the portions of 'mass' are transported. In fact, various families of (causal) curves can lead to the same (causal) coupling. Notice also that the 'mass' concentrated initially at some point p ∈ M can dilute to many different points.
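To make the two defining properties concrete, the following Python sketch checks them for finitely supported measures: property (i) amounts to matching marginals and property (ii) to all of the mass sitting on causally related pairs. It is only an illustration under stated assumptions: the spacetime is taken to be (1+1)-dimensional Minkowski spacetime with events (t, x) and c = 1, exact rational masses are used, and the helper names `causally_precedes`, `marginals` and `is_causal_coupling` are hypothetical.

```python
from fractions import Fraction

def causally_precedes(p, q):
    """Causal precedence in 1+1 Minkowski spacetime; events are (t, x), c = 1."""
    return q[0] - p[0] >= abs(q[1] - p[1])

def marginals(omega):
    """Left and right marginals of a coupling given as {(p, q): mass}."""
    left, right = {}, {}
    for (p, q), w in omega.items():
        left[p] = left.get(p, 0) + w
        right[q] = right.get(q, 0) + w
    return left, right

def is_causal_coupling(omega, mu, nu):
    """Check property (i) (marginals) and property (ii) (omega(J^+) = 1)."""
    def drop_zeros(m):
        return {k: v for k, v in m.items() if v != 0}
    left, right = marginals(omega)
    is_coupling = (drop_zeros(left) == drop_zeros(mu)
                   and drop_zeros(right) == drop_zeros(nu))
    mass_on_j_plus = sum(w for (p, q), w in omega.items()
                         if causally_precedes(p, q))
    return is_coupling and mass_on_j_plus == 1

half = Fraction(1, 2)
# Mass concentrated at one event dilutes to two events on its future light cone.
mu = {(0, 0): Fraction(1)}
nu = {(1, -1): half, (1, 1): half}
omega = {((0, 0), (1, -1)): half, ((0, 0), (1, 1)): half}
print(is_causal_coupling(omega, mu, nu))

# A coupling that moves mass to a spacelike-separated event is not causal.
print(is_causal_coupling({((0, 0), (1, 2)): Fraction(1)},
                         {(0, 0): Fraction(1)}, {(1, 2): Fraction(1)}))
```

The first example also illustrates the remark above that the 'mass' concentrated initially at a single point can dilute to several different points.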
Observe that for Dirac measures $\mu = \delta_p$, $\nu = \delta_q$ Definition 2 reduces to the standard definition of the causal relation between events $p$ and $q$. This can be seen as a corollary of the following proposition.
Proof. (i) To prove '⇒' we use the inclusion-exclusion principle to write Conversely, to prove '⇐', notice that
Corollary 5. Let $M$ be a spacetime. Then, for any $p, q \in M$, $p \preceq q$ iff $\delta_p \preceq \delta_q$.
Proof. By Proposition 4, the only coupling between two Dirac measures $\delta_p$, $\delta_q$ is their product measure $\omega := \delta_p \times \delta_q = \delta_{(p,q)}$. Hence, the fact that $p \preceq q$ is equivalent in this case to the requirement that $\omega(J^+) = 1$.

Corollary 6.
Let $M$ be a causally simple spacetime. For any $p, q \in M$ the following conditions are equivalent
Proof. It is a direct consequence of the equivalence ($1^\bullet \Leftrightarrow 7^\bullet$) in Theorem 8 and Corollary 5.
If the measure $\mu$ is compactly supported, then in the light of the above discussion it is natural to expect that the support of any $\nu$ with $\mu \preceq \nu$ should lie within the future of $\operatorname{supp} \mu$ [60]. This intuitive condition is in fact true in causally simple spacetimes.
Proof. By condition $4^\bullet$ (which is implied by Definition 2) it is true that $1 = \mu(\operatorname{supp} \mu) \le \mu(J^+(\operatorname{supp} \mu)) \le \nu(J^+(\operatorname{supp} \mu)) \le 1$ and therefore $\nu(J^+(\operatorname{supp} \mu)) = 1$.
We now claim that if $M$ is causally simple, then this implies that $\operatorname{supp} \nu \subseteq J^+(\operatorname{supp} \mu)$.
Indeed, recall that in a causally simple spacetime the causal futures of compact sets are closed. Therefore, if there existed $q \in \operatorname{supp} \nu$ with $q \notin J^+(\operatorname{supp} \mu)$, then we could take an open neighbourhood $U \ni q$ such that $\nu(U) > 0$ but $U \cap J^+(\operatorname{supp} \mu) = \emptyset$. But this would imply that $\nu(J^+(\operatorname{supp} \mu)) \le 1 - \nu(U) < 1$, in contradiction with the first part of the proof.
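For finitely supported measures the support condition just proven is easy to test directly, and it gives a quick necessary check before searching for a causal coupling. The Python sketch below is an illustration only; it assumes (1+1)-dimensional Minkowski spacetime with events (t, x) and c = 1, and the helper names `causally_precedes`, `support` and `support_in_future` are hypothetical.

```python
def causally_precedes(p, q):
    """Causal precedence in 1+1 Minkowski spacetime; events are (t, x), c = 1."""
    return q[0] - p[0] >= abs(q[1] - p[1])

def support(measure):
    """Support of a finitely supported measure given as {event: mass}."""
    return {event for event, mass in measure.items() if mass > 0}

def support_in_future(mu, nu):
    """Check whether supp(nu) is contained in J^+(supp(mu))."""
    return all(any(causally_precedes(p, q) for p in support(mu))
               for q in support(nu))

mu = {(0, 0): 1.0}
print(support_in_future(mu, {(1, 1): 0.5, (2, 0): 0.5}))  # both events lie in J^+((0, 0))
print(support_in_future(mu, {(1, 2): 1.0}))               # (1, 2) is spacelike to (0, 0)
```

The condition is necessary but not sufficient: two measures can satisfy it and still admit no causal coupling, because it says nothing about how much mass each future region has to receive.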
Recall that the causal precedence relation between events is reflexive, transitive and, iff $M$ is causal, antisymmetric. We now prove analogous results for the space of Borel probability measures on $M$ equipped with the relation $\preceq$. To this end, it will be convenient to use the diagonal map $\Delta : M \to M^2$, defined as $\Delta(p) := (p, p)$ for any $p \in M$.
Proof. To prove reflexivity of $\preceq$, it suffices to notice that for any $\mu \in \mathscr{P}(M)$ the pushforward measure $\Delta_* \mu$ is a causal coupling of $\mu$ with itself.
We now move to proving the transitivity of $\preceq$. Let us invoke the following standard result [58, Lemma 7.6] from the optimal transport theory.
But this, in turn, means that where the middle inequality is a direct consequence of the transitivity of the causal precedence relation between events. We have thus proven that $\omega_{13}(J^+) = 1$, and so $\omega_{13} \in \Pi_c(\mu_1, \mu_3)$ and therefore $\mu_1 \preceq \mu_3$.
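In the finitely supported case the gluing step can be written out explicitly. One admissible choice (an assumption made here for illustration, not necessarily the construction used in [58]) is $\Omega(p, q, r) := \omega_{12}(p, q)\, \omega_{23}(q, r) / \mu_2(q)$, whose marginal on the first and third factor is the composed coupling $\omega_{13}$. The Python sketch below computes this $\omega_{13}$ with exact rational arithmetic in (1+1)-dimensional Minkowski spacetime; the names `causally_precedes` and `compose_couplings` are hypothetical.

```python
from fractions import Fraction

def causally_precedes(p, q):
    """Causal precedence in 1+1 Minkowski spacetime; events are (t, x), c = 1."""
    return q[0] - p[0] >= abs(q[1] - p[1])

def compose_couplings(omega12, omega23):
    """Compose two discrete couplings sharing the middle marginal mu2.

    Uses the gluing Omega(p, q, r) = omega12(p, q) * omega23(q, r) / mu2(q)
    and returns its marginal omega13 on the first and third factor.
    """
    mu2 = {}
    for (_, q), w in omega12.items():
        mu2[q] = mu2.get(q, 0) + w
    omega13 = {}
    for (p, q), w12 in omega12.items():
        if mu2[q] == 0:
            continue  # no mass passes through q
        for (q2, r), w23 in omega23.items():
            if q2 == q:
                glued = Fraction(w12) * w23 / mu2[q]
                omega13[(p, r)] = omega13.get((p, r), 0) + glued
    return omega13

half = Fraction(1, 2)
omega12 = {((0, 0), (1, 0)): Fraction(1)}
omega23 = {((1, 0), (2, -1)): half, ((1, 0), (2, 1)): half}
omega13 = compose_couplings(omega12, omega23)
# Every pair charged by omega13 is causally related, by transitivity on events.
print(all(causally_precedes(p, r) for (p, r) in omega13))
```

Since every pair $(p, r)$ charged by $\omega_{13}$ arises from some chain $p \preceq q \preceq r$, the composed coupling is causal whenever both inputs are, which mirrors the transitivity argument above.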
A natural question arises: how robust must the causal structure of a spacetime $M$ be to render the relation $\preceq$ antisymmetric and hence a partial order? Obviously, $M$ must be at least causal (otherwise even the causal precedence relation between events fails to be antisymmetric). We have the following result:
Theorem 12. Let $M$ be a spacetime with the following property: For any compact $K \subseteq M$ there exists a Borel function $\tau_K : K \to \mathbb{R}$ such that
$$\forall\, p, q \in K \quad p \preceq q,\ p \neq q \ \Rightarrow\ \tau_K(p) < \tau_K(q). \tag{45}$$
Then, for any $\mu \in \mathscr{P}(M)$, $\Pi_c(\mu, \mu) = \{\Delta_* \mu\}$. Moreover, the relation $\preceq$ is antisymmetric.
Remark 6. Property (45) implies that $M$ is causal. Indeed, suppose that there exist two distinct events $p, q \in M$ such that $p \preceq q \preceq p$. Taking $K = \{p, q\}$ we would obtain, on the strength of (45), that $\tau_K(p) < \tau_K(q) < \tau_K(p)$, a contradiction.
On the other hand, if $M$ is past (future) distinguishing, then any past (resp. future) volume function is a semi-continuous, and hence Borel, generalised time function $\tau$ (cf. Sect. 2.3). This obviously implies (45): for any compact $K \subseteq M$ simply define $\tau_K := \tau|_K$. However, being past or future distinguishing is not necessary for (45) to hold.
On the strength of Lemma 4, we get that $\pi = \Delta_* \mu$.
Notice, however, that the set $\{(p, q, r) \in M^3 \mid p \preceq q \preceq r \neq p\}$ is $\Omega$-null, because Therefore, in fact,
$$\Omega\big(\{(p, q, r) \in M^3 \mid p \preceq q \preceq r = p\}\big) = 1. \tag{47}$$
But $M$ is causal (cf. Remark 6), therefore the causal precedence relation between events is antisymmetric, and thus the set whose measure is evaluated in (47) is equal to $\{(p, p, p) \in M^3 \mid p \in M\}$.

Lorentz-Wasserstein Distances
For an exposition of the theory of Wasserstein distances in the context of the optimal transport theory the reader is referred, e.g., to [58]. We now propose the following natural definition of a distance between measures on a spacetime.
Notice that the integrals are well-defined, because $d$ is lower semi-continuous and hence Borel. Notice also that for Dirac measures $LW_s(\delta_p, \delta_q) = d(p, q)$ for any $s$.
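As a small numerical illustration of the Dirac case, the Python sketch below evaluates the Lorentzian distance in (1+1)-dimensional Minkowski spacetime and the transport cost of a single discrete coupling. Two things are assumptions made for this example only: the flat-spacetime formula $d(p, q) = \sqrt{\Delta t^2 - \Delta x^2}$ for $p \preceq q$ (and 0 otherwise), and the form $\big(\sum d(p, q)^s\, \omega(p, q)\big)^{1/s}$ of the cost, chosen so that the Dirac case returns $d(p, q)$ as stated above. The names `lorentzian_distance` and `coupling_cost` are hypothetical.

```python
import math

def lorentzian_distance(p, q):
    """Lorentzian distance in 1+1 Minkowski spacetime; events are (t, x), c = 1.

    Equals the proper time along the straight line from p to q when q lies
    in the causal future of p, and 0 otherwise.
    """
    dt, dx = q[0] - p[0], q[1] - p[1]
    if dt >= abs(dx):
        return math.sqrt(dt * dt - dx * dx)
    return 0.0

def coupling_cost(omega, s):
    """Cost of one discrete coupling {(p, q): mass}, assumed to have the form
    (sum of d(p, q)**s * mass) ** (1 / s) for a parameter s > 0."""
    total = sum(w * lorentzian_distance(p, q) ** s for (p, q), w in omega.items())
    return total ** (1.0 / s)

p, q = (0, 0), (5, 3)
print(lorentzian_distance(p, q))          # proper time between p and q
print(coupling_cost({(p, q): 1.0}, 0.5))  # Dirac case: agrees with d(p, q)
```

Because the only coupling of two Dirac measures is $\delta_{(p,q)}$, no optimisation over couplings is needed in this case; for general measures one would still have to optimise the cost over all causal couplings.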
Proof. (i) The implication is obvious, so we only prove the equivalence. To prove the '⇒' part of the equivalence, assume that $LW_s(\mu, \nu) > 0$. By the very definition of $LW_s$, this implies that there exists $\omega \in \Pi_c(\mu, \nu)$ such that
However, Lorentz-Wasserstein distances between two compactly supported measures in globally hyperbolic spacetimes are finite.

Outlook
In Definition 2 we proposed a notion of the causal relation $\preceq$ between probability measures on a given spacetime $M$. To make sure that the relation is well-defined, we entered the little explored domain at the border between causality theory and measure theory. The presented formalism can be developed in various directions and applied in both classical and quantum physics. Firstly, one can try to lower the causality conditions imposed on the spacetime in the theorems presented in Sect. 4. In particular, it would be interesting to see whether the defined relation on $\mathscr{P}(M)$ is a partial order for every causal spacetime $M$, or whether the assumption (45) in Theorem 12 is a necessary one. If the latter holds, one would obtain a new rung of the causal ladder between the causal and distinguishing spacetimes.
A second path of possible development would be to investigate further the notion of a Lorentzian distance in the space of probability measures on a spacetime and the associated topological questions. In Sect. 5 we proposed a notion of the $s$-th Lorentz-Wasserstein distance, which is a natural generalisation of the Lorentzian distance between the events on $M$. However, in the optimal transport theory there are other ways to measure distances between probability measures (see for instance [59, p. 97]). It is tempting to see how (if at all) these notions can be adapted to the spacetime framework. This directly relates to the issue of topology on $\mathscr{P}(M)$ and its interplay with the semi-Riemannian metric on $M$.
Finally, it is desirable to reconcile our results with the recent paper of Suhr [55]. In particular, one could check whether Theorems 6, 7, 8, 9 and 10 can be extended to the more abstract, Lorentz-Finsler setting adopted in [55]. This would increase the potential usefulness of our work in the application to the early universe reconstruction problem [14,28–30].
The first application of the presented theory to the study of causality in quantum theory is discussed in detail in [21]. Therein, we focus on the wave packet formalism, which is in common use in atomic, condensed matter [53] and particle physics [12], as an approximation to complicated QFT problems. In this framework, any normalised solution to the Schrödinger equation $i \partial_t \psi = H \psi$ defines a family of probability measures $\{\mu_t \in \mathscr{P}(\mathbb{R}^{n+1})\}_{t \in \mathbb{R}}$ localised on $t$-slices, via $\mu_t = \delta_t \times |\psi(t, x)|^2 d^n x$, where $d^n x$ is the standard Lebesgue measure on $\mathbb{R}^n$. Given such a family of measures on $(n+1)$-dimensional Minkowski spacetime one can, within our formalism, rigorously study the causality of the quantum evolution, i.e. check whether $\mu_s \preceq \mu_t$ whenever $s \le t$ (see also [40]). This allows us to rigorously check the conclusions obtained by Hegerfeldt [35,36,39]. We also compare the outcomes with the more recent results on causality in quantum mechanics [3,5,9,60] and seek potential empirical consequences.
To allow for more sophisticated applications in quantum theory one could extend the presented formalism to probability measures with values in some Banach space. The first direct development would be to consider signed measures, which may model both classical and quantum charge densities on a given spacetime. A more challenging task would be to extend the presented causal order to positive-operator-valued measures, which might offer new insight into quantum information theory [45].
Finally, let us come back to the original motivation of our preliminary Definition 1. As stressed at the beginning of Sect. 4, it was inspired by the notion of 'causality in the space of states' coined in [25]. The partial order