1 Introduction

In 2015, Ronald Hanson and colleagues at Delft reported the first of an important new class of ‘Bell tests’ – i.e., experimental confirmations that the quantum world violates the Bell inequalities [1]. The Delft experiment was the first Bell test to close both of two well-known experimental loopholes, the so-called detection and locality loopholes. Previous Bell tests had closed one or the other, but not both simultaneously. As the Delft authors note, their experiment exploits ‘[a]n elegant approach for realizing a loophole-free setup ... proposed by Bell himself’ [1, p. 3]. The same approach has now been taken by several other experiments [2,3,4].

The Delft experiment was widely hailed as a further important confirmation of quantum ‘action at a distance’ (AAD). Media coverage presented the experiment as good news for AAD and hence as bad news for Einstein: “The most rigorous test of quantum theory ever carried out has confirmed that the ‘spooky action-at-a-distance’ that [Einstein] famously hated ... is an inherent part of the quantum world,” as a report in Nature put it [5].

Here we argue that this conclusion needs a large caveat. The Delft experiment does, as claimed, do a convincing job of closing experimental loopholes in the project of showing that nature violates the Bell inequalities. But the step from violation of the Bell inequalities to AAD involves a sensitivity to experimental geometry that has not previously been recognised, so far as we are aware. Our goal in this paper is to call attention to this issue, using the Delft experiment and some variants of it as a framework for our discussion.

The issue arises because, unlike previous Bell tests, such experiments make use of entanglement swapping. Their spacetime geometry thus has a \(\lor\hspace{-2pt}\lor \) shape, rather than the \(\vee \) shape of previous experiments. The central vertex of the \(\lor\hspace{-2pt}\lor \) is a measurement whose result (if suitable) confirms an entanglement between the particles measured at the outer vertices. We will show that this geometry may permit an alternative explanation of the observed Bell correlations across the \(\lor\hspace{-2pt}\lor \), not requiring AAD. The apparent causal influence may be a selection artifact, of a kind familiar in the causal modelling literature, rendered possible in this case by the role of the central measurement.

Specifically, the central measurement is a ‘collider’ in causal modelling terms, with the result that apparent causal influence across the \(\lor\hspace{-2pt}\lor \) may be an artifact of so-called ‘collider bias’. (We explain these terms below.) This means that such an experiment may provide no evidence of AAD in its own case – i.e., across the \(\lor\hspace{-2pt}\lor \) as a whole [Footnote 1] – despite confirming predictions that seem to mandate AAD in other geometries. We will call this the Collider Loophole (CL).

We will show that under standard assumptions, vulnerability of such experiments to CL depends on the spacetime location of the central vertex of the \(\lor\hspace{-2pt}\lor \) with respect to the outer vertices. Cross-\(\lor\hspace{-2pt}\lor \) AAD is highly questionable if the central vertex lies in the absolute future of the outer vertices, but not if it lies in their absolute past. Interestingly, the Delft experiment itself is the intermediate case, in which this separation is spacelike. We will argue that in the light of this, the case for AAD across the \(\lor\hspace{-2pt}\lor \) in the Delft experiment is weaker than in the subsequent similar experiments, in which the central vertex lies in the overlap of the past light cones of the outer vertices.

The sensitivity of CL to the location of the central vertex depends on the assumption that there is no retrocausality in play. This may seem uncontroversial, but it has been challenged in this context, where the retrocausal option has been held to provide a different reason for questioning the inference from violation of the Bell inequalities to AAD. Retrocausal models allow causality ‘across the \(\vee \)’ in conventional \(\vee \)-shaped Bell experiments, but take it to be indirect: the causal influence is said to take a zigzag path, along the two arms of the \(\vee \). The result is spacelike causality, without direct AAD. The claimed advantage is that by keeping direct causal influences within the light cones, such models may be easier than conventional models to reconcile with special relativity; see [6,7,8,9,10] for recent discussions. In the present paper we ignore this option. A No Retrocausality Assumption (NoRA) plays a crucial role in our argument, supporting the case for the existence of the Collider Loophole in some \(\lor\hspace{-2pt}\lor \) geometries but not others. [Footnote 2]

The paper goes like this. Section 2 deals with preliminaries. We introduce entanglement swapping, and a variant of it known as delayed choice entanglement swapping (DCES). We summarise recent discussion in the literature about the ontological status of the entanglement produced by DCES. In particular, we explain the common view that DCES does not create ‘genuine’ entanglement, and that the appearance that it does so is an artifact of post-selection. We also introduce the notions of colliders and collider bias from the causal modelling literature.

In Sect. 3 we describe the Delft experiment, and then apply the lessons of Sect. 2 to a hypothetical variant of it, conducted with DCES. We show that in this DCES variant, it would be highly questionable whether violation of the Bell inequalities would reflect any real AAD across the \(\lor\hspace{-2pt}\lor \) geometry of the experiment as a whole. The alternative is that the Bell correlations are an artifact of collider bias. We give two reasons for thinking that this is the true explanation of the cross-\(\lor\hspace{-2pt}\lor \) correlations in the DCES case. First, the central measurement is a collider, which immediately puts the possibility of collider bias on the table. Second, the experimental correlations fail two intuitive tests for the existence of a genuine (cross-\(\lor\hspace{-2pt}\lor \)) causal connection. We argue that taken together, these factors provide strong reasons to doubt cross-\(\lor\hspace{-2pt}\lor \) AAD in the DCES case. [Footnote 3]

Up to this point, we will have been speaking about AAD, not ‘nonlocality’ or ‘nonlocal causation’. This choice is deliberate, and in Sect. 4 we explain why we make it. Bell’s own formal notion of Local Causality (LC) [12] turns out to require particular care in these contexts. When selection artifacts are in the offing, it is important to distinguish a version of LC expressed in terms of frequencies in post-selected ensembles from a version (Bell’s own) expressed in terms of underlying fundamental probabilities. In Sect. 4 we explain the need for this distinction, and then discuss the relation between the failure of cross-\(\lor\hspace{-2pt}\lor \) AAD and LC (in both senses).

In Sect. 5 we turn to a second variant of the Delft experiment, in which the entanglement swapping occurs in the past of the measurements at the extremities of the \(\lor\hspace{-2pt}\lor \) (i.e., in the overlap of their past light cones). As we will explain, this variant appears to avoid the Collider Loophole.

In Sect. 6, with these variants as comparisons, we return to the actual Delft experiment. In this case, as we noted, the entanglement swapping measurement occurs at spacelike separation from the two measurements between which the experiment claims to reveal AAD. This makes a difference, but we conclude that although the case for thinking that the experiment fails to close the Collider Loophole is not as straightforward as in the delayed choice version, the loophole remains a threat. Section 7, finally, is a brief conclusion.

2 Preliminaries

2.1 Entanglement Swapping

We introduce entanglement swapping by following a helpful recent presentation by Glick [15]. As Glick puts it:

Entanglement swapping is a procedure in which entanglement may be “swapped” from a pair of jointly measured particles to a pair of particles lacking common preparation. The technique has become quite commonplace in experiments involving entanglement and has numerous applications in quantum information theory. A simple experimental arrangement is depicted (in Fig. 1). [15, p. 16]

Fig. 1: Entanglement swapping (from [15])

A similar procedure can be considered in which the measurement that induces the ‘swapping’ occurs in the absolute future of the side measurements by Alice and Bob. This is called delayed-choice entanglement swapping. As Glick describes it:

The procedure was proposed as a thought experiment by [16], but has now been realized experimentally by [17] and others. We begin with two entangled systems as in the ordinary case, but rather than have Victor perform his measurement prior to Alice and Bob, we delay particles 2 and 3 so that Victor can perform his measurement after his colleagues. Recall that the argument given above ...suggests that we should expect the same result as in the ordinary swapping case. In particular, when Victor successfully performs a BSM [Bell state measurement], entanglement will be swapped to (1,4). ... [T]hese results seem to have been confirmed by an experiment conducted by [17] depicted (in Fig. 2). [15, p. 17]

As Glick notes, this possibility seems to have peculiar consequences:

This presents the following challenge: In the ordinary entanglement swapping case, Victor has the power to entangle (or not) the outer particles (1,4) at a distance. In the delayed-choice case, it seems that Victor has the same power to entangle (1,4). However, at the time of Victor’s measurement (or choice), (1,4) have already been detected. Thus, Victor’s measurement must not only be capable of influence at a (spacelike) distance, but also backwards in time. The only way Victor can entangle (1,4) is to act on them retrocausally when (or before) they are measured (\(t\le M_A, M_B\)). [15, p. 17]

Fig. 2: Delayed choice entanglement swapping ([15], after [17])

Some writers (e.g., [16, 18]) propose that we avoid this appearance of retrocausality by adopting an antirealist view of the quantum state. If the quantum state does not represent a piece or property of reality, then there is no retrocausality (or indeed any sort of causality) involved here, because there is no real effect. Victor’s measurement may change our knowledge of the past in some way, but it doesn’t affect the past.

Glick’s preferences are more realist. He proposes to defend a realist view of the quantum state by allowing timelike entanglement, considering several interpretations of the resulting timelike connection.

On this view, (1,4) are entangled at \(M_A, M_B\) in virtue of Victor’s later measurement (at \(M_V\)). Generalizing, a pair of particles in a DCES experiment are entangled only if there actually is a BSM performed in the future that swaps entanglement to them. Of course, one may not know whether such a measurement will be performed, and hence, may wish to leave it open that the particles one encounters may be entangled. But, this doesn’t trivialize entanglement as it still only applies to certain pairs of particles, namely, those prepared in an entangled state or entangled via other means (e.g., entanglement swapping). [15, pp. 19–20]

Glick discusses two variants of this view. One is that ‘Victor’s measurement has a retrocausal (or non-causal influence) on the pair of particles (1,4) at \(t\le M_A, M_B\).’ The other, which Glick calls ‘Nonseparability’, proposes that ‘Victor’s measurement gives rise to the (1,4) whole that Alice and Bob both measure.’ Glick acknowledges ‘difficulties in working out the details and timing of such processes’, but suggests that ‘these difficulties are by in large the same as those already faced by [similar] approaches in the context of spacelike entanglement.’

Glick notes that there is an alternative explanation of the correlations involved in DCES cases, one described by Egg [19]. Glick observes that Egg’s proposal provides a halfway house between the antirealism of Peres and Healey and his own realist view, and describes it as follows [15, p. 20]:

An alternative interpretation of DCES is given by Egg. Egg endeavours to provide a principled basis to accept the realist account of ordinary entanglement swapping but reject its extension to DCES. Egg’s reply focuses on an aspect of Ma et al.’s DCES experiment that was omitted from the initial presentation. Unlike a simple EPR experiment, the correlations in the data recorded by Alice and Bob are only apparent once that data has been sorted into subensembles according to the measurement performed and results obtained by Victor. Once we sort the results obtained by Alice and Bob in this way, we find that the subsets of data associated with Victor performing a BSM exhibit correlations that violate a Bell inequality. This leads Egg to conclude the following:

The Bell measurement on the [2,3] pair allows us to sort the [1,4] pairs into four subensembles corresponding to the four Bell states. Without delayed choice, this has physical significance, because each [1,4] pair really is in such a state after the [2,3] measurement. But if the [1,4] measurements precede the [2,3] measurement, the [1,4] pair never is in any of these states. This is entirely compatible with the fact that evaluating the [1,4] measurements within a certain subensemble shows Bell-type correlations. [19, p. 1133, original emphasis, notation changed to match Glick]

As Glick says, ‘Egg’s proposal is that we should posit physical entanglement between (1,4) only when Victor’s measurement occurs before Alice’s and Bob’s (\(M_V < M_A, M_B\)).’ Glick notes that this ‘allows one to preserve realism about entanglement (and an ontic view of the quantum state more generally) without having to adopt the revisionary metaphysics’ that he himself proposes.

Egg’s analysis of the DCES cases seems to be a widespread view. [Footnote 4] The crucial point for our purposes is that if post-selection is in play, the existence of Bell-inequality-violating correlations in a subensemble of measurements need not be evidence of real entanglement. In a moment we will apply Egg’s analysis to a delayed-choice version of the Delft experiment. Before that, we need to introduce some terminology from the causal modelling literature.

2.2 Collider Bias

In causal modelling terminology, a collider (or inverted fork) is a variable with more than one direct cause within a causal model. In other words, in the graphical format of directed acyclic graphs (DAGs), it is a node at which two or more arrows converge (hence the term ‘collider’). It is well known that conditioning on such a variable – i.e., selecting the cases in which it takes a certain value – may induce a correlation between its causes, even if they are actually independent. As Cole et al. put it, ‘conditioning on the common effect imparts an association between two otherwise independent variables; we call this selection bias’ [22, p. 417].

Collider bias is sometimes called Berkson’s paradox [23]. Berkson’s own example involved an apparent negative dependence between diabetes and cholecystitis in patients admitted to hospital with certain symptoms. Such symptoms tended to be caused either by diabetes or by cholecystitis, so that presence of these symptoms is a collider, in causal modelling terms. In these patients, lack of one cause does make more probable the other cause, but Berkson’s point is that this is a biased sample. There need be no such correlation in the general population.
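Berkson’s point can be illustrated with a minimal simulation sketch. The prevalences (10% each), the rule that a patient is admitted exactly when at least one of the two conditions is present, and the function name `berkson_demo` are illustrative assumptions, not figures or terminology from [23].

```python
import random

def berkson_demo(n=200_000, p_diabetes=0.1, p_chole=0.1, seed=0):
    """Compare the frequency of cholecystitis, given diabetes status, in the
    whole population and among admitted patients (illustrative numbers only)."""
    rng = random.Random(seed)
    # counts[group][diabetes] = [number of people, number with cholecystitis]
    counts = {"population": {True: [0, 0], False: [0, 0]},
              "admitted": {True: [0, 0], False: [0, 0]}}
    for _ in range(n):
        diabetes = rng.random() < p_diabetes   # the two causes are independent
        chole = rng.random() < p_chole
        admitted = diabetes or chole           # symptoms: the collider
        groups = ["population"] + (["admitted"] if admitted else [])
        for g in groups:
            counts[g][diabetes][0] += 1
            counts[g][diabetes][1] += chole
    # Convert counts to conditional frequencies of cholecystitis.
    return {g: {d: c[1] / c[0] for d, c in by_d.items()}
            for g, by_d in counts.items()}

result = berkson_demo()
for group, freqs in result.items():
    print(group, {d: round(f, 3) for d, f in freqs.items()})
```

Under these assumptions the frequency of cholecystitis is roughly the same with or without diabetes in the population as a whole, whereas among admitted patients the absence of diabetes guarantees cholecystitis. The negative dependence appears only after conditioning on the collider.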

Here’s a simpler example, to lead us in the direction of quantum cases. Imagine that Alice and Bob are at spacelike separation, and play rock-paper-scissors with each other, sending their choices to a third observer, Charlie. Suppose that Alice and Bob make their choices entirely at random, and that Charlie records three kinds of outcomes: Alice wins, Bob wins, or neither wins. Obviously, post-selecting on any one of these outcomes induces a correlation between Alice’s choices and Bob’s choices. Equally obviously, this does not amount to real causality between Alice and Bob.
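Because the game has only nine equally likely move pairs, the induced correlation can be computed exactly rather than simulated. The sketch below does this by enumeration; the function names and the labels for Charlie’s records are hypothetical conveniences.

```python
from fractions import Fraction
from itertools import product

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def charlie_record(alice, bob):
    """Outcome that Charlie records for one round."""
    if BEATS[alice] == bob:
        return "Alice wins"
    if BEATS[bob] == alice:
        return "Bob wins"
    return "neither wins"

def bob_given_alice(alice_move, outcome=None):
    """Exact P(Bob's move | Alice's move), optionally post-selected on
    Charlie's record. All nine move pairs are taken as equally likely."""
    rounds = [(a, b) for a, b in product(MOVES, repeat=2)
              if a == alice_move
              and (outcome is None or charlie_record(a, b) == outcome)]
    return {m: Fraction(sum(b == m for _, b in rounds), len(rounds))
            for m in MOVES}

unselected = bob_given_alice("rock")
selected = bob_given_alice("rock", outcome="Alice wins")
print(unselected)
print(selected)
```

Without selection, Alice’s choice of rock leaves Bob’s three moves equally likely. Within the ‘Alice wins’ subensemble, the same choice fixes Bob’s move as scissors, although nothing passes between the two players.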

3 The Delft Experiment and Variants

3.1 The Actual Experiment

The Delft experiment [1] adopts a proposal originally made by Bell himself [24]. Detection efficiency is improved by means of an ‘event-ready’ measurement, whose function is to signal that two suitably entangled particles (in this case, electrons) are in the A and B detector channels. In the Delft experiment, unlike in Bell’s own proposal, the event-ready signal is provided by a particular outcome to a measurement that also serves to entangle the two electrons, via entanglement swapping. The experimental procedure is described as follows [1, p. 3]:

We generate entanglement between the two distant spins by entanglement swapping in the Barrett-Kok scheme using a third location C (roughly midway between A and B ...). First we entangle each spin with the emission time of a single photon (time-bin encoding). The two photons are then sent to location C, where they are overlapped on a beam-splitter and subsequently detected. If the photons are indistinguishable in all degrees of freedom, the observation of one early and one late photon in different output ports projects the spins A and B into the maximally entangled state \(\left| \psi ^{-}\right\rangle =(\left| \uparrow \downarrow \right\rangle - \left| \downarrow \uparrow \right\rangle )/\sqrt{2}\), where \(m_s = 0 \equiv \left| \uparrow \right\rangle \), \(m_s = -1 \equiv \left| \downarrow \right\rangle \). These detections herald the successful preparation and play the role of the event-ready signal in Bell’s proposed setup. ... [W]e ensure that this event-ready signal is space-like separated from the random input bit generation at locations A and B.

In the Delft protocol, successful event-ready detection results at the central point C select a (small) subensemble of the total series of trials E, a subensemble we denote by \(E_{C}\). These are the trials in which there is held to be successful entanglement swapping, ensuring that the electrons at A and B are entangled.

We note in passing that the Delft authors themselves take the view that, as they put it, ‘John Bell proved that no theory of nature that obeys locality and realism can reproduce all the predictions of quantum theory’, and hence that their own result ‘rules out large classes of local realist theories’ [1, p. 1, emphasis added]. However, the claim that Bell’s Theorem requires an assumption of realism is controversial; see, e.g., Norsen [25] for the argument to the contrary. This is not directly relevant to our present concerns, but we mention it to emphasise that our own challenge to AAD in Delft-like experiments does not depend on this (claimed) realism loophole.

3.2 Delayed-Choice Delftware

Let us now consider a DCES version of the Delft experiment, Delayed Delft (DD). [Footnote 5] It differs from the original experiment in that the measurement C takes place later in time, in the future light cones of the measurements A and B. This experiment would be expected to yield the same result as the original, because the relevant joint probabilities are insensitive to the relative timings of the three measurements involved. In DD, as before, let \(E_{C}\) denote the subensemble of all measurement results in which an event-ready result is recorded at C. Let \(\{a_n,b_n,A_n,B_n\}\) denote the nth result within \(E_{C}\). Here \(a_n\) is the setting of measurement A, \(A_n\) the outcome of measurement A, and so on, in the usual way.

3.2.1 Colliders in DD

In DD, it is uncontroversial that the measurement choices at A and B may exert a causal influence on the result of the measurement C. This is certainly so in orthodox QM, in which the measurements at A and B affect the state of the particles converging on C from the left and right, respectively. (The fact that the measurements at A and B take place before that at C is crucial in this account, of course.)

In causal modelling terms, then, the outcome of the measurement C is a collider, for causal influences originating at A and B. This immediately puts on the table the possibility that any apparent AAD between A and B might be a manifestation of selection bias. We say ‘puts on the table’ here because in principle an association between A and B might result from a combination of selection bias at C and some real underlying causal influence between A and B.

The possibility that the apparent AAD between A and B might be a selection artifact is our Collider Loophole (CL). But it is one thing to identify a possibility, another to show that it is actually the case. How can we determine whether CL is the true explanation of the apparent AAD across DD? We will proceed by offering two tests for genuine causality, and explaining why DD seems to fail both tests.

3.2.2 The No Difference Test

The first test starts with this question: would \(A_n\) or \(B_n\) have been different if the measurement C had not (later) taken place? [Footnote 6] The intuitive answer to this question is ‘No’. Because C lies in the future with respect to the measurements A and B, allowing it to influence the measurement results at A and B would amount to retrocausality, which we are assuming is impossible (NoRA).

Yet answering ‘No’ leads to a puzzle. If each individual measurement at C makes no difference in this way, then it seems to follow that the entire set of C measurements makes no difference – in other words, that all the measurement results in \(E_{C}\) would have been the same, even if the measurement device at C had simply been absent altogether. But these results display Bell correlations between A and B, and hence seem to constitute evidence of entanglement between the electrons A and B. How could such entanglement arise, if there is no measurement to provide entanglement swapping? Egg’s approach to DCES answers this question, telling us that the apparent entanglement is a selection artifact. The set of results \(E_{C}\) is indeed just the same, whether or not C measurements take place, but there’s no real entanglement in either case.

It is easy to see how this reasoning extends to AAD. If the results in \(E_{C}\) are independent of whether the C measurements actually take place, then the questions of AAD in the two cases—i.e., with and without the C measurements—stand and fall together. Either we are committed to AAD even in the absence of the C measurements, or we are not committed to it in the presence of the C measurements. And the latter is by far the more plausible option. Like entanglement itself, on the Egg view, the appearance of AAD is a selection artifact.

What we have just done is to take reasoning that supports the Egg view – the argument that in the absence of retrocausality, C can make no real physical difference in the DCES case – and apply it to the issue of AAD. Let’s call this the No Difference Argument (NDA) against AAD in DD. This is our first causal test, and we have argued that DD fails it. To get to our second causality test we’ll proceed indirectly, via a possible objection to NDA.

3.2.3 Thinking About Counterfactuals

NDA appealed to the assumption that the results in \(E_{C}\) are independent of whether the future C measurements take place—that C makes no difference. A possible reply is that even if C doesn’t make a difference to the actual contents of \(E_{C}\), it might make a difference to the kind of counterfactuals that would support a claim of causal influence from A to B. If so, that might explain how a relation of causal dependence could exist in the presence of C that would not exist in its absence.

To explore this proposal, let’s think of an example. Let’s consider simply the extreme correlations, the ones that are relevant in the original EPR argument. Suppose that Alice chooses setting \(a_n=0\), and is told that the case falls in \(E_{C}\). [Footnote 7] From the fact that \(E_{C}\) satisfies the Bell correlations, it follows that if Bob has chosen the same setting \(b_n=0\), then \(A_n=-B_n\). In other words, in the spirit of the EPR argument, Alice knows something about the probabilities on Bob’s side of the experiment. If \(A_n=1\), for example, then she knows that \(P(B_n = -1\mid b_n=0)=1\). Now the crucial question. Does she also know that had she instead chosen \(a_n=1\), she would have been able to predict the result of a measurement with \(b_n=1\) with similar certainty? No – for she doesn’t know that the run of the experiment would have fallen within \(E_{C}\), in that counterfactual case. If we assume that it does so – if we hold fixed the result of the C measurement, in effect – then the reasoning goes through, but why should we be entitled to do that? Why should C not be sensitive to the choice of measurement setting at A, in a way that makes it possible that if Alice had chosen differently, the result of the C measurement might have been different? This thought will lead us in the direction of our second causality test.

3.2.4 The Counterfactual Fragility Test

NDA offered one reason for thinking that violation of Bell inequalities in DD should be seen as an artifact of post-selection, rather than a manifestation of AAD. There is a second factor that points in the same direction. Any argument from a set of experimental data to causal dependence is going to require an assumption something like the following.

Alternative Measurements (AM)—It is legitimate to consider measurements which the experimenters might have performed, in addition to those which they actually perform. [26]

In this case, the assumption AM is formulated in a discussion of Bell-style arguments, but the point is much more general. In the causal modelling framework, it is embodied in the assumption that exogenous variables may take a range of values. If an ensemble of correlation data is to provide information about causation, it needs to respect such a principle. It needs to provide information about the results of alternative choices. But a post-selected ensemble may fail to do so. It may be ‘counterfactually fragile’, in the sense that it doesn’t support inferences about what would have happened, had an alternative measurement been performed. To support a causal claim, a set of data needs to be counterfactually robust.

Here’s a simple example. Suppose I record occasions on which it is true either that I wear green socks in the morning and it is sunny the same afternoon, or that I wear red socks in the morning and it is raining in the afternoon. ‘Look’, I claim, ‘I can control the weather, at least in this subensemble of cases.’ What I’ve missed is that the selection method for the subensemble doesn’t respect AM, and is hence counterfactually fragile. Had I chosen the other sock colour the resulting case would not have been in the subensemble at all, in most cases.
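The fragility of the sock ensemble can be made concrete with a short sketch. It assumes, purely for illustration, that sock colour and weather are independent fair coin flips, and that the weather is held fixed when we ask what would have happened with the other socks; the function names are hypothetical.

```python
import random

def in_subensemble(socks, weather):
    """Selection rule from the example: keep green-and-sunny or red-and-rainy days."""
    return (socks, weather) in {("green", "sunny"), ("red", "rainy")}

def sock_experiment(n=10_000, seed=1):
    rng = random.Random(seed)
    # Assumption: sock colour and weather are independent fair coin flips.
    days = [(rng.choice(["green", "red"]), rng.choice(["sunny", "rainy"]))
            for _ in range(n)]
    selected = [d for d in days if in_subensemble(*d)]
    # Within the selected days, sock colour 'predicts' the weather perfectly.
    match = sum((s == "green") == (w == "sunny") for s, w in selected) / len(selected)
    # Counterfactual check: flip the socks, hold the weather fixed, and ask
    # whether the day would still have been selected.
    flip = {"green": "red", "red": "green"}
    survive = sum(in_subensemble(flip[s], w) for s, w in selected) / len(selected)
    return len(selected) / n, match, survive

fraction_selected, match, survive = sock_experiment()
print(fraction_selected, match, survive)
```

In this idealised model the correlation inside the subensemble is perfect, yet none of the selected days would have remained in it under the alternative sock choice. The data say nothing about what the weather would have been had the socks been different.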

It is easy to see how this kind of fragility might be present in the DD protocol. As we just observed, there is no guarantee that if Alice had chosen an alternative measurement setting in case \(\{a_n,b_n,A_n,B_n\}\), the resulting measurement would have been in the subensemble \(E_{C}\). Why not? Because there is no guarantee that the result of the measurement C would have been the same in that case. As we noted, conventional QM takes it for granted that setting choices can influence each of the particles converging on the central vertex of the \(\lor\hspace{-2pt}\lor \) and in turn influence the outcome of the measurement at C.

The upshot is that the correlations within the subensemble \(E_{C}\) may provide no guide to what \(A_n\) and/or \(B_n\) would have been, had \(a_n\) been different. If so, then the data provided by \(E_{C}\) is not counterfactually robust, in the sense described above, and cannot support the causal claim of AAD. Let’s call this the Counterfactual Fragility Argument (CFA). It is our second causal test, and again, there are good grounds to think that DD fails it.

3.2.5 Summary: The Collider Loophole

We have argued that the case for cross-\(\lor\hspace{-2pt}\lor \) AAD in DD is vulnerable to the Collider Loophole (CL). In other words, Bell correlations revealed in the results of DD are likely to be an artifact of collider bias, rather than a sign of genuine AAD. It is uncontroversial that C is a collider in DD, and the diagnosis of collider bias is supported by intuitive tests for causal dependence in two ways.

First, from the fact that C occurs later than A and B, and NoRA, we concluded that the results in the subensemble \(E_{C}\) would have been the same, even if the C measurements had not taken place. NDA therefore undermines the claim that the correlations in \(E_{C}\) reveal genuine AAD.

Second, the fact that C occurs later than A and B allows the measurement settings at A and B to influence the result of the measurement at C. But this means that \(E_{C}\) may be counterfactually fragile. So CFA, too, undermines the claim that the Bell correlations in DD reflect any genuine causal dependency.

All of these points appealed to the fact that in DD, C lies in the absolute future of A and B. This suggests that CL would be avoided by a version of the Delft experiment that put C in the past with respect to A and B. We turn to that case in Sect. 5, before returning to the actual Delft experiment (which lies between these two variants, in the sense that it puts C at spacelike separation to A and B).

We emphasise again that CL does not challenge AAD within each of the two wings of DD. In each of the wings we have a component with \(\vee \) geometry, within which there is no further collider to generate selection bias. [Footnote 8] It may be that such ‘mini-AAD’ has a crucial role in the processes that make post-selection possible in the \(\lor\hspace{-2pt}\lor \)-geometry as a whole, enabling the C measurement to ‘know about’ the A and B measurements. Our point is that this need not imply AAD across the \(\lor\hspace{-2pt}\lor \) from A to B or vice versa. The Collider Loophole stands in the way.

4 Connections to Bell’s Theorem

Before we turn to other variants of the Delft experiment, we want to clarify the relationship between the above discussion and Bell’s Theorem. As we said in Sect. 1, we have been using the term AAD deliberately, avoiding the term ‘nonlocality’. We are now in a position to explain this choice.

Bell derived a family of inequalities from two mathematical assumptions, widely known as Local Causality (LC) and Statistical Independence (SI). When these inequalities are violated, as they are for the Bell correlations observed in the entanglement experiments we are discussing, at least one of these assumptions must fail. It is instructive to ask how the Collider Loophole might accomplish this, if there is to be no AAD across the \(\lor\hspace{-2pt}\lor \) as a whole.

LC is formalized by Bell himself [12, 25] as a conditional independence (or screening) condition between two spacelike-separated wings of an experiment. Norsen describes Bell’s own account of LC with reference to the diagram reproduced here as Fig. 3, adapted from Bell [12]. Norsen’s caption for this diagram reads as follows:

Spacetime diagram illustrating the various beables of relevance for the EPR-Bell setup. ... Separated observers Alice (in region 1) and Bob (in region 2) make spin-component measurements (using apparatus settings a and b respectively) on a pair of spin- or polarization-entangled particles (represented by the dashed lines). The measurements have outcomes A and B respectively. The state of the particle pair in region 3 is denoted \(\lambda \). Note that what we are here calling region 3 extends across the past light cones of both regions 1 and 2. It thus not only “completely shields off from 1 the overlap of the backward light cones of 1 and 2”, but also vice versa. Bell’s local causality condition therefore requires both that b and B are irrelevant for predictions about the outcome A, and that a and A are irrelevant for predictions about the outcome B, once \(\lambda \) is specified.

As Norsen goes on to say:

A complete specification of beables in this region 3 will therefore, according to Bell’s concept of local causality, “make events in 2 irrelevant for predictions about 1” and will also make events in 1 irrelevant for predictions about 2.

In formal terms, LC may thus be written:

$$\begin{aligned} P({A}\mid a,b,{B},\lambda )= & {} P(A\mid a,\lambda ),\nonumber \\ P({B}\mid a,b,{A},\lambda )= & {} P(B\mid b,\lambda ). \end{aligned}$$

In the same terminology, Statistical Independence (SI) between the settings (a and b) and the prior state of the system (\(\lambda \)) is the following condition:

$$\begin{aligned} P(\lambda \mid a,b)=P(\lambda ) \end{aligned}$$
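
As a reminder of how the two assumptions are used (a standard step that is not spelled out here, written for a discrete \(\lambda \) and for outcomes valued \(\pm 1\), both of which are simplifying assumptions on our part), LC lets the joint probability factorise once \(\lambda \) is given, and SI lets the same weights \(P(\lambda )\) be used for every choice of settings:

$$\begin{aligned} P(A,B\mid a,b)=\sum _{\lambda }P(\lambda )\,P(A\mid a,\lambda )\,P(B\mid b,\lambda ). \end{aligned}$$

Writing \(E(a,b)=\sum _{A,B}AB\,P(A,B\mid a,b)\) for the correlator, any distribution of this form satisfies the CHSH member of Bell’s family of inequalities, for settings \(a,a'\) on one side and \(b,b'\) on the other:

$$\begin{aligned} \left| E(a,b)+E(a,b')+E(a',b)-E(a',b')\right| \le 2. \end{aligned}$$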

Norsen emphasises the character of the probabilities that Bell has in mind in his definition of LC.

Bell has deliberately and carefully formulated a local causality criterion that ... is ... stated explicitly in terms of probabilities—the fundamental, dynamical probabilities assigned by stochastic theories to particular happenings in space-time. Note in particular that the probabilities in Eq. (1) are not subjective (in the sense of denoting the degree of someone’s belief in a proposition ...), they cannot be understood as reflecting partial ignorance about relevant beables in region 3, and they do not (primarily) represent empirical frequencies for the appearance of certain values .... They are, rather, the fundamental “output” of some candidate (stochastic) physical theory. [25, p. 10, emphasis added]

We call attention to Norsen’s distinction between two notions of probability because it turns out to be a helpful way to make the point that when post-selection is in play, LC and SI may each be applied in two different ways, within the same model.

Fig. 3 Bell’s notion of Local Causality (from [25])

Let’s begin with a toy example. With reference to Fig. 3, imagine that Alice and Bob (in regions 1 and 2, respectively) each generate ordered pairs of (genuinely) random bits and send them to a third observer, Charlie, in their absolute future. Think of the first of each pair of bits as a ‘setting’ and the second as an ‘outcome’. Let Charlie perform a binary measurement on each pair of ordered pairs of bits, producing positive results with probabilities based on the Bell correlations. Selecting for positive outcomes at C will then generate an ensemble of results of the form \(\{a_n,b_n,A_n,B_n\}\). Denote this ensemble \(E_{C}\), as before.

The selection procedure guarantees that the ensemble \(E_{C}\) violates a Bell inequality. Bell’s Theorem therefore implies that either LC or SI must fail, within this ensemble of results. But SI is guaranteed by the assumption that the settings are genuinely random bits, so the effect of post-selection must be to induce an LC-violating correlation between the two sides of the experiment, conditional on \(\lambda \). In other words, at least one of the following must hold:

$$\begin{aligned} P_{ps}({A}\mid a,b,{B},\lambda ) \ne P_{ps}({A}\mid a,\lambda ),\nonumber \\ P_{ps}({B}\mid a,b,{A},\lambda ) \ne P_{ps}({B}\mid b,\lambda ), \end{aligned}$$

where \(P_{ps}\) denotes probability (i.e., in this case, frequency) within the post-selected ensemble.

Let us call this result a violation of \(\hbox {LC}_{{ps}}\), adding the subscript to remind ourselves where the probabilities involved originate. Does this violation of \(\hbox {LC}_{{ps}}\) reflect any real causality between A and B? Clearly not. It is simply an artifact of collider bias. Alice’s and Bob’s pairs of bits are joint causes of the outcome of Charlie’s measurement, and that’s the source of the correlation. Equally clearly, the violation of \(\hbox {LC}_{{ps}}\) does not imply any violation of LC as interpreted in terms of the underlying dynamical probabilities of the model. (Let us write \(\hbox {LC}_{{dp}}\) for this case.) In this toy example, these fundamental probabilities are those involved in the stipulation that Alice and Bob generate genuinely random bits, so that \(\hbox {LC}_{{dp}}\) is trivially satisfied:

$$\begin{aligned} P_{dp}({A}\mid a,b,{B},\lambda ) = P_{dp}(A\mid a,\lambda ) = 0.5,\nonumber \\ P_{dp}({B}\mid a,b,{A},\lambda ) = P_{dp}(B\mid b,\lambda ) = 0.5. \end{aligned}$$

The need to distinguish \(\hbox {LC}_{{dp}}\) and \(\hbox {LC}_{{ps}}\) explains our caution about using the term ‘nonlocality’. This simple example shows that in the presence of post-selection, we may have nonlocality in the sense of failure of \(\hbox {LC}_{{ps}}\) (call this ‘\(\hbox {nonlocality}_{{ps}}\)’), without nonlocality in the sense of failure of \(\hbox {LC}_{{dp}}\). As the example illustrates, \(\hbox {nonlocality}_{{ps}}\) does not imply AAD, or genuine causality.Footnote 9
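
The toy example can be checked numerically. The following Python sketch is our own illustration, under assumptions the text leaves open: Charlie’s acceptance rule `accept_prob`, with target correlations \(E(a,b)=(-1)^{ab}/\sqrt{2}\), is one hypothetical choice of ‘probabilities based on the Bell correlations’, and all function names are ours. Rather than sampling, it enumerates the sixteen equiprobable combinations of bits exactly.

```python
import itertools
import math

# Every equiprobable run (a, b, A, B) of the toy example.
RUNS = list(itertools.product((0, 1), repeat=4))


def sign(bit):
    """Map an outcome bit {0, 1} to the value {+1, -1}."""
    return 1 - 2 * bit


def accept_prob(a, b, A, B):
    """Charlie's acceptance probability (an assumed rule, not taken from the text)."""
    target = ((-1) ** (a * b)) / math.sqrt(2)
    return (1 + sign(A) * sign(B) * target) / 2


def uniform(a, b, A, B):
    """No selection: every run is kept (the underlying dynamical probabilities)."""
    return 1.0


def correlator(a, b, weight):
    """Expectation of the product of outcomes for settings (a, b)."""
    num = den = 0.0
    for A, B in itertools.product((0, 1), repeat=2):
        w = weight(a, b, A, B)
        num += sign(A) * sign(B) * w
        den += w
    return num / den


def chsh(weight):
    """CHSH combination E(0,0) + E(0,1) + E(1,0) - E(1,1)."""
    E = {(a, b): correlator(a, b, weight) for a in (0, 1) for b in (0, 1)}
    return E[0, 0] + E[0, 1] + E[1, 0] - E[1, 1]


def prob(event, given, weight):
    """Conditional probability of `event` given `given` under the run weights."""
    num = sum(weight(*r) for r in RUNS if given(*r) and event(*r))
    den = sum(weight(*r) for r in RUNS if given(*r))
    return num / den


S_dp = chsh(uniform)      # full ensemble
S_ps = chsh(accept_prob)  # post-selected ensemble E_C

# Compare P(A=0 | a, b, B) with P(A=0 | a) in each ensemble.
A_is_0 = lambda a, b, A, B: A == 0
given_all = lambda a, b, A, B: (a, b, B) == (0, 0, 0)
given_a = lambda a, b, A, B: a == 0

p_A_given_all_ps = prob(A_is_0, given_all, accept_prob)
p_A_given_a_ps = prob(A_is_0, given_a, accept_prob)
p_A_given_all_dp = prob(A_is_0, given_all, uniform)

print(S_dp, S_ps)
print(p_A_given_all_ps, p_A_given_a_ps, p_A_given_all_dp)
```

Under these assumptions the CHSH value is 0 in the full ensemble and \(2\sqrt{2}\) in the post-selected one. Within \(E_{C}\), conditioning on Bob’s side shifts the probability of Alice’s outcome away from 0.5, whereas in the full ensemble it stays at 0.5. This particular rule accepts half of the runs whatever the settings, so the settings remain uniformly distributed within \(E_{C}\).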

This lesson carries over to DD. The diagnosis of the Bell correlations offered by the Collider Loophole excludes cross-\(\lor\hspace{-2pt}\lor \) AAD, but it is not incompatible with \(\hbox {nonlocality}_{{ps}}\). DD may well exhibit \(\hbox {nonlocality}_{{ps}}\), despite respecting \(\hbox {LC}_{{dp}}\) itself. In other words, the distinction between \(\hbox {LC}_{{dp}}\) and \(\hbox {LC}_{{ps}}\) explains how it can be true both (i) that a failure of LC explains the Bell correlations observed in DD, and (ii) that the experiment exhibits no cross-\(\lor\hspace{-2pt}\lor \) nonlocal causation. The apparent tension between these claims dissolves when we realise that they rely on different applications of LC, (i) relying on the \(\hbox {LC}_{{ps}}\) sense and (ii) relying on the \(\hbox {LC}_{{dp}}\) sense.

Now to SI. Failure of Bell inequalities in DD requires that one of LC and SI fails, and we have been considering the possibility that it is LC, in the form \(\hbox {LC}_{{ps}}\). However, SI might also fail in a post-selected ensemble. After all, the geometry of DD allows the central measurement to be informed of the remote measurement settings at A and B via a classical channel. This post-selection might induce a correlation between a and b and \(\lambda \), violating \(\hbox {SI}_{{ps}}\) (where, again, the subscript makes explicit that we are referring to the post-selected ensemble). Again, violation of \(\hbox {SI}_{{ps}}\) need not imply a violation of \(\hbox {SI}_{{dp}}\).

Indeed, a small modification of the toy example above illustrates this possibility. Suppose now that the random bits interpreted as outcomes in that example are generated as a pair AB at the source marked with a star in Fig. 3, and this pair is sent directly to Charlie. Alice and Bob each generate a random ‘setting’ bit, a and b, respectively, and send these to Charlie. Charlie post-selects, as before, to yield a subensemble \(E_{C}\) violating a Bell inequality. Once again, this implies correlations between a and B and/or between b and A, within the post-selected ensemble. Because the pair AB now falls within \(\lambda \), however, these correlations manifest as a failure of \(\hbox {SI}_{{ps}}\), not as a failure of \(\hbox {LC}_{{ps}}\).
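
This modified example can also be checked by exact enumeration. The Python sketch below is again our own illustration: the acceptance rule `accept_prob` (with target correlations \((-1)^{ab}/\sqrt{2}\)) is an assumed, hypothetical choice, and the names are ours. Here \(\lambda \) is the source-generated pair AB, and the sketch compares the distribution of \(\lambda \) with and without conditioning on the settings.

```python
import itertools
import math

LAMBDAS = list(itertools.product((0, 1), repeat=2))   # lambda = (A, B), fixed at the source
SETTINGS = list(itertools.product((0, 1), repeat=2))  # (a, b), random bits from Alice and Bob


def sign(bit):
    """Map an outcome bit {0, 1} to the value {+1, -1}."""
    return 1 - 2 * bit


def accept_prob(a, b, lam):
    """Charlie's acceptance probability (an assumed rule, not taken from the text)."""
    A, B = lam
    target = ((-1) ** (a * b)) / math.sqrt(2)
    return (1 + sign(A) * sign(B) * target) / 2


def weight(a, b, lam, post_selected):
    """Relative weight of a run: Charlie's rule if post-selected, else 1."""
    return accept_prob(a, b, lam) if post_selected else 1.0


def p_lambda_given_settings(lam, a, b, post_selected):
    """P(lambda | a, b) in the chosen ensemble."""
    den = sum(weight(a, b, l, post_selected) for l in LAMBDAS)
    return weight(a, b, lam, post_selected) / den


def p_lambda(lam, post_selected):
    """P(lambda) in the chosen ensemble, averaged over the settings."""
    num = sum(weight(a, b, lam, post_selected) for a, b in SETTINGS)
    den = sum(weight(a, b, l, post_selected) for a, b in SETTINGS for l in LAMBDAS)
    return num / den


lam = (0, 0)
si_dp = (p_lambda_given_settings(lam, 1, 1, False), p_lambda(lam, False))
si_ps = (p_lambda_given_settings(lam, 1, 1, True), p_lambda(lam, True))
print(si_dp)  # the two entries agree: SI holds for the underlying probabilities
print(si_ps)  # the two entries differ: SI fails in the post-selected ensemble
```

Because \(\lambda \) already fixes both outcomes in this variant, conditioning on the distant setting or outcome adds nothing once \(\lambda \) is given, so the whole effect of the selection shows up in the dependence of \(\lambda \) on the settings.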

These toy examples suggest that in any real experiment in which apparent AAD is an artifact of collider bias, the precise explanation in terms of \(\hbox {LC}_{{ps}}\) and \(\hbox {SI}_{{ps}}\) is likely to be model-dependent. For this reason we will not discuss these issues further in this piece. We will continue to speak of AAD, rather than ‘nonlocality’, to avoid the need to keep in mind the distinction between \(\hbox {LC}_{{dp}}\) and \(\hbox {LC}_{{ps}}\).

5 Early Delftware

We now turn to the case in which the measurement C is in the absolute past of A and B (Fig. 4), a case we’ll call Early Delft (ED).Footnote 10 In this case, the geometry seems to do a much better job of avoiding CL. The assumption NoRA ensures that A and B are not causes of C, at least partially removing the threat that C is a collider, and hence a source of collider bias. (We say ‘partially’ because C is still a collider with respect to influences from the two sources. More on this below.)

Fig. 4 Early Delftware (entanglement swapping in the past)

More importantly, our two causal tests no longer seem a threat. In DD, the first test, NDA, relied on the claim that all the individual results in \(E_{C}\) would have been the same, even if the C measurements had not been made. But that claim relied on NoRA; and NoRA cuts no ice in ED, obviously, where C is earlier than A and B.

As for the second test, CFA, it relied in DD on the possibility that the choice of measurement settings at A and B could influence C, at least in principle. As a result, we were not entitled to infer that a given result would still have been in \(E_{C}\), even if Alice had chosen a different measurement. That was why \(E_{C}\) was susceptible to counterfactual fragility. But now, given NoRA, the measurement choices at A and B cannot influence C. CFA is blocked.

5.1 Determined Scepticism?

Fig. 5 Early Delft in Bell’s framework (adapted from a diagram in [13])

At this point, we introduce a character we’ll call the AAD Sceptic, whose mission in life is to find small loopholes in arguments for AAD. Concerning ED, the AAD Sceptic objects that although passing the two causal tests blocks two arguments against AAD in ED, it doesn’t actually confirm that there is AAD in ED. After all (the Sceptic continues), there’s still a collider of some kind in ED—the central measurement at C is presumably a joint effect of the events that supply the particles from left and right. As in DD, ED will only yield Bell correlations when we condition on the result of the measurement at C. Moreover, the correlations as a whole in ED are exactly the same as in DD, where we’ve agreed that there’s a strong case for saying that there isn’t AAD. In the light of this, isn’t it weird to think that a small difference in C’s location could make such a big difference to the causal structure of the case?

How should we meet this argument? One strategy would be to appeal to Bell’s Theorem, where ED permits a straightforward application of Bell’s own reasoning. Consider Fig. 5, where the measurement C occurs in the past lightcones of A and B. Unlike in DD, this measurement can be considered part of the preparation of the final two-particle system, contributing to correlations in \(\lambda \). The violation of Bell inequalities observed between A and B therefore implies that one of the assumptions LC and SI must fail, in the corresponding ensemble of results.

In the light of our discussion in Sect. 4, however, it is clear that this won’t be enough to convince the AAD Sceptic. The Sceptic will repeat that even in ED, Bell correlations are only revealed in subensembles, corresponding to particular results of the measurement at C. So ED still gives us only a violation of \(\hbox {LC}_{{ps}}\), not a violation of \(\hbox {LC}_{{dp}}\). There’s still a collider of some kind at C, so why should a violation of \(\hbox {LC}_{{ps}}\) be enough to guarantee AAD? It wasn’t in DD.

To meet this challenge, we need to identify some difference between the nature of the colliders in DD and ED—something that will explain how Bell correlations can be a selection artifact in DD but not in ED. The crucial point seems to be the difference in the causal relationship between the measurement settings at A and B and the outcome at C, in the two cases. In DD, as we’ve seen, it is uncontroversial that the measurement settings at A and B may themselves be contributing causes to the outcome at C. This means that the settings themselves ‘feed into’ the collider at C. In a conventional QM picture, this happens because the measurements at A and B affect the state of the other particle in the corresponding pair, before those particles arrive at C. Intuitively, we might say that this feeds information about the settings at A and B in the direction of C.

In ED, however, the conventional picture has information, or causal influence, flowing the other way—from the measurement C to the particles due to be measured at A and B. Intuitively, this makes C a simpler kind of collider. Without input from the measurement settings, it simply isn’t rich enough to generate Bell correlations by post-selection.
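
The sense in which such a collider is not ‘rich enough’ can be illustrated in a deliberately simple classical setting. The Python sketch below is our own construction, not a model of the actual experiment. It assumes that each wing carries a local deterministic strategy fixing its outcome for either setting, that the settings are chosen independently of those strategies, and that Charlie’s acceptance weights depend only on the pair of strategies, never on the settings. All names are hypothetical.

```python
import itertools
import random

# A local deterministic strategy: (outcome for setting 0, outcome for setting 1).
STRATEGIES = list(itertools.product((1, -1), repeat=2))
# Joint hidden states: one strategy for Alice's wing, one for Bob's.
PAIRS = list(itertools.product(STRATEGIES, repeat=2))


def chsh_of_pair(strat_a, strat_b):
    """CHSH value for one fixed pair of local strategies."""
    def E(a, b):
        return strat_a[a] * strat_b[b]
    return E(0, 0) + E(0, 1) + E(1, 0) - E(1, 1)


def post_selected_chsh(weights):
    """CHSH value in the ensemble kept by Charlie.

    weights[i] is the acceptance probability for PAIRS[i]. Because it does
    not depend on the settings, the kept ensemble has the same mixture of
    strategy pairs for every choice of settings.
    """
    total = sum(weights)
    kept = sum(w * chsh_of_pair(sa, sb) for w, (sa, sb) in zip(weights, PAIRS))
    return kept / total


# Try many arbitrary setting-independent selection rules.
rng = random.Random(0)
worst = 0.0
for _ in range(10_000):
    weights = [rng.random() for _ in PAIRS]
    worst = max(worst, abs(post_selected_chsh(weights)))

print(worst)  # never exceeds the CHSH bound of 2
```

Each pair of strategies has a CHSH value of exactly +2 or -2, so any setting-independent selection yields a weighted average of such values, which cannot exceed 2 in magnitude. The random search is only a sanity check on that argument.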

We have some sympathy with the AAD Sceptic in thinking that it’s counterintuitive that a small difference in the location of C could make such a big difference to the causal structure of a \(\lor\hspace{-2pt}\lor \)-geometry Bell experiment. However, the difference between ED and DD certainly isn’t negligible. Sitting between them is the third case, in which C is spacelike separated from A and B. If there’s any merit in the Sceptic’s point, it might be expected to emerge from the three-way comparison between ED, DD, and this intermediate case. This brings us nicely to the actual Delft experiment, which has this intermediate geometry.

6 The Collider Loophole in the Actual Delft Experiment

Recapping the argument so far, we have considered two variants of the Delft experiment. We argued that in one (DD) but not apparently the other (ED), CL would be a significant challenge to the claim that its results provided evidence of AAD, across the \(\lor\hspace{-2pt}\lor \) geometry concerned. With these two variants of the Delft experiment as comparison cases, we now return to the actual Delft experiment.

As just noted, the actual Delft experiment is an intermediate case between ED and DD, in the sense that the C measurements are at a spacelike separation to the corresponding A and B measurements. As [1, 3] say, they ‘ensure that [the] event-ready signal is space-like separated from the random input bit generation at locations A and B’. Where does this leave it? Is it troubled by the Collider Loophole, like DD, or safe from it, like ED?

As we shall see, this question turns out to be quite subtle. It is also in one sense moot, because we now have three similar Bell tests with the ED geometry [2,3,4]. But the Delft experiment was the first Bell test to use this technique, and so for historical as well as metaphysical reasons, it is interesting to ask on which side of the line it falls. Was there really AAD ‘across the \(\lor\hspace{-2pt}\lor \)’ in Delft in 2015?

Let’s see how far we can push the case for the Collider Loophole in this case. Could the correlations in \(E_{C}\) be merely a selection artifact, as in DD? Can we still use NDA to argue that they would have been the same, even if C had not taken place? In DD we appealed to NoRA to argue that the C measurements make no difference to the A and B measurements. As a substitute we might now try to appeal to a principle of ‘no spacelike causality’, calling on the authority of special relativity in a familiar way.

There’s a very obvious objection to such an appeal in this context, but set that aside for the moment. If we could appeal to such a principle, it would again tell us that the A and B measurements would have been the same without the C measurements, and hence that the actual subensemble \(E_{C}\) would have involved the same correlations, even without C. As before, no one would claim that there would be genuine AAD from A to B without C; so this would be enough to cast doubt on the claim that the correlations in \(E_{C}\) reveal genuine AAD, even in the case in which we do have the C measurements.

Summing up, a no spacelike causality principle leaves the Delft experiment vulnerable to NDA. On the other hand, it provides some protection against CFA. Without spacelike causality, the A and B measurement choices cannot affect the result of the C measurement, which was the main source of concern about counterfactual fragility. So the news for the Delft experiment would be mixed: continuing vulnerability to NDA, but protection against CFA. CL would still be a concern.Footnote 11

The obvious objection set aside a moment ago is that, as Bell himself has shown us, QM implies that there is spacelike causality. So we can’t glibly assume its absence, in this context. Unfortunately for the project of extracting a watertight case for AAD from the Delft experiment, this point cuts both ways. If we can’t exclude the possibility that the C measurement makes a difference to the A and B measurements, we can’t exclude the possibility that the A and B measurement settings make a difference to the result of the C measurement. And that leaves us vulnerable to counterfactual fragility. We can’t assume that if \(a_n\) had been different, the instance in question would still have fallen within \(E_{C}\).

6.1 A Preferred Frame to the Rescue?

Does it help the case for AAD if we allow a preferred frame, and then revert to our previous assumption—no retrocausality, in the sense of the preferred frame?Footnote 12 This does indeed help if C is earlier than A and B in the preferred frame. Then C may make a difference to A and B, so that we can’t assume that \(E_{C}\) would have been just the same, without the C measurements. And yet A and B can’t make a difference to C, weakening the case for an appeal to counterfactual fragility. With both causal tests blocked, the Collider Loophole seems sealed, just as in ED.Footnote 13

This approach suffers from the disadvantage of needing a preferred frame, but that’s a cost that many in the field are in any case reconciled to paying. Setting that cost aside, does it provide the Delft experiment with a complete solution to the challenges of CL? We want to identify two arguments for pessimism. The first is our reason for concluding that the Delft experiment cannot entirely evade CL. The outcome of the second argument is less clear. We think it raises issues that deserve to be put on the table in this context, but we also propose a way to escape it.

6.1.1 Which is the Preferred Frame?

The first argument turns on the observation that it is going to be difficult, probably impossible, to exclude the possibility that in the actual experiment, C is in the future of A and B with respect to the preferred frame. That option inherits the problems of DD, not the safety of ED. Who is to say which of all the possible preferred frames is the ‘true’ preferred frame? Without an answer to that question, the Delft experiment can’t exclude the possibility that its actual results are a selection artifact.Footnote 14

In case the reader feels tempted to wave this concern aside, it is worth emphasising that the Delft experiment belongs to a decades-long project of closing what many have seen as tiny loopholes in the experimental case for quantum AAD. It would hardly be in the spirit of that project simply to wave aside this new concern, arising from the experimental undetectability of the assumed preferred frame.

6.1.2 Does the Argument for AAD Beg the Question?

The second argument turns on the question whether an argument for AAD based on the Delft experiment begs the question, even granting a preferred frame. To avoid NDA—i.e., the challenge that \(E_{C}\) would have been just the same without the C measurements—it was necessary to allow that the C measurements might make a difference to the A and B measurements. But in the spacelike case—i.e., in the actual Delft experiment—such an influence of C on A and B would itself be a case of AAD. This leaves the argument in a delicate position. Unless it already assumes AAD from C to A and C to B, it is unable to meet a challenge to its claim to establish AAD between A and B. This looks dangerously like a logical circle.

It might be replied that we already have evidence for AAD in the two component wings of the experiment (from C to A, and from C to B), and that we are entitled to rely on this evidence to block a challenge to the claim that the experiment demonstrates AAD from A to B. But if that’s the way the logic works, it deserves to be made explicit. Again, there’s a question as to whether it is good enough, by the lights of the project of making the case for AAD completely watertight. Imagine again our AAD Sceptic, keen to exploit any possible loophole to try to refute AAD. Such an opponent is hardly going to be convinced by a proposed reason for setting aside CL, if that proposal assumes AAD somewhere else.

In our view, a better reply is to appeal to a suggestion we considered in Sect. 3.2.3. There, we proposed that even if C doesn’t make a difference to the actual contents of \(E_{C}\), it might make a difference to the kind of counterfactuals needed to support a claim of causal influence from A to B. In the context of DD, this proposal didn’t seem to work. On the contrary, the fact that A and B might influence C gave us a reason to think that the counterfactuals needed would not hold, and that led us to CFA.

In the present context, however, a ban on AAD would prevent the A and B measurement settings from affecting C, saving the counterfactuals. This gives the Delft experiment an answer to the AAD Sceptic. If the Sceptic were right, we would be able to appeal to this different way of meeting the causality tests. This seems to avoid the charge that the inference from the Delft results to AAD is begging the question, though it is a subtle matter, and we think that the reasoning needs to be made clear. Let’s lay out the reasoning explicitly:

  1. Assume for the sake of the argument that the AAD Sceptic is right, and hence that there is no AAD anywhere in the Delft experiment (either in the wings or across the \(\lor\hspace{-2pt}\lor \)).

  2. Then the move proposed in Sect. 3.2.3 (see also fn. 11) protects the Delft argument from NDA and CFA. With these causality tests met, the observed violation of the Bell inequality shows that the AAD Sceptic and Assumption 1 are wrong.

  3. This shows that there is AAD somewhere, but this isn’t a complete ‘bridge repair’ (i.e., a complete defence of AAD across the \(\lor\hspace{-2pt}\lor \)), because the AAD might be only in the wings. However, it does protect the Delft argument against the charge that it is begging the question.

6.2 Summary: CL in the Actual Delft Experiment

Summarising our discussion, the best prospect for defending the Delft experiment against CL lies in the assumption of a preferred frame, and reliance on NoRA with respect to this frame. So long as C is earlier than A and B with respect to the preferred frame, the argument for AAD in the Delft experiment escapes CL, and passes the causal tests NDA and CFA, in the same way as in the case of ED.

However, the cost of this move—setting aside the theoretical cost of reliance on a preferred frame—is that safety from CL becomes experimentally unconfirmable. In any actual version of the Delft experiment, it would simply be unknown whether the required condition was satisfied.

In addition—echoing the AAD Sceptic’s point in Sect. 5.1 about the difference between ED and DD—we note that reliance on a preferred frame commits us to a sudden and unobservable change in the causal structure of the experiment, as the spatiotemporal location of C varies with respect to that of A and B. Again, many in the field may feel that this is not much of a cost, because they are committed to such things for other reasons. Think of the sensitivity of the causal structure of a regular \(\vee \)-shaped EPR-Bell experiment to the time-order of measurements, in any collapse model. In our view, it is worth asking whether there are models that avoid this consequence, especially given that the experimental correlations are independent of the temporal location of C.Footnote 15

7 Conclusion

The use of entanglement swapping in Bell tests introduces an additional loophole into arguments from violation of the Bell inequalities to AAD. Under conventional assumptions (i.e., excluding retrocausality), the sensitivity of such experiments to this Collider Loophole depends on the temporal relation between the entanglement-swapping measurement C and the measurements A and B. CL is a threat if C is in the future of A and B, but not if it is in the past. The Delft experiment is the intermediate case, in which the separation is spacelike. We argued that this leaves it vulnerable to CL, in the sense that it cannot confirm experimentally that it avoids it.