Foundations of Physics

Volume 44, Issue 7, pp 736–761

A Rigorous Analysis of the Clauser–Horne–Shimony–Holt Inequality Experiment When Trials Need Not be Independent



The Clauser–Horne–Shimony–Holt (CHSH) inequality is a constraint that local hidden variable theories must obey. Quantum Mechanics predicts a violation of this inequality in certain experimental settings. Treatments of this subject frequently make simplifying assumptions about the probability spaces available to a local hidden variable theory, such as assuming the state of the system is a discrete or absolutely continuous random variable, or assuming that repeated experimental trials are independent and identically distributed. In this paper, we do two things: first, show that the CHSH inequality holds even for completely general state variables in the measure-theoretic setting, and second, demonstrate how to drop the assumption of independence of subsequent trials while still being able to perform a hypothesis test that will distinguish Quantum Mechanics from local theories. The statistical strength of such a test is computed.


Quantum theory · Bell’s theorem · Measure-theoretic probability · Bell inequalities · Hypothesis test · Hidden-variable theories

Mathematics Subject Classification

81P15 81Qxx 

1 Introduction

It has been known since the 1964 publication by Bell [1] that Quantum Mechanics makes predictions incompatible with any so-called local hidden variable theory (LHVT). The conflict can be tested experimentally with an instrument that generates entangled particles and two particle detectors that can measure certain properties, such as spin or polarization. For such an experiment, the Clauser–Horne–Shimony–Holt (CHSH) inequality [2] provides a constraint on the possible outcomes under a LHVT; according to the prediction of Quantum Mechanics, the constraint will be violated. The profound physical implications of the CHSH experiment have been long discussed, and more recently, the experiment has been found to have new applications in the field of device-independent quantum key distribution [3, 4] and device-independent randomness expansion [5].

The CHSH inequality is as follows:
$$\begin{aligned} -2 \le E_{ab}(D_1 D_2) - E_{a'b}(D_1 D_2) + E_{ab'}(D_1 D_2) + E_{a'b'}(D_1 D_2) \le 2, \end{aligned}$$
where \(E_{xy}(D_1D_2)\) is the expected value of a detectable quantity \(D_1D_2\) when the measurement apparatus has setting \(xy\). The precise meaning of this will be explained in the next section, but what is immediately clear is that (1) is a probabilistic statement, asserting that under locality, a particular function of the probabilities of various experimental outcomes cannot exceed a certain quantity. According to the predictions of Quantum Mechanics, this quantity will be exceeded.

The probabilistic nature of the constraint (1) raises two issues. The first issue is: how does one build an appropriate mathematical model for the experiment? In the original proofs of the Bell [1] and CHSH [2] inequalities, it is tacitly assumed that the random variable that models the state of the system can be taken to be absolutely continuous, in the sense that it has a probability density function. Though this is a fairly reasonable assumption to make about a random variable modeling a real-world phenomenon, in the interest of full generality it would be best to not make such a claim. In some recent work on hidden variable models [6, 7], authors have worked in a more general measure-theoretic setting, though the frameworks set out in [6, 7] have not been used to prove the original CHSH inequality or model repeated trials of the experiment.

The second issue is: how does one draw a conclusion from the experimental data? As the constraints on LHVTs are probabilistic, any single execution of the experiment does not provide evidence for or against any one particular theory. (This is for the same reason that the result of a single coin toss does not tell you if a coin is biased.) The standard strategy for dealing with this is to run many trials of the experiment and compare the sample means to the predicted expectations. There is a problem, though—the sample means needn’t converge to the predicted expectations. One could expect convergence if one could assume that subsequent trials are independent and identically distributed (i.i.d.)—but plausible though this assumption seems, it need not be satisfied by a LHVT. Indeed, it is not hard to devise a mechanism for a LHVT to violate this assumption: detected particles could leave some sort of residue in the particle detectors that biases the outcome of the next incoming particle. This complication has been referred to as the “memory loophole” in [8], and it has also been addressed in [9]. (Possible interdependence between experimental trials can also cause security problems for quantum key distribution protocols, as seen in [10, 11].) [8] concludes that, even allowing for time dependence, quantum mechanical experimental data can be reliably distinguished from the data produced by any LHVT; however, the paper uses some informal justifications and assumes that the state random variable is absolutely continuous. [9] reaches the same conclusion with more rigor, but the exact bound on the statistical p-value derived from the Azuma–Hoeffding inequality [16, 17] can be improved on. (Here, the “p-value” is the probability of seeing data as or more extreme than what is observed experimentally, under a LHVT.)

In this paper, we resolve these two issues simultaneously. We present a completely general measure-theoretic model for the Bell test experiment, making no unnecessary assumptions about the random variables involved. Using this framework, we show that the CHSH inequality can still be derived. The framework can be extended in a natural way to accommodate repeated trials that need not be independent and/or identically distributed. In the extended framework, we prove that a hypothesis test can reliably distinguish between Quantum Mechanics and LHVTs, where the null hypothesis is that nature is governed by a LHVT. Interestingly, the p-value for rejecting the null hypothesis is shown to be the same as it would be if we restricted the null hypothesis to the narrower class of LHVTs that are i.i.d. That is, allowing for LHVTs with memory does not increase the probability of violating the CHSH inequality under the null hypothesis. The calculated p-value of the hypothesis test described in this paper compares favorably to other calculations of p-values in Bell-inequality experiments [12, 13].

The paper uses the formalism of measure-theoretic probability (see, e.g., [14]). The structure is as follows: in Sect. 2, we describe the mathematical model for the CHSH experiment, in Sect. 3, we derive the CHSH inequality in this setting, and in Sect. 4, we extend the framework to the multiple trial, non-i.i.d. setting and show how to set up an appropriate hypothesis test, which is then analyzed. There is also an appendix in which we provide some context for our mathematical model by comparing it to another recent model of hidden variable theories given by Brandenburger and Yanofsky in [15].

2 The Setting and the Mathematical Model

Let us describe the setup of the Bell test experiment, which is depicted in Fig. 1. A photon source, such as a low-powered laser, is pointed at an object with specific properties, such as a nonlinear crystal, which should generate an entangled pair of photons in the singlet state. Upon arrival, each of these photons is subjected to a measurement by a detector.
Fig. 1. Diagram of a Bell test experiment

As depicted in Fig. 2, detector 1 has two measurement settings and two possible outputs. The detector measures the polarization of the incoming photons; the setting is the angle at which polarization is measured. The two setting choices, \(a\) or \(a'\), are chosen to maximize the violation of the CHSH inequality. Detector 2 has a very similar scheme; the only difference is that we label its settings as \(b\) and \(b'\), to distinguish them from the settings of detector 1. The time of detection of the photons should be calibrated so the selection of the setting choice at detector 1 is spacelike separated from the detection event at detector 2, and vice-versa.
Fig. 2. Detail at detector 1

We now model a single trial of the experiment, and leave the repeated-trial scenario to Sect. 4. The following definition contains the necessary elements for the model. Standard concepts such as “probability measure” are defined in [14].

Definition 1

Let \((\Omega , {\mathcal F}, P)\) be a probability space, where \(\Omega \) is a set (the sample space), \({\mathcal F}\) is a \(\sigma \)-algebra on \(\Omega \) (\({\mathcal F}\) is a set of events), and \(P\) is a probability measure on \({\mathcal F}\). Let \(\lambda \), \(A\), \(B\), \(D_1\), \(D_2\) be the random functions
$$\begin{aligned}&\lambda : \Omega \rightarrow \Lambda , \quad \Lambda \, \mathrm{is} \,\, \mathrm{a} \,\, \mathrm{measurable} \,\, \mathrm{space},\\&A : \Omega \rightarrow \{a,a'\}, \quad B : \Omega \rightarrow \{b,b'\},\\&D_1: \Omega \rightarrow \{-1,1\}, \quad D_2: \Omega \rightarrow \{-1,1\}. \end{aligned}$$
We call \(\lambda \) the state of the system prior to measurement. \(A\) and \(B\) are detector 1’s and detector 2’s settings, respectively, and \(D_1\) and \(D_2\) are detector 1’s and detector 2’s output, respectively. We label events in \(\mathcal F\) corresponding to the outputs of \(D_1\) and \(D_2\) with the following notation:
$$\begin{aligned} +_1 = \{D_1=+1\} \quad \quad -_1 = \{D_1=-1\}\\ +_2 = \{D_2=+1\} \quad \quad -_2 = \{D_2=-1\}. \end{aligned}$$

The most general of the five random variables above is \(\lambda \). This generality is fitting, because \(\lambda \) describes the portion of the experiment that we don’t directly observe: the state of the photon pair that is theorized to be travelling towards the detectors. Quantum Mechanics has a well-defined description of \(\lambda \) and how it triggers the detectors. But we also want to be able to model any conceivable LHVT, so we define the state of the system, \(\lambda \), with complete generality.

The other four random variables are more straightforward because they model aspects of the experiment that we can directly observe. We model the detector settings as random variables, as we want the experimenter to toggle the detector settings randomly and independently of anything else going on in the experiment, with the choice of setting occurring just before the detection event.

The following three assumptions encapsulate a set of requirements that an experimenter can satisfy in order to properly test Bell’s theorem. The notation “\(X \perp \!\!\!\perp Y\)” means “\(X\) is independent of \(Y\).”

Experimental Assumption 1:
$$\begin{aligned} A \perp \!\!\!\perp B. \end{aligned}$$
The assumption above asserts that the choices of measurement settings are independent of each other. In practice, this could be achieved by toggling the measurement setting according to the output of a random number generator attached to the detector, or the output of some independent quantum process that generates randomness, or any other desired source of randomness believed to be uncorrelated with other parts of the experiment.
Experimental Assumption 2:
$$\begin{aligned} P(A =a)P(A = a')P(B =b)P(B = b') >0. \end{aligned}$$
The above assumption ensures that the experimenter sets a positive probability of choosing any given detector setting.
Experimental Assumption 3:
$$\begin{aligned} A \perp \!\!\!\perp \lambda , \quad B \perp \!\!\!\perp \lambda . \end{aligned}$$
The third and final experimental assumption captures the notion that whatever process is used to choose the detector settings should operate independently of the state of the approaching particles. Again, we are trusting that our source of randomness for the detector settings is uncorrelated with anything else in the experiment.

(4) is closely related to the “\(\lambda \)-independence” assumption that appears in Brandenburger and Yanofsky [15]. Note that, unlike [15], we don’t make a slightly stronger assumption that the joint distribution of \(A\) and \(B\) is independent from \(\lambda \), written \((A, B) \perp \!\!\!\perp \lambda \); this stronger assumption turns out to be unnecessary in our framework. This contrast is explored in the appendix.

We now have a mathematical model for the experiment. This gives us a framework to discuss a local theory and the conditions it must satisfy. A LHVT must satisfy a locality condition, in addition to the three assumptions described above. To formulate it in a concise manner, we define the random vectors
$$\begin{aligned} V_1 = (D_1, A), \quad \quad V_2 = (D_2, B). \end{aligned}$$
Then the locality assumption is that \(V_1\) and \(V_2\) are conditionally independent given \(\lambda \), which is written as follows:
Locality Assumption:
$$\begin{aligned} (V_1 \perp \!\!\!\perp V_2) |\lambda . \end{aligned}$$
So, why do we choose Eq. (5) as an expression of locality? Remember that \(\lambda \) represents what is going on between the two detectors, prior to the measurement event, that can affect the detection events. Once we condition on this knowledge, what occurs at detector 1 should be independent of what occurs at detector 2. Equation (5) says that the events at detector 1 cannot be correlated with the events at detector 2 beyond the effects of the shared history of what happened between them prior to detection, represented by \(\lambda \).
Since \(V_1\) and \(V_2\) each only have four possible outputs, (5) is essentially a statement of the conditional independence of a collection of sixteen pairs of events. For instance, one of the sixteen consequences of (5) would be
$$\begin{aligned}&P\bigg [\big (+_1\cap \{A=a'\}\big )\bigcap \big (-_2\cap \{B=b\}\big )\big |\lambda \bigg ] \nonumber \\&\quad = P\big (+_1\cap \{A=a'\}\big |\lambda \big )\cdot P\big (-_2\cap \{B=b\}\big |\lambda \big ). \end{aligned}$$
When we use (5), it will be through equivalences like the one above.

The conditional probabilities in (6) are themselves random variables, as defined in [14]. Theoretically these can be complicated constructions, but if \(\lambda \) is a discrete random variable, the dependent random variable \(P(E|\lambda )\) is also discrete, taking the value \(P(E|\{\lambda =x\})\) when \(\lambda = x\). This simplified situation has the benefit of being highly intuitive, and it is explored in the appendix. For now, we make no such simplifying assumption about \(\lambda \).
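Purely as an illustration of the discrete special case just mentioned, here is a minimal sketch of a LHVT in which \(\lambda \) takes finitely many values, so that the conditional probabilities reduce to ordinary lookup tables. The function names, the table layout, and the randomly generated numbers are illustrative assumptions, not part of the paper's framework. The sketch also assumes that (2)–(5) hold, so that each setting-conditioned correlation factorizes over the values of \(\lambda \); it then evaluates the CHSH combination from (1) for many random models.

```python
import random

SETTINGS_1 = ("a", "a'")
SETTINGS_2 = ("b", "b'")

def chsh_value(weights, p1, p2):
    """CHSH combination (1) for a finite-state local model.

    ASSUMED layout (hypothetical, for illustration only):
      weights[k] = P(lambda = k)
      p1[k][x]   = P(D1 = +1 | A = x, lambda = k)
      p2[k][y]   = P(D2 = +1 | B = y, lambda = k)
    """
    def corr(x, y):
        # Under locality and setting independence, E_xy(D1 D2) is the
        # lambda-average of the product of the two conditional means.
        return sum(w * (2.0 * p1[k][x] - 1.0) * (2.0 * p2[k][y] - 1.0)
                   for k, w in enumerate(weights))
    return corr("a", "b") - corr("a'", "b") + corr("a", "b'") + corr("a'", "b'")

def random_model(rng, n_states):
    """Draw an arbitrary finite-state local model."""
    raw = [rng.random() for _ in range(n_states)]
    total = sum(raw)
    weights = [r / total for r in raw]
    p1 = [{x: rng.random() for x in SETTINGS_1} for _ in range(n_states)]
    p2 = [{y: rng.random() for y in SETTINGS_2} for _ in range(n_states)]
    return weights, p1, p2

rng = random.Random(0)
k_values = [chsh_value(*random_model(rng, 5)) for _ in range(1000)]
max_abs_k = max(abs(k) for k in k_values)  # never exceeds 2 for such models
```

Every model drawn this way stays within the bounds in (1); the general, non-discrete case is the subject of the next section.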

We refer to a Bell experiment satisfying (2)–(5) as being governed by a LHVT. Experimental results inconsistent with assumptions (2)–(5) can be considered violations of the LHVT hypothesis, implying that one of the assumptions must not hold. We will further explore how to interpret a violation in the conclusion.

3 Deriving the CHSH Inequality

In this section, we work in the fully general setting of Sect. 2 and derive the CHSH inequality, given in (1). Thus our first task is to precisely define the expressions in (1). If we condition on the event that detector 1 is set to “\(a\)” and detector 2 is set to “\(b\)”, we can discuss the quantity
$$\begin{aligned} E_{ab}(D_1D_2):=E(D_1D_2 | A=a,B=b), \end{aligned}$$
where \(E\) denotes the expectation value. It will save space to use shorthands such as “\(a\)” or “\(ab'\)” for the events \(\{A=a\}\) or \(\{A=a\}\cap \{B=b'\}\), etc., as is done in (1).
Deriving (1) in the general setting of Sect. 2 takes some work. The following notation will be useful:
$$\begin{aligned} \mu _{X}(Y|\xi ):= {1 \over P(X)}P(Y\cap X|\xi ). \end{aligned}$$
(8) is introduced to approximate an intuitive notion of the probability of event \(Y\) that is conditioned simultaneously on the event \(X\) and the random variable \(\xi \). Using this shorthand, we can derive the following expression,
$$\begin{aligned} E_{ab}(D_1 D_2) = \mathop \int \limits _{\Omega } \big [ \mu _{ab}(D_1D_2=+1| \lambda ) - \mu _{ab}(D_1 D_2 = -1| \lambda ) \big ] dP, \end{aligned}$$
where the integral is taken over \(\Omega \) with respect to the probability measure \(P\). The justification of Eq. (9) is given by the following lemma. Note that the proof makes no use of the locality assumption (5).

Lemma 1

Let \(a\), \(b\), \(D_1\), \(D_2\) be as in Definition 1. Then, under (2) and (3), Eq. (9) holds.


By (2) and (3), \(P(a\cap b) >0\). If we let \(I_{ab}\) denote the indicator function of the event \(a \cap b\), we can write
$$\begin{aligned} E_{ab}(D_1 D_2) = {{E(I_{ab}D_1 D_2)}\over P(ab)} = {E(E(I_{ab}D_1 D_2|\lambda ))\over P(ab)} \end{aligned}$$
by the definition of conditional expectation (when we condition on events) and the law of iterated expectation. Note that the conditional expectation \(E(I_{ab}D_1 D_2|\lambda )\) is guaranteed to exist, as \(E(|I_{ab}D_1 D_2|)\) is finite.
We claim that
$$\begin{aligned} E(I_{ab}D_1 D_2|\lambda ) = E(I_{\{D_1 D_2 = +1\} \cap ab }|\lambda ) - E(I_{\{D_1 D_2 = -1\} \cap ab }|\lambda ), \quad \mathrm{a.s.} \end{aligned}$$
To prove the assertion, we must show that for all \(A \in \sigma (\lambda )\),
$$\begin{aligned} \mathop \int \limits _A E(I_{\{D_1 D_2 = +1\} \cap ab }|\lambda ) - E(I_{\{D_1 D_2 = -1\} \cap ab }|\lambda )dP = \mathop \int \limits _A I_{ab} D_1 D_2 dP. \end{aligned}$$
Indeed, by the defining property of conditional expectation applied to each term,
$$\begin{aligned}&\mathop \int \limits _A E(I_{\{D_1 D_2 = +1\} \cap ab }|\lambda ) - E(I_{\{D_1 D_2 = -1\} \cap ab }|\lambda ) dP\\&\quad = \mathop \int \limits _A I_{\{D_1 D_2 = +1\} \cap ab } - I_{\{D_1 D_2 = -1\} \cap ab } dP \\&\quad = (+1)P(\{D_1 D_2 = +1\} \cap ab \cap A) + (-1) P(\{D_1 D_2 = -1\} \cap ab \cap A) \\&\quad = \mathop \int \limits _{\Omega } (D_1 D_2) I_A I_{ab} dP = \mathop \int \limits _A I_{ab} D_1 D_2 dP, \end{aligned}$$
which proves (11). Plugging (11) into (10), we can write
$$\begin{aligned} E_{ab}(D_1 D_2) = E\bigg ({1\over {P(ab)}}\big [ E(I_{\{D_1 D_2 = +1\} \cap ab }|\lambda ) - E(I_{\{D_1 D_2 = -1\} \cap ab }|\lambda )\big ] \bigg ). \end{aligned}$$
Using the notation introduced in (8), we have
$$\begin{aligned} \mu _{ab}(D_1 D_2=+1| \lambda ) = {1\over {P(ab)}} E(I_{\{D_1 D_2=+1\} \cap ab}| \lambda ), \end{aligned}$$
so we can rewrite (12) as
$$\begin{aligned} E\bigg ( \mu _{ab}(D_1 D_2=+1| \lambda ) - \mu _{ab}(D_1 D_2=-1| \lambda )\bigg ). \end{aligned}$$
Thus, (9) holds. \(\square \)
As we work toward the CHSH inequality, it will be useful to expand expression (7), for which we introduce a shorthand for readability:
$$\begin{aligned} \mathbf{a}= \big [\mu _a(+_1|\lambda ) -\mu _a(-_1|\lambda )\big ], \quad \mathbf{b} = \big [\mu _{b}(+_2|\lambda ) -\mu _{b}(-_2|\lambda )\big ], \\ \mathbf{b'} = \big [\mu _{b'}(+_2|\lambda ) -\mu _{b'}(-_2|\lambda )\big ], \quad \mathbf{a'} = \big [\mu _{a'}(+_1|\lambda ) -\mu _{a'}(-_1|\lambda )\big ]. \end{aligned}$$
The following lemma will also be useful:

Lemma 2

Let \((\Omega , {\mathcal F}, P)\) be a probability space, let \(\mathcal G \subseteq {\mathcal F}\) be a sub-\(\sigma \)-algebra of \(\mathcal F\), and let \(\{B_i\}_{i\in I}\) be a countable indexed set of pairwise-disjoint events in \(\mathcal F\). Then,
$$\begin{aligned} P(\cup _{i\in I} B_i | \mathcal G) = \sum _{i \in I} P(B_i |\mathcal G), \quad \mathrm{almost\, surely.} \end{aligned}$$


This follows in a straightforward manner from the measure-theoretic definition of conditional probability. \(\square \)

Proposition 1

Let \(a\), \(b\), \(D_1\), \(D_2\) be as in Definition 1. Then, under (2), (3), (5),
$$\begin{aligned} E_{ab}(D_1D_2) = \mathop \int \limits _{\Omega } \mathbf{a} \mathbf{b} dP. \end{aligned}$$


Note that
$$\begin{aligned} \{D_1 D_2 = 1 \}&= ( +_1 \cap +_2 )\cup ( -_1 \cap -_2 ), \\ \{D_1 D_2 = -1 \}&= ( +_1 \cap -_2 )\cup ( -_1 \cap +_2 ). \end{aligned}$$
Lemma 2 can thus be applied to rewrite (9) as
$$\begin{aligned} \mathop \int \limits _{\Omega } \mu _{ab}(+_1\cap +_2| \lambda ) + \mu _{ab}(-_1\cap -_2|\lambda ) - \big [\mu _{ab}(+_1\cap -_2| \lambda ) + \mu _{ab}(-_1\cap +_2| \lambda )\big ] dP. \end{aligned}$$
We now appeal to the locality assumption. Applying (5), as well as (2), we can modify the terms in the integrand in the following way:
$$\begin{aligned} \mu _{ab}(+_1 \cap +_2| \lambda )&= {1\over P(ab)} P(ab\cap +_1 \cap +_2 | \lambda )\\&= {1\over P(a)P(b)} P(a\cap +_1 | \lambda ) P(b\cap +_2 |\lambda ) \\&= \mu _{a}(+_1| \lambda ) \mu _{b}(+_2|\lambda ). \end{aligned}$$
Doing the same thing for the three other terms in (14), we get
$$\begin{aligned} E_{ab}(D_1D_2)&= \mathop \int \limits _{\Omega } \mu _{a}(+_1| \lambda )\mu _b (+_2|\lambda ) + \mu _{a}(-_1| \lambda )\mu _b (-_2|\lambda ) \nonumber \\&-\,\big [ \mu _{a}(+_1| \lambda )\mu _b (-_2|\lambda ) + \mu _{a}(-_1| \lambda )\mu _b (+_2|\lambda )\big ] dP \nonumber \\&= \mathop \int \limits _{\Omega } \big [\mu _a(+_1|\lambda ) -\mu _a(-_1|\lambda )\big ]\big [\mu _b(+_2|\lambda ) -\mu _b(-_2|\lambda )\big ] dP. \end{aligned}$$
Thus, (13) holds. \(\square \)
Now, consider the following constant:
$$\begin{aligned} K^{CHSH} := E_{ab}(D_1 D_2) - E_{a'b}(D_1 D_2) + E_{ab'}(D_1 D_2) + E_{a'b'}(D_1 D_2). \end{aligned}$$
In the local setting, we can calculate bounds that \(K^{CHSH}\) must obey. These bounds—the CHSH inequality (1)—are developed in the next proposition, which requires the following lemma.

Lemma 3

Let \(X\) be an event for which \(P(X)>0\) and \(X\perp \!\!\!\perp \xi \), where \(\xi \) is a random variable. Then for any event \(Y\),
$$\begin{aligned} 0\le \mu _X(Y|\xi ) \le 1, \quad \mathrm{almost} \,\, \mathrm{surely.} \end{aligned}$$


We have
$$\begin{aligned} \mu _X(Y|\xi ) = {1\over P(X)} P(Y\cap X | \xi ) = {P(Y\cap X | \xi ) \over P(X|\xi )}, \end{aligned}$$
where \(P(X|\xi ) = P(X)\) holds because \(X\perp \!\!\!\perp \xi \). Since \(P(X|\xi )\ge P(Y\cap X|\xi )\) and \(P(X|\xi )\) is positive (almost surely), we have
$$\begin{aligned} \mu _X(Y|\xi ) = {P(X\cap Y| \xi ) \over P(X|\xi )}\le 1\quad \mathrm{a.s.} \end{aligned}$$
This proves the upper bound in (17). The lower bound holds because a conditional probability such as \(P(Y\cap X | \xi )\) is in general greater than or equal to zero almost surely. \(\square \)

Example 1

Under (3) and (4), Lemma 3 applies to expressions such as \(\mu _a(+_1|\lambda )\), \(\mu _{b'}(-_2|\lambda )\), etc.

Proposition 2

(CHSH Inequality) Let \(a\), \(b\), \(D_1\), \(D_2\) be as in Definition 1. Then, under (2), (3), (4), and (5),
$$\begin{aligned} |K^{CHSH}| \le 2. \end{aligned}$$


By Proposition 1, we have
$$\begin{aligned} K^{CHSH}&= \mathop \int \limits _{\Omega } \mathbf{a}\mathbf{b} dP - \mathop \int \limits _{\Omega } \mathbf{a'} \mathbf{b} dP + \mathop \int \limits _{\Omega } \mathbf{a}\mathbf{b'} dP + \mathop \int \limits _{\Omega } \mathbf{a'} \mathbf{b'} dP \\&= \mathop \int \limits _{\Omega } (\mathbf{a} \mathbf{b}) - (\mathbf{a'} \mathbf{b}) + (\mathbf{a} \mathbf{b'}) + (\mathbf{a'} \mathbf{b'} ) dP = \mathop \int \limits _{\Omega } (\mathbf{a}- \mathbf{a'})\mathbf{b} + (\mathbf{a}+\mathbf{a'})\mathbf{b'} dP. \end{aligned}$$
By Lemma 3, (3) and (4) tell us that \(\mathbf{a}\) and \(\mathbf{a'}\) must lie in the interval \([-1, +1]\). Then by arithmetical considerations, it follows that
$$\begin{aligned} |\mathbf{a} + \mathbf{a'}| + |\mathbf{a'} - \mathbf{a}| \le 2. \end{aligned}$$
Since \(|\mathbf{b}|,|\mathbf{b'}| \le 1\), by (19) we have
$$\begin{aligned} |K^{CHSH}|&= \left| \mathop \int \limits _{\Omega } (\mathbf{a}- \mathbf{a'})\mathbf{b} + (\mathbf{a}+\mathbf{a'})\mathbf{b'} dP \right| \\&\le \mathop \int \limits _{\Omega } |\mathbf{a}- \mathbf{a'}||\mathbf{b}| + |\mathbf{a}+\mathbf{a'}||\mathbf{b'}| dP \\&\le \mathop \int \limits _{\Omega } |\mathbf{a}+ \mathbf{a'}| + |\mathbf{a'}-\mathbf{a}| dP \le 2. \end{aligned}$$
\(\square \)
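The arithmetic step behind (19) can be made explicit; the following short derivation is our gloss rather than part of the original proof. For real numbers \(u\) and \(v\), \(|u| + |v| = \max (|u+v|, |u-v|)\). Taking \(u = \mathbf{a} + \mathbf{a'}\) and \(v = \mathbf{a'} - \mathbf{a}\), so that \(u + v = 2\mathbf{a'}\) and \(u - v = 2\mathbf{a}\), gives
$$\begin{aligned} |\mathbf{a} + \mathbf{a'}| + |\mathbf{a'} - \mathbf{a}| = \max (|2\mathbf{a'}|, |2\mathbf{a}|) = 2\max (|\mathbf{a}|, |\mathbf{a'}|) \le 2, \end{aligned}$$
where the last step uses \(|\mathbf{a}|, |\mathbf{a'}| \le 1\).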

As a consequence of Proposition 2, in any LHVT, the quantity \( K^{CHSH}\) must satisfy the simple inequality (18). On the other hand, Quantum Mechanics predicts \(K^{CHSH} = 2 \sqrt{2} > 2\). If we repeat the Bell test experiment many times and assume that the results of repeated trials are independent and identically distributed, we can calculate the \(K^{CHSH}\) quantity empirically and draw an appropriate conclusion about the theory describing the experiment.
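The gap between the two predictions can be checked numerically. The sketch below enumerates the sixteen deterministic local strategies (fixed outputs for each setting) and then evaluates \(K^{CHSH}\) for one quantum example. The correlation function \(-\cos (\theta _x - \theta _y)\) is the textbook prediction for spin measurements on a singlet pair, and both it and the four angles are assumptions chosen for illustration; they are not taken from the experiment described in this paper, where the appropriate angles depend on the polarization setup.

```python
import math
from itertools import product

def chsh(e_ab, e_apb, e_abp, e_apbp):
    """K^CHSH with the sign convention of (16): minus sign on the (a', b) term."""
    return e_ab - e_apb + e_abp + e_apbp

# Deterministic local strategies: fixed outputs d1(a), d1(a'), d2(b), d2(b') in {-1, +1}.
deterministic_k = sorted({
    chsh(xa * yb, xap * yb, xa * ybp, xap * ybp)
    for xa, xap, yb, ybp in product((-1, 1), repeat=4)
})

# ASSUMPTION: textbook spin-singlet correlation; not the paper's apparatus.
def singlet_corr(theta_x, theta_y):
    return -math.cos(theta_x - theta_y)

# ASSUMPTION: illustrative angles (radians) chosen to maximize K under the form above.
a, a_prime = 3 * math.pi / 4, math.pi / 4
b, b_prime = 0.0, -math.pi / 2

quantum_k = chsh(singlet_corr(a, b), singlet_corr(a_prime, b),
                 singlet_corr(a, b_prime), singlet_corr(a_prime, b_prime))
```

Here `deterministic_k` contains only the values \(-2\) and \(+2\), while `quantum_k` agrees with \(2\sqrt{2}\) up to floating-point error.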

However, as earlier noted, the assumptions of a LHVT do not require repeated trials to be independent and identically distributed, and so we have no reason to assert that the relative frequencies of various outcomes will converge to some underlying probability. A priori, we cannot even rule out the (pathological) possibility that each successive trial individually obeys the CHSH inequality (as required by a LHVT), but that the relative frequencies over many trials converge to the quantum values! In the next section, we address this problem.

4 A Hypothesis Test When Trials are Not Independent

If we run the experiment one time, we will randomly select one particular pair of settings for \(A\) and \(B\), and we will observe \(D_1D_2\) equal to \(+1\) or \(-1\). This one result tells us nothing about the satisfaction or violation of (18). We must run the experiment many times to discern a pattern.

Luckily, we can perform a cogent hypothesis test, even without the assumption of independent, identically distributed trials. Here is a useful analogy that will illustrate how we do this. Suppose we were to flip 10,000 different coins, and 80 % of them were to come up “heads.” Then we could reasonably conclude that at least some of the 10,000 coins were biased towards heads. The coins needn’t be identically distributed—indeed, perhaps some of the coins were fair—but it is intuitively clear that some of them must have been biased.
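The intuition in the coin analogy can be quantified with a standard tail bound. The sketch below assumes, beyond what the analogy states, that the 10,000 flips are independent and that each coin has heads probability at most 1/2; independence is assumed here only to keep the illustration simple. Under those assumptions, Hoeffding's inequality bounds the probability that the fraction of heads exceeds its expectation by at least \(t\) by \(\exp (-2nt^2)\). The function name is ours.

```python
import math

def hoeffding_log10_bound(n, excess):
    """Return log10 of exp(-2 * n * excess**2).

    This is Hoeffding's upper bound on P(sample mean - expected mean >= excess)
    for n independent variables taking values in [0, 1].
    """
    return -2.0 * n * excess ** 2 / math.log(10.0)

# 80 % heads among 10,000 coins, none of which favours heads (p_i <= 1/2).
log10_bound = hoeffding_log10_bound(10_000, 0.80 - 0.50)
```

The bound is reported on a log scale because \(\exp (-1800)\) underflows to zero in double precision; `log10_bound` is roughly \(-782\), so such an outcome is effectively impossible unless some coins are biased.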

Analogously, each trial of the Bell test is like a coin flip, resulting in the product \(D_1D_2\) being equal to \(+1\) or \(-1\). In the previous section, we showed that the assumption of a LHVT puts certain constraints on the probabilities of getting \(+1\) or \(-1\). If the universe is governed by a LHVT, then the constraint must be satisfied on every trial. On the other hand, if Quantum Mechanics is obeyed, the constraint is violated on every trial. Then, thinking of the analogy, the locality assumption is like the assumption that every one of the 10,000 coins is fair, whereas agreement with Quantum Mechanics will predict getting 80 % heads. The Bell test is of course a little more complicated than coin tossing, but the analogy is useful to keep in mind as we design the hypothesis test.

To represent repeated trials, we must extend the framework of Sect. 2. Let us define a sequence of random vectors:
$$\begin{aligned} \{D_{1i}, D_{2i}, A_i, B_i, \lambda _i\}_{i\in \mathbb N ^+} \end{aligned}$$
For each \(i\), we take the above to be as defined in Definition 1, satisfying conditions (2) and (4), and a strengthened version of (3). That is, we assume:
Experimental Assumption 1:
$$\begin{aligned} \forall i, \quad A_i \perp \!\!\!\perp B_i. \end{aligned}$$
Experimental Assumption 2*:
$$\begin{aligned} \forall i, \quad P(A_i = a) = P(B_i = b) = 1/2. \end{aligned}$$

Remark 1

(22) can be satisfied by appropriate calibration of the experimental apparatus. Earlier, we assumed only that these probabilities were positive; to prove an analogue of the CHSH inequality that holds over repeated trials, it is useful to assume that all the setting probabilities are calibrated to 1/2.

Experimental Assumption 3:
$$\begin{aligned} A_i \perp \!\!\!\perp \lambda _i, \quad B_i \perp \!\!\!\perp \lambda _i. \end{aligned}$$
An additional point about the \(\lambda _i\) needs to be made. Since \(\lambda _i\) models the state of the system at the \(i\)th trial, the previous \(i-1\) trials have already taken place. Hence, the outcomes of previous trials are in the “history”, and can contribute to or influence the present state of the system. Mathematically, this is modeled by assuming that the results of previous trials are events in \(\sigma (\lambda _i)\). This yields a filtration—i.e., a sequence of nested \(\sigma \)-algebras:
$$\begin{aligned} \mathrm{For } \,i<j, \quad \sigma (\lambda _i) \subseteq \sigma (\lambda _j). \end{aligned}$$
The filtration is a standard mathematical tool for modeling a time-indexed stochastic process. The above equation is not used in our argument, but we will need to use the fact that the outcomes of previous trials are events in \(\sigma (\lambda _i)\). The following assumption formalizes this.

Time Sequentiality:

For any positive integer \(n \ge 2\), let \(I\) be a subset of \(\{1, 2, \ldots , n-1\}\) whose cardinality we denote with the letter \(m\). Let \(\varvec{v}_1\), \(\varvec{v}_2\) be elements of \(\{-1, +1\}^m\), let \(\varvec{w}_a\) be an element of \(\{a, a'\}^m\), and let \(\varvec{w}_b\) be an element of \(\{b, b'\}^m\). Then the following event is in \(\sigma ( \lambda _n)\):
$$\begin{aligned} \bigcap _{i\in I}\big [ \{D_{1i} = \varvec{v}_{1i} \} \cap \{D_{2i} = \varvec{v}_{2i} \} \cap \{A_{i} = \varvec{w}_{ai} \} \cap \{B_{i} = \varvec{w}_{bi} \} \big ]. \end{aligned}$$
The astute reader will notice that this is mathematically equivalent to saying that all the single events such as \(\{D_{1i} = \varvec{v}_{1i} \}\) or \(\{A_{j} = \varvec{w}_{aj} \}\), etc. are individually in \(\sigma (\lambda _n)\); (24) is written to emphasize simultaneity of the four events with the same \(i\)-index. The significance of asserting that (24) is in \(\sigma (\lambda _n )\) is to encode the notion that \(\lambda _n\) can potentially depend on the outcomes of previous trials. Of course, this assumption doesn’t require that \(\lambda _n\) definitely does have some relation to the outcome of previous trials; \(\lambda _n\) could still be independent of this information.

For the final step, we establish a locality assumption corresponding to (5):

Locality Assumption: Let \(V_{i1} = (D_{1i}, A_i)\) and \(V_{i2} = (D_{2i}, B_i)\). Then
$$\begin{aligned} (V_{i1} \perp \!\!\!\perp V_{i2}) |\lambda _i. \end{aligned}$$
This completes the set of assumptions. Now, to formulate the hypothesis test, it will be convenient to define a random variable \(C_i\) as a function of the random variables in (20). So, let
$$\begin{aligned} C_i = {\left\{ \begin{array}{ll} D_{1i}D_{2i}, &{} \mathrm{if } \, (A_i, B_i) \ne (a', b),\\ -D_{1i}D_{2i}, &{} \mathrm{if } \, (A_i, B_i) = (a', b). \end{array}\right. } \end{aligned}$$
\(C_i\) distills the result of the \(i\)th trial into a single, two-output random variable. As we will see in the next proposition, the CHSH inequality applies to \(C_i\) to cap the probability that \(C_i = +1\) at 75 %, if we make all of the experimental assumptions plus locality.
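As a small illustration, the definition of \(C_i\) can be written as a function of the four observed quantities of a single trial. The function name and the string labels for the settings are our own conventions, chosen only for this sketch.

```python
def c_value(d1, d2, setting_a, setting_b):
    """Return C_i for one trial.

    d1, d2 are the detector outputs in {-1, +1}; setting_a is "a" or "a'",
    setting_b is "b" or "b'" (hypothetical string labels).
    The sign of D1*D2 is flipped only for the setting pair (a', b).
    """
    outcome_product = d1 * d2
    if (setting_a, setting_b) == ("a'", "b"):
        return -outcome_product
    return outcome_product

# Matching outputs count as +1, except under (a', b), where they count as -1.
examples = [
    c_value(+1, +1, "a", "b"),    # +1
    c_value(+1, +1, "a'", "b"),   # -1
    c_value(+1, -1, "a'", "b"),   # +1
    c_value(-1, +1, "a", "b'"),   # -1
]
```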

Proposition 3

Under assumptions (21), (22), (23), and the locality assumption (25), we have \(P(C_i = +1)\le {3\over 4}\), or equivalently,
$$\begin{aligned} E(C_i) \le 1/2. \end{aligned}$$


By the Law of Iterated Expectations,
$$\begin{aligned} E(C_i)= E\big [E(C_i|(A_i,B_i))\big ]. \end{aligned}$$
Notice that \(E(C_i|(A_i,B_i))\) is a discrete random variable with four outputs, corresponding to the four outputs of \((A_i, B_i)\). Applying (22) and (21), we have
$$\begin{aligned} E\big [E(C_i|(A_i,B_i))\big ]&= {1\over 4} \bigg ( E[C_i|(A_i,B_i)= (a, b)] + E[C_i|(A_i,B_i)= (a', b)] \\&+\, E[C_i|(A_i,B_i)= (a, b')] + E[C_i|(A_i,B_i)= (a', b')]\bigg ) \\&= {1\over 4} \bigg ( E_{ab}(D_{1i}D_{2i}) - E_{a'b}(D_{1i}D_{2i})\\&+\, E_{ab'}(D_{1i}D_{2i}) + E_{a'b'}(D_{1i}D_{2i})\bigg ). \end{aligned}$$
Noting the similarity to (16), we obtain the following,
$$\begin{aligned} E(C_i) = {1\over 4}K^{CHSH}_i \end{aligned}$$
where we take \(K^{CHSH}_i\) to be as defined in (16), after replacing the variables with the \(i\)-indexed versions given in (20). Assumptions (21), (22), (23), and (25) imply the assumptions of Proposition 2 when applied to \(K^{CHSH}_i\), so \(|K^{CHSH}_i| \le 2\) and the proposition holds. \(\square \)
For each \(i\), \(C_i\) is a Bernoulli trial, taking outputs in the set \(\{+1, -1\}\), so let us define
$$\begin{aligned} p_i := P(C_i = +1). \end{aligned}$$
It is straightforward to compute
$$\begin{aligned} E(C_i) = 2p_i -1, \quad \mathrm{Var}(C_i) = 4p_i(1-p_i). \end{aligned}$$
Under a LHVT, (26) and (28) imply that \(p_i\) must be at most 75%. On the other hand, Quantum Mechanics predicts that \(E(C_i) = {\sqrt{2} \over 2}\), which yields a \(p_i\) of roughly 85.4%. This gap will allow us to discern a difference over many trials.
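For readers who want the intermediate steps, the mean, the variance, and the quantum value of \(p_i\) can be worked out as follows. This is a sketch using only \(P(C_i = +1) = p_i\) and the fact that \(C_i^2 = 1\):
$$\begin{aligned} E(C_i)&= (+1)\,p_i + (-1)(1-p_i) = 2p_i - 1,\\ \mathrm{Var}(C_i)&= E(C_i^2) - E(C_i)^2 = 1 - (2p_i-1)^2 = 4p_i(1-p_i),\\ E(C_i) = {\sqrt{2}\over 2}\;&\Longrightarrow \; p_i = {1\over 2}\left( 1 + {\sqrt{2}\over 2}\right) = {2+\sqrt{2}\over 4} \approx 0.8536. \end{aligned}$$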
We can now formulate the hypothesis test in mathematical terms:
$$\begin{aligned}&H_0:\forall i, p_i \le {3\over 4} \quad \quad \quad \quad \quad \quad \mathrm{(LHVT;} (21)-(25)\, \mathrm{satisfied)}\\&H_A: \forall i, p_i = {1 + \sqrt{2} \over 2\sqrt{2}} = .854\ldots \quad \quad \quad \quad \quad \quad \mathrm{(Quantum)} \end{aligned}$$
Over \(n\) trials, the natural choice for a sample statistic is \(\overline{C_n}\), defined as follows:
$$\begin{aligned} \overline{C_n} = {\sum _{i=1}^n C_i\over n}. \end{aligned}$$
Under the assumption of \(H_0\), we therefore expect the sample statistic \(\overline{C_n}\) to satisfy
$$\begin{aligned} E(\overline{C_n}) \le 1/2. \end{aligned}$$
We will reject the null hypothesis in favor of the alternative hypothesis if \(\overline{C_n} > z\), where \(z\) is a cut-point slightly exceeding \(1/2\).
Let \(p_n(\_)\) denote a joint probability mass function for the first \(n\) outcomes \(C_1,\ldots ,C_n\), and let \(\Theta _0\) be the collection of \(p_n(\_)\) that satisfy the assumptions (21)–(25). \(\Theta _0\) thus denotes the collection of allowable distributions under the null hypothesis. Then the significance level of the hypothesis test (the probability of Type I error) is defined to be
$$\begin{aligned} \alpha = \sup _{p_n(\_)\in \Theta _0}P[\overline{C_n} > z | p_n(\_)]. \end{aligned}$$
Calculating \(\alpha \) is somewhat involved. This is because the null hypothesis does not assert that the various \(C_i\) are i.i.d., so equation (29) alone does not provide us with an asymptotic distribution of \(\overline{C_n}\). Without an i.i.d. assumption, we cannot a priori rule out degenerate cases such as
$$\begin{aligned} C_1=C_2 = \cdots = C_{n-1} = C_n \end{aligned}$$
(total dependence), for which we would have \(\alpha = p_i\), independent of \(n\)!

The following lemma rules out possibilities like (31), and it will allow us to demonstrate that \(\alpha \) decreases as \(n\) increases.

Lemma 4

Let \(\varvec{v}\) be any vector in \( \{-1, +1\}^{i-1}\) for which \(P\big ((C_1,\ldots ,C_{i-1})=\varvec{v}\big )\) is positive. Then, under the null hypothesis (which subsumes assumptions (21)–(25)), we have
$$\begin{aligned} P\big (C_i=+1 \big | (C_1,\ldots ,C_{i-1})=\varvec{v}\big ) \le {3\over 4}. \end{aligned}$$


Proof

Let \(\mathcal C\) denote the event \((C_1,\ldots ,C_{i-1})=\varvec{v}\). Let \(C_i\) be a shorthand for the event \(C_i = +1\). Then we have
$$\begin{aligned} P(C_i|\mathcal C)&= {P(C_i\cap \mathcal C) \over P(\mathcal C)} = {1\over P(\mathcal C)} E(I_{C_i \cap \mathcal C}) = {1\over P(\mathcal C)} E\big [ E(I_{C_i \cap \mathcal C}|\lambda _i)\big ] \\&= {1\over P(\mathcal C)} \mathop \int \limits _\Omega E(I_{C_i \cap \mathcal C}|\lambda _i) dP = {1\over P(\mathcal C)} \mathop \int \limits _\Omega P(C_i \cap \mathcal C|\lambda _i) dP. \end{aligned}$$
In the integral above, we note that \(\mathcal C\) is in \(\sigma (\lambda _i )\) by the time-sequential nature of the experiment, encapsulated in Eq. (24). This implies that
$$\begin{aligned} \mathop \int \limits _\Omega P(C_i \cap \mathcal C|\lambda _i) dP = \mathop \int \limits _\Omega P(C_i |\lambda _i)I_{\mathcal C} dP, \end{aligned}$$
which is a consequence of Theorem 9.1.3 in [14]. So the integral becomes
$$\begin{aligned} \mathop \int \limits _{\mathcal C} P(C_i|\lambda _i) dP. \end{aligned}$$
Using an \(i\)-indexed version of the “\(+_1\)” notation introduced in Definition 1, we apply Lemma 2 to decompose the integrand into the eight constituent sub-events of \(C_i\), obtaining
$$\begin{aligned}&\mathop \int \limits _{\mathcal C} \bigg [ P(+_{1i} \cap +_{2i} \cap a_i \cap b_i |\lambda _i) + P(-_{1i} \cap -_{2i} \cap a_i \cap b_i |\lambda _i) \nonumber \\&\quad +\,P(+_{1i} \cap +_{2i} \cap a_i \cap b'_i |\lambda _i) + P(-_{1i} \cap -_{2i} \cap a_i \cap b'_i |\lambda _i) \nonumber \\&\quad +\, P(+_{1i} \cap -_{2i} \cap a'_i \cap b_i |\lambda _i) + P(-_{1i} \cap +_{2i} \cap a'_i \cap b_i |\lambda _i) \nonumber \\&\quad +\,P(+_{1i} \cap +_{2i} \cap a'_i \cap b'_i |\lambda _i) + P(-_{1i} \cap -_{2i} \cap a'_i \cap b'_i |\lambda _i)\bigg ] dP. \end{aligned}$$
We apply (25) to the first term of (33) to get
$$\begin{aligned} P(+_{1i} \cap +_{2i} \cap a_i \cap b_i |\lambda _i) = P(+_{1i}\cap a_i |\lambda _i) P(+_{2i} \cap b_i |\lambda _i), \end{aligned}$$
and multiplying the right-hand side above by \(P(a_i \cap b_i) / P(a_i \cap b_i)\) yields, via (21) and (22),
$$\begin{aligned} P(+_{1i} \cap +_{2i} \cap a_i \cap b_i |\lambda _i)&= P(a_i \cap b_i)\big [ \mu _{a_i}(+_{1i}|\lambda _i) \mu _{b_i}(+_{2i}|\lambda _i)\big ] \\&= {1\over 4}\big [ \mu _{a_i}(+_{1i}|\lambda _i) \mu _{b_i}(+_{2i}|\lambda _i)\big ]. \end{aligned}$$
The other seven terms simplify the same way, so (33) becomes
$$\begin{aligned}&{1\over 4} \mathop \int \limits _{\mathcal C} \mu _{a_i}(+_{1i}|\lambda _i) \mu _{b_i}(+_{2i}|\lambda _i) + \mu _{a_i}(-_{1i}|\lambda _i) \mu _{b_i}(-_{2i}|\lambda _i) \nonumber \\&\quad +\,\mu _{a_i}(+_{1i}|\lambda _i) \mu _{b'_i}(+_{2i}|\lambda _i) + \mu _{a_i}(-_{1i}|\lambda _i) \mu _{b'_i}(-_{2i}|\lambda _i) \nonumber \\&\quad +\,\mu _{a'_i}(+_{1i}|\lambda _i) \mu _{b_i}(-_{2i}|\lambda _i) + \mu _{a'_i}(-_{1i}|\lambda _i) \mu _{b_i}(+_{2i}|\lambda _i) \nonumber \\&\quad +\,\mu _{a'_i}(+_{1i}|\lambda _i) \mu _{b'_i}(+_{2i}|\lambda _i) + \mu _{a'_i}(-_{1i}|\lambda _i) \mu _{b'_i}(-_{2i}|\lambda _i) dP. \end{aligned}$$
If we define
$$\begin{aligned} t = \mu _{a_i}(+_{1i}|\lambda _i) \quad s = \mu _{a'_i}(+_{1i}|\lambda _i) \quad u = \mu _{b_i}(+_{2i}|\lambda _i) \quad v = \mu _{b'_i}(+_{2i}|\lambda _i), \end{aligned}$$
we can factor the integrand in (34) and again apply Lemma 2 to obtain
$$\begin{aligned}&{1\over 4}\mathop \int \limits _{\mathcal C} t[u+v]+(1-t)[(1-u)+(1-v)]\nonumber \\&\quad +\,s[v+(1-u)]+(1-s)[u+(1-v)] dP. \end{aligned}$$
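By Lemma 3, which applies by (22) and (23), we have \(s\), \(t\), \(u\), and \(v\) in \([0,1]\). With this constraint, a case analysis shows that the integrand in (35) is bounded above by 3: the integrand is affine in each of the four variables separately, so it suffices to check the sixteen corners of \([0,1]^4\). Returning to the original expression, we now have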
$$\begin{aligned} P(C_i|\mathcal C) \le {1\over P(\mathcal C)}\cdot {1\over 4}\mathop \int \limits _{\mathcal C} 3 dP = {3\over 4}. \end{aligned}$$
Hence, the claim is true. \(\square \)
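The case analysis behind the bound of 3 can be checked mechanically. The sketch below relies on the observation that the integrand in (35) is affine in each of \(t\), \(s\), \(u\), \(v\) separately, so its maximum over \([0,1]^4\) is attained at a corner; it simply evaluates the sixteen corners. The function names are hypothetical.

```python
from itertools import product


def integrand(t, s, u, v):
    """Integrand of (35), without the leading factor of 1/4."""
    return (
        t * (u + v)
        + (1 - t) * ((1 - u) + (1 - v))
        + s * (v + (1 - u))
        + (1 - s) * (u + (1 - v))
    )


def max_over_corners():
    """Maximum of the integrand over the 16 corners of the unit hypercube."""
    return max(integrand(*corner) for corner in product((0, 1), repeat=4))


if __name__ == "__main__":
    print(max_over_corners())
```

For instance, the corner \(t=u=v=1\) gives \(2 + s + (1-s) = 3\) for either value of \(s\), so the bound is attained.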

Lemma 4 allows us to formulate an upper-limit distribution for \(\overline{C_n}\), as shown in the following proposition. The result shows us that over many repetitions of the experiment, \(C_i\) cannot do any better at accumulating “\(+1\)” outcomes than an independent, identically distributed process that has a \({3\over 4}\) chance of success each time (i.e., a Binomial random variable). In light of Lemma 4, this may seem intuitive, but the proof does take some effort.

Proposition 4

For a fixed positive integer \(n\), let \(B_n\) be the Binomial random variable corresponding to \(n\) trials with probability of success \(p_B = {3\over 4}\). Then, under the assumptions of Lemma 4, for a fixed \(k\in \{0,\ldots ,n\}\), and for \(i\) ranging between \(1\) and \(n\),
$$\begin{aligned} P(\mathrm{at} \,\, \mathrm{least } \,\, k \,\, \mathrm{of} \,\, \mathrm{the } \,\, C_i \,\,\mathrm{equal } +1) \le P(B_n\ge k). \end{aligned}$$


Proof

To show this holds for any fixed positive integer \(n\), we use mathematical induction.

Case 1: \(n=1\).

There are two possibilities for \(k\): 0 and 1. For \(k=1\),
$$\begin{aligned} P(\mathrm{at} \,\, \mathrm{least } \,\, k \,\, \mathrm{of} \,\, \mathrm{the } \,\, C_i \,\, \mathrm{equal } +1) = P(C_1 = +1) \le {3\over 4} = P(B_1\ge 1), \end{aligned}$$
and for \(k=0\),
$$\begin{aligned} P(\mathrm{at} \,\, \mathrm{least } \,\, k \,\, \mathrm{of} \,\, \mathrm{the } \,\, C_i \,\, \mathrm{equal } +1) = 1 = P(B_1\ge 0). \end{aligned}$$
Case 2: Assume the claim is true for \(n\), and derive that it is true for \(n+1\).

Now, \(k\) can range from \(0\) to \(n+1\). First, let us prove it for \(k\) between \(1\) and \(n\), and later we will prove the boundary cases of \(k=0\) and \(k=n+1\).

Introduce a shorthand,
$$\begin{aligned} P_{n, k} (C)&:= P(\mathrm{for }\, 1\le i\le n \,\, \mathrm{at} \,\, \mathrm{least } \,\, k \,\, \mathrm{of} \,\, \mathrm{the } \,\, C_i \,\, \mathrm{equal } +1),\\ P_{n, k} (B)&:= P(B_n\ge k), \end{aligned}$$
so what we are trying to prove can now be written as \(P_{n+1,k}(C) \le P_{n+1,k}(B)\). By conditioning,
$$\begin{aligned} P_{n+1,k}(C) = P_{n,k}(C) + p_{C_{n+1}}\cdot \big [P_{n,k-1}(C) -P_{n,k}(C)\big ], \end{aligned}$$
where we note that \(\big [P_{n,k-1}(C) -P_{n,k}(C)\big ]\) is the probability that we have exactly \(k-1\) successes after \(n\) trials, and \(p_{C_{n+1}}\) denotes the probability that \(C_{n+1}=+1\), given exactly \(k-1\) successes after \(n\) trials. As we are temporarily omitting the possibility that \(k=n+1\) or \(k=0\), it follows that \(P_{n,k}(C)\) and \(P_{n,k-1}(C)\) are well-defined and included in the scope of the inductive hypothesis.
Let \(S\) be the subset of \(\{-1, +1\}^n\) consisting of the vectors \(\varvec{v}\) for which exactly \(k-1\) of the entries are \(+1\) and for which \(P\big [ (C_1,\ldots ,C_n) = \varvec{v} \big ] >0\). We have
$$\begin{aligned}&p_{C_{n+1}} \big [P_{n,k-1}(C) -P_{n,k}(C)\big ] \\&\quad =\sum _{\varvec{v} \in S} P\big [C_{n+1}= +1\big |(C_1, \ldots , C_n)= \varvec{v}\big ]P\big [(C_1, \ldots , C_n)= \varvec{v}\big ] \\&\quad \le \sum _{\varvec{v} \in S} {3\over 4} P\big [(C_1, \ldots , C_n)= \varvec{v}\big ] \\&\quad = {3\over 4} \sum _{\varvec{v} \in S}P\big [(C_1, \ldots , C_n)= \varvec{v}\big ] \\&\quad = {3\over 4} \big [P_{n,k-1}(C) -P_{n,k}(C)\big ], \end{aligned}$$
where the inequality above follows by Lemma 4. From this, (37) can be re-written as
$$\begin{aligned} P_{n+1,k}(C)\le {1\over 4} P_{n,k}(C) + {3\over 4} P_{n,k-1}(C). \end{aligned}$$
By the inductive hypothesis, \(P_{n,k}(C)\le P_{n,k}(B)\) and \(P_{n,k-1}(C)\le P_{n,k-1}(B)\), and we have
$$\begin{aligned} {1\over 4} P_{n,k}(B) + {3\over 4}P_{n,k-1}(B)&= P_{n,k}(B) + {3\over 4} \big [P_{n,k-1}(B) -P_{n,k}(B)\big ] \\&= P_{n+1,k}(B). \end{aligned}$$
Hence, \(P_{n+1,k}(C)\le P_{n+1,k}(B)\).
This leaves only the boundary cases unproven. For \(k=0\), we clearly have
$$\begin{aligned} P_{n+1,k}(C) = 1 = P_{n+1, k} (B), \end{aligned}$$
so the inequality holds easily. For \(k=n+1\), we have
$$\begin{aligned} P_{n+1, k}(C) = P_{n,k-1}(C)\cdot P(C_{n+1} = +1 | C_i = +1\, \mathrm{for }\, i = 1,\ldots , n). \end{aligned}$$
As \(P_{n,k-1}(C)\le P_{n,k-1}(B)\) by the inductive hypothesis, and as
$$\begin{aligned} P(C_{n+1} = +1 | C_i = +1\, \mathrm{for }\, i = 1,\ldots , n)\le {3\over 4} \end{aligned}$$
by Lemma 4, we have
$$\begin{aligned} P_{n+1,k}(C) \le {3\over 4} P_{n,k-1}(B) = P_{n+1,k}(B). \end{aligned}$$
\(\square \)
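To make Proposition 4 concrete, the following sketch compares a history-dependent process against the Binomial benchmark. The rule `cond_prob` is an invented example of a dependent process whose conditional success probability never exceeds \({3\over 4}\), which is the only property Lemma 4 guarantees. Exact tail probabilities are obtained by enumerating all \(2^n\) outcome sequences, so this is only practical for small \(n\).

```python
from fractions import Fraction
from itertools import product
from math import comb

CAP = Fraction(3, 4)


def cond_prob(history):
    """Hypothetical rule for P(C_i = +1 | history); it never exceeds 3/4.

    It uses the maximum 3/4 at the start or after a -1, and drops to 1/2 after a +1.
    """
    if not history or history[-1] == -1:
        return CAP
    return Fraction(1, 2)


def tail_probs_dependent(n):
    """Exact P(at least k of C_1..C_n equal +1) for k = 0..n under cond_prob."""
    count_dist = [Fraction(0)] * (n + 1)
    for outcome in product((+1, -1), repeat=n):
        prob = Fraction(1)
        for i, c in enumerate(outcome):
            p = cond_prob(outcome[:i])
            prob *= p if c == +1 else 1 - p
        count_dist[outcome.count(+1)] += prob
    return [sum(count_dist[k:]) for k in range(n + 1)]


def tail_probs_binomial(n, p=CAP):
    """Exact P(B_n >= k) for k = 0..n, with B_n ~ Binomial(n, p)."""
    pmf = [comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]
    return [sum(pmf[k:]) for k in range(n + 1)]


if __name__ == "__main__":
    n = 8
    for k, (dep, binom) in enumerate(zip(tail_probs_dependent(n), tail_probs_binomial(n))):
        print(k, float(dep), float(binom), dep <= binom)
```

Any other rule could be substituted for `cond_prob`, provided it stays at or below \({3\over 4}\) for every history; the comparison in the last column should then hold for every \(k\).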

So, under the null hypothesis, the probability of getting at least \(k\) “\(C_i=+1\)” results over the course of \(n\) trials is bounded above by the probability of getting at least \(k\) “successes” over the course of \(n\) Bernoulli trials with probability of success \({3\over 4}\). The bound is sharp: the i.i.d. case with \(p_i = {3\over 4}\) is allowed (just not implied) by assumptions (21)–(25). Note that this result directly pertains to the behavior of \(\overline{C_n}\), as the event “\(\overline{C_n}>z\)” is equivalent to the event “at least \(k\) of the \(C_i\) equal \(+1\)”, where \(k\) is an integer determined by the particular value of \(z\).

With these results, we have
$$\begin{aligned} \alpha = \sup _{p_n(\_)\in \Theta _0} P\bigg (\overline{C_n}>z \bigg | p_n(\_) \bigg ) = P \bigg (\overline{C_n}>z\bigg | C_i \,\mathrm{i.i.d. and }\,\,\forall i, p_i = {3\over 4}\bigg ). \end{aligned}$$
As \(\alpha \) is bounded by the i.i.d. case, which is achieved at the boundary of the null parameter space, we can now calculate it.

Corollary 1

If \(B_{n, \frac{3}{4}}\) is a Binomial random variable of \(n\) trials with probability of success \(\frac{3}{4}\), then
$$\begin{aligned} \alpha = P\left( B_{n,\frac{3}{4}}> \frac{n}{2}(z+1)\right) . \end{aligned}$$
For various particular choices of \(n\) and \(z\), it may be accurate to estimate \(\alpha \) using the asymptotic Normal distribution, especially for large choices of \(n\). Then the approximation would be \(\alpha \approx 1 - \Phi \left( \frac{2z -1}{\sqrt{3}/ \sqrt{n}}\right) \), where \(\Phi (x)\) is the cumulative distribution function of the Standard Normal distribution. However, care should be used, as the Central Limit Theorem only states that
$$\begin{aligned} \frac{\overline{C_n}-\frac{1}{2}}{\sqrt{\frac{3}{4}}/\sqrt{n}}\sim N(0,1), \end{aligned}$$
which does not directly apply to (40) for fixed choices of \(z\) as \(n\rightarrow \infty \). It is safest to use the Binomial cumulative distribution function to calculate \(\alpha \) exactly.
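The two ways of computing \(\alpha \) can be compared numerically. The sketch below evaluates the exact Binomial tail from Corollary 1 alongside the Normal approximation quoted above; the function names are hypothetical, and the term-by-term floating-point sum is only intended for moderate \(n\) (roughly up to a thousand, beyond which the binomial coefficients no longer convert to floats).

```python
from math import comb, erf, floor, sqrt


def binom_upper_tail(n, p, threshold):
    """P(B > threshold) for B ~ Binomial(n, p), summed term by term."""
    k_min = max(floor(threshold) + 1, 0)  # smallest integer strictly above the threshold
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def alpha_exact(n, z):
    """Significance level as in Corollary 1: P(B_{n,3/4} > n(z+1)/2)."""
    return binom_upper_tail(n, 0.75, n * (z + 1) / 2)


def std_normal_cdf(x):
    """Standard Normal cumulative distribution function, via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def alpha_normal(n, z):
    """Normal approximation: 1 - Phi((2z - 1) / (sqrt(3) / sqrt(n)))."""
    return 1 - std_normal_cdf((2 * z - 1) * sqrt(n) / sqrt(3))


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, alpha_exact(n, 0.65), alpha_normal(n, 0.65))
```

The cut-point \(z = 0.65\) in the usage loop is an arbitrary illustrative choice. When \(\frac{n}{2}(z+1)\) lands exactly on an integer, floating-point rounding can shift the threshold by one count, which is one more reason to prefer cut-points that avoid such ties.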
We can also calculate the power of the test, as the alternative hypothesis \(H_A\) specifies the distribution of the \(C_i\) exactly. The probabilities \(p_i\) are all equal to \({\sim }.854\), and the quantum mechanical description of the experiment asserts that successive trials are independent (as is intuitive). The power is \(1 - \beta \), where \(\beta \) is defined as
$$\begin{aligned} \beta = P(\overline{C_n} \le z | H_A). \end{aligned}$$
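From (28), we calculate \(E(C_i) = {\sqrt{2} \over 2}\) and \(\mathrm{Var}(C_i) = {1\over 2}\) under \(H_A\). Since \(\overline{C_n}\le z\) exactly when at most \(\frac{n}{2}(z+1)\) of the \(C_i\) equal \(+1\), we have
$$\begin{aligned} \beta =P\left( B_{n, .854\ldots }\le \frac{n}{2}(z+1)\right) . \end{aligned}$$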
This can be calculated exactly, or estimated asymptotically with the Normal distribution, subject to the same caveats discussed in the previous paragraph. The Normal approximation for \(\beta \) is \(\beta \approx \Phi \left( \frac{2z - \sqrt{2}}{\sqrt{2}/ \sqrt{n}}\right) \).
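As a rough illustration of the power calculation, the sketch below computes \(\beta \) exactly from the Binomial distribution, following the definition \(\beta = P(\overline{C_n} \le z | H_A)\), so that the event counted is "at most \(\frac{n}{2}(z+1)\) successes". The function names are hypothetical, and the floating-point sum is meant for moderate \(n\) only.

```python
from math import comb, floor, sqrt

P_QUANTUM = (2 + sqrt(2)) / 4  # p_i under H_A, approximately 0.854


def beta_exact(n, z, p=P_QUANTUM):
    """Type II error probability: P(B_{n,p} <= n(z+1)/2)."""
    k_max = min(floor(n * (z + 1) / 2), n)  # largest success count with mean of C_i <= z
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, k_max + 1))


def power_exact(n, z, p=P_QUANTUM):
    """Power of the test, 1 - beta."""
    return 1 - beta_exact(n, z, p)


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, power_exact(n, 0.65))
```

The cut-point \(z = 0.65\) is again an arbitrary choice for illustration; moving \(z\) toward \(1/2\) raises the power at the cost of a larger \(\alpha \).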
To obtain statistical significance, the needed number of trials is not especially high. If the quantum prediction is correct, then \(\overline{C_n}\) should tend to \({\sqrt{2}\over 2}\). Hence, if after \(n\) trials, \(\overline{C_n}\) is about \({\sqrt{2}\over 2}\), we can calculate a \(p\)-value, using \(z={\sqrt{2}\over 2}\) in (40):
$$\begin{aligned} \mathrm{{ p}-value } = P\left( B_{n, \frac{3}{4}}>\frac{n(\sqrt{2} + 2)}{4}\right) . \end{aligned}$$
For example, to get a \(p\)-value below \(.05\), it would suffice to have \(n\ge 50\) trials.
The \(p\)-value calculated in (44) is comparable to the figure claimed by [8], and is not larger than the relevant \(p\)-values calculated numerically in [13]. The martingale-based analysis of [9] would result in a larger \(p\)-value, as discussed in [13]; this is due to [9]’s use of the loose (though computationally simple) Azuma–Hoeffding inequality [16, 17] to bound the upper tail probabilities, as opposed to exact figures that can be obtained from the Binomial distribution. Tighter Azuma–Hoeffding bounds can be applied, such as expression (8) in [18], which in our setting simplifies to
$$\begin{aligned} p\text {-value}\le \left[ \left( \frac{1}{2-\sqrt{2}}\right) ^{\frac{2-\sqrt{2}}{4}}\left( \frac{3}{2+\sqrt{2}}\right) ^{\frac{2+\sqrt{2}}{4}}\right] ^n. \end{aligned}$$
The above bound is easier to compute than the Binomial cumulative distribution function, but there is still a meaningful gap between the bound and the exact figures:

| | \(n = 1000\) | \(n = 10000\) |
| --- | --- | --- |
| Exact p-value (44) | \(6.34\times 10^{-16}\) | \(8.58\times 10^{-142}\) |
| A–H Bound (8) in [18] | \(1.18\times 10^{-14}\) | \(5.03\times 10^{-140}\) |

As the table reveals, the difference between the upper bound and the exact calculation is roughly two orders of magnitude for larger values of \(n\).
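The comparison between the exact \(p\)-value and the Azuma–Hoeffding-type bound can be reproduced with a short script. The sketch below is one possible way to do so and is not taken from the cited works: the exact tail is summed with integer arithmetic (using \(P(B_{n,3/4} = k) = \binom{n}{k}3^k/4^n\)), and the bound is evaluated in log space. The function names and the sample sizes in the usage loop are choices made here for illustration.

```python
from math import comb, exp, isqrt, log, sqrt


def exact_p_value(n):
    """P(B_{n,3/4} > n(sqrt(2)+2)/4), with the sum carried out in exact integers."""
    # Smallest integer k exceeding n(2 + sqrt(2))/4; isqrt(2*n*n) equals floor(n*sqrt(2)).
    k_min = (2 * n + isqrt(2 * n * n)) // 4 + 1
    # Sum the integer numerators comb(n, k) * 3**k, then divide once by 4**n.
    numerator = sum(comb(n, k) * 3**k for k in range(k_min, n + 1))
    return numerator / 4**n


def ah_bound(n):
    """The bound displayed above, evaluated in log space to avoid underflow."""
    q = (2 + sqrt(2)) / 4
    log_base = (1 - q) * log(1 / (2 - sqrt(2))) + q * log(3 / (2 + sqrt(2)))
    return exp(n * log_base)


if __name__ == "__main__":
    for n in (50, 1000, 10000):
        exact, bound = exact_p_value(n), ah_bound(n)
        print(f"n={n}: exact={exact:.3g}, bound={bound:.3g}, ratio={bound / exact:.1f}")
```

Dividing two Python integers once at the end avoids the underflow that a term-by-term floating-point sum would hit at these magnitudes.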

Remark 2

To calculate the power of the test, we used our knowledge of the quantum predictions. \(H_A\) could be extended to include any violation of locality; from a hypothesis test standpoint, our knowledge of the precise quantum predictions is not necessary. Smaller (sub-quantum) violations of the CHSH inequality would take more trials to detect. And violations of the inequality on some trials, balanced by trials that obey the inequality, could be statistically undetectable if the trials obeying the inequality were to do so by a large enough margin.

5 Conclusion

We have shown that the CHSH inequality can be proved in a completely general measure-theoretic framework, and furthermore that a hypothesis test can definitively test locality in an experimental setting.

By working in a precise setting, we gain the benefit of clearly delineating all of the assumptions being made. If \(H_A\) is supported by experiment, one of the various assumptions must be false. Under most standard interpretations of the quantum description of a Bell experiment, (2)–(4) can be satisfied and it is the locality assumption, (25), that is violated. As Quantum Mechanics is a successful theory upheld by countless experiments, it would be logical to attribute the failure of \(H_0\) to a quantum violation of (25).

However, the formulation of \(H_0\) and the derivation of the CHSH inequality (16) also rest on four other assumptions: the “experimental assumptions” (21), (22), and (23), and time sequentiality, (24). A physical theory could violate \(H_0\) but still satisfy locality, so long as one of the other assumptions turned out not to hold.

It is not clear that a violation of the time sequentiality assumption (24) would have any physical interpretation, as (24) is really a technical detail of how to model the problem, akin to the more basic assumption that we can model the problem with a probability space and random variables to begin with. As for the two assumptions (21) and (22), these can be compared to observed data and confirmed to any desired degree of certainty (see footnote 1). On the other hand, (23) is a different creature. Equation (23) states that two observable random variables, \(A_i\) and \(B_i\), are independent of an unobservable random variable, \(\lambda _i\), and therefore this assumption cannot be directly tested.

What would a violation of (23) imply? This would mean that whatever process you were using to randomly set the detector settings was influenced by the state of the system prior to detection, \(\lambda _i\). Since we can choose any source of randomness—a separate quantum process, a random number generator on a computer, random fluctuations of the cosmic background radiation—to toggle the detector settings, the state of the system \(\lambda _i\) would have to be correlated with all sorts of seemingly unrelated processes. However, this would be the only alternative explanation, if we are to keep the locality assumption.

Sometimes it is claimed that it is not locality, but realism that must be abandoned. However, there is some debate about whether realism is a well-defined, required concept in the context of Bell experiments [19], and there is no clear invocation of realism at any point in this paper (assumption (23) is more aptly referred to as a free-will assumption, and (5) is of course a locality assumption). It could be argued that modeling the problem using the usual notions of probability fundamentally presupposes a realist viewpoint, but then it is not clear what a non-realist (but local) theory would be, or how such a theory could be modeled. In any case, to claim that the CHSH inequality rests on an assumption of realism requires being able to identify which of the assumptions and/or deductive steps in Sects. 2–4 should be identified with realism.

This paper assumes that every trial results in a detection event at both ends of the laboratory. In practice, however, there are limits in the detection efficiency of real-world particle detectors that result in most photons going undetected, so many trials end with only one detector detecting a photon, or no detections at all: see, for example, [20], where detection efficiency was only 5%. To properly model a real-world experiment with this constraint, one would have to allow for a third outcome, “undetected” or “0”, in addition to the two outcomes “\(+1\)” and “\(-1\)”. Previous papers [21, 22, 23] have analyzed how to model this additional-outcome experiment and it has been found that, for a CHSH experiment using the singlet state, Quantum Mechanics is distinguishable from any LHVT so long as the detection efficiency exceeds a crucial cut-off of about 83%, an efficiency that has not yet been achieved in a CHSH experiment. Detection-efficiency issues can be addressed in a completely general measure-theoretic framework without making i.i.d. assumptions about repeated trials; this is done in a separate work [24].


Footnotes

  1. The reader may note that confirming these two assumptions by appealing to experimental data would require an assumption that the random variable sequences \(\{A_i\}\) and \(\{B_i\}\) are i.i.d., which is exactly the sort of assumption we are trying to avoid in this paper. However, the difference is this: we observe \(\{A_i\}\) and \(\{B_i\}\), and we may come to a reasonable conclusion that we are observing an i.i.d. sequence, whereas we will never be able to conclude this about the unobserved sequence \(\{\lambda _i\}\).



Acknowledgments

The author would like to thank Michael Mislove and Keye Martin for their support and guidance, as well as Gustavo Didier and Lev Kaplan for their helpful comments and suggestions. This work was partially supported by grant FA9550-13-1-0135 from the US Air Force Office of Scientific Research and Grant N00014-10-1-0329 P00004 from the US Office of Naval Research.


References

  1. Bell, J.: On The Einstein Podolsky Rosen Paradox. Physics 1, 195–200 (1964)
  2. Clauser, J., Horne, M., Shimony, A., Holt, R.: Proposed experiment to test local hidden-variable theories. Phys. Rev. Lett. 23, 880–884 (1969)
  3. Barrett, J., Hardy, L., Kent, A.: No signaling and quantum key distribution. Phys. Rev. Lett. 95, 010503 (2005)
  4. Acín, A., Brunner, N., Gisin, N., Massar, S., Pironio, S., Scarani, V.: Device-independent security of quantum cryptography against collective attacks. Phys. Rev. Lett. 98, 230501 (2007)
  5. Pironio, S., et al.: Random numbers certified by Bell’s theorem. Nature 464, 1021–1024 (2010)
  6. Fritz, T.: Beyond Bell’s theorem: correlation scenarios. New J. Phys. 14(10), 103001 (2012)
  7. Brandenburger, A., Keisler, H.J.: Fiber products of measures and quantum foundations. To appear in Logic and Algebraic Structures in Quantum Computing and Information. Lecture Notes in Logic, Association for Symbolic Logic, Cambridge University Press, Cambridge (2012)
  8. Barrett, J., Collins, D., Hardy, L., Kent, A., Popescu, S.: Quantum nonlocality, Bell inequalities, and the memory loophole. Phys. Rev. A 66, 042111 (2002)
  9. Gill, R.D.: Accardi Contra Bell (Cum Mundi): the impossible coupling. Math. Stat. Appl. Festschr. Constance van Eeden IMS Lect. Notes Monogr. 42, 133–154 (2003)
  10. Hänggi, E., Renner, R., Wolf, S.: The impossibility of non-signaling privacy amplification. Theor. Comput. Sci. 486, 27–42 (2013)
  11. Barrett, J., Colbeck, R., Kent, A.: Memory attacks on device-independent quantum cryptography. Phys. Rev. Lett. 110, 010503 (2013)
  12. van Dam, W., Gill, R.D., Grunwald, P.D.: The statistical strength of nonlocality proofs. IEEE Trans. Inf. Theory 51, 2812–2835 (2005)
  13. Zhang, Y., Glancy, S., Knill, E.: Asymptotically optimal data analysis for rejecting local realism. Phys. Rev. A 84, 062118 (2011)
  14. Chung, K.L.: A Course in Probability Theory, 2nd edn. Academic Press, San Diego (1974)
  15. Brandenburger, A., Yanofsky, N.: A classification of hidden-variable properties. J. Phys. A 41, 425302 (2008)
  16. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)
  17. Azuma, K.: Weighted sums of certain dependent random variables. Tohoku Math. J. 19(3), 357–367 (1967)
  18. Zhang, Y., Glancy, S., Knill, E.: Efficient quantification of experimental evidence against local realism. Phys. Rev. A 88, 052119 (2013)
  19. Gisin, N.: Non-realism: deep Thought or a Soft Option? Found. Phys. 42, 80–85 (2012)
  20. Weihs, G., Jennewein, T., Simon, C., Weinfurter, H., Zeilinger, A.: Violation of Bell’s inequality under strict Einstein locality conditions. Phys. Rev. Lett. 81, 5039–5043 (1998)
  21. Pearle, P.M.: Hidden-variable example based upon data rejection. Phys. Rev. D 2, 1418–1425 (1970)
  22. Clauser, J., Horne, M.: Experimental consequences of objective local theories. Phys. Rev. D 10(2), 526–535 (1974)
  23. Mermin, N.D., Garg, A.: Detector inefficiencies in the Einstein–Podolsky–Rosen experiment. Phys. Rev. D 35(12), 3831–3835 (1987)
  24. Bierhorst, P.: A mathematical foundation for locality. Ph.D. thesis, Tulane University (2014)

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Mathematics Department, Tulane University, New Orleans, USA
