1 Introduction

Many physicists are unimpressed by Bell’s theorem. A widespread view is that Bell’s reasoning rests upon an implicit assumption of “classicality” that directly goes against the fundamental principles of quantum mechanics (QM). According to such an understanding, the violation of Bell’s inequalities poses no challenge to our causal picture of the world (locality, in particular), but simply reinforces the fact that QM is not classical. One proponent of such a view is Reinhard Werner, who concisely puts it like this (Werner, 2014a, p. 4):

Bell showed (maybe against his own intentions ...) that classicality and locality together lead to false empirical conclusions. Of course, all the talk about the non-locality of quantum mechanics really says [is] that any classical extension violates locality ...

In line with this picture, physicists have developed various interpretations of quantum theory that are claimed to be local and non-classical. Among recent variants are Werner’s operational quantum mechanics (Werner, 2014a; Werner, 2014b) and Robert Griffiths’s consistent histories approach (Griffiths, 2020).

Others object to this view. Criticizing Werner’s position about the EPR argument and Bell’s theorem, Maudlin (2014b, pp. 1–2) writes:

Werner thinks that Bell and Einstein and I have all tacitly made an assumption of which we are unaware, an assumption he labels C for ‘classicality’. ... Werner concedes that Bell proved that any classical theory that violates his inequalities must be non-local. But deny classicality and the arguments no longer go through. ... The condition C is easily stated: it is that the state space of a theory forms a simplex. Good. The space of density matrices in quantum theory does not form a simplex, so if one takes the possible physical states of a system to be given by the density matrices, then one’s theory is not classical in this sense. That much is clear. But what is not at all clear is where the assumption that the state space is a simplex is presupposed in either Einstein’s or Bell’s reasoning.

It is interesting to compare this debate with an idea developed in the ’80s by scholars like Itamar Pitowsky, Arthur Fine, and others. On their interpretation, Bell’s inequalities have nothing to do with locality and causality. The violation of the inequalities, on their account, is an indication that the classical laws of probability are no longer applicable in the quantum domain. That is, there is a sense of classicality that directly entails Bell’s inequalities, without any further assumption about locality. In Pitowsky’s (1989, p. 8) words:

The set of axioms for classical probability entail that frequencies should obey an a-priori set of constraints [Bell-type inequalities] that are often violated by quantum frequencies. The violation itself has a-priori nothing to do with the principle of locality for it often occurs in cases where spatio-temporal aspects play no role whatever.

On Fine’s (1982, p. 294) reading:

... hidden variables and the Bell inequalities are all about ... imposing requirements to make well defined precisely those probability distributions for noncommuting observables whose rejection is the very essence of quantum mechanics.

Notice that Fine says not only that Bell’s theorem is about probability rather than locality and causality, but also that Bell’s inequalities follow from a condition (classicality) that contradicts QM, suggesting that the violation of the inequalities should not be surprising but should actually be expected in the first place.

In sum, the debate over the role of classicality in Bell’s theorem leaves us with a confusing disagreement among the following positions:

  • Bell’s argument presupposes classicality in addition to locality (Werner)

  • Classicality is nowhere referred to in Bell’s argument; the only substantial assumption is locality (Maudlin)

  • Bell’s inequalities follow only from classicality; no further locality assumption is needed (Pitowsky and Fine)

This situation poses two straightforward questions: 1) Is there a common notion of classicality shared by all parties? 2) If yes, what role exactly does classicality thus construed play in Bell’s theorem? The aim of this paper is to clarify these questions.

We will proceed as follows. Section 2 answers question 1 positively: we show that Werner’s notion of classicality (condition C above) is equivalent to Pitowsky’s and Fine’s probabilistic conditions of the existence of a Kolmogorovian representation of quantum probabilities and the existence of joint distributions, respectively. Then, with an unambiguous notion of classicality at hand, Section 3 answers question 2: we demonstrate that classicality is not a presupposition of Bell’s theorem but a consequence of the standard causal-statistical assumptions. Next, in Section 4, we investigate how the approaches of Werner and Griffiths can claim to get around Bell’s theorem. In light of what we will have shown about classicality, it is clear that in getting around the derivation of Bell’s inequalities each of the two approaches in question must violate one of the standard causal-statistical assumptions of Bell’s theorem. We claim that while in Werner’s operational quantum mechanics the Common Cause Principle is violated, in the consistent histories approach of Griffiths, the formulation of quantum theory turns out to be conspiratorial. Finally, in Section 5, we relate these two options to the idea of realism, a notion that is also often identified as an implicit assumption of Bell’s theorem. The Appendix contains the proofs of the two central mathematical propositions that we formulate in the main text.

2 Classicality as a probabilistic notion

Recall Pitowsky’s (1989) formalism. Let n be a natural number and S be a subset of \(\left\{ (i,j)|i<j;i,j=1,2,...,n\right\} \). Suppose we are given \(n+\left| S\right| \) numbers (\(\left| S\right| \) denotes the number of elements of S):

$$\begin{aligned} \begin{array}{rclcl} p_{i} &{} &{} i=1,2,...,n\\ p_{ij} &{} &{} (i,j)\in S \end{array} \end{aligned}$$
(1)

with \(0\le p_{i},p_{ij}\le 1\). We arrange these numbers in a so-called correlation vector

$$\begin{aligned} \overrightarrow{p}=\left( p_{1},p_{2},...,p_{n},...,p_{ij},...\right) \in \mathbb {R}^{n+\left| S\right| } \end{aligned}$$
(2)

where the index pairs \((i,j)\in S\) are ordered lexicographically. \(\overrightarrow{p}\) will be thought of as an array of experimentally ascertained probabilities of n outcomes of some measurements performed on a given system, and some of the correlations of these outcomes (the probabilities of their conjunctions). \(\overrightarrow{p}\) can be seen as a (partial) description of the system’s “state,” characterizing how the system is disposed to react to certain measurements performed on it.

As an example consider a \(2\times 2\)-type EPR–Bohm (EPRB) scenario. In each wing of a \(2\times 2\) EPRB experiment one selects from two given measurement settings (directions). Label by \(a_{1},a_{2}\) the settings on the left, by \(b_{3},b_{4}\) the settings on the right. Let \(A_{1},A_{2},B_{3},B_{4}\) denote the corresponding spin up outcomes. The probabilities of spin outcomes yielded by the experiment and predicted by QM can be arranged in a correlation vector of type \(n=4\), \(S_{\text {EPR}}=\left\{ (1,3),(1,4),(2,3),(2,4)\right\} \):

$$\begin{aligned} \overrightarrow{p}_{\text {EPR}}=\left( p_{1},p_{2},p_{3},p_{4},p_{13},p_{14},p_{23},p_{24}\right) \in \mathbb {R}^{4+4} \end{aligned}$$
(3)

withFootnote 1

$$\begin{aligned} \begin{array}{rclcl} p_{i} &{} = &{} p\left( A_{i}|a_{i}\right) &{} &{} i=1,2\\ p_{j} &{} = &{} p\left( B_{j}|b_{j}\right) &{} &{} j=3,4\\ p_{ij} &{} = &{} p\left( A_{i}\cap B_{j}|a_{i}\cap b_{j}\right) &{} &{} (i,j)\in S_{\text {EPR}} \end{array} \end{aligned}$$
(5)

\(\overrightarrow{p}_{\text {EPR}}\) provides a (partial) description of the spin state of the two-particle system (or an ensemble of such systems) prepared and measured in an EPRB experiment.

We now recall and precisely formulate notions of when such a description—and hence the system in question and its state—is regarded as “classical.”

The first notion (Pitowsky, 1989) requires that a correlation vector be composed of numbers that satisfy Kolmogorov’s axioms, so that they are classical probabilities.

Definition 1

Correlation vector \(\overrightarrow{p}\) admits a classical probability space representation iff there exists a classical probability space \(\left( X,\mathcal {A},\mu \right) \) and \(E_{1},E_{2},...,E_{n}\in \mathcal {A}\) such that

$$\begin{aligned} \begin{array}{rclcl} p_{i} &{} = &{} \mu \left( E_{i}\right) &{} &{} i=1,2,...,n\\ p_{ij} &{} = &{} \mu \left( E_{i}\cap E_{j}\right) &{} &{} (i,j)\in S \end{array} \end{aligned}$$
(6)

The second notion (Fine, 1982) requires that the probability values in a correlation vector arise from a joint distribution as marginal probabilities.

Definition 2

Correlation vector \(\overrightarrow{p}\) is extractable from a joint distribution iff there exist \(2^{n}\) numbersFootnote 2\(p_{\alpha _{1}...\alpha _{n}},\alpha _{1},...,\alpha _{n}\in \left\{ +,-\right\} \) such that

$$\begin{aligned} \begin{array}{rclcl} 0\le &{} p_{\alpha _{1}...\alpha _{n}} &{} \le 1\\ \underset{\alpha _{1},...,\alpha _{n}\in \left\{ +,-\right\} }{\sum } &{} p_{\alpha _{1}...\alpha _{n}} &{} =1 \end{array} \end{aligned}$$
(7)

and

$$\begin{aligned} \begin{array}{rclcl} p_{i} &{} =\underset{\alpha _{i}=+}{\underset{\alpha _{1},...,\alpha _{n}\in \left\{ +,-\right\} }{\sum }} &{} p_{\alpha _{1}...\alpha _{n}} &{} &{} i=1,2,...,n\\ p_{ij} &{} =\underset{\alpha _{i},\alpha _{j}=+}{\underset{\alpha _{1},...,\alpha _{n}\in \left\{ +,-\right\} }{\sum }} &{} p_{\alpha _{1}...\alpha _{n}} &{} &{} (i,j)\in S \end{array} \end{aligned}$$
(8)
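To make Definition 2 concrete, the following is a minimal sketch of the marginalization in (8): given a joint distribution over the \(2^{n}\) sign assignments, it computes the correlation vector extracted from it. The function name `correlation_vector` and the uniform example distribution are illustrative assumptions, not part of the formalism above.

```python
from fractions import Fraction
from itertools import product


def correlation_vector(joint, n, S):
    """Marginalize a joint distribution over {+,-}^n as in (8).

    `joint` maps n-tuples of '+'/'-' to probabilities; indices i, j are
    1-based, as in the text. Returns [p_1, ..., p_n] followed by the
    p_ij for (i, j) in S in lexicographic order.
    """
    # Conditions (7): each weight lies in [0, 1] and the weights sum to 1.
    assert all(0 <= w <= 1 for w in joint.values())
    assert sum(joint.values()) == 1
    singles = [
        sum(w for alpha, w in joint.items() if alpha[i - 1] == '+')
        for i in range(1, n + 1)
    ]
    pairs = [
        sum(w for alpha, w in joint.items()
            if alpha[i - 1] == '+' and alpha[j - 1] == '+')
        for (i, j) in sorted(S)
    ]
    return singles + pairs


n = 4
S_EPR = [(1, 3), (1, 4), (2, 3), (2, 4)]
# Hypothetical example: the uniform distribution over all 2^4 sign assignments.
uniform = {alpha: Fraction(1, 2 ** n) for alpha in product('+-', repeat=n)}
p = correlation_vector(uniform, n, S_EPR)
```

Any vector produced this way is extractable from a joint distribution by construction; the substantive question in Definition 2 is the converse one, whether a given \(\overrightarrow{p}\) arises from some such distribution.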

Consider the set \(\Omega \) of all possible correlation vectors that can be experimentally realized—with fixed type of measurements, but varying ways in which the system is prepared before the measurements are carried out. One assumes that \(\Omega \) is a convex set in \(\mathbb {R}^{n+\left| S\right| }\), so that the statistical mixture of realizable probabilities is also realizable. \(\Omega \) can be associated with the system’s “state space.” The third notion of classicality (Barrett, 2007; Werner, 2014a) characterizes the state space of a system as a convex set: it requires that the state space of a classical system be a simplex, that is, every state has a unique decomposition as a convex combination of extreme points of state space. Since correlation vectors don’t necessarily provide a complete description of the system’s “state” in the sense of specifying the probabilities of atomic events, here we give a slightly modified formulation of this idea, one where \(\Omega \) itself is not required to be a simplex, but be obtainable as a projection (that is, partial description) of one.

Definition 3

Let \(\Omega \subset \mathbb {R}^{n+\left| S\right| }\) be a set of correlation vectors. \(\Omega \) is projectable from a probability simplex iff there exist a probability simplex \(\Delta _{d}\subset \mathbb {R}^{d}\) with d vertices for some positive integer d, a linear map \(\varphi :\mathbb {R}^{d}\rightarrow \mathbb {R}^{n+\left| S\right| }\), and sets of indices \(R_{i}\subseteq \left\{ 1,2,...,d\right\} ,i=1,2,...,n\) such that

$$\begin{aligned} \begin{array}{rclcl} \Omega\subseteq & {} \varphi \left( \Delta _{d}\right) \end{array} \end{aligned}$$
(9)

and for all \(\overrightarrow{p}\in \Omega ,\varvec{\pi }\in \Delta _{d}\), if \(\varphi \left( \varvec{\pi }\right) =\overrightarrow{p}\) then

$$\begin{aligned} \begin{array}{rclcl} p_{i} &{} =\underset{r\in R_{i}}{\sum }\!\!\!\!\!&{} \pi _{r} &{} &{} i=1,2,...,n\\ p_{ij} &{} =\underset{r\in R_{i}\cap R_{j}}{\sum }\!\!\!\!\!&{} \pi _{r} &{} &{} (i,j)\in S \end{array} \end{aligned}$$
(10)

where \(\pi _{r}\) is the rth component of \(\varvec{\pi }\).Footnote 3

In the foundations of QM literature the above notions of classicality are often used interchangeably. For special, EPRB-type correlation vectors, the equivalence of the first two notions is an immediate consequence of results by Fine (1982) and Pitowsky (1989). However, all three notions are in fact equivalent, for generic correlation vectors (for proof see Appendix):

Proposition 4

Consider a set of correlation vectors \(\Omega \subset \mathbb {R}^{n+\left| S\right| }\). The following conditions are equivalent:

  (i) For all \(\overrightarrow{p}\in \Omega \), \(\overrightarrow{p}\) admits a classical probability space representation.

  (ii) For all \(\overrightarrow{p}\in \Omega \), \(\overrightarrow{p}\) is extractable from a joint distribution.

  (iii) \(\Omega \) is projectable from a probability simplex.

A consequence of Proposition 4 is that classicality as a probabilistic notion has an unambiguous meaning. Many hold that classicality thus construed is an implicit assumption of Bell’s theorem, and that giving up this assumption then provides a way to get around the violation of Bell’s inequalities, and a particularly natural one, it is held, given that classicality is already in contradiction with the fundamental principles of QM. In the next section we will argue that this picture is mistaken: classicality is not a presupposition of Bell’s theorem in addition to the standard causal-statistical assumptions; rather, it is a corollary of them.

3 Classicality and Bell’s theorem

Bell’s theorem can be and has been formulated in various different ways. Here we consider a commonly accepted derivation that is more general than Bell’s original 1964 reasoning in that it doesn’t presuppose perfect correlations and is based on the notion of a common cause.

The probabilities measured in an EPRB experiment and encoded in correlation vector \(\overrightarrow{p}_{\text {EPR}}\) ((3)–(5)) in general display statistical correlations between outcomes in the two wings. In general, we have

$$\begin{aligned} \begin{array}{rclcl} p\left( A_{i}\cap B_{j}|a_{i}\cap b_{j}\right)\ne & {} p\left( A_{i}|a_{i}\cap b_{j}\right) p\left( B_{j}|a_{i}\cap b_{j}\right)&\,&(i,j)\in S_{\text {EPR}}\end{array} \end{aligned}$$
(11)

Since the two wings are space-like separated, the only way these correlations can be explained is by assuming the existence of some correlated properties, commonly described by a “hidden variable,” that the particles carry with themselves right from their emission and that are responsible for the outcomes (even if in a probabilistic sense).Footnote 4 As many have rightly emphasized (e.g. Bell, 2004, pp. 143–144; Norsen, 2007, pp. 318–319; Maudlin, 2014a, p. 5), these pre-existing properties are not presupposed but inferred in Bell’s reasoning. Given the experimentally verified statistics, the fundamental presuppositions from which the existence derives are in most general terms captured by the following two principles.

  1. Locality: There can exist no direct causal connection between space-like separated events.

  2. Common Cause Principle: Robust probabilistic correlations do not occur in nature as a matter of pure accident, or mere hap. Any such correlation must be brought about either by direct or by common causal connection.Footnote 5

1–2 entail the existence of a common cause—which we may think of as the physical event that determines the physical properties of the particles after emission—in terms of which the EPRB correlations (11) can be explained.Footnote 6 What it means to explain these correlations is characterized by a second pair of assumptions.

Let \(C_{k}(k\in K)\) denote a partition of events, describing the common cause.Footnote 7

  3. Factorization:

    $$\begin{aligned} \begin{array}{rclcl} p\left( A_{i}\cap B_{j}|a_{i}\cap b_{j}\cap C_{k}\right) &{} = &{} p\left( A_{i}|a_{i}\cap C_{k}\right) p\left( B_{j}|b_{j}\cap C_{k}\right) \\ &{} &{} (i,j)\in S_{\text {EPR}},k\in K \end{array} \end{aligned}$$
    (12)

  4. No-conspiracy:

    $$\begin{aligned} \begin{array}{rclcl} p\left( C_{k}|a_{i}\cap b_{j}\right) &{} = &{} p\left( C_{k}\right) \\ &{} &{} (i,j)\in S_{\text {EPR}},k\in K \end{array} \end{aligned}$$
    (13)

Both factorization and no-conspiracy are statistical independence conditions. Factorization expresses the requirement that conditionalizing on the common cause \(C_{k}\) leaves no residual correlation between \(A_{i}\) and \(B_{j}\), given the chosen measurement angles on their respective sides. No-conspiracy expresses the assumption that the choice about which angles to measure, which is something that can be done at the last moment and by any selection-procedure one likes, cannot influence, nor be influenced by, the common cause, and hence the two must be uncorrelated.

It is worth noting that the four conditions are in fact inextricably intertwined. Both factorization and no-conspiracy incorporate the Common Cause Principle in a trivial sense: if there could be robust correlations in the world occurring as a matter of pure accident, then requiring these independence conditions would have no ground, for any such accidental correlation could spoil these independencies. Further, factorization is often formulated as a joint result of two conditions: 1) outcome independence

$$\begin{aligned} p\left( A_{i}\cap B_{j}|a_{i}\cap b_{j}\cap C_{k}\right) =p\left( A_{i}|a_{i}\cap b_{j}\cap C_{k}\right) p\left( B_{j}|a_{i}\cap b_{j}\cap C_{k}\right) \end{aligned}$$
(14)

which is a characterization of the common cause as a screener-off; and 2) parameter independence

$$\begin{aligned} p\left( A_{i}|a_{i}\cap b_{j}\cap C_{k}\right) = p\left( A_{i}|a_{i}\cap C_{k}\right) \end{aligned}$$
(15)
$$\begin{aligned} p\left( B_{j}|a_{i}\cap b_{j}\cap C_{k}\right) = p\left( B_{j}|b_{j}\cap C_{k}\right) \end{aligned}$$
(16)

which is taken to be required by locality. Finally, note that no-conspiracy, as a statement about statistical independence, is also a compound condition. It incorporates not only the idea of the autonomy of measurement choice (sometimes referred to as no-conspiracy in a narrower sense), according to which the settings of measurements can neither be directly influenced by the \(C_{k}\)-s nor share a common cause with them, but also the idea of no retrocausation. This is because statistical independence (13) could also break down in the reverse way, with the measurement choices having an effect on the \(C_{k}\)-s; and since the measurement choice can be made at the last moment before the measurement, while the \(C_{k}\)-s, characterizing the common cause, are localized at the emission of particles, this would involve a retrocausal connection.
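The way outcome independence (14) and parameter independence (15)–(16) combine into factorization (12) can be spelled out in two steps. The following display is only a restatement of the conditions above for a fixed \((i,j)\in S_{\text {EPR}}\) and \(k\in K\); the annotations are added for illustration.

```latex
\begin{aligned}
p\left( A_{i}\cap B_{j}|a_{i}\cap b_{j}\cap C_{k}\right)
  &= p\left( A_{i}|a_{i}\cap b_{j}\cap C_{k}\right) p\left( B_{j}|a_{i}\cap b_{j}\cap C_{k}\right)
  && \text{by outcome independence (14)} \\
  &= p\left( A_{i}|a_{i}\cap C_{k}\right) p\left( B_{j}|b_{j}\cap C_{k}\right)
  && \text{by parameter independence (15)--(16)}
\end{aligned}
```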

Conditions 1–4 above will be referred to as the standard causal-statistical assumptions of Bell’s theorem. We do not claim that these conditions cover every detail that the derivation of Bell’s inequalities rests uponFootnote 8 (though we believe they condense the substantial assumptions), nor are these conditions independent or non-redundant, as we have just seen. None of this will be relevant to our argument, the main ingredient of which is the mathematical fact that, given these four assumptions, correlation vector \(\overrightarrow{p}_{\text {EPR}}\) must be classical. To formulate and prove this we will use Pitowsky’s characterization of classicality as encapsulated in Definition 1. The following statement is a consequence of results by Fine (1982) in conjunction with Proposition 4. In the Appendix we give a more explicit proof of it based on Hofer-Szabó (2020).

Proposition 5

Suppose that there is a partition of events \(C_{k}(k\in K)\) for which (12)–(13) hold. Then \(\overrightarrow{p}_{\text {EPR}}\) admits a classical probability space representation.

Conditions 1–2 plus the EPRB statistics imply the existence of a common cause (hidden variable) assumed to be characterized by conditions 3–4. Conditions 3–4 imply that \(\overrightarrow{p}_{\text {EPR}}\) must be classical. Thus, in sum, the standard causal-statistical assumptions of Bell’s theorem imply that \(\overrightarrow{p}_{\text {EPR}}\) is a classical correlation vector.
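The step from conditions 3–4 to classicality can be illustrated numerically. The sketch below uses the familiar mixture-of-products construction: it takes a hypothetical common-cause model (the weights \(p\left( C_{k}\right) \) and response probabilities are made-up numbers), computes the vector \(\overrightarrow{p}_{\text {EPR}}\) that factorization and no-conspiracy imply, and builds a joint distribution over \(\left\{ +,-\right\} ^{4}\) whose marginals reproduce it. It is meant as an illustration under these assumptions and need not coincide with the proof given in the Appendix.

```python
from fractions import Fraction
from itertools import product

S_EPR = [(1, 3), (1, 4), (2, 3), (2, 4)]

# Hypothetical common-cause model (illustrative numbers only):
# weights[k] = p(C_k); q[k][i] = probability of outcome i given setting i and C_k
# (i = 1, 2 on the left wing; i = 3, 4 on the right wing).
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
q = [
    {1: Fraction(1), 2: Fraction(1, 2), 3: Fraction(0), 4: Fraction(3, 4)},
    {1: Fraction(1, 4), 2: Fraction(1), 3: Fraction(1, 2), 4: Fraction(0)},
    {1: Fraction(0), 2: Fraction(1, 3), 3: Fraction(1), 4: Fraction(1)},
]


def observed_vector(weights, q):
    """Correlation vector implied by factorization (12) and no-conspiracy (13)."""
    singles = [sum(w * qk[i] for w, qk in zip(weights, q)) for i in (1, 2, 3, 4)]
    pairs = [sum(w * qk[i] * qk[j] for w, qk in zip(weights, q)) for (i, j) in S_EPR]
    return singles + pairs


def joint_distribution(weights, q):
    """Mixture over k of product distributions on {+,-}^4."""
    joint = {}
    for alpha in product('+-', repeat=4):
        total = Fraction(0)
        for w, qk in zip(weights, q):
            term = w
            for i, sign in enumerate(alpha, start=1):
                term *= qk[i] if sign == '+' else 1 - qk[i]
            total += term
        joint[alpha] = total
    return joint


def marginals(joint):
    """Marginalize the joint distribution as in Definition 2."""
    singles = [sum(w for a, w in joint.items() if a[i - 1] == '+') for i in (1, 2, 3, 4)]
    pairs = [sum(w for a, w in joint.items() if a[i - 1] == '+' and a[j - 1] == '+')
             for (i, j) in S_EPR]
    return singles + pairs


p_epr = observed_vector(weights, q)
joint = joint_distribution(weights, q)
```

Here `marginals(joint)` coincides with `p_epr`, so the vector is extractable from a joint distribution and hence, by Proposition 4, admits a classical probability space representation.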

In light of this result, the following remarks about the conceptual terrain are in order. Firstly, as Pitowsky (1989) and Fine (1982) proved, classicality (the mathematical condition) alone implies Bell’s inequalities. Therefore it is strictly speaking incorrect to say, as Werner (2014a) does, that classicality is an additional premise of Bell’s theorem, on top of the standard causal-statistical assumptions. Werner here seems to have in mind the introduction of the common cause \(C_{k}\) (“a hidden state \(\lambda \)”), which he takes to either imply, or be equivalent to, classicality. But as we have seen, the necessity of introducing \(C_{k}\) follows from the standard causal-statistical assumptions alone, so it is not an additional assumption.

Secondly, the Pitowsky–Fine derivation of Bell’s inequalities is often interpreted as a demonstration that Bell’s theorem has nothing to do with locality, causality, etc., but is instead essentially about probability. We have two remarks on this view. The first one is a matter of simple logic: since Bell’s inequalities can be derived from two alternative sets of premises (the standard causal-statistical assumptions on the one hand, and classicality on the other), the violation of the inequalities implies that each set of premises must contain a false one. That is, classicality must fail, and so must at least one of the standard causal-statistical assumptions.
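The violation referred to above can be checked numerically. The sketch below evaluates the Clauser–Horne expression \(p_{13}+p_{14}+p_{24}-p_{23}-p_{1}-p_{4}\), one of the Bell-type inequalities for this scenario, which lies between \(-1\) and 0 for every classical correlation vector. It first confirms the bounds on the 16 deterministic vertices (the bounds extend to all classical vectors because the expression is linear), and then evaluates it on the textbook quantum prediction for the spin singlet, \(\frac{1}{2}\sin ^{2}(\theta /2)\) for two spin-up outcomes along directions at angle \(\theta \). The particular angles are an illustrative choice, not taken from the text.

```python
import math
from itertools import product


def ch_value(p):
    """Clauser-Horne expression for p = (p1, p2, p3, p4, p13, p14, p23, p24)."""
    p1, p2, p3, p4, p13, p14, p23, p24 = p
    return p13 + p14 + p24 - p23 - p1 - p4


def deterministic_vertex(x):
    """Correlation vector of a deterministic assignment x in {0,1}^4."""
    x1, x2, x3, x4 = x
    return (x1, x2, x3, x4, x1 * x3, x1 * x4, x2 * x3, x2 * x4)


# Classical range: evaluate the expression on all 16 deterministic vertices.
vertex_values = [ch_value(deterministic_vertex(x)) for x in product((0, 1), repeat=4)]


def singlet_vector(a1, a2, b3, b4):
    """Singlet-state predictions for coplanar directions given as angles in radians."""
    def both_up(x, y):
        return 0.5 * math.sin((x - y) / 2) ** 2
    return (0.5, 0.5, 0.5, 0.5,
            both_up(a1, b3), both_up(a1, b4), both_up(a2, b3), both_up(a2, b4))


# Illustrative choice of directions: a1 = 0, a2 = 90, b3 = 135, b4 = 225 degrees.
angles = [math.radians(d) for d in (0, 90, 135, 225)]
quantum_value = ch_value(singlet_vector(*angles))
```

With these angles `quantum_value` equals \((\sqrt{2}-1)/2\approx 0.207\), which exceeds the classical upper bound of 0.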

But this picture is still, on its own, potentially misleading. For while the standard causal-statistical assumptions (locality, the Common Cause Principle, no-conspiracy, etc.) are all robust physical/metaphysical principles which we have strong reasons to assume, the mathematical condition of classicality in itself is completely unreasonable and unmotivated. This is because the components of correlation vector \(\overrightarrow{p}_{\text {EPR}}\), (5), are conditional probabilities:

$$\begin{aligned} \overrightarrow{p}_{\text {EPR}}{\left\{ \begin{array}{ll} \begin{array}{rclcl} p_{1} &{} = &{} p\left( A_{1}|a_{1}\right) \\ p_{2} &{} = &{} p\left( A_{2}|a_{2}\right) \\ p_{3} &{} = &{} p\left( B_{3}|b_{3}\right) \\ p_{4} &{} = &{} p\left( B_{4}|b_{4}\right) \\ p_{13} &{} = &{} p\left( A_{1}\cap B_{3}|a_{1}\cap b_{3}\right) \\ p_{14} &{} = &{} p\left( A_{1}\cap B_{4}|a_{1}\cap b_{4}\right) \\ p_{23} &{} = &{} p\left( A_{2}\cap B_{3}|a_{2}\cap b_{3}\right) \\ p_{24} &{} = &{} p\left( A_{2}\cap B_{4}|a_{2}\cap b_{4}\right) \end{array}\end{array}\right. } \end{aligned}$$
(17)

Values of conditional probabilities pertaining to different conditions do not form a probability measure in general, and so it makes no sense in general to require that these values obey Kolmogorov’s axioms, that is, that they be representable in a classical probability space in accord with Definition 1.Footnote 9

The only way classicality is motivated in EPRB is that, as stated by Proposition 5, it is a mathematical corollary of the standard causal-statistical assumptions of Bell’s theorem. So what the Pitowsky–Fine derivation of Bell’s inequalities provides is not a new understanding of Bell’s theorem, but yet another way of deriving Bell’s inequalities from the standard assumptions:

standard causal-statistical assumptions

\(\Downarrow \)

classicality

\(\Downarrow \)

Bell’s inequalitiesFootnote 10

Therefore, the mathematical condition of classicality makes sense in cases where the standard causal-statistical assumptions apply: where we have space-like separated subsystems that are assumed to behave locally, etc. Importantly, when one or more of those assumptions does not hold, classicality (again, the mathematical condition) may fail to hold, even in classical physical systems.

One way that this can happen is if the system in question is composed of time-like separated subsystems that are allowed to interact. As an example, imagine Laurel and Hardy on a teeter-totter. Assume that Hardy weighs twice as much as Laurel. We want to see if Laurel and Hardy go up or down, under the following conditions:

$$\begin{aligned} a_{1}&:&\text {Laurel sits 1.5 meters away from the center of the teeter-totter}\\ a_{2}&:&\text {Laurel sits 1 meter away from the center}\\ b_{3}&:&\text {Hardy sits 1.5 meters away from the center }\\ b_{4}&:&\text {Hardy sits 0.5 meters away from the center} \end{aligned}$$

Introduce the following outcome events:

$$\begin{aligned} A_{1}&:&\text {Laurel goes up when he sits 1.5 meters away from the center}\\ A_{2}&:&\text {Laurel goes up when he sits 1 meter away from the center}\\ B_{3}&:&\text {Hardy goes down when he sits 1.5 meters away from the center }\\ B_{4}&:&\text {Hardy goes down when he sits 0.5 meters away from the center} \end{aligned}$$

Suppose that the teeter-totter experiment is performed repeatedly. Elementary physics entails the following probabilities:Footnote 11

$$\begin{aligned} \begin{array}{rclcl} p_{13} &{} = &{} p\left( A_{1}\cap B_{3}|a_{1}\cap b_{3}\right) &{} = &{} 1\\ p_{14} &{} = &{} p\left( A_{1}\cap B_{4}|a_{1}\cap b_{4}\right) &{} = &{} 0\\ p_{23} &{} = &{} p\left( A_{2}\cap B_{3}|a_{2}\cap b_{3}\right) &{} = &{} 1\\ p_{24} &{} = &{} p\left( A_{2}\cap B_{4}|a_{2}\cap b_{4}\right) &{} = &{} \frac{1}{2} \end{array} \end{aligned}$$
(18)

Further, assuming that Laurel and Hardy each sit far from the center half of the time and close to the center the other half, independently of each other, we have

$$\begin{aligned} \begin{array}{rclcl} p_{1} &{} = &{} p\left( A_{1}|a_{1}\right) &{} = &{} \frac{1}{2}\\ p_{2} &{} = &{} p\left( A_{2}|a_{2}\right) &{} = &{} \frac{3}{4}\\ p_{3} &{} = &{} p\left( B_{3}|b_{3}\right) &{} = &{} 1\\ p_{4} &{} = &{} p\left( B_{4}|b_{4}\right) &{} = &{} \frac{1}{4} \end{array} \end{aligned}$$
(19)

Now, correlation vector \(\overrightarrow{p}_{\text {LH}}=\left( p_{1},p_{2},p_{3},p_{4},p_{13},p_{14},p_{23},p_{24}\right) \) is not classical since, for example, \(p_{1}<p_{13}\). There certainly cannot exist a classical probability space with events \(E_{1}\) and \(E_{3}\) in it such that

$$\begin{aligned} \begin{array}{rclcl} p_{1} &{} = &{} \mu \left( E_{1}\right) \\ p_{13} &{} = &{} \mu \left( E_{1}\cap E_{3}\right) \end{array} \end{aligned}$$
(20)

as that would entail \(\mu \left( E_{1}\right) <\mu \left( E_{1}\cap E_{3}\right) \), in contradiction with Kolmogorov’s laws of probability. But this fact by no means implies the breakdown of classical probability theory, let alone classical physics, in any sense. It is simply that the values of classical conditional probabilities pertaining to different conditions, here \(p_{1}=p\left( A_{1}|a_{1}\right) \) and \(p_{13}=p\left( A_{1}\cap B_{3}|a_{1}\cap b_{3}\right) \), do not form a probability measure.
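The numbers in (18)–(19) can be reproduced from the lever rule in a few lines. The sketch below assumes, as in the text, that Hardy weighs twice as much as Laurel, that each seat is chosen half of the time independently, and that in the balanced case either side goes down with probability \(\frac{1}{2}\); the names used are illustrative.

```python
from fractions import Fraction

LAUREL_WEIGHT, HARDY_WEIGHT = 1, 2          # Hardy weighs twice as much as Laurel
LAUREL_SEAT = {1: Fraction(3, 2), 2: Fraction(1)}      # distances under a_1, a_2 (meters)
HARDY_SEAT = {3: Fraction(3, 2), 4: Fraction(1, 2)}    # distances under b_3, b_4 (meters)
HALF = Fraction(1, 2)


def p_hardy_down(i, j):
    """p(A_i and B_j | a_i and b_j): Laurel goes up exactly when Hardy goes down."""
    laurel_torque = LAUREL_WEIGHT * LAUREL_SEAT[i]
    hardy_torque = HARDY_WEIGHT * HARDY_SEAT[j]
    if hardy_torque > laurel_torque:
        return Fraction(1)
    if hardy_torque < laurel_torque:
        return Fraction(0)
    return HALF  # balance: small perturbations assumed to tip either way equally often


pairs = {(i, j): p_hardy_down(i, j) for i in (1, 2) for j in (3, 4)}
# Each seat is chosen half of the time, independently of the other side.
p_left = {i: sum(HALF * pairs[(i, j)] for j in (3, 4)) for i in (1, 2)}
p_right = {j: sum(HALF * pairs[(i, j)] for i in (1, 2)) for j in (3, 4)}

p_LH = [p_left[1], p_left[2], p_right[3], p_right[4],
        pairs[(1, 3)], pairs[(1, 4)], pairs[(2, 3)], pairs[(2, 4)]]
```

The resulting vector is \(\left( \frac{1}{2},\frac{3}{4},1,\frac{1}{4},1,0,1,\frac{1}{2}\right) \), and its first component is smaller than its fifth, which is the failure of classicality noted above.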

At the same time, this example witnesses an obvious violation of the standard assumptions of Bell’s theorem.Footnote 12 The outcomes are correlated, for we have:

$$\begin{aligned} \begin{array}{rclcl} \frac{1}{2}=p\left( A_{2}\cap B_{4}|a_{2}\cap b_{4}\right)\ne & {} p\left( A_{2}|a_{2}\cap b_{4}\right) p\left( B_{4}|a_{2}\cap b_{4}\right) =\frac{1}{2}\cdot \frac{1}{2}\end{array} \end{aligned}$$
(21)

The obvious explanation of this correlation is the direct physical influence between the two ends of the teeter-totter, which ensures that when one end goes down, the other end goes up. Since there is direct causal connection, the Common Cause Principle no longer demands the existence of a common cause satisfying factorization and no-conspiracy. Indeed, there is just no such event in the example. For instance, \(B_{4}\) as a potential direct cause of \(A_{2}\) does screen off correlation (21), meaning that partition \(\left\{ C_{k}\right\} _{k=1,2}=\left\{ B_{4},\lnot B_{4}\right\} \) satisfies outcome independence (14), as well as factorization (12). But \(B_{4}\) fails to satisfy no-conspiracy (13), for the obvious reason that \(B_{4}\) can only occur when the corresponding “measurement” \(b_{4}\) is performed, so there is strong correlation between \(B_{4}\) and \(b_{4}\). Again, the fact that there is no common cause \(C_{k}\) satisfying the standard Bell assumptions comes as no surprise since, unlike in the EPRB scenario, events on the two sides of the teeter-totter can, and in fact do, causally influence each other, with no contradiction to locality.

With these remarks in mind, it is worth mentioning a strand of approaches attempting to get around Bell’s theorem. The experimental violation of Bell’s inequalities in conjunction with the Pitowsky–Fine derivation of the inequalities implies that \(\overrightarrow{p}_{\text {EPR}}\) does not admit a classical probability space representation. A reaction shared by many scholars is that the way to evade this problem is to generalize the notion of probability space, relaxing some of Kolmogorov’s axioms, so that under the generalized notion of probability space \(\overrightarrow{p}_{\text {EPR}}\) does admit a probability space representation. There are many proposals in this direction; for an overview, see Feintzeig and Fletcher (2017). In light of what we said about classicality, here we briefly mention two possible concerns with these approaches. First of all, it must be emphasized that there is a clear sense in which the probability values in \(\overrightarrow{p}_{\text {EPR}}\) can be represented in a classical probability space: not as absolute probabilities of events, as Definition 1 requires, but as conditional probabilities conditioned on different conditioning events, in line with (17).Footnote 13 Similarly in our example: \(\overrightarrow{p}_{\text {LH}}\) is not a classical correlation vector, but no one would take this as evidence of Kolmogorov’s probability rules being violated by the teeter-totter system. If one wants to write down the numbers in \(\overrightarrow{p}_{\text {LH}}\) as values of probabilities in a probability space, they will be conditional probabilities in a classical probability space. Accordingly, to accommodate the Bell inequality violating correlation vectors in a probability space it is not necessary to generalize the notion of probability. Secondly, even if one did that, it must be clear that this move doesn’t save locality. 
This is because classicality (in the sense of Definition 1), as we have argued, is not among the premises of the standard derivation of Bell’s inequalities—it is just not a condition that one could deny in order to evade the derivation of the inequalities from locality, no-conspiracy, etc. (without also having to deny one of these standard assumptions).Footnote 14
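The first concern, that the values in \(\overrightarrow{p}_{\text {EPR}}\) are representable as conditional probabilities in a single classical probability space, can be made explicit with a small construction. The sketch below builds one Kolmogorovian measure on points (left setting, right setting, left outcome, right outcome), using the textbook singlet statistics for the outcomes and, as an illustrative assumption, uniformly and independently chosen settings. The angles and all names are hypothetical choices made for this example.

```python
import math
from itertools import product

# Illustrative coplanar measurement directions (radians); not taken from the text.
ANGLES = {'a1': 0.0, 'a2': math.pi / 2, 'b3': 3 * math.pi / 4, 'b4': 5 * math.pi / 4}


def singlet_outcomes(theta):
    """Singlet-state outcome distribution for directions at relative angle theta."""
    same = 0.5 * math.sin(theta / 2) ** 2
    diff = 0.5 * math.cos(theta / 2) ** 2
    return {('+', '+'): same, ('-', '-'): same, ('+', '-'): diff, ('-', '+'): diff}


# A single classical measure mu; each setting pair is assumed to occur with probability 1/4.
mu = {}
for a, b in product(('a1', 'a2'), ('b3', 'b4')):
    for (x, y), w in singlet_outcomes(ANGLES[a] - ANGLES[b]).items():
        mu[(a, b, x, y)] = 0.25 * w


def prob(event):
    """mu of the set of sample points satisfying the predicate `event`."""
    return sum(w for point, w in mu.items() if event(point))


def cond(event, given):
    """Classical conditional probability mu(event | given)."""
    return prob(lambda pt: event(pt) and given(pt)) / prob(given)


# p_1 = p(A_1 | a_1) and p_13 = p(A_1 and B_3 | a_1 and b_3) as conditional probabilities.
p1 = cond(lambda pt: pt[2] == '+', lambda pt: pt[0] == 'a1')
p13 = cond(lambda pt: pt[2] == '+' and pt[3] == '+',
           lambda pt: pt[0] == 'a1' and pt[1] == 'b3')
```

The measure `mu` is an ordinary probability distribution, yet the conditional values it yields are the quantum ones (here \(p_{1}=\frac{1}{2}\) and \(p_{13}=\frac{1}{2}\sin ^{2}(3\pi /8)\)), so nothing in this representation requires relaxing Kolmogorov’s axioms.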

Note that what we said about classicality is equally true for the existence of non-contextual hidden variables: 1) Bell’s inequalities can be derived from the existence of non-contextual hidden variables (Shimony, 1984, pp. 30–31). 2) Many believe that this makes the derivation from the standard causal-statistical assumptions irrelevant. 3) But the derivation from non-contextual hidden variables does not invalidate the derivation from the standard causal-statistical assumptions: the violation of Bell’s inequalities implies that both non-contextual hidden variables and locality (or another one of the standard causal-statistical assumptions) must go. 4) Furthermore, as with classicality, the existence of non-contextual hidden variables in itself is not well-motivated. Our teeter-totter system is again a good example, since it displays an obvious violation of non-contextuality in the following sense. Let \(C_{k}\) now describe not a common cause but the system’s “ontic state,” that is, for the teeter-totter system, all the physical factors together, including the weights of Laurel and Hardy and the small perturbations in play in the balance case (but excluding conditions \(a_{1},a_{2},b_{3},b_{4}\)), that go into determining which one of the two goes up and down. Non-contextuality is the condition that the ontic state determines the probability of the outcomes of each measurement independently of what other measurements are simultaneously performed,Footnote 15 which, in our case, is nothing but the condition of parameter independence (15)–(16). The violation of this condition is due to the fact that, even when \(C_{k}\) is given,Footnote 16 the outcome on one side (whether Laurel/Hardy goes up or down) depends on the measurement choice on the other side (where Hardy/Laurel sits, respectively). Again, contextuality in this sense comes as no surprise since the two ends can and do physically interact. 
5) That said, the existence of non-contextual hidden variables, just as classicality, does follow when we have space-like separated subsystems that are assumed to behave locally, etc., that is, where the standard causal-statistical assumptions apply. For in that case \(C_{k}\) in assumptions 3–4 will just be a non-contextual hidden variable. The best-known derivation of non-contextual hidden variables from locality, etc. is the EPR argument.

Indeed, the tight connection between classicality, non-contextual hidden variables, and the standard causal-statistical assumptions is especially transparent in the deterministic case. Suppose we have parallel measurement directions in the two wings of EPRB, with perfect correlation between outcomes of measurements in the same direction. Perfect correlation can only be explained by a deterministic common cause (Hofer-Szabó et al., 2013, p. 15, Proposition 2.7). This fact, together with no-conspiracy (13), implies that \(p\left( A_{i}|a_{i}\cap b_{j}\cap C_{k}\right) ,p\left( B_{j}|a_{i}\cap b_{j}\cap C_{k}\right) \in \left\{ 0,1\right\} ,(i,j)\in S_{\text {EPR}}\). Parameter independence (15)–(16) further entails

$$\begin{aligned} \begin{array}{rclcl} p\left( A_{i}|a_{i}\cap C_{k}\right) ,p\left( B_{j}|b_{j}\cap C_{k}\right)\in & {} \left\{ 0,1\right\}&\,&(i,j)\in S_{\text {EPR}}\end{array} \end{aligned}$$
(22)

Now introduce the following events:

$$\begin{aligned} \begin{array}{rclcl} C_{A_{i}} &{} = &{} \underset{\underset{p\left( A_{i}|a_{i}\cap C_{k}\right) =1}{k\in K}}{\bigcup }C_{k}\\ C_{B_{j}} &{} = &{} \underset{\underset{p\left( B_{j}|b_{j}\cap C_{k}\right) =1}{k\in K}}{\bigcup }C_{k}\\ &{} &{} (i,j)\in S_{\text {EPR}} \end{array} \end{aligned}$$
(23)

In conjunction with (22), the standard causal-statistical assumptions imply (see formula (32) in the Appendix):

$$\begin{aligned} \begin{array}{rclcl} p\left( A_{i}|a_{i}\right) &{} = &{} p\left( C_{A_{i}}\right) \\ p\left( B_{j}|b_{j}\right) &{} = &{} p\left( C_{B_{j}}\right) \\ p\left( A_{i}\cap B_{j}|a_{i}\cap b_{j}\right) &{} = &{} p\left( C_{A_{i}}\cap C_{B_{j}}\right) \\ &{} &{} (i,j)\in S_{\text {EPR}} \end{array} \end{aligned}$$
(24)

That is, the conditional probabilities of outcome events figuring in \(\overrightarrow{p}_{\text {EPR}}\) must be equal to the absolute probabilities of the events that predetermine these outcomes. This gives Proposition 5 a straightforward interpretation: since on the right-hand side we have absolute probabilities of events that are representable in a classical probability space, the values of conditional probabilities on the left-hand side must also be so representable: \(\overrightarrow{p}_{\text {EPR}}\) must be classical. On the other hand, what (24) says is that the measurement outcomes \(A_{i},B_{j}\) simply reveal the pre-existing properties \(C_{A_{i}},C_{B_{j}}\)—which is exactly the idea behind non-contextual hidden variables. It must be stressed that equalities (24), and thus both classicality and the existence of non-contextual hidden variables, are consequences of the standard causal-statistical assumptions of Bell’s theorem, including, importantly, the causal separation of the subsystems we consider.
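The step from (24) to a Bell-type inequality can be checked numerically in a toy model. The following is only a minimal sketch under stated assumptions, not the construction used in the Appendix: it assumes \(S_{\text {EPR}}=\{(1,3),(1,4),(2,3),(2,4)\}\), takes sixteen hypothetical, mutually exclusive common causes \(C_{k}\) that each fix all four outcomes deterministically as in (22), assigns them arbitrary weights \(p(C_{k})\), and uses the Clauser–Horne form of the inequality as a representative Bell-type constraint. The names `CAUSES`, `prob`, and `clauser_horne` are illustrative.

```python
from fractions import Fraction
from itertools import product
import random

# Assumed index pairs of the EPRB setup (left settings 1, 2; right settings 3, 4)
S_EPR = [(1, 3), (1, 4), (2, 3), (2, 4)]

# Hypothetical common causes: each C_k predetermines every outcome, as in (22).
# CAUSES[k][m] == 1 means the outcome for setting m occurs with probability 1 given C_k.
CAUSES = [dict(zip((1, 2, 3, 4), bits)) for bits in product((0, 1), repeat=4)]

def random_weights(rng):
    """Arbitrary rational probabilities p(C_k) that sum to 1."""
    raw = [rng.randint(1, 10) for _ in CAUSES]
    total = sum(raw)
    return [Fraction(r, total) for r in raw]

def prob(weights, indices):
    """Probability of the union of those (disjoint) C_k that predetermine
    every outcome listed in `indices`; this plays the role of p(C_Ai), etc."""
    return sum(
        (w for w, c in zip(weights, CAUSES) if all(c[m] == 1 for m in indices)),
        Fraction(0),
    )

def eprb_vector(weights):
    """Right-hand sides of (24): single and joint probabilities."""
    single = {m: prob(weights, (m,)) for m in (1, 2, 3, 4)}
    joint = {pair: prob(weights, pair) for pair in S_EPR}
    return single, joint

def clauser_horne(single, joint):
    """Clauser-Horne expression; it lies in [-1, 0] for classical vectors."""
    return (joint[1, 3] + joint[1, 4] + joint[2, 4]
            - joint[2, 3] - single[1] - single[4])

rng = random.Random(0)
values = [clauser_horne(*eprb_vector(random_weights(rng))) for _ in range(1000)]
print("smallest and largest value found:", min(values), max(values))
```

Whatever weights are chosen, the expression stays between \(-1\) and \(0\), because each single \(C_{k}\) already gives a value in that range and the vector built from (24) is a convex mixture of these deterministic cases.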

All this implies that when the system in question is not composed of space-like separated subsystems, then we have no automatic reason to expect that classicality, the existence of non-contextual hidden variables and Bell’s inequalities will be satisfied in the first place. A quantum example of such a system is precisely that of the spin-3/2 Neon atom, as discussed in Griffiths (2020): Looking at expected values of spins along different axes, one can derive predicted values that violate a Bell inequality (specifically the CHSH inequality). Griffiths (2020, p. 3) urges us, on the basis of this example, to view things in this way (similar to Pitowsky and Fine quoted in the introduction):

Thus the violation of the CHSH inequality in this case has nothing to do with nonlocality. Instead it has everything to do with the fact that in quantum mechanics, unlike classical mechanics, physical properties and variables are represented by noncommuting operators.

Let us make a few remarks here. Firstly, one way to characterize the significance of noncommuting operators is what Fine (1982) suggests in the passage quoted in the introduction: according to the accepted view, physical variables represented by noncommuting operators cannot be measured simultaneously, and so in general they need not, and in fact they do not, have a joint distribution, in contradiction with what’s required by classicality in the sense of Definition 2. Notice, however, that the failure of existence of joint distributions in that sense is not specific to QM. Indeed, the same holds in our Laurel and Hardy example: since all three definitions of classicality are equivalent, and classicality in the sense of Definition 1 is violated in the Laurel and Hardy case, this means that classicality in the sense of Definition 2 also fails to hold for this simple classical physical system. Therefore, the absence of joint distributions that comes with noncommuting operators in the formalism doesn’t seem to mark off quantum physics from classical physics.

Secondly, there indeed is an intuition behind non-contextual hidden variables that might come from (simple cases in) classical physics. In the Neon atom case, one might say “the value of its spin along any given direction ought to be well-defined at all times; after all, spin might be roughly analogous to a classical angular momentum vector, the projection of which along any direction always has a well-defined value.” But the idea of non-contextual hidden variables not only incorporates the assumption that there are real, well-defined properties existing at all times; but, crucially, it brings with it a specific, and rather simplified, picture of how these properties are revealed in measurements: that the outcome of a measurement only depends on the corresponding property at present, irrespective of what other measurements, that is, physical interactions, take place. However, this latter picture of the measurement process is something that is generally, perhaps even typically, not true in classical physics. The Laurel and Hardy example is again a case in point. The physics of it incorporates real properties that are well-defined at all times: the weights of Laurel and Hardy, their distances from the center, the density distribution of the board, etc. All these real properties altogether will determine the measurement outcomes, that is, which end will go up and down. But this determination is contextual: whether Laurel goes up or down will depend not only on a property of Laurel but also on where Hardy sits. Since non-contextual hidden variables do not exist in general even for classical physical systems, it doesn’t seem terribly surprising, in itself, that their existence is not provided for generic physical systems, including the Neon atom.Footnote 17
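The contextual determination just described can be made concrete with a deliberately crude sketch. Everything in it is an assumption made for illustration: the torque rule (weight times distance), the weights and seat distances, the treatment of the balance case, and the names `laurel_outcome` and `state` are hypothetical and are not taken from the model defined earlier in the paper.

```python
def laurel_outcome(ontic_state, laurel_seat, hardy_seat):
    """Return 'up' or 'down' for Laurel's end from a simple torque comparison.

    ontic_state holds the fixed physical factors (weights, perturbation);
    the two seat arguments play the role of the measurement choices.
    """
    laurel_torque = ontic_state["laurel_weight"] * laurel_seat
    hardy_torque = ontic_state["hardy_weight"] * hardy_seat
    if laurel_torque == hardy_torque:
        # Balance case: a small perturbation, itself part of the ontic state, decides
        return "up" if ontic_state["perturbation"] < 0 else "down"
    # The lighter-torque end rises
    return "up" if laurel_torque < hardy_torque else "down"

# Hypothetical numbers (kilograms and metres), chosen only for illustration
state = {"laurel_weight": 60, "hardy_weight": 90, "perturbation": 1}
near, far = 1, 2

# Same ontic state, same seat for Laurel, two different seats for Hardy
for hardy_seat in (near, far):
    result = laurel_outcome(state, far, hardy_seat)
    print("Hardy sits at", hardy_seat, "m: Laurel goes", result)
```

With the ontic state and Laurel's own seat held fixed, Laurel's outcome changes from down to up when Hardy moves from the near seat to the far seat, which is the sense in which the determination is contextual.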

Finally, we agree with Griffiths that the violation of a Bell inequality doesn’t necessarily have to do with non-locality, as exemplified by the Neon case. Indeed, the same is true of the violation in the Laurel and Hardy example. While Laurel and Hardy are somewhat spatially separated, that is inessential; what matters is that the events \(A_{1},B_{3}\) etc. are (a) not space-like separated and indeed are (b) causally connected. So the violation of a Bell inequality in general is neither surprising nor relevant to whether there is locality in the world. By contrast, when a Bell inequality has been derived for a setup like EPRB experiments, with locality as a fundamental physical assumption among the premises, and the inequality is found to be violated in actual experiments, this does bear on the principle of locality!Footnote 18 By analogy: to say that the violation of Bell’s inequalities in EPRB has nothing to do with non-locality because the inequalities can be violated in cases where locality perfectly holds, is no better than saying that the violation of energy conservation in a closed system has no bearing on the laws of thermodynamics because energy conservation can be violated in an open system in complete harmony with those laws.

4 Two ways to get around Bell’s theorem

Classicality is thus not a condition that one could deny without also having to deny one of the standard causal-statistical assumptions of Bell’s theorem. If any theory of quantum phenomena is to avoid the derivation of Bell’s inequalities, it has to give up one of those standard premises. We will now look at two versions of standard QM, Werner’s operational quantum mechanics and Griffiths’ consistent histories approach, both of which are claimed to evade Bell’s theorem by giving up classicality, that is, claimed to be local non-classical quantum theories. We describe what the two versions are claimed to say about the example of the EPRB scenario, and identify which one of the standard causal-statistical assumptions of Bell’s theorem each theory is in fact forced to give up.

Operational quantum mechanics (OQM) is basically a variant of standard QM in which quantum states \(\psi \) are treated as purely epistemic, i.e., as tools for calculating what results to expect from measurements made in various scenarios. In the EPRB case this means that the only role of \(\psi \) is to recover, through Born’s rule, the measurement statistics encapsulated in \(\overrightarrow{p}_{\text {EPR}}\):

$$\begin{aligned} \begin{array}{rclcl} \left\langle \psi _{s},\hat{A}_{i}\otimes \hat{I}\psi _{s}\right\rangle &{} = &{} p\left( A_{i}|a_{i}\right) &{} &{} i=1,2\\ \left\langle \psi _{s},\hat{I}\otimes \hat{B}_{j}\psi _{s}\right\rangle &{} = &{} p\left( B_{j}|b_{j}\right) &{} &{} j=3,4\\ \left\langle \psi _{s},\hat{A}_{i}\otimes \hat{B}_{j}\psi _{s}\right\rangle &{} = &{} p\left( A_{i}\cap B_{j}|a_{i}\cap b_{j}\right) &{} &{} (i,j)\in S_{\text {EPR}} \end{array} \end{aligned}$$
(25)

where \(\hat{A}_{i},\hat{B}_{j}\) are projection operators pertaining to the outcomes \(A_{i},B_{j}\) respectively, and \(\psi _{s}\) is the singlet state.
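As a numerical illustration of how Born’s rule in (25) yields the measurement statistics, the following sketch evaluates the three kinds of expectation values for the singlet state. It rests on several assumptions that are not fixed by the text: spin directions are taken in the x–z plane and parametrized by an angle, the projectors are those onto “spin up” along each direction, the four angles are one hypothetical choice, and the Clauser–Horne expression is used as a representative Bell-type quantity. The helper names (`projector`, `kron`, `expectation`) are illustrative.

```python
import math

def projector(theta):
    """Projector onto spin 'up' along the direction at angle theta in the x-z plane."""
    c, s = math.cos(theta), math.sin(theta)
    return [[(1 + c) / 2, s / 2], [s / 2, (1 - c) / 2]]

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

def kron(a, b):
    """Kronecker product of two 2x2 matrices, giving a 4x4 matrix."""
    return [[a[i][k] * b[j][l] for k in range(2) for l in range(2)]
            for i in range(2) for j in range(2)]

def expectation(psi, op):
    """<psi, op psi> for a real 4-vector psi and a real 4x4 matrix op."""
    op_psi = [sum(op[r][c] * psi[c] for c in range(4)) for r in range(4)]
    return sum(psi[r] * op_psi[r] for r in range(4))

# Singlet state (|up,down> - |down,up>)/sqrt(2) in the basis uu, ud, du, dd
SINGLET = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]

# Hypothetical measurement directions (radians): settings 1, 2 left; 3, 4 right
ANGLES = {1: 0.0, 2: math.pi / 2, 3: 3 * math.pi / 4, 4: 5 * math.pi / 4}
S_EPR = [(1, 3), (1, 4), (2, 3), (2, 4)]  # assumed index pairs

# First two lines of (25): single-wing probabilities
single = {i: expectation(SINGLET, kron(projector(ANGLES[i]), IDENTITY)) for i in (1, 2)}
single.update({j: expectation(SINGLET, kron(IDENTITY, projector(ANGLES[j]))) for j in (3, 4)})

# Third line of (25): joint probabilities
joint = {(i, j): expectation(SINGLET, kron(projector(ANGLES[i]), projector(ANGLES[j])))
         for (i, j) in S_EPR}

# Clauser-Horne expression; any classical vector keeps it within [-1, 0]
ch_value = (joint[1, 3] + joint[1, 4] + joint[2, 4]
            - joint[2, 3] - single[1] - single[4])
print("single probabilities:", single)
print("joint probabilities:", joint)
print("Clauser-Horne value:", ch_value)
```

Each single-wing probability comes out as 1/2, each joint probability equals \((1-\cos \theta )/4\) for relative angle \(\theta \), and for these angles the Clauser–Horne value is \((\sqrt{2}-1)/2\approx 0.207\), above the classical bound of 0.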

It is not clear in what sense OQM is meant to be local. On the one hand, Werner seems to suggest that locality simply consists in no-signaling (4), that is in a statistical independence condition:

... in the operational approach no prediction about B changes when or if a measurement or other procedure is carried out on A. This independence is built into the structure of quantum theory. This is also the same as the no-signalling condition and the possibility of tracing out system A, getting a reduced state for B, which does not change (and so is undisturbed) whatever happens just to A. (Werner, 2014b, p. 4)

On the other hand, later he explicitly condemns the conflation of mere statistical dependence and physical disturbance:

... if you condition on the outcome of a measurement on A, you get a modified state for B. That is just another way to look at correlation, but never, not even in classical probability, can this be confused with a physical disturbance. The state change only becomes effective when the results from the two labs are brought together and are jointly analyzed, which can happen centuries later.Footnote 19 (Werner, 2014b, p. 4)

One can nonetheless accept that OQM is local in the sense that the theory doesn’t state—in fact, since it only talks about measurement statistics, it has no words to state—the existence of physical disturbance between the two wings. The problem is that in the very same way, the theory doesn’t contain any kind of causal mechanism that could serve to correlate the outcomes in the two wings. Indeed, since the ontology of OQM only consists of measurement events and perhaps information states of agents, there are no common causes in it either that could explain the EPRB correlations. Thus, OQM violates the Common Cause Principle, the demand that there be no robust regularities without some sort of causal explanation—which was one of the standard assumptions of Bell’s theorem. It is by giving up the Common Cause Principle’s requirement that OQM is able to evade the derivation of Bell’s inequalities. In effect, the defender of OQM says: “Two guys are flipping coins on opposite sides of the Earth, whenever they get a ‘Flip!’ command from their cousin Alice. There is no causal connection—neither direct, nor common causal—between the outcomes. But the two coins always land oppositely. Deal with it.”

Griffiths’ consistent histories approach (CH) to QM is an expression of Bohr’s complementarity idea. In CH, every maximal set of compatible measurements is associated with a so-called “framework.” In the EPRB case we have four frameworks pertaining to the four measurement pairs in \(S_{\text {EPR}}\). Relative to a given framework, in CH, quantum systems possess real properties that measurements simply reveal. Mathematically, one can model this by assigning to each framework a “small” probability space \(\left( X_{ij},\mathcal {A}_{ij},\mu _{ij}\right) ,(i,j)\in S_{\text {EPR}}\).Footnote 20 In each of these spaces we have a partition of events \(C_{ij}^{++},C_{ij}^{+-},C_{ij}^{-+},C_{ij}^{--}\in \mathcal {A}_{ij}\) corresponding to the particles having spin properties \(+\) or \(-\) in the chosen pair of directions \((i,j)\in S_{\text {EPR}}\). Then Born’s rule is said to recover the probabilities of these properties:

$$\begin{aligned} \begin{array}{rclcl} \left\langle \psi _{s},\hat{A}_{i}\otimes \hat{I}\psi _{s}\right\rangle &{} = &{} \mu _{ij}\left( C_{ij}^{++}\cup C_{ij}^{+-}\right) \\ \left\langle \psi _{s},\hat{I}\otimes \hat{B}_{j}\psi _{s}\right\rangle &{} = &{} \mu _{ij}\left( C_{ij}^{++}\cup C_{ij}^{-+}\right) &{} &{} (i,j)\in S_{\text {EPR}}\\ \left\langle \psi _{s},\hat{A}_{i}\otimes \hat{B}_{j}\psi _{s}\right\rangle &{} = &{} \mu _{ij}\left( C_{ij}^{++}\right) \end{array} \end{aligned}$$
(26)

These properties are thought to be revealed in measurements and hence the probabilities of these properties return the components of \(\overrightarrow{p}_{\text {EPR}}\):Footnote 21

$$\begin{aligned} \begin{array}{rclcl} \mu _{ij}\left( C_{ij}^{++}\cup C_{ij}^{+-}\right) &{} = &{} \mu _{ij}\left( A_{i}\right) =p_{i}\\ \mu _{ij}\left( C_{ij}^{++}\cup C_{ij}^{-+}\right) &{} = &{} \mu _{ij}\left( B_{j}\right) =p_{j} &{} &{} (i,j)\in S_{\text {EPR}}\\ \mu _{ij}\left( C_{ij}^{++}\right) &{} = &{} \mu _{ij}\left( A_{i}\cap B_{j}\right) =p_{ij} \end{array} \end{aligned}$$
(27)

The essential tenet of CH is the “single framework rule”: properties associated with different frameworks, and represented in different probability spaces, do not coexist and cannot be talked about at the same time. In particular, properties \(C_{ij}^{\alpha \beta }\) do not coexist for different \((i,j)\in S_{\text {EPR}}\), and hence the particles possess spin only in one direction at a time. This notion is what encapsulates Bohr’s idea of complementary descriptions.

In CH, properties \(C_{ij}^{\alpha \beta }\) operate as (deterministic) common causes in the sense that they are assumed to obey outcome independence in each probability space separately:

$$\begin{aligned} \begin{array}{rclcl} \mu _{ij}\left( A_{i}\cap B_{j}|C_{ij}^{\alpha \beta }\right) &{} = &{} \mu _{ij}\left( A_{i}|C_{ij}^{\alpha \beta }\right) \mu _{ij}\left( B_{j}|C_{ij}^{\alpha \beta }\right) \\ &{} &{} (i,j)\in S_{\text {EPR}},\alpha ,\beta =+,- \end{array} \end{aligned}$$
(28)

Notice, however, that neither parameter independence nor no-conspiracy can be written down in the “frameworks” formalism. This is because that would require an identification of the \(C_{ij}^{\alpha \beta }\)-s across different frameworks (probability spaces), and that’s exactly what the single framework rule forbids doing. Nonetheless, there is a clear sense in which no-conspiracy is violated in CH. Consider the ensemble of runs of an EPRB experiment. The presence of a spin property of the system, in a given run, depends on the framework in which we choose to describe the system, which in turn depends on the measurements we choose to perform in the given run. Indeed, spin property \(C_{ij}^{\alpha \beta }\) will only be assumed to be present in a given run if we choose to perform measurements \(a_{i}\) and \(b_{j}\). This means that there is strong correlation, over the runs of the EPRB experiment, between the properties we ascribe to the system and the measurements we choose to perform—which is a violation of no-conspiracy. Again, here we use ‘correlation’ in a relative frequency sense rather than a probabilistic sense expressible in terms of probability spaces \(\left( X_{ij},\mathcal {A}_{ij},\mu _{ij}\right) \). One way to phrase the point is that in CH, the very existence of the \(C_{ij}^{\alpha \beta }\) properties depends on human choices and thus cannot have an antecedent probability. Therefore, the right-hand side of equation (13) cannot exist, per the single framework rule.

Thus, in the consistent histories approach, it is the violation of another standard premise of Bell’s theorem, that of no-conspiracy, which blocks the derivation of Bell’s inequalities.

5 Conclusion

As we mentioned in the introduction, many physicists adamantly insist that EPRB-type phenomena do not show that there is genuine non-locality in natural phenomena. Werner and Griffiths are just two prominent voices making such claims in recent years. While they advocate different interpretations of the standard QM formalism, both options can be related to a form of anti-realism. The instrumentalist attitude implicit in OQM is certainly one reason for not asking for an explanation of the correlations, and for not being bothered by the violation of the Common Cause Principle (cf. Lewis, 2019). The violation of no-conspiracy we found in the CH approach can also be a result of some form of anti-realism, in which the measurement process is thought to have a role in constituting the property of the system measured. Such a constitution relation brings about correlation between performing the measurement and ascribing the property—a correlation which is not due to a causal link but due to a logical/analytic connection. The idea is reminiscent of the Bohrian position that Einstein lamented: the moon is only there when you look at it.

In our view, adopting an anti-realist stance of the above sorts does not save locality. The reason is that the very formulation of locality requires a sort of realism (cf. Norsen, 2007). One piece of evidence for this is that in neither of the two “anti-realist” versions of QM in question can one meaningfully formulate the condition of parameter independence (15)–(16), a condition that is usually taken to be a requirement of locality. In OQM there are no \(C_{k}\)-s, so parameter independence cannot be written down. Similarly, as we have seen above, to write down parameter independence in the histories formalism, we need to have an identification of the \(C_{ij}^{\alpha \beta }\)-s across different “frameworks”; but that’s exactly what we are forbidden to have in the CH approach.

If this dialectical situation is acknowledged, then the philosophers impressed by Bell’s theorem and the EPRB experiments can come to a peaceful agreement with physicists who are not so impressed. Those physicists prefer a strong dose of anti-realism in their physics, rather than a realistic physics that incorporates non-locality explicitly. The CH advocate embraces a form of conspiracy between the pre-existing properties that are revealed by measurement and the choices we make of what to measure. The OQM advocate must admit that nature somehow displays (enforces?) the inequality-violating correlations, and that nothing in the properties of the measured particles pre-determines (or at least causally explains) what the results would be (cf. Alice’s two friends flipping coins on opposite sides of the world). In both approaches, it must remain a mystery how nature can display these correlations between chancy events at space-like separation.Footnote 22 As philosophers, we would only ask that the physicists refrain from making two sorts of statements: (i) saying that the QM treatment of EPRB is perfectly local (though they can perfectly well say that the QM treatment is not overtly non-local!); and (ii) saying that Bell did not prove what many philosophers think he proved, because he made a tacit and inappropriate presupposition of “classicality” in his argument. As we have seen (Section 3), “classicality” is a consequence of the standard causal and statistical assumptions made in Bell’s argument, not a separate, tacit assumption.