Classical models of quantum mechanics

This chapter gives an introduction to a chain of results attempting to exclude deeper layers underneath quantum mechanics that restore some form of classical physics: ‘[Such results] more or less illustrate the ways along which some opponents might hope to escape Bohr’s reasonings and von Neumann’s proof and the places where they are dangerously near breaking their necks.’ (Groenewold, 1946, p. 454).

In so far as they are mathematically precise, such no-go results have their roots in von Neumann's 1932 book, which gave rise to two traditions that were often in polemical opposition to each other. Mathematically minded authors typically admired von Neumann's exclusion of hidden variables, yet tried to strengthen his theorem by weakening its assumptions; this sparked, for example, Gleason's Theorem (1957) as well as the Kochen-Specker Theorem (1967). Certain physicists (led by Bell), on the other hand, tried to circumvent (and later even ridicule) von Neumann's work. A high point of this tradition was Bell's Theorem from 1964, which was informed not only by von Neumann, but even more so by the famous Einstein-Podolsky-Rosen (EPR) paper from 1935, as well as by Bohm's deterministic pilot wave reformulation of quantum mechanics (1952). However, at the end of the day these traditions turned out to be not really divergent after all: Bell not only independently (and earlier) obtained a version of the Kochen-Specker Theorem, but, more importantly, his results from 1964 turn out to be very closely related to the culmination of the first tradition in the form of the so-called Free Will Theorem (FWT), which was published by Conway and Kochen during 2006-2008. Indeed, although its validity is uncontroversial, this theorem has been criticized on the following grounds:
1. Lack of novelty compared with the famous paper by Bell (1964), whose assumptions and conclusions are at least quite similar to those of the FWT (although the underlying proofs are mathematically quite distinct from those in the FWT).
2. Lack of novelty even within its own terms: versions of the FWT had actually been around for decades under less illustrious titles and authorships, e.g. Heywood & Redhead (1983), Stairs (1983), Brown & Svetlichny (1990), and Clifton (1993).
3. Circularity, in that indeterminism is presupposed (namely in the assumption that 'experimenters have a certain freedom') instead of derived.
One aim of this chapter is to clarify these matters, with the following conclusions:
1. The difference between earlier literature in the same direction and the FWT is largely one of emphasis, namely on free will (!), exemplifying a recent trend (also found elsewhere) in emphasizing free choice of the settings of experiments. Unfortunately, like Bell, Conway and Kochen talk about free settings only informally, even in their mathematics, not to speak of the complete absence of any serious philosophical analysis of free will among all three authors (for which perhaps Bell, but certainly not Conway and Kochen may be excused).
2. Granting the informal characterization of free settings, both Bell's (1964) Theorem and the FWT establish a contradiction between quantum mechanics, determinism, and locality (in the sense of Bell, which in the presence of determinism reduces to a no-signaling condition called parameter independence).
3. The technical difference between Bell's Theorem and the FWT lies in four facts:
a. Bell's arguments rely on probability theory (whereas the FWT does not).
b. The (optical) corner of quantum mechanics used in Bell's Theorem may be replaced by the corresponding experimental results, whereas the FWT uses uncontroversial yet untested predictions about massive spin-1 particles.
c. The FWT must assume perfect (EPR) correlations, which are difficult to realize and hence are avoided by later versions of Bell's Theorem (i.e. through the CHSH inequalities rather than the original Bell inequalities).
d. Like EPR, Bell and his followers focused on locality right from the beginning, and hence in Bell (1964) the inference is from locality to determinism. Conway and Kochen, on the other hand, resolve the contradiction their FWT established by inferring randomness of outcomes from freedom of settings.
We start with a very simple treatment of both von Neumann's argument against linear hidden variables and Kochen & Specker's refinement of it, in which von Neumann's controversial linearity assumption is decisively weakened so as to only apply to commuting operators; the Kochen-Specker Theorem excludes what are called non-contextual quasi-linear hidden variables. We then present what we see as a more transparent version of the FWT, whose key ingredient of replacing the noncontextuality assumption in the Kochen-Specker Theorem by a locality condition is preserved, but where this time the setting is completely deterministic. Freedom of choice then arises as a very natural independence assumption, and any threat of circularity is avoided: the conclusion is simply a contradiction between determinism, freedom of choice (i.e. of apparatus settings), locality, and quantum mechanics. Moreover, as we argue in §6.3, the philosophically precise concept of free will used in the assumptions of the FWT is what Lewis coined 'local miracle compatibilism'. Following an interlude on the GHZ Theorem, which seamlessly fits into the given framework, we then turn to Bell's Theorems, which we compare with the FWT. Finally, we give our own rigorous version of an argument first proposed by Colbeck and Renner to the effect that, under suitable freeness of choice and no-signaling conditions (similar to those in Bell's Theorem and the FWT), as long as they are compatible with quantum mechanics, hidden variables are at best irrelevant. In fact, this can only be proved under much stronger assumptions, obscuring the claim.

From von Neumann to Kochen-Specker
Von Neumann's Theorem 6.2 below was the first technical result excluding some class of hidden variables underneath quantum mechanics, namely (in current parlance) linear non-contextual hidden variables. This terminology requires some explanation. First, theorems of this kind apparently accept the mathematical structure of the observables prescribed by the usual formalism of quantum theory, i.e., observables are identified with elements of the self-adjoint part
$$H_n(\mathbb{C}) \equiv M_n(\mathbb{C})_{\mathrm{sa}} = \{a \in M_n(\mathbb{C}) \mid a^* = a\} \tag{6.1}$$
of the algebra $M_n(\mathbb{C})$ of n × n matrices (this simple case suffices to make all points of conceptual interest). Short of introducing "hidden" observables, hidden variable theories propose the existence of hidden states, which either replace or supplement the usual quantum states (which in the case at hand would be density operators).
Mimicking classical (statistical) physics, such states are interpreted as probability measures on some phase space X, whose points x ∈ X assign sharp values to quantum-mechanical observables. Naively, this is done through associated functions
$$V_x : H_n(\mathbb{C}) \to \mathbb{R}, \tag{6.2}$$
but in fact this choice already commits us to the first of two possibilities, which we pragmatically present as theories predicting measurement outcomes:
• In non-contextual deterministic theories of measurement, the outcome solely depends on the observable a that is being measured and on the (possibly 'hidden') state of the system. Theorem 6.2 below, then, rules out such theories in which values are sharp (i.e., dispersion-free), and $V_x$ in (6.2) is linear. The Kochen-Specker Theorem subsequently proves the same impossibility under a weaker (and physically more reasonable) assumption called quasi-linearity.
• Contextual deterministic theories of measurement, on the other hand, allow the outcome of some measurement of a to depend on the measurement context (as well as on the state), which in this case is understood as the choice of possible other (compatible) observables b measured together with a (i.e., ab = ba). This seems a reasonable assumption, well within the spirit of quantum mechanics, though perhaps not so in the extreme form later held by Heisenberg, according to which measurement outcomes (or even "reality") are "created" by the measurement. Under a weakened non-contextuality assumption, Bell's Theorem (cf. §6.5) and the Free Will Theorem (§6.2) rule out such theories, too.
Definition 6.1. A non-contextual hidden variable is a map $V : H_n(\mathbb{C}) \to \mathbb{R}$ that for each $a \in H_n(\mathbb{C})$, and in terms of the n × n unit matrix $1_n$, satisfies
$$V(a^2) = V(a)^2; \tag{6.3}$$
$$V(1_n) = 1. \tag{6.4}$$
That is, V is dispersion-free as well as normalized, respectively.

Theorem 6.2. For n ≥ 2, non-zero linear dispersion-free maps $V : H_n(\mathbb{C}) \to \mathbb{R}$ do not exist. In particular, linear non-contextual hidden variables do not exist.
Proof. Such maps extend to complex-linear dispersion-free maps $V : M_n(\mathbb{C}) \to \mathbb{C}$ by complex linearity, so that the theorem is equivalent to Proposition 2.10.
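As a numerical aside on why linearity is such a strong demand, the following sketch uses the Pauli matrices for n = 2. The choice of matrices is an assumption made purely for illustration and is not taken from the text. It relies on the fact, shown in Lemma 6.4 below for the quasi-linear case, that a dispersion-free value V(a) has to be an eigenvalue of a.

```python
import numpy as np

# Pauli matrices: an assumed illustration, not taken from the text.
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

def spectrum(a):
    """Sorted eigenvalues of a self-adjoint matrix."""
    return np.linalg.eigvalsh(a)

eig_x = spectrum(sigma_x)
eig_z = spectrum(sigma_z)
eig_sum = spectrum(sigma_x + sigma_z)

# Every value a linear map could give to sigma_x + sigma_z if it assigned
# an eigenvalue to sigma_x and an eigenvalue to sigma_z separately.
linear_candidates = {float(round(x + z, 12)) for x in eig_x for z in eig_z}

# Is any of these candidates an eigenvalue of the sum?
linear_assignment_possible = any(
    np.isclose(c, e) for c in linear_candidates for e in eig_sum
)
print(eig_sum, sorted(linear_candidates), linear_assignment_possible)
```

The two operators do not commute, so quasi-linearity (introduced next) places no constraint on this particular sum; only full linearity does.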
As von Neumann perfectly well understood himself, his seemingly natural linearity assumption (given the mathematical structure of quantum mechanics unearthed by none other than he!) is unwarranted physically (and even mathematically, since eigenvalues and eigenstates, which should be the hallmark of dispersion-free states, are by no means linear in the underlying operator). This suggests the following:

Definition 6.3. A map $V : H_n(\mathbb{C}) \to \mathbb{R}$ is called quasi-linear if for all s, t ∈ R and all $a, b \in H_n(\mathbb{C})$ that commute (i.e., ab = ba) one has
$$V(sa + tb) = sV(a) + tV(b). \tag{6.5}$$
As in the linear case, such a map uniquely extends to a map $V : M_n(\mathbb{C}) \to \mathbb{C}$ that is precisely a quasi-state in the sense of Definition 2.26. The following lemma will be useful, also showing that the above objections to linearity have been met.
Lemma 6.4. Let $V : H_n(\mathbb{C}) \to \mathbb{R}$ be a quasi-linear non-contextual hidden variable.
1. For each $a \in H_n(\mathbb{C})$, the number $\lambda = V(a)$ is an eigenvalue of a.
More generally, it follows from Theorem C.24 that if H is a Hilbert space and $V : B(H)_{\mathrm{sa}} \to \mathbb{R}$ is a quasi-linear non-contextual hidden variable (or, equivalently, its complexification $V_{\mathbb{C}} : B(H) \to \mathbb{C}$ is a dispersion-free quasi-state), then $V(a) \in \sigma(a)$ (provided $a^* = a$). This implies the above lemma, but we also provide a direct proof.
Proof. For any $b \in H_n(\mathbb{C})$ with ab = ba, eq. (6.3) and quasi-linearity imply that
$$V(ab) = V(a)V(b); \tag{6.6}$$
just evaluate $V((a \pm b)^2) = (V(a) \pm V(b))^2$. Taking $b = a^2$ etc. and also invoking (6.4) then yields $V(p(a)) = p(V(a))$ for any polynomial p. If $\lambda_i$ are the eigenvalues of a, its characteristic polynomial $p(a) = \prod_{i=1}^n (a - \lambda_i)$ satisfies p(a) = 0, so that V(p(a)) = 0 and hence p(V(a)) = 0, or $\prod_{i=1}^n (\lambda - \lambda_i) = 0$. This implies that $\lambda = \lambda_i$ for some i. The second claim is proved in a similar way.

This is the Kochen-Specker Theorem. It follows from Gleason's Theorem 2.28 and von Neumann's Theorem 6.2, since according to Corollary 2.29 to the former, quasi-states on $M_n(\mathbb{C})$ are actually states (in other words, quasi-linear non-contextual hidden variables are linear). However, Kochen and Specker also gave a direct proof of their theorem, subsequently somewhat simplified along the following lines.
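Proof. We prove the claim for n = 3, which (by restricting V to any self-adjoint subalgebra of $M_n(\mathbb{C})$ isomorphic to $H_3(\mathbb{C})$) implies the result for all n > 3 also. To prove Theorem 6.5 for n = 3, we interpret $H_3(\mathbb{C})$ as the algebra of observables of a spin-1 particle and introduce the well-known angular momentum matrices
$$J_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix}, \quad J_2 = \begin{pmatrix} 0 & 0 & i \\ 0 & 0 & 0 \\ -i & 0 & 0 \end{pmatrix}, \quad J_3 = \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{6.7}$$
In what follows, we will heavily use the squares
$$J_1^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad J_2^2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad J_3^2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \tag{6.8}$$
each of which has eigenvalues 0 and 1. The $J_i^2$ commute by inspection, and satisfy
$$J_1^2 + J_2^2 + J_3^2 = 2 \cdot 1_3. \tag{6.9}$$
The (matrix-valued) angular momentum vector is given by
$$J = \sum_{i=1}^3 J_i e_i, \tag{6.10}$$
where $(e_1, e_2, e_3)$ is the standard basis of $\mathbb{R}^3$ (seen as a vector space with the usual inner product $\langle \cdot, \cdot \rangle$), i.e., $e_1 = (1, 0, 0)$, etc., and the angular momentum $J_u$ along an arbitrary unit vector $u = \sum_i u_i e_i$ in $\mathbb{R}^3$ is given by
$$J_u = \langle J, u \rangle = \sum_{i=1}^3 u_i J_i. \tag{6.11}$$
This brings us to the crucial point: a map $V : H_3(\mathbb{C}) \to \mathbb{R}$ induces a map $\tilde{V} : S^2 \to \mathbb{R}$ on the set $S^2$ of all unit vectors u in $\mathbb{R}^3$, via
$$\tilde{V}(u) = V(J_u^2). \tag{6.12}$$
As usual, a basis of $\mathbb{R}^3$, denoted by $a = (u_1, u_2, u_3)$, is always assumed orthonormal.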
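The algebraic facts about the squared angular momentum matrices used in this section can be checked numerically. The sketch below assumes the convention $(J_k)_{lm} = -i\,\varepsilon_{klm}$ for the spin-1 matrices in the real basis $(e_1, e_2, e_3)$; this convention, the helper names and the random basis are assumptions made for illustration. The squares do not depend on the overall sign of the convention.

```python
import numpy as np

def spin1_matrices():
    """Return [J1, J2, J3] with (J_k)_{lm} = -i * epsilon_{klm} (assumed convention)."""
    eps = np.zeros((3, 3, 3))
    eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
    eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0
    return [-1j * eps[k] for k in range(3)]

def J_along(u, J):
    """Angular momentum along the vector u: sum_i u_i J_i."""
    return sum(u_i * J_i for u_i, J_i in zip(u, J))

def random_orthonormal_basis(rng):
    """Rows of an orthogonal 3x3 matrix form an orthonormal basis of R^3."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return q

J = spin1_matrices()
rng = np.random.default_rng(0)
basis = random_orthonormal_basis(rng)
squares = [J_along(u, J) @ J_along(u, J) for u in basis]

# Each square has eigenvalue 0 once and eigenvalue 1 twice.
eigenvalues_ok = all(
    np.allclose(np.linalg.eigvalsh(s), [0.0, 1.0, 1.0]) for s in squares
)
# For a fixed basis the three squares commute pairwise.
commute_ok = all(np.allclose(s @ t, t @ s) for s in squares for t in squares)
# The squares add up to twice the unit matrix.
sum_rule_ok = np.allclose(sum(squares), 2 * np.eye(3))
# Each square equals the unit matrix minus the projection onto u.
projector_ok = all(
    np.allclose(s, np.eye(3) - np.outer(u, u)) for s, u in zip(squares, basis)
)
print(eigenvalues_ok, commute_ok, sum_rule_ok, projector_ok)
```

A single random basis is of course no proof; the check only confirms that the stated relations hold for the assumed matrices.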
Proof. If $a = (u_1, u_2, u_3)$ is a basis, then $J_{u_i} = u J_i u^*$ for i = 1, 2, 3, where u is the 3 × 3 matrix with entries $u_{ij} = \langle u_i, e_j \rangle$. Since u is unitary, the matrices $J_{u_i}$ and their squares have the same eigenvalues and satisfy the same relations as the $J_i$ and their squares. Thus the eigenvalues of $J_{u_i}^2$ are 0 and 1, for fixed a the squares $J_{u_i}^2$ mutually commute, and they satisfy the sum rule (6.9), i.e., $J_{u_1}^2 + J_{u_2}^2 + J_{u_3}^2 = 2 \cdot 1_3$. The claim then follows from Definition 6.3 and Lemma 6.4.

Now define a coloring of $\mathbb{R}^3$ as any map $\tilde{V} : S^2 \to \{0, 1\}$ satisfying the two properties in Lemma 6.6. The proof of Theorem 6.5 then reduces to the following lemma.

Lemma 6.7. There exists no coloring of $\mathbb{R}^3$.
We will show that one cannot even color this particular finite set of vectors (let alone all unit vectors in $\mathbb{R}^3$). We denote a vector $u_i$ in a basis $a_\mu$ by $u_i^{(\mu)}$ and write e.g. $\tilde{V}(a_\mu) = (0, 1, 1)$ for the three conditions $\tilde{V}(u_1^{(\mu)}) = 0$, $\tilde{V}(u_2^{(\mu)}) = 1$, $\tilde{V}(u_3^{(\mu)}) = 1$. The main point is that if some coloring $\tilde{V}$ maps a specific vector u to 0, then all vectors orthogonal to u must go to 1. In particular, two orthogonal vectors can never both be sent to 0. To find a contradiction (to the assumption that $\tilde{V}$ exists), we try to assign values $\tilde{V}(u_i^{(\mu)})$ one after the other, starting in row 1. Here some specific choices will be made, but by symmetry other choices lead to similar contradictions.
Corollary 6.8. There is no function $\tilde{V}$ with the two properties stated in Lemma 6.6.
The Kochen-Specker Theorem is often stated in the following way.

Proof. For $H = \mathbb{C}^3$, the existence of W would yield the existence of $\tilde{V}$ through (6.18) where $u \in \mathbb{R}^3$ is regarded as a vector in $\mathbb{C}^3$. Property 1 in Lemma 6.6 is obviously satisfied. To prove property 2, we note that for any unit vector $u \in \mathbb{R}^3 \subset \mathbb{C}^3$, we have
$$J_u^2 = 1_3 - |u\rangle\langle u|, \tag{6.19}$$
since an explicit computation based on (6.11) shows that, with $u = (u_1, u_2, u_3)$,
$$J_u^2 = \begin{pmatrix} u_2^2 + u_3^2 & -u_1 u_2 & -u_1 u_3 \\ -u_1 u_2 & u_1^2 + u_3^2 & -u_2 u_3 \\ -u_1 u_3 & -u_2 u_3 & u_1^2 + u_2^2 \end{pmatrix}.$$
It follows from rotation invariance that the eigenvalues of $J_u^2$ are the same as those of each $J_i^2$, cf. (6.8), i.e., λ = 0 with multiplicity one and λ = 1 with multiplicity two. Hence (6.19) gives the projection $e_0$ onto the eigenspace of $J_u^2$ for λ = 0 as $e_0 = |u\rangle\langle u|$. Property 2 in Lemma 6.6 then follows from the assumption that W is a coloring. Since $\tilde{V}$ cannot exist by Lemma 6.7, neither can W. This proves the claim for $\mathbb{C}^3$.

We finish by induction. Suppose $\mathbb{C}^n$ contains some set $\{u_k\}_{k \in K}$ of unit vectors that cannot be colored, assuming that $u_0 = (1, 0, \ldots, 0)$ lies in this set. We embed each $u_k$ into $\mathbb{C}^{n+1}$ by adding a zero at the end, calling the image $u'_k$. Adding $v = (0, \ldots, 0, 1)$, the only possible coloring of the set $\{u'_k, v\}_{k \in K}$ in $\mathbb{C}^{n+1}$ is given by $W(u'_k) = 0$ for each k ∈ K and W(v) = 1. Indeed, if $W(u'_{k_0}) = 1$ for some $k_0$, then, since v is orthogonal to each $u'_k$, we must have W(v) = 0, which means that the original set $\{u_k\}_{k \in K}$ should be colorable in $\mathbb{C}^n$, but this is impossible by assumption.

We now embed each $u_k$ into $\mathbb{C}^{n+1}$ by adding a zero at the beginning, denoting its image by $u''_k$, and add $u'_0 = (1, 0, \ldots, 0, 0)$. By the same token, the only coloring of the set $\{u''_k, u'_0\}_{k \in K}$ is given by $W(u''_k) = 0$ for each k ∈ K and $W(u'_0) = 1$. But this contradicts the value $W(u'_0) = 0$ found in the previous step, so that the combined set cannot be colored.

The set thus obtained is larger than necessary. For example, already for $H = \mathbb{C}^4$ the following bases cannot be colored (again writing down unnormalized vectors): The proof is the following observation: if we present the coloring condition as $W(u_1) + W(u_2) + W(u_3) + W(u_4) = 1$ for each basis $(u_1, u_2, u_3, u_4)$, then since there are nine such equations the sum of the right-hand sides is odd, whereas the sum of the left-hand sides is even, since each vector appears twice.
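The parity argument can be illustrated by brute force. The list of nine bases below is an assumption: it is a well-known configuration of 18 unnormalized vectors in four dimensions (usually attributed to Cabello, Estebaranz and García-Alcaine), supplied here only because it matches the description above (nine bases, each vector occurring in exactly two of them). The sketch checks those two properties and then counts the assignments W with values in {0, 1} that give exactly one vector per basis the value 1.

```python
from itertools import product

# Nine orthogonal bases built from 18 unnormalized vectors. This particular
# list is an assumption, supplied for illustration only.
BASES = [
    [(0, 0, 0, 1), (0, 0, 1, 0), (1, 1, 0, 0), (1, -1, 0, 0)],
    [(0, 0, 0, 1), (0, 1, 0, 0), (1, 0, 1, 0), (1, 0, -1, 0)],
    [(1, -1, 1, -1), (1, -1, -1, 1), (1, 1, 0, 0), (0, 0, 1, 1)],
    [(1, -1, 1, -1), (1, 1, 1, 1), (1, 0, -1, 0), (0, 1, 0, -1)],
    [(0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 1), (1, 0, 0, -1)],
    [(1, -1, -1, 1), (1, 1, 1, 1), (1, 0, 0, -1), (0, 1, -1, 0)],
    [(1, 1, -1, 1), (1, 1, 1, -1), (1, -1, 0, 0), (0, 0, 1, 1)],
    [(1, 1, -1, 1), (-1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 0, -1)],
    [(1, 1, 1, -1), (-1, 1, 1, 1), (1, 0, 0, 1), (0, 1, -1, 0)],
]

def dot(u, v):
    """Euclidean inner product of two integer vectors."""
    return sum(x * y for x, y in zip(u, v))

vectors = sorted({v for basis in BASES for v in basis})
index = {v: i for i, v in enumerate(vectors)}

# Every basis consists of mutually orthogonal vectors.
all_orthogonal = all(
    dot(u, v) == 0
    for basis in BASES
    for i, u in enumerate(basis)
    for v in basis[i + 1:]
)
# Number of bases in which each vector occurs.
occurrences = [sum(v in basis for basis in BASES) for v in vectors]

# Brute force over all 2^18 assignments W: vectors -> {0, 1}, keeping those
# for which exactly one vector in every basis receives the value 1.
basis_indices = [[index[v] for v in basis] for basis in BASES]
colorings = sum(
    1
    for w in product((0, 1), repeat=len(vectors))
    if all(sum(w[i] for i in idx) == 1 for idx in basis_indices)
)
print(len(vectors), set(occurrences), all_orthogonal, colorings)
```

The exhaustive search is redundant once each vector is known to occur twice, since the parity count already excludes any coloring; it is included only as an independent check.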
To bridge the gap between the Kochen-Specker Theorem and the Free Will Theorem, as well as the one between mathematics and physics, we now rephrase the former as a "mini FWT". We build an experiment consisting of a box containing a spin-1 particle and a device capable of measuring all of the three observables for an arbitrary basis a of R 3 ; since the operators in question commute, this simultaneous measurement is allowed by quantum theory. The choice of a is called the setting of the experiment, traditionally denoted by A (in honor of Alice, who is supposed to perform the experiment), with possible values A = a. In "phenomenological" notation, the observable measured in an experiment like this is called F, which in the case at hand has three components F = (F 1 , F 2 , F 3 ): given the setting a, the observable F i corresponds to J 2 u i . The notation F = λ for λ = (λ 1 , λ 2 , λ 3 ), i.e., F i = λ i , then expresses the fact that the outcome of a measurement of F is λ .
Otherwise, quantum mechanics and our quasi-linear non-contextual hidden variable theory provide a different picture of the experiment. According to the former theory, a given spin-1 particle may be prepared in a (pure) quantum state ψ, which is a unit vector in C 3 . Quantum theory then merely predicts probabilities for the possible outcomes λ , which according to the Born rule (2.21) are given by So if ψ = u i , then the outcome will be λ = λ (i) with probability one, but in a superposition ψ = ∑ i c i u i (with ∑ i |c i | 2 = 1), quantum theory predicts a random sequence of outcomes λ (i) , each with probability |c i | 2 . Let us note that quantum mechanics is non-contextual in the following (probabilistic) sense. Alice could decide to perform just one measurement instead of three, say F 1 , with setting a 1 = u 1 , or perhaps she may not know if the other two are performed. Fortunately, this does not matter, since for any unit vector ψ ∈ C 3 , (6.25) so that according to quantum mechanics, it does not matter for the Born probabilities of the first measurement if the other two are performed or not.
The question now arises if some quasi-linear non-contextual hidden variable theory could improve on this, in that the probabilities quantum theory assigns to various outcomes are replaced by predictions. In the spirit of determinism (whilst avoiding the appearance of circularity), such a theory should also predict the settings of the experiment. Accordingly, the assumptions leading to our "mini FWT" are:

Definition 6.11. In the context of the experiment on spin-1 particles just discussed:
• Determinism firstly means that there is a state space X with associated functions
$$A : X \to X_A; \tag{6.26}$$
$$F : X \to \Lambda, \tag{6.27}$$
where $X_A$ is the set of all bases in $\mathbb{R}^3$ (i.e. $a \in X_A$), and Λ is some set of possible outcomes; these functions completely describe the experiment in the sense that each state x ∈ X determines both its settings a = A(x) and its outcome λ = F(x). We write $A = (A_1, A_2, A_3)$, where the functions $A_i : X \to S^2$ (seen as the space of unit vectors in $\mathbb{R}^3$) combine to define a basis, and $F = (F_1, F_2, F_3)$. Secondly, there exists some set $X_Z$ and an additional function
$$Z : X \to X_Z, \tag{6.28}$$
such that
$$F = F(A, Z). \tag{6.29}$$
More precisely, for each x ∈ X one has
$$F(x) = \hat{F}(A(x), Z(x)) \tag{6.30}$$
for a certain function $\hat{F} : X_A \times X_Z \to \Lambda$. In terms of (6.28), then:
• Nature then requires that Λ is given by (6.22) (so that $F_i : X \to 2$).
• Freedom states that A and Z are independent in the sense that the function
$$A \times Z : X \to X_A \times X_Z, \quad x \mapsto (A(x), Z(x)) \tag{6.31}$$
is surjective; in other words, for each $(a, z) \in X_A \times X_Z$ there is an x ∈ X for which A(x) = a and Z(x) = z (making a and z free variables).
• Non-contextuality (cf. Lemma 6.6) finally stipulates that $\hat{F}$ take the form
$$\hat{F}((u_1, u_2, u_3), z) = (\tilde{F}(u_1, z), \tilde{F}(u_2, z), \tilde{F}(u_3, z)) \tag{6.32}$$
for a single function $\tilde{F} : S^2 \times X_Z \to 2$ that also satisfies
$$\tilde{F}(-u, z) = \tilde{F}(u, z). \tag{6.33}$$
"Nature" may be taken to be either an experimental result or an uncontroversial prediction of (some corner of) quantum mechanics. The function Z (including its domain $X_Z$) describes anything relevant to the experiment (such as the behaviour of the particle) except the variables determining the settings (which do form part of X). The goal of the freedom assumption is to remove any potential dependencies between the variables (a, z), and hence between the physical system Alice performs her measurements on, and the devices she performs her measurements with.
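The Freedom assumption is a plain surjectivity condition, which a finite toy model makes concrete. Everything in the sketch below (the two settings, the two values of the remaining variable, and both state spaces) is hypothetical and serves only to show what the condition allows and what it excludes.

```python
from itertools import product

def is_free(states, A, Z, settings, hidden_values):
    """Freedom: x -> (A(x), Z(x)) reaches every pair in settings x hidden_values."""
    reached = {(A(x), Z(x)) for x in states}
    return reached == set(product(settings, hidden_values))

# Hypothetical toy model: two settings and two values of the remaining variable.
settings = ["a", "a_prime"]
hidden_values = [0, 1]

# States are (setting, hidden value) pairs, so A and Z are coordinate projections.
def A(x):
    return x[0]

def Z(x):
    return x[1]

# State space in which every combination of setting and hidden value occurs.
free_states = list(product(settings, hidden_values))

# State space in which the setting is tied to the hidden value.
correlated_states = [("a", 0), ("a_prime", 1)]

free_ok = is_free(free_states, A, Z, settings, hidden_values)
correlated_ok = is_free(correlated_states, A, Z, settings, hidden_values)
print(free_ok, correlated_ok)
```

In the second state space the value of Z already fixes the setting, which is exactly the kind of dependency the Freedom assumption is meant to exclude.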
Proof. For each $z \in X_Z$, define a function $\tilde{V}_z : S^2 \to 2$ by $\tilde{V}_z(u) = \tilde{F}(u, z)$. The assumptions combine to give $\tilde{V}_z$ the same properties as $\tilde{V}$ in Lemma 6.6 (where z "goes along for a free ride"). According to Corollary 6.8 (which applies because by Freedom one can freely vary a for any given z), the function $\tilde{V}_z$ cannot exist.
This "mini FWT" is a good exercise for the Free Will Theorem in the next section. For example, let us note, as a warning, that if Determinism is seen as the culprit (and hence falls), then the other assumptions in the (mini) FWT are no longer defined. This blocks a direct inference from Freedom to Indeterminism à la Conway & Kochen.

The Free Will Theorem
The Free Will Theorem is similar in spirit to Corollary 6.12, with the difference that the experiment now has two wings and the non-contextuality assumption is replaced by a certain locality condition. This condition relates to the setting introduced by Einstein, Podolsky, and Rosen in 1935 and further studied by Bohm, Bell, and others, in which (in current jargon) two physicists, called Alice and Bob, are far apart whilst performing simultaneous experiments on some correlated two-particle state (technically speaking, their measurements need to be spacelike separated). In the situation considered by EPR each particle had a spatial degree of freedom and hence required the infinite-dimensional Hilbert space L 2 (R 3 ) for its description, but, as recognized by Bohm, the thrust of the argument comes out more clearly if each particle merely has an internal degree of freedom (and is "frozen" otherwise).
Bell (1964) considered a pair of spin-1/2 particles (cf. §6.5), each of which has Hilbert space $\mathbb{C}^2$ (although the famous experiments of Aspect testing the violation of Bell's inequalities used photons, which have the "same" Hilbert space), but because of its reliance on the Kochen-Specker Theorem (which fails for $\mathbb{C}^2$) the Free Will Theorem requires one dimension more, i.e., $H = \mathbb{C}^3$. As before, we see this as the state space of a massive spin-1 particle. The price of this extra dimension is that the pertinent experiment whose outcome provides the Nature input for the Free Will Theorem has not actually been performed, but, as in the Bell case, the predictions of quantum mechanics are uncontroversial and will serve as input instead.
These predictions are as follows. Alice and Bob measure on the correlated state ψ 0 = (e 1 ⊗ e 1 + e 2 ⊗ e 2 + e 3 ⊗ e 3 )/ √ 3, (6.34) where we recall that (e 1 , e 2 , e 3 ) is the standard basis of R 3 , now seen as a basis of C 3 . This state is rotation-invariant, which means that nonzero angular momentum in one particle must be compensated for in the other, creating the desired correlations. As before, we denote Alice's setting by A = a, which remains the choice of some basis of R 3 , but this time also Bob picks some basis b, so that we write B = b for his choice. Similar to Alice's outcome F = λ we denote Bob's by G = γ, and quantum mechanics provides all (Born) probabilities which are well defined because Alice's squared angular momentum operators J 2 u 1 commute with Bob's J 2 v 1 as a consequence of Einstein locality (stating that spacelike separated observables commute). Note that similarly to a = (u 1 , u 2 , u 3 ) for Alice's basis, we write b = (v 1 , v 2 , v 3 ) for Bob's. If Alice merely measures F i whilst Bob measures G j , then, as in the previous section, it does not matter which other (commuting) operators are measured and/or whether Alice and Bob know about this, cf. (6.25). Thus we may write either (A = a, B = b) or A i = u i , B i = v i for the settings, and simple calculations show that the Born probabilities are given by: , since the vectors are real, In terms of the notation (6.40) this yields The crucial point for the Free Will Theorem is that this implies perfect correlation: (6.43) in agreement with the intuition about angular momentum expressed earlier.
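The Born probabilities for the state (6.34) can be computed directly. The sketch below rests on one assumption that is consistent with the eigenvalue structure described earlier but is not quoted from the text: for a real unit vector u, the outcome 0 of $J_u^2$ corresponds to the projection onto u and the outcome 1 to its complement. The function names and the sample vectors are likewise chosen here for illustration.

```python
import numpy as np

def projector(u):
    """Projection onto the real unit vector u (assumed eigenvalue-0 projection of J_u^2)."""
    u = np.asarray(u, dtype=float)
    return np.outer(u, u)

def joint_probabilities(u, v):
    """Born probabilities P(F = f, G = g), f, g in {0, 1}, in the state psi_0."""
    # psi_0 = (e1 (x) e1 + e2 (x) e2 + e3 (x) e3) / sqrt(3), in Kronecker ordering.
    psi0 = np.eye(3).reshape(9) / np.sqrt(3)
    spectral_u = {0: projector(u), 1: np.eye(3) - projector(u)}
    spectral_v = {0: projector(v), 1: np.eye(3) - projector(v)}
    return {
        (f, g): float(psi0 @ np.kron(spectral_u[f], spectral_v[g]) @ psi0)
        for f in (0, 1)
        for g in (0, 1)
    }

u = np.array([1.0, 0.0, 0.0])
theta = 0.3
v = np.array([np.cos(theta), np.sin(theta), 0.0])

p_same = joint_probabilities(u, u)    # Alice and Bob measure along the same vector
p_tilted = joint_probabilities(u, v)  # Bob's vector is rotated by theta
c2 = float(np.dot(u, v) ** 2)         # squared inner product <u, v>^2

print(p_same)
print(p_tilted, c2)
```

When both vectors coincide, the probabilities of unequal outcomes vanish, which is the perfect correlation used in the Free Will Theorem; for general real u and v the four values depend only on $\langle u, v \rangle^2$.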
We now move to a (possibly counterfactual) deterministic description of this experiment along the lines of the previous section. It is straightforward to adapt all of Definition 6.11 except Non-contextuality (which after all is the assumption we would like to get rid of!). With the obvious changes, we obtain: • Determinism again first claims there is a state space X with associated functions A : X → X A ; (6.44) B : X → X B ; (6.45) F : X → Λ ; (6.46) G : X → Λ , (6.47) where X A = X B is the set of all bases in R 3 , and Λ is some set of possible outcomes, which completely describe the experiment in the sense that each state x ∈ X determines both its settings (a = A(x), b = B(x)) and its outcome where the functions A i : X → S 2 (where S 2 is seen as the space of unit vectors in R 3 ) combine to define a basis (similarly for B j : X → S 2 ), and F = (F 1 , F 2 , F 3 ). Secondly, there exists some set X Z and an additional function Z : X → X Z such that (6.49) in that for each x ∈ X one has the functional relationships (6.51) for certain functionsF : is just the traditional "hidden variable" (which is often denoted by λ ). • Freedom then states that A, B, and Z are independent in that for each (a, b, z -Λ is given by (6.22), i.e. F i and G j , and henceF i andĜ j take values in {0, 1}; -The experiment measures squares of angular momenta, so that We now come to the locality condition that is to replace Non-contextuality. This condition was first clearly stated by Bell (1964, p. 196), who attributes it to Einstein: 'The vital assumption is that the result G for particle 2 does not depend on the setting a of the magnet for particle 1, nor F on b.' Noting various other notions of locality (such as Einstein locality in local quantum physics, which requires spacelike separated operators to commute, or Bell locality, discussed below), the above idea might be called Context locality, but we will simply refer to it as Locality. 
In our deterministic setting, a precise formulation is this: In other words, we have F = F(A, Z) and G = G(B, Z), so that (with slight abuse of notation)F : This finally brings us to (our reformulation of) the Free Will Theorem: Theorem 6.13. Determinism, Freedom, Nature, and Locality are contradictory.
Proof. The Freedom assumption allows us to treat (a, b, z) as free variables, a fact that will tacitly be used all the time. First, taking i = j in (6.54) shows that ). Consequently, the functionF : X A × X Z → X F is given by (6.32). We are now back to the proof of Corollary 6.12, concluding that such a function does not exist by Corollary 6.8.

Philosophical intermezzo: Free will in the Free Will Theorem
'The determinism-free will controversy has all of the earmarks of a dead problem. The positions are well staked out and the opponents manning them stare at each other in mutual incomprehension.' (Earman, 1986, p. 235)

The question arises which specific notion of free will is among the assumptions of the FWT (in the reformulation just given). To put this question in perspective, let us briefly recall the main point of the debate about free will. This concept has two poles. One is the "will" itself, requiring a sense of agency, deliberation, and control. This pole seems to require some form of determinism. A powerful expression is: 'Fürst! Was Sie sind, sind Sie durch Zufall und Geburt. Was ich bin, bin ich durch mich.' ('Prince! What you are, you are by accident and birth. What I am, I am through myself.') (Beethoven, to his benefactor (!) Prince Lichnowsky)

The other pole of free will is the adjective "free", i.e., the ability to do otherwise, which at first sight requires indeterminism. The problem of free will is that these poles seem contradictory.

Many authors conflate free will with moral responsibility: 'free will can be defined as the unique ability of persons to exercise control over their conduct in the manner necessary for moral responsibility.' (McKenna & Coates, 2015) This aspect is irrelevant to our discussion, concerned as it is with the question what it would mean for Alice and Bob to choose their settings "freely" if determinism is assumed (it would have been different if one setting launched a nuclear missile). Even in our narrow context, the traditional philosophical stances are relevant:
• Compatibilism denies the contradiction, claiming that free will and determinism coexist. This position may be defended in many ways, among which one finds:
- Reconceptualizing "the ability to do otherwise" in a deterministic world. This will be our focus in what follows, especially in a version inspired by Lewis.
- Belittling the relevance of "the ability to do otherwise", as e.g. by Dennett: 'So if anyone at all is interested in the question of whether one could have done otherwise in exactly the same circumstances (and internal state) this will have to be a particularly pure metaphysical curiosity-that is to say, a curiosity so pure as to be utterly lacking in any ulterior motive, since the answer could not conceivably make any noticeable difference to the way the world went.' (Dennett, 1984, p. 559).
• Incompatibilism accepts the contradiction, once again branching off into:
- Libertarianism, arguing that free will requires an indeterministic world.
- Hard determinism, claiming determinism (which is assumed) blocks free will: 'Ein Mensch kann zwar tun was er will, aber nicht wollen was er will.' ('A man can do what he wants, but he cannot want what he wants.') (Schopenhauer)
- Hard incompatibilism, asserting that 'every way you look at it you lose': free will makes no sense in either a deterministic or an indeterministic world.
Although hard incompatibilism has our sympathy, our opening question concerning the notion of free will in the FWT drives us into the compatibilist direction, since determinism is among the assumptions shown to be contradictory by Theorem 6.13. Within compatibilism, we will be close to the well-known 'local miracle' variant thereof proposed by the philosopher David Lewis. Like other compatibilists before him (starting at least with G.E. Moore), Lewis attempts to make sense of the intuition that even in a deterministic world one in principle has the ability to act differently from the way one actually does, despite the fact that the latter was predetermined. A simple example is Alice's choosing setting a by moving her hand in a certain way, although she was able to choose a′. On the other hand, she could not have moved her hand with a speed greater than that of light, so her ability remains constrained by the laws of nature. Lewis asks us to distinguish between:
• 'I am able to do something such that, if I did it, a law would be broken.'
• 'I am able to break a law.'
The latter is impossible, but the former is not on Lewis's own theory of counterfactuals, according to which the phrase 'if I did it' leads us to consider the possible world in which doing 'something' is actually true, whilst in the possible worlds under consideration as many other features as possible are kept the same as in the actual world (the precise underlying measure of similarity is not important here). Thus the phrase 'a law would be broken' refers to the laws of the actual world (in which the alternative action is not realized). It seems to be of great importance to Lewis that in the first case it is not the agent who would break a law; instead, it is the breaking of some law of our actual world at an earlier time that enables the subject to do in an alternative possible world what she could not do in our actual world.
By making this distinction, Lewis claims that he invalidates the seemingly lethal Consequence Argument against compatibilist free will, of which a simple version reads (assuming determinism, on which compatibilist free will is predicated): 1. Alice's actions are a necessary consequence of the laws of nature plus the state of the universe (or the relevant part thereof) at any earlier time; 2. Alice is unable to render both (laws and earlier states) false; 3. Alice is unable to render the consequences of laws and earlier states false; 4. Ergo: Alice is unable to do otherwise than what she actually does.
Lewis claims that statement 3 is ambiguous, in that it fails to distinguish between the two senses in his two bullet points above. The Consequence Argument requires the latter (which is false), whereas this argument itself is unsound on the former (which is true). This disambiguation of assumption 3 in the Consequence Argument, then, is supposed to save (compatibilist) free will. However, a considerable philosophical literature suggests that the tension between Lewis's denying the second bullet point whilst accepting the first is pretty uncomfortable, reflecting the corresponding tension between the conjunction of determinism and freedom in general; indeed, this is what the FWT makes precise! Let us first point out that, at least in his terminology, Lewis fails to make a clear distinction between laws of nature and initial states; from the point of view of modern physics, this distinction is absolutely fundamental (although it may disappear in post-modern physics based on e.g. quantum gravity).
Lewis's examples of law-breaking events in our actual world typically refer to violations of some law of nature (like exceeding the speed of light), whereas the (alleged) law-breaking in his counterfactuals, such as choosing $a'$ (where in fact Alice did not do so), amounts to a change in some earlier state. Thus it might have been more appropriate if the paper in which Lewis laid out his version of compatibilism had been entitled 'Are we free to change the states?' instead of 'Are we free to break the laws?'. On this revision, his distinction of the two cases takes the following form:
• I am able to do something such that, if I did it, the state of the actual world at some earlier time would have been different.
• I am able to change the actual state of the world.
The latter remains impossible, while it is the former that enables free will. Applied to Alice, the former should mean (still in the compatibilist spirit of Lewis):
• A slight alteration in the state of the actual world (which would have made it a different but very similar world according to Lewis) would have led Alice to do something (such as choosing $a'$) that she did not do in the actual world (because according to determinism its actual state at any earlier time, as opposed to the counterfactual alternative state in the discussion, led her to choose $a$).
We now make this revised version of Lewis's local miracle compatibilism mathematically precise, in a way that has the additional advantage of involving not only "the ability to do otherwise", but also the other component of free will, i.e. agency. Here the intuition is that free will involves a separation between the agent, Alice (who is to exercise it), and the rest of the world, under whose influence she acts. Namely, as in the FWT, let $X$ be the state space of the Universe, and let
$$a = A(x) \qquad (6.55)$$
again be Alice's setting, where $A : X \to X_A$, as before. We now assume that $a$ is determined by her "inner state" $I$ as well as the "outer state" $O$ of the rest of the world, under whose influence she acts. These, in turn, are determined by the state $x \in X$ of the world. That is, $A = \hat{A}(O, I)$, which expresses the existence of functions
$$O : X \to X_O, \qquad I : X \to X_I, \qquad \hat{A} : X_O \times X_I \to X_A,$$
where $X_O$ and $X_I$ are certain sets, such that for each $x \in X$ one has
$$A(x) = \hat{A}(O(x), I(x)). \qquad (6.59)$$
In other words, for some given state $x$ of the world we have
$$o = O(x); \qquad (6.60)$$
$$i = I(x); \qquad (6.61)$$
$$a = \hat{A}(o, i). \qquad (6.62)$$
Note that, in the spirit of Conway and Kochen, in the above analysis Alice (whose free choice they after all believe to be ultimately a consequence of the free choice of elementary particles) now plays the role of the spin-1 particles in the bipartite experiment. Thus the analogy is between the triples
$$(a, z, \lambda) \quad \text{and} \quad (o, i, a).$$
• The first triple is defined in the experimental context of the FWT, where $a$ is the setting of Alice's wing of the experiment (which from the perspective of the spin-1 particle plays the role of the outer state of the world), $z$ is the inner state of the particle, and $\lambda$ is the outcome of Alice's measurement.
• The second pertains to the analysis of Alice's "free" choice of the setting of her experiment, where $o$ is the outer state of the world, $i$ is her inner state, and $a$ is her actual setting, given $x \in X$ and hence $(o, i) = (O(x), I(x))$.
Beyond Determinism, which is expressed by the above framework, our fundamental assumption underpinning compatibilist free will is Freedom, defined exactly as in the FWT: $O$ and $I$ are independent in that the function
$$X \to X_O \times X_I, \qquad x \mapsto (O(x), I(x)),$$
is surjective, i.e., for each pair $(o, i) \in X_O \times X_I$ there is $x \in X$ for which (6.60) and (6.61) hold. Rephrasing our earlier analysis in this elementary mathematical language, Lewis wants to make sense of the idea that although Alice's choice (6.62) at some fixed time $t$ was determined by the state $x$ of the Universe at that time through (6.60)-(6.61), or, equivalently, through (6.59), and hence (and this is the whole point of the Consequence Argument Lewis challenges) by any earlier state $x_p$ of the Universe at time $t_p$, nonetheless Alice was "able to act otherwise" at time $t$, e.g. in choosing a different setting $a' \neq a$ arising from a counterfactual pair $(o', i')$, with $o' = o$ making the environment in which Alice acts the same as in the actual world, but where $i'$ should be close to $i$ in some appropriate sense (such as a slight change in the state of Alice's brain), such that (6.66) holds, with $o' = o$ as required by (6.67).
The point, then, is that according to our Freedom assumption, there indeed is such a nearby state $x'$, for any given $i'$ and $(o, i)$. Thus the freedom Alice has is precisely what we have formalized as Freedom: even given the state $o$ of the causal influences on her behaviour (and possibly even the entire state of the rest of the world), there is a different admissible state $x'$ of the world such that, had this state been actual, she would have chosen $a'$ (although she in fact, necessarily, picked $a$).
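The Freedom assumption can be illustrated in a finite toy model. The following Python sketch is not part of the original argument: the sets `X_O` and `X_I`, the state labels, and the rule `A_hat` are hypothetical choices made purely for illustration. It checks that the map sending a world state x to the pair (O(x), I(x)) is surjective, and lists the counterfactual states that share Alice's outer state but lead to a different setting.

```python
from itertools import product

# Hypothetical finite toy model; every name below is an illustrative assumption.
X_O = ["calm", "noisy"]        # outer states o of the rest of the world
X_I = ["i0", "i1", "i2"]       # inner states i of Alice
X = list(product(X_O, X_I))    # world states x, here simply all pairs (o, i)

def O(x):
    """Outer state determined by the world state x."""
    return x[0]

def I(x):
    """Inner state determined by the world state x."""
    return x[1]

def A_hat(o, i):
    """Deterministic rule fixing Alice's setting from the pair (o, i)."""
    return "a" if i == "i0" else "a_prime"

def A(x):
    """Alice's setting as a function of the world state (Determinism)."""
    return A_hat(O(x), I(x))

def freedom_holds(states):
    """Freedom: the map x -> (O(x), I(x)) reaches every pair in X_O x X_I."""
    reached = {(O(x), I(x)) for x in states}
    return reached == set(product(X_O, X_I))

def counterfactual_states(x, states):
    """States x' with the same outer state as x but a different setting."""
    return [y for y in states if O(y) == O(x) and A(y) != A(x)]

actual = ("calm", "i0")
# A restricted state space in which a calm world forces the inner state i0.
restricted = [x for x in X if not (O(x) == "calm" and I(x) != "i0")]

print(freedom_holds(X), counterfactual_states(actual, X))
print(freedom_holds(restricted), counterfactual_states(actual, restricted))
```

On the full state space Freedom holds and counterfactual states with the same environment exist; on the restricted space the outer state already fixes Alice's choice, so no such state is available.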
It should be clear now that at least in the context of the Free Will Theorem, our precise technical formulation of all assumptions implies that the freedom Alice and Bob have in choosing their settings is an instance of the local miracle compatibilist form of free will proposed by Lewis (1981), at least if one accepts our reformulation thereof. The theorem then establishes a contradiction between: • the physics assumptions, i.e., Nature, and Locality; • the compatibilist free will assumption, i.e., Determinism and Freedom.
Accepting the former, the latter must fall. Making this choice, one should realize that the physics assumptions on the one hand just form a small corner of modern physics (from which point of view they are weak), but on the other hand have singled out the corner in which the two fundamental theories of quantum mechanics and special relativity meet and are brought to a head (from which perspective they are strong).
The challenge their theorem puts to compatibilism was recognized by Conway & Kochen (2009), who write: 'The tension between human free will and physical determinism has a long history. Long ago, Lucretius made his otherwise deterministic particles swerve unpredictably to allow for free will. It was largely the great success of deterministic classical physics that led to the adoption of determinism by so many philosophers and scientists, particularly those in fields remote from current physics. (This remark also applies to "compatibilism", a now unnecessary attempt to allow for human free will in a deterministic world.)' This quotation does not use a precise version of compatibilism, but, as Conway explains elsewhere, what they mean is that compatibilism in whatever form was a desperate pre-twentieth-century attempt to save the notion of free will for e.g. Christianity in the face of the physics of the time, which assumed that the universe was a mechanical clockwork. Such attempts, then, would no longer be necessary if the world is, in fact, indeterministic (as Conway and Kochen claim to have at last proved). Our reformulation of their theorem (which removes the threat of circularity) gives a more subtle picture: the FWT uses modern physics to challenge one particular version of compatibilist free will. As such, it only provides indirect support for libertarian free will, namely by weakening one of its competitors.
To close this philosophical intermezzo, let us note that determinism is seen as a property of theories. Since it is the job of a deterministic theory to predict the outcome of any experiment, whether or not it is performed, this obviates the need for assumptions like counterfactuality in the sense that 'unperformed experiments have results' (which was famously denied by Asher Peres). Such controversial notions of counterfactuality have effectively been replaced by the considerably more refined modal counterfactuality of Lewis (at least in our slight reformulation thereof).

Technical intermezzo: The GHZ-Theorem
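The essence of the proof of the Free Will Theorem lies in the argument that perfect correlation together with context-locality implies non-contextuality. Remarkably, context-locality is at the same time a special case of non-contextuality, as the following example illustrates. We take $H = \mathbb{C}^2 \otimes \mathbb{C}^2$, equipped with the Bell basis $(\upsilon_0, \upsilon_1, \upsilon_2, \upsilon_3)$, for which we use the physicists' notation. Of course, $\mathbb{C}^2 \otimes \mathbb{C}^2 \cong \mathbb{C}^4$ contains the spin-1 Hilbert space $\mathbb{C}^3$ of the Kochen-Specker Theorem as the subspace orthogonal to the vector $\upsilon_0$. Thus we identify $\mathbb{C}^3$ with the subspace $\tilde{\mathbb{C}}^3$ of $\mathbb{C}^4$ spanned by the basis vectors $\upsilon_1, \upsilon_2, \upsilon_3$. The operators
$$\tilde{J}_u = \tfrac{1}{2}\left(\sigma_u \otimes 1_2 + 1_2 \otimes \sigma_u\right),$$
where $u \in \mathbb{R}^3$ is a unit vector as before, and
$$\sigma_u = \sum_{i=1}^{3} u_i \sigma_i \qquad (6.77)$$
in terms of the Pauli matrices $\sigma_i$, map $\upsilon_0$ to zero and leave its orthogonal complement $\tilde{\mathbb{C}}^3$ stable. Elementary group theory or direct calculation then shows that the operator $J_u$ on $\mathbb{C}^3$ in (6.11) is (unitarily) equivalent to the operator $\tilde{J}_u$ on $\tilde{\mathbb{C}}^3$. Since
$$\tilde{J}_u^2 = \tfrac{1}{2}\left(1_2 \otimes 1_2 + \sigma_u \otimes \sigma_u\right),$$
the Kochen-Specker argument can be rephrased in terms of the operators $\sigma_u \otimes \sigma_u$. In particular, for each frame $a = (u_1, u_2, u_3)$, the three operators
$$\sigma_{u_1} \otimes \sigma_{u_1}, \qquad \sigma_{u_2} \otimes \sigma_{u_2}, \qquad \sigma_{u_3} \otimes \sigma_{u_3} \qquad (6.79)$$
commute, they each square to one, and their joint eigenvalues are one of the triples
$$(-1, -1, -1), \qquad (1, 1, -1), \qquad (1, -1, 1), \qquad (-1, 1, 1).$$
The eigenvector corresponding to the first one is $\upsilon_0$, and hence the others must lie in $\tilde{\mathbb{C}}^3$. Hence by Lemma 6.4 any quasi-linear non-contextual hidden variable must also assign these values, which by Lemma 6.7 is impossible for arbitrary bases.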
The key mathematical property of the three operators (6.79) is that they commute, and together with the unit $1_2 \otimes 1_2$ form a maximal set of commuting self-adjoint matrices on $\mathbb{C}^4$. But other such sets could have been chosen by Alice (under whose sole control the situation so far has been assumed to be), such as a triple of the kind
$$\sigma_u \otimes 1_2, \qquad 1_2 \otimes \sigma_v, \qquad \sigma_u \otimes \sigma_v,$$
where $u$ and $v$ are arbitrary unit vectors in $\mathbb{R}^3$. Since the third operator is the product of the first two, the joint eigenvalues of this triple, and hence also the assignments by a quasi-linear non-contextual hidden variable, must be one of the four triples
$$(1, 1, 1), \qquad (1, -1, -1), \qquad (-1, 1, -1), \qquad (-1, -1, 1).$$
The non-contextuality assumption would then dictate that the outcome of Alice's measurement of $\sigma_u \otimes 1_2$ be independent of her choice of the setting $v$ in a possible simultaneous measurement of $1_2 \otimes \sigma_v$, and vice versa. Therefore, in a (non-local) bipartite setting where Alice is only able to measure operators of the type $a \otimes 1_2$, whilst Bob can measure $1_2 \otimes b$, on the above choice of (commuting) operators,

non-contextuality in the situation where Alice controls everything is mathematically equivalent to (context) locality in the bipartite Alice & Bob setting.
Further constraints then arise if the system is prepared in a correlated state like ψ 0 , which is an eigenstate of σ u ⊗ σ v with eigenvalue −1 whenever u = v. So in that case the values of (σ u ⊗ 1 2 , 1 2 ⊗ σ v ) can only be (1, −1) or (−1, 1), yielding perfect anti-correlation. This is not enough, however, to derive a Free Will Theorem; to do so with the small single-site Hilbert space C 2 , one needs a third (non-local) party.
Indeed, the well-known tripartite GHZ-argument may be rephrased as a Free Will Theorem, as follows. The underlying Hilbert space is and hence as a warm-up we first (re)prove Theorem 6.5 for n = 8. Suppose we have a map V : where a, b, c can be 1, 2, 3. From Lemma 6.4 we then have Furthermore, the four operators on the left-hand side commute and turn out to satisfy (6.85) so that again by Lemma 6.4, , this is impossible, so that V cannot exist. Now, using the notation in the preceding discussion, consider the unit vector which is a joint eigenstate of each of the four operators on the left-hand side of (6.81) -(6.84), with eigenvalue +1 for the first three, and hence eigenvalue −1 for the fourth, i.e., Theorem 6.14. The conjunction of the following assumptions is contradictory:  each (a, b, c, z) there is x ∈ X such that • Locality: F = F (A, Z), G = G(B, Z), and H = H(C, Z).
Proof. Using notation as in the proof of Theorem 6.13, for fixed z ∈ Z we obtain F(a, z) = λ (a) 1 etc. Nature then leads to the contradiction derived after (6.86).
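The combinatorial core of the GHZ contradiction can be checked by brute force. The Python sketch below is an illustration under stated assumptions: it uses the common GHZ convention in which the three mixed products of local values equal +1 and the remaining product equals −1 (the ordering and signs in (6.81)-(6.86) may be arranged differently), and the function names are hypothetical. It also verifies, site by site, the Pauli-matrix identity that forces the sign −1.

```python
from itertools import product

SIGMA_1 = [[0, 1], [1, 0]]
SIGMA_2 = [[0, -1j], [1j, 0]]

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def local_products():
    """Site-by-site factors of (s1 x s2 x s2)(s2 x s1 x s2)(s2 x s2 x s1)."""
    site1 = matmul(matmul(SIGMA_1, SIGMA_2), SIGMA_2)
    site2 = matmul(matmul(SIGMA_2, SIGMA_1), SIGMA_2)
    site3 = matmul(matmul(SIGMA_2, SIGMA_2), SIGMA_1)
    return site1, site2, site3

def ghz_assignments():
    """All local +-1 value assignments compatible with the assumed GHZ constraints.

    x_k (y_k) is the predetermined outcome of party k for sigma_1 (sigma_2).
    """
    solutions = []
    for x1, y1, x2, y2, x3, y3 in product((1, -1), repeat=6):
        if (x1 * y2 * y3 == 1 and y1 * x2 * y3 == 1
                and y1 * y2 * x3 == 1 and x1 * x2 * x3 == -1):
            solutions.append((x1, y1, x2, y2, x3, y3))
    return solutions

minus_sigma_1 = [[-v for v in row] for row in SIGMA_1]
s1, s2, s3 = local_products()
# The three site factors are sigma_1, -sigma_1, sigma_1: overall sign -1.
print(s1 == SIGMA_1, s2 == minus_sigma_1, s3 == SIGMA_1)
print(len(ghz_assignments()))
```

The search returns no assignment because multiplying the three mixed constraints gives x1·x2·x3 = +1 (each y occurs twice), which is incompatible with the fourth constraint.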

Bell's theorems
Two different results are known as "Bell's Theorem": the first, from his paper in 1964, is Theorem 6.15 below, and the second, dating from 1976, is Theorem 6.18. The first is similar to the Free Will Theorem in both its assumptions and its conclusion, and to make this similarity more obvious we first state it for $\mathbb{C}^3$ instead of $\mathbb{C}^2$. The difference lies in the probabilistic flavour of Bell's Theorem, whose empirical input is not given by the only non-probabilistic consequence to be drawn from the quantum-mechanical formulae (6.35)-(6.38), viz. the certainty (6.43) of perfect correlation on identical settings, but rather by the probabilistic formula (6.40), in which $\theta_{u,v}$ is the angle between two unit vectors $u$ and $v$. Furthermore, the state space $X$ must be upgraded to a probability space $(X, \Sigma, \mu)$, carrying functions $A$ and $B$ (for the settings, which, unlike Bell himself, who treated them as labels, we include among the random variables), $F$ and $G$ (for the outcomes) and finally $Z$ (for the hidden variable traditionally called $\lambda$) as random variables, i.e., measurable functions. This also implies that the target spaces $X_A$ to $X_Z$ (the latter traditionally being called $\Lambda$) must be equipped with some $\sigma$-algebra of measurable subsets. But this is not a big deal, since $X_A = X_B$ carries a natural Borel structure and $X_F = X_G$ is finite. The probability measure $\mu$ is assumed independent of $(A, B, F, G)$, and vice versa. The measure $\mu$, which gives the "hidden state" of the system that allegedly underlies its quantum-mechanical description, is chosen in such a way that empirical probabilities (typically obtained from long runs of repeated measurements) are recovered as joint conditional probabilities defined by $\mu$ and the random variables, i.e., assuming the settings $(a, b)$ are possible in that $P(A = a, B = b) > 0$, we put
$$P(F = \lambda, G = \gamma \,|\, A = a, B = b) = \frac{P(F = \lambda, G = \gamma, A = a, B = b)}{P(A = a, B = b)},$$
where the joint probabilities on the right-hand side are given by the $\mu$-measure of the corresponding events, e.g.
$$P(A = a, B = b) = \mu(\{x \in X : A(x) = a,\ B(x) = b\}).$$
This implies that $\mu$ depends on (but may not be determined by) the quantum state $\psi_0$.
On this understanding, the assumptions of Determinism and Locality are the same as for the Free Will Theorem (except that equations like $F(x) = \hat{F}(A(x), Z(x))$ are merely supposed to hold almost everywhere with respect to $\mu$). Freedom is now taken to mean that $(A, B, Z)$ are probabilistically independent relative to $\mu$. By definition, this also means that the pairs $(A, B)$, $(A, Z)$, and $(B, Z)$ are independent, so that for any (measurable) $\mathcal{A} \subset X_A$, $\mathcal{B} \subset X_B$, and $\mathcal{Z} \subset X_Z$, defining
$$P(A \in \mathcal{A}, B \in \mathcal{B}, Z \in \mathcal{Z}) = \mu(\{x \in X : A(x) \in \mathcal{A},\ B(x) \in \mathcal{B},\ Z(x) \in \mathcal{Z}\})$$
and analogous expressions for $P(A \in \mathcal{A})$ and $P(A \in \mathcal{A}, B \in \mathcal{B})$, etc., we have
$$P(A \in \mathcal{A}, B \in \mathcal{B}, Z \in \mathcal{Z}) = P(A \in \mathcal{A}) \cdot P(B \in \mathcal{B}) \cdot P(Z \in \mathcal{Z}),$$
and likewise for each of the pairs. If we finally define Nature as the claim that $\hat{F}$ and $\hat{G}$ are 2-valued and satisfy the analogue of (6.90) in which the left-hand side is the conditional probability defined by $\mu$ and the random variables in question (whereas the left-hand side of (6.90) itself is the empirical probability for the experiment in question, or, equivalently, the quantum-mechanical prediction thereof), then we obtain the following spin-1 version of Bell's first theorem:
Theorem 6.15. Determinism, Freedom, Nature, and Locality are contradictory.
This formulation is literally the same as Theorem 6.13, but the terms have acquired a different technical meaning now, especially Freedom and Nature. Moreover, purists would add Probability Theory as an assumption in Bell's Theorem, as its formalism is decidedly non-tautological and its interpretation is far from obvious, even in a classical setting. In any case, the proof is practically the same as in the more familiar optical version of the EPR-experiment, to which we now turn.
In the classical (sic) form of the experiment, Alice and Bob perform measurements on incoming photons by letting them pass through a polaroid glass whose axis of polarization makes angle a (Alice) or b (Bob) with (say) the horizontal axis in the plane orthogonal to the direction of propagation of the photons. Considered in the light of the previous experiment on spin-1 particles, such a choice of settings may also be seen as a choice of basis for R 3 , with the proviso that, assuming (by convention) the photons move along the y-axis, one basis element u 2 = (0, 1, 0) is fixed so that the remaining two vectors (u 1 , u 3 ) must lie in the x-z plane (in which, on a naive picture, the photons may "vibrate"). This constraint gives rise to bases u 1 = (cos a, 0, sin a), u 2 = (0, 1, 0), u 3 = (− sin a, 0, cos a), (6.100) the first of which (say) gives the actual direction of the axis of polarization. In any case, Alice writes down F = 1 if her photon passes her glass at angle a, and F = 0 if it does not; similarly Bob writes G = 1 (pass) or G = 0 (fail) at setting b.
In a quantum-mechanical description of the experiment, the Hilbert space of the photon pair is $\mathbb{C}^2 \otimes \mathbb{C}^2$, and the correlated photon state is taken to be
$$\psi = \tfrac{1}{\sqrt{2}}\left(e_1 \otimes e_1 + e_2 \otimes e_2\right), \qquad (6.101)$$
where $e_1 = (1, 0)$ and $e_2 = (0, 1)$ form the standard basis of $\mathbb{C}^2$. The probabilities (6.35)-(6.38) as predicted by quantum mechanics are now replaced by
$$P(F = 1, G = 1 \,|\, A = a, B = b) = P(F = 0, G = 0 \,|\, A = a, B = b) = \tfrac{1}{2}\cos^2(a - b);$$
$$P(F = 1, G = 0 \,|\, A = a, B = b) = P(F = 0, G = 1 \,|\, A = a, B = b) = \tfrac{1}{2}\sin^2(a - b), \qquad (6.102)\text{-}(6.105)$$
which are also the experimentally measured ones. Instead of (6.90) we then obtain
$$P(F \neq G \,|\, A = a, B = b) = \sin^2(a - b).$$
In particular, if their settings are the same (i.e., $a = b$), then Alice and Bob will always find the same outcome (perfect correlation), whereas in case they are orthogonal (i.e., $a = b \pm \pi/2$), they obtain perfect anti-correlation, in that Alice's photon passes whenever Bob's is blocked, and vice versa. However, this will not be used. Although it should be obvious from the previous case what the assumptions in Theorem 6.15 mean for this particular experiment, we make them explicit:
• Determinism means that there is a probability space $(X, \Sigma, \mu)$ with associated (measurable) functions
Proof. Determinism and Freedom imply where we use the notation (6.50)-(6.51), the function $\hat{A} : X_A \times X_B \times X_Z \to X_A$ is projection on the first coordinate, likewise the function $\hat{B} : X_A \times X_B \times X_Z \to X_B$ is projection on the second, and $P_{ABZ}$ is the joint probability on $X_A \times X_B \times X_Z$ induced by the triple $(A, B, Z)$ and the probability measure $\mu$; by independence, $P_{ABZ}$ is a product measure on $X_A \times X_B \times X_Z$. According to Locality, $\hat{F}(a, b, z)$ does not depend on $b$, whilst $\hat{G}(a, b, z)$ does not depend on $a$.
For fixed settings (a, b), we may therefore define the following functions on X Z : z). (6.112) A brief computation then yields where P Z is the joint probability on X Z defined by Z and μ. Therefore, from (6.110), Nature then gives the crucial result This lemma (said to go back to Boole) is very easy to prove directly, but for completeness's sake we mention that it also follows from Proposition 6.17 below.
If F i and G j just take the values ±1, then (6.116) is a special case of (6.117).
Proof. In terms of the function $\Phi = F_1 \cdot (G_1 + G_2) + F_2 \cdot (G_1 - G_2)$, we may write the expression bounded in (6.117) as $\int_X \Phi \, d\mu$. Since $|F_i(x)| \leq 1$ and $|G_j(x)| \leq 1$ by assumption, we have $|\Phi(x)| \leq 2$ and hence $\left|\int_X \Phi \, d\mu\right| \leq 2$, since $\mu$ is a probability measure. To prove the last claim, we just note that for $\pm 1$-valued $F_i$ and $G_j$ one has $\int_X F_i G_j \, d\mu = 1 - 2 P(F_i \neq G_j)$.

In Bell's second (1976) theorem on stochastic hidden variables, the assumption of Determinism is dropped, and all we have is a theory stating conditional probabilities $P(F = \lambda, G = \gamma \,|\, A = a, B = b, x)$ for the outcomes of the above bipartite experiment given some hidden variable $x$, as well as the single-wing versions $P(F = \lambda \,|\, A = a, x)$ and $P(G = \gamma \,|\, B = b, x)$. Here $F$, $G$, $A$, $B$ are just notational devices to record such outcomes, which are no longer (necessarily) represented as random variables. On this new understanding of the notation, the Nature assumption is formulated just as before, cf. (6.109). We do assume the existence of a probability space $(X, \Sigma, \mu)$ and of conditional probabilities defined $\mu$-a.e. in $x$, in which the state of the world is specified as being $x \in X$. In terms of this space, the Freedom assumption means that (6.120) for any settings $(a, b)$, of which $\mu$ is independent (as the notation already indicated).
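The crucial assumption replacing Determinism is Bell locality, which reads
$$P(F = \lambda, G = \gamma \,|\, A = a, B = b, x) = P(F = \lambda \,|\, A = a, x) \cdot P(G = \gamma \,|\, B = b, x).$$
Bell's second theorem for stochastic hidden variable theories reads as follows.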
Theorem 6.18. Nature, Freedom, and Bell locality are contradictory.
Proof. The idea of the proof is to introduce an artificial probability space in order to recover the framework of Theorem 6.15. To this end, we take
$$\tilde{X} = [0, 1] \times [0, 1] \times X,$$
equipped with the product of Lebesgue measure on each factor $[0, 1]$ and $\mu$ on $X$, where we denoted the elements of $\tilde{X}$ by $(s, t, x)$. On $\tilde{X}$, define random variables
$$F_a(s, t, x) = 1_{[0, P(F=1|A=a,x)]}(s); \qquad (6.124)$$
$$G_b(s, t, x) = 1_{[0, P(G=1|B=b,x)]}(t),$$
where $1_\Delta$ is the indicator function for $\Delta \subseteq [0, 1]$. Writing, as usual, we obtain (first for $\lambda = \gamma = 1$, from which the other cases follow): With Freedom and Bell locality, this yields as in (6.114), so that the proof may be completed as for Theorem 6.15.
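The construction in this proof can be illustrated numerically. The Python sketch below assumes that the auxiliary variables s and t are independent and uniformly distributed on [0, 1], and that Bob's outcome is defined analogously to (6.124) using t; the numerical values of the single-wing probabilities are hypothetical. For a fixed hidden state x it estimates the joint probability of the deterministic outcomes and compares it with the product required by Bell locality.

```python
import random

def F(p_alice, s):
    """Indicator of the interval [0, p_alice] evaluated at s, cf. (6.124)."""
    return 1 if s <= p_alice else 0

def G(p_bob, t):
    """Assumed analogue of (6.124) for Bob, using the second auxiliary variable t."""
    return 1 if t <= p_bob else 0

def estimate_joint(p_alice, p_bob, samples=200_000, seed=0):
    """Monte Carlo estimate of P(F = 1, G = 1) over uniform (s, t), at fixed x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        s, t = rng.random(), rng.random()
        hits += F(p_alice, s) * G(p_bob, t)
    return hits / samples

# Hypothetical values of P(F=1|A=a,x) and P(G=1|B=b,x) for one hidden state x.
p, q = 0.3, 0.8
estimate = estimate_joint(p, q)
print(abs(estimate - p * q) < 0.01)
```

The estimate agrees with the product p·q up to sampling error, which is the sense in which deterministic random variables on the enlarged space reproduce stochastic probabilities that factorize.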
Let us note that since in Bell's second theorem the settings $(a, b)$ are treated as free parameters to begin with, the difference between $X$ and $Z$ evaporates, so that in the above formulae one might as well have replaced $(X, \mu)$ by the space $(X_Z, \mu_Z)$ that describes all relevant degrees of freedom except the settings (i.e., the experimentalist, in either human or machine form). Either way, Bell's locality condition may be disentangled into the following conditions (introduced by Jarrett and Shimony):
1. Parameter Independence (PI):
$$P(\lambda|a, b, x) = P(\lambda|a, x); \qquad P(\gamma|a, b, x) = P(\gamma|b, x).$$
2. Outcome Independence (OI):
$$P(\lambda, \gamma|a, b, x) = P(\lambda|a, b, x) \cdot P(\gamma|a, b, x),$$
where we have abbreviated $P(F = \lambda|A = a, B = b, x)$ by $P(\lambda|a, b, x)$, etc., and have used the following notation (which states identities in case one has (6.91)-(6.93)):
$$P(\lambda|a, b, x) = \sum_\gamma P(\lambda, \gamma|a, b, x); \qquad P(\gamma|a, b, x) = \sum_\lambda P(\lambda, \gamma|a, b, x).$$
It is easy to see that Bell locality is equivalent to the conjunction of PI and OI. Note that the former (PI), akin to Locality, is a hidden or 'subsurface' version of the no signaling property of the 'surface' probabilities, which states that $P(F = \lambda|A = a, B = b)$ is independent of $b$ (and vice versa). But a violation of PI only leads to signaling if $x$ can be operationally controlled, similar to the way in which experimental physicists prepare quantum states $\psi$. Hence it is reassuring that quantum mechanics satisfies PI if we see the quantum state $\psi$ as a hidden variable: assuming $P(\lambda, \gamma|a, b, x)$ to be given by the quantum-mechanical probabilities as computed in (6.102)-(6.105), PI is valid but OI is not. First, for $\lambda = 0$ or $\lambda = 1$,
$$P(\lambda|a, b, x) = \tfrac{1}{2}, \qquad (6.137)$$
which is independent of $b$, and likewise $P(\gamma|a, b, x) = \tfrac{1}{2}$, independently of $a$. This yields PI, which a similar computation shows to be true for any quantum state. On the other hand, given this result, OI would require
$$P(\lambda, \gamma|a, b, x) = \tfrac{1}{4},$$
which is false, since by (6.102)-(6.105), Alice's and Bob's outcomes are correlated.
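The claim that quantum mechanics satisfies PI but violates OI can be checked directly on the photon-pair probabilities. The Python sketch below assumes the standard values ½cos²(a − b) for equal outcomes and ½sin²(a − b) for unequal outcomes (the form we take (6.102)-(6.105) to have), with the quantum state playing the role of the hidden variable x; the function names are illustrative.

```python
import math

def joint(lam, gam, a, b):
    """Assumed photon-pair probability P(lam, gam | a, b) for outcomes in {0, 1}."""
    d = a - b
    if lam == gam:
        return 0.5 * math.cos(d) ** 2
    return 0.5 * math.sin(d) ** 2

def marginal_alice(lam, a, b):
    """P(lam | a, b), obtained by summing over Bob's outcomes."""
    return sum(joint(lam, gam, a, b) for gam in (0, 1))

def marginal_bob(gam, a, b):
    """P(gam | a, b), obtained by summing over Alice's outcomes."""
    return sum(joint(lam, gam, a, b) for lam in (0, 1))

def parameter_independent(a, bob_settings, tol=1e-12):
    """PI for Alice: her marginal does not change with Bob's setting."""
    values = [marginal_alice(1, a, b) for b in bob_settings]
    return max(values) - min(values) < tol

def outcome_independent(a, b, tol=1e-12):
    """OI: the joint probability factorizes into the two marginals."""
    return all(
        abs(joint(l, g, a, b) - marginal_alice(l, a, b) * marginal_bob(g, a, b)) < tol
        for l in (0, 1) for g in (0, 1)
    )

print(parameter_independent(0.3, [0.0, 0.7, 2.1]))   # marginals all equal 1/2
print(outcome_independent(0.0, 0.0))                 # fails: joint is 1/2, not 1/4
```

With equal settings the joint probability of both photons passing is ½, whereas the product of the marginals is ¼, so OI fails while each marginal stays at ½ for every setting of the other wing.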
Hence Bell locality is violated by quantum mechanics, but this does not imply that "quantum mechanics is nonlocal" (as some say). Bell's is a very specific locality condition invented as a constraint on hidden variable theories. In another important sense, viz. Einstein locality, quantum mechanics is local, in that observables with spacelike separated localization regions commute (this is the case in quantum field theory, but also in any bipartite experiment of the type considered here, where Alice's operators commute with Bob's just by definition of the tensor product).
On the other hand, deterministic theories, which in the present context are defined as those for which all conditional probabilities like P(λ , γ|a, b, x) are either zero or one (in which case one may introduce random variables reproducing these probabilities), violate PI but satisfy OI, at least if they reproduce the Born probabilities (such as Bohmian mechanics). Hence such theories violate Bell locality.
Finally, Bell-type inequalities like (6.117) also give information about quantum mechanics itself, particularly about the degree of entanglement of states. Let H 1 and H 2 be Hilbert spaces, with tensor product H 1 ⊗ H 2 . A unit vector ψ ∈ H 1 ⊗ H 2 is called uncorrelated if it is of the form ψ = ϕ 1 ⊗ ϕ 2 , where ϕ k ∈ H k are unit vectors, k = 1, 2, and correlated otherwise. Clearly, the vectors (6.34) and (6.101) used in the experiments so far are correlated. The simplest result is then as follows.
Theorem 6.19. Let $a_1$ and $a_2$ be self-adjoint operators on $H_1$, and let $b_1$ and $b_2$ be self-adjoint operators on $H_2$, each with spectrum contained in $[-1, 1]$ (equivalently $\|a_i\| \leq 1$, etc.). Let $\psi$ be a unit vector in $H_1 \otimes H_2$, and define two-point functions
$$\langle a_i b_j \rangle = \langle \psi, (a_i \otimes b_j) \psi \rangle.$$
If $\psi$ is uncorrelated, then the Bell inequality (6.117) holds.
Proof. This follows from the factorization property
$$\langle a_i b_j \rangle = F_i \, G_j, \qquad (6.139)$$
where $F_i = \langle \phi_1, a_i \phi_1 \rangle$ and $G_j = \langle \phi_2, b_j \phi_2 \rangle$. For either sign, this property yields The spectral assumption implies that $|F_i| \leq 1$ and $|G_j| \leq 1$, which will be used directly below, as well as its consequence If $\Phi \geq 0$ we choose the minus sign, whereas for $\Phi < 0$ we take the plus sign. Either way, we obtain $|\Phi| \leq 2$, which is the inequality (6.117).
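The content of Theorem 6.19 can be illustrated numerically. The Python sketch below is an illustration under stated assumptions: for uncorrelated states it samples hypothetical expectation values F_i and G_j in [−1, 1] and uses the factorization of the two-point functions; for the correlated photon state it assumes the correlation cos 2(a − b) between ±1-valued polarization observables, evaluated at a conventional choice of four angles.

```python
import math
import random

def chsh(E):
    """Phi = E(1,1) + E(1,2) + E(2,1) - E(2,2), with indices 0 and 1 in code."""
    return E(0, 0) + E(0, 1) + E(1, 0) - E(1, 1)

def product_state_value(F, G):
    """Phi for an uncorrelated state, where the two-point functions factorize."""
    return chsh(lambda i, j: F[i] * G[j])

def largest_product_value(trials=10_000, seed=1):
    """Largest |Phi| found over randomly sampled expectation values in [-1, 1]."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        F = [rng.uniform(-1, 1) for _ in range(2)]
        G = [rng.uniform(-1, 1) for _ in range(2)]
        worst = max(worst, abs(product_state_value(F, G)))
    return worst

def entangled_value():
    """Phi for the correlated photon pair, assuming E(a, b) = cos 2(a - b)."""
    a = [0.0, math.pi / 4]
    b = [math.pi / 8, -math.pi / 8]
    return chsh(lambda i, j: math.cos(2 * (a[i] - b[j])))

print(largest_product_value() <= 2.0)
print(entangled_value(), 2 * math.sqrt(2))
```

No sampled product state exceeds the bound 2, while the correlated state reaches 2√2 at these angles, so the inequality indeed distinguishes the two classes of states.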
This result is actually much more general (as hinted at by the way that the proof only uses the uncorrelated vector state $\psi = \phi_1 \otimes \phi_2$). The simplest generalization is to replace pure states by mixed states, where we say that a density operator $\rho$ on $H_1 \otimes H_2$ is uncorrelated if it is of the form $\rho = \sum_i p_i \, \rho_1^{(i)} \otimes \rho_2^{(i)}$, where the $p_i$ are probabilities and each $\rho_k^{(i)}$ is a density matrix on $H_k$, $k = 1, 2$. Then all uncorrelated density matrices satisfy the inequality (6.117). Even more generally, uncorrelated states on C*-algebras or von Neumann algebras $A \otimes B$ satisfy (6.117), see Notes.

The Colbeck-Renner Theorem
One may try to strengthen Bell's second theorem by weakening its assumptions. A remarkable result in this direction states that, roughly speaking, any probabilistic hidden variable theory that satisfies Freedom and Parameter Independence and is compatible with quantum mechanics adds nothing to quantum mechanics. In other words, it appears that quantum mechanics "cannot be extended", or "is complete". In fact, the result turns out to be more modest than this summary suggests, since the reasoning required to prove the claim hinges on certain assumptions which are satisfied by quantum mechanics itself, but might seem unnatural for a hidden variable theory. In any case, we have to state our notation and assumptions very clearly. Definition 6.20. A hidden variable theory T underlying quantum mechanics consists of a measurable space (X, Σ ) whose points x label conditional probabilities P(a 1 = λ 1 ,..., a n = λ n |x) ≡ P(a = λ |x) for the possible outcomes λ = (λ 1 ,..., λ n ) of a measurement of any family a = (a 1 ,..., a n ) of n commuting self-adjoint operators on any Hilbert space H.
These formal conditional probabilities are a priori only supposed to satisfy Apart from these probabilities, for each Hilbert space H and any pure state e ∈ P 1 (H), the theory T yields a classical state μ e , i.e., a probability measure on X.
As the notation indicates, μ e depends on e only and hence is independent of a and λ . From the point of view of T , a quantum state is a probability measure on X! In what follows we assume for simplicity that H is finite-dimensional, so that e = e ψ for some unit vector ψ ∈ H. With slight abuse of notation we then write μ ψ for μ e ψ . An important special case will be the bipartite setting H = H 1 ⊗ H 2 , where Alice and Bob measure self-adjoint operators X and Y on H 1 and H 2 , respectively, so that We then introduce settings c = (a, b), as in the previous sections, so that we typically look at expressions like P(X a = λ 1 ,Y b = λ 2 |x). The other case of interest will simply be n = 1 with a 1 ≡ a, λ 1 ≡ λ ; indeed, this will be the case in the statement of the theorem (the bipartite case playing a role only in the proof, though a crucial one!).
That is, there is a subset $X' \subset X$ such that $\mu_\psi(X') = 0$ and $P_\psi(a = \lambda|x) = \alpha(x)$ holds for any $x \in X \setminus X'$. If $X$ is finite, this simply means that the equality holds for any $x$ for which $\mu_\psi(\{x\}) > 0$. Since this notation may render equalities like
$$P_\psi(a = \lambda|x) = P_\phi(a = \lambda|x) \qquad (6.147)$$
ambiguous, we explicitly define (6.147) as the double implication Furthermore, for $\varepsilon \to 0$ we write We are now ready to state our assumptions for the Colbeck-Renner Theorem:
• Compatibility with Quantum Mechanics (CQ): for any unit vector $\psi \in H$, where the quantum-mechanical prediction $p_\psi(a = \lambda)$ is given by the Born rule
In the remaining axioms, $H = H_1 \otimes H_2$, and $a$ and $b$ are self-adjoint operators on $H_1$ and $H_2$, respectively (duly identified with operators $a \otimes 1_{H_2}$ and $1_{H_1} \otimes b$ on $H$).
• Parameter Independence (PI): • Product Extension (PE): for any pair of states ψ 1 ∈ H 1 , ψ 2 ∈ H 2 , P ψ 1 (a = λ |x) = P ψ 1 ⊗ψ 2 (a = λ |x). (6.155) • Schmidt Extension (SE): if υ i ∈ H 1 (i = 1,..., dim(H)) are eigenstates of a, then for arbitrary orthogonal states u i ∈ H 2 and coefficients c i > 0 with ∑ i c 2 i = 1, ). (6.156) Note that PI makes sense, because (6.151) and (6.150) imply that for p ψ (a = λ ) to be nonzero we must have λ i ∈ σ (a i ) for each i. All assumptions are satisfied by quantum mechanics itself (seen as a hidden variable theory with ψ as the "hidden" variable x). In the context of hidden variable theories, though, one might doubt the plausibility of UI, CP, and SE. But we need all these assumptions to prove: In other words, the hidden variable x is even more hidden than expected, since knowing its value has no effect on the probabilities for the outcomes of experiments.
Proof. We first assume (without loss of generality) that a is nondegenerate as a selfadjoint matrix, in that it has distinct eigenvalues (λ 1 ,..., λ dim(H) ); this assumption will be removed at the end of the proof. The proof consists of three steps.
Proof of step 1. Let $H = \mathbb{C}^2$, with basis $(\upsilon_1, \upsilon_2)$ of eigenvectors of $a$, so that $\psi \in \mathbb{C}^2$ may be written as Without loss of generality, we may assume that $\lambda_1 = 1$ and $\lambda_2 = -1$. We now relabel $a$ as $a_0$ and extend it to a family of operators $(a_k)_{k=0,1,\ldots,2N-1}$ by fixing an integer $N > 1$, putting $\theta_k = k\pi/2N$, and defining (6.161) where, for any angle $\theta \in [0, 2\pi]$, the operator $e_\theta = |\theta\rangle\langle\theta|$ is the orthogonal projection onto the one-dimensional subspace spanned by the unit vector
$$|\theta\rangle = \sin(\theta/2) \cdot \upsilon_1 + \cos(\theta/2) \cdot \upsilon_2. \qquad (6.162)$$
In the bipartite setting, we have operators $a_k = c_k \otimes 1_2$ and $b_k = 1_2 \otimes c_k$ on $\mathbb{C}^2 \otimes \mathbb{C}^2$, as well as a maximally correlated (Bell) state $\psi_{AB} \in \mathbb{C}^2 \otimes \mathbb{C}^2$, given by Using assumptions PI and SE, we then have, for $i = 1, 2$, with $\lambda_1 = 1$ and $\lambda_2 = -1$, The quantum-mechanical prediction is Our goal is to show that also for each $x \in X$, knowing $x$ is irrelevant in that To this effect we introduce the combination of probabilities (6.167) where $= P(a_k = b_l|x)$, (6.168) where $i = 1, 2$, and we used PI. This implies a second inequality: since $a_{2N} = -a_0$, Integrating this with respect to the measure $\mu_{\psi_{AB}}$ and using CQ gives We wish to invoke the corresponding quantum-mechanical expression, defined by A straightforward calculation shows that this expression is equal to Since $\lim_{N \to \infty} I^{(N)}_{\psi_{AB}} = 0$, letting $N \to \infty$ in (6.169) therefore yields (6.166). From (6.164) we then obtain (6.158).
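The vanishing of the quantum-mechanical expression as N grows can be checked numerically. The Python sketch below rests on assumptions that are not fixed by the text as it stands: it takes the Bell state to be (υ₁ ⊗ υ₁ + υ₂ ⊗ υ₂)/√2, takes each a_k and b_l to be the ±1-valued observable with eigenvalue +1 on |θ_k⟩ from (6.162), lets Alice use even and Bob odd labels, and uses the standard chained-Bell combination P(a₀ = b_{2N−1}) + Σ_{|k−l|=1} P(a_k ≠ b_l), which we assume matches (6.167).

```python
import math

def ket(theta):
    """Components of |theta> = sin(theta/2) v1 + cos(theta/2) v2, cf. (6.162)."""
    return (math.sin(theta / 2), math.cos(theta / 2))

def perp(u):
    """Unit vector orthogonal to u in the real plane."""
    return (-u[1], u[0])

def bell_amplitude(u, w):
    """Amplitude <u (x) w, psi_AB> for the assumed Bell state and real u, w."""
    return (u[0] * w[0] + u[1] * w[1]) / math.sqrt(2)

def prob_equal(theta_a, theta_b):
    """P(a = b): both outcomes +1 or both -1 for the two +-1-valued observables."""
    u, w = ket(theta_a), ket(theta_b)
    return bell_amplitude(u, w) ** 2 + bell_amplitude(perp(u), perp(w)) ** 2

def chained_value(N):
    """Assumed chained-Bell combination for angles theta_k = k * pi / (2N)."""
    theta = [k * math.pi / (2 * N) for k in range(2 * N)]
    total = prob_equal(theta[0], theta[2 * N - 1])
    for l in range(1, 2 * N, 2):              # Bob: odd labels
        for k in (l - 1, l + 1):              # Alice: neighbouring even labels
            if k <= 2 * N - 2:
                total += 1 - prob_equal(theta[k], theta[l])
    return total

for N in (2, 5, 50, 500):
    print(N, chained_value(N), 2 * N * math.sin(math.pi / (4 * N)) ** 2)
```

Under these assumptions the computed value coincides with 2N sin²(π/4N), which is of order 1/N and hence tends to zero, in line with the limit used in the proof.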

Proof of step 2.
Let $H = \mathbb{C}^l$ and let $(\upsilon_i)_{i=1}^{l}$ be an orthonormal basis of eigenvectors of $a$, with corresponding eigenvalues $\lambda_i$, and phase factors for the eigenvectors $\upsilon_i$ such that $c_i > 0$ (and of course, $\sum_i c_i^2 = 1$) in the expansion The case of interest will be $c_1 = \cdots = c_l = 1/\sqrt{l}$, but first we merely assume that $c_1 = c_2$ (the same reasoning applies to any other pair), with $\lambda_1 = 1$ and $\lambda_2 = -1$ (which involves no loss of generality either and just simplifies the notation). The other positive coefficients $c_i$ are arbitrary. Generalizing (6.166), we will show that
$$P_\psi(a = 1|x) = P_\psi(a = -1|x). \qquad (6.173)$$
This shows that if two Born probabilities defined by some quantum state $e_\psi$ are equal, then the underlying hidden variable probabilities must be equal $\mu_\psi$-a.e., too. Eq. (6.159) immediately follows from this result by taking all $c_i$ to be equal. As in step 1, we pass to the bipartite setting, introducing two copies of $H = \mathbb{C}^l$ denoted by $H_A = H_B = \mathbb{C}^l$, and define the correlated state (6.174) in $H_A \otimes H_B$. Eq. (6.164) again follows from assumptions PI and SE. Throughout the argument of step 1, we now replace each probability $P(a_k = \lambda_i, b_l = \gamma_j|x)$ by an adapted probability $P^{(1)}(a_k = \lambda_i, b_l = \gamma_j|x)$, defined as the conditional probability (6.175) for all $x$ for which $P(|\lambda_i| = |\gamma_2| = 1|x) > 0$, whereas whenever $P(|\lambda_i| = |\gamma_2| = 1|x) = 0$. The same argument then yields (6.169), with $P$ replaced by $P^{(1)}$ but with the same right-hand side. As in step 1, (6.177) which implies that
$$P_{\psi_{AB}}(a_0 = 1|x) = P_{\psi_{AB}}(a_0 = -1|x), \qquad (6.178)$$
either because both sides vanish (if $P(|\lambda_i| = |\gamma_2| = 1|x) = 0$), or because (in the opposite case) the denominator $P(|\lambda_i| = |\gamma_2| = 1|x)$ cancels from both sides of (6.177). Combined with (6.164), eq. (6.178) proves (6.173) and hence establishes step 2.
Proof of step 3. This is the most difficult step in the proof, relying on a technique wittily called embezzlement (which we only need for maximally entangled states). We will deal with three Hilbert spaces, namely H = C l , H = C m , and H = C n (where n = m N for some large N, see below), each with some fixed orthonormal basis (υ i ) l i=1 , (υ j ) m j=1 , and (υ k ) n k=1 , respectively. Given a further number m i ≤ m, we now list the nm basis vectors υ k ⊗ υ j of H ⊗ H in two different orders: where the remaining vectors (i.e., those of the form υ k ⊗υ j for 1 ≤ k ≤ n and j > m i ) are listed in some arbitrary order. Define as the unitary operator that maps the first list on the second. We will need the explicit expression (6.180) where for given k = 1,..., n the numbers s i k = 1,..., n i (where n i is the smallest integer such that n i m i ≥ n) and j i k = 1,..., n i are uniquely determined by implicit, we obtain a unitary operator The point of all this is that the unit vector κ n ∈ H A ⊗ H A ; (6.184) 6.185) where C(n) = ∑ n k=1 1/k, acts as a "catalyst" in producing the maximally entangled state Here ε = 1/N if n = m 2N . This follows straightforwardly from (6.183) -(6.187).
After this preparation we are ready for the proof of step 3, continuing to use the notation established at the beginning of step 2, especially (6.172). As in step 1, we introduce two copies H A = H B = C l of H, as well as two states (6.190) where κ n is given by (6.185), we put (6.191) and in our notation we have ignored the obvious permutations of factors in the tensor product. For any ε > 0, pick c i ∈ R + such that (c i ) 2 ∈ Q + and |c i − c i | < ε/ dim(H), (6.192) which implies that, in the sense of (6.149), we have (6.193) Suppose (6.194) with p i , q i ∈ N and gcd(p i , q i ) = 1, and define (6.195) Consequently, writing (6.196) the following quotient is independent of i: (6.197) Given the integers m i thus obtained, we define a unitary operator (6.199) where u (m i ) is defined in (6.180). From this definition, with additional labels to denote the copies u A : H A → H A and u B : H B → H B , and (6.188), and writing (6.200) with corresponding copies we then obtain the important relations (6.206) Here the right-hand sides of (6.203) -(6.206) have been arranged so as to obtain vectors in the six-fold tensor product We will repeatedly invoke the following lemma, whose proof just unfolds the notation (on the appropriate identification of a with a ⊗ 1 H 2 and of b with 1 H 1 ⊗ b). Lemma 6.22. Assume PI and UI. For any pair of unitary operators u 1 on H 1 and u 2 on H 2 , and any unit vector ψ ∈ H 1 ⊗ H 2 , one has P (u 1 ⊗1 H 2 )ψ (b = γ|x) = P ψ (b = γ|x); (6.207) P (1 H 1 ⊗u 2 )ψ (a = λ |x) = P ψ (λ = x|x). (6.208) Since we assume that a is nondegenerate, there is a bijective correspondence between its eigenvalues a = λ i and its eigenvectors υ i . Instead of P(a = λ i ) dressed with whatever parameters x or ψ, we may then write P(υ i ), where a is understood, and analogously for the more complicated operators on tensor products of Hilbert space appearing below. Repeatedly using Lemma 6.22, we proceed as follows.
• From Step 2, using the notation explained below (6.172), • From (6.156) in PE and (6.209), • From (6.155) in SE and (6.210), • From (6.211), CP (whose notation we use), and (6.206), • Recall the number m (satisfying m ≥ m i for all i). From (6.212) and Lemma 6.22, We now start a different line of argument, to be combined with (6.213) in due course.
• From PE, SE, and (6.172), with υ i A ∈ H A denoting υ i ∈ H, we have • Using Lemma 6.22, (6.203), and (6.204), • From quantum mechanics, notably (6.151), and (6.205), for any i = i we have • From CQ and (6.217), for any i = i, • From (6.218), (6.219), and (6.220), Finally, from (6.214), (6.221), (6.213), and (6.197) we obtain Since c i > 0 we have c 2 i = |c i | 2 ; using (6.192) and letting ε → 0 then proves step 3: Finally, we remove our standing assumption that the spectrum of a be nondegenerate. In the degenerate case one has where the sum is over any orthonormal basis (υ j i ) j i of the eigenspace of λ i . Similarly, since each vector υ j i gives a = λ i , probability theory gives for all x, The nondegenerate case of the theorem (which distinguishes the states υ j i ) yields from which (6.157) follows once again: Our proof of the Colbeck-Renner Theorem is now complete.
Under less stringent assumptions this theorem might have been regarded as the conclusion of von Neumann's program to disprove the possibility of completing quantum mechanics by adding hidden variables, but as yet this seems unwarranted.

Notes §6.1. From von Neumann to Kochen-Specker
'For decades nobody spoke up against von Neumann's arguments, and his conclusions were quoted by some as the gospel.' (Belinfante, 1973, p. 24) Theorem 6.2 is due to von Neumann (1932, §IV.2); it was the first result to impose useful constraints on hidden variable theories, anticipating all later literature on the subject. Unfortunately (as part of their general anti-Copenhagen rhetoric), Bell and his followers left the realm of decent academic discourse by calling von Neumann's arguments against hidden variables 'silly' and 'foolish', through which they merely displayed the depth of their own misunderstanding of von Neumann's reasoning; see Caruana (1995), Bub (2011a), and especially Dieks (2016b). In fact, von Neumann (1932, p. 172) carefully qualifies his Theorem 6.2 by stating that it follows 'im Rahmen unserer Bedingungen' (i.e. 'given our assumptions'), of which he earlier (on p. 164) admits that linearity is physically reasonable only for commuting operators, but nonetheless justifies this assumption through an ensemble argument (now outdated, but by no means 'silly'). Though couched in agreeable academic parlance, the earlier critique by Hermann (1935) was misguided, too (Dieks, 2016b).
The Kochen-Specker Theorem is due to Kochen & Specker (1967); the authors were originally logicians. A similar but less precise statement had appeared earlier in Bell (1966), who was not cited by Kochen and Specker; some authors refer to the Bell-Kochen-Specker Theorem. The Nature assumption has been experimentally verified, cf. Huang et al (2003). The proof of the fundamental Lemma 6.7 we present is essentially due to Kochen and Specker, as simplified by Peres (1995). Our independent proof for C 4 is taken from Cabello et al (1996). Surveys of various proofs are given by Brown (1992) and Gould (2009); see also Waegell & Aravind (2012) and references therein, as well as Bub (1997) for another proof. From the Netherlands, we cannot fail to mention the short proof by Gill & Keane (1996). For geometric aspects (and even a link with M.C. Escher) see Zimba & Penrose (1993).
One finds two opposite directions of research around the Kochen-Specker Theorem. A computational one, which seems hardly relevant to conceptual issues in physics (the goal rather being The Guinness Book of Records), consists of attempts to find a minimal set of vectors that cannot be coloured. See, for example, Pavicic et al (2005). The other, which is of significant conceptual importance and hence is worth some more extensive discussion, consists of attempts to find a maximal set of vectors that can be coloured. That is, one looks for large (preferably dense and measurable) subsets $S^2_c$ of $S^2$ for which there exists a function $\tilde{V} : S^2_c \to \{0, 1\}$ that satisfies:
• $\tilde{V}(-u) = \tilde{V}(u)$ for each $u \in S^2_c$;
• $\tilde{V}(u_1) + \tilde{V}(u_2) + \tilde{V}(u_3) = 2$, for each (orthonormal) basis $(u_1, u_2, u_3)$ of $\mathbb{R}^3$ whose elements lie in $S^2_c$.
The first result in this direction was obtained by Meyer (1999) and Havlicek et al (2001), who showed that one may take S 2 c = S 2 ∩ Q 3 ; this choice was motivated by invoking finite precision arguments to circumvent the Kochen-Specker Theorem, see below. To write down a suitable functionṼ : S 2 ∩ Q 3 → {0, 1}, we first define an auxiliary function S : where lcm is the least common multiple and gcd is the greatest common divisor of the argument. This function is obviously well defined. Then the following works: More generally, for an arbitrary n-dimensional) Hilbert space H, with n < ∞, Clifton & Kent (2000) proved the existence of a countable dense colorable subset P 1 (H) c of P 1 (H) (cf. Definition 6.9), with the additional property that different resolutions of the identity drawn from P 1 (H) c never share a projection (so that the key strategy proof of Lemma 6.7, which is based on the existence of overlapping bases, falls apart). Given some enumeration (e Note that because of the total incompatibility of the projections, each e ∈ P 1 (H) c belongs to a unique resolution (e (k) i ), so that W f is well defined. The statistical predictions of quantum mechanics may then be recovered as follows. For each density operator ρ ∈ D(H) we may define a probability measure μ ρ on the set n N of all functions f : N → {1,..., n} by imposing the conditions i (more generally, for a ∈ B(H) sa we write [a = λ ] for the spectral projection e λ defined by a and λ ∈ σ (a)). The subset of n N in the argument of μ ρ is hereby declared measurable; existence and uniqueness of the measure μ ρ on a suitable σ -algebra follow from the Kolmogorov extension theorem of measure theory, which applies because the marginals (6.232) satisfy the appropriate consistency conditions, cf. Hermens (2009) for details. This formula guarantees that the left-hand side vanishes if λ (k) i = 0 for each i, and also if λ (k) i = 1 for more than one value of i. 
If K = {k 0 } is a singleton and λ = (λ 1 ,..., λ n ), then the right-hand side (and hence the left-hand side) is the Born probability for the outcome e (6.233) Consequently, it is true by construction that for any admissible measurement in quantum mechanics (in that all observables commute), i.e., for each k 0 ∈ N, averaging over the 'hidden variable' f ∈ n N reproduces the statistical predictions of quantum mechanics. This success is achieved at a high cost, however: • Two random variables e (k) i and e (k ) i are statistically independent (with respect to μ ρ ) whenever k = k , even though e (k) i − e (k ) i may be arbitrarily small.
• For each f ∈ n N the associated coloring W f is maximally discontinuous, in that for each u ∈ P 1 (H) c and each ε > 0 there is u ∈ P 1 (H) c such that although e u − e u < ε one has W f (e u ) = W f (e u ), so that in fact |W f (e u ) −W f (e u )| = 1.
These facts were noted by Clifton & Kent themselves, and Appleby (2005) proved that they are a necessary feature of all constructions that involve sufficiently large subsets of P 1 (H) that can be colored. Without challenging their mathematical significance, these discontinuities undermine any potential physical relevance such models might have, and this in turn challenges the reason such models were introduced in the first place (Meyer, 1999), namely the (alleged) finite precision loophole of the Kochen-Specker Theorem.
The thrust of this loophole is that it would be an illusion for an experimentalist like Alice to claim that she measures some observable a with infinite accuracy; in fact, given ε > 0 she might equally well measure some a with a − a < ε. Consequently, finding a dense colorable subset P 1 (H) c ⊂ P 1 (H) should suffice for a hidden variable interpretation of quantum mechanics, since if Alice believes she measures some projection e, the model assigns a value W (e ) to the projection e ∈ P 1 (H) c she actually measures (where e is selected by some algorithm that is part of the theory itself, cf. Clifton & Kent (2000)), and presents that value to Alice as the outcome of her measurement. However, owing to the discontinuities just mentioned, this value is as arbitrary as the identification of e .
As emphasized by Barrett & Kent (2004), this arbitrariness, although perhaps undesirable, does not by itself affect the ability of the Clifton-Kent model to reproduce the statistical predictions of quantum mechanics. On the other hand, it would be pretty awkward to have a theory whose individual value attributions are completely arbitrary, especially since the finite precision argument is predicated on the idea that observables close to the one Alice believes herself to measure (i.e., e) should have approximately the same value as the one she actually does measure (namely, e ). If this is not the case, her measurements are pointless and the hidden variable W f would be empirically inaccessible and hence truly "hidden" (Appleby, 2005).
See also Hermens (2009, 2016). This last point applies to Corollary 6.12, which would no longer be true if the set $X_A$ of all bases of $\mathbb{R}^3$ in Definition 6.11 were replaced by some subset $X^c_A \subset X_A$ drawn from a colorable subset $S^2_c$ of $S^2$. Each $z \in X_Z$ would then correspond to some coloring $u \mapsto \hat{F}(u, z)$ of $S^2_c$, which, by the above discussion, would be maximally discontinuous and hence empirically inaccessible. Nonetheless, such a theory does exist in principle.
The aim of maximizing colorable sets was pursued in a different direction by Bub & Clifton (1996); see also Bub (1997). Given a "preferred" observable $a \in B(H)_{sa}$ and a pure state $e \in P_1(H)$, these authors look for a maximal sublattice $P(e, a)$ of $P(H)$ that contains all spectral projections of $a$ (but, despite the notation $P(e, a)$, does not necessarily contain $e$!), admits sufficiently many lattice homomorphisms $h : P(e, a) \to \{0, 1\}$ (i.e., binary valuations) such that the Born measure $\mu_e$ on $\sigma(a)$, i.e., $\mu_e(\Delta) = \mathrm{Tr}(e e_\Delta)$, $\Delta \subseteq \sigma(a)$, can be reproduced by averaging over these homomorphisms, and finally is invariant under all unitary isomorphisms of $P(H)$ that commute with both $e$ and $a$. Equivalently, one wants a maximal C*-subalgebra $A(a, e)$ of $B(H)$ that contains $a$, admits sufficiently many dispersion-free states so as to reproduce the Born probabilities defined by $a$ in the given state $e$, and is invariant in the said way (a fourth condition used by Bub and Clifton is superfluous; see Bub, 1997, p. 128). Assuming for simplicity that $n = \dim(H) < \infty$, the answer is
$A(a, e) = C^*(e_\lambda e e_\lambda, \lambda \in \sigma(a))'$ (6.234)
where, as always, $e_\lambda$ is the projection onto the eigenspace $H_\lambda$ for $\lambda \in \sigma(a)$, and the prime denotes the commutant (one might as well take the commutant of the set of all $e_\lambda e e_\lambda$). Equivalently, putting $e = e_\psi = |\psi\rangle\langle\psi|$, eq. (6.234) is the C*-algebra generated by all projections $f_\lambda$ onto the nonzero components $e_\lambda \psi$ of $\psi$ in each $H_\lambda$ and all one-dimensional projections that are orthogonal to all $f_\lambda$ (given that $\dim(H) < \infty$, this is the same as the linear span of these projections). Thus $A(a, e)$ always contains $C^*(a)$ (since it contains each $e_\lambda$, $\lambda \in \sigma(a)$), but note that $A(a, e)$ need not be commutative.
In comparison, if the requirement had been the reproduction of all Born probabilities for arbitrary pure states e rather than for some given e, the answer would have been any maximal abelian C*-algebra in B(H) that contains C*(a); if a has non-degenerate spectrum, this is just C*(a) itself. The simplest possibility is Though interesting, this result mainly supports so-called modal interpretations of quantum mechanics, which we reject, since they tell us nothing physical about the measurement process and address the measurement problem only philosophically.
§6.2
The original (Strong) Free Will Theorem (FWT) states that three assumptions, called SPIN, TWIN, and MIN, imply that the response of a spin-one particle to the bipartite experiment with spin-one particles described above 'is not a function of properties of that part of the universe that is earlier than this response (. . . ).' Here SPIN and TWIN are the first and second half of our Nature axiom, whilst MIN expresses a form of context-locality as well as the loose assumption that Alice and Bob may 'freely choose' their settings a and b, respectively. Accordingly, in our notation, Conway and Kochen only use the parameter space Z, rather than the full space X we need in order to consistently axiomatize determinism. Their formulation contains an implicit assumption of determinism, whose precise nature only becomes clear from their proof, and which is akin to our formulation, except for the crucial difference that the function they allude to only acts on the particle variables and not on the settings of the experiment (of which, as already noted, Conway and Kochen just say that the experimenters can 'freely choose' them).
Conway and Kochen paraphrase their theorem as follows: 'if indeed we humans have free will, then elementary particles already have their own small share of this valuable commodity. More precisely, if the experimenter can freely choose the directions in which to orient his apparatus in a certain measurement, then the particle's response (to be pedantic-the universe's response near the particle) is not determined by the entire previous history of the universe. (. . . ) our theorem asserts that if experimenters have a certain freedom, then particles have exactly the same kind of freedom. Indeed, it is natural to suppose that this latter freedom is the ultimate explanation of our own. (. . . ) Granted our three axioms [i.e., the physical ones and freedom of choice], the Free Will Theorem shows that nature itself is nondeterministic.' However, such far-reaching conclusions seem unwarranted by the actual technical content of the theorem. Indeed, though it is also assumed in Bell's first theorem (see §6.5 below), the conjunction of Determinism and Freedom is a priori uncomfortable, especially since the main novelty of the FWT lies in the emphasis Conway and Kochen (unlike Bell) put on free will. The authors acknowledge at least this point already on the first page of their first paper (Conway & Kochen, 2006), in which they anticipate criticism of the kind: "'I saw you put the fish in!" said a simpleton to an angler who had used a minnow to catch a bass.' Indeed, also after more serious philosophical analysis, it has been concluded that: 'Their [Conway & Kochen's] case against determinism thus has all the virtues of theft over honest toil. It is truly indeterminism in, indeterminism out.' (Wüthrich, 2011) Our formulation of the FWT, in which the original allusion to undefined free will in allowing arbitrary settings of the experiment has been replaced by complete determinism including the settings, avoids this criticism.
To derive (6.35) -(6.38), we use (6.21) to write down the formulae For example, for any pair of unit vectors u, v we have which gives (6.36). The other cases are similar. The implications of the finite precision loophole of the Kochen-Specker Theorem for the Free Will Theorem were analyzed by Hermens (2014), who concluded that this loophole does not apply. We give a more precise argument to this effect.
We have dense colorable subsets X c A ⊂ X A and X c B ⊂ X B = X A , where X c A may or may not coincide with X c B . If not, the perfect correlation condition (6.54) in the Nature assumption cannot even be stated, but even if X c A = X c B , since finite precision of experiment has been declared to be an issue it would be quite out of character to impose (6.54). Instead, one needs a probabilistic version of this condition, of which it will turn out that it cannot be satisfied. As in the notes to the previous section, for each density matrix ρ one needs a probability measure μ ρ on Z that reproduces the statistical quantum-mechanical predictions for the associated quantum state. Compared to the notes to the previous section, the role of W is now played by z, in that for given F and G one might write W (a, b) = (F(a, z),Ĝ(b, z). (6.236) This measure may be constructed analogously to (6.232), i.e., for any sequence (a (k) ) of bases drawn from X c A , any sequence (b (k) ) of bases drawn from X c B , and any sequences (λ (k) ) and (γ (k) ) in Λ , cf. (6.22), where k ∈ K ⊂ N is arbitrary, we define where, as in the main text, Note that J 2 u i acts on Alice's Hilbert space C 3 whilst J 2 v j acts on Bob's. In particular, for fixed k 0 ∈ K and λ , γ ∈ Λ , we have the special case of (6.237) for compatible measurements, viz.
where in the main text we would have written P ρ (F = λ , G = μ|A = a, B = b) for the right-hand side. Hence for the correlated state ρ = |ψ 0 ψ 0 | we obtain from (6.42): which of course vanishes if u i = v j . If the expression 1 − u i , v j 2 appearing here is small, then the projections e u i and e v j are close (in norm), since Eq. (6.240) therefore allows us to make rigorous sense of Hermens' (2014) heuristic idea that the assumption (6.54) in the FWT should be modified as follows: 'if e u i − e v j is small, then in most of the casesF i (a, z) =Ĝ j (b, z).' Namely, we replace (6.54) by the following approximate correlation condition: Indeed, if the theory existed, on could simply take δ = ε. However, a theory satisfying (6.242) does not exist, as can be proved by contradiction: ifF i (a, z) =Ĝ j (b, z) for all pairs (u i , v j ) such that 1 − u i , v j 2 < ε, then the proof of Theorem 6.13 shows not only that (6.32) still holds on the modified Nature assumption (so that F(·, z) again defines a coloring of S 2 ), but that in addition we have z). (6.243) In particular, the apparently weaker correlation condition ending with (6.242) is actually stronger than its exact counterpart (6.54). Thus Theorem 6.13 still holds on this revised Nature assumption, so that unlike the Kochen-Specker Theorem, the Free Will Theorem is immune to the finite precision loophole. The price for this immunity is that, quite against the spirit of the FWT, some probabilistic reasoning had to be invoked, so that the difference between the FWT and Bell's first theorem has blurred even further.
Unfortunately, confusion may arise if the quotation in the main text 'if I did it, a law would be broken' from Lewis (1981) is subjected to the following explanation: 'On Lewis's account of counterfactuals, the truth conditions for counterfactuals-what makes them true-are as follows. Suppose we have the counterfactual 'if A had been the case, B would have been the case' (so if A is 'I miss the bus' and B is 'I'm late', this counterfactual just says, 'if I'd missed the bus, I would have been late'). This counterfactual will be true if and only if, at the closest possible world to the actual world at which A is true, B is also true. So, our sample counterfactual, 'if I'd missed the bus, I would have been late', is true if and only if: at the closest possible world to the actual world at which I miss the bus, I'm late.' (Beebee, 2013, p. 60).
Removing any possible remaining doubt, on p. 62 she mentions that the closest possible world where I miss the bus is the world w. According to this explanation, then, Lewis's sentence 'if I did it, a law would be broken', would mean that at the closest possible world to the actual world in which I did it, a law is broken, i.e., in w. But according to Beebee's definition quoted in the main text of what Lewis means by a miracle, apparently this is not the right reading (and indeed it would, in our view, be nonsensical). Moreover, Lewis (1981) emphasizes that in the first bullet point in the main text above-which he defends-it is not the agent who would break a law, whereas in the second bullet point -rejected by Lewis-it is; in the first it is the breaking of some law at an earlier time that enables the agent to do what she, in our actual world, did not do. Thus Lewis's phrasing seems awkward.
Our development of Lewis's argument is indebted to Vihvelin (2013, pp. 164-165), who (re)states Lewis's first bullet point as the following conjunction: 1. Slightly Different Past: If I had raised my hand, the past would still have been exactly the same until shortly before the time of my decision. 2. Slightly Different Laws: If I had raised my hand, the laws would have been ever so slightly different in a way that permitted a divergence from the lawful course of actual history shortly before the time of my decision.
A second way in which Alice could (counterfactually) have raised her hand is through an instant (counterfactual) modification of the state of the world, as in Bennett (1984). Here we prefer to write Different Past, since even though in this scenario the state indeed (by determinism) would have been different all the way back to the Big Bang, the entire trajectory of the world may or may not be close to the actual one. In this scenario, the two cases Lewis distinguishes take the form in the main text.
Since the main novelty of their papers lies in the emphasis on free will, the reader might wonder what Conway & Kochen themselves have to say about the subject. As we can read in the delightful biography of Conway by Roberts (2015), or watch in his video lectures on the Free Will Theorem (Conway, 2009), free will is indeed of great importance to at least the first author of the theorem. Unfortunately, his interest in free will seems unaccompanied by any philosophical sophistication, e.g.: 'Compatibilism in my view is silly. Sorry, I shouldn't just say straight off that it is silly. Compatibilism is an old viewpoint from previous centuries when philosophers were talking about free will. They were accustomed to physical theory being deterministic. And then there's the question: How can we have free will in this deterministic universe? Well, they sat and thought for ages and ages and ages and read books on philosophy and God knows what and they came up with compatibilism, which was a tremendous wrenching effect to reconcile 2 things which seemed incompatible. And they said they were compatible after all. But nobody would ever have come up with compatibilism if they thought, as turns out to be the case, that science wasn't deterministic. The whole business of compatibilism was to reconcile what science told you at the time, centuries ago down to 1 century ago: Science appeared to be totally deterministic, and how can we reconcile that with free will, which is not deterministic? So compatibilism, I see it as out of date, really. It's doing something that doesn't need to be done. However, compatibilism hasn't gone out of date, certainly, as far as the philosophers are concerned. Lots of them are still very keen on it. How can I say it? If you do anything that seems impossible, you're quite proud when you appear to have succeeded. And so really the philosophers don't want to give up this notion of compatibilism because it seems too damned clever.
But my view is it's really nonsense. And it's not necessary. So whether it actually is nonsense or not doesn't matter.' (Conway, quoted in Roberts, 2015, pp. 361-362).
Finally, our version of van Inwagen's (1975) Consequence Argument is due to Beebee (2003), and the novel parts of this section are based on Landsman (2016c). For interesting philosophical criticism of this approach, see De Mola (2016).
§6.4. Technical intermezzo: The GHZ-Theorem
The GHZ Theorem appeared in Greenberger et al (1990). See also Clifton, Redhead, & Butterfield (1991) and Bub (1997). Innumerable variations on and generalizations of such arguments may be given, leading to equally many Free Will Theorems. All of these have their roots in algebraic properties of matrices, which hidden variable theories (in vain) try to reproduce.
§6.5. Bell's theorems
The original contributions to the theme of this section are Bell (1964, 1976), of which the first is one of the most famous papers of 20th century theoretical physics. Since there are more than 10,000 papers citing Bell (1964) alone, it is impossible to discuss all literature relevant to Bell's work. What we call his first theorem originates with Bell (1964), which incidentally was written after Bell (1966), but our treatment of the settings (taken from Cator & Landsman, 2014) is different. Though originally motivated as an attempt to make the Free Will Theorem look less of a petitio principii, it also addresses a problem Bell faced even according to some of his staunchest supporters (Norsen, 2009; Seevinck & Uffink, 2011), namely the tension between the idea that the hidden variables (in the pertinent causal past) should on the one hand include all ontological information relevant to the experiment, but on the other hand should leave Alice and Bob free to choose any settings they like.
His second theorem comes from Bell (1976), followed by Bell (1990a). Apart from his own papers, which are reprinted in Bell, Gottfried & Veltman (2001) Unfortunately, we have not been able to come to grips with (and hence do not cite) literature claiming that Bell's theorems are false, or have nothing to do with hidden variables, or prove that quantum mechanics (if not nature itself!) is nonlocal per se, or that he never changed his mind and only has one theorem saying it all.
The verification of (6.102) - (6.105) is analogous to the above computations deriving (6.35) - (6.38). In terms of the unit vector
$v_a = \begin{pmatrix} \cos a \\ \sin a \end{pmatrix}$, (6.244)
the observable F Alice measures on setting A = a is the projection $e_a = |v_a\rangle\langle v_a|$, and similarly for Bob. Hence the corresponding Born probabilities are given by For example, we have
$\langle \psi_0, e_a \otimes e_b \psi_0 \rangle = \tfrac{1}{2} \langle e_1 \otimes e_1 + e_2 \otimes e_2, (|v_a\rangle\langle v_a| \otimes |v_b\rangle\langle v_b|)(e_1 \otimes e_1 + e_2 \otimes e_2) \rangle$
$= \tfrac{1}{2} \langle e_1 \otimes e_1 + e_2 \otimes e_2, (\cos a \cos b + \sin a \sin b)\, v_a \otimes v_b \rangle$
$= \tfrac{1}{2} (\cos a \cos b + \sin a \sin b)^2 = \tfrac{1}{2} \cos^2(a - b)$.
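The example computation above can be checked numerically. The following is a minimal sketch in plain Python, assuming (as the computation itself indicates) that $\psi_0 = (e_1 \otimes e_1 + e_2 \otimes e_2)/\sqrt{2}$; the function names `unit_vector` and `born_joint_probability` are hypothetical and introduced only for this illustration. Since $e_a \otimes e_b$ is the rank-one projection onto $v_a \otimes v_b$, the expectation value equals the squared overlap of $\psi_0$ with $v_a \otimes v_b$.

```python
import math


def unit_vector(angle):
    """Return v_a = (cos a, sin a) as in (6.244)."""
    return (math.cos(angle), math.sin(angle))


def born_joint_probability(a, b):
    """Compute <psi_0, (e_a (x) e_b) psi_0> for the assumed state
    psi_0 = (e_1 (x) e_1 + e_2 (x) e_2) / sqrt(2)."""
    va, vb = unit_vector(a), unit_vector(b)
    s = 1.0 / math.sqrt(2.0)
    # Components of psi_0 in the product basis e_i (x) e_j.
    psi0 = [[s, 0.0], [0.0, s]]
    # Overlap <v_a (x) v_b, psi_0>; everything is real here.
    amplitude = sum(va[i] * vb[j] * psi0[i][j]
                    for i in range(2) for j in range(2))
    # e_a (x) e_b projects onto v_a (x) v_b, so the expectation
    # value is the squared overlap.
    return amplitude ** 2


if __name__ == "__main__":
    for a, b in [(0.0, 0.0), (0.3, 1.1), (math.pi / 2, 0.0)]:
        lhs = born_joint_probability(a, b)
        rhs = 0.5 * math.cos(a - b) ** 2
        print(f"a={a:.3f} b={b:.3f} computed={lhs:.6f} formula={rhs:.6f}")
```

The two printed columns should agree up to floating-point rounding for any choice of angles.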
The CHSH-inequality (6.117) is due to Clauser, Horne, Shimony, & Holt (1969). The definitive (i.e., loophole-free) experimental verification of its violation in nature is Hensen et al. (2015). A direct proof of (6.117) starts from the simpler inequality etc., and notes that each term on the left-hand side of (6.245) also occurs on the right-hand side. Since each term lies in [0, 1] and hence is nonnegative, this implies (6.245). Our proof of Proposition 6.17 follows Werner & Wolf (2001), as does our proof of Theorem 6.18 (though not our formulation thereof, which once again derives from Cator & Landsman (2014)). This proof shows that, as first noted by Fine (1982) and analyzed more deeply in Butterfield (1992b), there is no real distinction between the possibility of reproducing given (empirical) probabilities $P(F = \lambda, G = \gamma | A = a, B = b)$ that satisfy Bell locality by a local deterministic hidden variable theory or by a local stochastic hidden variable theory. Most current research in this direction, sparked by Popescu & Rohrlich (1994), is therefore concerned with theories defined by formal joint conditional probabilities that satisfy a no signaling condition like OI instead of Bell locality, cf. Bub (2011b) and Brunner et al (2014) for reviews. Formal conditional probabilities of the kind that Bell's second theorem uses have been axiomatized by e.g. Popper (1938) and Rényi (1955); the following axioms are theorems if conditional probabilities are defined à la Kolmogorov by (1.1). Let $\Sigma$ be some $\sigma$-algebra and let $\mathcal{F} \subset \Sigma \setminus \{\emptyset\}$ be an ideal in $\Sigma$ in the sense that if $B \in \Sigma$ and $C \in \mathcal{F}$, then $B \cap C \in \mathcal{F}$. A conditional probability on $(\Sigma, \mathcal{F})$ is a map
$P : \Sigma \times \mathcal{F} \to [0, 1]$; (6.246)
$(A, C) \mapsto P(A|C)$, (6.247)
such that:
1. For each $C \in \mathcal{F}$ the map $A \mapsto P(A|C)$ is a probability measure on $\Sigma$;
2. $P(A \cap B|C) = P(A|B \cap C) \cdot P(B|C)$, for each $A, B \in \Sigma$ and $C \in \mathcal{F}$.
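The remark that these axioms become theorems for Kolmogorov-style conditional probabilities can be illustrated on a finite sample space. The sketch below is an illustration under stated assumptions, not part of the original argument: it takes a four-point space with the uniform measure, defines $P(A|C) = |A \cap C| / |C|$ (assumed here to be the content of (1.1)), and checks axiom 2 only when $B \cap C$ is nonempty, so that the conditioning event is admissible. All names (`OMEGA`, `cond_prob`, `check_axioms`) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Assumed toy sample space with the uniform probability measure.
OMEGA = frozenset(range(4))


def subsets(s):
    """Return all subsets of s (the full power set as sigma-algebra)."""
    items = sorted(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def cond_prob(a, c):
    """Kolmogorov-style conditional probability P(A|C) for the uniform
    measure; C must be nonempty."""
    if not c:
        raise ValueError("conditioning event must be nonempty")
    return Fraction(len(a & c), len(c))


def check_axioms():
    """Check axiom 1 (normalization and finite additivity) and axiom 2
    (the product rule) for all events of the toy space."""
    events = subsets(OMEGA)
    conditions = [c for c in events if c]
    for c in conditions:
        if cond_prob(OMEGA, c) != 1:
            return False
        for a in events:
            for b in events:
                # Axiom 1: additivity on disjoint events.
                if not (a & b):
                    if cond_prob(a | b, c) != cond_prob(a, c) + cond_prob(b, c):
                        return False
                # Axiom 2: only meaningful when B n C is admissible.
                if b & c:
                    lhs = cond_prob(a & b, c)
                    rhs = cond_prob(a, b & c) * cond_prob(b, c)
                    if lhs != rhs:
                        return False
    return True


if __name__ == "__main__":
    print("axioms hold on the toy space:", check_axioms())
```

Exact rational arithmetic (`Fraction`) is used so that the equalities are tested without rounding error.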
Generalizations of Theorem 6.19 to operator algebras were given e.g. by Baez (1987), Raggio (1988), Werner (1989), and Bacciagaluppi (1993), as follows. Let A and B be unital C*-algebras, with projective tensor product $A \hat{\otimes} B$ (i.e., the completion of the algebraic tensor product $A \otimes B$ in the maximal C*-cross-norm), cf. §C.13; the choice of the projective tensor product guarantees that each state on $A \otimes B$ extends to a state on $A \hat{\otimes} B$ by continuity; conversely, since $A \otimes B$ is dense in $A \hat{\otimes} B$, each state on the latter is uniquely determined by its values on the former. In particular, product states $\rho \otimes \sigma$ and mixtures $\omega = \sum_i p_i \rho_i \otimes \sigma_i$ thereof are well defined on $A \hat{\otimes} B$. If $A \subset B(H_1)$ and $B \subset B(H_2)$ are von Neumann algebras, and all states considered are normal, it is easier to work with the spatial tensor product $A \bar{\otimes} B$, defined as the double commutant (or weak completion) of $A \otimes B$ in $B(H_1 \otimes H_2)$. Any normal state on $A \otimes B$ extends to a normal state on $A \bar{\otimes} B$ by continuity. Below we use $\hat{\otimes}$, but the results also work for $\bar{\otimes}$. In what follows, A and B are unital C*-algebras. Definition 6.23. Let ω be a state on A⊗B.
1. A product state is a state of the form ω = ρ ⊗ σ, i.e., ω is defined by linear (and continuous) extension of ω(a ⊗ b) = ρ(a)σ(b).
2. A state ω is uncorrelated when it is in the w*-closure of the convex hull of the product states on A ⊗̂ B. In particular, states ω = ∑_i p_i ρ_i ⊗ σ_i, where p_i > 0 and ∑_i p_i = 1, are uncorrelated (w*-convergent infinite sums are allowed here).

A state is correlated when it is not uncorrelated.
An uncorrelated state ω is pure precisely when it is a product of pure states. This has the important consequence that both its restrictions ω|_A and ω|_B to A and B, respectively, are pure as well (the restriction ω|_A of a state ω on A ⊗̂ B to, say, A is given by ω|_A(a) = ω(a ⊗ 1_B), where 1_B is the unit element of B, etc.). A correlated pure state has the property that its restriction to A or B is mixed.
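The contrast between the two cases can be illustrated numerically. The sketch below assumes the simplest noncommutative setting, A = B = M_2(C) acting on C^2, and vector states given by a coefficient matrix c; these choices, and all function names, are illustrative assumptions rather than part of the text. It computes the density matrix of the restriction to A and uses the purity Tr(ρ^2) to tell pure from mixed.

```python
import math

def restriction_to_A(c):
    """Density matrix rho_A of the restriction to A of the vector state
    psi = sum_ij c[i][j] e_i ⊗ f_j, so that omega(a ⊗ 1_B) = Tr(rho_A a).

    Entry (i, k) of rho_A is sum_j c[i][j] * conj(c[k][j]).
    """
    rows, cols = len(c), len(c[0])
    return [[sum(c[i][j] * c[k][j].conjugate() for j in range(cols))
             for k in range(rows)] for i in range(rows)]

def purity(rho):
    """Tr(rho^2), which equals 1 exactly when the density matrix rho is pure."""
    n = len(rho)
    return sum((rho[i][k] * rho[k][i]).real for i in range(n) for k in range(n))

s = 1 / math.sqrt(2)
product_vector = [[complex(s), complex(s)], [0j, 0j]]  # e_0 ⊗ (f_0 + f_1)/√2
bell_vector = [[complex(s), 0j], [0j, complex(s)]]     # (e_0 ⊗ f_0 + e_1 ⊗ f_1)/√2

print(purity(restriction_to_A(product_vector)))  # close to 1: the restriction is pure
print(purity(restriction_to_A(bell_vector)))     # close to 1/2: the restriction is mixed
```

The second vector is pure but not a product, hence correlated in the sense of Definition 6.23, and its restriction to A is the maximally mixed state.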
Proposition 6.24. The following conditions are equivalent:
• Each state on A ⊗̂ B is uncorrelated;
• Each pure state on A ⊗̂ B is a product state;
• At least one of the C*-algebras A and B is commutative.
Corollary 6.25. Correlated states exist iff A and B are both noncommutative.
As one might expect, this result is closely related to the Bell inequalities:

Proposition 6.26. For any ω ∈ S(A ⊗̂ B), the following conditions are equivalent:
• ω is uncorrelated.
Corollary 6.27. If A or B is commutative, then (6.249) holds for all states ω.
An elegant geometric approach to the Bell inequalities was developed by Pitowsky (1989, 1994), which we now summarize (also cf. Werner & Wolf, 2001).
Suppose we have a bipartite experiment with m different settings A = a_1, ..., a_m and B = b_1, ..., b_m on each wing, and binary outcomes, i.e., in {0, 1}. We now denote the probability P(F = 1|A = a_i) that F(a_i) (i.e. the particular property measured by experiment F at setting a_i) is true by p_i (i = 1, ..., m), and likewise we write p_{j+m} for P(G = 1|B = b_j), i.e., the probability that G(b_j) is true, once again for j = 1, ..., m. Furthermore, we abbreviate the probability that F(a_i) and G(b_j) are both true by

p_{i,j+m} ≡ P(F = 1, G = 1|A = a_i, B = b_j) (i, j = 1, ..., m). (6.250)

The 2m + m^2 "surface probabilities" p = (p_1, ..., p_{2m}, p_{1,m+1}, ..., p_{m,2m}) form a vector in R^{2m+m^2}, which we wish to constrain by the following assumption: there is a fact of the matter underlying each experiment according to which the pair (F(a_i), G(b_j)) already had a truth value for each possible setting (a_i, b_j), independently of any measurement being carried out or not ("local realism"). Thus the probabilities p (which now arguably have an ignorance interpretation) must lie in the convex polytope in R^{2m+m^2} defined as the convex hull C_m of the following set of (extreme) points: for each 2m-tuple λ = (λ_1, ..., λ_{2m}), where λ_i ∈ {0, 1}, define

x_λ = (λ_1, ..., λ_{2m}, λ_1 · λ_{m+1}, ..., λ_m · λ_{2m}) ∈ R^{2m+m^2}, (6.251)

i.e., the entry at place k is λ_k (k = 1, ..., 2m) and the entry at place (i, j) is λ_i · λ_{m+j}, where i, j = 1, ..., m. The interpretation of this is that x_λ represents the particular fact of the matter where F(a_i) has truth value λ_i and G(b_j) has truth value λ_{m+j}, so that their conjunction (F(a_i), G(b_j)) has truth value λ_i · λ_{m+j}. In this state the probability of the said configuration is one and all other configurations have probability zero; arbitrary probability assignments then lie in C_m.
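For m = 2 the construction can be made concrete with a short sketch that enumerates the 16 extreme points x_λ of (6.251) in R^8. Two ingredients below are assumptions not taken from the text: the Clauser-Horne expression, used as one standard example of a linear inequality valid on C_2, and the singlet-state probabilities P(F = 1, G = 1) = ½ sin²(θ/2) with the particular angles chosen. Since the expression is linear, checking it on the extreme points suffices for the whole polytope.

```python
import math
from itertools import product

def vertex(lam, m):
    """Extreme point x_lambda of C_m for a 2m-tuple lam of truth values in {0, 1}."""
    singles = list(lam)
    # Pair entries in the order (1, m+1), ..., (1, 2m), ..., (m, m+1), ..., (m, 2m).
    pairs = [lam[i] * lam[m + j] for i in range(m) for j in range(m)]
    return singles + pairs

def vertices(m):
    """All 2^(2m) extreme points of C_m."""
    return [vertex(lam, m) for lam in product((0, 1), repeat=2 * m)]

def clauser_horne(p):
    """Clauser-Horne expression for m = 2 (assumed example of a facet inequality).

    p = (p1, p2, p3, p4, p13, p14, p23, p24); on C_2 the value lies in [-1, 0].
    """
    p1, p2, p3, p4, p13, p14, p23, p24 = p
    return p13 + p14 + p24 - p23 - p1 - p4

def singlet_point():
    """Surface probabilities for a spin singlet (assumed quantum model and angles)."""
    a = [0.0, math.pi / 2]                    # settings a_1, a_2
    b = [3 * math.pi / 4, -3 * math.pi / 4]   # settings b_1, b_2
    joint = [0.5 * math.sin((a_i - b_j) / 2) ** 2 for a_i in a for b_j in b]
    return [0.5] * 4 + joint

# Every extreme point, and hence every point of C_2, satisfies -1 <= CH <= 0.
print(all(-1 <= clauser_horne(v) <= 0 for v in vertices(2)))
# The quantum point gives a strictly positive value and so lies outside C_2.
print(clauser_horne(singlet_point()))
```

The positive value for the singlet point shows that this vector of surface probabilities cannot be written as a convex combination of the x_λ.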
The point, then, is to characterize the convex polytope C_m ⊂ R^{2m+m^2} through a finite set of inequalities, which turn out to be generalized Bell inequalities. Seeing this result requires some background. Let V be a real topological vector space with (continuous) dual V*; if V = R^n we may also put V* = R^n and write ϕ(v) as an inner product ⟨ϕ, v⟩ in what follows.
1. Any (not necessarily convex) subset S ⊂ V has a polar S° ⊂ V* defined by

S° = {ϕ ∈ V* | ϕ(v) ≤ 1 ∀v ∈ S},

which is a closed convex subset of V*. If S = K is a compact convex set, we have

K° = {ϕ ∈ V* | ϕ(v) ≤ 1 ∀v ∈ ∂_e K}.

2. By the bipolar theorem, the double polar S°° ⊂ V is the closed convex hull of S ∪ {0}. In particular, if K is a closed convex set containing the origin, then

K°° = K, (6.255)

and hence, if K° is a compact convex set, we may reconstruct K from K° as

K = {v ∈ V | ϕ(v) ≤ 1 ∀ϕ ∈ ∂_e K°}. (6.256)

3. In particular, if K is a convex polytope in a finite-dimensional vector space containing the origin, then so is K°. In that case, ∂_e K° is a finite set and so points in K are characterized by a finite set of linear inequalities (6.256), which describe the faces of the polytope. The existence of this associated (dual) description of K is the content of the Minkowski-Weyl Theorem, see e.g. Paffenholz (2010) for applications.
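A two-dimensional example may help to fix ideas. The sketch below assumes V = V* = R^2 with the usual inner product and takes K to be the square [-1, 1]^2; the choice of K, the sampling grid, and all names are illustrative assumptions. With the convention ϕ(v) ≤ 1 of (6.256), the polar K° is the "diamond" |ϕ_1| + |ϕ_2| ≤ 1, whose four extreme points give back K through four linear inequalities, in line with (6.255) and (6.256).

```python
from itertools import product

def dot(phi, v):
    """Euclidean inner product on R^2, playing the role of phi(v)."""
    return sum(f * x for f, x in zip(phi, v))

def in_polar(phi, points):
    """phi lies in the polar of `points` iff phi(v) <= 1 for every v in points."""
    return all(dot(phi, v) <= 1 for v in points)

# K: the square [-1, 1]^2, specified by its extreme points.
K_VERTICES = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
# Extreme points of the polar of K, i.e. of the diamond |phi_1| + |phi_2| <= 1.
POLAR_VERTICES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

# Sample points with exactly representable coordinates in [-2, 2].
GRID = [k / 4 for k in range(-8, 9)]

def duality_holds_on_grid():
    """Check both descriptions against the known closed forms on the grid."""
    for x, y in product(GRID, repeat=2):
        # Polar of K, computed from the extreme points of K only.
        if in_polar((x, y), K_VERTICES) != (abs(x) + abs(y) <= 1):
            return False
        # Polar of the polar, computed from its extreme points: this is K again.
        if in_polar((x, y), POLAR_VERTICES) != (max(abs(x), abs(y)) <= 1):
            return False
    return True

print(duality_holds_on_grid())
```

The second check is the finite list of inequalities ϕ(v) ≤ 1, one for each extreme point ϕ of the polar; for the square these are simply ±v_1 ≤ 1 and ±v_2 ≤ 1.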