## Introduction

We are interested in defining bounds on the algorithmic capabilities of a mathematical theory and in analysing their implications. We articulate our views from a logical standpoint, by postulating the following principle of rationality:

1. (Coherence)

The theory should be logically consistent.

This is what we essentially require of any well-founded mathematical theory: it has to be based on a few axioms and rules from which we can unambiguously derive its mathematical truths. The next postulate defines the computational limitations we want our theory to be subject to:

2. (Computation)

Inferences in the theory should be computable in polynomial time.

The second postulate will turn out to be central. It requires that there should be an efficient way to execute the theory, and in fact we are going to adopt the metaphor of a computer that executes the theory, i.e., that yields inferences out of it.

In what follows, we shall develop our considerations with regard to the special case of a theory of uncertainty. It will essentially coincide with the Bayesian theory once it is freed of the constraint of completeness (or precision); loosely speaking, such a theory is equivalent to modelling uncertainty with sets of probabilities. This choice will define a perimeter for the mathematical technicalities, while focusing on a case of wide interest and impact.

The postulates of coherence and computation are apparently in conflict with each other: intuitively, if the computer can only execute polynomial tasks, the theory will be consistent only up to what polynomial calculus allows. This is a view from outside the computer, however; it is the view of a hypothetical ‘classical’ observer with no computational limitations and thus external to the theory. An observer that is instead internal to the theory and behaves according to it is still subject to the coherence of the theory; it will therefore be impossible to prove any inconsistency from the inside. This is an instance of what we call an external-internal clash.

1.

We formalise such a clash by what we refer to as ‘the weirdness theorem’. It shows that any theory of ‘algorithmic rationality’, that is, one that obeys the two postulates of coherence and computation, necessarily departs in a very peculiar way from the probabilistic point of view. In particular, the theorem proves that all models compatible with the theory will present some negative ‘probabilities’ (these models are sometimes referred to as ‘quasi-probabilities’ in the literature). Negative probabilities are however incompatible with classical rationality, and for this reason a hypothetical classical observer may regard the internal world as incoherent. Equivalently, to a classical observer the behaviour of the internal world may appear to be incompatible with so-called classical evaluation functionals (a concept used in particular in quantum logic).

2.

We then define a theory of probability on a continuous space of complex vectors that complies with the two postulates of coherence and computation, and we show that its deductive closure (internal view) is tantamount to quantum theory (QT). The complex vectors represent the possible states of the computer while it runs the theory, and any such state bears within it the properties of the particles in QT, such as their directions or angles of polarisation.

By framing it as a theory of rationality, we therefore view QT as a normative theory guiding an agent in assessing her subjective beliefs about the results of a quantum experiment. As we are going to stress throughout the paper, we ground the normativity of QT on three aspects: firstly, its deductive structure is tantamount to a logical theory, and therefore it is based on a requirement of consistency (coherence)—to follow the rules of QT is to be assured of consistency. Secondly, the model is based on a possibility (phase) space whose elements are interpreted as states of the world. Finally, we advance that the specific feature of our world that grounds the use of QT is its being a computation.

The external-internal clash, when transposed to QT, is thus a clash between a computational view and a view stemming from classical physics, with the weirdness theorem providing its formal, mathematical formulation. When we try to give a classical physical interpretation to QT we fail, because classical physics, in our common understanding, needs classical probability, and the latter grounds its normativity essentially on its internal consistency alone, since it does not impose any limitations on the computational resources available for executing its inferences. As such, to an external observer, QT presents a number of weird phenomena, such as entanglement, and is made up of negative probabilities or is characterised by a non-Boolean structure of events. We will show that this weirdness follows from the computation postulate.

There is also more to it. In our framework QT is naturally based on sets of (or imprecise) probabilities: in fact, requiring the computation postulate is similar to defining a probabilistic model using only a finite number of moments;Footnote 1 and therefore, implicitly, to defining the model as the set of all the (quasi-)probabilities compatible with the given moments’ constraints.

3.

As mentioned above, quantum paradoxes appear to be entirely a consequence of the weirdness theorem; in particular, the weirdness does not follow from having to deal with complex numbers or quantised states.

We reinforce this view by working out another example theory, one unrelated to QT. Such a theory uses real numbers to model the experiment of tossing a classical coin under algorithmic rationality. Eventually the theory turns out to be based on Bernstein polynomialsFootnote 2 and to admit entanglement. This shows in addition that the quantum-logic and the quasi-probability foundations of QT are two faces of the same coin, being natural consequences of the computation principle, as formalised by the weirdness theorem.
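For later reference (a minimal sketch of ours; the coin theory itself is developed in Sect. 5), the Bernstein basis polynomials are nonnegative on $$[0,1]$$ and sum to one, which is what makes them a natural vocabulary for gambles on a coin with unknown bias p:

```python
from math import comb

def bernstein(k, n, p):
    """Bernstein basis polynomial B_{k,n} evaluated at p in [0, 1]."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The basis is nonnegative on [0, 1] and forms a partition of unity:
# sum_k B_{k,n}(p) = 1 for every bias p.
n, p = 4, 0.3
values = [bernstein(k, n, p) for k in range(n + 1)]
total = sum(values)
```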

In order to develop the results mentioned above, we rely on a dualFootnote 3 characterisation of probability in terms of lotteries (or gambles). In doing so, we thus provide a subjective foundation, à la de Finetti, of so-called generalised probability theories. Compared to algebraic or information-based extensions of probability theories (e.g., [3, 4]), a gambling foundation, which emphasises the notion of logical consistency, ensures soundness and naturally provides a ground for comparing different theories—it boils down to assessing the compatibility of their different notions of consistency.

### Related Work

The perspective given in this paper may be related to the agent-centered interpretation of QT advanced by QBism [5, 6]. However, by grounding the use of QT specifically in the world’s being a computation, we depart from QBism, which puts the Born rule at the center but, for now, is unable to ground its use on anything other than a coherence constraint. In this, our view looks more similar to the one advanced by Pitowsky, whose empirical premise in the derivation of the Born rule is that the structure corresponding to the outcomes of incompatible measurements is a non-Boolean algebra.Footnote 4

There is a long tradition of denying to quantum states any reference to the outside world, which can be traced back at least to Bohr, and more broadly to the Copenhagen interpretation of QT. Similar contemporary views are labelled as $$\psi$$-epistemic. In addition to QBism, they include for instance Healey’s quantum pragmatism, Rovelli’s relational quantum mechanics, and the empiricist interpretation of de Muynck.Footnote 5 All these interpretations “do not view the quantum state as an intrinsic property of an individual system and they do not believe that a deeper reality is required to make sense of quantum theory” [12, p. 72]. Opposite to this tradition stand $$\psi$$-ontic views such as the many-worlds interpretation [13, 14], hidden-variable theories like Bohmian mechanics [15, 16], collapse theories [17,18,19,20], and the transactional interpretations [21,22,23,24], the common trait being that quantum states are regarded as descriptions of physical systems.

Since the foundation of QT, there have been two main ways to explain the quantum-classical probability clash. The first, which goes back to Birkhoff and von Neumann, explains these differences with the premise that, in QT, the Boolean algebra of events is taken over by the ‘quantum logic’ of projection operators on a Hilbert space. The second is based on the view that the quantum-classical clash is due to the appearance of negative probabilities. More recently, this research programme has been explored along different avenues: extending Boolean logic [25, 27, 28], using operational primitives [3, 29,30,31], using information-theoretic postulates [3, 32,33,34,35,36,37,38,39], building upon the subjective foundation of probability [5,6,7, 40,41,42,43,44,45,46,47], and starting from the phenomenon of quantum nonlocality [3, 33, 34, 48, 49].

Without aiming at reconstructing QT, the present manuscript provides an alternative and original explanation of the differences between quantum and classical probability: the algorithmic intractability of classical probability theory contrasted with the polynomial-time complexity of QT.

### Outline of the Paper

Section 2 is concerned with the coherence postulate. We recall the little-known fact that (imprecise) probability is the mathematical dualFootnote 6 of a coherent logical theory. Addressing consistency (coherence or rationality) in such a setting is a standard task in logic; in practice, it reduces to proving that a certain real-valued function is nonnegative.

Section 3 details the computation postulate and its role in developing a model of algorithmic rationality. We consider the problem of verifying the nonnegativity of a function as above. This problem is generally undecidable or, when decidable, NP-hard. We make the problem polynomially solvable by redefining the meaning of (non)negativity. We give our fundamental weirdness theorem (Theorem 1) showing that such a redefinition is at the heart of the clash between classical probability and algorithmic rationality.

We show in Sect. 4 that QT is a special instance of algorithmic rationality and hence that Theorem 1 is the distinctive reason for all quantum paradoxes: the case of entanglement is detailed in Sect. 4.3; in Sect. 4.4 we show that the witness function, in the fundamental ‘entanglement witness theorem’, is nothing else than a negative function whose negativity cannot be assessed in polynomial time—whence it is not ‘negative’ in QT.

Section 5 devises a further theory related to the experiment of tossing a classical coin under algorithmic rationality, which is hence unrelated to QT. We show that the theory admits entangled states, as prescribed by the weirdness theorem.

We give our concluding views in Sect. 6.

Appendix A discusses in more detail how our view of QT relates to other approaches, highlighting in particular some aspects of our theory as well as some connections with other research fields. Appendix B contains the proofs of formal statements.

## Classical Rationality

De Finetti’s subjective foundation of probability  is based on a notion of rationality (consistency or coherence). The idea is that of introducing a betting scheme and defining bettors as rational if their stakes are placed so as to avoid a sure loss (this is traditionally called a Dutch book; Economics refers to it as ‘arbitrage’). De Finetti shows that avoiding sure loss is equivalent to representing a bettor’s beliefs through classical (subjective) probability, thus providing a solid foundation for the latter.

### Desirability

What is less known, however, is that de Finetti’s bright intuition has been greatly extended in [52, 53], giving rise to the so-called theory of desirable gambles (TDG). This can equivalently be regarded as a reformulation of the well-known Bayesian decision theory (à la Anscombe-Aumann) once it is freed of the constraint of dealing with complete preferences [55, 56]. TDG is a dual theory of probability in the sense that probability is recovered from TDG through standard mathematical duality. In such a dual form, TDG appears just as a set of logical axioms.

These axioms have a natural interpretation as rationality requirements in the way a ‘classical’ subject (we call him Isaac) accepts gambles on the results of an uncertain experiment. For instance, Isaac might claim ‘I find the gamble that returns 1 utileFootnote 7 if the coin lands heads (H) and $$-2$$ utiles if it lands tails (T) to be desirable’. This means that he is willing to accept the gamble $$g=(1,-2)$$, with $$g(H)=1$$ and $$g(T)=-2$$: that is, to commit to both winning 1 utile if the coin lands heads and losing 2 utiles if it lands tails.

Gambles are thus rewards contingent on the uncertain outcome of an ‘experiment’, such as tossing a coin in the example above. We denote by $$\varOmega$$ its possibility space (e.g., $$\{heads, tails\}$$, $${\mathbb {R}}^n$$, $${\mathbb {C}}^n$$). For many experiments, there may be more than one possibility space of interest to the ‘experimenter’, $$\varOmega _1,\varOmega _2,\ldots ,\varOmega _k$$. A possibility space describing the joint outcome of this k-valued experiment can be constructed as the Cartesian product $$\varOmega = \varOmega _1 \times \varOmega _2 \times \cdots \times \varOmega _k$$.Footnote 8 Formally, a gamble g on $$\varOmega$$ is a bounded real-valued function on $$\varOmega$$.

In an experiment, not all the quantities are observable and, therefore, bettable; we denote by $${\mathscr {L}}_R$$ the restricted set of all ‘permitted gambles’ on $$\varOmega$$. We assume that $${\mathscr {L}}_R$$ is a linear space (a vector space) including the constant functions. The subset of all nonnegative gambles in $${\mathscr {L}}_R$$, that is, of gambles for which Isaac is never expected to lose utiles, is denoted as $${\mathscr {L}}^{\ge } _R:=\{g \in {\mathscr {L}}_R: \inf g\ge 0 \}$$ (analogously negative gambles are denoted as $${\mathscr {L}}^{<} _R:=\{g \in {\mathscr {L}}_R: \sup g < 0 \}$$). In the following, with $${\mathscr {G}}:=\{g_1,g_2,\ldots ,g_{|{\mathscr {G}}|}\} \subset {\mathscr {L}}_R$$ we denote a finite set of gambles that Isaac finds desirable:Footnote 9 these are the gambles that he is willing to accept and thus commits himself to the corresponding transactions.

The crucial question is how to provide a criterion for a set $${\mathscr {G}}$$ of gambles, representing assessments of desirability, to be regarded as rational. As we said, rationality is traditionally imposed by avoiding sure losses: that is, by requiring that Isaac should not be forced to find a negative gamble desirable as a logical consequence of his initial assessments of desirability. An elegant way to formalise this intuition is to regard $${\mathscr {L}}_R$$ as an algebra of formulas on top of which to define a logic. This leads us directly to formulate rationality as logical consistency.

To proceed on this route, we first need to define an appropriate logical calculus (characterising the set of gambles that Isaac must find desirable as a consequence of having desired $${\mathscr {G}}$$ in the first place) and based on it to characterise the family of consistent sets of assessments.

First of all, since nonnegative gambles may increase Isaac’s utility without ever decreasing it, we have that:

A0.

$${\mathscr {L}}^{\ge } _R$$ should always be desirable.

This defines the tautologies of the calculus.

Moreover, whenever f, g are desirable for Isaac, then any positive linear combination of them should also be desirable (this amounts to assuming that Isaac has a linear utility scale, which is a standard assumption in probability). Hence the corresponding deductive closure of a set $${\mathscr {G}}$$ is given by:

A1.

$${\mathcal {K}}:={{\,\mathrm{posi}\,}}({\mathscr {L}}^{\ge } _R\cup {\mathscr {G}})$$.

Here ‘$${{\,\mathrm{posi}\,}}$$’ denotes the conic hull operator.Footnote 10

In the betting interpretation given above, a sure loss for an agent is represented by a negative gamble: by accepting a negative gamble an agent will lose utiles no matter the output of the experiment. We are led therefore to the following:

### Definition 1

(Coherence postulate) A set $${\mathcal {K}}$$ of desirable gambles is coherent if and only if

A2.

$${\mathscr {L}}^{<} _R \cap {\mathcal {K}}=\emptyset$$.

Note that $${\mathcal {K}}$$ is incoherent if and only if $$-1 \in {\mathcal {K}}$$; therefore $$-1$$ can be regarded as playing the role of the Falsum in logic and hence A2 can be reformulated as

A2′.

$$-1 \notin {\mathcal {K}}$$.

An example that gives an intuition of the postulates is given in Fig. 1.

Postulate A2 (resp. A2$$'$$), which presupposes postulates A0 and A1, provides the normative definition of TDG, referred to as $${\mathcal {T}}$$. Moreover, simple as it looks, it alone is the pillar of the foundation of classical subjective probability.

### Probability (The Desirability Dual)

Let us show that probability is dual to desirability as described in Sect. 2.1. First of all, however, let us make some terminology precise: when we write probability, as a function, we mean probability charge, i.e., a finitely additive probability.Footnote 11 In fact the Analysis literature calls ‘charge’ a finitely additive set function [58, Chap. 11]; a charge then coincides with what we have called a quasi-probability. If we instead want to refer to an actual probability, we have to use the qualified expression probability charge.

Assume $${\mathcal {K}}$$ is coherent. We give $${\mathcal {K}}$$ a probabilistic interpretation by first observing that, since $${\mathscr {L}}_R$$ is a topological vector space,Footnote 12 we can consider its dual space $${\mathscr {L}}_R^*$$ of all bounded linear functionals $$L: {\mathscr {L}}_R \rightarrow {\mathbb {R}}$$. Then the dual of $${\mathcal {K}}$$ is defined as:

\begin{aligned} {\mathcal {K}}^\circ =\left\{ L \in {\mathsf {S}} \mid L(g)\ge 0, ~\forall g \in {\mathscr {G}}\right\} , \end{aligned}
(1)

where $${\mathsf {S}}=\{L \in {\mathscr {L}}_R^* \mid L(1)=1,~~L(h)\ge 0 ~~\forall h \in {\mathscr {L}}_R^{\ge }\}$$ is the set of (belief) states; $$L(1)=1$$ means that linear functionals preserve the unitary gamble (normalisation). $$L(g)\ge 0$$ means that L(g) must be a nonnegative real number for all gambles $$g \in {\mathscr {G}}$$ that Isaac finds desirable.Footnote 13 To $${\mathcal {K}}^\circ$$ we can then associate its extension $${\mathcal {K}}^\bullet$$ in $${\mathscr {M}}$$, that is, the set of all probability charges on $$\varOmega$$ extending an element in $${\mathcal {K}}^\circ$$.

In other words, we can write L(g) as an expectation with respect to a probability: $$L(g)=\int _{\varOmega } g(\omega )d\mu (\omega )$$. One can then show that the extension $${\mathcal {K}}^\bullet$$ is equal to:

\begin{aligned} \begin{aligned} {\mathscr {P}}:={\mathcal {K}}^\bullet =\left\{ \mu \in {\mathsf {S}} \Big | \int _{\varOmega } g(\omega ) d\mu (\omega )\ge 0, ~\forall g\in {\mathscr {G}}\right\} \end{aligned} \end{aligned}
(2)

where $${\mathsf {S}}=\{ \mu \in {\mathscr {M}} \mid \inf \mu \ge 0,~\int _{\varOmega } d\mu (\omega )=1\}$$ is the set of all probability charges on $$\varOmega$$, and $${\mathscr {M}}$$ the set of all charges on $$\varOmega$$.

Equation (2) states that, whenever an agent is coherent, desirability of g corresponds to nonnegative expectation, that is, $$\int _{\varOmega } g(\omega ) d\mu (\omega )\ge 0$$ for all probabilities in $${\mathscr {P}}$$. When $${\mathcal {K}}$$ is incoherent, $${\mathscr {P}}$$ turns out to be empty—there is no probability compatible with the assessments in $${\mathcal {K}}$$. Stated otherwise, satisfying the axioms of classical probability—that is, being a nonnegative function that integrates to one—is tantamount to being in the dual of a set $${\mathcal {K}}$$ satisfying the coherence postulate (‘no-Dutch book’).
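As a concrete illustration (our own numbers, continuing the coin example of Sect. 2.1): if Isaac finds $$g=(1,-2)$$ desirable, then $${\mathscr {P}}$$ consists of all probabilities with $$P(H)\cdot 1+(1-P(H))\cdot (-2)\ge 0$$, i.e., $$P(H)\ge 2/3$$. A sketch:

```python
from fractions import Fraction

def dual_credal_bound(g_heads, g_tails):
    """For a single desirable gamble g on {H, T}, return the interval
    {p : p*g(H) + (1-p)*g(T) >= 0} within [0, 1].
    We assume g(H) > 0 > g(T), the case of the example above."""
    # p*g(H) + (1-p)*g(T) >= 0  <=>  p >= -g(T) / (g(H) - g(T))
    lower = Fraction(-g_tails, g_heads - g_tails)
    return max(Fraction(0), lower), Fraction(1)

lo, hi = dual_credal_bound(1, -2)  # the gamble g = (1, -2)
# Every probability with P(H) >= 2/3 gives g a nonnegative expectation.
```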

## Algorithmic Rationality

Let us reconsider the classical theory introduced in the previous section. Assume that Isaac makes an initial (finite) set of assessments $${\mathscr {G}}$$, which represent his initial beliefs about an experiment. In order to evaluate Isaac’s desirability of a further gamble $$f \in {\mathscr {L}}_R$$, we need to solve the membership problem $$f \overset{?}{\in } {\mathcal {K}}$$. This can equivalently be expressed as the following nonnegativity decision problem:

\begin{aligned} \begin{aligned} \exists \lambda _i\ge 0:f-\sum \limits _{i=1}^{|{\mathscr {G}}|} \lambda _i g_i \in {\mathscr {L}}^{\ge }_R. \end{aligned} \end{aligned}
(4)

If the answer is ‘yes’, then the gamble f belongs to $${\mathcal {K}}$$, which is the conic closure of $${\mathscr {G}}\cup {\mathscr {L}}^{\ge }_R$$, and this proves its desirability. Note that checking whether $${\mathcal {K}}$$ is coherent or not is tantamount to solving (4) for $$f=-1$$.
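On a finite possibility space, the membership problem (4) becomes a simple feasibility check. A minimal sketch (ours, not from the paper) for the special case of a single assessed gamble; with several gambles the check is a linear program:

```python
def in_closure_single(f, g):
    """Membership check for f in posi(L>= u {g}) on a finite space,
    with one assessed gamble g: does some lambda >= 0 give
    f(w) - lambda*g(w) >= 0 for every outcome w?"""
    lo, hi = 0.0, float('inf')
    for fw, gw in zip(f, g):
        if gw > 0:
            hi = min(hi, fw / gw)     # constraint lambda <= f(w)/g(w)
        elif gw < 0:
            lo = max(lo, fw / gw)     # constraint lambda >= f(w)/g(w)
        elif fw < 0:
            return False              # g(w) = 0 forces f(w) >= 0
    return lo <= hi

# Coin example: Isaac accepts g = (1, -2).
# f = (2, -1) is derivable (take lambda = 1/2), so it is desirable;
# f = (-1, -1) is not, so the assessment is coherent.
```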

### Algorithmic Desirability

However, computing such an inference may be ‘costly’, if not virtually unfeasible. Indeed, when $$\varOmega$$ is infinite (later on we shall consider the case $$\varOmega \subset {\mathbb {C}}^n$$) and for generic functions $$f,g_i$$, the nonnegativity decision problem is undecidable. In this paper, we consider the case where gambles are (complex) multivariate polynomials of degree at most d. In this case, by the Tarski-Seidenberg quantifier elimination theorem [59, 60], problem (4) becomes decidable but remains intractable, being in general NP-hard. From this perspective, the classical theory is therefore not suitable for constituting a realistic model of rationality.

The idea of modifying the standard theory by considering computational issues traces back to the work of Good and Simon. Since then there have been two main approaches to the problem: either charging an agent for doing costly computation, or limiting the computation that agents can do (the approach first used in the context of decision theory).Footnote 14 In what follows, we take inspiration from the second approach and develop a model of algorithmic rationality for the framework under consideration. Our subject in such an algorithmic world is now called Alice, to distinguish her from Isaac, who lives in the classical world.

The intuition behind our approach is the following. Assume that, due to computational (or other) limits, Alice can only work out the decision problem (4) for a closed subcone $$\varSigma ^{\ge }$$ of the nonnegative gambles $${\mathscr {L}}^{\ge }_R$$:

\begin{aligned} \begin{aligned} \exists \lambda _i\ge 0:f-\sum \limits _{i=1}^{|{\mathscr {G}}|} \lambda _i g_i \in \varSigma ^{\ge }. \end{aligned} \end{aligned}
(5)

This means in particular that there will be a nonnegative gamble $$f \in {\mathscr {L}}^{\ge }_R$$ that Alice cannot actually assess as nonnegative; thus she may well decide not to accept it. Similarly, Alice’s initial set of assessments $${\mathscr {G}}$$ may contain a negative gamble but, this notwithstanding, the answer to the corresponding coherence decision problem may be positive (solving (5) for $$f=-1$$ may lead to a negative answer).
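A classical illustration of such a gap (our example; it assumes, as is common in polynomial optimisation, that $$\varSigma ^{\ge }$$ is a tractable sum-of-squares-type cone): the Motzkin polynomial is nonnegative everywhere but famously not a sum of squares, so an agent whose tautologies are the sum-of-squares gambles cannot certify its nonnegativity.

```python
def motzkin(x, y):
    """Motzkin polynomial: nonnegative on R^2 (by the AM-GM inequality
    applied to x^4 y^2, x^2 y^4 and 1), yet famously not a sum of
    squares of polynomials."""
    return x**4 * y**2 + x**2 * y**4 - 3 * x**2 * y**2 + 1

# Spot-check nonnegativity on a grid; the minimum value 0 is attained
# at |x| = |y| = 1.
grid = [i / 10 - 2 for i in range(41)]
vals = [motzkin(x, y) for x in grid for y in grid]
```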

Should these behaviours be counted as rational? Logic claims that they should: in fact, from the perspective of an agent whose rationality is constrained by $$\varSigma ^{\ge }$$, a collection of assessments is logically consistent whenever its deductive closure contains all tautologies as given by $$\varSigma ^{\ge }$$ but does not contain $$-1$$, the Falsum.

In other words, an algorithmic TDG, which we denote by $${\mathcal {T}}^\star$$, should be based on a logical redefinition of the tautologies, i.e., by stating that

B0.

$$\varSigma ^{\ge }$$ should always be desirable,

in the place of A0, where $$\varSigma ^{\ge }$$ is a closed subcone of $${\mathscr {L}}^{\ge } _R$$ whose corresponding membership problem (5) delimits the type of computation that an agent can actually do.

The rest of the theory follows exactly in the footsteps of $${\mathcal {T}}$$. In particular, the deductive closure for $${\mathcal {T}}^\star$$ is defined by:

B1.

$${\mathcal {C}}:={{\,\mathrm{posi}\,}}(\varSigma ^{\ge }\cup {\mathscr {G}})$$.

And finally the coherence postulate is simply reformulated by stating that a set $${\mathcal {C}}$$ of desirable gambles is said to be A-coherent if and only if

B2.

$$-1 \notin {\mathcal {C}}$$,

where ‘A’ stands for the fact that in $${\mathcal {T}}^\star$$ the algorithmic bounds of the coherence problem for a finite set of assessments are established according to the particular choice of $$\varSigma ^{\ge }$$.

Hence, $${\mathcal {T}}^\star$$ and $${\mathcal {T}}$$ have the same deductive apparatus; they just differ in the considered set of tautologies, and thus in their (in)consistencies. An example that gives an intuition of the postulates is given in Fig. 2.

### Quasi-Probability (The Algorithmic Desirability Dual)

Interestingly, as we did previously, we can associate a ‘probabilistic’ interpretation to the desirability calculus, now defined by B0–B2, through the dual of an A-coherent set.

So let us consider again the dual space $${\mathscr {L}}_R^*$$ of all bounded linear functionals $$L: {\mathscr {L}}_R \rightarrow {\mathbb {R}}$$. With the additional condition that linear functionals preserve the unitary gamble, the dual cone of an A-coherent $${\mathcal {C}}\subset {\mathscr {L}}_R$$ is given by

\begin{aligned} {\mathcal {C}}^\circ =\left\{ L \in {\mathsf {S}} \mid L(g)\ge 0, ~\forall g \in {\mathscr {G}}\right\} , \end{aligned}
(6)

where $${\mathsf {S}}=\{L \in {\mathscr {L}}_R^* \mid L(1)=1,~~L(h)\ge 0 ~~\forall h \in \varSigma ^{\ge }\}$$ is the set of states. To $${\mathcal {C}}^\circ$$ we can associate its extension $${\mathcal {C}}^\bullet$$ in $${\mathscr {M}}$$, that is, the set of all charges on $$\varOmega$$ extending an element in $${\mathcal {C}}^\circ$$. In other words, we can attempt to write L(g) as an ‘expectation’, that is, an integral with respect to a charge: $$L(g)=\int _{\varOmega } g(\omega )d\mu (\omega )$$. In general, however, this set does not yield a classical probabilistic interpretation of $${\mathcal {T}}^\star$$: in fact, whenever $$\varSigma ^{\ge }\subsetneq {\mathscr {L}}^{\ge } _R$$, there are negative gambles that Alice, given her rationality constraints, does not recognise as such and that therefore, from her perspective, do not lead to a sure loss. This is stated more precisely by the following:

### Theorem 1

(The weirdness theorem) Assume that $$\varSigma ^{\ge }$$ includes all positive constant gambles and that it is closed (in $${\mathscr {L}}_R$$). Denote by $$\varSigma ^{<}$$ the interior of $$-\varSigma ^{\ge }$$. Let $${\mathcal {C}}\subseteq {\mathscr {L}}_R$$ be an A-coherent set of desirable gambles. The following statements are equivalent:

1.

$${\mathcal {C}}$$ includes a negative gamble that is not in $$\varSigma ^{<}$$.

2.

$${{\,\mathrm{posi}\,}}({\mathscr {L}}^{\ge } _R\cup {\mathscr {G}})$$ is incoherent, and thus $${\mathscr {P}}$$ is empty.

3.

$${\mathcal {C}}^{\circ }$$ is not (the restriction to $${\mathscr {L}}_R$$ of) a closed convex set of mixtures of classical evaluation functionals.Footnote 15

4.

The extension $${\mathcal {C}}^\bullet$$ of $${\mathcal {C}}^{\circ }$$ in the space $${\mathscr {M}}$$ of all charges in $$\varOmega$$ includes only non-probabilistic charges (those with some negative value).

Theorem 1 is the central result of this paper (its proof is in Appendix B).

It states that whenever $${\mathcal {C}}$$ includes a negative gamble (item 1), there is no classical probabilistic interpretation for it (item 2). The other points suggest alternative solutions to overcome this deadlock: either to change the notion of evaluation functional (item 3) or to use quasi-probabilities as a means for interpreting $${\mathcal {T}}^\star$$ (item 4). The latter case means that, when we write $$L(g)=\int _{\varOmega } g(\omega )d\mu (\omega )$$, then $$\mu$$ satisfies $$L(1)=\int _{\varOmega } d\mu (\omega )=1$$ but it is not a probability charge.

Observe that requiring polynomial-time complexity is just one way to create the conditions for Theorem 1 to hold. But there are others: it is enough that a single negative gamble belongs to $${\mathcal {C}}$$ for the theorem to hold. In other words, even if we allowed for exponential-time complexity, there would still be gambles whose negativity we would not be able to evaluate (those that lead to undecidability). This is the reason why we use the terminology ‘algorithmic’ rationality, which appears to faithfully capture the idea that our capabilities are limited by the very fact of reasoning algorithmically.

However, and since we are particularly concerned with physics in this paper, we also embrace Aaronson’s point of view:

$$\ldots$$ while experiment will always be the last appeal, the presumed intractability of NP-complete problems might be taken as a useful constraint in the search for new physical theories

as a reason to focus on a polynomial-time complexity definition of algorithmic rationality.

## QT as a Theory of Algorithmic Rationality

We are going to show that QT can be deduced from a particular instance of the theory $${\mathcal {T}}^\star$$. As a consequence, we get that the computation postulate, and in particular B0, is the unique reason for all its paradoxes, which all boil down to a rephrasing of the various statements of Theorem 1 in the considered quantum context.

### Setting

Let us initially focus on the possibility space we shall use. Consider first a single-particle n-level system and let

\begin{aligned} {\overline{{\mathbb {C}}}}^{n}:=\{ x\in {\mathbb {C}}^{n}: ~x^{\dagger }x=1\}. \end{aligned}

In some cases we can interpret an element $$x\in {\overline{{\mathbb {C}}}}^{n}$$ as ‘input data’ for some classical preparation procedure. For instance, in the case of the spin-1/2 particle ($$n = 2$$), if $$\theta = [\theta _1 , \theta _2 , \theta _3 ]$$ is the direction of a filter in the Stern-Gerlach experiment, then x is its one-to-one mapping into $${\overline{{\mathbb {C}}}}^{2}$$ (apart from a phase term). For spin greater than 1/2, however, the element $$x\in {\overline{{\mathbb {C}}}}^{n}$$ cannot be interpreted directly in terms of a ‘filter direction’ alone. In our framework the element x is thus better interpreted as the state of the ontological world, which we have sketched in the Introduction. It is a world that is not directly accessible to an observer inside the theory (Alice), although it has implications for observables within such a theory.
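For the spin-1/2 case just mentioned, one standard choice of the mapping from a filter direction to $${\overline{{\mathbb {C}}}}^{2}$$ (a sketch under our own angle conventions; the paper does not fix one) sends the direction with polar angle theta and azimuth phi to the $$+1$$ eigenstate of the Pauli vector along that direction:

```python
import numpy as np

def direction_to_spinor(theta, phi):
    """Map a filter direction (polar angle theta, azimuth phi) to a
    unit vector x in C^2: the +1 eigenstate of n . sigma, fixed up to
    a global phase."""
    return np.array([np.cos(theta / 2),
                     np.exp(1j * phi) * np.sin(theta / 2)])

# The spinor is normalised and reproduces the direction as the
# expectation of the Pauli matrices: x^dagger sigma_k x = n_k.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
theta, phi = 0.7, 1.9
x = direction_to_spinor(theta, phi)
n = [np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)]
```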

For a composite system of m particles (each an $$n_j$$-level system), the joint possibility space is the Cartesian product

\begin{aligned} \varOmega =\times _{j=1}^m {\overline{{\mathbb {C}}}}^{n_j}. \end{aligned}

Having defined the possibility space, the next step is the definition of the observables, which define the gambles in our setting. Let us recall that in QT any real-valued observable is described by a Hermitian operator. This naturally imposes restrictions on the type of ‘permitted gambles’ g on a quantum experiment. For a single particle, given a Hermitian operator $$G\in {\mathscr {H}}^{n\times n}$$ (with $${\mathscr {H}}^{n\times n}$$ being the set of Hermitian matrices of dimension $$n \times n$$), a gamble on $$x \in {\overline{{\mathbb {C}}}}^{n}$$ can be defined as:

\begin{aligned} g(x)=x^\dagger G x. \end{aligned}

Since G is Hermitian and x is bounded ($$x^{\dagger }x=1$$), g is a real-valued bounded function ($$g(x)=\langle x|G|x \rangle$$ in ‘bra-ket’ notation). For a composite system of m particles, the gambles are m-quadratic forms:

\begin{aligned} g(x_1,\ldots ,x_m)=(\otimes _{j=1}^m x_j)^\dagger G (\otimes _{j=1}^m x_j), \end{aligned}
(7)

with $$G \in {\mathscr {H}}^{n \times n}$$, $$n=\prod _{j=1}^m n_j$$, and where $$\otimes$$ denotes the tensor product between vectors regarded as column matrices. Therefore, we have that

\begin{aligned} {\mathscr {L}}_R =\{g(x_1,\ldots ,x_m)=(\otimes _{j=1}^m x_j)^\dagger G (\otimes _{j=1}^m x_j)\mid ~ G \in {\mathscr {H}}^{n \times n}, x=[x_1,\ldots ,x_m]\in \varOmega \} \end{aligned}
(8)

is the restricted set of ‘permitted gambles’ in a quantum experiment. We can also define the subset of nonnegative gambles $${\mathscr {L}}^{\ge } _R:=\{g \in {\mathscr {L}}_R: \min g\ge 0 \}$$ and the subset of negative gambles $${\mathscr {L}}^{<} _R:=\{g \in {\mathscr {L}}_R: \max g < 0 \}$$.
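As a small numerical aside (a sketch of ours, not part of the formal development), a permitted gamble is just the quadratic form $$g(x)=x^\dagger G x$$; for a Hermitian G its payoff is always real and lies between the extreme eigenvalues of G:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2

# A random Hermitian matrix G defines the permitted gamble g(x) = x† G x.
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
G = (A + A.conj().T) / 2

def gamble(G, x):
    """Payoff of the gamble defined by G at the 'world state' x."""
    return (x.conj() @ G @ x).real  # real because G is Hermitian

# A random unit vector x (an element of the normalised possibility space).
x = rng.normal(size=n) + 1j * rng.normal(size=n)
x /= np.linalg.norm(x)

ev = np.linalg.eigvalsh(G)  # eigenvalues of G, in ascending order
g = gamble(G, x)
assert ev[0] - 1e-9 <= g <= ev[-1] + 1e-9  # the payoff is bounded
```

Replacing G by a PSD matrix yields an element of $${\mathscr {L}}^{\ge }_R$$; negating a positive-definite G yields an element of $${\mathscr {L}}^{<}_R$$.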

### Remark 1

(Hidden-variable theories) The model we have just presented was originally discussed by Holevo in [68, Sect. 1.7], who treats it as a hidden-variable model. For a single particle ($$m=1$$), Holevo shows that this model does not contradict the existing ‘no-go’ theorems for hidden variables. For $$m\ge 2$$ entangled particles, ‘no-go’ theorems apply to this model; in [68, Supplementary 3.2] Holevo discusses a way in which this model could still be considered a hidden-variable model. We will detail these points in Appendices A.2 and A.3.

### Remark 2

(The tensor product) In our setting the tensor product is ultimately a derived notion, not a primitive one, since it follows from the properties of m-quadratic forms (see Appendix A.2).

### Polynomial Inference and Agreement with Born’s Rule

For $$m=1$$ (a single particle), evaluating the nonnegativity of the quadratic form $$x^\dagger G x$$ boils down to checking whether the matrix G is positive semi-definite (PSD) and therefore the membership problem

\begin{aligned} g(x)=x^\dagger G x \;\overset{?}{\in }\; {\mathscr {L}}^{\ge } _R \end{aligned}
(9)

can be solved in polynomial time and so can be problem (4). This is no longer true for $$m\ge 2$$: indeed, in this case there exist polynomials of type (7) that are nonnegative, but whose matrix G is indefinite (it has at least one negative eigenvalue). Moreover, it turns out that problem (4) is not tractable:

### Proposition 1

The problem of checking the nonnegativity of functions of type (7) is NP-hard for $$m\ge 2$$.

What to do? As discussed previously, we could change the meaning of ‘being nonnegative’ by considering a subset $$\varSigma ^{\ge }\subsetneq {\mathscr {L}}^{\ge }$$ for which the membership problem, and thus (4), is in P. For functions of type (7), we can extend the notion of nonnegativity that holds for a single particle to $$m>1$$ particles:

\begin{aligned} \varSigma ^{\ge }:=\{g(x_1,\ldots ,x_m)=(\otimes _{j=1}^m x_j)^\dagger G (\otimes _{j=1}^m x_j): G\ge 0\}. \end{aligned}
(10)

That is, the function is ‘nonnegative’ whenever G is PSD. Note that $$\varSigma ^{\ge }$$ is the so-called cone of Hermitian sum-of-squares polynomials (see Sect. A.4), and that in $$\varSigma ^{\ge }$$ the nonnegative constant functions take the form $$g(x_1,\ldots ,x_m)=c (\otimes _{j=1}^m x_j)^\dagger I (\otimes _{j=1}^m x_j)$$ with $$c\ge 0$$.
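Membership in $$\varSigma ^{\ge }$$ of (10) — and, for $$m=1$$, the test (9) itself — thus reduces to a PSD check, which runs in polynomial time. A minimal sketch (the function name is ours); recall that for $$m\ge 2$$ the check is only sufficient for nonnegativity:

```python
import numpy as np

def in_sigma_ge(G, tol=1e-10):
    """'Nonnegative' in the algorithmic sense of (10): G is PSD."""
    G = np.asarray(G, dtype=complex)
    assert np.allclose(G, G.conj().T), "G must be Hermitian"
    return np.linalg.eigvalsh(G).min() >= -tol

# Pauli-Z gamble g(x) = |x1|^2 - |x2|^2: takes both signs, so not PSD.
Z = np.diag([1.0, -1.0])
assert not in_sigma_ge(Z)
# Shifting by the identity gives g(x) + 1 >= 0, and indeed Z + I is PSD.
assert in_sigma_ge(Z + np.eye(2))
```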

Now, consider any set of desirable gambles $${\mathcal {C}}$$ satisfying B0–B2 with the definition (10) of $$\varSigma ^{\ge }$$; this results in a theory of algorithmic rationality that is precisely QT. In other words, from the algorithmic rationality axioms and the definition (10), we can derive the first postulate of QT (see for instance Postulate 1 in [70, p. 110]):

Associated to any isolated physical system is a complex vector space with inner product (that is, a Hilbert space) known as the state space of the system. The system is completely described by its density operator, which is a positive operator $$\rho$$ with trace one, acting on the state space of the system.

Indeed, although the possibility space $$\varOmega$$ is infinite (e.g., the ‘directions’ of the particle’s spins), the vector space of gambles $${\mathscr {L}}_R$$ is finite dimensional: any polynomial $$(\otimes _{j=1}^m x_j)^{\dagger }G(\otimes _{j=1}^m x_j) \in {\mathscr {L}}_R$$ can then be written as the inner product of a vector of complex coefficients, coming from the matrix G, and a vector of complex monomials: the elements of the matrix $$(\otimes _{j=1}^m x_j) (\otimes _{j=1}^m x_j)^{\dagger }$$ that constitute the basis of the vector space $${\mathscr {L}}_R$$. Therefore the dual space $${\mathscr {L}}^*_R$$ is finite dimensional too and corresponds to the space of linear operators $${\tilde{L}}: {\mathbb {C}}\rightarrow {\mathbb {C}}$$, whose basis is given by the elements of the matrix $${\tilde{L}}((\otimes _{j=1}^m x_j) (\otimes _{j=1}^m x_j)^{\dagger })$$ (where $${\tilde{L}}$$ is applied component-wise to $$(\otimes _{j=1}^m x_j) (\otimes _{j=1}^m x_j)^{\dagger }$$).

That said, let $${\mathscr {G}}$$ be a finite set of assessments, and $${\mathcal {K}}$$ the deductive closure as defined by B1; it is not difficult to prove that the dual of $${\mathcal {K}}$$ is

\begin{aligned} {\mathcal {Q}}&=\{ \rho \in {\mathcal {S}} \mid Tr(G \rho ) \ge 0,~ ~\forall g \in {\mathscr {G}}\}, \end{aligned}
(11)

where G is the Hermitian matrix defining the gamble $$g\in {\mathscr {G}}$$ and $${\mathcal {S}}=\{ \rho \in {\mathscr {H}}^{n\times n} \mid \rho \ge 0,~~Tr(\rho )=1\}$$ is the set of all density matrices. As before, whenever the set $${\mathcal {C}}$$ representing Alice’s beliefs about the experiment is coherent, Eq. (11) means that desirability implies nonnegative ‘expected value’ for all models in $${\mathcal {Q}}$$. Note that in QT the expectation of g is $$Tr(G \rho )$$. This follows from Born’s rule, a law giving the probability that a measurement on a quantum system will yield a given result.

The agreement with Born’s rule is an important constraint on any alternative axiomatisation of QT. This is also the case for our theory, but in the sense that Born’s rule can be derived from it. In fact, viewing a density matrix as a dual operator, $$\rho$$ is formally equal to

\begin{aligned} \rho ={\tilde{L}}\left( (\otimes _{j=1}^m x_j)(\otimes _{j=1}^m x_j)^\dagger \right) . \end{aligned}
(12)

### Example 1

Consider the case $$n=m=2$$, then

\begin{aligned} {\tilde{L}}\left( (x_1 \otimes x_2)^{\dagger } G (x_1 \otimes x_2)\right) =Tr\left( G {\tilde{L}}\left( (\otimes _{j=1}^2 x_j)(\otimes _{j=1}^2 x_j)^\dagger \right) \right) ; \end{aligned}

this follows by the linearity of the trace operator. The expression $${\tilde{L}}\left( (\otimes _{j=1}^2 x_j)(\otimes _{j=1}^2 x_j)^\dagger \right)$$ means that the operator $${\tilde{L}}$$ is applied component-wise to the elements of the matrix $$(\otimes _{j=1}^2 x_j)(\otimes _{j=1}^2 x_j)^\dagger$$:

\begin{aligned} {\tilde{L}}\left( (\otimes _{j=1}^2 x_j)(\otimes _{j=1}^2 x_j)^\dagger \right) = {\tilde{L}}\left( \left[ {\begin{matrix}x_{11} x_{11}^{\dagger } x_{21} x_{21}^{\dagger } &{} x_{11}^{\dagger } x_{12} x_{21} x_{21}^{\dagger } &{} x_{11} x_{11}^{\dagger } x_{21}^{\dagger } x_{22} &{} x_{11}^{\dagger } x_{12} x_{21}^{\dagger } x_{22}\\ x_{11} x_{12}^{\dagger } x_{21} x_{21}^{\dagger } &{} x_{12} x_{12}^{\dagger } x_{21} x_{21}^{\dagger } &{} x_{11} x_{12}^{\dagger } x_{21}^{\dagger } x_{22} &{} x_{12} x_{12}^{\dagger } x_{21}^{\dagger } x_{22}\\ x_{11} x_{11}^{\dagger } x_{21} x_{22}^{\dagger } &{} x_{11}^{\dagger } x_{12} x_{21} x_{22}^{\dagger } &{} x_{11} x_{11}^{\dagger } x_{22} x_{22}^{\dagger } &{} x_{11}^{\dagger } x_{12} x_{22} x_{22}^{\dagger }\\ x_{11} x_{12}^{\dagger } x_{21} x_{22}^{\dagger } &{} x_{12} x_{12}^{\dagger } x_{21} x_{22}^{\dagger } &{} x_{11} x_{12}^{\dagger } x_{22} x_{22}^{\dagger } &{} x_{12} x_{12}^{\dagger } x_{22} x_{22}^{\dagger }\end{matrix}}\right] \right) , \end{aligned}
(13)

where the monomials inside the above matrix constitute the basis of $${\mathscr {L}}_R$$ and $${\tilde{L}}:{\mathbb {C}}\rightarrow {\mathbb {C}}$$, so:

\begin{aligned} \rho := {\tilde{L}}\left( (\otimes _{j=1}^2 x_j)(\otimes _{j=1}^2 x_j)^\dagger \right) =\left[ \begin{matrix} \rho _{11} &{} \rho _{12} &{} \rho _{13} &{} \rho _{14}\\ \rho _{12}^{\dagger } &{} \rho _{22} &{} \rho _{23} &{} \rho _{24}\\ \rho _{13}^{\dagger } &{} \rho _{23}^{\dagger } &{} \rho _{33} &{} \rho _{34}\\ \rho _{14}^{\dagger } &{} \rho _{24}^{\dagger } &{} \rho _{34}^{\dagger } &{} \rho _{44} \end{matrix}\right] , \end{aligned}
(14)

$$\rho _{11}={\tilde{L}}(x_{11} x_{11}^{\dagger } x_{21} x_{21}^{\dagger })\in {\mathbb {C}}$$, $$\rho _{12}={\tilde{L}}(x_{11}^{\dagger } x_{12} x_{21} x_{21}^{\dagger } )\in {\mathbb {C}}$$, etcetera.

Hence, when a projection-valued measurement characterised by the projectors $$\varPi _1,\ldots ,\varPi _n$$ is considered, it holds that

\begin{aligned} {\tilde{L}}( (\otimes _{j=1}^m x_j)^\dagger \varPi _i (\otimes _{j=1}^m x_j))=Tr(\varPi _i {\tilde{L}}((\otimes _{j=1}^m x_j)(\otimes _{j=1}^m x_j)^\dagger ))=Tr(\varPi _i \rho ). \end{aligned}

Since $$\varPi _i\ge 0$$ and the polynomials $$(\otimes _{j=1}^m x_j)^\dagger \varPi _i (\otimes _{j=1}^m x_j)$$ for $$i=1,\ldots ,n$$ form a partition of unity, i.e.:

\begin{aligned} \sum _{i=1}^n (\otimes _{j=1}^m x_j)^\dagger \varPi _i (\otimes _{j=1}^m x_j)= (\otimes _{j=1}^m x_j)^\dagger I (\otimes _{j=1}^m x_j)=1, \end{aligned}

we have that

\begin{aligned} Tr(\varPi _i \rho )\in [0,1] \text { and } \sum _{i=1}^n Tr(\varPi _i \rho )=1, \end{aligned}

which is Born’s rule.
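The derivation above is easy to confirm numerically (a sanity check of ours): for any density matrix $$\rho$$ and any projection-valued measurement, the numbers $$Tr(\varPi _i \rho )$$ indeed form a probability vector:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# A random density matrix: rho >= 0 with Tr(rho) = 1.
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = B @ B.conj().T
rho /= np.trace(rho).real

# A projection-valued measurement: projectors onto the orthonormal
# columns of a random unitary U (QR of a random complex matrix).
U, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
projectors = [np.outer(U[:, i], U[:, i].conj()) for i in range(n)]

# Born's rule: the numbers Tr(Pi_i rho) lie in [0, 1] and sum to 1.
probs = np.array([np.trace(P @ rho).real for P in projectors])
assert np.all(probs >= -1e-12) and np.all(probs <= 1 + 1e-12)
assert np.isclose(probs.sum(), 1.0)
```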

### Remark 3

(Discrete vs. continuous space probability) Quantum measurements are discrete: when we perform a measurement, we observe a detection along one of the directions $$\varPi _i$$. This phenomenon of quantisation is one of the major differences between quantum and classical physics. We took it into account in the choice of the framework: the possibility space is (only) the ‘directions’ of the particle’s spins, and the measurement apparatus senses only certain fixed ‘directions’ ($$x^\dagger \varPi _i x=x^\dagger v_iv_i^{\dagger } x$$ is a function of two ‘directions’ x and $$v_i$$). Despite its centrality, we want to point out, however, that quantisation is not the source of Bell-like inequalities and entanglement. As said before, this is because ‘quantum weirdness’ is intrinsic to any theory of algorithmic rationality as above, and is hence not confined to QT only.

It is often claimed that QT includes classical probability theory (CPT) as a special case, or better, that QT includes discrete-space CPT. However, as the possibility space $$\varOmega$$ is infinite (e.g., the ‘directions’ of the particle’s spins), in this paper when we speak about CPT (and compare it with QT), we mean continuous-space classical probability theory (in the complex numbers). Hence again, since B1–B2 and A1–A2 are the same logical postulates parametrised by the appropriate meaning of ‘being negative/nonnegative’, the only axiom truly separating (continuous-space) classical probability theory from the quantum one is B0 (with the specific form of (10)), thus implementing the requirement of computational efficiency.

In other words, we claim that QT is ‘easier’ than CPT because, once the appropriate possibility space, observables and queries are specified, evaluating the consistency of the theory is NP-hard for CPT. In QT, we see this clearly when we address the question of whether or not an experimentally generated state is entangled. We will discuss in Sect. 4.3 that determining the entanglement of a general state is equivalent to proving the nonnegativity of a polynomial of type (7), a problem that, by Proposition 1, is NP-hard. In fact, we can reformulate the entanglement witness theorem as the clash between the classical notion of coherence and A-coherence (see Theorem 2).

### Remark 4

(Truncated moment matrices vs. density matrices) In a single-particle system of dimension n, $$\rho ={\tilde{L}}(x x^{\dagger })$$. In that case, $$\rho$$ can be interpreted as a truncated moment matrix, i.e., there exists a probability distribution on the complex vector $$x\in \varOmega$$ such that

\begin{aligned} \rho =\int _{x\in \varOmega } xx^{\dagger } d\mu (x). \end{aligned}
(15)

In fact, consider the eigenvalue-eigenvector decomposition of the density matrix:

\begin{aligned} \rho =\sum \limits _{i=1}^n \lambda _i v_iv_i^{\dagger }, \end{aligned}

with $$\lambda _i\ge 0$$ and $$v_i \in {\mathbb {C}}^{n}$$ being orthonormal. We can define the probability distribution

\begin{aligned} \mu (x)= \sum \limits _{i=1}^n \lambda _i \delta _{v_i}(x), \end{aligned}

where $$\delta _{v_i}$$ is an atomic charge (Dirac’s delta) on $$v_i$$. Then it is immediate to verify that

\begin{aligned} \int _{x\in \varOmega } x x^{\dagger }d\mu (x)=\sum \limits _{i=1}^n \lambda _i v_iv_i^{\dagger }=\rho . \end{aligned}

In Sect. 4.4, we will extend this result to separable states. Note also that a truncated moment matrix does not uniquely define a probability distribution, i.e., for a given $$\rho$$ there may exist two probability distributions $$\mu _1(x)\ne \mu _2(x)$$ such that

\begin{aligned} \rho =\int _{x\in \varOmega } xx^{\dagger } d\mu _1(x)=\int _{x\in \varOmega } xx^{\dagger } d\mu _2(x). \end{aligned}

This means that, if we interpret $$\rho$$ as a truncated moment matrix, thus defining via (15) a closed convex set of probabilities (more precisely, charges), QT is a theory of imprecise probability. We will discuss more on this topic in Sect. A.3. In fact, Karr has proved that the set of probabilities that are feasible for the truncated moment constraint, i.e., $$\rho ={\tilde{L}}(x x^{\dagger })$$, is convex and compact with respect to the weak$$^*$$-topology. Moreover, the extreme points of this set are probabilities that have a finite number of distinct points of support (i.e., they are finite mixtures of Dirac’s deltas). A similar characterisation for POVM measurements has been discussed in the QT context.

The case of a many-particle system is discussed in the next sections.
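Remark 4’s construction is straightforward to replay numerically (a sketch of ours): diagonalising $$\rho$$ and placing an atom of mass $$\lambda _i$$ on each eigenvector $$v_i$$ reproduces $$\rho$$ as a second-moment matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3

# A random single-particle density matrix.
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = B @ B.conj().T
rho /= np.trace(rho).real

# Eigendecomposition rho = sum_i lam_i v_i v_i† with lam_i >= 0.
lam, V = np.linalg.eigh(rho)
assert np.all(lam >= -1e-12)

# The discrete measure mu = sum_i lam_i * delta_{v_i} has second
# moment  int x x† dmu(x) = sum_i lam_i v_i v_i† = rho.
moment = sum(l * np.outer(V[:, i], V[:, i].conj()) for i, l in enumerate(lam))
assert np.allclose(moment, rho)
```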

### Entanglement

Entanglement is usually presented as a characteristic of QT. In this section we are going to show that it is actually an immediate consequence of algorithmic rationality.

To illustrate the emergence of entanglement from A-coherence, we verify that the set of desirable gambles whose dual is an entangled density matrix $$\rho _{e}$$ includes a negative gamble that is not in $$\varSigma ^{<}$$; thus, although logically coherent, it cannot be given a classical probabilistic interpretation.

In what follows we focus only on bipartite systems $$\varOmega _A \times \varOmega _B$$, with $$n=m=2$$. The results are nevertheless general.

Let $$(x,y) \in \varOmega _A \times \varOmega _B$$, where $$x=[x_1,x_2]^T$$ and $$y=[y_1,y_2]^T$$. We aim to show that there exists a gamble $$h(x,y)=(x \otimes y)^{\dagger } H (x \otimes y)$$ satisfying:

\begin{aligned} \begin{aligned} Tr(H \rho _{e})&\ge 0 \text { and }\\ h(x,y)=(x \otimes y)^{\dagger } H (x \otimes y)&< 0 \text { for all } (x,y)\in \varOmega _A\times \varOmega _B.\\ \end{aligned} \end{aligned}
(16)

The first inequality says that h is desirable in $${\mathcal {T}}^\star$$. That is, h is a gamble desirable to Alice whose beliefs are represented by $$\rho _{e}$$. The second inequality says that h is negative and, therefore, leads to a sure loss in $${\mathcal {T}}$$. By B0–B2, the inequalities in (16) imply that H must be an indefinite Hermitian matrix.

Assume that $$n=m=2$$ and consider the entangled density matrix:

\begin{aligned} \rho _{e}=\frac{1}{2}\begin{bmatrix} 1 &{} 0 &{} 0 &{}1\\ 0 &{} 0 &{} 0 &{}0\\ 0 &{} 0 &{} 0 &{}0\\ 1 &{} 0 &{} 0 &{}1\\ \end{bmatrix}, \end{aligned}

and the Hermitian matrix:

\begin{aligned} H=\left[ \begin{matrix}0 &{} 0 &{} 0 &{} 1\\ 0 &{} -2 &{} 1 &{} 0\\ 0 &{} 1 &{} -2 &{} 0\\ 1 &{} 0 &{} 0 &{} 0\end{matrix}\right] . \end{aligned}

This matrix is indefinite (its eigenvalues are $$\{1, -1, -1, -3\}$$) and is such that $$Tr(H\rho _{e})=1$$. Since $$Tr(H\rho _{e})\ge 0$$, the gamble

\begin{aligned} \begin{aligned} (x \otimes y)^{\dagger } H (x \otimes y)&= - 2 x_{1} x_1^{\dagger } y_{2} y_2^{\dagger } + x_{1} x_2^{\dagger } y_{1} y_2^{\dagger } + x_{1} x_2^{\dagger } y_1^{\dagger } y_{2} + x_1^{\dagger } x_{2} y_{1} y_2^{\dagger } \\&\qquad + x_1^{\dagger } x_{2} y_1^{\dagger } y_{2} - 2 x_{2} x_2^{\dagger } y_{1} y_1^{\dagger }, \end{aligned} \end{aligned}
(17)

is desirable for Alice in $${\mathcal {T}}^\star$$.

Let $$x_i=x_{ia}+\iota x_{ib}$$ and $$y_i=y_{ia}+\iota y_{ib}$$ with $$x_{ia},x_{ib},y_{ia},y_{ib}\in {\mathbb {R}}$$, for $$i=1,2$$, denote the real and imaginary components of xy. Then

\begin{aligned} \begin{aligned} (x \otimes y)^{\dagger } H (x \otimes y)&=- 2 x_{1a}^{2} y_{2a}^{2} - 2 x_{1a}^{2} y_{2b}^{2} + 4 x_{1a} x_{2a} y_{1a} y_{2a} + 4 x_{1a} x_{2a} y_{1b} y_{2b} \\&\quad - 2 x_{1b}^{2} y_{2a}^{2} - 2 x_{1b}^{2} y_{2b}^{2} + 4 x_{1b} x_{2b} y_{1a} y_{2a} + 4 x_{1b} x_{2b} y_{1b} y_{2b}\\&\quad - 2 x_{2a}^{2} y_{1a}^{2} - 2 x_{2a}^{2} y_{1b}^{2} - 2 x_{2b}^{2} y_{1a}^{2} - 2 x_{2b}^{2} y_{1b}^{2}\\&=-(\sqrt{2}x_{1a}y_{2a}-\sqrt{2}x_{2a}y_{1a})^2-(\sqrt{2}x_{1a}y_{2b}-\sqrt{2}x_{2a}y_{1b})^2\\&\quad -(\sqrt{2}x_{1b}y_{2b}-\sqrt{2}x_{2b}y_{1b})^2 -(\sqrt{2}x_{1b}y_{2a}-\sqrt{2}x_{2b}y_{1a})^2\le 0. \end{aligned} \end{aligned}
(18)

Hence h is nonpositive on all of $$\varOmega _A\times \varOmega _B$$; replacing H by $$H-\epsilon I$$ for a small $$\epsilon \in (0,1)$$ makes the gamble strictly negative while keeping $$Tr((H-\epsilon I)\rho _{e})=1-\epsilon \ge 0$$, so that (16) holds.

This is the essence of the quantum puzzle: $${\mathcal {C}}$$ is A-coherent but (Theorem 1) there is no $${\mathscr {P}}$$ associated to it and therefore, from the point of view of Isaac, who holds a classical probabilistic interpretation, it is not coherent: in any classical description of the composite quantum system, x and y appear to be entangled in a way unusual for classical subsystems.
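These claims are easy to confirm numerically (a sanity check of ours; note that H and $$\rho _e$$ are invariant under swapping the two middle basis vectors, so the tensor-ordering convention does not matter):

```python
import numpy as np

rng = np.random.default_rng(3)

rho_e = 0.5 * np.array([[1, 0, 0, 1],
                        [0, 0, 0, 0],
                        [0, 0, 0, 0],
                        [1, 0, 0, 1]], dtype=complex)
H = np.array([[0, 0, 0, 1],
              [0, -2, 1, 0],
              [0, 1, -2, 0],
              [1, 0, 0, 0]], dtype=float)

# Indefinite, with eigenvalues {1, -1, -1, -3}, and Tr(H rho_e) = 1 >= 0.
assert np.allclose(np.linalg.eigvalsh(H), [-3, -1, -1, 1])
assert np.isclose(np.trace(H @ rho_e).real, 1.0)

# h(x, y) = (x ⊗ y)† H (x ⊗ y) takes no positive value on product states.
def h(x, y):
    v = np.kron(x, y)
    return (v.conj() @ H @ v).real

for _ in range(1000):
    x = rng.normal(size=2) + 1j * rng.normal(size=2)
    y = rng.normal(size=2) + 1j * rng.normal(size=2)
    x /= np.linalg.norm(x)
    y /= np.linalg.norm(y)
    assert h(x, y) <= 1e-9

# Yet the 'entangled' evaluation (e1 + e4)/sqrt(2) gives payoff 1.
v = np.array([1, 0, 0, 1]) / np.sqrt(2)
assert np.isclose(v @ H @ v, 1.0)
```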

As previously mentioned, there are two possible ways out of this impasse: to claim the existence of either non-classical evaluation functionals or negative probabilities. Let us examine them in turn.

1. (1)

Existence of non-classical evaluation functionals: From an informal betting perspective, the effect of a quantum experiment on h(xy) is to evaluate this polynomial to return the payoff for Alice. By Theorem 1, there is no compatible classical evaluation functional, and thus in particular no values $$x,y\in \varOmega _A \times \varOmega _B$$ such that $$h(x,y)=1$$. Hence, if we adopt this point of view, we have to find another, non-classical, explanation for $$h(x,y)=1$$. The following evaluation functional, denoted as $$ev(\cdot )$$, may do the job:

\begin{aligned} \text {ev}\left( \begin{bmatrix} x_1y_1\\ x_2y_1\\ x_1y_2\\ x_2y_2\\ \end{bmatrix}\right) =\begin{bmatrix} \tfrac{\sqrt{2}}{2}\\ 0\\ 0\\ \tfrac{\sqrt{2}}{2}\\ \end{bmatrix},~\text {which implies}~ \text {ev}\left( (x \otimes y)^{\dagger } H (x \otimes y)\right) =1. \end{aligned}

Note that $$x_1y_1=\tfrac{\sqrt{2}}{2}$$ and $$x_2y_1=0$$ together imply that $$x_2=0$$, which contradicts $$x_2y_2=\tfrac{\sqrt{2}}{2}$$. Similarly, $$x_2y_2=\tfrac{\sqrt{2}}{2}$$ and $$x_1y_2=0$$ together imply that $$x_1=0$$, which contradicts $$x_1y_1=\tfrac{\sqrt{2}}{2}$$. Hence, as expected, the above evaluation functional is non-classical. It amounts to assigning a value to the products $$x_iy_j$$ but not to the single components of x and y separately. Quoting Holevo in [68, Supplement 3.4]:

entangled states are holistic entities in which the single components only exist virtually.

2. (2)

Existence of negative probabilities: Negative probabilities are not an intrinsic characteristic of QT. They appear whenever one attempts to explain QT ‘classically’ by looking at the space of charges on $$\varOmega$$. To see this, consider $$\rho _e$$, and assume that, based on (12), one calculates:

\begin{aligned} \int \begin{bmatrix}x_{1} x_1^{\dagger } y_{1} y_1^{\dagger } &{} x_1^{\dagger } x_{2} y_{1} y_1^{\dagger } &{} x_{1} x_1^{\dagger } y_1^{\dagger } y_{2} &{} x_1^{\dagger } x_{2} y_1^{\dagger } y_{2}\\ x_{1} x_2^{\dagger } y_{1} y_1^{\dagger } &{} x_{2} x_2^{\dagger } y_{1} y_1^{\dagger } &{} x_{1} x_2^{\dagger } y_1^{\dagger } y_{2} &{} x_{2} x_2^{\dagger } y_1^{\dagger } y_{2}\\ x_{1} x_1^{\dagger } y_{1} y_2^{\dagger } &{} x_1^{\dagger } x_{2} y_{1} y_2^{\dagger } &{} x_{1} x_1^{\dagger } y_{2} y_2^{\dagger } &{} x_1^{\dagger } x_{2} y_{2} y_2^{\dagger }\\ x_{1} x_2^{\dagger } y_{1} y_2^{\dagger } &{} x_{2} x_2^{\dagger } y_{1} y_2^{\dagger } &{} x_{1} x_2^{\dagger } y_{2} y_2^{\dagger } &{} x_{2} x_2^{\dagger } y_{2} y_2^{\dagger }\end{bmatrix} d \mu (x,y)=\frac{1}{2}\begin{bmatrix} 1 &{} 0 &{} 0 &{}1\\ 0 &{} 0 &{} 0 &{}0\\ 0 &{} 0 &{} 0 &{}0\\ 1 &{} 0 &{} 0 &{}1\\ \end{bmatrix}. \end{aligned}
(19)

Because of Theorem 1, there is no probability charge $$\mu$$ satisfying these moment constraints; the only compatible charges are quasi-probabilities. Table 1 reports the nine components and corresponding weights of one of them:

\begin{aligned} \mu (x,y)=\sum \limits _{i=1}^{9}w_i\delta _{\{(x^{(i)},y^{(i)})\}}(x,y) ~~\text { with }~~ (w_i,x^{(i)},y^{(i)}) ~~\text { as in Table}~1. \end{aligned}
(20)

Note that some of the weights are negative but $$\sum _{i=1}^{9}w_i=1$$, meaning that we have an affine combination of atomic charges (Dirac’s deltas). Consider for instance the first monomial $$x_{1} x_1^{\dagger } y_{1} y_1^{\dagger }$$ in (19); its expectation w.r.t. the above charge is

\begin{aligned} \begin{aligned}&\int x_{1} x_1^{\dagger } y_{1} y_1^{\dagger } \left( \sum _{i=1}^{9}w_i\delta _{\{(x^{(i)},y^{(i)})\}}(x,y)\right) dxdy=\sum _{i=1}^{9}w_i x^{(i)}_{1} {x^{(i)}_1}^{\dagger } y^{(i)}_{1} {y^{(i)}_1}^{\dagger }\\&\quad = 0.4805 (-0.0963 - 0.6352\iota )(-0.0963 + 0.6352\iota )(-0.3727 \\&\qquad - 0.3899\iota )(-0.3727 + 0.3899\iota )\\&\qquad + 0.7459 (0.251 - 0.9665\iota )(0.251 + 0.9665\iota )(-0.1628 \\&\qquad + 0.561\iota )(-0.1628 - 0.561\iota )\\&\qquad +\dots \\&\qquad + 0.1755(-0.1255 - 0.3078\iota )(-0.1255 + 0.3078\iota )(0.0933 \\&\qquad - 0.4588\iota )(0.0933 + 0.4588\iota )=\frac{1}{2}. \end{aligned} \end{aligned}

The charge described in Table 1 is one among the many that satisfy the moment constraints (19); it has been derived numerically. Explicit procedures for constructing such negative-probability representations have been developed in [73,74,75,76].

Again, we want to stress that the two above paradoxical interpretations are a consequence of Theorem 1, and therefore can emerge when considering any instance of a theory of A-coherence in which the hypotheses of this result hold.

### Entanglement Witness

Do quantum and classical probability sometimes agree? Yes they do, but only when the density matrix $$\rho$$ at play is such that Eq. (16) cannot be satisfied, and thus in particular for separable density matrices. We make this claim precise by providing a link between Eq. (16) and the entanglement witness theorem [77, 78].

We first report the definition of entanglement witness [79, Sect. 6.3.1]:

### Definition 2

(Entanglement witness) A Hermitian operator $$W \in {\mathscr {H}}^{n_1n_2 \times n_1n_2}$$ is an entanglement witness if and only if W is not a positive operator but $$(x_1 \otimes x_2)^{\dagger } W (x_1 \otimes x_2) \ge 0$$ for all vectors $$(x_1,x_2)\in \varOmega _1\times \varOmega _2$$.

The next well-known result (see, e.g., [79, Theorem 6.39, Corollary 6.40]) provides a characterisation of entanglement and separable states in terms of entanglement witness.

### Proposition 2

A state $$\rho _e$$ is entangled if and only if there exists an entanglement witness W such that $$Tr(\rho _e W ) < 0$$. A state $$\rho$$ is separable if and only if $$Tr(\rho W ) \ge 0$$ for all entanglement witnesses W.

Assume that W is an entanglement witness for the entangled density matrix $$\rho _e$$ and consider $$W'=-W$$. By Definition 2 and Proposition 2, it follows that

\begin{aligned} Tr(\rho _e W' ) > 0 \text { and } (x_1 \otimes x_2)^{\dagger } W' (x_1 \otimes x_2) \le 0. \end{aligned}
(21)

The first inequality states that the gamble $$(x_1 \otimes x_2)^{\dagger } W' (x_1 \otimes x_2)$$ is strictly desirable for Alice (in theory $${\mathcal {T}}^\star$$) given her belief $$\rho _e$$. Since the set of desirable gambles (B1) associated to $$\rho _e$$ is closed, there exists $$\epsilon >0$$ such that $$W''=W'-\epsilon I$$ is still desirable, i.e., $$Tr(\rho _e W'' )\ge 0$$ and

\begin{aligned} (x_1 \otimes x_2)^{\dagger } W'' (x_1 \otimes x_2) = (x_1 \otimes x_2)^{\dagger } W' (x_1 \otimes x_2) - \epsilon <0, \end{aligned}

where we have exploited that $$(x_1 \otimes x_2)^{\dagger } \epsilon I (x_1 \otimes x_2)= \epsilon$$. Therefore, (21) is equivalent to

\begin{aligned} Tr(\rho _e W'' ) \ge 0 \text { and } (x_1 \otimes x_2)^{\dagger } W'' (x_1 \otimes x_2) <0, \end{aligned}
(22)

which is the same as (16).
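Indeed, the matrix H from the entanglement example above already packages a witness: $$W=-H$$ is not positive, is nonnegative on every product state, and $$Tr(\rho _e W)=-1<0$$, so it certifies the entanglement of $$\rho _e$$ as in Proposition 2. A numerical sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(4)

rho_e = 0.5 * np.array([[1, 0, 0, 1],
                        [0, 0, 0, 0],
                        [0, 0, 0, 0],
                        [1, 0, 0, 1]], dtype=complex)
H = np.array([[0, 0, 0, 1],
              [0, -2, 1, 0],
              [0, 1, -2, 0],
              [1, 0, 0, 0]], dtype=float)
W = -H  # candidate entanglement witness

# W is not a positive operator...
assert np.linalg.eigvalsh(W).min() < 0

# ...but (x1 ⊗ x2)† W (x1 ⊗ x2) >= 0 on all product states...
for _ in range(1000):
    x1 = rng.normal(size=2) + 1j * rng.normal(size=2)
    x2 = rng.normal(size=2) + 1j * rng.normal(size=2)
    x1 /= np.linalg.norm(x1)
    x2 /= np.linalg.norm(x2)
    v = np.kron(x1, x2)
    assert (v.conj() @ W @ v).real >= -1e-9

# ...and it detects rho_e: Tr(rho_e W) = -1 < 0.
assert np.isclose(np.trace(rho_e @ W).real, -1.0)
```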

Hence, by Theorem 1, we can equivalently formulate the entanglement witness theorem as an arbitrage/Dutch book:

### Theorem 2

Let $${\mathcal {C}}=\{g(x_1,\ldots ,x_m)=(\otimes _{j=1}^m x_j)^\dagger G (\otimes _{j=1}^m x_j)\mid Tr(G{\tilde{\rho }})\ge 0\}$$ be the set of desirable gambles corresponding to some density matrix $${\tilde{\rho }}$$. The following claims are equivalent:

1. 1.

$${\tilde{\rho }}$$ is entangled;

2. 2.

$${{\,\mathrm{posi}\,}}({\mathcal {C}}\cup {\mathscr {L}}_R^{\ge })$$ is not coherent in $${\mathcal {T}}$$.

This result provides another view of the entanglement witness theorem in light of A-coherence. In particular, it tells us that the existence of a witness satisfying Eq. (21) boils down to the disagreement on rationality (coherence) between Isaac’s classical probabilistic interpretation and Alice’s theory $${\mathcal {T}}^\star$$, and therefore that whenever they agree it means that $$\rho _e$$ is separable. This connection explains why the problem of characterising entanglement is hard in QT: it amounts to proving the negativity of a function, which is NP-hard. We can also prove the following

### Corollary 1

If $${\tilde{\rho }}$$ is separable, then $${\tilde{\rho }}$$ is a truncated moment matrix.

In other words, when $${\tilde{\rho }}$$ is separable, we have an agreement between Isaac’s classical view and Alice’s theory $${\mathcal {T}}^\star$$ of rationality, and therefore we can give $${\tilde{\rho }}$$ a fully classical probabilistic interpretation by regarding it as a truncated moment matrix.

## A Theory of Algorithmic Rationality and Entanglement in the Reals

In this section we are going to present an example of entanglement in an A-coherent theory of probability that is different from QT. For this purpose, we consider two classical coins, denoted l (left) and r (right), and define

\begin{aligned} \begin{bmatrix} \theta _1\\ \theta _2\\ \theta _3\\ 1-\theta _1-\theta _2-\theta _3 \end{bmatrix} =\text {Prob}\begin{bmatrix} H_lH_r\\ T_lH_r\\ H_lT_r\\ T_lT_r\end{bmatrix}, \end{aligned}

where $$H_i,T_i$$ denote the outcomes heads and tails, respectively, for coin $$i\in \{l,r\}$$. We consider the possibility space

\begin{aligned} \varOmega =\left\{ \theta \in {\mathbb {R}}^3: ~ \theta _1,\theta _2,\theta _3\ge 0, ~~1-\theta _1-\theta _2-\theta _3\ge 0 \right\} . \end{aligned}
(23)

Note that the following marginal relationships hold:

\begin{aligned} \begin{aligned} \theta _{H_l}=\text {Prob}(H_l)=\theta _1+\theta _3,~~\theta _{H_r}=\text {Prob}(H_r)=\theta _1+\theta _2. \end{aligned} \end{aligned}

As the space of gambles $${\mathscr {L}}_R$$, we consider the set of all polynomials in the unknowns $$\theta =[\theta _1,\theta _2,\theta _3]$$ of degree at most 2:

\begin{aligned} {\mathscr {L}}_R=\{g(\theta ): g(\theta ) \text { is a polynomial of degree at most 2}\}. \end{aligned}
(24)

For instance, these are two elements of $${\mathscr {L}}_R$$:

\begin{aligned} g_1(\theta )&=\theta _1^2-\theta _2^2+2\theta _1\theta _3+2\theta _2\theta _3 -\theta _1 -\theta _3, \end{aligned}
(25)
\begin{aligned} g_2(\theta )&=\theta _1 +\theta _2^2 +3 \theta _3. \end{aligned}
(26)

Evaluating the nonnegativity of polynomials in $${\mathscr {L}}_R$$ is in general NP-hard. Therefore, Alice may not have the computational resources to enforce full rationality, A0–A2, or, equivalently, to solve (4).

However, she can use a quick algorithm that checks a sufficient condition for a polynomial in $${\mathscr {L}}_R$$ to be nonnegative: a polynomial in $$\theta$$ is nonnegative on $$\varOmega$$ if all its coefficients are nonnegative. For instance, under this criterion, Alice can easily verify that $$g_2(\theta )$$ is nonnegative.
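Alice’s quick test can be sketched in a few lines (our encoding of polynomials as dictionaries from exponent tuples to coefficients); note that the condition is sufficient but not necessary — $$(\theta _1-\theta _2)^2$$ is nonnegative everywhere yet fails it:

```python
# Represent a polynomial in theta1, theta2, theta3 as a dict
# {exponent-tuple: coefficient}, e.g. theta1*theta3 -> (1, 0, 1).
def all_coeffs_nonnegative(poly):
    """Alice's quick sufficient test: on Omega (where theta_i >= 0),
    nonnegative coefficients imply a nonnegative polynomial."""
    return all(c >= 0 for c in poly.values())

# g2 = theta1 + theta2^2 + 3*theta3 passes the test.
g2 = {(1, 0, 0): 1.0, (0, 2, 0): 1.0, (0, 0, 1): 3.0}
assert all_coeffs_nonnegative(g2)

# (theta1 - theta2)^2 is nonnegative on Omega but fails the quick test.
sq = {(2, 0, 0): 1.0, (1, 1, 0): -2.0, (0, 2, 0): 1.0}
assert not all_coeffs_nonnegative(sq)
```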

### Proposition 3

Let $${\mathcal {C}}\subseteq {\mathscr {L}}_R$$ be a set of desirable gambles satisfying B0, B1, with $${\mathscr {L}}_R$$ defined in (24) and $$\varSigma ^{\ge }$$ defined as follows:

\begin{aligned} \begin{aligned} \varSigma ^{\ge }=\Bigg \{ \sum \limits _{\alpha _i\ge 0, \alpha _1+\alpha _2+\alpha _3+\alpha _4\le 2} u_{{\alpha _1\alpha _2\alpha _3\alpha _4}} \theta _{1}^{\alpha _1}\theta _{2}^{\alpha _2}\theta _{3}^{\alpha _3}(1-\theta _1-\theta _2-\theta _3)^{\alpha _{4}} : u_{{\alpha _1\alpha _2\alpha _3\alpha _4}}\in {\mathbb {R}}^{\ge } \Bigg \}, \end{aligned} \end{aligned}
(27)

which is the cone of (multivariate) Bernstein polynomials of degree at most 2. A-coherence of $${\mathcal {C}}$$ (or equivalently B2) can then be verified in polynomial time by solving a linear programming problem.

Therefore, the definition of nonnegativity (27) gives an algorithmically efficient way to assess rationality: linear programming.
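A sketch (ours) of the linear program behind Proposition 3: membership of a gamble in $$\varSigma ^{\ge }$$ of (27) is a linear feasibility problem in the 15 Bernstein coefficients, solvable here with scipy:

```python
from itertools import product

import numpy as np
from scipy.optimize import linprog

# Polynomials in theta1..theta3 as dicts {(e1, e2, e3): coeff}.
def pmul(p, q):
    r = {}
    for (ea, ca), (eb, cb) in product(p.items(), q.items()):
        e = tuple(x + y for x, y in zip(ea, eb))
        r[e] = r.get(e, 0.0) + ca * cb
    return r

ONE = {(0, 0, 0): 1.0}
T = [{(1, 0, 0): 1.0}, {(0, 1, 0): 1.0}, {(0, 0, 1): 1.0},
     {(0, 0, 0): 1.0, (1, 0, 0): -1.0, (0, 1, 0): -1.0, (0, 0, 1): -1.0}]

# All 15 Bernstein basis elements theta1^a1 theta2^a2 theta3^a3 lam^a4
# with a1 + a2 + a3 + a4 <= 2, where lam = 1 - theta1 - theta2 - theta3.
basis = []
for a in product(range(3), repeat=4):
    if sum(a) <= 2:
        p = ONE
        for t, e in zip(T, a):
            for _ in range(e):
                p = pmul(p, t)
        basis.append(p)

monos = sorted({m for p in basis for m in p})  # the 10 monomials of deg <= 2
A = np.array([[p.get(m, 0.0) for p in basis] for m in monos])

def in_sigma_ge(g):
    """LP feasibility: does g = sum_k u_k * basis_k with u >= 0?"""
    c_vec = np.array([g.get(m, 0.0) for m in monos])
    res = linprog(np.zeros(len(basis)), A_eq=A, b_eq=c_vec,
                  bounds=[(0, None)] * len(basis), method="highs")
    return res.status == 0

g2 = {(1, 0, 0): 1.0, (0, 2, 0): 1.0, (0, 0, 1): 3.0}
assert in_sigma_ge(g2)       # g2 is a positive Bernstein combination
neg = {(0, 0, 0): -1.0}
assert not in_sigma_ge(neg)  # the constant -1 is not
```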

Also in this case, we can define the dual operator $${\tilde{L}}$$. First of all, observe that the vector of monomials $$b(\theta )=[1, \theta _1, \theta _2, \theta _3, \theta _1\theta _2 , \theta _1\theta _3 , \theta _2\theta _3, \theta _1^2, \theta _2^2, \theta _3^2]$$ constitutes a basis for $${\mathscr {L}}_R$$ in (24). Therefore, the dual space $${\mathscr {L}}^*_R$$ corresponds to the space of linear operators $${\tilde{L}}: {\mathbb {R}}\rightarrow {\mathbb {R}}$$, whose basis is given by the elements of the matrix $${\tilde{L}}(b)$$, where $${\tilde{L}}$$ is applied component-wise to the elements of $$b(\theta )$$. The dual of an A-coherent set of desirable gambles $${\mathcal {C}}$$, generated as in B1 by a finite set of assessments $${\mathscr {G}}$$, is

\begin{aligned} {\mathcal {C}}^\circ =\left\{ L \in {\mathsf {S}} \mid L(g)\ge 0, ~\forall g \in {\mathscr {G}}\right\} , \end{aligned}
(28)

where $${\mathsf {S}}=\{ {\tilde{L}} \in {\mathscr {L}}_R^* \mid {\tilde{L}}(1)=1,~{\tilde{L}}(g)\ge 0~\forall g \in \varSigma ^{\ge }\}$$ is the set of states.

Consider for instance the state:

\begin{aligned} \begin{aligned} {\tilde{L}}(\theta _1)=1/3&{\tilde{L}}(\theta _1^2)&=1/3\\ {\tilde{L}}(\theta _2)=1/6&{\tilde{L}}(\theta _2^2)&=0\\ {\tilde{L}}(\theta _3)=1/6&{\tilde{L}}(\theta _3^2)&=0\\ {\tilde{L}}(\theta _1\theta _2)=0&{\tilde{L}}(\theta _1\theta _3)&=0\\ {\tilde{L}}(\theta _2\theta _3)=1/6&{\tilde{L}}(1)&=1,\\ \end{aligned} \end{aligned}
(29)

which, as can be verified, belongs to $${\mathsf {S}}$$. We aim to show that there exists a gamble $$h \in {\mathscr {L}}_R$$ such that:

\begin{aligned} \begin{aligned} {\tilde{L}}(h)&\ge 0 \text { and }\\ h(\theta )&< 0 \text { for all } \theta \in \varOmega .\\ \end{aligned} \end{aligned}
(30)

Consider the polynomial gamble:

\begin{aligned} h(\theta )=g_1(\theta ) -\epsilon , \end{aligned}

with $$\epsilon >0$$ and $$g_1$$ defined in (25). Since $$g_1(\theta )=(\theta _1+\theta _3)^2-(\theta _1+\theta _3)-(\theta _2-\theta _3)^2\le 0$$ on $$\varOmega$$, it holds that $$h(\theta )\le -\epsilon$$ and so the polynomial is negative. However, its ‘expectation’ w.r.t. the state (29) is equal to

\begin{aligned} {\tilde{L}}(h)={\tilde{L}}(\theta _1^2)-{\tilde{L}}(\theta _2^2)+2{\tilde{L}}(\theta _1\theta _3)+2{\tilde{L}}(\theta _2\theta _3) -{\tilde{L}}(\theta _1) -{\tilde{L}}(\theta _3)-\epsilon =\frac{1}{6}-\epsilon \ge 0, \end{aligned}

where the last inequality holds for any $$0<\epsilon \le 1/6$$.

Therefore, we have violated an inequality that holds in classical probability ($$E(h)\le -\epsilon$$ in $${\mathcal {T}}$$), although the set of desirable gambles

\begin{aligned} {\mathcal {C}}=\{g \in {\mathscr {L}}_R \mid {\tilde{L}}(g)\ge 0\}, \end{aligned}

with $${\tilde{L}}$$ defined in (29), is logically consistent in $${\mathcal {T}}^\star$$ (A-coherent). This is the essence of Bell-type inequalities: the quantum weirdness that is also present in this example.
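The whole violation can be replayed numerically (our encoding: gambles and the state (29) as dictionaries mapping monomial exponents to coefficients and moments, respectively):

```python
import numpy as np

# The state (29): values of L~ on the monomial basis b(theta).
L = {(0, 0, 0): 1.0,
     (1, 0, 0): 1/3, (0, 1, 0): 1/6, (0, 0, 1): 1/6,
     (2, 0, 0): 1/3, (0, 2, 0): 0.0, (0, 0, 2): 0.0,
     (1, 1, 0): 0.0, (1, 0, 1): 0.0, (0, 1, 1): 1/6}

# h = g1 - eps, with g1 as in Eq. (25), in the same dict encoding.
eps = 0.01
h = {(2, 0, 0): 1.0, (0, 2, 0): -1.0, (1, 0, 1): 2.0, (0, 1, 1): 2.0,
     (1, 0, 0): -1.0, (0, 0, 1): -1.0, (0, 0, 0): -eps}

# 'Expectation' under the state: L~(h) = 1/6 - eps >= 0.
Lh = sum(c * L[m] for m, c in h.items())
assert np.isclose(Lh, 1/6 - eps)

# Yet h < 0 everywhere on Omega: check on a grid of the simplex.
def h_val(t1, t2, t3):
    return (t1**2 - t2**2 + 2*t1*t3 + 2*t2*t3 - t1 - t3) - eps

grid = np.linspace(0, 1, 21)
for t1 in grid:
    for t2 in grid:
        for t3 in grid:
            if t1 + t2 + t3 <= 1:
                assert h_val(t1, t2, t3) <= -eps + 1e-12
```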

It is then possible to set up a thought experiment in which two coins are drawn from a bag in the state (29). If we give the left coin to Alice and the right coin to her friend Bob, as depicted in Fig. 3, then we can show that after the coins move apart, there are ‘matching’ correlations between the outcomes of their tosses. That is, if Alice measures (through a toss) the bias of one coin, then she can predict with certainty the outcome of the measurement (toss) on the other coin. This correlation cannot be explained classically, because no classical correlation model can violate the Bell-type inequality (30). We have entanglement!

## Discussions

This paper grew out of our desire to understand QT, in the sense of giving it a meaning that is clear to us. We have been favoured in this by the fact that we have quite a strong background in the foundations of probability, and that QT, mathematically, can be regarded as a generalised theory of probability. But, given this, why is probability generalised in such a way, and what does it mean?

We believe that the present paper, without aiming at reconstructing QT, provides a new way to explain the differences between classical and quantum probability: the algorithmic intractability of classical probability theory contrasted with the polynomial-time complexity of QT.

We have obtained this result in a setting that is more general than QT itself. Our ‘weirdness theorem’ establishes that the weirdness of QT is not exclusive to QT: it appears in any probabilistic theory that is (i) logically consistent and (ii) computationally bounded. QT is just a special case, in the same way as our theory of Bernstein polynomials is another special case.

Yet, our result does speak of QT in particular. And hence it is interesting to know, for one thing, that QT is logically consistent, in the sense that it is a mathematical theory that cannot be proven inconsistent from the inside, by Alice. It is, however, inconsistent from the ‘outside’, i.e., from the point of view of our external observer Isaac, who has unbounded computational capabilities and who, in other words, identifies rationality with the logical consistency of classical probability theory. This is the essence of the clash between classical and quantum physics. It also explains why QT is so peculiarly hard for us to grasp: to classical eyes there is a degree of incoherence in it, and we tend to be able to truly understand only logically consistent theories or ideas.

We believe that such a degree of incoherence is also the reason why we should abandon our attempts to reconcile traditional physics with quantum theory. In our narration, such an abandonment is embodied by the metaphor of a computer that ‘runs the universe’. This is not a new idea at all. However, it is new in the sense that the computer has limits due to the algorithmic nature of its tasks; and this is the reason for the weirdness of QT. Stated differently, what follows from this work, in our view, is that there is room for the idea of a reality more fundamental than classical physics, a reality that is just computational. It is by detaching computation from classical physics, in such a way, that we can finally get a solid grip on the meaning of QT and eventually be able to identify the specific features of our world that ground its use.

In order to hold onto some purely physical intuition, instead, one might want to consider for instance the many-worlds interpretation of QT, as many physicists do nowadays. It is certainly a fascinating view of QT, whose appeal we feel. However, we also perceive the discomfort of having to embrace an interpretation that appears to require an incommensurably huge, and possibly infinite, amount of resources in order to have a universe that continuously branches into multiple copies of itself. Our own algorithmically bounded theory is much more parsimonious. It tells us that we can implement a quantum world in polynomial time, by definition, and such a world would obey the usual axioms of QT: Bob might then as well believe that he is living in one of many worlds, but he would simply be wrong. So should we, as entities of our universe, really go as far as postulating the existence of many worlds in the presence of such a more parsimonious alternative? Is there not an Occam’s razor issue at stake here?

Of course, one could still criticise our appeal to a more fundamental algorithmic reality on the basis of our postulating the existence of a computer that executes the universe. We have been careful to refer to this as a metaphor, however: it need not be a computer that someone has built, and in particular there is no need for a programmer. It can simply be another level of reality, one that can be interpreted as a computer; in a sense, our picture only suggests that there can be more levels of reality, one nested into the other.

One might also wonder why we humans perceive the clash between classical and quantum physics given that we, like Alice, are subjects within the quantum theory; in our narrative, the inconsistencies of QT are observed from Isaac’s point of view, externally to the theory. The explanation that we give to ourselves on this point is that we are used to the illusion of living in a classical universe. This illusion is only in our minds, however, as we cannot perform any physical experiment that reveals an actual inconsistency in our wonderland. And yet, we believe that this illusion can be explained within our framework: classical rationality emerges from algorithmic rationality when we consider the joint state of a system of many identical particles. We plan to address this issue in future work.

Finally, we think that the foundation of generalised probability theory via algorithmic rationality provided in this paper could possibly be useful outside the context of QT, for instance in decision theory. We also plan to address this research direction in future work.