1 Introduction

In a series of recent papers, quantum probabilities are discussed as providing an alternative to classical probability for understanding cognition (e.g., Aerts 2009; Aerts et al. 2005; Blutner 2009; Blutner et al. 2013; Bruza et al. 2009; Busemeyer and Bruza 2012; Busemeyer et al. 2006; Conte et al. 2008; beim Graben 2004; Gabora and Aerts 2002; Kitto 2008; beim Graben 2014). In considerable detail, these authors point out several cognitive phenomena of perception, decision and reasoning that cannot be explained on the basis of classical probability theory, and they demonstrate how quantum probabilities can account for these phenomena. An obvious way to downplay this chain of arguments is by demonstrating that besides classical probability and quantum probability models, alternative approaches are possible that could also describe the phenomena under study without using the strange and demanding instruments of quantum probability. This often happens in the scientific context of bounded rationality where alternative proposals are developed. For instance, one could argue that the so-called conjunction puzzle can be resolved by simple heuristics (Gigerenzer 1997), and the well-known question ordering effects by query theory (Johnson et al. 2007).

A general strategy to counter such criticism is to look for a universal motivation of quantum probabilities that is based on fundamental (architectural) properties of the area under investigation. As Kuhn (1996) clarified, such basic assumptions constituting a theoretical paradigm normally cannot be justified empirically. Basic assumptions that concern the general architecture of the theoretical system are referred to as design features. Using a term that is common in the generative linguistic literature (Chomsky 1995, 2005), we shall say that properties which are consequences of such design features apply with (virtual) conceptual necessity. Hence, a central aim of this article is to demonstrate the conceptual necessity of the basic formalism of quantum mechanics for modelling concepts and their combination (including a description of vagueness and typicality).

In the present literature, there are several approaches that seek a general justification of quantum probabilities in the context of cognitive science. For example, Kitto (2008) considers very complex systems such as the growth and evolution of natural languages and other cultural systems and argues that the description of such systems cannot be separated from their context of interaction. She argues that quantum interaction formalisms provide a natural model of these systems “because a mechanism for dealing with such contextual dependency is inbuilt into the quantum formalism itself” (Kitto 2008, p. 12). Hence, the question of why quantum interaction is necessary in modelling cognitive phenomena is answered by referring to its nature as a complex epistemic system.

In their recent book, Busemeyer and Bruza (2012) give several arguments why quantum models are necessary for cognition. Some arguments relate to the cognitive mechanism of judgments. Judgments normally do not take place in definite situations. Rather, judgments create the context in which they take place. This is the dynamic aspect of judgments, also found in dynamic models of meaning (beim Graben 2014). Another argument concerns the logical aspect. Judgments do not obey classical logic; rather, the underlying logic is very unusual, with asymmetric conjunction and disjunction operations. When it comes to probabilities and conditional probabilities, the principle of unicity is violated, i.e. it is impossible to assume a single sample space with a fixed probability distribution for judging all possible events. Other arguments relate to the problem of compositionality in cognitive semantics (Blutner et al. 2003; de Hoop et al. 2007; Spenader and Blutner 2007). In Sects. 2–4 we will discuss some of these arguments.

Another line of argumentation seeks to answer the question of “why quantum models of cognition” by speculating about implications for brain neurophysiology. In the algebraic approach, even classical dynamical systems such as neural networks could exhibit quantum-like properties under coarse-grained measurements, when testing a property cannot distinguish between epistemically equivalent states (beim Graben 2004). Beim Graben and Atmanspacher (2009) used this “epistemic quantization” to prove the possibility of incompatible properties in classical dynamical systems. In neuroscience, most measurements, such as electroencephalography or magnetic resonance imaging, are coarse-grainings in this sense. Thus, the quantum approach to cognition has direct implications for brain neurophysiology without needing to refer to a “quantum brain”, as recently indicated by Pothos and Busemeyer (2013). Novel applications of this idea, using Hebbian neurodynamics as an underlying classical system to describe emerging properties that exhibit quantum-like traits, are given by de Barros and Suppes (2009), de Barros (2012), Large (2010), and others.

As far as we can see, none of the foundational programs for grounding quantum cognition explicitly refers to the logical programs of theoretical physics that aim to ground all physical processes from the perspective of “operational realism”. In the following, we put forward a new answer to the why question of quantum cognition, one that is explicitly based on operational realism. The program developed by the “Geneva school” (Jauch 1968; Piron 1972, 1976) is of special interest here:

According to these authors, a “rational” reconstruction of Quantum Mechanics should be stated in terms of the actual operational meaning of the fundamental quantum mechanical concepts and principles, i.e. in terms of outcomes of possible experiments (“answers” to “questions”) that could in principle be performed on a given physical system. This approach might be called “operational-realist”, to distinguish it from the “pure” operationalism of neo-positivists and instrumentalists (Baltag and Smets 2011, p. 285)

The basic idea of the Geneva school (borrowed from Mackey 1963) is to find a general and abstract formulation of quantum mechanics without explicitly starting from the structures of a Hilbert space. Instead, the idea is to justify the use of Hilbert space via a representation theorem (for the historical details, see Smets 2011). Further research by Foulis, Randall and colleagues (Foulis et al. 1983; Foulis and Randall 1972; Randall and Foulis 1973; Foulis 1999) substantiates this point.

Summarizing, the present article makes a careful distinction between phenomenological research and foundational research. In the first part of this article (Sects. 2–4), we consider several puzzles of bounded rationality and argue that the present account of quantum cognition, which takes quantum probabilities rather than classical probabilities, does not automatically provide a deeper understanding and a true explanation of these puzzles. The reason is that the quantum idea introduces several new parameters which can possibly be fitted to empirical data but which do not necessarily explain them. Hence, phenomenological research has to be augmented by addressing deeper foundational issues. In the second part of this article (Sects. 5–7), we aim to illustrate how present progress in the foundations of quantum theory can help to answer the foundational questions of quantum cognition. This includes the opportunity to interpret the free parameters, which are pure stipulations in the quantum probabilistic framework.

Here is a more detailed outline of this article. In Sect. 2, we review the basic idea of bounded rationality in cognitive science and outline prospect theory as a basic approach utilizing this idea. Further, we make clear why most of this work can best be characterized as “phenomenological research”, i.e. research that is concerned with the precise description of different phenomena without claiming a general, explanatory value. Section 3 introduces a broader series of puzzles that are discussed in the context of bounded rationality and argues that prospect theory and its heuristic follow-ups are of limited value when it comes to explanatory solutions. Section 4 introduces quantum probability as a new way to handle the puzzles and discusses the important role of interference effects. Unfortunately, we will see that within this field one and the same problem is often handled in different ways. In addition, the explanatory value of particular treatments is not always convincing. For that reason, it is essential to ask the foundational question, as done in the second part. This part starts with Sect. 5, which explains the basic distinction between foundational and phenomenological research programs. Further, this section adds a historical note on complementarity, illustrates that complementarity has received many interpretations in the historical context, and argues for an epistemic interpretation of complementarity in quantum cognition. In Sect. 6, we explain the foundational issue from the operational perspective of Piron, Foulis, and Randall, and give a closely related formulation in terms of symbolic dynamics. Following the lead of symbolic dynamics, we discuss several ideas that are useful for justifying complementarity in cognitive systems. These ideas include the implications of coarse-graining and epistemic accessibility, as well as the new idea of autoepistemic accessibility.
We argue that the foundation of quantum cognition is due to bounded rationality in the sense of limited (auto)epistemic resources a cognitive agent may deploy in a certain decision situation. Finally, in Sect. 7, we discuss some empirical consequences of the present foundational perspective and we draw some general conclusions.

2 Bounded rationality and cognitive science

Early models of cognitive science were based on general ideas of rationality as developed in logic, probability theory and decision theory. For example, researchers in cognitive science have borrowed the idea of rational decisions from mathematical economics and game theory (e.g., Von Neumann and Morgenstern 1944; Savage 1954). The classical idea is to assume (i) that rational agents are expected utility maximizers (using the mathematical conception of expected utility based on a classical probability measure), and (ii) that they maximize their expected utility with respect to the underlying economic model. Attempts to overcome the theoretical shortcomings of the classical approach have resulted in the bounded rationality project, originally proposed by Herbert Simon (1955). Simon claimed that we have to replace the “global rationality of the economic man” with a kind of behaviour that is compatible with the boundedness of the human decision maker and the particular kinds of environments in which such organisms exist. Boundedly rational agents experience limits in formulating and solving complex problems and in processing (receiving, storing, retrieving, transmitting) information. There are a number of dimensions along which “classical” models of rationality can be made more realistic without giving up rigorous formalization. A good example is prospect theory (Kahneman and Tversky 1979):

In the classical theory, the utility of an uncertain prospect is the sum of the utilities of the outcomes, each weighted by its probability. The empirical evidence reviewed above suggests two major modifications of this theory: (1) the carriers of value are gains and losses, not final assets; and (2) the value of each outcome is multiplied by a decision weight, not by an additive probability. The weighting scheme used in the original version of prospect theory and in other models is a monotonic transformation of outcome probabilities. (Tversky and Kahneman 1992)

Formally, the difference between the classical scheme and prospect theory is shown here:

(1)

  a. \(U(f) = \sum_{i} p_{i} \cdot u(x_{i})\)

  b. \(U(f) = \sum_{i} \upgamma(p_{i}) \cdot v(x_{i})\)

In the classical scheme of utility theory (1a), prospects or gambles \(f = (x_{1}, p_{1}; \ldots; x_{n}, p_{n})\) are contracts that yield (monetary) outcome \(x_{i}\) with probability \(p_{i}\) \((\sum_{i} p_{i} = 1)\). The utility \(u(x_{i})\) is a direct expression of the monetary outcome \(x_{i}\). The overall utility of a prospect f, denoted by U(f), is the expected utility of its outcomes \(x_{i}\).
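As a minimal sketch of the two schemes in (1), the following Python code evaluates a prospect stored as a list of (outcome, probability) pairs. The function names `expected_utility` and `prospect_value` are hypothetical, chosen only for illustration; the point is that (1b) coincides with (1a) when \(\upgamma\) is the identity and v = u.

```python
from typing import Callable, List, Tuple

# A prospect f = (x_1, p_1; ...; x_n, p_n) as (outcome, probability) pairs.
Prospect = List[Tuple[float, float]]


def expected_utility(f: Prospect, u: Callable[[float], float]) -> float:
    """Classical scheme (1a): U(f) = sum_i p_i * u(x_i)."""
    if abs(sum(p for _, p in f) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * u(x) for x, p in f)


def prospect_value(f: Prospect,
                   v: Callable[[float], float],
                   gamma: Callable[[float], float]) -> float:
    """Prospect-theory scheme (1b): U(f) = sum_i gamma(p_i) * v(x_i)."""
    return sum(gamma(p) * v(x) for x, p in f)


if __name__ == "__main__":
    f = [(1000.0, 0.5), (0.0, 0.5)]

    def identity(z: float) -> float:
        return z

    # With gamma = identity and v = u, scheme (1b) reduces to scheme (1a).
    print(expected_utility(f, identity), prospect_value(f, identity, identity))
```

Any nonlinear choice of `gamma` breaks this equivalence, which is exactly the degree of freedom prospect theory exploits.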

It is convenient to illustrate a prototypic decision problem as pictured in Fig. 1. Depending on the chosen utility function, the figure illustrates both the classical scheme and prospect theory.

Fig. 1

Prototypic decision problem. An agent has to decide between two prospects, a and b, leading to payoffs \(x_{1}(a)\) and \(x_{2}(a)\) for prospect a and \(x_{1}(b)\) and \(x_{2}(b)\) for prospect b. In classical decision theory, prospect a is chosen over prospect b if the classical utility U(a) is greater than the classical utility U(b). Note that sometimes more than two payoffs are possible for a given prospect. In some special cases, such as a sure outcome, one of the given prospects has only a single payoff.

Prospect theory calculates the overall value of a prospect as shown in (1b). Here, the scale v assigns to each outcome \(x_{i}\) a number that reflects the subjective value (utility) of that outcome.Footnote 1 The second scale, \(\upgamma\), associates with each probability \(p_{i}\) a decision weight \(\upgamma(p_{i})\), which reflects the impact of \(p_{i}\) on the overall value of the prospect. The transformation \(\upgamma\) translates standard probabilities into numbers that do not form a probability measure but are still normalized (i.e., \(\upgamma(0) = 0\), \(\upgamma(1) = 1\)). The sum of \(\upgamma(p)\) and \(\upgamma(1-p)\) is typically less than unity. \(\upgamma\) is a nonlinear function whose shape is determined by reference points that elicit the nonlinear deformations:

For probability, there are two natural reference points—certainty and impossibility—that correspond to the endpoints of the scale. Therefore, diminishing sensitivity implies that increasing the probability of winning a prize by .1 has more impact when it changes the probability of winning from .9 to 1.0 or from 0 to .1 than when it changes the probability from, say, .3 to .4 or from .6 to .7. This gives rise to a weighting function that is concave near zero and convex near one. (Tversky and Fox 1995, p. 748)
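The text does not fix a functional form for \(\upgamma\). As one hypothetical choice, the sketch below uses the one-parameter weighting function of Tversky and Kahneman (1992), \(\upgamma(p) = p^{c} / (p^{c} + (1-p)^{c})^{1/c}\), with the assumed value c = 0.61 (their estimate for gains). It prints a few weights and compares a .1 step near the endpoints with a .1 step in the middle of the scale, the diminishing-sensitivity pattern described in the quotation.

```python
def tk_weight(p: float, c: float = 0.61) -> float:
    """Probability weighting in the Tversky-Kahneman (1992) one-parameter form.

    c = 0.61 is an assumed parameter value (their estimate for gains).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    num = p ** c
    return num / (num + (1.0 - p) ** c) ** (1.0 / c)


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.3, 0.4, 0.5, 0.9, 1.0):
        print(f"gamma({p:.1f}) = {tk_weight(p):.3f}")

    # Diminishing sensitivity: a .1 step near 0 or 1 matters more than in the middle.
    print("0 -> .1:", round(tk_weight(0.1) - tk_weight(0.0), 3))
    print(".9 -> 1:", round(tk_weight(1.0) - tk_weight(0.9), 3))
    print(".3 -> .4:", round(tk_weight(0.4) - tk_weight(0.3), 3))
```

With this choice, \(\upgamma(0) = 0\) and \(\upgamma(1) = 1\) hold exactly, small probabilities are overweighted, large ones are underweighted, and \(\upgamma(p) + \upgamma(1-p) < 1\).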

A simple example should illustrate the difference between standard utility theory and prospect theory (cf. Kahneman and Tversky 1979). Which of the following prospects would you prefer?


50% chance to win 1000, 50% chance to win nothing; i.e. \(a = (1000, .5; 0, .5)\). In this case \(U(a) = .5\, u(1000)\) (taking \(u(0) = 0\)).


450 for sure; i.e. \(b = (450, 1)\). In this case \(U(b) = u(450)\).

Under classical utility theory with a linear utility function, people should prefer prospect a over prospect b, since the expected utility of a is higher (500 versus 450). In other words, a linear u gives \(u(1000)/u(450) = 1000/450 > 2\). Several forms of such questionnaires were constructed, and in each case (with outcomes referring to Israeli currency) the majority of respondents chose the opposite, preferring b over a.

In order to analyse why classical decision theory fails we have to consider two other prospects with the same outcomes but with different probabilities:


10% chance to win 1000, 90% chance to win nothing; i.e. \(c = (1000, .1; 0, .9)\). In this case \(U(c) = .1\, u(1000)\).


20% chance to win 450, 80% chance to win nothing; i.e. \(d = (450, .2; 0, .8)\). In this case \(U(d) = .2\, u(450)\).

While in the former case classical decision theory predicts that rational agents prefer b over a iff \(u(1000)/u(450) < 2\), in the latter case the theory predicts that rational agents prefer d over c iff exactly the same condition holds (since the condition \(.2\, u(450) > .1\, u(1000)\) is equivalent to the former condition \(u(1000)/u(450) < 2\)). Interestingly, the majority of people who took part in corresponding experiments preferred b over a in the former case but c over d in the latter case. As a consequence, the classical scheme (1a) cannot describe the empirical data for any choice of utility function, linear or nonlinear. Hence, the assumed probabilities have to be corrected by taking the function \(\upgamma\) into account, as stated in prospect theory. Under (1b), preferring b over a requires \(v(1000)/v(450) < \upgamma(1)/\upgamma(.5)\), and preferring c over d requires \(v(1000)/v(450) > \upgamma(.2)/\upgamma(.1)\). Both can hold only if the following empirical condition is satisfied: \(\upgamma(1)/\upgamma(.5) > \upgamma(.2)/\upgamma(.1)\). Taking the reference points 0 and 1 into account as argued above, it is plausible to assume that \(\upgamma(1) = 1\) and \(\upgamma(.5) \approx .5\); hence, \(\upgamma(1)/\upgamma(.5) \approx 2\). Further, because of the concavity of the \(\upgamma\)-function near zero (the effect of overweighting small probabilities), it is plausible that \(\upgamma(.2)/\upgamma(.1) < 2\), so the empirically required condition is satisfied.
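The argument can be checked numerically. The sketch below rests on assumptions not made in the text: power utilities \(u(x) = x^{r}\) for the classical scheme, and, for prospect theory, the Tversky and Kahneman (1992) forms \(v(x) = x^{0.88}\) and their weighting function with c = 0.61. It confirms that (1a) ranks the two pairs consistently for every tested r, whereas (1b) yields b over a together with c over d.

```python
from typing import Callable, List, Tuple

Prospect = List[Tuple[float, float]]

# The four Allais prospects from the text as (outcome, probability) pairs.
prospect_a: Prospect = [(1000.0, 0.5), (0.0, 0.5)]
prospect_b: Prospect = [(450.0, 1.0)]
prospect_c: Prospect = [(1000.0, 0.1), (0.0, 0.9)]
prospect_d: Prospect = [(450.0, 0.2), (0.0, 0.8)]

R_VALUES = (0.3, 0.5, 0.7, 0.8, 0.9, 1.0, 1.2)  # assumed exponents for u(x) = x**r


def eu(f: Prospect, u: Callable[[float], float]) -> float:
    """Classical scheme (1a)."""
    return sum(p * u(x) for x, p in f)


def tk_weight(p: float, c: float = 0.61) -> float:
    """Assumed weighting function (Tversky and Kahneman 1992 form)."""
    num = p ** c
    return num / (num + (1.0 - p) ** c) ** (1.0 / c)


def pt(f: Prospect, alpha: float = 0.88) -> float:
    """Scheme (1b) with an assumed power value function v(x) = x**alpha for gains."""
    return sum(tk_weight(p) * x ** alpha for x, p in f)


def eu_consistent(r: float) -> bool:
    """Under (1a) with u(x) = x**r, b > a holds exactly when d > c holds."""
    def u(x: float) -> float:
        return x ** r
    return (eu(prospect_b, u) > eu(prospect_a, u)) == (eu(prospect_d, u) > eu(prospect_c, u))


if __name__ == "__main__":
    print("EU consistent for all r:", all(eu_consistent(r) for r in R_VALUES))
    print("PT: b > a?", pt(prospect_b) > pt(prospect_a), "| c > d?", pt(prospect_c) > pt(prospect_d))
    print("gamma(1)/gamma(.5) =", round(tk_weight(1.0) / tk_weight(0.5), 3))
    print("gamma(.2)/gamma(.1) =", round(tk_weight(0.2) / tk_weight(0.1), 3))
```

The parameter values are only one plausible calibration; any weighting function that is sufficiently concave near zero produces the same qualitative pattern.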

At this point it should be obvious how the framework of bounded rationality is put into action: people are assumed to evaluate their losses and gains and to judge their probabilities by means of certain heuristics. The model is descriptive but not explanatory. It tries to model real-life choices by introducing several heuristically motivated parameters that can be fitted to the empirical data. The huge literature on recent approaches to bounded rationality (e.g. Gigerenzer and Selten 2001) suggests an adaptive toolbox of different approaches and models for handling the phenomena listed under the same heading. From an explanatory point of view, such an approach is not really encouraging, and a more systematic approach would be highly welcome. Unfortunately, there is no consensus on the definition of bounded rationality or on what the critical questions of the project are (Rubinstein 1998).

The particular puzzle we have used above to motivate prospect theory is called the Allais puzzle (referring to an article by Allais 1953). In prospect theory, the puzzle is solved by using a single uniform function \(\upgamma \) that models the deformation of probabilities by certain reference points. However, there are many other puzzles where a similar solution by using such a uniform function \(\upgamma \) is questionable. In the next section we describe some of these puzzles and make clear that a more systematic approach is desired—one that deals with the origin of probabilities in a cognitive context.

Of course, we do not claim that all problems discussed within the framework of bounded rationality can be solved in a uniform way, or that every heuristic component is obsolete. However, for a series of puzzles there seems to be a systematic and explanatory way to account for them, one that is quite different from the ideas underlying prospect theory and its follow-ups (e.g., Tversky and Kahneman 1992). The approach we propose here exploits quantum principles, following several recent proposals (Aerts 1982, 2009; Blutner 2009; Busemeyer and Bruza 2012; Conte 1983; Khrennikov 2010; Yukalov and Sornette 2010).

3 More puzzles

3.1 The Ellsberg puzzle

In handling the Allais paradox, it has been assumed that specified probabilities are accessible to the decision maker. Prospect theory describes how these probabilities are deformed and how a decision is made on the basis of (1b). However, in many real decision situations the probabilities are unknown and not specified in advance (the case of ignorance). Ellsberg (1961) describes what can happen in such a situation. Assume a ball will be randomly drawn from an urn that contains red (R), green (G) and blue (B) balls, and the actor can win a certain payoff if she correctly predicts the colour of the ball. The actor is told that the urn contains 300 balls, exactly 100 of which are red; the remaining ones are green or blue, but the exact proportion of green to blue balls is not known. Consider a first pair of choices, between action R (betting on red) and G (betting on green). Most respondents choose R, possibly because they assume that the probability of drawing red is 1/3, whereas the probability of green is unknown (bounded only by the interval from 0 to 2/3). This example illuminates the distinction between a risky decision under uncertainty and a decision based on ignorance (cf. Russell and Norvig 1995). Consider now the alternative choice between \(\overline{R}\) (betting on green or blue) and \(\overline{G}\) (betting on red or blue). For a similar reason, most respondents prefer \(\overline{R}\) over \(\overline{G}\) in this case.

It is easy to see that the pattern \(\{\hbox {R}\succ G, \overline{R}\succ \overline{G}\}\) is incompatible with classical expected utility theory. Assume a payoff is $X in case of choosing the correct colour and $0 in the other case. Then in classical utility \(R \succ G\) means \(\hbox {P}(R) \cdot u(\$\hbox {X}) >\hbox { P}(G) \cdot u(\$\hbox {X})\), i.e. \(\hbox {P}(R) >\hbox { P}(G)\). On the other hand \(\overline{R}\succ \overline{G}\) means \(\hbox {P}(G\vee B) \cdot u(\$\hbox {X}) >\hbox { P}(R\vee B) \cdot u(\$\hbox {X})\), i.e. \(\hbox {P}(G\vee B) >\hbox { P}(R\vee B)\). Assuming additivity of classical probabilities (for all possible events) we get \(\hbox {P}(G)+\hbox {P}(B) >\hbox { P}(R) +\hbox { P}(B)\), i.e. \(\hbox {P}(G) >\hbox {P}(R)\). The latter conflicts with the earlier assumption that \(\hbox {P}(R) > \hbox {P}(G)\).
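A brute-force check makes the additivity step concrete. The sketch below assumes \(u(\$X) > 0\), so that the utility factor cancels, and enumerates additive distributions over {R, G, B} on a rational grid; it finds none that yields both \(R \succ G\) and \(\overline{R} \succ \overline{G}\). The grid resolution `steps` is an arbitrary illustration choice.

```python
from fractions import Fraction


def ellsberg_pattern_possible(steps: int = 30) -> bool:
    """Search additive distributions (P(R), P(G), P(B)) on a grid of the simplex.

    Returns True if some distribution gives both R > G and not-R > not-G.
    """
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            p_r = Fraction(i, steps)
            p_g = Fraction(j, steps)
            p_b = 1 - p_r - p_g
            prefers_r = p_r > p_g                   # R over G
            prefers_not_r = p_g + p_b > p_r + p_b   # G-or-B over R-or-B, by additivity
            if prefers_r and prefers_not_r:
                return True
    return False


if __name__ == "__main__":
    print(ellsberg_pattern_possible())
```

The search can never succeed, since after cancelling P(B) the second condition is just P(G) > P(R); the grid merely illustrates the algebra.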

Tversky and Fox (1995) consider a modification of prospect theory in order to resolve this Ellsberg puzzle. Obviously, when the probabilities are unknown, we cannot describe decision weights as a simple transformation \(\upgamma\) of the probability scale as in (1b). Instead, Tversky and Fox (1995) introduce a weighting function W operating directly on the algebra of events. This weighting function realizes what is called a capacity in theories of reasoning with uncertainty (Halpern 2003). In short, a capacity is a normalized function (assigning 0 to the impossible event and 1 to the certain event) that satisfies monotonicity: if \(A \subseteq B\) then \(W(A) \le W(B)\). Capacities do not require additivity; that is, the Kolmogorovian assumption that \(W(A \cup B) = W(A) + W(B)\) for incompatible events A and B is normally violated. However, a condition called subadditivity is postulated for W:

(2)

  \(W(A) + W(B) \le W(A \cup B)\), for all incompatible events A and B.

Using the function W instead of a deformed probability function, the expected utility of a prospect f that yields outcome \(x_{i}\) under event \(E_{i}\) becomes

(3)

  \(U(f) = \sum_{i} W(E_{i}) \cdot v(x_{i})\)

Then, taking \(v(\$0) = 0\) and \(v(\$X) > 0\) in (3), the scenario \(\{R \succ G, \overline{R} \succ \overline{G}\}\) is equivalent to the conditions \(\{W(R) > W(G), W(G \cup B) > W(R \cup B)\}\). If W is a subadditive capacity, these conditions can be satisfied. For instance, assume \(W(R) = 1/3\), \(W(G) = 0\), \(W(G \cup B) = 2/3\), and \(W(R \cup B) = 1/3\). Obviously, these stipulations satisfy the two inequalities. Further, it can be seen that they define a special case of a subadditive capacity, called an inner measure (Halpern 2003).
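The inner-measure reading of these stipulations can be sketched as follows, under an assumption made here for illustration: the only events with a known probability are \(\emptyset\), R, \(G \cup B\) and the certain event, and the weight of any event is the largest known probability of an event it contains. The code reproduces the stipulated values and checks monotonicity, condition (2), and the two Ellsberg preferences under (3) with the hypothetical normalization \(v(\$X) = 1\), \(v(\$0) = 0\).

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset("RGB")

# Assumed information state: only these events have a known probability.
KNOWN = {
    frozenset(): Fraction(0),
    frozenset("R"): Fraction(1, 3),
    frozenset("GB"): Fraction(2, 3),
    OMEGA: Fraction(1),
}


def inner_measure(event: frozenset) -> Fraction:
    """Largest known probability of an event contained in `event`."""
    return max(p for known, p in KNOWN.items() if known <= event)


EVENTS = [frozenset(c) for n in range(len(OMEGA) + 1)
          for c in combinations(sorted(OMEGA), n)]
W = {e: inner_measure(e) for e in EVENTS}

is_monotone = all(W[e] <= W[f] for e in EVENTS for f in EVENTS if e <= f)
satisfies_condition_2 = all(W[e] + W[f] <= W[e | f]
                            for e in EVENTS for f in EVENTS if not e & f)

# With v($X) = 1 and v($0) = 0 in (3), each bet is worth W of its winning event.
prefers_r = W[frozenset("R")] > W[frozenset("G")]
prefers_not_r = W[frozenset("GB")] > W[frozenset("RB")]

if __name__ == "__main__":
    print(is_monotone, satisfies_condition_2, prefers_r, prefers_not_r)
```

Note that \(W(G) + W(B) = 0 < 2/3 = W(G \cup B)\): the weights are non-additive, which is exactly what allows the Ellsberg pattern.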

Of course, this account is not really explanatory since the corresponding assumptions are not motivated independently. Further, one may wonder how ignorance and proper uncertainty differ in principle and the answer is rather unclear. In Sect. 4, we will come back to this issue.

3.2 The disjunction effect

The disjunction effect (Tversky and Shafir 1992) occurs when conditioned decisions are considered. It is closely connected to violations of the ‘sure-thing principle’, one of the basic claims made by a (classically) rational theory of decision making. In decision making, this principle is just a psychological version of the law of total probability, which we will explain later. Let us assume that a decision maker prefers option B over option \(\overline{B}\) when knowing that event A occurs and also when knowing that event A does not occur. Then the ‘sure-thing principle’ claims that the decision maker should prefer B over \(\overline{B}\) when not knowing whether A occurs or not. If the decision maker refuses B (or prefers \(\overline{B}\)), we have a violation of this principle.

In everyday reasoning, human behaviour is not always consistent with the sure thing principle. For example, Tversky and Shafir (1992) reported that more students would purchase a non-refundable Hawaiian vacation if they were to know that they had passed or failed an important exam, compared to a situation where the exam outcome was unknown. Specifically, \(\hbox {P}(B|A)= 0.54,\hbox { P}( B|\overline{A}) = 0.57\), and \(\hbox {P}(B) = 0.32\), where B stands for the event of purchasing a Hawaiian vacation, A for the event of passing the exam, \(\overline{A}\) for the event of not passing the exam, and P for the averaged judgments of probability. Disjunction fallacies are fairly common in behaviour (Busemeyer and Bruza 2012).

Classical probability theory does not allow patterns such as \(\{P(B|A) > 1/2, P(B|\overline{A}) > 1/2, P(B) \le 1/2\}\). This is a simple consequence of the law of total probability, which can be stated as follows:

(4)

  \(P(B) = P(B|A) \cdot P(A) + P(B|\overline{A}) \cdot P(\overline{A})\).

If we assume that \(P(B|A) > 1/2\) and \(P(B|\overline{A}) > 1/2\), then (4) expresses P(B) as a weighted average of two numbers greater than 1/2, with weights \(P(A)\) and \(P(\overline{A})\) summing to 1, so we get \(P(B) > 1/2\) as a consequence. This conflicts with the assumption \(P(B) \le 1/2\) that constitutes the disjunction fallacy.
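The same argument can be seen numerically: with the Hawaiian-vacation values \(P(B|A) = 0.54\) and \(P(B|\overline{A}) = 0.57\), no choice of P(A) in (4) brings P(B) down to the observed 0.32. The sketch below scans P(A) over a grid of 101 values; the grid is an arbitrary illustration choice.

```python
def total_probability(p_b_given_a: float, p_b_given_not_a: float, p_a: float) -> float:
    """Law of total probability (4) for the partition {A, not-A}."""
    return p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)


P_B_GIVEN_PASS = 0.54   # P(B | A), exam passed
P_B_GIVEN_FAIL = 0.57   # P(B | not A), exam failed
P_B_OBSERVED = 0.32     # P(B), exam outcome unknown

predictions = [total_probability(P_B_GIVEN_PASS, P_B_GIVEN_FAIL, k / 100)
               for k in range(101)]

if __name__ == "__main__":
    print("smallest classical P(B):", round(min(predictions), 4))
    print("largest classical P(B):", round(max(predictions), 4))
    print("any prediction as low as observed?", any(p <= P_B_OBSERVED for p in predictions))
```

Every classical prediction lies between the two conditional values, so the observed 0.32 is out of reach for any P(A).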

It is helpful to examine how the law (4) of total probability arises. We require three assumptions. First, we assume that the underlying algebraic structure of events is Boolean; essentially, this means that distributivity holds. In the present case, distributivity allows us to derive \(B = BA \cup B\overline{A}\). Second, we assume that probability is an additive measure function. In particular, we have \(P(B) = P(BA) + P(B\overline{A})\) for the two disjoint conjunctive events BA and \(B\overline{A}\). Third, the standard definition of conditional probability allows us to write:

(5)

  \(P(X|Y) = P(XY)/P(Y)\)

These three assumptions readily yield the law of total probability: by (5), \(P(BA) = P(B|A) \cdot P(A)\) and \(P(B\overline{A}) = P(B|\overline{A}) \cdot P(\overline{A})\), and substituting both into the additivity equation gives (4). This explains the classical requirement that there is no disjunction effect, i.e. the difference \(P(B) - (P(B|A) \cdot P(A) + P(B|\overline{A}) \cdot P(\overline{A}))\) is always required to be zero.
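The three assumptions can be mirrored directly on a finite Boolean event space. In the sketch below, a joint distribution is given on the four atoms \(AB, A\overline{B}, \overline{A}B, \overline{A}\,\overline{B}\) (the random integer weights and the seed are arbitrary illustration choices); P(B) is obtained by additivity over \(BA\) and \(B\overline{A}\), the conditionals by (5), and exact rational arithmetic shows that the difference between P(B) and the right-hand side of (4) is always zero.

```python
import random
from fractions import Fraction


def disjunction_gap(weights: list) -> Fraction:
    """P(B) minus the right-hand side of (4) for a joint distribution on four atoms.

    `weights` are positive integers for the atoms (AB, A not-B, not-A B, not-A not-B).
    """
    total = sum(weights)
    p_ab, p_a_not_b, p_not_a_b, p_not_a_not_b = (Fraction(w, total) for w in weights)
    p_a = p_ab + p_a_not_b                 # additivity
    p_not_a = p_not_a_b + p_not_a_not_b
    p_b = p_ab + p_not_a_b                 # B = BA u B(not A), then additivity
    p_b_given_a = p_ab / p_a               # definition (5)
    p_b_given_not_a = p_not_a_b / p_not_a
    return p_b - (p_b_given_a * p_a + p_b_given_not_a * p_not_a)


rng = random.Random(0)
gaps = [disjunction_gap([rng.randint(1, 100) for _ in range(4)]) for _ in range(1000)]

if __name__ == "__main__":
    print("all gaps zero:", all(g == 0 for g in gaps))
```

Weights start at 1 so that P(A) and \(P(\overline{A})\) are never zero and (5) is always defined.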

Unfortunately, prospect theory and its modifications based on the general conception of subadditive capacities are not sufficient to explain the disjunction effect. The reason is that capacities are still based on the idea of a (partial) Boolean algebra, and the law of subadditivity is valid within this framework, taking the following form:

(6)

  \(W(B \cap A) + W(B \cap \overline{A}) \le W(B)\)

Using the corresponding modification of (5) for the conditioned weight function \(W(B{\vert }A)=_{\mathrm{def}} W(B\cap A)/W(A)\), it is not difficult to derive min\(\{W(B{\vert }A), \,W(B{\vert }\overline{A})\} \le W(B)\). Hence, scenarios such as \(\{W(B{\vert }A)= 0.54, \,W(B{\vert }\overline{A}) = 0.57,\, W(B) = 0.32\}\) as in the Hawaiian vacation example are theoretically excluded.

The original explanation Tversky and Shafir (1992) gave for the disjunction effect referred to a psychologically plausible intuition, which corresponds to a failure of plausible reasoning under the unknown condition:

We attribute this pattern of preference to the loss of acuity induced by the presence of uncertainty. Once the out-come of the exam is known, the student has good—albeit different—reasons for taking the trip: If the student has passed the exam, the vacation is presumably seen as a reward following a successful semester; if the student has failed the exam, the vacation becomes a consolation and time to recuperate before taking the exam again. A student who does not know the outcome of the exam, however, has less clear reasons for going to Hawaii. In particular, she may feel certain about wanting to go if she passes the exam, but unsure about whether she would want to go if she fails. Furthermore, she may feel it is inappropriate to reward herself with a trip to Hawaii regardless of whether she passes or fails. Only when she focuses exclusively on the possibility of passing and of failing the exam does her preference for going to Hawaii become clear. The presence of uncertainty, we suggest, tends to blur the picture and makes it harder for people to see through the implications of each outcome. Broadening the focus of attention can lead to a loss of acuity. (Tversky and Shafir 1992, p. 306)

Busemeyer and Bruza (2012, p. 267) note that this psychological explanation is quite consistent with an approach based on interference:

If choice is based on reasons, then the unknown condition has two good reasons. Somehow these two good reasons cancel out to produce no reason at all! This is analogous to wave interference where two waves meet with one wave rising while the other wave is falling so they cancel out. This analogy generates an interest in exploring quantum models. (Busemeyer and Bruza 2012, p. 267).

In Sect. 4, we will develop this idea in considerable mathematical detail.Footnote 2

3.3 The conjunction fallacy

The conjunction fallacy was described first by Tversky and Kahneman (1983) and then verified by numerous authors (cf. Busemeyer and Bruza 2012). In one of their experiments subjects are presented with a story such as the following one:

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. (Tversky and Kahneman 1983, p. 298)

After the presentation of the story, the subjects are asked to assess the probabilities of several propositions on a numbered scale. We list only the critical propositions (together with the averaged judgements of the probabilities):



(7)

  a. Linda is active in the feminist movement. (A)

  b. Linda is a bank teller. (B)

  c. Linda is a bank teller and is active in the feminist movement. (A & B)


Obviously, the results contradict the Kolmogorov axioms of probability theory: the conjunction of two propositions can never have a higher probability than either of the two conjuncts. Closely related results were obtained when subjects were asked to rank the probabilities of the different events or to give frequency judgments (cf. the discussion in Busemeyer and Bruza 2012).
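For comparison, the Kolmogorovian constraint can be checked by brute force. Writing A for “active in the feminist movement” and B for “bank teller”, every joint distribution over the four atoms generated by A and B gives \(P(A \& B) \le \min(P(A), P(B))\). The sketch below samples random distributions with exact rational arithmetic; the sampling scheme and seed are arbitrary illustration choices.

```python
import random
from fractions import Fraction


def conjunction_rule_holds(weights: list) -> bool:
    """Check P(A & B) <= min(P(A), P(B)) for a joint distribution on four atoms.

    `weights` are nonnegative integers for (A & B, A & not-B, not-A & B, not-A & not-B).
    """
    total = sum(weights)
    p_ab, p_a_only, p_b_only, _ = (Fraction(w, total) for w in weights)
    p_a = p_ab + p_a_only   # A = (A & B) u (A & not-B)
    p_b = p_ab + p_b_only   # B = (A & B) u (not-A & B)
    return p_ab <= min(p_a, p_b)


rng = random.Random(1)
samples = [[rng.randint(0, 100) for _ in range(4)] for _ in range(1000)]
samples = [w for w in samples if sum(w) > 0]   # skip the degenerate all-zero case
results = [conjunction_rule_holds(w) for w in samples]

if __name__ == "__main__":
    print("conjunction rule held in every sample:", all(results))
```

No classical assignment can therefore rank “bank teller and feminist” above “bank teller”, which is what the Linda judgments do.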

Unfortunately, the conjunction fallacy cannot be explained by using any of the theoretical ideas discussed before. Prospect theory accounts for the fact that small probabilities are overweighted and large probabilities are underweighted in decision making. That does not help in the present case since the conjunction fallacy appears both with large and small probabilities. The idea of generalizing standard probabilities to capacities, which was helpful in case of resolving the Ellsberg puzzle, is not successful in the present case either. Capacities satisfy the condition of monotonicity, and monotonicity is violated if we consider the conjunction fallacy: \( A \& B \subseteq B\), but we do not have \( \hbox {P}(A \& B) \le \hbox {P}(B)\). Hence, a new heuristic idea is required in order to resolve the conjunction fallacy.

Tversky and Kahneman (1983) propose the judgmental heuristic of representativeness. Representativeness refers to estimating the probability of a situation or event by judging how similar it is to a coherent situation or prototype. As noted by Tversky and Kahneman (1983, p. 299), “representativeness depends on both common and distinctive features (Tversky 1977), it should be enhanced by the addition of shared features.” In the case of the Linda example sketched in (7), \( A \& B\) is more representative of Linda’s activities than A or B considered in isolation. Contrary to standard extensional logic, this makes \( A \& B\) more probable than its constituents.

Unfortunately, there is no formal theory of representativeness within the framework of judgmental heuristics. In the next section we will outline such a theory within the framework of quantum cognition.

3.4 Order effects

Survey researchers have repeatedly demonstrated that the same question often produces quite different answers depending on the question context (for numerous survey examples, see Sudman and Bradburn 1982; Schuman and Presser 1981). To cite just one particularly well-documented example, one group of (North American) subjects was asked whether “the United States should let Communist reporters come in here and send back to their papers the news as they see it?” The other group was asked whether “a Communist country like Russia should let American newspaper reporters come in and send back to their papers the news as they see it?” Support for free access for the Communist reporters varied sharply depending on whether that question preceded or followed the question on American reporters. The differences are quite dramatic: in a 1950 study, 36 % accepted communist reporters when the communist question came first, and 73 % accepted them when it came second. When the study was repeated in 1982, the numbers changed to 55 versus 75 % (Schuman 1992).

Schuman and Presser (1981) described two kinds of order effects, which they called ‘consistency’ and ‘contrast’ effects. The example of the communist reporters illustrates the consistency effect, where the answer frequencies are assimilated in the context of the other question. In the contrast case, the differences between the answer frequencies are enlarged. In a more recent article, Moore (2002) reports the identification of two further types of question order effects, termed ‘additive’ and ‘subtractive’. Figure 2 gives a schematic sketch of the four different types of question order effects (adapted from Blutner 2012):

Fig. 2 Four types of order effects for attitude questions. The size of the blocks for the two questions indicates the percentage of ‘yes’-answers when the questions appear in isolation. The arrows indicate whether the percentage of ‘yes’-answers to these questions increases or decreases if the question is preceded by the alternative question.

Question order effects are normally not considered in the context of bounded rationality. The reason is simply that the theoretical models standardly used to describe the data, such as prospect theory, are not actually used for handling question order effects. Instead, a completely different group of phenomenological models is applied to gain insight into the nature of order effects. For instance, Weber and Johnson (Weber and Johnson 2006, 2009; Johnson et al. 2007; Johnson and Busemeyer 2010) have proposed that certain memory processes can be used to model such tasks:

“This approach, most recently dubbed ‘Query theory’, assumes that preferences that drive choice and other decisions are based on a collection of serially posed queries to memory concerning relevant characteristics of the task. For example, if deciding whether to buy a certain digital camera, an individual might attempt to recall experiences with similar models or generate the pros and cons of buying the camera. Query theory is able to explain some empirical trends in human decision behavior by embellishing this simple notion with what is known about human memory, such as serial position effects, priming, and interference. Although the theory’s assumptions have been empirically supported, at this point it has not been formally introduced as a mathematical model or at a specific algorithmic level, as the preceding computational models have.” (Johnson and Busemeyer 2010, p. 745).

However, in the theoretical context of quantum models the situation changes radically, and we will see that the existence of such order effects is constitutive for the appearance of the typical phenomena of bounded rationality. Hence, whether a given phenomenon is meaningfully grouped under a certain label, be it ‘bounded rationality’ or any other label, depends on the theoretical paradigm that aims to capture the phenomenon. It is not surprising that prospect theory and quantum cognition generate different groupings. Regarding ‘query theory’, we can add that it designates a system of informal ideas that can possibly be explicated and mathematized in the formalism of quantum cognition (rather than in terms of prospect theory).

4 Quantum probabilities and the proper analysis of some puzzles of bounded rationality

In the last section, we introduced some puzzles of bounded rationality. We have shown how the Allais paradox can be resolved by prospect theory using a single uniform function \(\upgamma \) that models the deformation of probabilities by certain reference points. In contrast, solving the Ellsberg puzzle required a substantial modification of prospect theory, namely the introduction of a novel weighting function. This function realizes what is called a capacity in formal theories of reasoning under uncertainty (Halpern 2003). Importantly, the disjunction effect cannot be explained by this modification of prospect theory. Instead, a new idea is required, based on the failure of plausible reasoning when the conditions are unknown. Yet another new idea, the concept of representativeness, is required to resolve the conjunction and disjunction fallacies. Some readers might have the impression that the field of judgmental heuristics is dominated by a rather eclectic methodology. We actually agree with this view. Interestingly, some authors suggest making a virtue of necessity. For instance, Gigerenzer and Selten (2001) proposed an “adaptive toolbox” to handle these problems. In our opinion, this conclusion is premature and questionable.Footnote 3

To avoid misunderstandings: we do not dispute the empirical findings of judgmental heuristics, including the results of Gigerenzer and other representatives of the toolbox idea. Rather, we challenge the fruitfulness of these theoretical ideas for the foundations of cognitive science. In the natural sciences, proposing a toolbox of different theories to handle closely related empirical data is not a very attractive methodology. Instead, natural scientists try to explain the data at hand by deduction from a universal theory, which is the ultimate goal of their efforts. We think this idea should also apply to psychology, linguistics and cognitive science in general. In this section we suggest the formalism of quantum theory as a powerful research instrument that can provide uniform and systematic solutions.

4.1 An elementary introduction to Hilbert space semantics

One of the main arguments for a quantum approach to cognitive phenomena is the existence of interference effects in higher cognitive processes such as perception, decision making, and reasoning (Aerts 2009; Blutner 2009; Bruza et al. 2009; Busemeyer and Bruza 2012; Busemeyer et al. 2011; Conte 1983; Conte et al. 2008; Khrennikov 2006; Pothos and Busemeyer 2009; Primas 2007). In quantum cognition, interference terms typically appear when considering the probability of composed events or propositions.

After a concise introduction to the mathematics of Hilbert spaces, we will explain the basic concepts of quantum probability and the concept of measurement. Needless to say, we can introduce the mathematical framework only in a rather sketchy way here. Readers interested in a deeper mathematical understanding are referred to the rich literature of (text)books on quantum theory, quantum information, and quantum logic (e.g., Engesser et al. 2009; Pitowsky 1989).

In traditional theories of formal semantics (Lewis 1983; Montague 1970), the building blocks of the underlying ontology are possible worlds, i.e. unstructured objects without any intrinsic properties and without any relations to other objects. Propositions are considered as sets of possible worlds, and the algebra of propositions is a Boolean algebra (formed by the set-theoretic operations of intersection, union, and complement). In contrast, the ontology of quantum theory is based on vectors in a Hilbert space \({\mathscr {H}}\). Vectors are entities that can be added to other vectors and multiplied by a scalar number. For instance, multiplying a vector \(v \in {\mathscr {H}}\) by the number 1.5 gives a new vector written 1.5 v (which is equal to the sum \(v+0.5v\)). Vectors can also be multiplied by complex numbers.

Complex numbers are nothing but pairs \([a_{1},\, a_{2}]\) of real numbers. The first component of a complex number is called its real part; the second is called its imaginary part. Complex numbers are usually written in the form \(a=a_{1}+a_{2}\mathrm{i}\), with real numbers \(a_{1}\) and \(a_{2}\). Calculations with complex numbers follow the same rules as calculations with real numbers, together with the assumption \(\hbox {i}^{2} = -1\). Through the Euler formula, a complex number \(a=a_{1}+a_{2}\hbox {i}\) may be written in the form

  1. (8)

    \(a = {\vert }a{\vert } (\cos \theta + \hbox {i } \sin \theta ) = {\vert }a{\vert }\hbox { e}^{\mathrm{i}\theta }\), where \({\vert }a{\vert } = \sqrt{a_1 ^{2}+a_2 ^{2}}\) and \(\theta =\arctan \left( \frac{a_2 }{a_1 }\right) \) (for \(a_{1}>0\)),

where \({\vert }a{\vert }\) designates the absolute magnitude of the complex number and \(\theta \) is the angle between the abscissa and the vector from the origin to the point \([a_{1}, \,a_{2}]\) in the two-dimensional geometric representation of the Gaussian (complex) number plane. Each complex number \(a=a_{1}+a_{2}\hbox {i}\) has a complex conjugate \(a^{*} = a_{1}-a_{2}\hbox {i}\). The product a a* equals \({\vert }a{\vert }^{2}=a_{1}^{2}+a_{2}^{2}\), i.e. the square of the modulus of the complex number a.Footnote 4
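As a quick numerical check of (8) and of the conjugate identity, the following sketch uses Python's built-in complex type. The sample value 3 + 4i is an arbitrary choice of ours; `cmath.phase` computes the angle in every quadrant, while the arctan expression in (8) only agrees with it for a positive real part.

```python
import cmath
import math

# Sample value a = a1 + a2 i with a1 = 3, a2 = 4 (an arbitrary illustrative choice).
a = complex(3.0, 4.0)

modulus = abs(a)                               # |a| = sqrt(a1^2 + a2^2)
theta = cmath.phase(a)                         # angle to the real axis, atan2(a2, a1)
euler_form = modulus * cmath.exp(1j * theta)   # |a| e^{i theta}, should reproduce a

conjugate = a.conjugate()                      # a* = a1 - a2 i
product = a * conjugate                        # a a* = |a|^2, a real number

# For a1 > 0 the arctan expression of (8) agrees with the phase ...
same_angle = math.isclose(theta, math.atan(a.imag / a.real))
# ... but for a1 < 0 it lands in the wrong quadrant.
b = complex(-3.0, 4.0)
print(modulus, theta, euler_form, product, same_angle)
print(cmath.phase(b), math.atan(b.imag / b.real))
```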

Another important Hilbert space operation is the scalar product. Scalar multiplication and the scalar product are different operations: the former is an operation between a (real or complex) number and a vector resulting in a vector; the latter is an operation between two vectors resulting in a (real or complex) number. The scalar product is intended to express the similarity of two vectors. The scalar product of two vectors \(v,\, w\) in a given vector space is written in the form \(v\cdot w\). If the scalar product of two vectors is zero, the vectors are called ‘orthogonal’. The scalar product of a vector with itself defines the square of the ‘length’ of the vector: \(\Vert v\Vert ^{2}=v\cdot v\).Footnote 5

Vector spaces (denoted by V, W, ...) are sets of vectors that are closed under the two operations of vector addition and scalar multiplication. In other words, if the vectors \(v_{\mathrm{i}} (\hbox {i}= 1, {\ldots }, \hbox {n})\) are elements of a vector space, then each linear combination of them, i.e. each sum \(x_{1} \cdot v_{1}+x_{2} \cdot v_{2} + {\ldots } + x_{\mathrm{n}} \cdot v_{\mathrm{n}}\), is also an element of the vector space. Vector spaces based on scalar multiplication with real numbers are called real vector spaces; vector spaces based on scalar multiplication with complex numbers are called complex vector spaces.

A subset of a vector space V is called a subspace of V if it is a vector space in itself (i.e. closed under addition and scalar multiplication). Let \(V_{1}\) and \(V_{2}\) be two subspaces of V, then the sum of the two vector spaces, written \(V_{1}+V_{2}\), is the set of all possible sums of elements of \(V_{1}\) and \(V_{2}\). The sum of two vector spaces is a vector space again. We will say that the sum \(V_{1}+V_{2}\) is a direct sum, written \(V_1 \oplus V_2 ,\) of the two subspaces \(V_{1}\) and \(V_{2}\) if and only if each element of the sum can be written uniquely as a sum \(v_{1}+v_{2}\) where \(v_{1} \in V_{1}\) and \(v_{2}\in V_{2}\). It can be proven (e.g. Axler 1996) that a sum \(V_{1}+V_{2}\) is a direct sum if and only if \(V_{1} \cap V_{2} = \{0\}\). The linear hull of a list of vectors \((v_{1}, v_{2},{\ldots }, v_{\mathrm{n}})\) in V is defined as the set of all linear combinations of these vectors, denoted \(\hbox {LH}(v_{1},\, v_{2},{\ldots }, v_{\mathrm{n}})\). A vector space is called finite-dimensional if it is the linear hull of some finite list of vectors.

An important concept is linear independence of a set of vectors. A set of vectors is called linearly independent if none of its elements is a linear combination of the others; otherwise it is called linearly dependent. A basis of a vector space V is a list of linearly independent vectors in V whose linear hull is V. If a vector space has several bases, it can be proven that each basis contains the same number of vectors. This number is called the dimension of the vector space. Observe that \(\text {dim}(V_{1}+V_{2}) = \text {dim}(V_{1}) + \text {dim}(V_{2}) - \text {dim}(V_{1} \cap V_{2})\). Since \(V_{1} \cap V_{2} = \{0\}\) for a direct sum, it follows that the dimension of the direct sum of two vector spaces is the sum of the dimensions of the two spaces. A basis of a vector space can be chosen such that its vectors are pairwise orthogonal and each has unit length. Such a basis is called an orthonormal basis. It is easy to show that the scalar product of two vectors is the sum of the products of their respective coefficients with respect to an orthonormal basis (in the complex case, the coefficients of one of the two vectors have to be complex-conjugated).
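The coefficient formula for the scalar product can be checked numerically. The sketch below assumes NumPy and the physics convention that the scalar product conjugates its first argument (which is what `np.vdot` does); the orthonormal basis of C³ is generated at random from a QR decomposition, and the vectors v and w are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# An assumed orthonormal basis of C^3: the columns of a unitary matrix from a QR decomposition.
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
Q, _ = np.linalg.qr(M)
basis = [Q[:, k] for k in range(3)]

v = np.array([1.0, 2.0j, -1.0])
w = np.array([0.5, 1.0, 3.0 - 1.0j])

# Coefficients with respect to the basis: c_k = <e_k, x> (np.vdot conjugates its first argument).
cv = np.array([np.vdot(e, v) for e in basis])
cw = np.array([np.vdot(e, w) for e in basis])

direct = np.vdot(v, w)                          # scalar product computed directly
via_coefficients = np.sum(np.conj(cv) * cw)     # sum of products of the coefficients
print(direct, via_coefficients)
```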

Let us consider a subspace V of a Hilbert space \({{\mathscr {H}}}\). With the help of the scalar product, the orthocomplement of V, written \(V^{\bot }\), can be defined as the set of vectors that are orthogonal to each vector in V: \(V^{\bot } = \{v\in {{\mathscr {H}}}: v\cdot w = 0 \text{ for all } w \in V\}\). It is not difficult to prove that the orthocomplement is again a vector space and that \({{\mathscr {H}}}\) is the direct sum of V and \(V^{\bot }\): \({{\mathscr {H}}}=V \oplus V^{\bot }\). The algebraic structure that arises from the three basic operations on subspaces (intersection, sum, and orthocomplement) is called an orthomodular lattice and will be considered more closely in Sect. 6. At this point it suffices to say that this structure has properties similar to those of a Boolean algebra, except that the distributive law does not hold. The set of all subspaces of a Hilbert space \({{\mathscr {H}}}\) therefore offers an alternative to the classical conception of propositions as sets of possible worlds.
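A minimal sketch of the orthocomplement and the direct-sum decomposition, assuming NumPy. The helper `orthocomplement` is our own illustrative name; it computes the null space of the conjugated spanning vectors via the singular value decomposition. The subspace V of C³ is an arbitrary example.

```python
import numpy as np

def orthocomplement(spanning_vectors, tol=1e-10):
    """Orthonormal basis (as columns) of the orthocomplement of the linear hull of the vectors."""
    M = np.array(spanning_vectors, dtype=complex)     # rows span the subspace V
    # w lies in V-perp iff sum(conj(m) * w) = 0 for every row m, i.e. conj(M) @ w = 0.
    _, s, vh = np.linalg.svd(M.conj())
    rank = int(np.sum(s > tol))
    return vh[rank:].conj().T                         # null-space vectors of conj(M)

V_rows = np.array([[1, 0, 0], [0, 1, 1]], dtype=complex)   # assumed subspace V of C^3
W = orthocomplement(V_rows)

orthogonal = np.allclose(V_rows.conj() @ W, 0)              # every w is orthogonal to V
spans_space = np.linalg.matrix_rank(np.hstack([V_rows.T, W])) == 3   # V + V-perp = C^3
print(W.shape, orthogonal, spans_space)
```

Since dim V = 2 and dim V⊥ = 1 add up to 3, the check on the rank is exactly the dimension formula for a direct sum.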

Closely related to subspaces of a Hilbert space \({{\mathscr {H}}}\) are so-called projection operators A. They project each vector of \({{\mathscr {H}}}\) onto a given subspace of \({{\mathscr {H}}}\).Footnote 6 Formally, they can be represented by linear operators, i.e. operators \({\varvec{F}}\) that satisfy the condition \({\varvec{F}}(av + bw) = a {\varvec{F}}(v)+b {\varvec{F}}(w)\) for any complex numbers a and b and any vectors \(v, w \in {{\mathscr {H}}}\). Projection operators have to satisfy the special property of idempotence, \({\varvec{AA}}={\varvec{A}}\), and, in order to project orthogonally, they must also be self-adjoint (Hermitian). For a self-adjoint operator, idempotence is equivalent to the condition that its only eigenvalues are 0 and 1.Footnote 7 In the quantum approach, propositions are modelled by projection operators. This is equivalent to modelling them by subspaces of \({{\mathscr {H}}}\): the subspace can be reconstructed as the positive eigenspace of \({\varvec{A}}\), i.e. the set of all vectors satisfying the eigenvalue equation \({\varvec{A}}(v)=v\). When projection operators are combined, order can matter. That is, it can be that \({\varvec{AB}}\ne {\varvec{BA}}\) for two projection operators \({\varvec{A}}\) and \({\varvec{B}}\). Interestingly, if the projection operators under consideration commute pairwise (i.e., \({\varvec{AB}}={\varvec{BA}}\)), then they generate a Boolean algebra of projectors.
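The defining properties of projection operators are easy to check in a two-dimensional toy model. The sketch below assumes NumPy; the helper name `projector` is ours and builds the orthogonal projector onto the linear hull of some (linearly independent) vectors from an orthonormal basis obtained by QR decomposition.

```python
import numpy as np

def projector(*vectors):
    """Orthogonal projector onto the linear hull of the given (linearly independent) vectors."""
    M = np.array(vectors, dtype=complex).T    # spanning vectors as columns
    Q, _ = np.linalg.qr(M)                    # orthonormal basis of the same subspace
    return Q @ Q.conj().T

A = projector([1, 0])        # projects onto the first axis
B = projector([1, 1])        # projects onto the 45-degree line

is_idempotent = np.allclose(A @ A, A)
is_self_adjoint = np.allclose(A, A.conj().T)
eigenvalues_B = np.linalg.eigvalsh(B)          # real eigenvalues in ascending order
commute = np.allclose(A @ B, B @ A)            # False: the order of A and B matters
print(is_idempotent, is_self_adjoint, eigenvalues_B, commute)
```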

It is instructive to consider the two main differences between the treatment of classical propositions (sets of possible worlds) and quantum propositions (projections in Hilbert spaces). First, instead of the union \(A\cup B\) of the classical case, we consider the sum operation \({\varvec{A}}+{\varvec{B}}\) in the quantum case. To define the sum of two projectors, we construct the two subspaces corresponding to the projectors A and B, form the sum of these subspaces, and finally construct the projector corresponding to this sum. In other words, we construct the projector onto the smallest subspace that contains the two subspaces corresponding to A and B. It is not difficult to prove that the relation \(({\varvec{A}}+{\varvec{B}})(v) = {\varvec{A}}(v)+{\varvec{B}}(v)\) holds for all vectors v only if \({\varvec{AB}} = -{\varvec{BA}}\); for projectors this condition already implies \({\varvec{AB}} = {\varvec{BA}} = {\varvec{0}}\), i.e. the two subspaces are orthogonal. The second difference refers to complementation. In the quantum case, a negated proposition refers to the subspace orthogonal to the original one. We will write \(\overline{{\varvec{A}}}={\varvec{I}}-{\varvec{A}}\) for the projection operator onto the orthocomplementFootnote 8 (I is the identity operator mapping any vector to itself). It is easy to see that \({\varvec{A}}\overline{{\varvec{A}}} =\overline{{\varvec{A}}}{\varvec{A}}= ({\varvec{I}}-{\varvec{A}}){\varvec{A}}={\varvec{IA}}-{\varvec{AA}}={\varvec{A}} -{\varvec{A}}={\varvec{0}}\).
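The difference between the quantum sum (join) of two projectors and their plain operator sum can be seen in a small example in C³. This is a sketch assuming NumPy; the helpers `projector_onto_span` and `join` are hypothetical names of ours. The join is computed as the projector onto the linear hull of both spanning sets.

```python
import numpy as np

def projector_onto_span(vectors, tol=1e-10):
    """Projector onto the linear hull of arbitrary (possibly dependent) vectors, via the SVD."""
    M = np.array(vectors, dtype=complex).T
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    basis = U[:, s > tol]                     # orthonormal basis of the column space
    return basis @ basis.conj().T

def join(a_vectors, b_vectors):
    """Projector onto the smallest subspace containing both subspaces (the quantum sum A + B)."""
    return projector_onto_span(list(a_vectors) + list(b_vectors))

x, y, d = [1, 0, 0], [0, 1, 0], [1, 1, 0]
A = projector_onto_span([x])
B_orth = projector_onto_span([y])             # orthogonal to A: AB = BA = 0
B_obl = projector_onto_span([d])              # oblique to A

orthogonal_case = np.allclose(join([x], [y]), A + B_orth)   # join equals the operator sum
oblique_case = np.allclose(join([x], [d]), A + B_obl)       # join is the xy-plane; A + B_obl is not

A_bar = np.eye(3) - A                         # projector onto the orthocomplement of A
print(orthogonal_case, oblique_case, np.allclose(A @ A_bar, 0))
```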

Let us consider now the mathematical notion of probability. Even in the quantum case, a probability function is an additive measure function: \(\hbox {P}({\varvec{A}}+{\varvec{B}}) = \hbox {P}({\varvec{A}}) + \hbox {P}({\varvec{B}})\) assuming that the two considered projectors A and B are orthogonal to each other. In both the classical and the quantum case, the second condition is normalization, assigning the value 1 to the weakest proposition and the value 0 to the strongest one: \(\hbox {P}({\varvec{I}}) = 1\) and \(\hbox {P}({\varvec{0}}) = 0\).Footnote 9

For each normalized vector v, it is possible to define a probability function in the following way:

  1. (9)

    \(\mathrm{P}_{v} \left( {\varvec{A}} \right) =\Vert {\varvec{A}}\left( v \right) \Vert ^{2}\)

This is the so-called Born rule, stating that the square of the length of the projection \({\varvec{A}}(v)\) should be seen as the probability of the proposition A or, in physical jargon, the probability that the state v collapses into the positive eigenspace of A (with eigenvalue 1). It is important to see that this probability function is defined by the geometry of the Hilbert space, in particular by the geometric features of the projector A and the (pure) state v. For that reason we will call the probability function \(\hbox {P}_{v}\) a structural or geometric probability. It contrasts with another notion of probability, where a system of orthonormal vectors \(v_{\mathrm{i}}\) is given together with (statistical) weights \(p_{\mathrm{i}} (\Sigma p_{\mathrm{i}} = 1)\) and the resulting statistical probability is calculated as follows:

  1. (10)

    \(\hbox {P}_{\left\{ {v_{i}, p_{i}} \right\} } \left( {\varvec{A}} \right) =\sum _{i} p_{i} \cdot \hbox {P}_{v_{i}} \left( {\varvec{A}} \right) \)

This statistical probability is relative to the (statistical) weights \(p_{\mathrm{i}}\) and can be seen as a weighted mixture of (structural) probabilities. John von Neumann introduced so-called density matrices in order to express such mixed states in an elegant and uniform way (von Neumann 1932). For the present purpose, however, it is sufficient to use the concept of mixture informally.
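The contrast between structural probabilities (9) and statistical probabilities (10) can be made concrete in two dimensions. The sketch below assumes NumPy; the projector, the orthonormal states and the weights 0.3/0.7 are arbitrary choices of ours. It also reproduces the mixture with a density matrix ρ, using the standard identity P = Tr(ρA), and contrasts it with a pure superposition built from the same weights.

```python
import numpy as np

def born_probability(A, v):
    """Structural probability P_v(A) = ||A(v)||^2 of Eq. (9); v must be normalized."""
    return float(np.linalg.norm(A @ v) ** 2)

phi = np.pi / 6
a = np.array([np.cos(phi), np.sin(phi)])
A = np.outer(a, a)                         # projector onto the ray spanned by a (assumed example)
A_bar = np.eye(2) - A

# Assumed orthonormal states v_i with statistical weights p_i (summing to 1).
states = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
weights = [0.3, 0.7]

p_A = born_probability(A, states[0])       # cos^2(30 degrees), i.e. 0.75

# Eq. (10): weighted mixture of structural probabilities.
p_mixture = sum(p * born_probability(A, v) for p, v in zip(weights, states))

# Same value via von Neumann's density matrix rho = sum_i p_i v_i v_i^*, with P = Tr(rho A).
rho = sum(p * np.outer(v, v.conj()) for p, v in zip(weights, states))
p_density = float(np.trace(rho @ A).real)

# Contrast: a pure superposition whose squared amplitudes equal the same weights.
u = np.sqrt(weights[0]) * states[0] + np.sqrt(weights[1]) * states[1]
p_pure = born_probability(A, u)
print(p_A, p_mixture, p_density, p_pure)
```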

Next, we have to explain the idea of a physical measurement, where the observable being measured is represented by a Hermitian operator F. Assume that the physical system under consideration is in a certain state \(v\in {{\mathscr {H}}}\). There are two possibilities. First, the state v is an eigenstate of F, say with eigenvalue \(\uplambda \). Then measurement of F yields this eigenvalue, and the state after the measurement has not changed (i.e. it is v again). Second, the state v is not an eigenstate of F. In this case, quantum mechanics assumes that the act of measurement changes this state into another state which is always an eigenstate of F. However, which of the several possible eigenstates becomes actualized is decided by chance. There is no way to formulate a deterministic mechanism for this decision. Hence, indeterminism is an essential component of quantum mechanics. The only thing that can be predicted is the probability of finding the output state in a certain eigenspace of F. For example, let \({\varvec{A}}_{\mathrm{i}}\) be the projection operator onto the eigenspace of F belonging to the eigenvalue \(\uplambda _{i}\). The probability that the state v collapses into this eigenspace can be calculated by the Born rule, \(P_v \left( {{\varvec{A}}_i } \right) =\Vert {\varvec{A}}_i \left( v \right) \Vert ^{2}\); it is the probability that the eigenvalue \(\uplambda _{i}\) is measured.
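The measurement scheme can be simulated under simple assumptions. In the sketch below (NumPy; the observable F, the state v and the random seed are our own choices), each eigenvalue of F is simple, so each projector A_i is the outer product of one eigenvector. The outcome is drawn at random with the Born probabilities, and the state then collapses onto the corresponding eigenspace and is renormalized.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

F = np.array([[2.0, 1.0], [1.0, 2.0]])            # assumed Hermitian observable
eigenvalues, eigenvectors = np.linalg.eigh(F)      # eigenvalues 1 and 3 (ascending)
# A_i: projector onto the eigenspace of the i-th eigenvalue (all eigenvalues are simple here).
projectors = [np.outer(eigenvectors[:, i], eigenvectors[:, i].conj())
              for i in range(len(eigenvalues))]

def outcome_probabilities(state):
    """Born rule: P_v(A_i) = ||A_i(v)||^2 for each eigenvalue lambda_i."""
    return np.array([np.linalg.norm(P @ state) ** 2 for P in projectors])

def measure(state):
    """One simulated measurement: the outcome is chosen at random, then the state collapses."""
    probs = outcome_probabilities(state)
    i = rng.choice(len(eigenvalues), p=probs / probs.sum())
    collapsed = projectors[i] @ state
    return eigenvalues[i], collapsed / np.linalg.norm(collapsed)

v = np.array([1.0, 0.0])                          # not an eigenstate of F
lam, post = measure(v)
lam_again, _ = measure(post)                      # repeating the measurement gives the same value
print(outcome_probabilities(v), lam, lam_again)
```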

To repeat the deep insight from quantum theory: the act of measurement can change the state of the system. Only if the initial state is already an eigenstate of the observable being measured does the final state remain unchanged in the course of measurement. In all other cases, the initial state v is changed into a mixed state describing the possible outcomes of the measurement and their probabilities. Generally, a measurement can be seen as a question addressed to nature, and the act of questioning can change the state of the system. This is not really surprising when considering modern versions of update semantics (Blutner 2012).

If the exact outcome of a measurement cannot be predicted, what is the expectation value of measuring the observable F in a pure state v? Based on the intuitive idea of measurement, the scalar product between v and the result of applying the observable F to v can be used to calculate this expectation value, which is denoted by \({\varvec{F}}_{v}\):

  1. (11)

    \({\varvec{F}}_{v}=v \cdot {\varvec{F}}(v)\)

In case of projection operators, formula (11) gives the probability of the proposition expressed by the projector: \({\varvec{A}}_{v}=v \cdot {\varvec{A}}(v)=v \cdot {\varvec{A}}({\varvec{A}}(v)) = {\varvec{A}}(v) \cdot {\varvec{A}}(v)=\Vert {\varvec{A}}(v)\Vert ^{2}= \hbox {P}_{v}({\varvec{A}})\).
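Formula (11) can be compared numerically with the probability-weighted average of the eigenvalues, and, for a projector, with the Born rule. The sketch assumes NumPy; the observable and the complex state are arbitrary examples, and `np.vdot` conjugates its first argument.

```python
import numpy as np

F = np.array([[2.0, 1.0], [1.0, 2.0]])            # assumed Hermitian observable
v = np.array([1.0, 1.0j]) / np.sqrt(2)             # assumed normalized (complex) state

# Eq. (11): F_v = v . F(v).
expectation = np.vdot(v, F @ v)

# The same value as the Born-weighted average of the eigenvalues.
eigenvalues, eigenvectors = np.linalg.eigh(F)
weights = np.abs(eigenvectors.conj().T @ v) ** 2   # probabilities of the eigenvalues
spectral_average = float(np.sum(eigenvalues * weights))

# For a projector A the expectation value reduces to the Born probability ||A(v)||^2.
A = np.array([[1.0, 0.0], [0.0, 0.0]])
A_v = np.vdot(v, A @ v)
print(expectation, spectral_average, A_v, np.linalg.norm(A @ v) ** 2)
```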

Lüders (1951) proposed a formula for a sequence of two measurements, first \({\varvec{A}}\) and then \({\varvec{B}}\). We abbreviate the corresponding operator as (A; B). According to Lüders, the probability of such a sequence can be calculated as follows:

  1. (12)

    \(\hbox {P}_{v}({\varvec{A}}; {\varvec{B}}) = \Vert {\varvec{B}}({\varvec{A}}(v))\Vert ^{2}\)

Hence, the operator A is first applied to the state v, and then the operator B is applied to the resulting state and transforms it into a final state whose squared length determines the structural probability of the sequence.

To determine the operator representing the sequence \(\left( {{\varvec{A}};{\varvec{B}}} \right) \), we apply the Born rule (9) to the (unnormalized) state \(w={\varvec{A}}\left( v \right) \), i.e. \(\hbox {P}_w \left( {\varvec{B}} \right) =\Vert {\varvec{B}}\left( w \right) \Vert ^{2}={\varvec{B}}\left( w \right) \cdot {\varvec{B}}\left( w \right) ={\varvec{B}}\left( {{\varvec{A}}\left( v \right) } \right) \cdot {\varvec{B}}\left( {{\varvec{A}}\left( v \right) } \right) ={\varvec{BA}}\left( v \right) \cdot {\varvec{BA}}\left( v \right) ={\varvec{A}}\left( v \right) \cdot {\varvec{BBA}}\left( v \right) =v\cdot {\varvec{ABA}}\left( v \right) \). The last two steps use that B and A are self-adjoint, together with the idempotence \({\varvec{BB}}={\varvec{B}}\). Therefore, the following equation holds (Niestegge 2008):

  1. (13)

    \(({\varvec{A}}; {\varvec{B}}) = {\varvec{ABA}}\)

This sequence operator can be used to define conditionalized probabilities in the quantum case (Niestegge 2008):

  1. (14)

    \(\hbox {P}({\varvec{B}}{\vert }{\varvec{A}}) = \hbox {P}({\varvec{ABA}})/\hbox {P}({\varvec{A}})\).
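Equations (12) to (14) can be checked in a real two-dimensional toy model. In this sketch (NumPy; the helpers `ray_projector` and `prob` and the angles are illustrative assumptions of ours), both questions are rays, and asking them in the two possible orders gives different probabilities, which is the kind of order dependence discussed in Sect. 3.4.

```python
import numpy as np

def ray_projector(angle):
    """Projector onto the line through the origin at the given angle (real 2-D toy model)."""
    u = np.array([np.cos(angle), np.sin(angle)])
    return np.outer(u, u)

def prob(X, v):
    """P_v(X) = v . X(v), as in Eq. (11); for projectors this is the Born rule."""
    return float(np.vdot(v, X @ v).real)

v = np.array([1.0, 0.0])                 # assumed initial state
A = ray_projector(np.pi / 6)             # assumed question A, 30 degrees away from v
B = ray_projector(np.pi / 3)             # assumed question B, 60 degrees away from v

p_A_then_B = np.linalg.norm(B @ A @ v) ** 2                 # Lueders rule (12)
same_as_ABA = np.isclose(p_A_then_B, prob(A @ B @ A, v))    # Eq. (13)
p_B_given_A = prob(A @ B @ A, v) / prob(A, v)               # Eq. (14)

p_B_then_A = np.linalg.norm(A @ B @ v) ** 2                 # the same questions in reverse order
print(p_A_then_B, p_B_then_A, p_B_given_A, same_as_ABA)
```

With these angles the two orders give 0.5625 and 0.1875 (up to rounding), while P(B|A) = cos²(30°) = 0.75.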

The operator (A; B), or ABA, is called asymmetric conjunction. Note that \(\hbox {P}({\varvec{ABA}})\) is the probability of the sequence ‘A and then B’, which is how Busemeyer et al. (2011) modelled conjunction in human decision making (see also Blutner 2009).Footnote 10 Note further that standard systems of quantum logic (Engesser et al. 2009) always use symmetric versions of conjunction and have deep intrinsic problems when it comes to non-commuting projections. One problem is that the resulting systems do not obey the deduction theorem, which many logicians regard as constitutive of a proper logic. In the following, we will see that the notion of asymmetric conjunction is very useful when discussing certain puzzles of bounded rationality.
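One property of asymmetric conjunction that matters for the conjunction fallacy of Sect. 3.3 can be checked directly. Since ‖BA(v)‖ ≤ ‖A(v)‖, P(A; B) can never exceed P(A), but nothing prevents it from exceeding P(B). In the sketch below (NumPy; helper name and angles are illustrative assumptions, not fitted to any data), A lies close to the state and B far from it.

```python
import numpy as np

def ray_projector(angle):
    """Projector onto the line through the origin at the given angle (real 2-D toy model)."""
    u = np.array([np.cos(angle), np.sin(angle)])
    return np.outer(u, u)

v = np.array([1.0, 0.0])                       # assumed state
A = ray_projector(np.deg2rad(40))              # assumed first conjunct, close to v
B = ray_projector(np.deg2rad(80))              # assumed second conjunct, far from v

p_A = np.linalg.norm(A @ v) ** 2               # cos^2(40 degrees), about 0.59
p_B = np.linalg.norm(B @ v) ** 2               # cos^2(80 degrees), about 0.03
p_A_then_B = np.linalg.norm(B @ A @ v) ** 2    # P(A; B) = P(ABA) = cos^4(40 degrees), about 0.34

# The asymmetric conjunction exceeds the second conjunct but not the first one.
print(p_A_then_B > p_B, p_A_then_B <= p_A)
```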

4.2 The Allais puzzle

In Sect. 2 we considered four prospects a, b, c, d, defined over three outcome situations: two winning situations (winning 1000 units or winning 450 units) and one losing situation (winning nothing). In the classical case, we can represent the utilities as the diagonal elements of the following matrix:

  1. (15)

    \(U=\left( {{\begin{array}{ccc} {u\left( {1000} \right) }\quad &{} 0&{} \quad 0 \\ 0\quad &{} {u\left( {450} \right) }&{} \quad 0 \\ 0&{} \quad 0&{} \quad 0 \\ \end{array} }} \right) \)

The four prospects are then defined by the vectors \(a= \left( {\sqrt{.5}, 0,\sqrt{.5}} \right) ,\, b= \left( {0, 1, 0} \right) , c=\left( {\sqrt{.1}, 0, \sqrt{.9}} \right) ,\, d= \left( {0,\sqrt{.2}, \sqrt{.8}} \right) \). Each component of these vectors gives the square root of the probability of the corresponding situation. Let us call such vectors risk profiles. The expected utility of any risk profile x can then be calculated by the quadratic form:Footnote 11

  1. (16)

    \(U_{x}=x \cdot U x^{T}\)

It is easy to check that this formula gives exactly the results we found earlier for the expected utility in the classical model: \(U_{a} = .5\, u(1000),\, U_{b}=u(450),\, U_{c }= .1 \,u(1000)\), and \(U_{d}= .2\, u(450)\). Consequently, with a diagonal utility matrix it is not possible to reproduce the experimentally observed pattern that prospect b is preferred to a while c is preferred to d. For example, if we assume that \(u(1000) = 10\) and \(u(450) = 5.5\), then we correctly get that b is preferred to a (5.5 > 5) but wrongly that d is preferred to c (1.1 > 1).
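The classical computation with the diagonal matrix (15) can be reproduced in a few lines. The sketch assumes NumPy and the illustrative utilities u(1000) = 10 and u(450) = 5.5 used in the text; the helper name `expected_utility` is ours.

```python
import numpy as np

def expected_utility(x, U):
    """Quadratic form U_x = x . U x^T of Eq. (16) for a real risk profile x."""
    return float(x @ U @ x)

U = np.diag([10.0, 5.5, 0.0])            # u(1000) = 10, u(450) = 5.5, u(0) = 0

profiles = {                               # components: square roots of P(1000), P(450), P(0)
    "a": np.sqrt([0.5, 0.0, 0.5]),
    "b": np.sqrt([0.0, 1.0, 0.0]),
    "c": np.sqrt([0.1, 0.0, 0.9]),
    "d": np.sqrt([0.0, 0.2, 0.8]),
}
eu = {name: expected_utility(x, U) for name, x in profiles.items()}
print(eu)   # approximately a: 5.0, b: 5.5, c: 1.0, d: 1.1, so b > a but also d > c
```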

La Mura’s idea of ‘projected expected utility’ (La Mura 2009) is to regard formula (16) as valid even when U is not a diagonal matrix but contains non-zero off-diagonal elements. For example, he suggests assuming a slight aversion to the risk of obtaining no gain (third component of the vectors), which leads to matrices such as the following one:

  1. (17)

    \(U=\left( {{\begin{array}{ccc} {10}&{} \quad 0&{} \quad {-1} \\ 0&{} \quad {5.5}&{} \quad {-1} \\ {-1}&{} \quad {-1}&{} \quad 0 \\ \end{array} }} \right) \)

Using formula (16) again now yields the following results: \(U_{a} = 4,\, U_{b} = 5.5, U_{c}= .4\), and \(U_{d}= .3\). Hence, prospect b is preferred to a while c is preferred to d.
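Repeating the same computation with the non-diagonal matrix (17) reproduces the values just stated. This is again a sketch assuming NumPy, with the same hypothetical helper as for the diagonal case.

```python
import numpy as np

def expected_utility(x, U):
    """Quadratic form U_x = x . U x^T of Eq. (16) for a real risk profile x."""
    return float(x @ U @ x)

U = np.array([[10.0, 0.0, -1.0],
              [0.0, 5.5, -1.0],
              [-1.0, -1.0, 0.0]])          # matrix (17): slight aversion to the zero outcome

profiles = {
    "a": np.sqrt([0.5, 0.0, 0.5]),
    "b": np.sqrt([0.0, 1.0, 0.0]),
    "c": np.sqrt([0.1, 0.0, 0.9]),
    "d": np.sqrt([0.0, 0.2, 0.8]),
}
eu = {name: expected_utility(x, U) for name, x in profiles.items()}
prefers_b_over_a = eu["b"] > eu["a"]
prefers_c_over_d = eu["c"] > eu["d"]
print(eu, prefers_b_over_a, prefers_c_over_d)
```

The off-diagonal entries contribute 2·U₁₃·x₁x₃ + 2·U₂₃·x₂x₃, which lowers the value of every prospect that mixes a winning outcome with the zero outcome.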

4.3 The Ellsberg puzzle

La Mura’s theory of ‘projected expected utility’ can be applied to resolving the Ellsberg puzzle as well (La Mura 2009). In Sect. 3.1 we have considered two prospects R and G assuming a payoff of $X in case of choosing the correct colour and $0 in the other case. Hence, instead of two winning situations as in the example discussed before, we are concerned with one winning situation only (to win $X) and one losing situation (zero-payoff). For simplicity, we assume that u($X) = 1 and u($0) = 0. Hence, we can model the scenario by assuming the following diagonal utility matrix:

  1. (18)

    \(U=\left( {{\begin{array}{ll} 1&{} 0 \\ 0&{} 0 \\ \end{array} }} \right) \)

The two prospects R and G can then be modelled by two risk profiles: \(r=\left( {\sqrt{\frac{1}{3}}, \sqrt{\frac{2}{3}}} \right) \) and \(g=\left( {\sqrt{\frac{j}{300}}, \sqrt{\frac{300-j}{300}}}\right) _{0\le j\le 200} \). In the latter case the risk profile contains a hidden parameter j ranging from 0 to 200 (the number of green balls, which the actor does not know). We will call such risk profiles mixed risk profiles. It is easy to calculate the expected utilities for both profiles:

\(U_{r} = 1/3\) and \(U_{g}=j/300\). In the latter case we assume that all possible values of j are mixed with equal weights. On average, we then get mean \(U_{g }=\frac{1}{201}\sum _{j=0}^{200} \left( {\frac{j}{300}} \right) = 1/3\). This is the classical solution, which does not distinguish between risk and ignorance. Consequently, both prospects are ranked equally.
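The classical average can be verified exactly with Python's `fractions` module. With the diagonal matrix (18), the quadratic form reduces to the square of the first component of the risk profile, i.e. to the probability of winning; the sketch simply averages j/300 over the 201 equally weighted values of j.

```python
from fractions import Fraction

# Diagonal utility matrix (18): U_x = x_1^2, the probability of winning.
U_r = Fraction(1, 3)                                   # first component of r squared

# Mixed risk profile for G: j green balls out of 300, j unknown, 0 <= j <= 200.
U_g = [Fraction(j, 300) for j in range(201)]
mean_U_g = sum(U_g) / len(U_g)                         # equal weights for all 201 values of j

print(U_r, mean_U_g)   # prints: 1/3 1/3
```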

Again, we regard formula (16) as valid even when U is not a diagonal matrix but contains non-zero off-diagonal elements. Following La Mura (2009), we consider the following Hermitian utility matrix containing a free real parameter \(\upalpha \):

  1. (19)

    \(U= \left( {{\begin{array}{ll} 1&{} \upalpha \\ \upalpha &{} 0 \\ \end{array} }} \right) \).

Again, we can calculate the expected utilities and now obtain \(U_{r} = 1/3 + 2\sqrt{2}\,\upalpha /3 \approx 1/3 + 0.94\upalpha \) and \(U_{g} = j/300 + 2\upalpha \sqrt{j\left( {300-j} \right) }/300\). Assuming again that all values of j are equally mixed, we get mean \( U_{g }=\frac{1}{201}\sum _{j=0}^{200} \left( {\frac{j+2\upalpha \sqrt{j\left( {300-j} \right) }}{300}} \right) = 1/3+0.83\upalpha \).Footnote 12 If \(\upalpha = 0\), there is no difference in preference between the risky action R and the uncertain action (action under ignorance) G. If \(\upalpha > 0\), the risky action R is preferred over the uncertain action G (and conversely for \(\upalpha < 0\)). Hence, the empirical data can be fitted by a positive parameter \(\upalpha \), which expresses the avoidance of situations of ignorance.
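The two coefficients of α can be recomputed with the standard library alone. The sketch evaluates the quadratic form for the matrix (19) directly (the helper names are ours) and reads off the coefficient of α as the difference between α = 1 and α = 0; the results should come out close to the values 0.94 and 0.83 given in the text.

```python
import math

def expected_utility(x, alpha):
    """U_x = x . U x^T for the matrix (19), U = [[1, alpha], [alpha, 0]], with real x."""
    x1, x2 = x
    return x1 * x1 + 2 * alpha * x1 * x2

def profile_r():
    return (math.sqrt(1 / 3), math.sqrt(2 / 3))

def profile_g(j):
    return (math.sqrt(j / 300), math.sqrt((300 - j) / 300))

def mean_U_g(alpha):
    """Equal-weight mixture over the hidden parameter j = 0, ..., 200."""
    values = [expected_utility(profile_g(j), alpha) for j in range(201)]
    return sum(values) / len(values)

coef_r = expected_utility(profile_r(), 1.0) - expected_utility(profile_r(), 0.0)  # 2*sqrt(2)/3
coef_g = mean_U_g(1.0) - mean_U_g(0.0)
print(round(coef_r, 3), round(coef_g, 3))
```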

In summary, the probability of risky events is calculated by the Born rule; the probability of events under ignorance is calculated by mixing. Quantum probabilities correspond to the normal case of judging risks. The case of ignorance, where the geometry of projections provides no explicit probabilities, is handled by the mixed case of the density matrix (Franco 2007).Footnote 13

4.4 The disjunction effect

In Sect. 3.2 we have seen that classical probability theory cannot account for the disjunction effect. The reason was that in the classical theory we can derive the law of total probability (4), for convenience repeated here:

  1. (20)

    \(\hbox {P}(B) = \hbox {P}(B{\vert }A) \cdot \hbox {P}(A) + \hbox {P}(B{\vert }\overline{A}) \cdot \hbox {P}(\overline{A})\).

Let us now see what happens with Eq. (20) in the quantum case. As mentioned above, even in the quantum case a probability function is an additive measure function. However, instead of using sets of possible worlds to model propositions, the quantum approach models propositions by subspaces of a given Hilbert space, or by projection operators that project any vector onto the given subspace. In order to get the quantum version of Eq. (20), we can decompose the projector B as follows:

  1. (21)

    \({\varvec{B}}={\varvec{IBI}}=\left( {{\varvec{A}}+\overline{{\varvec{A}}}} \right) {\varvec{B}}\left( {{\varvec{A}}+\overline{{\varvec{A}}}} \right) ={\varvec{ABA}}+\overline{{\varvec{A}}}{\varvec{B}} \overline{{\varvec{A}}}+{\varvec{AB}}\overline{{\varvec{A}}} +\overline{{\varvec{A}}}{\varvec{B}}{\varvec{A}}\)

The four parts of this decomposition are mutually orthogonal. Using the definition of asymmetric conjunction given in Eq. (13), we get (22a); using, in addition, the definition of conditional probabilities in Eq. (14), we obtain (22b).

(22a)  \(\hbox {P}({\varvec{B}}) = \hbox {P}({\varvec{A}}; {\varvec{B}}) + \hbox {P}(\overline{{\varvec{A}}}; {\varvec{B}})+\partial ({\varvec{A}},{\varvec{B}})\), where \(\partial ({\varvec{A}},{\varvec{B}})=\hbox {P}({\varvec{AB}}\overline{{\varvec{A}}}+ \overline{{\varvec{A}}}{\varvec{BA}})\)

(22b)  \(\hbox {P}({\varvec{B}}) = \hbox {P}({\varvec{B}}{\vert }{\varvec{A}}) \hbox {P}({\varvec{A}}) + \hbox {P}({\varvec{B}}{\vert }\overline{{\varvec{A}}}) \hbox {P}(\overline{{\varvec{A}}})+\partial ({\varvec{A}},{\varvec{B}})\)

Note that the measure function P can take negative values for arguments that are not projectors, such as \({\varvec{AB}}\overline{{\varvec{A}}}+\overline{{\varvec{A}}}{\varvec{BA}}\) (cf. Niestegge 2008; Blutner 2009). The term \(\partial \)(A,B) is called the interference term. It is zero if A and B commute, in which case Eq. (22b) reduces to the law of total probability (20).

Equation (22b) allows us to express the disjunction effect. In fact, the difference \(\hbox {P}(B) - (\hbox {P}(B{\vert }A) \cdot \hbox {P}(A) + \hbox {P}(B{\vert }\overline{A}) \cdot \hbox {P}(\overline{A}))\) is exactly the interference term \(\partial ({\varvec{A}},{\varvec{B}})=\hbox {P}({\varvec{AB}}\overline{{\varvec{A}}} +\overline{{\varvec{A}}}{\varvec{BA}})\). Hence, we obtain a numerical value for the disjunction effect by calculating the interference term. In quantum theory, the probability function P is always relative to the state of the system, which may be mixed or pure. For simplicity, we consider only pure states v for the moment. In this case, the interference term can be written as follows:

(23)  \(\hbox {P}_{v}({\varvec{AB}}\overline{{\varvec{A}}}+\overline{{\varvec{A}}} {\varvec{BA}})= 2\sqrt{\hbox {P}_v ({\varvec{B}}|{\varvec{A}})\hbox {P}_v \left( {\varvec{A}} \right) }\cdot \sqrt{\hbox {P}_v \left( {{\varvec{B}}|\overline{{\varvec{A}}}} \right) \hbox {P}_v \left( {\overline{{\varvec{A}}}} \right) }\cdot \cos (\Delta )\). Footnote 14

The phase shift parameter \(\Delta \) reflects the impact of knowing A or \(\overline{A}\) on assessing the likelihood of B. The interference term vanishes, i.e. \(\cos (\Delta ) = 0\) and \(\Delta = 90^{\circ }\), if the vectors obtained by projecting the state first onto A and then onto B (or first onto \(\overline{A}\) and then onto B) are orthogonal, which is always the case when A and B commute. If they are not orthogonal, A and B cannot commute, and the corresponding propositions are incompatible. This means that if a participant decides for B, then he/she must necessarily be undecided with regard to A. From a psychological perspective, the interference term expresses the correlation between two decision paths, which for the Hawaiian vacation example considered in Sect. 3.2 are: (i) first consider that you won’t pass the exam and then consider the trip to Hawaii, and (ii) first consider that you will pass the exam and then consider the trip to Hawaii. A negative correlation corresponds to a negative interference term (\(\partial ({\varvec{A}},{\varvec{B}})<0\) in (22)), which lowers P(B) below the value given by the law of total probability (i.e., it reduces the probability of taking the trip when the exam result is unknown); a positive correlation has the opposite effect.

Considering the numerical values of the Hawaiian vacation example, \(\hbox {P}(B{\vert }A)= 0.54\), \(\hbox {P}(B{\vert }\overline{A}) = 0.57\) and \(\hbox {P}(B) = 0.32\), and assuming that passing and not passing the exam are equally likely, we get a value of \(-0.23\) for the interference term. From this value we can fit the phase shift parameter: \(\cos (\Delta ) = -0.42\), i.e. \(\Delta = 114^{\circ }\).
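The fit of the phase shift parameter can be reproduced in a few lines. This Python sketch simply plugs the three quoted probabilities and the equal-priors assumption into (22b) and (23); the variable names are ours.

```python
from math import acos, degrees, sqrt

p_b_given_a = 0.54      # P(B | A)
p_b_given_not_a = 0.57  # P(B | not-A)
p_b = 0.32              # P(B), outcome of A unknown
p_a = p_not_a = 0.5     # assumption stated in the text: both outcomes equally likely

# Classical prediction from the law of total probability (20).
total_probability = p_b_given_a * p_a + p_b_given_not_a * p_not_a
# By (22b), the observed deviation from (20) is the interference term.
interference = p_b - total_probability
# Solve (23) for cos(Delta).
scale = 2 * sqrt(p_b_given_a * p_a) * sqrt(p_b_given_not_a * p_not_a)
cos_delta = interference / scale
delta_deg = degrees(acos(cos_delta))

print(f"interference = {interference:.3f}, cos(Delta) = {cos_delta:.2f}, Delta = {delta_deg:.1f} deg")
```

With the unrounded interference term (\(-0.235\)) the fitted angle comes out at about \(115^{\circ }\); the value of \(114^{\circ }\) corresponds to the rounded interference term \(-0.23\).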

4.5 The conjunction fallacy

We now demonstrate how asymmetric conjunction resolves, in the quantum probabilistic case, the conjunction fallacy discussed in Sect. 3.3. We define the conjunction effect as the difference \(\hbox {P}(A;B) - \hbox {P}(B)\); its value for the Linda example reviewed above is +0.13. A positive value contradicts the Kolmogorov axioms of probability theory, according to which the conjunction of two propositions can never be more probable than either of its conjuncts. Considering Eqs. (22b) and (23), we get the following expression for the conjunction effect:Footnote 15

(24)  \(\hbox {P}({\varvec{A}}; {\varvec{B}}) - \hbox {P}({\varvec{B}}) = - \hbox {P}(\overline{{\varvec{A}}}; {\varvec{B}}) - 2 \cdot \sqrt{\hbox {P}\left( {{\varvec{A}};{\varvec{B}}} \right) \cdot \hbox {P}\left( {\overline{{\varvec{A}}}}; {\varvec{B}} \right) } \cdot \cos (\Delta )\)

According to classical probability theory, the conjunction effect can never be positive. This corresponds to the case of commuting operators \(\varvec{A}\) and \(\varvec{B}\) and can be expressed by assuming \(\Delta =\uppi /2\), i.e. \(\cos (\Delta ) = 0\). The conjunction effect can be positive if \(\cos (\Delta )\) is negative (e.g., if \(\cos (\Delta )= -1\), i.e. \(\Delta =\uppi \)). In the example case with P(B) = 0.38 (Linda is a bank teller), \(\hbox {P}(A) = 0.61\) (Linda is a feminist), and \(\hbox {P}(A;\, B) = 0.51\) (Linda is a feminist bank teller), we get a conjunction effect \(\hbox {P}(A;\, B) - \hbox {P}(B) = 0.13\). Assuming \(\hbox {P}(\overline{A}; B) = 0.09\) (Linda is a non-feminist bank teller), Eq. (24) then yields \(\cos (\Delta ) \approx -0.51\).
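Eq. (24) can be solved for \(\cos (\Delta )\) once \(\hbox {P}(\overline{A}; B)\) is fixed. The Python sketch below uses the values quoted for the Linda example, with \(\hbox {P}(\overline{A}; B) = 0.09\) as the assumed value, and plugs the fitted parameter back into (24) as a consistency check. The variable names are illustrative.

```python
from math import sqrt

p_b = 0.38            # P(B): Linda is a bank teller
p_a_then_b = 0.51     # P(A; B): Linda is a feminist bank teller
p_nota_then_b = 0.09  # P(not-A; B): Linda is a non-feminist bank teller (assumed value)

conjunction_effect = p_a_then_b - p_b

# Eq. (24): effect = -P(not-A;B) - 2*sqrt(P(A;B)*P(not-A;B))*cos(Delta); solve for cos(Delta).
cos_delta = -(conjunction_effect + p_nota_then_b) / (2 * sqrt(p_a_then_b * p_nota_then_b))

# Plugging cos(Delta) back into (24) reproduces the observed effect.
reconstructed = -p_nota_then_b - 2 * sqrt(p_a_then_b * p_nota_then_b) * cos_delta
print(round(conjunction_effect, 2), round(cos_delta, 2), round(reconstructed, 2))
```

Setting \(\cos (\Delta ) = 0\) instead gives the classical value \(-\hbox {P}(\overline{A}; B)\), which can never be positive.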

The present approach follows Conte and Khrennikov (e.g. Conte et al. 2008; Khrennikov 2003a, b, 2006), who pioneered the investigation of interference effects in cognitive macro-systems. The same idea can be used to model data on the conjunction and disjunction of concepts. Research pioneered by Hampton and others on the structure of vague concepts provides a substantial empirical basis for deviations from set-theoretic rules in conceptual combination (Hampton 1987, 1988a, b). The relevant empirical results include violations of the conjunction and disjunction rules, the famous ‘guppy effect’, and cases of ‘dominance’ and ‘over- and underextension’, all of which have been successfully described on the basis of quantum principles (Aerts 2009; Aerts and Gabora 2005).

For linguists, the distinction between vagueness and prototypicality is very important (Kamp and Partee 1997). While classical models have great difficulty capturing this distinction, it is quite easy to model in quantum approaches. Like propositions, concepts are modelled as projection operators on a Hilbert space whose vectors represent instances (feature vectors). The corresponding measure function defines the graded membership function of the concept. In addition, a low-dimensional subspace within the positive eigenspace of this operator describes the relevant prototypes of the category. Projecting onto this subspace provides a measure function for typicality (Blutner 2009; Blutner et al. 2013).
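To make the distinction concrete, here is a toy sketch in Python. Everything in it is a hypothetical illustration rather than a model from the cited work: a four-dimensional real feature space, a concept spanned by the first three basis vectors, a one-dimensional prototype subspace inside it, and two made-up instance vectors labelled ‘robin’ and ‘penguin’. Membership is the squared length of the projection onto the concept subspace, and typicality is the squared length of the projection onto the prototype subspace.

```python
def normalize(v):
    """Scale a real vector to unit length (a pure state)."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def squared_projection(indices, v):
    """||P v||^2 for the projector P onto the span of the given standard basis vectors."""
    return sum(v[i] ** 2 for i in indices)


CONCEPT = (0, 1, 2)  # hypothetical concept subspace: span{e0, e1, e2}
PROTOTYPE = (0,)     # hypothetical prototype subspace inside CONCEPT: span{e0}


def membership(v):
    """Graded membership: measure of the concept projector in state v."""
    return squared_projection(CONCEPT, normalize(v))


def typicality(v):
    """Typicality: measure of the prototype projector in state v."""
    return squared_projection(PROTOTYPE, normalize(v))


robin = [3.0, 2.0, 1.0, 1.0]    # hypothetical instance close to the prototype
penguin = [1.0, 3.0, 0.0, 0.0]  # hypothetical instance inside the concept, far from the prototype

for name, inst in [("robin", robin), ("penguin", penguin)]:
    print(name, round(membership(inst), 3), round(typicality(inst), 3))
```

In this toy setting the ‘penguin’ vector is a full member of the concept (membership 1) but has low typicality, while the ‘robin’ vector is slightly less than a full member but much more typical; graded membership and typicality thus come apart.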

4.6 Order effects

Recently, Wang and Busemeyer (2013) have shown that the quantum approach can describe all four types of order effects considered in Sect. 3.4. They treat the order effect very similarly to the disjunction effect, namely in terms of asymmetric conjunction (see Sect. 4.4). Let the question order effect be the difference between the summed probabilities for B in the contexts A and \(\overline{{\varvec{A}}}\) and the probability for B in a neutral context, i.e. the difference \(\hbox {P}({\varvec{A}}; {\varvec{B}}) + \hbox {P}(\overline{{\varvec{A}}}; {\varvec{B}}) - \hbox {P}({\varvec{B}})\). This difference equals the negative of the interference term introduced in (22): \(-\partial ({\varvec{A}},{\varvec{B}}) = -\hbox {P}({\varvec{AB}}\overline{{\varvec{A}}}+\overline{{\varvec{A}}}{\varvec{BA}})\). As in the calculation of the disjunction effect (Sect. 4.4), we consider only pure states v for simplicity. We then obtain the following expression for the question order effect:

(25)  \(-\partial _{v}({\varvec{A}}, {\varvec{B}}) = 2\hbox {P}_{v}({\varvec{A}}; {\varvec{B}}) - 2(\hbox {P}_{v}({\varvec{A}}) \hbox {P}_{v}({\varvec{B}}))^{1/2} \cos \updelta \)

Here, \(\updelta \) is a phase shift parameter introduced by factorizing the complex number defined by the scalar product \({\varvec{A}}v \cdot {\varvec{B}}v\).Footnote 16 In the classical case of commuting operators, the order effect becomes zero. In the non-classical case of non-commuting operators, all four classes of order effects can be described. For instance, assume that \(\hbox {P}_{v}({\varvec{A}}) > \hbox {P}_{v}({\varvec{B}})\) and \(\hbox {P}_{v}({\varvec{A}};{\varvec{B}}) > \hbox {P}_{v}({\varvec{B}};{\varvec{A}})\); then the consistency (assimilation) effect is obtained by stipulating \(\hbox {P}_{v}({\varvec{A}};{\varvec{B}}) > \cos \updelta > \hbox {P}_{v}({\varvec{B}};{\varvec{A}})\). Similarly, for the contrast effect we assume \(\hbox {P}_{v}({\varvec{A}};{\varvec{B}}) < \cos \updelta < \hbox {P}_{v}({\varvec{B}};{\varvec{A}})\), and correspondingly for the additive and subtractive effects (cf. Blutner 2012).
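Equation (25) can also be checked numerically. In this Python sketch (helper names ours, random complex projectors of rank two in a four-dimensional space), we read \(\cos \updelta \) as the real part of the scalar product of \({\varvec{A}}v\) and \({\varvec{B}}v\) divided by the product of their lengths. This reading is our assumption about how the factorization is meant; it always lies in [−1, 1] by the Cauchy-Schwarz inequality and makes (25) hold for projectors of any rank. The sketch also confirms that the order effect vanishes for commuting projectors.

```python
import math
import random


def inner(x, y):
    """Hermitian inner product <x, y>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def orthonormalize(vectors):
    """Gram-Schmidt orthonormalization of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = inner(u, w)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(inner(w, w).real)
        basis.append([wi / norm for wi in w])
    return basis


def projector(basis):
    """Orthogonal projector onto span(basis), as a map on vectors."""
    def apply(x):
        out = [0j] * len(x)
        for u in basis:
            c = inner(u, x)
            out = [oi + c * ui for oi, ui in zip(out, u)]
        return out
    return apply


def complement(proj):
    """I - P: projector onto the orthogonal complement."""
    return lambda x: [xi - pi for xi, pi in zip(x, proj(x))]


def sq_norm(x):
    return inner(x, x).real


def order_effect_terms(A, B, v):
    """Return the order effect, the right-hand side of (25), and cos(delta)."""
    av, bv = A(v), B(v)
    p_a, p_b = sq_norm(av), sq_norm(bv)
    p_a_b = sq_norm(B(av))                   # P(A; B)
    p_abar_b = sq_norm(B(complement(A)(v)))  # P(not-A; B)
    effect = p_a_b + p_abar_b - p_b          # P(A;B) + P(not-A;B) - P(B), i.e. minus the interference term
    # Assumed reading: cos(delta) = Re<Av, Bv> / (||Av|| ||Bv||).
    cos_d = inner(av, bv).real / math.sqrt(p_a * p_b)
    rhs = 2 * p_a_b - 2 * math.sqrt(p_a * p_b) * cos_d
    return effect, rhs, cos_d


rng = random.Random(1)


def random_vector(n):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


n = 4
A = projector(orthonormalize([random_vector(n) for _ in range(2)]))
B = projector(orthonormalize([random_vector(n) for _ in range(2)]))
v = orthonormalize([random_vector(n)])[0]
effect, rhs, cos_d = order_effect_terms(A, B, v)
print(effect, rhs, cos_d)

# Commuting projectors (diagonal in the same basis): no order effect.
e = [[(1.0 + 0j) if k == i else 0j for k in range(n)] for i in range(n)]
A_c, B_c = projector([e[0], e[1]]), projector([e[1], e[2]])
print(order_effect_terms(A_c, B_c, v)[0])
```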

The phenomenon of order effects convincingly illustrates that quantum models of cognition can do much more than fit parameters to a collection of isolated phenomena. We think that quantum models sometimes have the potential to provide real explanations: once the relevant parameters are fixed, the theory can make predictions about correlations between different effects (Atmanspacher and Römer 2012). An excellent example is Wang and Busemeyer’s (2013) verification of what they call the ‘law of reciprocity’:Footnote 17

(26)  \(\hbox {P}({\varvec{A}}; \overline{{\varvec{B}}}) + \hbox {P}(\overline{{\varvec{A}}}; {\varvec{B}}) = \hbox {P}({\varvec{B}}; \overline{{\varvec{A}}}) + \hbox {P}(\overline{{\varvec{B}}}; {\varvec{A}})\)

The theorem can be tested by considering two questions that are answered immediately one after the other, with no opportunity for additional information to intervene. The yes-answer to a question can be seen as realizing proposition A (or B, respectively), whereas the no-answer corresponds to the proposition \(\overline{{\varvec{A}}}\) (or \(\overline{{\varvec{B}}}\), respectively). The empirical test of reciprocity (Wang and Busemeyer 2013; see also Busemeyer and Bruza 2012) was surprisingly successful, with one exception: examples producing subtractive effects. It could be argued that in this case the key assumption of the model, namely that no information intervenes between the two events, is violated.Footnote 18
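The reciprocity relation can be checked in the same way. The Python sketch below (helper names ours, random projectors in a four-dimensional complex space) computes the four answer sequences in the form \(\hbox {P}(A; \overline{B}) + \hbox {P}(\overline{A}; B) = \hbox {P}(B; \overline{A}) + \hbox {P}(\overline{B}; A)\), where P(X; Y) is taken to be the squared length of YXv, i.e. the probability of answering X first and then Y.

```python
import math
import random


def inner(x, y):
    """Hermitian inner product <x, y>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def orthonormalize(vectors):
    """Gram-Schmidt orthonormalization of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = inner(u, w)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(inner(w, w).real)
        basis.append([wi / norm for wi in w])
    return basis


def projector(basis):
    """Orthogonal projector onto span(basis), as a map on vectors."""
    def apply(x):
        out = [0j] * len(x)
        for u in basis:
            c = inner(u, x)
            out = [oi + c * ui for oi, ui in zip(out, u)]
        return out
    return apply


def complement(proj):
    """I - P: projector onto the orthogonal complement (the 'no' answer)."""
    return lambda x: [xi - pi for xi, pi in zip(x, proj(x))]


def sq_norm(x):
    return inner(x, x).real


def seq(first, second, v):
    """P(first; second) = ||second(first(v))||^2: answer 'first', then 'second'."""
    return sq_norm(second(first(v)))


rng = random.Random(2)


def random_vector(n):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


n = 4
A = projector(orthonormalize([random_vector(n) for _ in range(2)]))
B = projector(orthonormalize([random_vector(n) for _ in range(2)]))
v = orthonormalize([random_vector(n)])[0]
A_bar, B_bar = complement(A), complement(B)

lhs = seq(A, B_bar, v) + seq(A_bar, B, v)  # A yes then B no, plus A no then B yes
rhs = seq(B, A_bar, v) + seq(B_bar, A, v)  # B yes then A no, plus B no then A yes
print(lhs, rhs, abs(lhs - rhs))
```

The two sides agree up to rounding for any choice of projectors and state, even though the individual sequences generally depend on question order.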

There is a variety of other examples where quantum models have proven their explanatory value: the simultaneous explanation of the conjunction and disjunction fallacies (Busemeyer and Bruza 2012, p. 126 ff), the prediction of borderline contradictions in case of conjoining vague predicates (Blutner et al. 2013), the prediction and empirical verification of inequalities for complementary questions in the context of Jung’s personality theory (Blutner and Hochnadel 2010), and belief revision in update semantics (beim Graben 2014).

4.7 General discussion

In Sect. 3 we discussed some puzzles of bounded rationality. Classical decision theory based on standard probabilities, i.e. measure functions on Boolean algebras, cannot resolve these problems, which is why they have been called puzzles. We also introduced prospect theory and modifications thereof, and discussed how these models can help to solve the puzzles. For instance, we argued that the Allais paradox can be resolved by a single uniform function \(\upgamma \) that models the deformation of probabilities relative to certain reference points, a basic assumption of prospect theory. We further reported that solving the Ellsberg puzzle requires a substantial modification of prospect theory, one that introduces a new weighting function (called a capacity in formal theories of reasoning). Next, we argued that the disjunction effect cannot be explained by this modification and that a new psychological idea is required (based on the failure of plausible reasoning when the conditions are unknown). Resolving the conjunction and disjunction fallacies requires yet another new idea, namely representativeness.

Summarizing, the impression is unavoidable that prospect theory (and its modifications) makes isolated stipulations in order to describe apparently isolated phenomena. This is not a satisfying theoretical situation (even if some authors have tried to make a virtue of necessity). Models based on quantum probabilities (i.e., measure functions on orthomodular lattices such as Hilbert space projection lattices) improve the situation drastically. The Allais paradox and the Ellsberg puzzle can be resolved by the standard matrix formalism of expected utility, in which some non-diagonal elements of the utility matrix are taken into account. The remaining puzzles are described as interference phenomena and crucially rely on the operation of asymmetric conjunction. It should be stressed that the available models of quantum cognition are still based on several stipulations (size and sign of the interference effect, additional parameters for the utility matrices). With a proper fit of the respective parameters, we are able to describe the available data. However, descriptions are not yet explanations. Proper explanations come close to predictions of effects based on independently motivated assumptions. We think that quantum models sometimes have the potential to provide explanations and to do more than describe a list of (isolated) phenomena. Once the relevant parameters are fixed, the theory can make predictions about correlations between different effects (as discussed in Sect. 4.6).

There is another way to overcome the shortcomings of purely descriptive models: the exploration of foundational research questions. In the following sections, we will leave the descriptive level and advance to a more foundational one. This may help us to understand better why the quantum models can be successful.

5 Phenomenological and foundational research programs

In the last section, we have outlined how quantum cognition can be seen as a phenomenological research programFootnote 19 that solves several puzzles of bounded rationality. Not every cognitive scientist will be convinced by the proposed analyses and some researchers might ask for a deeper motivation of the technical instrument of quantum probabilities. In other words, they might ask why this formalism is appropriate. Is there a handful of independently motivated first principles that account for the character of quantum probability? In Sect. 6 we will give a positive answer to this question. This justifies the claim that we are following a research program that can truly be called foundational.

The distinction between phenomenological and foundational research programs is not a mainstream issue in the current philosophy of science. However, it is actively discussed within particular disciplines such as physics (e.g., Streater and Wightman 1964; Piron 1976), chemistry (e.g., Carbó 1995), and linguistics (e.g., Hinzen 2000). With regard to the foundational perspective, we have to distinguish two kinds of foundational research. One kind aims to reduce phenomenological aspects to a deeper level of description (such as explaining chemical properties by molecular structures). The other kind tries to gain a deeper understanding of the theoretical instruments by reducing them to a small set of independently motivated first principles (Primas 1990). These principles are all stated at the same ‘level’ of description, by a series of axioms or postulates. Accordingly, the second kind of foundational research is anti-reductionist.

In the following, we will concentrate on the second way of understanding foundational research. We will give some examples in order to illustrate several aspects of foundational research and finally we will present some basic traits generally characterizing foundational research in the domain of cognitive science. This paves the way for a proper understanding of the foundational approach to quantum probabilities as considered in later sections.

A classic example of foundational research is the representation of abstract Boolean algebras as fields of sets by means of a representation theorem (Stone’s representation theorem). Another important class of examples consists of the fundamental representation theorems in the theory of measurement (Suppes et al. 1989). In addition, Tversky (1977) developed a famous representation theorem for similarity in terms of features.

5.1 Linguistic example

Before concluding this section, we illustrate the difference between phenomenological and foundational research programs with a concrete example from linguistics. In recent years, much work in theoretical and philosophical linguistics has been devoted to comparing Chomsky’s (1995) Minimalist Program with the earlier Principles and Parameters approach (Chomsky 1981). Here are some claims contrasting the two programs (we roughly follow the outline in Hinzen 2000):

  • What Chomsky calls “minimalism” is the search for general explanatory principles such as computational efficiency and related economy principles. It is one thing to find certain effects and an appropriate description of them (Principles and Parameters) and it is another thing to ask why this should be as it is (“minimalism”).

  • The aim of “minimalism” is to derive the earlier developed generalizations, principles, and explanations from independently needed more general principles. The task is to eliminate merely technical solutions.

  • The attempt of “minimalism” is to rationalize the domain under discussion rather than to describe it (as in “principles and parameters”).

  • “Minimalism” tries to make sense of the different facts. Listing facts is one thing; understanding them is another. Obviously, the Husserlian idea of ‘phenomenology’ refers to Chomsky’s foundational program of minimalism.

  • The evolving theory should be free of inner contradictions and conflicts. This relates to the interpretational issue discussed earlier. Even though the current theoretical debate is far from providing mathematical insights into the underlying structure and algebraic properties,Footnote 20 it should not be disregarded as a substantive research prospect (Sternefeld 2012).

What we can learn from these discussions is that foundational approaches (such as minimalism) and phenomenological approaches (such as “principles and parameters”) are different modes of inquiry rather than different theories. In this sense, foundational approaches do not compete with phenomenological ones. The choice between the two approaches is a free decision determined by our interests; there is no way to argue against the “wrong” approach. Both have their justification. However, at certain points in the history of science it is more useful to pursue one direction rather than the other. It should further be added that a sound distinction between foundational and phenomenological programs should focus on the issue of representation theorems. The development of representation theorems is one of the main tasks of foundational research: it develops independently motivated abstract structures and proves how they can be realized by particular concrete structures. The latter can be applied in phenomenological research to describe particular phenomena and sensations.

5.2 A historical note about complementarity

One of the most important ideas of quantum mechanics is the concept of complementarity. Originally, the idea came from the psychology of consciousness, in particular the writings of William James:

It must be admitted, therefore, that in certain persons, at least, the total possible consciousness may be split into parts which coexist but mutually ignore each other, and share the objects of knowledge between them. More remarkable still, they are complementary. (James 1890)

As noted by Max Jammer (1989), Niels Bohr, one of the founding fathers of quantum theory, was acquainted with the writings of James and borrowed this idea from him.Footnote 21 Bohr in turn introduced the idea into physics (originally as the complementarity of momentum and position) and proposed to apply it beyond physics to human knowledge in general. However, his physical conception of complementarity is quite different from James’, and Bohr never made his often-cited claim about applying it to human knowledge concrete. In chapter VII of his book (James 1890), over more than ten pages, James describes several phenomena which illustrate the splitting of consciousness into parts that are not accessible from each other. For example, these phenomena concern the “unconsciousness in hysterics” (p. 202), partial blindness under “post-hypnotic suggestion” (p. 207), or the splitting of a person into several selves in “alcoholic delirium” (p. 208). One example describes the common situation of partial anaesthesia:

The mother who is asleep to every sound but the stirrings of her babe, evidently has the babe-portion of her auditory sensibility systematically awake. Relatively to that, the rest of her mind is in a state of systematized anaesthesia. That department, split off and disconnected from the sleeping part, can none the less wake the latter up in case of need. (p. 213)

Another example refers to the famous subject “Lucie”, who was in a state of “post-hypnotic suggestion” and, of all the cards covering her lap, could see only those that were not multiples of 3. She was, in particular, blind to numbers such as 9, 12 and 15. Hence, the part consisting of the multiples of 3 was split off and disconnected from the rest of the numbers. However, under special conditions, when she was asked not to say which cards she saw but to write them down by hand, the split-off part of the numbers was accessible (p. 207).

Taking all the examples together, it seems adequate to use the term “autoepistemic accessibility” to refer to these phenomena. We use the term “autoepistemic” to refer to the epistemic states of a human subject who can reflect on her own epistemic states.Footnote 22 If two different states are not simultaneously epistemically accessible to the subject under discussion then they can be seen as complementary in James’ sense.

It should be noted that the related term “epistemic accessibility” has been introduced by beim Graben and Atmanspacher (2006) in a similar sense. In their approach, the term refers to an external observer and his measurement apparatus for describing the “microstates” of a dynamical system. These states are not accessible by macroscopic measurements, which only provide a coarse-graining of the state space into “macrostates”. In philosophical contexts, the term observer-dependency is often used for this phenomenon (Searle 1980, 1998). In this sense, different macroscopic measurement devices appear as complementary when they lead to such differences in epistemic accessibility. This phenomenon is pertinent in the neurosciences when different measurements such as fMRI or EEG generate different coarse-grainings of the brain’s state space, thus leading to incompatible descriptions of neurodynamics in the sense of de Barros and Suppes (2009). In the next section we will explain this idea in detail.

Bohr’s concept of complementarity is clearly not copied from James’ epistemic conception. As pointed out by Murdoch (1987, p. 55), the first occurrence of the word “complementarity” in Bohr’s correspondence is in a letter to Pauli of August 13, 1927:

What you write about your and Jordan’s work on electrodynamics is extremely attractive and is very much in agreement with my own view about the nature of quantum theory, according to which the apparently contradictory requirements of superposition and individuality do not subsume contrary but complementary sides of nature. I am in complete agreement with your remarks on de Broglie’s work: he is trying to achieve the impossible by a blending of two sides of the matter. (Cited from Murdoch 1987, p. 55).

The terms “superposition” and “individuality” refer to the wave and particle, and in a draft version for a note to Nature referring to the wave-particle duality of light the word “complementarity” occurs in the very same sense:

It seems that we here meet with an unavoidable dilemma, ... the question being not of a choice between two rivalizing concepts but rather of the description of two complementary sides of the phenomenon. (Cited from Murdoch 1987, p. 55. The emendations are Bohr’s)

Bohr’s conception of complementarity refers to the laws of nature rather than to the idea of (auto)epistemic accessibility as in James’ writings. In other words, it is an ontic conception rather than an epistemic one.

Let us return now to the interpretational problem for probabilities. In quantum theory, a deep problem concerns the nature of the state vector. Two basic questions decide its nature: (i) Does the state vector directly describe reality, or is it related to our knowledge of the system? (ii) Does the state vector describe a single object (particle or wave), or does it describe a whole ensemble of objects (or an ensemble of experiments with the object)? In modern quantum theory (in contrast to naïve quantum physics), state vectors are connected with probabilities. However, there are three different interpretations of the concept of probability (e.g., Halpern 2003): the frequency interpretation, subjective probability, and the propensity interpretation. It is interesting to see how the different answers to the two basic questions relate to these interpretations of probability.

It was Max Born (1955) who gave a realistic answer to the first question and the ensemble answer to the second. He clearly conceived of probabilities as relative frequencies. There are deep conceptual problems with Born’s approach, some of which are overcome by Ballentine’s ensemble approach (Ballentine 1970).

Another realist conception assumes that the state vector describes a single object and gives probabilities a propensity interpretation (Popper 1959). Propensities can be thought of as abstract, objective forces (unobservable dispositional properties) that measure the tendency of a situation to produce a certain event. Popper’s article was an attack against Heisenberg’s Copenhagen interpretation, which is based on a subjective interpretation of probabilities (reflecting our knowledge or our “consciousness” of particles).Footnote 23

The third possibility is to combine an epistemic interpretation of the state vector with the assumption that it describes single objects. This is clearly the view of Heisenberg’s Copenhagen interpretation, with its subjective interpretation of probabilities. As has recently been made clear, this view does not necessarily entail observer-induced wave packet collapse (Barnum et al. 2000; Caves et al. 2002a, b). More importantly, this picture conforms to the predominant Bayesian interpretation in Artificial Intelligence and Cognitive Psychology. Hence, it is natural to take this interpretation as the basic conception of probability in quantum cognition.

In the following section, we will concretize the notion of (auto)epistemic accessibility on the basis of the operational interpretation of quantum physics. The operational setting suggests a particular algebraic structure for modelling propositions, one that is very different from the classical Boolean setting. The Boolean setting allows propositions to be modelled as sets of possible worlds (with union, intersection and complement as the basic propositional operations). In contrast, the operational setting motivates a non-Boolean algebraic structure, which invites us to model propositions by subspaces of a Hilbert space, or by the corresponding projection operators, together with the corresponding lattice-theoretic operations. The algebraic structure of propositions is defined by non-statistical axioms. Hence, the operational understanding does not require any notion of probability. The concept of probability emerges by means of a measure function, and its subjective interpretation can be motivated by a (quantum) de Finetti representation theorem (Barnum et al. 2000; Caves et al. 2002a, b). Probabilities are thus taken to be degrees of belief, justified by axioms of fair betting behaviour.

In this section, we have seen that James’ psychological conception of complementarity is clearly an epistemic one. In quantum physics, there is a major debate about the relationship between ontic and epistemic interpretations of complementarity, which will not concern us further here. Another historical lesson concerns the abstract character of the mathematical formalism of quantum theory: it forms a kind of meta-theory that provides a useful language for expressing certain generalizations, hopefully also in the domain of cognitive science.

6 Bounded rationality and the foundation of quantum probabilities

We now return to the operational interpretation of quantum physics, an interpretation that is grounded in the reality of the measurement process (in contrast to interpretations that assume measurements have real values before they are actually carried out, i.e. before the questions are really asked). As we will see, this interpretation fits nicely with our concern of grounding the basic design features of quantum cognition. In cognitive science, instead of performing ‘measurements’, we confront our subjects with ‘questions’. In both cases we record the ‘answers’ of the system and obtain a probabilistic outcome.

The mentioned design features can be based on a fairly abstract axiomatic framework, formulated in the mathematical language of formal logic (as pioneered by Birkhoff and von Neumann 1936). These features are free from any direct commitment to a statistical or probabilistic interpretation. They concern what is sometimes called ‘general quantum physics’ (Piron 1972) or ‘generalized quantum theory’ (Atmanspacher et al. 2002). The following passage illustrates the crucial issue:

It is also clear that a satisfactory axiomatic structure of the kind referred to above cannot be formulated a priori in terms of wave functions, since their use would imply a statistical interpretation at the outset. The role of the wave functions in general quantum physics must emerge from the analysis of the more fundamental theory. As said above, the linear structure of the Hilbert space does appear, without reference to any statistical notions, as the appropriate description of general quantal systems (a set consisting of a family of Hilbert spaces describes systems which are not purely quantal, and the purely classical limit is described by a family of trivial one-dimensional Hilbert spaces). It is in this way that the statistical interpretation for wave mechanics will emerge as a consequence of essentially nonstatistical axioms, and what is presupposed in classical physics is clearly brought into evidence. (Piron 1972)

6.1 Operational realism

Classical (Kolmogorov) probability is based on a certain sample space S (also called the set of possible states/worlds). In this picture, propositions are considered to be subsets of S. The sample space S refers to the so-called ontic description, in which the state of a system is considered as if it could be characterized precisely as it is, independently of any observer (Atmanspacher 2000).

Another important notion of classical theory is that of a ‘random variable’, which operationalizes the notion of epistemic descriptions. Epistemic descriptions refer to the knowledge that can be obtained about an ontic state through coarse-grained and noisy measurements (Atmanspacher 2000). A random variable \(\hat{\upalpha }\) on a sample space S is a function from S onto some range \(\upalpha \) (called the test range); for example, \(\upalpha \) can be a set of arbitrary ‘symbols’ or a set of natural or real numbers. The proposition that the value of the random variable \(\hat{\upalpha }\) is x (for some \(x\in \upalpha \)) can be defined as follows:

$$\pi \left( {\hat{\upalpha }} =x \right) =_{def} \{s\in S: {\hat{\upalpha }} \left( s \right) =x\} \tag{27}$$

It is easy to see that the system of propositions generated by \(\pi \left( {{\hat{\upalpha }} =x} \right) \) considering all \(x\in \upalpha \) defines a partition of the sample space S.
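The construction of Eq. (27) can be sketched in Python, the language we use for all illustrations in this section. As an assumption for illustration, we take the firefly sample space S = {1, ..., 5} introduced below and a random variable whose values we read off the caption of Fig. 3 (worlds 1 and 3 give a, worlds 2 and 4 give b, world 5 gives n); the helper names `proposition_eq` and `is_partition` are hypothetical.

```python
# Sample space of the firefly box: quadrants 1-4 plus the non-blinking world 5.
S = frozenset({1, 2, 3, 4, 5})

# Hypothetical encoding of the random variable alpha-hat (front window),
# read off the caption of Fig. 3.
alpha_hat = {1: "a", 2: "b", 3: "a", 4: "b", 5: "n"}


def proposition_eq(rv, x):
    """pi(rv = x) = {s in S : rv(s) = x}, cf. Eq. (27)."""
    return frozenset(s for s in S if rv[s] == x)


def is_partition(blocks, universe):
    """True if the non-empty blocks are pairwise disjoint and cover the universe."""
    blocks = [b for b in blocks if b]
    covered = frozenset().union(*blocks)
    pairwise_disjoint = sum(len(b) for b in blocks) == len(covered)
    return pairwise_disjoint and covered == universe


# One proposition (cell) per value of the test range {a, b, n}.
cells = {x: proposition_eq(alpha_hat, x) for x in sorted(set(alpha_hat.values()))}
print(cells)
```

Because the random variable maps S onto its test range, every cell is non-empty, and the cells cover S without overlap.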

Following the literature on test spaces (Wilce 2009), we consider all subsets X of a test range \(\upalpha \) and call these subsets ‘events’ of the random variable \({\hat{\upalpha }} \). With each of these events we associate a particular proposition (also called a logical event), namely the proposition that the value of the random variable \({\hat{\upalpha }} \) is an element of the subset X:

$$\pi \left( {\hat{\upalpha }} \in X \right) =_{def} \{s\in S: {\hat{\upalpha }} \left( s \right) \in X\} \tag{28}$$

If we consider the set of all propositions \(\pi \left( {{\hat{\upalpha }} \in X} \right) \) for all \(X \subseteq \upalpha \) (including the empty set), ordered by set inclusion, we obtain a Boolean lattice. We call these propositions experimental or testable propositions. This Boolean lattice is only a partial one, however: not all subsets of S are testable propositions with respect to the given random variable \({\hat{\upalpha }}\). In this sense, a given random variable \({\hat{\upalpha }}\) defines a particular epistemic description or, likewise, a context in the sense of the famous Kochen-Specker theorem (Kochen and Specker 1967; Peres 1991).Footnote 24 If a family of random variables is intrinsically related to the nature of the system under study, we shall speak of an autoepistemic description. Such random variables might, for example, result from the natural evolution or adaptation of the neural systems of biological agents. If, on the other hand, a family of random variables describes different perspectives of an external observer, we simply speak of epistemic descriptions in general. These describe situations of bounded rationality and ignorance.
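The claim that the propositions of Eq. (28) form a Boolean lattice that covers only part of the power set of S can be checked on a small case. The sketch below again assumes the random variable read off Fig. 3; `proposition_in` and `events` are illustrative names, not notation from the text.

```python
from itertools import combinations

S = frozenset({1, 2, 3, 4, 5})
alpha_hat = {1: "a", 2: "b", 3: "a", 4: "b", 5: "n"}  # hypothetical, cf. Fig. 3
test_range = sorted(set(alpha_hat.values()))         # ['a', 'b', 'n']


def proposition_in(rv, X):
    """pi(rv in X) = {s in S : rv(s) in X}, cf. Eq. (28)."""
    return frozenset(s for s in S if rv[s] in X)


def events(values):
    """All subsets X of the test range, including the empty set."""
    return [frozenset(c) for r in range(len(values) + 1)
            for c in combinations(values, r)]


testable = {proposition_in(alpha_hat, X) for X in events(test_range)}

# Closure under union, intersection and complement (relative to S):
# the testable propositions form a Boolean lattice ordered by inclusion.
closed = all((p | q) in testable and (p & q) in testable and (S - p) in testable
             for p in testable for q in testable)

print(len(testable), "testable propositions out of", 2 ** len(S), "subsets of S")
# 8 testable propositions out of 32 subsets of S
```

A subset such as {1, 2} is not testable with respect to \({\hat{\upalpha }}\): it mixes a left-view world with a right-view world, so no event of the front window singles it out.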

The interesting question is what happens when we consider several such partial Boolean lattices (induced by different random variables) and form the ‘union’ of these lattices. In general, the result is no longer a Boolean lattice. Instead, we obtain a so-called ‘orthomodular’ lattice, whose defining conditions are weaker than those of a Boolean lattice. Technically, an orthomodular lattice is a bounded lattice (with bounds 0 and 1), with join \(\vee \) and meet \(\wedge \), on which a unary operation \(x \mapsto x'\) (orthocomplementation) is defined such that, for all elements x and y of the lattice:

(29)

a. \(x'' = x\)

b. if \(x \le y\) then \(y' \le x'\)

c. \(x \wedge x' = 0\)

d. if \(x \le y\) then \(y=x \vee (x' \wedge y)\) (orthomodular law)

Two elements x and y are called orthogonal iff \(x\le y'\). An orthomodular lattice is a Boolean lattice if, in addition, it is distributive: \(x\vee (y \wedge z) = (x \vee y) \wedge (x \vee z)\). It can be shown (e.g. Piron 1972) that every orthomodular lattice is the union of its maximal Boolean sublattices (called blocks). Jenca (2001) provides a useful generalization of this standard finding. The following example, first introduced and discussed by Foulis and colleagues (Foulis and Randall 1972; Foulis 1999), gives a handy illustration of the basic ideas. It defines the firefly box and its event logic.

Assume that a firefly is moving erratically inside the box depicted in Fig. 3. The box has two translucent (but not transparent) windows, one at the front and one on the right side. All other sides of the box are opaque. In principle, the firefly can be situated in one of the four quadrants {1,2,3,4}, and it can be blinking or not (not blinking is indicated by world 5).

Fig. 3 The firefly box. From perspective \({\hat{\upalpha }}\), the left view (worlds 1 and 3) corresponds to the half space a and the right view (worlds 2 and 4) to the half space b. From perspective \(\hat{\upbeta }\), by contrast, the left view (worlds 1 and 2) corresponds to the half space c and the right view (worlds 3 and 4) to the half space d

For an external observer, the quadrant in which the firefly is situated cannot be determined, even if the firefly is blinking. To test whether the firefly is blinking and where it is, the external observer can take one of two perspectives:

\({\hat{\upalpha }} :\) looking through the front window. If no blinking can be seen, the outcome is n; if the blinking is in the left part, the outcome is a; if the blinking is in the right part, the outcome is b.

\({\hat{\upbeta }} :\) looking through the side window. Again, no blinking is registered as n; if the blinking is in the left part, the outcome is c; if the blinking is in the right part, the outcome is d.

These two perspectives can be described by the sets \(\upalpha = \{a, b, n\}\) and \(\upbeta = \{c, d, n\}\). As already noted, all subsets of \(\upalpha \) generate a (partial) Boolean algebra of events, and the same holds for all subsets of \(\upbeta \). The results of testing \(\{a, b\}\) and \(\{c, d\}\) will always be the same: they are epistemically equivalent (beim Graben and Atmanspacher 2006, 2009). We can express this fact by saying that both events realize the same experimental proposition: \(\uppi \{a, b\} = \uppi \{c, d\}\). Further, writing X' for the complement of the event X, we can postulate that \(\uppi \{n'\} = \uppi \{a, b\}\). Hence, for each perspective we have eight experimental propositions; for perspective \({\hat{\upalpha }}\) these are \(\uppi \{a\}, \uppi \{b\}, \uppi \{n\}, \uppi \{a, b\} = \uppi \{n'\}, \uppi \{a, n\} = \uppi \{b'\}, \uppi \{b, n\} = \uppi \{a'\}, \uppi \{a, b, n\} = \mathbf{1}\) and \(\uppi \{\} = \mathbf{0}\), and analogously for \({\hat{\upbeta }}\).

Two experimental propositions \(\uppi \{X\}\) and \(\uppi \{Y\}\) are called compatible iff the sub-lattice generated by \(\{\uppi \{X\}, \uppi \{Y\}, \uppi \{X'\}, \uppi \{Y'\}\}\) is distributive. For instance, \(\uppi \{a\}\) and \(\uppi \{b\}\) are compatible, but \(\uppi \{a\}\) and \(\uppi \{c\}\) are not. Experimental propositions that are not compatible are called complementary. Hence, \(\uppi \{a\}\) and \(\uppi \{c\}\) are complementary, and so are \(\uppi \{a'\}\) and \(\uppi \{c'\}\). In fact, two propositions from different blocks (i.e., generated by two different perspectives) are always complementary when neither contains the other informationally.

We obtain an explicit expression of the experimental (testable) propositions as sets of situations, together with a set-theoretic account of their identity and inclusion conditions, when we formulate two random variables \({\hat{\upalpha }} \left( s \right) \) and \({\hat{\upbeta }} \left( s \right) \) as defined in Table 1. With this instrument we can see, for instance, that \(\uppi \{a\} = \{1,3\}\), \(\uppi \{a, b\} = \{1,2,3,4\}\), and \(\uppi \{c, d\} = \{1,2,3,4\}\). Hence, we obtain \(\uppi \{a, b\} = \uppi \{c, d\}\), \(\uppi \{a\} \le \uppi \{a, b\}\), etc.
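The identities just listed can be checked mechanically. The sketch below assumes a reading of Table 1 that is consistent with the caption of Fig. 3 and with \(\uppi \{a\} = \{1,3\}\) (worlds 1 and 3 give a, 2 and 4 give b, 1 and 2 give c, 3 and 4 give d, 5 gives n); it also forms the union of the two Boolean blocks, i.e. the lattice of Fig. 5. The names `pi` and `block` are illustrative.

```python
from itertools import combinations

S = frozenset({1, 2, 3, 4, 5})
# Hypothetical reconstruction of Table 1 (front view alpha-hat, side view beta-hat).
alpha_hat = {1: "a", 2: "b", 3: "a", 4: "b", 5: "n"}
beta_hat = {1: "c", 2: "c", 3: "d", 4: "d", 5: "n"}


def pi(rv, X):
    """Experimental proposition pi{X}: the worlds in which rv takes a value in X."""
    return frozenset(s for s in S if rv[s] in X)


def block(rv):
    """Boolean block: the propositions pi{X} for all subsets X of the test range."""
    values = sorted(set(rv.values()))
    return {pi(rv, set(c)) for r in range(len(values) + 1)
            for c in combinations(values, r)}


block_alpha, block_beta = block(alpha_hat), block(beta_hat)
union_lattice = block_alpha | block_beta   # the lattice of Fig. 5
shared = block_alpha & block_beta          # 0, pi{n}, pi{a, b} = pi{c, d}, 1

print(len(union_lattice))  # 12 = 8 + 8 - 4 shared propositions
```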

Table 1 Two perspectives to model the firefly box by using random variables

Figure 4 shows the two arising Boolean lattices of testable propositions based on the two perspectives.

Fig. 4 Two Boolean lattices resulting from the event algebras of perspective \({\hat{\upalpha }} \) (left) and perspective \({\hat{\upbeta }} \) (right)

Now consider the union of the two Boolean lattices, which results in the lattice shown in Fig. 5.

Fig. 5 The union of the two Boolean lattices from Fig. 4

The resulting lattice is orthomodular but not Boolean.Footnote 25 The non-Boolean character results from the violation of distributivity. For instance, consider \(\uppi \{a\} \vee (\uppi \{a'\} \wedge \uppi \{d'\}) = \uppi \{a\} \vee \uppi \{n\} = \uppi \{b'\}\); here \(\uppi \{a'\} \wedge \uppi \{d'\} = \uppi \{n\}\), since \(\uppi \{n\} = \{5\}\) is the largest testable proposition contained in both \(\uppi \{a'\} = \{2,4,5\}\) and \(\uppi \{d'\} = \{1,2,5\}\). Distributivity would require this term to equal \((\uppi \{a\}\vee \uppi \{a'\}) \wedge (\uppi \{a\} \vee \uppi \{d'\})\). However, both parts of this conjunction equal 1 (the only testable proposition containing both \(\uppi \{a\} = \{1,3\}\) and \(\uppi \{d'\}\) is 1), and hence the whole expression is 1. Since \(\uppi \{b'\}\) differs from 1, distributivity is violated.
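The computation above can be reproduced with a short Python sketch. We write the twelve propositions of the lattice in Fig. 5 as sets of worlds (read off Fig. 3) and compute join and meet as least upper and greatest lower bounds within the lattice, which in general differ from set union and intersection. The helper names `join`, `meet` and `comp` are our own.

```python
S = frozenset({1, 2, 3, 4, 5})
# The twelve propositions of Fig. 5 as sets of worlds (read off Fig. 3).
L = [frozenset(p) for p in (
    (), (1, 2, 3, 4, 5), (5,), (1, 2, 3, 4),   # shared: 0, 1, n, a v b = c v d
    (1, 3), (2, 4), (2, 4, 5), (1, 3, 5),      # block alpha: a, b, a', b'
    (1, 2), (3, 4), (3, 4, 5), (1, 2, 5),      # block beta:  c, d, c', d'
)]


def join(x, y):
    """Least element of L above x and y (not simply the set union)."""
    upper = [e for e in L if (x | y) <= e]
    least = min(upper, key=len)
    assert all(least <= e for e in upper)  # the least upper bound is unique
    return least


def meet(x, y):
    """Greatest element of L below x and y (not simply the set intersection)."""
    lower = [e for e in L if e <= (x & y)]
    greatest = max(lower, key=len)
    assert all(e <= greatest for e in lower)
    return greatest


def comp(x):
    """Orthocomplement: complement relative to S."""
    return S - x


a = frozenset({1, 3})
a_c, d_c = comp(a), comp(frozenset({3, 4}))
lhs = join(a, meet(a_c, d_c))               # a v (a' ^ d') = a v n = b'
rhs = meet(join(a, a_c), join(a, d_c))      # (a v a') ^ (a v d') = 1 ^ 1 = 1
print(sorted(lhs), sorted(rhs))             # [1, 3, 5] [1, 2, 3, 4, 5]

# Orthomodular law (29d): if x <= y then y = x v (x' ^ y), for all pairs in L.
orthomodular = all(y == join(x, meet(comp(x), y)) for x in L for y in L if x <= y)
```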

Atoms are events \(A \ne 0\) such that there is no event \(x \le A\) other than \(x = 0\) and \(x=A\). A lattice is called atomistic iff every element \(x \ne 0\) is the join of the atoms below it (in particular, there then exists some atom \(A \le x\)). The atomic covering law states that for any atom A with \(A \not\le x\), the event \(A\vee x\) covers x (i.e. no element of the lattice lies strictly between x and \(A\vee x\)). Wilce (2012) discusses several arguments that could help to motivate atomic covering. The lattice shown in Fig. 5 is clearly atomistic and satisfies the atomic covering law. There is, however, a further condition: irreducibility. An orthomodular lattice is called irreducible iff it cannot be expressed as a non-trivial direct product of simpler orthomodular lattices. Again, our firefly example lattice satisfies this condition.
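Atomicity and the covering law can be verified by brute force on the twelve-element lattice of Fig. 5, again written as sets of worlds read off Fig. 3. This is a minimal sketch under that encoding; irreducibility is not checked here.

```python
S = frozenset({1, 2, 3, 4, 5})
# The twelve propositions of Fig. 5 as sets of worlds (read off Fig. 3).
L = [frozenset(p) for p in (
    (), (1, 2, 3, 4, 5), (5,), (1, 2, 3, 4),
    (1, 3), (2, 4), (2, 4, 5), (1, 3, 5),
    (1, 2), (3, 4), (3, 4, 5), (1, 2, 5),
)]
ZERO = frozenset()


def join(x, y):
    """Least element of L above x and y."""
    return min((e for e in L if (x | y) <= e), key=len)


def covers(y, x):
    """y covers x: x < y and no element of L lies strictly between them."""
    return x < y and not any(x < e < y for e in L)


atoms = [A for A in L if covers(A, ZERO)]


def is_atomistic():
    """Every non-zero element is the join of the atoms below it."""
    for x in L:
        if x == ZERO:
            continue
        j = ZERO
        for A in atoms:
            if A <= x:
                j = join(j, A)
        if j != x:
            return False
    return True


# Atomic covering law: A v x covers x for every atom A not below x.
covering_law = all(covers(join(A, x), x) for A in atoms for x in L if not A <= x)
print(sorted(sorted(A) for A in atoms))  # [[1, 2], [1, 3], [2, 4], [3, 4], [5]]
```

The five atoms are the propositions \(\uppi \{a\}, \uppi \{b\}, \uppi \{c\}, \uppi \{d\}\) and \(\uppi \{n\}\).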

Atomistic, irreducible orthomodular lattices obeying the atomic covering law are called Piron lattices (after Piron 1972, who was the first to investigate such lattices). What we get by considering several random variables, defining the partial Boolean lattice of experimental propositions for each of these random variables, and taking the union of all these partial Boolean lattices, is indeed a Piron lattice. How do Piron lattices relate to Hilbert spaces? Interestingly, the lattice formed by all closed subspaces of a given Hilbert space is a Piron lattice. The atoms of this Piron lattice are, of course, the one-dimensional subspaces.

The crucial question arising now is: how close is the connection between Piron lattices and projection lattices on Hilbert spaces? All projection lattices are Piron lattices. Unfortunately, the converse is not true: not every Piron lattice can be represented by a corresponding projection lattice. Additional conditions are needed to prove the corresponding representation theorem (Solér 1995; Holland 1995). These conditions are rather technical and concern the infinite-dimensional case. If they are satisfied, we call the lattice a Piron-Solér lattice. In the case of the firefly box, the corresponding representation yields the three-dimensional vector space shown in Fig. 6, where the one-dimensional subspaces representing the atoms of the lattice are labelled by the symbols used in Fig. 3.

We see that in a Hilbert space representation, compatible experimental propositions are represented by commuting projection operators, while complementary experimental propositions are represented by operators that do not commute. Moreover, the complementation operator \(X'\) is represented by Hilbert space orthocomplementation. Note further that the elements of the sample space S have the ontological status of abstract objects. They help to express the identity conditions for experimental propositions. However, they are not necessarily represented as subspaces of the Hilbert space. For instance, worlds {1} and {2} do not correspond to any subspace; only their union {1, 2} does (corresponding to c). World {5}, however, explicitly refers to the subspace denoted by n. If the overall Piron lattice is Boolean, the sample space S corresponds to an orthonormal basis of the Hilbert space. Hence, the classical case of a standard possible world semantics is a special case of the Hilbert space semantics.
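A minimal numerical sketch of such a representation in \(\mathbb{R}^3\) follows. The coordinates are ordered as [b, a, n], and placing c and d in the plane spanned by a and b, rotated by an angle \(\varphi \) as in Fig. 6, is our assumed parametrisation. The code is plain Python (no linear algebra library) and checks that projectors from one block commute, while the projectors onto a and c commute only when \(\varphi \) is a multiple of \(90^{\circ }\).

```python
import math


def outer(u):
    """Projector |u><u| onto the unit vector u, as a 3x3 nested list."""
    return [[ui * uj for uj in u] for ui in u]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def madd(*Ms):
    return [[sum(M[i][j] for M in Ms) for j in range(3)] for i in range(3)]


def commutator_norm(A, B):
    """Frobenius norm of AB - BA (zero iff the projectors commute)."""
    AB, BA = matmul(A, B), matmul(B, A)
    return math.sqrt(sum((AB[i][j] - BA[i][j]) ** 2
                         for i in range(3) for j in range(3)))


def firefly_atoms(phi):
    """Unit vectors for the atoms; coordinates ordered as [b, a, n].
    Block alpha = {b, a, n} is the standard basis; block beta = {c, d, n}
    is rotated by phi about the n-axis (an assumed parametrisation)."""
    return {
        "b": (1.0, 0.0, 0.0), "a": (0.0, 1.0, 0.0), "n": (0.0, 0.0, 1.0),
        "c": (-math.sin(phi), math.cos(phi), 0.0),
        "d": (math.cos(phi), math.sin(phi), 0.0),
    }


# The commutator of P_a and P_c has norm |sin 2 phi| / sqrt(2):
# it vanishes at 0 and 90 degrees and is maximal at 45 degrees.
for phi in (0.0, math.pi / 4, math.pi / 2):
    P = {k: outer(v) for k, v in firefly_atoms(phi).items()}
    print(round(math.degrees(phi)), round(commutator_norm(P["a"], P["c"]), 6))
```

Each block resolves the identity (\(P_a+P_b+P_n = P_c+P_d+P_n = I\)), which mirrors \(\uppi \{a, b, n\} = \uppi \{c, d, n\} = \mathbf{1}\).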

In Sect. 4.1 we introduced the notion of an additive measure function based on the lattice of projection operators of a given Hilbert space, and we showed that each unit state u determines such a function of quantum probabilities on the testable propositions \({\varvec{A}}_{\mathrm{i}}\) of the Hilbert space: \(\hbox {P}_{u}({\varvec{A}}_{\mathrm{i}})=\Vert {\varvec{A}}_{\mathrm{i}}(u)\Vert ^{2}\). More generally, we saw that each mixture (or convex combination) of these probabilities is also an additive measure function. The important question now is whether each additive measure function can be represented as a mixture of quantum probabilities for pure states. The positive answer is the content of Gleason’s theorem (Gleason 1957). Under the assumption that the dimension of the Hilbert space is larger than two, it states that each countably additive measure function can be expressed as a mixture (convex combination) of quantum probabilities for pure states (the latter following the Born rule, i.e. calculating the squared length of the projection of a given state).Footnote 26 For details, the reader is referred to the original paper by Gleason; for a constructive proof, see Richman and Bridges (1999).
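For the one-dimensional projectors of the firefly representation, the Born rule reduces to a squared inner product. The sketch below uses an assumed parametrisation of Fig. 6 (coordinates [b, a, n], with c and d rotated by \(\varphi \) in the plane of a and b) and checks that a mixture of two pure states is again additive on both blocks and assigns the same probability to \(\uppi \{a, b\}\) and \(\uppi \{c, d\}\). The states, weights and angle are arbitrary illustrative choices.

```python
import math


def born(u, v):
    """Born rule for the projector onto the unit vector v: ||P_v u||^2 = (v . u)^2."""
    return sum(ui * vi for ui, vi in zip(u, v)) ** 2


def mixture(weights, states, v):
    """Convex combination sum_i w_i P_{u_i}(v) of pure-state probabilities."""
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1) < 1e-12
    return sum(w * born(u, v) for w, u in zip(weights, states))


phi = math.pi / 6                                    # illustrative angle between a and c
atoms = {
    "b": (1.0, 0.0, 0.0), "a": (0.0, 1.0, 0.0), "n": (0.0, 0.0, 1.0),
    "c": (-math.sin(phi), math.cos(phi), 0.0),
    "d": (math.cos(phi), math.sin(phi), 0.0),
}
states = [(0.6, 0.0, 0.8), (1 / math.sqrt(3),) * 3]  # two unit vectors
weights = [0.3, 0.7]

omega = {k: mixture(weights, states, v) for k, v in atoms.items()}
print({k: round(p, 4) for k, p in omega.items()})
```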

Summarizing, we have considered measurements as providing partial Boolean lattices (also called blocks) of experimental propositions. If two experimental propositions from different blocks are not identical but overlap, they are called complementary. Building the union of several blocks results in a structure called a Piron-Solér lattice. The fundamental representation theorem states that Piron-Solér lattices and lattices of projectors of a Hilbert space are equivalent. The next step is to define additive measure functions on the space of experimental propositions, defined either as proposed in test theory or as subspaces of a Hilbert space, i.e. projection operators. Gleason’s theorem tells us that such probabilities arise either from the Born rule or from convex combinations of Born-rule probabilities. In this way, quantum probabilities are based on the lattice of experimental propositions. In physics, this lattice is constituted by the algebra of complementary observables. In the next subsections, we consider how the related lattice can be motivated in cognitive science from a dynamical perspective, which leads us back to the nature of bounded rationality.

6.2 Dynamical systems and symbolic dynamics

So far, the firefly box discussed in the previous subsection only provides a static picture of experimentally testable propositions. Observing the firefly’s motion inside the box over an extended period of time would certainly deliver additional information about its actual position. The question is then whether this additional information could be used to restore a classical Kolmogorovian description in the limit of infinite observation time. To address this question, we have to discuss some basic issues of dynamical systems theory (see e.g. Ott 1993).

The firefly box can be regarded as a deterministic dynamical system with the firefly’s actual position inside the box, given by three Cartesian coordinates \({\varvec{x}}=\left[ {x,y,z} \right] \), as its (ontic) state \({\varvec{x}}\left( t \right) \) at time t. The set of all possible states (i.e. of all possible spatial locations) is called the phase space X.Footnote 27 Starting at some time \(t_{0}\), the state \({\varvec{x}}\left( {t_0 } \right) \) is called the initial condition. The motion of the firefly is given by a one-parameter family of maps \(\Phi ^{t}:X\rightarrow X\), the phase flow, parametrized by the elapsed time \(t\ge 0\), such that \({\varvec{x}}\left( {t_0 +t} \right) =\Phi ^{t}\left( {{\varvec{x}}\left( {t_0 } \right) } \right) \). Following the states \({\varvec{x}}\left( t \right) \) for all times in the real interval \(\left[ {t_0, \infty } \right) \) yields the firefly’s trajectory.

The unobserved firefly has a classical trajectory exploring the phase space X. As the firefly can only be observed when blinking, we get a temporal discretization of its continuous trajectory. In dynamical systems theory, such a discretization can be identified with stroboscopic or so-called Poincaré mappings. Finally, we only allow observational access either through the front window (perspective \({\hat{\upalpha }}\)) or through the side window (perspective \({\hat{\upbeta }}\)), but not through both windows simultaneously. Again, these perspectives provide two complementary partitions of the phase space X into pairwise disjoint sets.

To illustrate the following argument, let us drastically simplify the firefly box: assume that the firefly’s motion is confined to a two-dimensional phase space, the square \([-1,1] \times [-1,1]\). Assume further that the firefly moves with constant speed along a closed circle of radius \(R=1\), such that it returns to its initial state \({\varvec{x}}\left( {t_0 } \right) \) after every period T. Moreover, we suppose that the firefly also blinks periodically, with blinking period \(T_B =qT\) for some \(q>0\). For this two-dimensional firefly box, the front window coincides with the x-axis, while the side window coincides with the y-axis. These views lead to two complementary partitions of the phase space X into pairwise disjoint sets, \(a = [-1,0] \times [-1,1],\, b = [0,1] \times [-1,1]\) for perspective \({\hat{\upalpha }}\), and \(c = [-1,1] \times [-1,0],\, d = [-1,1] \times [0,1]\) for perspective \({\hat{\upbeta }} \). Figure 7 displays the situation.

So far, we have explained the idea of a (deterministic) dynamical system. A standard example is classical mechanics, describing the evolution of states in phase space (pairs of position and momentum for one-particle systems). In neural network research, the fast dynamics of single neurons or neural networks can be seen as another dynamical system. It describes the evolution of neuronal activation for a single neuron or the spreading of activation in neural networks.

Let us turn now to symbolic dynamics, which is a useful method for studying discrete-time dynamical systems with continuous state space. The crucial idea is to partition the state space into a finite number of subsets and to label each of these subsets with some symbol. In the case of the firefly, the discretization of time through stroboscopic blinking and the discretization of phase space through partitioning yield a symbolic representation of the system’s trajectory as the sequence of partition cells being visited. If there is more than one observational perspective, more than one partition (and labelling function) will be involved.

In Fig. 7 the initial condition belongs to rectangle b with respect to partition (perspective) \(\upalpha \) and to rectangle d with respect to partition \(\upbeta \). Considering partition \(\upalpha \) first, we assign the symbol “b” to the initial condition \({\varvec{x}}\left( {t_0 } \right) \). When the system evolves as indicated by the arrow in Fig. 7, its state belongs to rectangle a after half a period T/2, when the firefly blinks again. Thus, we can assign the sequence “ba” to the first two iterations. If the firefly keeps blinking every half period, the complete trajectory is mapped onto an infinite string “babababa...”, which is called a symbolic dynamics (Hao 1989; Lind and Marcus 1995). With respect to partition \(\upbeta \), on the other hand, we assign the symbol “d” to the initial condition \({\varvec{x}}\left( {t_0 } \right) \). The state at time \(t_{0}+T/2\) is contained in rectangle c, such that the corresponding string is “dc” for the first two iterations, eventually leading to the symbolic dynamics “dcdcdcdc...” for the given stroboscopic discretization.
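The assignment of symbol strings can be sketched as follows. The code assumes that the firefly moves counterclockwise on the unit circle (consistent with the pre-images stated below for \(T_B = T/4\)) and expresses the blinking period as a fraction q of T; `itinerary` is a hypothetical helper name.

```python
import math


def itinerary(theta0, q, n):
    """Symbol strings seen through both windows during the first n blinks.

    theta0: initial angle in radians on the unit circle;
    q: blinking period T_B as a fraction of the orbital period T;
    the motion is assumed to be counterclockwise."""
    front, side = [], []
    for k in range(n):
        theta = theta0 + 2 * math.pi * q * k       # position at the k-th blink
        x, y = math.cos(theta), math.sin(theta)
        front.append("a" if x < 0 else "b")        # partition alpha: left/right
        side.append("c" if y < 0 else "d")         # partition beta: lower/upper
    return "".join(front), "".join(side)


# Initial condition in b and d (first quadrant), blinking every half period.
print(itinerary(math.pi / 4, 0.5, 8))   # ('babababa', 'dcdcdcdc')
```

With quarter-period blinking (q = 0.25) the same initial condition produces the strings “baabbaab” and “ddccddcc”, i.e. all four quadrants are visited.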

Symbolic dynamics provides very powerful tools for discussing complex nonlinear dynamical systems when “unessential details” are captured by the coarse-graining of the phase space into partition cells. It allows investigating symbolic sequences by means of formal language and automata theory (e.g., Crutchfield 1994), information theory and Markov chains (Crutchfield and Feldman 2003; Crutchfield and Packard 1983) and algebraic quantum theory as well (Matsumoto 1997; Exel 2004).

The concatenation operation of symbolic dynamics has a straightforward interpretation in terms of phase space propositions. Generally, the set of phase space points generating a given symbolic sequence when taken as initial conditions can be interpreted as the proposition represented by the sequence. In our first example, the sequence “ba” corresponds to the proposition \(\Pi \) (“ba”) \(= \{x: x\in b\) and \(\Phi ^{\frac{T}{2}}\left( x \right) \in a\}\). Since \(\Phi ^{\frac{T}{2}}\left( x \right) \in a\) iff \(x\in \Phi ^{-\frac{T}{2}}\left( a \right) \), we get the following proposition representing the sequence “ba”: \(\Pi \) (“ba”) \(= b\cap \Phi ^{-\frac{T}{2}}\left( a \right) \). Here, \(\Phi ^{-\frac{T}{2}}\left( a \right) =\left\{ {x: \Phi ^{\frac{T}{2}}\left( x \right) \in a} \right\} \) denotes the pre-image of the cell a.

In general, a symbolic sequence \(s_1 s_2 \ldots s_n \) refers to the following proposition:

$$\Pi \left( {s_1 s_2 \ldots s_n } \right) =\bigcap _{k=0}^{n-1} \Phi ^{-kT_B }\left( {s_{k+1} } \right) \tag{30}$$

Therefore, any string \(s_1 s_2 \ldots s_n \) of finite length n corresponds to an intersection of pre-images of partition cells under the phase flow. Because the intersection of two sets is in general smaller than each of the original sets (unless one of them is a subset of the other), longer strings generally correspond to smaller sets of initial conditions. When this happens, one speaks of the dynamic refinement of a partition.

Next, we have to distinguish two important cases. In the first case, the sets of possible initial conditions in phase space become smaller and smaller with increasing string length, eventually shrinking to a singleton set that contains exactly one initial condition for an infinitely long symbolic sequence. In this case, which is common for chaotic dynamics where different trajectories diverge exponentially, one speaks of a generating partition, since the original partition generates the complete phase space in the limit of everlasting observation time. In other words, in the limit of sufficiently long symbolic sequences, we can identify each point of the phase space by such a sequence, and no information is lost by the symbolic labelling method. For our two-dimensional firefly box, we can easily construct a generating partition when the ratio of the blinking period to the orbital period is irrational, say \(T_B =\sqrt{2}T\). In this case, the iterates \(\Phi ^{kT_B }\left( {{\varvec{x}}\left( {t_0 } \right) } \right) \) for integer k form a dense subset of the circular trajectory. Hence the sets of compatible initial conditions eventually shrink to a single initial condition \({\varvec{x}}\left( {t_0 } \right) \). Thus, the two generating partitions \(\upalpha \) and \(\upbeta \) both have all the singletons of the complete phase space X as their finest refinements, such that their union is a Boolean lattice allowing for a Kolmogorovian probability theory.
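The difference between rational and irrational blinking periods can be made visible by estimating the size of \(\Pi (s_1 \ldots s_n)\) numerically. The sketch below is restricted to the circular trajectory and to partition \(\upalpha \): it samples initial positions on a uniform grid of angles and measures the fraction whose first n symbols agree with those of a reference point. Grid size and reference point are arbitrary choices, and the sampled fraction is only an estimate of the arc length.

```python
import math


def symbol(u):
    """Front-window cell (partition alpha) of the point at angle 2*pi*u."""
    return "a" if math.cos(2 * math.pi * u) < 0 else "b"


def itinerary(u, q, n):
    """First n alpha-symbols for initial angle 2*pi*u and blinking period q*T."""
    return "".join(symbol(u + k * q) for k in range(n))


def cylinder_fraction(u0, q, n, grid=4000):
    """Fraction of grid points sharing the first n symbols of u0: a sampled
    estimate of the arc length of Pi(s_1 ... s_n), cf. Eq. (30)."""
    target = itinerary(u0, q, n)
    return sum(itinerary(i / grid, q, n) == target for i in range(grid)) / grid


for q in (0.5, math.sqrt(2)):   # T_B = T/2 versus T_B = sqrt(2) T
    print(q, [round(cylinder_fraction(0.1, q, n), 3) for n in (1, 5, 10, 20)])
```

For \(T_B = T/2\) the fraction stays at one half however long the string, so observation adds no information; for \(T_B = \sqrt{2}T\) it decreases steadily, which is the numerical trace of a generating partition.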

In the second case, even the finest dynamic refinement of an initial partition exhibits residual coarse grains that cannot be further refined. These residuals correspond to the only propositions that are epistemically accessible with respect to a given perspective. Therefore, we obtain a Boolean partition algebra from the finest grains of an original partition. Two different initial partitions \(\upalpha \) and \(\upbeta \) could yield two different Boolean partition algebras that are only partially overlapping. Then, our original firefly argumentation applies and the union of these algebraic structures becomes an orthomodular lattice, leading to a canonical Hilbert space representation as required in quantum cognition (cf. beim Graben and Atmanspacher 2006, 2009).

Whether two partitions \(\upalpha ,\, \upbeta \) are compatible or complementary to each other depends crucially on the stroboscopic sampling. For half-period sampling \(T_{\mathrm{B}}=T/2\) we get the following pre-images: \(\Phi ^{-T/2}\left( a \right) =b\) and \(\Phi ^{-T/2}\left( b \right) =a\) on the one hand, and \(\Phi ^{-T/2}\left( c \right) =d\) and \(\Phi ^{-T/2}\left( d \right) =c\) on the other hand. As a consequence, the finest (symbolic) refinement of \(\upalpha \) is \(\upalpha \) itself, and the finest (symbolic) refinement of \(\upbeta \) is \(\upbeta \) itself. Therefore, both partitions are dynamically complementary, as no additional information is gained from the dynamic refinement.

By contrast, when we take \(T_{\mathrm{B}}=T/4\), the pre-images are \(\Phi ^{-T/4}\left( a \right) =d\) and \(\Phi ^{-T/4}\left( b \right) =c\) (cf. Fig. 7). In this case, the sequence “ba”, say, is represented by the proposition \(\Pi \) (“ba”) \(= b\cap \Phi ^{-\frac{T}{4}}\left( a \right) = b\cap d\), which corresponds to the first quadrant of Fig. 7. Hence, the finest refinement of both \(\upalpha \) and \(\upbeta \) is the partition of the coordinate plane X into the four quadrants. For this discretization, the partitions \(\upalpha \) and \(\upbeta \) become compatible rather than complementary. One can express this situation by stipulating that the angle \(\upvarphi \) from Fig. 6 is an integer multiple of \(90^{\circ }\).Footnote 28
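Both cases can be checked with a sampled version of Eq. (30). Assuming counterclockwise motion on the unit circle, sets of initial conditions are represented by grid indices of angles offset by half a degree, so that no sample lies on an axis; `preimage` and `Pi` are hypothetical helper names.

```python
import math

GRID = 360  # sample angles (i + 0.5) degrees, i = 0, ..., 359


def cells(theta):
    """Names of the alpha- and beta-cells containing the point at angle theta."""
    x, y = math.cos(theta), math.sin(theta)
    return {"a" if x < 0 else "b", "c" if y < 0 else "d"}


def angle(i):
    return math.radians(i + 0.5)


def preimage(name, tau):
    """Sampled Phi^{-tau T}(name): initial angles lying in cell `name` after tau*T."""
    return {i for i in range(GRID) if name in cells(angle(i) + 2 * math.pi * tau)}


def Pi(word, q):
    """Sampled Eq. (30): intersection of Phi^{-k q T}(s_{k+1}) over the word."""
    result = set(range(GRID))
    for k, s in enumerate(word):
        result &= preimage(s, k * q)
    return result


cell_sets = {s: preimage(s, 0.0) for s in "abcd"}
print(len(Pi("ba", 0.5)), len(Pi("ba", 0.25)))  # 180 90
```

For \(T_B = T/2\) the string “ba” still covers the whole half-plane b, whereas for \(T_B = T/4\) it is cut down to the quadrant \(b\cap d\).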

Fig. 6 Hilbert space representation of the firefly box representing the two Boolean blocks \(\upalpha = \{a, b, n\}\) and \(\upbeta = \{c, d, n\}\). The atoms a, b, n (and c, d, n, respectively) are represented by pairwise orthogonal axes. The angle \(\upvarphi \) between the complementary elements a and c (and b and d, respectively) cannot be derived from the orthomodular lattice by the representation theorem and depends explicitly on the chosen probability model

Fig. 7 Phase space of the two-dimensional firefly box with circular trajectory. Partitions are \(\upalpha = \{a, b\}\) (left) and \(\upbeta = \{c, d\}\) (right). Time discretization (“blinking”) with half period T/2

The firefly box demonstrates that an epistemic quantization of an ontologically classical system is almost inevitable for coarse-grained descriptions. Such coarse-grainings can be due to the limited precision and resolution of measurement devices. Sensory and perceptual apparatuses exhibit finite registration and relaxation times as well as finite resolution of their sampling ranges.

In Sect. 6.1 we found a Hilbert space representation of the firefly box exhibiting two Boolean blocks \(\upalpha = \{a, b, n\}\) and \(\upbeta = \{c, d, n\}\). In constructing this representation, a new parameter, the angle \(\upvarphi \) between the complementary elements a and c (and b and d, respectively), has been introduced. The question we investigate now is whether symbolic dynamics can provide an interpretation of this parameter. As a first step in answering this question, we have to equip the firefly box with a probability model. Following Foulis (1999) again, this is achieved by a real-valued function \(\omega \) that assigns numbers from the interval \(\left[ {0,1} \right] \) to the propositions of the corresponding test range. Given an epistemically accessible statistical state, i.e. a probability distribution over the firefly box, the atomic propositions a, b, c, d receive as probabilities the measures of their respective partition cells with respect to the original statistical state. Thus, \(\omega \left( S \right) \) with \(S = a, b, c, d\) provides a highly condensed static representation of the picture studied by beim Graben and Atmanspacher (2006, 2009). For perspective \({\hat{\upalpha }} \), the condition \(\omega \left( a \right) +\omega \left( b \right) +\omega \left( n \right) =1\) has to be satisfied on the one hand, and for perspective \(\hat{\upbeta }\) the condition \(\omega \left( c \right) +\omega \left( d \right) +\omega \left( n \right) =1\) on the other hand. By additive extension, this function becomes an additive probability measure over the proposition lattice depicted in Fig. 5. The resulting space of probability models is convex, i.e., closed under convex combinations.

Therefore, the probability measure \(\omega \) is uniquely determined by three real numbers

$$\begin{aligned} x=\omega \left( a \right) ;\quad y=\omega \left( c \right) ;\quad z=\omega \left( n \right) \end{aligned}$$

as any other value can be obtained from the normalization conditions, e.g. \(\omega \left( b \right) =1-x-z\) and \(\omega \left( d \right) =1-y-z\), subject to the constraints \(0\le x,y,z\le 1,\, x+z\le 1\), and \(y+z\le 1\).
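The parametrisation by three numbers can be sketched directly: the function below extends (x, y, z) to all atoms via the two normalization conditions and rejects triples that violate the constraints. A convex combination of two admissible models is again admissible. The function names and the sample numbers are illustrative only.

```python
def omega(x, y, z):
    """Extend x = w(a), y = w(c), z = w(n) to all atoms of the firefly lattice
    via the normalization conditions of both perspectives."""
    if not (0 <= min(x, y, z) and max(x, y, z) <= 1 and x + z <= 1 and y + z <= 1):
        raise ValueError("(x, y, z) violates the constraints")
    return {"a": x, "b": 1 - x - z, "c": y, "d": 1 - y - z, "n": z}


def mix(lam, w1, w2):
    """Convex combination lam * w1 + (1 - lam) * w2 of two probability models."""
    return {k: lam * w1[k] + (1 - lam) * w2[k] for k in w1}


w = mix(0.25, omega(0.3, 0.5, 0.2), omega(0.1, 0.1, 0.6))
print({k: round(p, 3) for k, p in w.items()})
```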

On the other hand, in our Hilbert space representation (see Fig. 6), the measure \(\omega \) is given by a unit vector \(w=\left[ {b,a,n} \right] \) due to Gleason’s theorem. Both descriptions are related through the probability amplitudes, \(x=a^{2}, y=c^{2}, z=n^{2}\), and the normalization of w is then automatically fulfilled via \(b^{2}=1-x-z\). However, a complete characterization of the vector w also requires the rotation angle \(\varphi \) in Fig. 6. It obeys the following constraint:

$$\sqrt{x}\cos \varphi -\sqrt{1-x-z}\sin \varphi =\sqrt{y} \tag{31}$$

Using the addition theorem for the cosine, the solution of this equation is given as follows:

$$\varphi = \arccos \sqrt{\frac{y}{1-z}}-\arcsin \sqrt{\frac{1-x-z}{1-z}} \tag{32}$$
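For readers who want to retrace the step from Eq. (31) to Eq. (32), here is a short derivation, assuming \(z<1\) and that \(\vartheta +\varphi \) lies in \([0,\pi ]\), the range of the principal branch of arccos. Since \(x+(1-x-z)=1-z\), the two amplitudes on the left-hand side of Eq. (31) can be written in polar form with an auxiliary angle \(\vartheta \in [0,\pi /2]\):

$$\begin{aligned} \sqrt{x}&=\sqrt{1-z}\,\cos \vartheta ,\qquad \sqrt{1-x-z}=\sqrt{1-z}\,\sin \vartheta ,\qquad \vartheta =\arcsin \sqrt{\frac{1-x-z}{1-z}},\\ \sqrt{y}&=\sqrt{x}\cos \varphi -\sqrt{1-x-z}\sin \varphi =\sqrt{1-z}\,\left( \cos \vartheta \cos \varphi -\sin \vartheta \sin \varphi \right) =\sqrt{1-z}\,\cos \left( \vartheta +\varphi \right) ,\\ \varphi &=\arccos \sqrt{\frac{y}{1-z}}-\vartheta =\arccos \sqrt{\frac{y}{1-z}}-\arcsin \sqrt{\frac{1-x-z}{1-z}}. \end{aligned}$$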

The two-dimensional firefly moves continuously along the circle depicted in Fig. 7 where it is not blinking most of the time. If we assume that blinking requires some amount of time, we can interpret x as the proportion of blinking time in cell a and y as the proportion of blinking time in cell c. For the stroboscopic sampling with \(T_B =T/2\) the firefly only blinks in cell c when it is in cell b as well. Therefore \(\omega \left( b \right) =1-x-z=y=\omega \left( c \right) \) and hence \(1-z=x+y\). Inserting these numbers into Eq. (32) we obtain

$$\varphi =\arccos \sqrt{\frac{y}{x+y}}-\arcsin \sqrt{\frac{y}{x+y}}=\frac{\pi }{2}-2\arcsin \sqrt{\frac{y}{x+y}} \tag{33}$$

In the special case of uniformly distributed blinking durations, \(x=y\), we get \(\varphi =\frac{\pi }{2}-2\arcsin \sqrt{0.5}=\frac{\pi }{2}-2\cdot \frac{\pi }{4}=0\).

As a result, in the static firefly picture, where dynamic refinement is not taken into account, the perspectives \({\hat{\upalpha }} \) and \({\hat{\upbeta }} \) are compatible for uniform blinking time distributions. The general case, however, could be realized when the firefly is attracted to and repelled from cells a and b with different probabilities. In the dynamical systems framework, this could be described by the presence of saddle nodes connected via heteroclinic sequences, an approach that has recently become increasingly popular in neural modelling (Rabinovich et al. 2008). Tuning the system’s parameters, e.g. such that \(y=x\left( {3-2\sqrt{2}} \right) \), yields \(\varphi =\frac{\pi }{4}\), i.e. maximal incompatibility for the static firefly picture.
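The two special values of \(\varphi \) can be checked numerically. The sketch below implements Eq. (32), verifies that the resulting angle satisfies Eq. (31), and evaluates Eq. (33) for equal blinking times and for the tuned ratio \(y=x(3-2\sqrt{2})\); the value x = 0.4 is an arbitrary choice.

```python
import math


def phi(x, y, z):
    """Eq. (32): angle between a and c for x = w(a), y = w(c), z = w(n), z < 1."""
    return (math.acos(math.sqrt(y / (1 - z)))
            - math.asin(math.sqrt((1 - x - z) / (1 - z))))


def residual_31(x, y, z):
    """Left-hand side minus right-hand side of Eq. (31), evaluated at phi(x, y, z)."""
    p = phi(x, y, z)
    return math.sqrt(x) * math.cos(p) - math.sqrt(1 - x - z) * math.sin(p) - math.sqrt(y)


def phi_half_period(x, y):
    """Eq. (33): Eq. (32) with 1 - z = x + y and 1 - x - z = y (T_B = T/2)."""
    return math.pi / 2 - 2 * math.asin(math.sqrt(y / (x + y)))


x = 0.4
print(math.degrees(phi_half_period(x, x)))                           # approximately 0
print(math.degrees(phi_half_period(x, x * (3 - 2 * math.sqrt(2)))))  # approximately 45
```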

To summarize, three aspects make symbolic dynamics a promising unifying framework for quantum cognition and neurodynamic systems theory. First, the coarse-graining underlying symbolic dynamics normally leads to a loss of information. This is advantageous when the resulting abstraction captures essential traits of the original dynamics. Many theoretical instruments are available for analysing symbol sequences, for example formal language theory, which makes symbolic dynamics especially powerful.

Second, for many important systems generating partitions exist. In such a case, each point of the phase space can be uniquely identified by an infinite sequence of symbols. Studying the symbolic dynamics is therefore completely equivalent to studying the original dynamics. In other words, symbol-manipulating computations can be completely equivalent to continuous dynamics in such systems. In other cases, where no generating partition exists, we are confronted with complementary descriptions that still allow for a satisfactory symbolic description of the underlying continuous dynamics.

Third, symbolic dynamics can help to find interpretations of parameters that would otherwise lack any plausible interpretation. We have illustrated this at length with the hidden parameter \(\varphi \) (introduced by Gleason’s theorem) and its probabilistic interpretation above. Other cases concern the phase shift parameters we used in Sect. 4 for resolving some puzzles of bounded rationality. A careful interpretation of these parameters in terms of an underlying symbolic dynamics may be the missing link for a real understanding of quantum probabilities. It is fair to say that our present understanding of the phase shift parameters is rather rudimentary.

6.3 Complementarity in the cognitive domain

In Sect. 6.1, we considered the general structure of propositions and argued that this structure conforms to orthomodular lattices. These lattices can be obtained as the union of (partial) Boolean lattices, or Boolean blocks, corresponding to the different perspectives a cognitive agent can assume. Following James (1890), we regard these perspectives as corresponding to the different knowledge domains the agent is able to access. In this connection we introduced the term “(auto)epistemic accessibility”. It is of primary interest to ask for the rationale behind the existence of different Boolean blocks. One possible hypothesis is that these blocks result from an underlying symbolic dynamics. In the present section, we examine this suggestion and outline how the idea could acquire empirical relevance in several domains of cognitive science.
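A minimal sketch of how Boolean blocks combine into a non-distributive orthomodular lattice, using the textbook "Chinese lantern" lattice MO2 as an assumed stand-in (it is not claimed to be the lattice of Sect. 6.1). Two four-element Boolean blocks \(\{0, a, a', 1\}\) and \(\{0, b, b', 1\}\) share only their bottom and top. Each block on its own is distributive, but their union violates distributivity while remaining orthomodular.

```python
from itertools import product

# MO2: two Boolean blocks {0, a, a', 1} and {0, b, b', 1} glued at 0 and 1.
ELEMENTS = ["0", "a", "a'", "b", "b'", "1"]
COMPLEMENT = {"0": "1", "1": "0", "a": "a'", "a'": "a", "b": "b'", "b'": "b"}
BLOCK_A = ["0", "a", "a'", "1"]

def leq(x, y):
    """Order: 0 below everything, 1 above everything, atoms pairwise incomparable."""
    return x == y or x == "0" or y == "1"

def join(x, y):
    """Least upper bound, searched among all common upper bounds."""
    ups = [z for z in ELEMENTS if leq(x, z) and leq(y, z)]
    return next(z for z in ups if all(leq(z, w) for w in ups))

def meet(x, y):
    """Greatest lower bound, searched among all common lower bounds."""
    lows = [z for z in ELEMENTS if leq(z, x) and leq(z, y)]
    return next(z for z in lows if all(leq(w, z) for w in lows))

def is_distributive(elems):
    return all(meet(x, join(y, z)) == join(meet(x, y), meet(x, z))
               for x, y, z in product(elems, repeat=3))

def is_orthomodular(elems):
    """Orthomodular law: x <= y implies y = x v (y ^ x')."""
    return all(join(x, meet(y, COMPLEMENT[x])) == y
               for x in elems for y in elems if leq(x, y))

print(is_distributive(BLOCK_A), is_distributive(ELEMENTS), is_orthomodular(ELEMENTS))
# prints: True False True
# a ^ (b v b') = a ^ 1 = a, but (a ^ b) v (a ^ b') = 0 v 0 = 0
print(meet("a", join("b", "b'")), join(meet("a", "b"), meet("a", "b'")))
# prints: a 0
```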

6.3.1 Speed-accuracy tradeoff

Decision-making often relies on information that accumulates over time. Will the approaching dog bite me or play with me? Should I stop reading this article? Should I open the door when somebody knocks late at night? Such decisions confront us with a dilemma known as the speed–accuracy tradeoff or SAT (Wickelgren 1977; Reads 1973; Busemeyer and Townsend 1993; Ratcliff 1978; Ratcliff and Rouder 1998). These examples reveal how important it is to find a sensible compromise between the competing demands of accuracy and speed. If I strive for high accuracy, I have to accumulate a lot of information, which takes time, possibly too much time to escape from the dangerous dog. If I decide quickly without accumulating enough information, my decision is likely to be wrong.

Fig. 8

Three realizations of a random walk model (Busemeyer and Townsend 1993) with positive drift \(d = 0.05\) and residual variance \(\sigma ^{2} = 0.2\). a Small decision threshold \({\vert }\uptheta {\vert } = 1\) (dashed lines) yields short first-passage times but an error rate of 0.33. b Larger threshold \({\vert }\uptheta {\vert } = 2\) (dashed lines) yields longer decision times, yet perfect accuracy of 1.0.

In traditional psychological experimentation, the SAT effect is induced by differences in the preparation of the experiment: a subject can be instructed either to be as fast as possible or to be as accurate as possible (Wickelgren 1977). In the first case, time constraints can be prescribed; in the second, accuracy can be controlled by different payoffs. In a random walk model, both kinds of manipulation are ultimately reflected in the position of the first-passage threshold \(\uptheta \) that triggers the decision. If \(\uptheta \) is large, accuracy is emphasized at the expense of long first-passage times. If, on the other hand, \(\uptheta \) is small, first-passage times become short as well, at the expense of a large error rate. Fig. 8 illustrates these two possibilities along the lines of Ratcliff and Rouder (1998) and Busemeyer and Townsend (1993). Here we simulate a random walk process (Busemeyer and Townsend 1993, Eq. (3b)):

    \(\hat{A} \left( t \right) = \hat{A} \left( {t-1} \right) + d + \hat{\varepsilon } \left( t \right) \)    (34)

where \(\hat{A} \left( t \right) \) is the accumulated preference after sampling time t, d is the drift whose sign indicates the direction of the decision dynamics: \(d>0\) for YES and \(d<0\) for NO, and \(\hat{\varepsilon } \left( t \right) \), the residual, is a Gaussian random process with zero mean and variance \(\upsigma ^{2}\).
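The tradeoff in Fig. 8 can be reproduced with a short Monte Carlo sketch of Eq. (34), assuming \(\hat{A}(0) = 0\), one unit step per sampling time, and the figure's values \(d = 0.05\), \(\upsigma^{2} = 0.2\). With only three realizations per panel, the error rates shown in Fig. 8 are sample values. Over many runs, the larger threshold gives fewer errors at the cost of longer first-passage times. As a rough cross-check, the code also evaluates the standard continuous-time (Brownian motion) approximation \(1/(1 + e^{2d\uptheta/\upsigma^{2}})\) for the probability of reaching \(-\uptheta\) first. It is only approximate here because the discrete walk overshoots the threshold. The function names are ours.

```python
import math
import random

def first_passage(d, sigma2, theta, rng, max_steps=100_000):
    """Iterate A(t) = A(t-1) + d + eps(t), eps ~ N(0, sigma2), from A(0) = 0
    until |A(t)| >= theta. Returns (first-passage time, decided YES?)."""
    a, sd = 0.0, math.sqrt(sigma2)
    for t in range(1, max_steps + 1):
        a += d + rng.gauss(0.0, sd)
        if a >= theta:
            return t, True
        if a <= -theta:
            return t, False
    raise RuntimeError("no threshold crossing within max_steps")

def simulate(d=0.05, sigma2=0.2, theta=1.0, trials=10_000, seed=1):
    """Mean decision time and error rate. With d > 0 the correct answer is YES,
    so reaching -theta first counts as an error."""
    rng = random.Random(seed)
    total_time, errors = 0, 0
    for _ in range(trials):
        t, yes = first_passage(d, sigma2, theta, rng)
        total_time += t
        errors += 0 if yes else 1
    return total_time / trials, errors / trials

def diffusion_error_rate(d, sigma2, theta):
    """Brownian-motion approximation of P(reach -theta before +theta | start at 0)."""
    return 1.0 / (1.0 + math.exp(2.0 * d * theta / sigma2))

for theta in (1.0, 2.0):
    mean_t, err = simulate(theta=theta)
    print(f"theta={theta}: mean first-passage time {mean_t:.1f}, "
          f"error rate {err:.3f}, diffusion approx {diffusion_error_rate(0.05, 0.2, theta):.3f}")
```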

Looking at Fig. 8 immediately suggests an interpretation of the SAT effect in terms of dynamical systems and symbolic dynamics. Accumulated preference \(\hat{A} \left( t \right) \) is a scalar quantity spanning a one-dimensional phase space (in contrast to the two-dimensional firefly oscillator discussed above). Therefore, the y-axis of Fig. 8 alone can be interpreted as the system's phase space, and a partition of this phase space is any decomposition of the y-axis into disjoint intervals. The decision threshold \(\uptheta \) thus partitions the scale of accumulated preference into three intervals. Assigning the symbol “a” to the interval \([\uptheta ,\, \infty )\), “b” to the interval \((-\uptheta ,\, \uptheta )\) and “c” to the interval \((-\infty , -\uptheta ]\) of Fig. 8a creates a ternary symbolic dynamics (beim Graben and Kurths 2003) of the random walk model. Likewise, the larger threshold \(\uptheta ^{\prime }\) of Fig. 8b partitions the y-axis into cells labelled “d” for the interval \([\uptheta ^{\prime },\, \infty )\), “e” for the interval \((-\uptheta ^{\prime }, \uptheta ^{\prime })\) and “f” for the interval \((-\infty , -\uptheta ^{\prime }]\). Interestingly, the two partitions induced by the different decision thresholds of Figs. 8a and 8b are incompatible with each other in the sense of the epistemic quantization of a classical dynamical system (beim Graben and Atmanspacher 2006, 2009). Hence, we conclude that the SAT effect can be interpreted in terms of an autoepistemic quantization that is compatible with its classical description by random walk processes.
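The two coarse-grainings can be written down directly. The sketch below assumes thresholds of 1 and 2 as in Fig. 8, and assigns each boundary point \(\pm\uptheta\) to the outer cell so that the three cells cover the whole axis. It shows how one and the same preference trajectory, generated by Eq. (34), is rendered as two different symbol sequences. It only illustrates the partitions themselves. Their incompatibility in the sense of epistemic quantization depends on the dynamics and is not tested here. The helper names are ours.

```python
import math
import random

def encode(trajectory, theta, symbols):
    """Map each preference value to one of three symbols:
    symbols[0] on [theta, inf), symbols[1] on (-theta, theta), symbols[2] on (-inf, -theta]."""
    upper, middle, lower = symbols
    word = []
    for a in trajectory:
        if a >= theta:
            word.append(upper)
        elif a > -theta:
            word.append(middle)
        else:
            word.append(lower)
    return "".join(word)

def random_walk(n, d=0.05, sigma2=0.2, seed=0):
    """Iterate Eq. (34) from A(0) = 0 for n steps without stopping at a threshold."""
    rng = random.Random(seed)
    a, sd, path = 0.0, math.sqrt(sigma2), []
    for _ in range(n):
        a += d + rng.gauss(0.0, sd)
        path.append(a)
    return path

toy = [0.3, 1.2, 2.5, -1.5, -0.4]
print(encode(toy, 1.0, "abc"))  # prints: baacb
print(encode(toy, 2.0, "def"))  # prints: eedee

path = random_walk(60)
print(encode(path, 1.0, "abc"))
print(encode(path, 2.0, "def"))
```

Every instant coded "d" is also coded "a" (and every "f" is also a "c"), but not conversely. The second partition is therefore not simply a relabelling of the first.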

In summary, the coarse-grained random walk model of the SAT phenomenon leads naturally to an autoepistemic quantization of the agent's one-dimensional preference dynamics. Different instructions given to the subject yield different partitions of the subject's epistemic states, which appear to be complementary since none of them is generating. To be precise, the (experimental) propositions expressing sufficient accumulated preference under certain instructions are complementary in the present case. If the sequences of the resulting symbolic dynamics are interpreted as propositions, their lattices become Boolean blocks of a single orthomodular lattice.

6.3.2 Complementary opponent cell activities and colour perception

In neural network research, the idea of coarse coding plays a significant role (e.g., Bechtel 2002). Coarse coding requires that a property be encoded by a set of detectors rather than by a single detector (or neuron assembly). Usually, such detectors have overlapping sensitivities. Many examples of this type of coding are found in the human visual system. A standard example is the coding of colours by three detectors according to the opponent-colour theory (Hering 1920). The three detectors are bipolar and sensitive to the colour pairs Red/Green, Blue/Yellow, and White/Black, respectively. Since the partitions of the colour space generated by the three detectors are non-overlapping (due to different perception thresholds), the three detectors could be complementary. In the underlying symbolic dynamics, identical symbols are assigned to colours that cannot be discriminated by the corresponding detector.
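As an illustration of coarse coding by bipolar detectors, consider the following sketch. The opponent channels computed from cone-like inputs \((l, m, s)\), the channel weights, and the common threshold of 0.1 are all hypothetical choices for illustration, not the Hurvich–Jameson parameters. Each detector maps a stimulus to one of three symbols, so two colours receive the same symbol from a detector exactly when that detector cannot discriminate them. The sketch does not by itself establish complementarity.

```python
# Hypothetical opponent channels from cone-like inputs (l, m, s) in [0, 1].
# Weights and thresholds are illustrative assumptions, not Hurvich-Jameson values.
def opponent_channels(l, m, s):
    red_green = l - m
    yellow_blue = (l + m) / 2 - s
    white_black = (l + m + s) / 3 - 0.5
    return red_green, yellow_blue, white_black

# Each bipolar detector: (symbol for positive pole, symbol for negative pole, threshold)
DETECTORS = [("R", "G", 0.1), ("Y", "B", 0.1), ("W", "K", 0.1)]

def detector_symbol(value, pos, neg, threshold):
    """Coarse-code one channel into three cells: pos, neg, or '0' (no decision)."""
    if value >= threshold:
        return pos
    if value <= -threshold:
        return neg
    return "0"

def code_colour(lms):
    """Symbol triple that the three detectors assign to one stimulus."""
    return tuple(detector_symbol(v, *det)
                 for v, det in zip(opponent_channels(*lms), DETECTORS))

reddish, greenish, white = (0.8, 0.3, 0.2), (0.3, 0.8, 0.2), (0.9, 0.9, 0.9)
print(code_colour(reddish))   # prints: ('R', 'Y', '0')
print(code_colour(greenish))  # prints: ('G', 'Y', '0')
print(code_colour(white))     # prints: ('0', '0', 'W')
```

Here the yellow/blue and white/black detectors assign identical symbols to the reddish and greenish stimuli. Only the red/green detector tells them apart.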

Hurvich and Jameson (1957) developed a neuronal model that reflects this opponent-process picture of colour vision. Figure 9 illustrates the model via a three-dimensional firefly box with three complementary perspectives, which correspond to the three complementary detectors. The averaged values of each detector are the relevant parameters responsible for the colour qualia; collectively, they determine a region within the colour spindle.

Fig. 9

The Hurvich–Jameson activation box. The three perspectives marked by double arrows correspond to the three complementary detectors. The averaged values of each detector collectively determine a region within the colour spindle.

We can regard this kind of complementarity as related to the autoepistemic accessibility introduced earlier. However, the whole phenomenon of coarse coding in vision seems to reflect laws of (cognitive) nature in Bohr's sense. Hence, the gap between autoepistemic accessibility and the ontic conception cannot be very deep, if it makes sense at all in the cognitive domain.

7 Quantum cognition and its formal grounding: some tentative conclusions

In the first part of this article, we considered several puzzles of bounded rationality and showed how the present account of quantum cognition, which uses quantum probabilities rather than classical probabilities, can clarify these puzzles more systematically than the alternative and rather eclectic treatments in the traditional framework of bounded rationality. Unfortunately, the quantum probabilistic treatment does not always, and not automatically, provide a deeper understanding and a true explanation of these puzzles. The reason is that the quantum approach introduces several new parameters that can possibly be fitted to empirical data but do not necessarily explain them. Hence, phenomenological research has to be complemented by addressing deeper foundational issues. The second part of the paper is devoted to this problem.

The foundational approach to quantum cognition exploits ideas from the operational approach to quantum physics, which does not take Hilbert space as a given conceptual framework but tries to motivate it. Following the research of Foulis, Randall and colleagues, such a foundational framework is motivated by assuming partial Boolean algebras that describe the particular perspectives of a cognitive agent. These perspectives are combined into a uniform system subject to certain capacity restrictions. Technically, this yields an orthomodular lattice, a structure that can violate distributivity. From an empirical perspective, it is at this point that one important aspect of bounded rationality directly enters the theoretical picture of quantum cognition: resource limitation. Resource limitation means that a cognitive agent cannot simultaneously maintain all possible perspectives.Footnote 29

One important issue, both in physics and in the new field of quantum cognition, concerns the notions of compatibility and complementarity. In physics, the complementarity of two observables has nothing to do with the observer's inability to perform two simultaneous measurements; rather, it has to do with the nature of the measurements per se. We have already seen that in the cognitive realm, observables correspond to questions put to a human subject whose behaviour is investigated. Complementarity can result from the subject's inability to perform simultaneous tasks. In Sect. 6.3 we illustrated how this idea helps to understand the effect of instructions on speed–accuracy tradeoffs. Complementarity can also be due to biological dispositions, as illustrated by the example from colour perception.

In discussing the Ellsberg puzzle, another important aspect of the foundational issue came to light. The foundational argument is connected with Gleason's theorem and automatically leads to a distinction between probabilities defined by pure states and probabilities arising from statistical mixtures of such states. This formal distinction can be related to the deep conceptual distinction between risk and ignorance.
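The formal distinction between pure states and mixtures can be made concrete with density matrices. By Gleason's theorem, probabilities on projectors take the form \(p(P) = \mathrm{Tr}(\rho P)\). The sketch below uses a two-dimensional real space for brevity, although Gleason's theorem itself requires dimension at least three. A pure superposition and an equal-weight mixture can assign the same probability to one question yet differ on another, and they also differ in purity \(\mathrm{Tr}(\rho^{2})\). The states and helper names are illustrative choices of ours. The sketch does not reproduce the article's treatment of the Ellsberg puzzle or its risk/ignorance mapping.

```python
import math

def outer(v):
    """Projector |v><v| for a real unit vector v."""
    return [[vi * vj for vj in v] for vi in v]

def mixture(weights, vectors):
    """Density matrix sum_k w_k |v_k><v_k|; a single vector with weight 1 is a pure state."""
    n = len(vectors[0])
    rho = [[0.0] * n for _ in range(n)]
    for w, v in zip(weights, vectors):
        p = outer(v)
        for i in range(n):
            for j in range(n):
                rho[i][j] += w * p[i][j]
    return rho

def trace_product(a, b):
    """Tr(A B); for A = rho and B a projector this is the Born/Gleason probability."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))

zero, one = [1.0, 0.0], [0.0, 1.0]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]

pure = mixture([1.0], [plus])              # pure superposition |+><+|
mixed = mixture([0.5, 0.5], [zero, one])   # 50/50 mixture of |0> and |1>

for name, rho in (("pure", pure), ("mixed", mixed)):
    print(name,
          "P(0) =", round(trace_product(rho, outer(zero)), 3),
          "P(+) =", round(trace_product(rho, outer(plus)), 3),
          "purity =", round(trace_product(rho, rho), 3))
# prints: pure P(0) = 0.5 P(+) = 1.0 purity = 1.0
# prints: mixed P(0) = 0.5 P(+) = 0.5 purity = 0.5
```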

Curiously enough, some proponents of cognitive science set up an opposition between computation, conceived in the usual symbol-manipulation sense, and continuous nonlinear dynamics (e.g., Fodor and Pylyshyn 1988). This seems to us quite misleading, if only because the existence of generating partitions shows how symbol-manipulating computation can be completely equivalent to a dynamical system. Where no generating partition can be found, complementary descriptions may still exist that give a satisfying symbolic description of the underlying continuous nonlinear dynamics. In this sense, the instruments of quantum description help to narrow the gap between symbolic descriptions and the descriptions in terms of nonlinear differential equations used in neuroscience.

According to beim Graben (2004), the transient dynamics of a nonlinear system in general, and of a neural network in particular, can be interpreted as computation, i.e. the manipulation of discrete symbols, upon a partition of the system's phase space. The resulting symbolic dynamics belongs to a certain complexity class and can be generated by a suitable dynamic automaton (beim Graben 2004). So far, partitions have been introduced as perspectives voluntarily adopted by an observer. However, they can also be created internally, from an autoepistemic point of view, as has been shown in the framework of liquid computation (Maass et al. 2002; beim Graben et al. 2009). A liquid computer is a large disordered neural network with random connectivity. Only a small sub-network of detectors, the so-called read-out neurons, is trained to solve prescribed classification tasks on the high-dimensional phase space trajectories; i.e. the read-out neurons partition the remaining phase space through coarse coding as mentioned above. If the read-out neurons are trained on more than one classification task, the resulting partitions may become complementary, hence giving neural networks quantum cognition capabilities (de Barros and Suppes 2009; de Barros 2012).
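The idea of read-outs partitioning a random network's phase space can be sketched with a rate-based, echo-state-style network. This is a simplification we swap in for illustration; Maass et al. (2002) use spiking neurons. A fixed random recurrent network is driven by a random binary input. Two linear read-outs are then trained by least squares on two hypothetical tasks: the sign of the current input and the sign of the previous input. The sign of each read-out splits the same state space into two half-spaces, giving two different partitions of one trajectory. Network size, spectral radius, and input scaling are arbitrary assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T_TRAIN, T_TEST, WASHOUT = 100, 1000, 500, 50

# Fixed random recurrent network ("liquid"), rescaled to spectral radius 0.9
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
w_in = rng.uniform(-0.5, 0.5, size=N)

def run_network(u):
    """Drive the network with input sequence u and record its state trajectory."""
    x = np.zeros(N)
    states = np.empty((len(u), N))
    for t, u_t in enumerate(u):
        x = np.tanh(W @ x + w_in * u_t)
        states[t] = x
    return states

def with_bias(X):
    """Append a constant column so each read-out has a bias term."""
    return np.hstack([X, np.ones((len(X), 1))])

def train_readout(X, y):
    """Least-squares linear read-out; sign(w . [x, 1]) splits state space into two half-spaces."""
    w, *_ = np.linalg.lstsq(with_bias(X), y, rcond=None)
    return w

u = rng.choice([-1.0, 1.0], size=T_TRAIN + T_TEST)
X = run_network(u)
tasks = {"current": u, "previous": np.roll(u, 1)}  # two hypothetical classification tasks

train, test = slice(WASHOUT, T_TRAIN), slice(T_TRAIN, T_TRAIN + T_TEST)
labels = {}
for name, y in tasks.items():
    w = train_readout(X[train], y[train])
    labels[name] = np.sign(with_bias(X[test]) @ w)
    print(name, "test accuracy:", np.mean(labels[name] == y[test]))

# The two read-outs induce different partitions of the same state trajectory
print("fraction of states coded differently:",
      np.mean(labels["current"] != labels["previous"]))
```

Only the read-out weights are trained; the recurrent weights stay random, which is the defining feature of this family of models.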

It should be noted that we have not been able to comment on all the foundational developments that are important for quantum cognition. For example, some researchers (e.g. Kitto 2008) see quantum cognition as one attempt to formalize the contextuality of mental processes using the mathematical instruments of quantum mechanics. There are alternative ideas that account for contextuality differently, for instance “contextuality by default” (Dzhafarov et al. 2015) and approaches based on “negative probabilities” (Abramsky and Brandenburger 2011). At the moment, it is an open question which of these accounts is the most promising for the field of cognition and decision making.

In quantum physics, we have an impressive series of crucial experiments that provide prima facie evidence for the basic assumptions of quantum theory and the failure of classical theories. Prominent examples include the photoelectric effect (demonstrating the particle properties of light), the Compton effect, the Franck–Hertz experiment, double-slit experiments with single photons or single electrons, and the Stern–Gerlach experiment. Textbooks of quantum physics give many other examples that directly demonstrate quantum effects. In all these cases, the facts speak for themselves: there is no need for a complicated reasoning process or extraneous details, since any reasonable person would immediately find the facts convincing evidence for the quantum hypothesis.

Unfortunately, the situation in quantum cognition is much less convincing regarding prima facie evidence for its basic assumptions. Critical experiments are rare in the field. Perhaps the most convincing case is the one we considered in Sect. 4.3 in connection with order phenomena, namely Wang and Busemeyer's (2013) demonstration that the quantum approach can describe all four types of order effects considered in Sect. 3.4. Of special interest is their verification of the ‘law of reciprocity’. In this case, every reasonable person will admit that classical probability cannot account for this constraint; only the formalism of quantum theory is able to do so.
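A small numerical sketch of why reciprocity is structural in the quantum case. We assume here that the ‘law of reciprocity’ refers to the symmetry \(p(B \mid A) = p(A \mid B)\) for answers represented by one-dimensional projectors. Under that assumption, both conditional probabilities equal \(|\langle a \mid b\rangle|^{2}\) and therefore coincide for any pair of states. Classically, \(P(B \mid A) = P(A \mid B)\) holds only when \(P(A) = P(B)\). The random vectors and the classical numbers below are arbitrary illustrative choices.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Random complex unit vector (Gaussian components, then normalized)."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / norm for c in v]

def transition_probability(a, b):
    """|<a|b>|^2, i.e. Tr(P_a P_b) for the one-dimensional projectors onto a and b."""
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    return abs(inner) ** 2

rng = random.Random(42)
a, b = random_unit_vector(3, rng), random_unit_vector(3, rng)
print(transition_probability(a, b), transition_probability(b, a))  # equal up to rounding

# Classical counterexample: P(A) = 0.5, P(B) = 0.2, P(A and B) = 0.1
p_a, p_b, p_ab = 0.5, 0.2, 0.1
print(p_ab / p_a, p_ab / p_b)  # prints: 0.2 0.5, so P(B|A) differs from P(A|B)
```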

In the field of cognitive linguistics, the situation is much less clear. Although phenomena such as prototypicality, vagueness, polysemy, and invited inferences (Grice 1989) are natural candidates for applying the formalism of quantum theory (e.g. Blutner 2009), only a few phenomena provide rather direct evidence in favour of such a formalism. A potential example is the phenomenon of borderline vagueness (Blutner et al. 2013).

In summary, we conclude that the future of quantum cognition depends on the discovery of phenomena that provide prima facie evidence for its crucial assumptions. Cognitive psychology and cognitive linguistics are perhaps potential areas for such discoveries. However, fields such as the mathematical theory of tonal music could also provide new impulses (Mazzola 2002; Tymoczko 2011; Lerdahl 2001). Structural quantum probabilities, which arise from the structure of Hilbert spaces, could be used to model universal traits of cognition that cannot be learned but must be assumed to be innate, emerging from an underlying Hebbian neurodynamics (Large 2010; de Barros 2012). The methodological instrument of symbolic dynamics discussed in this paper could be a helpful guide on this road.