1 Introduction

For an outsider, one of the really difficult things to accept about quantum mechanics is its state concept: The state of a physical system is given by a normalized vector in a complex separable Hilbert space. One question that I will raise here is whether this state concept can be derived, or at least motivated, by some other considerations. The discussion given here may be seen as a continuation of the discussions in the book [1].

My point of departure will be the notion of theoretical variables, variables attached to some observer or to a group of communicating observers. This will be taken as a very wide concept, but it includes physical variables like position, time, momentum and energy. The term ‘theoretical’ points to a distinction from real, measured variables, that in a typical connection may be seen as a theoretical variable plus some random error.

The notion of theoretical variables can also be connected to several interpretations of quantum mechanics. To give an example of such a connection, according to Relational Quantum Mechanics, see Rovelli [2] and van Fraassen [3], variables take values only at interactions, and the values that they take are only relative to the other system affected by the interaction. This other system might well be an observer, and I will think of such a situation.

Other examples of theoretical variables are decision variables in a quantum decision context. This will be briefly discussed below, and is discussed in more detail in [4].

Variables which can take definite values relative to an observer or group of observers will be called accessible. But relative to the observers, there may also be other theoretical variables which I will call inaccessible. An example may be the vector (position, momentum) connected to the observation of a particle. Another example may be the spin vector of a particle, where I think of a mental model of the spin in which the spin components are seen as quantized values of the projection of this vector upon some given direction.

These inaccessible variables must be seen as purely mathematical variables; from a physical point of view, they do not take any values, but nevertheless they can be seen as variables.

The distinction between accessible and inaccessible variables is very important to my approach. On the one hand, this distinction can be explained roughly to outsiders. On the other hand, one can give many precise physical examples. I am currently working on a book together with Harish Parthasarathy, where we among other things will illustrate this distinction on several examples from quantum field theory.

In this article, I discuss these notions more closely with a focus on the more mathematical aspects of the situations described above. First, I concentrate on the variables and the corresponding group theory. I assume the existence of a concrete (physical) situation, and that there is a space \(\Omega _\phi\) on which an inaccessible theoretical variable \(\phi\) varies, with a group K acting on this space. There is at least one accessible theoretical variable \(\theta\), and it is assumed that this variable can be seen as a function on \(\Omega _\phi\). This is a crucial assumption.

This \(\theta\) varies on a space \(\Omega _\theta\), and the group K may or may not induce a transformation group G on \(\Omega _\theta\). In any case, I will focus on such a group G on \(\Omega _\theta\), whether it is induced by K or not. An essential requirement is that G is transitive on \(\Omega _\theta\). It is shown below that the existence of G, together with symmetry assumptions assumed by the author in [5], will be satisfied without further assumptions in the finite-dimensional case.

A special situation is when \(\phi\) is a spin vector, and \(\theta\) is a spin component in a given direction. In the simple spin situation, the natural group K for the spin vector does not directly induce groups on the components. It does so, however, if we redefine \(\phi\) to be the projection of the spin vector upon the plane spanned by two different directions in which spin can be measured, and take K to be the corresponding rotation group.

When there are several potential accessible variables, I may denote this by a superscript a: \(\theta ^a\) for the variables and \(G^a\) for the groups, with elements \(g^a\). Both here and in [1, 4, 5] I use the word ‘group’ as synonymous to ‘group action’ or transformation group on some set, not as an abstract group.

The article has some overlap with the papers [4, 5], but the results there are further discussed and clarified here.

My derivations here will be independent of other Hilbert space reconstructions in the literature, but they will in some sense compete with several rather deep recent investigations on deriving the Hilbert space structure from various assumptions [6, 7, 8, 9, 10]. By relying on group representation theory, I use at the outset some Hilbert space structure, but this is by construction, not by assumption. The construction is shown to be realizable in the important finite-dimensional case. It is interesting to see, as stated in [11], that there is a problem connecting the above general derivations to the many different interpretations of quantum theory. By contrast, the derivation presented here seems to lead naturally to a particular interpretation: a general epistemic interpretation.

There is a large literature on different interpretations of quantum mechanics. In my opinion, one should try to clarify the questions around quantum foundations before one goes into a deeper discussion of possible interpretations. Brief discussions of interpretation are given in a later Section of this article. This discussion is in agreement with the view expressed by Robert Spekkens in a recent video: It may be seen as a categorical error to look upon quantum theory as describing the real world; it is a theory of our knowledge of the world. Of course the word knowledge can be interpreted in many ways, and an epistemic view upon quantum mechanics may be discussed, but in my opinion a focus on describing our knowledge of the world leads to more attractive interpretations of the theory.

There have been many attempts in the literature to motivate the Hilbert space formalism. What is special about my approach? First, it is based upon notions that can be communicated to outsiders with a limited mathematical background. Secondly, it is based on very few postulates. One postulate, Postulate 3 in Sect. 3 below, may be seen by some people as strong, but this postulate is discussed here from several points of view. One view is a possible relation between science and religion; see also [12, 13].

Finally, I see it as crucial that my main result, also formulated in Sect. 3, in addition to the postulates only assumes the existence of just two maximal accessible variables in order to establish essential elements of the Hilbert space apparatus. These two variables may be seen, in the language of Niels Bohr, to be complementary.

One of the most recent derivations of the Hilbert space formalism is Brezhnev [14], where other derivations are also referred to. The derivation in [14] is based upon an unlimited validity of the superposition principle. By contrast, in the present approach the operators associated with theoretical variables are derived first, and a ket vector is only seen as a valid state vector if it is an eigenvector of a physically meaningful operator. This leads to a limitation of the superposition principle, but in my opinion, it facilitates the interpretation of quantum mechanics.

Several applications of the theory given here can be mentioned. One application is to give an explanation [15, 16] of the fact that Bell’s inequalities may be violated in real experiments, contrary to what could be expected intuitively. Other applications are discussions of the two-slit experiment and the paradoxes connected to Schrödinger’s cat and Wigner’s friend [17]. In [17] I also consider links to field theory and to general relativity theory, themes that are currently under further consideration.

The link to quantum decision theory [4] is also under further consideration. One goal is to find connections to statistical inference theory, where my accessible theoretical variables will be statistical parameters. (I have avoided the term ‘parameter’ in this article, since in physics this word has other meanings.) Such interdisciplinary investigations would have been impossible with the common quantum theory formalism.

In my proofs I use group representation theory in an essential way. Group representation theory in discussing quantum foundation has also been used in other places; see for instance [18]. In quantum field theory and particle physics theory, the use of group representation theory is crucial [19, 20].

Sections 2 and 3 give some background. In Sects. 4 and 5 I introduce some basic group theory and group representation theory that is needed in the paper. Then, in Sects. 6 and 7, I formulate my approach to the foundation of quantum theory. Simple postulates for the relevant situations are assumed. From this, the ordinary quantum formalism is derived, and it is shown how operators attached to accessible physical variables may be defined. A brief comparison with some other approaches towards quantum foundation is included in Sect. 8. In Sect. 9 a corresponding interpretation of quantum mechanics is discussed, and Sect. 10 gives some concluding remarks.

To complete the derivation of quantum theory along these lines, one will also need a derivation of the Born rule under suitable conditions, and a derivation of the Schrödinger equation. A very brief discussion of this is included in the crucial Sect. 7 for completeness; a more thorough discussion is given in [1]. The main focus of the present article is the construction of the Hilbert space apparatus from simple assumptions and the corresponding interpretation.

2 Convivial Solipsism

In Hervé Zwirn’s paper [21] it is shown how a wide range of foundational issues in quantum mechanics, including the famous measurement problem, can be illuminated using a new philosophy called Convivial Solipsism. The basic thesis is: Every description of the world must be relative to some observer. But different observers can communicate. Mathematically the philosophy rests upon Everett’s relative state formulation of pure wave mechanics [22, 23]; see also [24] for philosophical issues. The distinction between absolute states, which can be taken to describe the whole, including physical system, measuring apparatus and observer, on the one hand, and relative states, describing, say, the state of an observer’s brain after a measurement, on the other hand, is crucial.

In concrete terms, assume a system which can be in one out of two states \(|S1\rangle\) or \(|S2\rangle\); the corresponding states of the measurement apparatus are \(|E1\rangle\) and \(|E2\rangle\), while this induces states of the observer’s brain \(|B1\rangle\), respectively \(|B2\rangle\). All these are relative states. The global state will be of the form

$$\begin{aligned} |\psi \rangle = \alpha |S1\rangle |E1\rangle |B1\rangle +\beta |S2\rangle |E2\rangle |B2\rangle . \end{aligned}$$
(1)

Zwirn distinguishes between the states of the brain and the ‘states’ of the consciousness of the corresponding observer, which can be denoted by \(\tilde{B1}\) or \(\tilde{B2}\). From this, the measurement problem and other problems related to the interpretation of quantum mechanics are discussed. The basic principles behind this discussion are the hanging-up mechanism and the relativity of states assumption; see [21].

All this presupposes a foundation of quantum mechanics based upon ket vectors as representing physical states. Below, I will try to avoid this formal apparatus as a starting point and start with completely different notions. But I will keep Zwirn’s basic thesis.

3 Motivation, First Postulates, and Some Basic Results

Can one find a new foundation of quantum theory, a foundation that ultimately leads to the full theory, but at the same time a foundation that can also be explained to persons who never have been exposed to the ordinary Hilbert space machinery?

My answer is yes. I have up to now discussed my approach in two books [1, 25] and in several papers [4, 5, 15, 16, 17, 26, 27]. Now I aim to collect all the mathematical arguments in a single article and also give results beyond the above books and articles.

My basic notion is that of a theoretical variable, which is a very wide notion. This variable can be a physical variable, a statistical parameter, a future data variable, a decision variable, or perhaps also other things. In this discussion, the variables can always be seen as mathematical variables. I divide the variables into accessible ones and inaccessible ones, as briefly discussed in the Introduction. From a mathematical point of view, I only require that if \(\theta\) is accessible and \(\lambda\) is a function of \(\theta\), then \(\lambda\) is also accessible.

Here are some examples:

  1. (1)

    Spin of one particle. An observer A can have the choice between measuring the spin component in the x direction or in the z direction. This gives two different accessible variables related to A. An inaccessible variable is the unit spin vector \(\phi\), which we think of as a three dimensional vector such that the spin component in a certain direction is a discretized version of the projection of \(\phi\) in that direction. In the qubit case, the spin component in any direction a can be modeled as a simple function of \(\phi\): \(\theta ^a = \textrm{sign}(\textrm{cos}(a,\phi ))\), taking the values -1 or +1. A correct distribution of each \(\theta ^a\) will result if we let \(\phi\) have a uniform distribution on the unit sphere. By using the Born rule, which requires a separate derivation, a conditional distribution of \(\theta ^b\), given \(\theta ^a\), \(a\ne b\), can also be found. This is the first example of quantum mechanics discussed in the book [28].

  2. (2)

    The EPR situation with Alice and Bob. For an independent observer Charlie, the unit spin vectors of both are inaccessible, say \(\phi _A\) and \(\phi _B\). But it can be shown that the dot product of the two is accessible to Charlie: \(\eta =\phi _A \cdot \phi _B\). Specifically, one can show that Charlie is forced to be associated with an eigenstate for the operator corresponding to the variable \(\eta\), which is the entangled singlet state corresponding to \(\eta =-3\). One can show that this implies that for Charlie and for the measured components in some fixed direction a, the component of Alice is opposite to the component of Bob. Note that Charlie can be any person. See Sect. 7.5 below for more details.

  3. (3)

    The Bell experiment situation. Look at the subsample of data where Alice measures her spin component in direction a and gets a response A, either -1 or +1, and where Bob measures in a direction b and gets a similar response B. For an independent observer Charlie, analysing the data after the experiment, both A and B are accessible. This implies by Born’s formula - anticipating this formula - a fixed joint distribution of A and B. But Charlie has his limitations, as in 2). In an article [15] in Foundations of Physics, I discuss what this limitation implies for him, using my point of view. This may be used to explain the now well-known violation of Bell’s inequality in practice. Unfortunately [15] contains some smaller errors, which were corrected later in the article [16].

  4. (4)

    Consider first a general decision problem with two alternatives. In the simplest case the observer A knows the consequences of both choices; they are accessible. But in more complicated cases, the consequences are inaccessible, and hence the results of his choice are inaccessible. Then an option can be to make a simpler sub-decision, where he knows the consequences. Alternatively, a decision variable can be connected to a decision process, and this decision variable can be said to be accessible if A is able to make a decision. Maximal accessible decision variables seem to be of some interest here. Quantum decision problems are discussed in [4], where further references are given. See also Sect. 7.4 below.
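
The sign model in example (1) can be checked numerically. The following is a minimal sketch, not taken from the cited references: it assumes that ‘a correct distribution’ means that each of the values -1 and +1 has probability 1/2, and it draws \(\phi\) uniformly on the unit sphere by normalizing Gaussian vectors. The function names, sample size and seed are illustrative choices.

```python
import math
import random

def random_unit_vector(rng):
    """Draw a point uniformly from the unit sphere by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def spin_component(a, phi):
    """theta^a = sign(cos(a, phi)); for unit vectors the cosine is the dot product."""
    dot = sum(x * y for x, y in zip(a, phi))
    return 1 if dot >= 0 else -1

def frequency_of_plus_one(a, n_samples=100_000, seed=0):
    """Relative frequency of theta^a = +1 when phi is uniform on the sphere."""
    rng = random.Random(seed)
    hits = sum(spin_component(a, random_unit_vector(rng)) == 1
               for _ in range(n_samples))
    return hits / n_samples

x_dir = [1.0, 0.0, 0.0]
z_dir = [0.0, 0.0, 1.0]
freq_x = frequency_of_plus_one(x_dir)
freq_z = frequency_of_plus_one(z_dir)
```

Both frequencies should lie close to 1/2. The sketch only covers the marginal distribution of a single \(\theta ^a\); the conditional distribution of \(\theta ^b\) given \(\theta ^a\) requires the Born rule and is not reproduced by this model.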

All these examples can be coupled to my approach towards quantum mechanics. I will now sketch the basic elements of this approach.

My point of departure is a statement of Hervé Zwirn’s Convivial Solipsism, as noted before: Every description of the world must be relative to some observer A or to a group of communicating observers. I will assume that the observer or observers are in some fixed physical or non-physical context. My primitive concept is then theoretical variables related to this situation.

A concrete area where a non-physical context is meaningful and required is quantum decision theory; see Sect. 7.4.

Postulate 1

If \(\eta\) is a theoretical variable and \(\gamma = f(\eta )\) for some function f, then \(\gamma\) is also a theoretical variable.

The theoretical variables may be accessible or inaccessible to A (or to the group). Very roughly we can say: If \(\theta\) is accessible, A will, in principle, at some future time be able to find as accurate a value of \(\theta\) as he likes. But as a referee remarks, this rough definition raises many questions: What is meant by the future? Is the accuracy limited by the Planck length? - and so on.

So I will just take ‘accessible’ as a primitive notion. From a mathematical point of view, I only assume:

Postulate 2

If \(\theta\) is accessible to A and \(\lambda = f (\theta )\) for some function f, then \(\lambda\) is also accessible to A.

The crucial model assumption is now the following (see also [1, 4, 5]):

Postulate 3

In the given context there exists an inaccessible variable \(\phi\) such that all the accessible ones can be seen as functions of \(\phi\).

First look at two physical examples:

  1. (A)

    Consider a Stern-Gerlach experiment where the spin component of a particle is measured in an arbitrary direction a. These spin components are accessible variables, and can, as noted in 1) above, be seen as functions of an inaccessible variable, the full spin vector, which only exists theoretically relative to the relevant observer.

  2. (B)

    The position of a particle is accessible, and so is its momentum. Both are functions of the variable \(\phi\)=(position, momentum), which is inaccessible by Heisenberg’s inequality.

In a more general mathematical setting, following ideas by Palmer [29], Postulate 3 can be motivated as follows in the finite-dimensional case, say where maximal accessible variables have dimension n. (See Definition 1 below for a definition of ‘maximal’.) Assume that the variable \(\phi\) is generated by some chaotic process, and let it be written in base-n. Then we can let the maximal accessible variable \(\theta\) be determined by the millionth digit in \(\phi\), the maximal variable \(\eta\) be determined by digit number 1 000 003, and so on. By Postulate 4 below, all accessible variables can be seen as some function of a maximal accessible variable, so Postulate 3 will hold in this setting. Note, however, that with this choice of \(\phi\), it seems difficult to define a transitive group K on \(\Omega _\phi\) such that the functions \(\theta (\cdot )\) are permissible with respect to K (see later).

A more satisfying explanation of Postulate 3 is to assume some relation between science and religion. I know that many scientists are skeptical of such relationships, but here it seems to me to clarify the possible background of quantum theory. One can simply assume that \(\phi\) is known to God, but unknown to us humans. This also explains why one can think of a quantum foundation attached to a human observer. Presumably, quantum mechanics and its generalizations have existed in some form forever, but human beings only for some millions of years.

As a background for my theory, I assume that God has existed forever, and so have basic physical laws. A rather common assumption in various religions is that we humans are created in God’s image. Thus, in very metaphysical terms, one can simplify a theory of God’s mind by a theory of an observer’s mind, which is partly done in this article. For more on my views on religion and quantum foundation, see [12, 13].

In general, it should only be noted that \(\phi\) is a variable and is not assumed to take any value known to us. This is also analogous to the main assumptions of probability theory, which is based upon functions on a probability space \((\Omega , \mathscr {F},P)\), with no assumption that \(\omega \in \Omega\) is known to us.

It will be shown below that Postulate 3, taken together with some symmetry assumptions, has far-reaching consequences. It will imply the existence of the whole Hilbert space apparatus, in particular, that each accessible variable has a unique symmetric operator connected to it. These symmetry assumptions will be shown in this article to be satisfied when all accessible variables take a finite number of values.

One can consider a concrete context with an observer A or with a set of communicating observers in this context. Let \(\phi\) be an inaccessible theoretical variable varying in a space \(\Omega _\phi\). It is a basic philosophy of the present paper that I always regard groups as group actions or transformations, acting on some space.

Starting with \(\Omega _\phi\) and a group K acting on \(\Omega _\phi\), let \(\theta (\cdot )\) be an accessible function on \(\Omega _\phi\), and let \(\Omega _\theta\) be the range of this function.

As mentioned in the Introduction, I regard ‘accessible’ and ‘inaccessible’ as primitive notions. But they have concrete interpretations, at least in the physical case: Roughly, a physical variable \(\theta\) is called accessible if an observer, by a suitable measurement, can obtain as accurate values of \(\theta\) as he wants to. More precisely, from a mathematical point of view, I only assume that if \(\theta\) is accessible, and \(\lambda\) can be defined as a fixed function of \(\theta\), then \(\lambda\) is also accessible.

If \(\Omega _\theta\) and \(\Omega _\phi\) are equipped with topologies, all functions are assumed to be Borel-measurable.

Definition 1

The accessible variable \(\theta\) is called maximal if the following holds: If \(\theta\) can be written as \(\theta =f(\psi )\) for a function f that is not bijective, the theoretical variable \(\psi\) is not accessible. In other words: \(\theta\) is maximal under the partial ordering defined by \(\alpha \le \beta\) iff \(\alpha =f(\beta )\) for some function f.

Note that this partial ordering is consistent with accessibility: If \(\beta\) is accessible and \(\alpha =f(\beta )\), then \(\alpha\) is accessible. Also, \(\phi\) from Postulate 3 is an upper bound under this partial ordering.
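
For variables defined on a finite space \(\Omega _\phi\), the ordering \(\alpha \le \beta\) can be tested directly: \(\alpha =f(\beta )\) for some f exactly when \(\beta (\phi _1)=\beta (\phi _2)\) implies \(\alpha (\phi _1)=\alpha (\phi _2)\). The following minimal sketch illustrates this on a hypothetical toy space of pairs; the space and the variables `theta`, `eta`, `lam` are illustrative assumptions, not objects from the text.

```python
def is_function_of(alpha, beta, omega):
    """Return True if alpha = f(beta) for some function f on the finite set omega.

    This holds exactly when beta(p) == beta(q) implies alpha(p) == alpha(q).
    """
    seen = {}
    for p in omega:
        b, a = beta(p), alpha(p)
        # Remember the alpha-value first seen for this beta-value;
        # a later mismatch means alpha is not determined by beta.
        if seen.setdefault(b, a) != a:
            return False
    return True

# Hypothetical toy space: phi ranges over pairs (i, j) with i, j in {0, 1, 2}.
omega_phi = [(i, j) for i in range(3) for j in range(3)]

theta = lambda phi: phi[0]        # first coordinate
eta = lambda phi: phi[1]          # second coordinate
lam = lambda phi: phi[0] % 2      # a coarsening of theta
identity = lambda phi: phi        # phi itself, the upper bound

lam_le_theta = is_function_of(lam, theta, omega_phi)
theta_le_lam = is_function_of(theta, lam, omega_phi)
theta_le_eta = is_function_of(theta, eta, omega_phi)
theta_le_phi = is_function_of(theta, identity, omega_phi)
```

Here `lam` lies below `theta` but not conversely, `theta` and `eta` are not comparable, and every variable lies below \(\phi\) itself, in line with the remark that \(\phi\) is an upper bound.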

Postulate 4

There exist maximal accessible variables relative to this partial ordering. For every accessible variable \(\lambda\) there exists a maximal accessible variable \(\theta\) such that \(\lambda\) is a function of \(\theta\).

Two different maximal accessible variables come very close to what Bohr called complementary variables; see Plotnitsky [30] for a thorough discussion.

It is crucial what is meant by ‘different’ here. If \(\theta =f(\eta )\) where f is a bijective function, there is a one-to-one correspondence between \(\theta\) and \(\eta\), they contain the same information, and they must be considered ‘equal’ in this sense. \(\theta\) and \(\eta\) are said to be ‘different’ if they are not ‘equal’ in this meaning. This is consistent with the partial ordering in Definition 1. The word ‘different’ is used in the same meaning in the Theorems below.

Postulate 4 can be motivated by using Zorn’s lemma - if this lemma, which is equivalent to the axiom of choice, is assumed to hold - and Postulate 3, but such a motivation is not necessary if Postulate 4 is accepted. Physical examples of maximal accessible variables are the position or the momentum of some particle, or the spin component in some direction. In a more general situation, the maximal accessible variable may be a vector, whose components are simultaneously measurable.

In example (A) the individual spin components can be taken to be maximal. In example (B) both position and momentum are maximal as accessible theoretical variables.

A statistical model for position measurement might be that the measured position is equal to the theoretical position plus noise. In this model, the accessible variable ’theoretical position’ is a statistical parameter.

These 4 postulates are all that I assume. The first goal of this article is to prove through some mathematical arguments versions of the following theorem:

Theorem 0

Assume that, relative to an observer A in some given context, there exist among other variables two different maximal accessible variables, each taking n values. Assume that these two are not bijective functions of each other. Then there exists an n-dimensional Hilbert space \(\mathscr {H}\) describing the situation, and every accessible variable in this situation will have a unique self-adjoint operator in \(\mathscr {H}\) associated with it.

This Theorem can be seen as my first starting point for developing the quantum formalism from simple postulates. In a more general version of this Theorem, see [5], it is assumed that the two basic variables above are related. This is necessary in the more general case, but in Sect. 7.2 below it is shown to be automatic in the finite-dimensional case above. The property of being related will be defined here:

Definition 2

Let \(\theta\) and \(\eta\) be two maximal accessible variables in some context, and let \(\theta =f(\phi )\) for some function f. If there is a transformation k of \(\Omega _\phi\) such that \(\eta (\phi )=f(k\phi )\), we say that \(\theta\) and \(\eta\) are related (relative to this \(\phi\)). If no such \((\phi ,k)\) can be found, we say that \(\theta\) and \(\eta\) are non-related relative to the variable \(\phi\).

It is easy to show that the property of being related is an equivalence relation. And if \(\theta\) is maximal, it follows from the relationship property that \(\eta\) above also is maximal. Finally, if G is a group acting on \(\Omega _\theta\), there can be defined a corresponding group H acting on \(\Omega _\eta\) by \(h\eta (\phi )=gf(k\phi )\) if \(\eta (\phi )=f(k\phi )\). The mapping from g to h is an isomorphism.
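
On a small finite space, Definition 2 can be explored by brute force. The sketch below assumes that a ‘transformation’ k of \(\Omega _\phi\) means a bijection, and searches all permutations for one with \(\eta (\phi )=f(k\phi )\). The toy space and the variables `eta` and `zeta` are hypothetical, and maximality is not modelled.

```python
from itertools import permutations

def find_relating_transformation(f, eta, omega):
    """Search for a bijection k of omega with eta(phi) == f(k(phi)) for all phi.

    Returns k as a dict, or None if no such bijection exists.
    """
    for image in permutations(omega):
        k = dict(zip(omega, image))
        if all(eta(phi) == f(k[phi]) for phi in omega):
            return k
    return None

# Hypothetical toy space of pairs (i, j) with i, j in {0, 1}.
omega_phi = [(i, j) for i in range(2) for j in range(2)]

f = lambda phi: phi[0]                          # theta = f(phi)
eta = lambda phi: phi[1]                        # second coordinate
zeta = lambda phi: 1 if phi == (0, 0) else 0    # level sets of sizes 1 and 3

k_for_eta = find_relating_transformation(f, eta, omega_phi)
k_for_zeta = find_relating_transformation(f, zeta, omega_phi)
```

A relating k exists for `eta` (swapping the two coordinates is one candidate), so `theta` and `eta` are related in this toy setting. No k exists for `zeta`, because \(f(k\phi )\) always has two level sets of size 2 while `zeta` has level sets of sizes 1 and 3.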

Theorem 0 is identical to Theorem 6 of Sect. 7, which can be deduced from the more general Theorem 4. Note that the crucial assumption is that we have two - in Niels Bohr’s terminology - complementary variables.

Using the above 4 postulates in the finite-dimensional case, further results can be proved, among other things:

  - The eigenvalues of the operator associated with \(\theta\) are the possible values of \(\theta\).

  - The accessible variable \(\theta\) is maximal if and only if all eigenvalues are simple.

  - The eigenspaces of the operator associated with one of several variables, say \(\theta\), are in one-to-one correspondence with questions of the form ‘What is \(\theta\)?/ What will \(\theta\) be if it is measured?’ together with sharp answers ‘\(\theta =u\)’ for some u. In the maximal case, this gives a simple interpretation of eigenvectors.
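
The first two properties can be illustrated in the qubit case. The sketch below is not the construction used in this article: it simply assumes the standard Pauli operator \(a\cdot \sigma\) for a spin component in a unit direction a, and computes the eigenvalues of a 2×2 Hermitian matrix in closed form. The helper names are illustrative.

```python
import math

def hermitian_2x2_eigenvalues(a, b, d):
    """Eigenvalues of [[a, b], [conj(b), d]] with real a and d, in ascending order."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + abs(b) ** 2)
    return (mean - radius, mean + radius)

def spin_operator_entries(direction):
    """Entries (a, b, d) of the standard Pauli operator for a unit direction (ax, ay, az)."""
    ax, ay, az = direction
    return az, complex(ax, -ay), -az

# Spin components in the x and z directions (assumed Pauli representation).
x_eigs = hermitian_2x2_eigenvalues(*spin_operator_entries((1.0, 0.0, 0.0)))
z_eigs = hermitian_2x2_eigenvalues(*spin_operator_entries((0.0, 0.0, 1.0)))

# The identity operator corresponds to a constant variable.
identity_eigs = hermitian_2x2_eigenvalues(1.0, 0.0, 1.0)
```

In both directions the eigenvalues are -1 and +1, the possible values of the spin component, and they are simple. The identity operator has a doubly degenerate eigenvalue, which is what one would expect for a constant, non-maximal variable.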

To show all this in detail requires some further conceptual developments. In particular, for the most general version of my approach, I need:

Postulate 5

One can define a group K of actions on the space \(\Omega _\phi\) associated with \(\phi\). For at least one maximal accessible variable \(\theta\) there is a group G of actions on the associated space \(\Omega _\theta\).

I will also need the following:

Definition 3

The accessible variable \(\theta\) is called permissible with respect to K if the following holds: \(\theta (\phi _1)=\theta (\phi _2)\) implies \(\theta (t\phi _1 )=\theta (t\phi _2 )\) for all group elements \(t\in K\).

With respect to parameters and subparameters along with their estimation, the concept of permissibility is discussed in some detail in Chapter 3 in [25]. The main conclusion, which also is valid in this setting, is that under the assumption of permissibility, one can define a group G of actions on \(\Omega _\theta\) with elements g defined for any \(t\in K\) by

$$\begin{aligned} (g\theta )(\phi ):=\theta (t\phi );\ t\in K. \end{aligned}$$
(2)

Herein I use different notations for the group actions g on \(\Omega _\theta\) and the group actions t on \(\Omega _\phi\); by contrast, the same symbol g was used in [25]. The background for that is

Lemma 1

Assume that \(\theta\) is a permissible variable. The function from K to G defined by (2) is then a group homomorphism.

Proof

See [26]. \(\square\)
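
Definition 3, the induced action (2), and Lemma 1 can be checked on a finite toy example. The sketch below takes \(\Omega _\phi =\{0,\ldots ,5\}\) with K the cyclic group of shifts modulo 6; this choice, and the variables `theta` and `xi`, are assumptions made purely for illustration.

```python
def is_permissible(theta, omega, group):
    """Definition 3: theta(p) == theta(q) implies theta(t(p)) == theta(t(q)) for all t."""
    for t in group:
        for p in omega:
            for q in omega:
                if theta(p) == theta(q) and theta(t(p)) != theta(t(q)):
                    return False
    return True

def induced_action(theta, omega, t):
    """Eq. (2): the map g on the range of theta with g(theta(phi)) = theta(t(phi)).

    Only well defined when theta is permissible.
    """
    return {theta(p): theta(t(p)) for p in omega}

def compose(g1, g2):
    """Composition of two maps stored as dicts: first g2, then g1."""
    return {u: g1[g2[u]] for u in g2}

omega_phi = list(range(6))
# Toy group K: shifts by s modulo 6, indexed by s.
shifts = {s: (lambda p, s=s: (p + s) % 6) for s in range(6)}
K = list(shifts.values())

theta = lambda p: p % 3                 # permissible under shifts
xi = lambda p: 1 if p == 0 else 0       # not permissible under shifts

theta_ok = is_permissible(theta, omega_phi, K)
xi_ok = is_permissible(xi, omega_phi, K)

# Lemma 1 in this toy case: the product of shifts maps to the product of induced maps.
homomorphism_holds = all(
    induced_action(theta, omega_phi, shifts[(s1 + s2) % 6])
    == compose(induced_action(theta, omega_phi, shifts[s1]),
               induced_action(theta, omega_phi, shifts[s2]))
    for s1 in range(6) for s2 in range(6)
)
```

Here `theta` is permissible and the induced group G acts on \(\{0,1,2\}\) by shifts modulo 3, while `xi` fails the condition because a shift can separate two points on which `xi` agrees.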

In general, whether the function \(\theta (\cdot )\) is permissible or not, I will assume that a transitive group G is acting on \(\Omega _\theta\), and this is enough for the general Theorem 4 below, from which Theorem 6/Theorem 0 can be deduced. To prove the above properties of the Hilbert space formulation, I need a further basic result. Theorem 5 of Sect. 7 is the general result, and Theorem 7 is a simpler version, valid in the finite-dimensional case.

Note that my approach here can be seen as fully epistemic. It has to do with an agent seeking knowledge. In the finite-dimensional case, we may concentrate on state vectors that are eigenvectors of some meaningful operator. If this operator is associated with a maximal accessible variable \(\theta\), then in general these state vectors have interpretations as questions-and-answers: First, look at questions of the form: ‘What is \(\theta\)?’ or ‘What will \(\theta\) be if we measure it?’. Then consider sharp answers of the form ‘\(\theta =u\)’, where u is an eigenvalue of the operator corresponding to \(\theta\).

To show this requires some mathematics, given in this article, where a further discussion is also given. What is lacking in this development are arguments for the Schrödinger equation and for the Born formula from simple assumptions. These issues will also be briefly discussed in Sect. 7 below, and are discussed in more detail in [1].

It is crucial for my development that operators corresponding to the accessible theoretical variables are found first. As a consequence of this article I can consider a version of quantum mechanics where a ket vector is only seen as a state vector when it is an eigenvector of some physically meaningful operator.

The development above was limited to a single observer A. Now the same mathematics applies to the following situation: There is a set of communicating observers, and jointly accessible or inaccessible variables in some context are associated with these observers. There may be difficulties, in general, in establishing what the basic inaccessible variable \(\phi\) should be in the latter situation, but at least in the two physical examples above the construction is clear. Through discussions the set of observers can establish their theoretical variables, and find out which of them are accessible. The only difference now is that, in order to secure communication, it must be possible to define the variables in words.

Note that in this whole discussion, I have said nothing about the ontology. I am fully convinced that there exists an external world, but the detailed properties of this world may be outside our ability to find out. Quantum mechanics as a model, although it is a very good model, can sometimes only give partial answers. Ontological aspects of my approach are discussed in [27].

I admit that this approach is unusual and that the postulates may seem a little farfetched to some. However, for an outsider, I will claim that it is much easier to understand these postulates than to jump right into the usual Hilbert space formalism. For those of us who have learned the formalism, the approach may in some sense require some unlearning first.

Possible relationships between my approach and some other approaches towards the foundation of quantum mechanics will be briefly discussed in Sect. 8 below.

4 Group Actions and Measures

Starting with a point \(\theta _0\in \Omega _\theta\), an orbit of a group G acting on \(\Omega _\theta\) is the set \(\{g\theta _0:\ g\in G\}\). It is trivial to see that the orbits are disjoint, and their union is the full space \(\Omega _\theta\). The point \(\theta _0\) may be replaced by any point of the same orbit. In the case of one orbit filling the whole space, the group is said to be transitive.

The isotropy group at a point \(\theta \in \Omega _\theta\) is the set of g such that \(g\theta =\theta\). It is easy to see that this is a group.
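The notions of orbit and isotropy group can be illustrated on a small finite example. The following is only a sketch under assumptions of my own choosing: the space is the finite set {0, ..., 4}, the group is generated by a single hypothetical permutation, and the helper names (`compose`, `generate_group`, `orbit`, `isotropy`) are illustrative.

```python
def compose(p, q):
    """Return the permutation p∘q (q is applied first); permutations are tuples."""
    return tuple(p[q[i]] for i in range(len(q)))


def generate_group(generators):
    """Close a list of permutations under composition (the finite group they generate)."""
    identity = tuple(range(len(generators[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def orbit(group, point):
    """The orbit {g(point) : g in G}."""
    return frozenset(g[point] for g in group)


def isotropy(group, point):
    """The isotropy group {g in G : g(point) = point}."""
    return {g for g in group if g[point] == point}


# Hypothetical example: G is generated by the permutation (0 1 2)(3 4) of {0, ..., 4}.
g0 = (1, 2, 0, 4, 3)
G = generate_group([g0])
orbits = {orbit(G, x) for x in range(5)}

print(sorted(sorted(o) for o in orbits))         # the orbits partition the space
print(len(isotropy(G, 0)), len(isotropy(G, 3)))  # sizes of two isotropy groups
```

In this example the action is not transitive, since there is more than one orbit, and the isotropy groups are closed under composition, as expected of subgroups.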

It is important to define left and right invariant measures, both on the groups and on the spaces of theoretical variables. In the mathematical literature, see for instance [31, 32], Haar measures on the groups are defined (assuming locally compact groups). Right (\(\mu _G\)) and left (\(\nu _G\)) Haar measures on the group G satisfy

$$\begin{aligned} \mu _G(Dg)=\mu _G(D), \ \textrm{and}\ \nu _G(gD)=\nu _G(D)\\ \textrm{for}\ g\in G\ \textrm{and}\ D\subset G,\ \textrm{respectively}. \end{aligned}$$

Next, define the corresponding measures on \(\Omega _\theta\). As is commonly done, I assume that the group operations \((g_1,g_2)\mapsto g_1g_2\), \((g_1,g_2)\mapsto g_2g_1\) and \(g\mapsto g^{-1}\) are continuous. Furthermore, I will assume that the action \((g,\theta )\mapsto g\theta\) is continuous.

As discussed in Wijsman [33], an additional condition is that every inverse image of compact sets under the function \((g,\theta )\mapsto (g\theta ,\theta )\) should be compact. A continuous action by a group G on a space \(\Omega _\theta\) satisfying this condition is called proper. This technical condition turns out to have useful properties and is assumed throughout this paper. When the group action is proper, the orbits of the group can be proved to be closed sets relative to the topology of \(\Omega _\theta\).

The following result, originally due to Weil, is proved in [31, 33]; for more details on the right-invariant case, see also [25].

Theorem 1

The left-invariant measure \(\nu\) on \(\Omega _\theta\) exists if the action of G on \(\Omega _\theta\) is proper and the group is locally compact.

The connection between \(\nu _G\) defined on G and the corresponding left invariant measure \(\nu\) defined on \(\Omega _\theta\) is relatively simple: If for some fixed value \(\theta _0\) of the theoretical variable the function \(\beta\) on G is defined by \(\beta : g\mapsto g\theta _0\), then \(\nu (E)=\nu _G (\beta ^{-1}(E))\). This connection between \(\nu _G\) and \(\nu\) can also be written \(\nu _G(dg)=d\nu (g\theta _0)\), so that \(d\nu (hg\theta _0)=d\nu (g\theta _0)\) for all \(h, g \in G\) if \(\nu\) is left-invariant.

Note that \(\nu\) can be seen as an induced measure on each orbit of G on \(\Omega _\theta\), and it can be arbitrarily normalized on each orbit. \(\nu\) is finite on a given orbit if and only if the orbit is compact. In particular, \(\nu\) can be defined as a probability measure on \(\Omega _\theta\) if and only if all orbits of \(\Omega _\theta\) are compact. Furthermore, \(\nu\) is unique only if the group action is transitive. Transitivity of G as acting on \(\Omega _\theta\) will be assumed throughout this paper.

In a corresponding fashion, a right invariant measure can be defined on \(\Omega _\theta\). This measure satisfies \(d\mu (gh\theta _0)=d\mu (g\theta _0)\) for all \(g,h\in G\). In many cases the left invariant measure and the right invariant measure are equal.
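The construction \(\nu (E)=\nu _G(\beta ^{-1}(E))\) can be checked directly in a finite setting. The sketch below assumes, purely for illustration, that G is the symmetric group on three points acting transitively on \(\{0,1,2\}\), and that \(\nu _G\) is the counting measure, which is both left- and right-invariant on a finite group. The names `nu_G`, `nu` and `act` are hypothetical helpers.

```python
from itertools import combinations, permutations

points = (0, 1, 2)
G = list(permutations(points))   # S_3; each g is a tuple with g[x] the image of x
theta0 = 0                       # fixed reference point


def nu_G(subset_of_G):
    """Counting measure on the finite group."""
    return len(subset_of_G)


def nu(E):
    """Induced measure nu(E) = nu_G(beta^{-1}(E)), where beta(g) = g theta0."""
    return nu_G([g for g in G if g[theta0] in E])


def act(h, E):
    """Image hE of a subset E of the space under the group element h."""
    return frozenset(h[x] for x in E)


subsets = [frozenset(c) for r in range(len(points) + 1) for c in combinations(points, r)]

# Left invariance: nu(hE) = nu(E) for every group element h and every subset E.
left_invariant = all(nu(act(h, E)) == nu(E) for h in G for E in subsets)
print(left_invariant)
```

Since there is a single, finite orbit here, the induced measure is finite and could be normalized to a probability measure.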

5 A Brief Discussion of Group Representation Theory

A group representation of G is a continuous homomorphism from G to the group of invertible linear operators V on some vector space \(\mathscr {H}\):

$$\begin{aligned} V(g_1 g_2 )=V(g_1 )V(g_2 ). \end{aligned}$$
(3)

It is also required that \(V(e)=I\), where I is the identity, and e is the unit element of G. This assures that the inverse exists: \(V(g)^{-1}=V(g^{-1})\). The representation is unitary if the operators are unitary (\(V(g)^{\dagger }V(g)=I\)). If the vector space is finite-dimensional, we have a representation D(V) on the square, invertible matrices. For any representation V and any fixed invertible operator U on the vector space, we can define a new equivalent representation as \(W(g)=UV(g)U^{-1}\). One can prove that two equivalent unitary representations are unitarily equivalent; thus U can be chosen as a unitary operator.
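To make the defining property (3) concrete, here is a minimal sketch that checks it for one simple case. The choice of group (the cyclic group of order 6) and of representation (rotations of the plane) is an assumption made only for illustration, and the helper names are hypothetical.

```python
import math

N = 6  # order of the cyclic group Z_N (illustrative choice)


def V(k):
    """Rotation of the plane by 2*pi*k/N: a two-dimensional representation of Z_N."""
    a = 2 * math.pi * (k % N) / N
    return [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


I2 = [[1.0, 0.0], [0.0, 1.0]]

# Homomorphism property V(g1 g2) = V(g1) V(g2); the group operation is addition mod N.
is_homomorphism = all(close(V(j + k), matmul(V(j), V(k)))
                      for j in range(N) for k in range(N))

# Unitarity: for real matrices the adjoint is the transpose.
is_unitary = all(close(matmul(transpose(V(k)), V(k)), I2) for k in range(N))

print(is_homomorphism, is_unitary)
```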

A subspace \(\mathscr {H}_1\) of \(\mathscr {H}\) is called invariant with respect to the representation V if \(u\in \mathscr {H}_1\) implies \(V(g)u\in \mathscr {H}_1\) for all \(g\in G\). The null-space \(\{0\}\) and the whole space \(\mathscr {H}\) are trivially invariant; other invariant subspaces are called proper. A group representation V of a group G in \(\mathscr {H}\) is called irreducible if it has no proper invariant subspace. A representation is said to be fully reducible if it can be expressed as a direct sum of irreducible subrepresentations. A finite-dimensional unitary representation of any group is fully reducible. In terms of a matrix representation, this means that we can always find a \(W(g)=UV(g)U^{-1}\) such that D(W) is of minimal block diagonal form. Each one of these blocks represents an irreducible representation, and they are all one-dimensional if and only if G is Abelian. The blocks may be seen as operators on subspaces of the original vector space, i.e., the irreducible subspaces. The blocks are important in studying the structure of the group.

A useful result is Schur’s Lemma; see for instance [32]:

Let \(V_1\) and \(V_2\) be two irreducible representations of a group G; \(V_1\) on the space \(\mathscr {H}_1\) and \(V_2\) on the space \(\mathscr {H}_2\). Suppose that there exists a linear map T from \(\mathscr {H}_1\) to \(\mathscr {H}_2\) such that

$$\begin{aligned} V_2 (g)T(v)=T(V_1 (g)v) \end{aligned}$$
(4)

for all \(g\in G\) and \(v\in \mathscr {H}_1\).

Then either T is zero or it is a linear isomorphism. Furthermore, if \(\mathscr {H}_1=\mathscr {H}_2\), then \(T=\lambda I\) for some complex number \(\lambda\).
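Schur's Lemma can be seen at work numerically. In the sketch below, which rests on illustrative assumptions only, I take the two-dimensional irreducible representation of the symmetric group \(S_3\) (the symmetries of a triangle), pick an arbitrary matrix M, and form \(T=\sum _g V(g)MV(g)^{-1}\). By construction T commutes with every \(V(h)\), so the lemma predicts a multiple of the identity.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def close(A, B, tol=1e-9):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Rotations by multiples of 120 degrees and reflections: the 2-dim irreducible rep of S_3.
c, s = math.cos(2 * math.pi / 3), math.sin(2 * math.pi / 3)
R = [[c, -s], [s, c]]
F = [[1.0, 0.0], [0.0, -1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
rotations = [I2, R, matmul(R, R)]
group = rotations + [matmul(r, F) for r in rotations]

M = [[1.0, 2.0], [3.0, 4.0]]   # arbitrary matrix, chosen only for illustration

# T = sum_g V(g) M V(g)^{-1}; the matrices are orthogonal, so the inverse is the transpose.
T = [[0.0, 0.0], [0.0, 0.0]]
for g in group:
    term = matmul(matmul(g, M), transpose(g))
    T = [[T[i][j] + term[i][j] for j in range(2)] for i in range(2)]

commutes = all(close(matmul(g, T), matmul(T, g)) for g in group)
lam = T[0][0]
print(commutes, T)
```

For this choice the scalar equals the group order times the trace of M divided by the dimension.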

Let \(\nu\) be the left-invariant measure of the space \(\Omega _\theta\) induced by the group G, and consider in this connection the Hilbert space \(\mathscr {H}=L^2 (\Omega _\theta ,\nu )\). Then the left-regular representation of G on \(\mathscr {H}\) is defined by \(U^{L}(g)f(\theta )=f(g^{-1}\theta )\). This representation always exists, and it can be shown to be unitary; see [34].
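As a small sketch of the left-regular representation, assume that G is the cyclic group \(Z_N\) acting on itself by addition, with the counting measure as invariant measure. Functions in \(L^2\) are then just lists of N complex numbers. These choices and the names `U` and `inner` are illustrative assumptions.

```python
N = 5  # illustrative group order


def U(g, f):
    """Left-regular action: (U(g) f)(x) = f(g^{-1} x) = f((x - g) mod N)."""
    return [f[(x - g) % N] for x in range(N)]


def inner(f1, f2):
    """Inner product in L^2(Z_N) with respect to the counting measure."""
    return sum(a.conjugate() * b for a, b in zip(f1, f2))


f = [1 + 2j, 0.5 + 0j, -1j, 3 + 0j, 2 - 1j]
h = [0j, 1j, 1 + 0j, -2 + 0j, 0.5 + 0.5j]

# Unitarity: inner products are preserved because the measure is invariant.
preserves_inner = all(abs(inner(U(g, f), U(g, h)) - inner(f, h)) < 1e-12
                      for g in range(N))

# Homomorphism property U(g1 g2) = U(g1) U(g2).
is_homomorphism = all(U((g1 + g2) % N, f) == U(g1, U(g2, f))
                      for g1 in range(N) for g2 in range(N))

print(preserves_inner, is_homomorphism)
```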

If V is an arbitrary representation of a compact group G in some Hilbert space \(\mathscr {H}\), then there exists in \(\mathscr {H}\) a new scalar product defining a norm equivalent to the initial one, relative to which V is a unitary representation of G.

For references to some of the vast literature on group representation theory, see Appendix A.2.4 in [25].

6 The Construction of Operators for the Hypothetical Case of an Irreducible Representation of the Basic Group

In the quantum-mechanical context defined in [1] and discussed above, \(\theta\) is an accessible variable, and one should be able to introduce an operator associated with \(\theta\). The following discussion, which is partly inspired by [34, 35], assumes first an irreducible unitary representation of G on a complex Hilbert space \(\mathscr {H}\). In the next Section, the assumption of irreducibility will be removed, by simply assuming that we have two related maximal accessible variables in the given context.

6.1 A Resolution of the Identity

In the following, I assume that the group G has representations that give square-integrable coherent state systems (see page 43 of [34]). For instance, this is the case for all representations of compact semisimple groups, representations of discrete series for real semisimple groups, and some representations of solvable Lie groups.

Let G be an arbitrary such group, and let \(V(\cdot )\) be one of its unitary irreducible representations acting on a Hilbert space \(\mathscr {H}\). Assume that G is acting transitively on the space \(\Omega _\theta\), and fix \(\theta _0\in \Omega _\theta\). Then every \(\theta \in \Omega _\theta\) can be written as \(\theta =g\theta _0\) for some \(g\in G\). I also assume that the isotropy groups of G are trivial. Then this establishes a one-to-one correspondence between G and \(\Omega _\theta\). In particular, this implies that the group action is proper and a left-invariant measure \(\nu\) on \(\Omega _\theta\) exists; see Theorem 1 above.

Also, fix a vector \(|\theta _0\rangle \in \mathscr {H}\), and define the coherent states \(|\theta \rangle =|\theta (g)\rangle =V(g)|\theta _0\rangle\). With \(\nu\) being the left invariant measure on \(\Omega _\theta\), introduce the operator

$$\begin{aligned} T=\int |\theta (g)\rangle \langle \theta (g)|d\nu (g\theta _0). \end{aligned}$$
(5)

Note that the measure here is over \(\Omega _\theta\), but the elements are parametrized by G. T is assumed to be a finite operator.

Lemma 2

T commutes with every \(V(h); h\in G\).

Proof

\(\ \ V(h)T=\)

$$\begin{aligned} \int V(h) |\theta (g)\rangle \langle \theta (g)|d\nu (g\theta _0) =\int |\theta (hg)\rangle \langle \theta (g)|d\nu (g\theta _0)\\=\int |\theta (r)\rangle \langle \theta (h^{-1}r)|d\nu (h^{-1}r\theta _0 ). \end{aligned}$$

Since \(|\theta (h^{-1}r)\rangle =V(h^{-1}r)|\theta _0\rangle =V(h^{-1})V(r)|\theta _0\rangle =V(h)^\dagger |\theta (r)\rangle\), we have \(\langle \theta (h^{-1}r)|=\langle \theta (r)|V(h)\), and since the measure \(\nu\) is left-invariant, it follows that \(V(h)T=TV(h)\). \(\square\)

From the above and Schur’s Lemma, it follows that \(T=\lambda I\) for some \(\lambda\). Since T by construction only can have positive eigenvalues, we must have \(\lambda >0\). Defining the measure \(d\rho (\theta )=\lambda ^{-1}d\nu (\theta )\) we therefore have the important resolution of the identity

$$\begin{aligned} \int |\theta \rangle \langle \theta |d\rho (\theta )=I. \end{aligned}$$
(6)

For a more elaborate similar construction taking into account the isotropy subgroups, see Chapter 2 of [35]. In [5] a corresponding resolution of the identity is derived for states defined through representations of the group K acting on \(\Omega _\phi\).
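The construction leading to (6) can be reproduced numerically in a small case. The following is a sketch under assumptions of my own: G is the symmetric group \(S_3\) in its two-dimensional irreducible representation, the invariant measure is the counting measure, and the reference vector \(|\theta _0\rangle\) is an arbitrarily chosen unit vector for which the isotropy group is trivial.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def apply(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def close(A, B, tol=1e-9):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


c, s = math.cos(2 * math.pi / 3), math.sin(2 * math.pi / 3)
R = [[c, -s], [s, c]]
F = [[1.0, 0.0], [0.0, -1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
rotations = [I2, R, matmul(R, R)]
group = rotations + [matmul(r, F) for r in rotations]

theta0 = [math.cos(0.3), math.sin(0.3)]          # arbitrary reference vector
states = [apply(g, theta0) for g in group]       # coherent states V(g)|theta0>

# T = sum over the group of |theta(g)><theta(g)|, cf. (5) with the counting measure.
T = [[sum(v[i] * v[j] for v in states) for j in range(2)] for i in range(2)]

commutes = all(close(matmul(h, T), matmul(T, h)) for h in group)   # Lemma 2
lam = T[0][0]                                                       # T = lam * I
resolution = [[T[i][j] / lam for j in range(2)] for i in range(2)]  # cf. (6)
distinct = len({(round(v[0], 9), round(v[1], 9)) for v in states})

print(commutes, lam, distinct)
```

Here the six coherent states are all different, so they are in one-to-one correspondence with the group elements, and dividing the counting measure by \(\lambda\) gives the resolution of the identity.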

6.2 Simple Quantum Operators

Let now \(\theta\) be a maximal accessible variable and let G be a group acting on \(\theta\), satisfying the requirements of the last subsection.

In general, an operator corresponding to \(\theta\) may be defined by

$$\begin{aligned} A^\theta =\int \theta |\theta \rangle \langle \theta | d\rho (\theta ). \end{aligned}$$
(7)

\(A^\theta\) is defined on a domain \(D(A^\theta )\) of vectors \(|v\rangle \in \mathscr {H}\) where the integral defining \(\langle v|A^\theta |v\rangle\) converges.

This mapping from an accessible variable \(\theta\) to an operator A has the following properties:

  (i) If \(\theta =1\), then \(A^\theta =I\).

  (ii) If \(\theta\) is real-valued, then \(A^\theta\) is symmetric (for a definition of this concept for operators and its relationship to self-adjointness, see [36]).

  (iii) The change of basis through a unitary transformation is straightforward.
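Properties (i) and (ii) can be checked in a finite sketch of the definition (7). As an illustrative assumption, the coherent states are taken to be six unit vectors in the plane at angles \(\pm t+2\pi k/3\), which is the orbit of one reference vector under the two-dimensional irreducible representation of the dihedral group of order 6. The measure \(\rho\) is then the counting measure divided by \(\lambda =3\). The function names are hypothetical.

```python
import math

t = 0.3                                   # angle of the reference vector (arbitrary)
angles = [sign * t + 2 * math.pi * k / 3 for sign in (1, -1) for k in range(3)]
states = [(math.cos(a), math.sin(a)) for a in angles]
lam = len(states) / 2                     # group order / dimension


def operator(values):
    """A = sum_i values[i] |theta_i><theta_i| / lam, a finite analogue of (7)."""
    A = [[0j, 0j], [0j, 0j]]
    for value, v in zip(values, states):
        for i in range(2):
            for j in range(2):
                A[i][j] += value * v[i] * v[j] / lam
    return A


def is_hermitian(A, tol=1e-12):
    return all(abs(A[i][j] - A[j][i].conjugate()) < tol
               for i in range(2) for j in range(2))


identity = operator([1.0] * 6)                          # property (i)
real_case = operator([0.5, -1.0, 2.0, 3.0, 0.0, 1.5])   # property (ii)
complex_case = operator([1j, 0, 0, 0, 0, 0])            # a non-real variable

print(identity)
print(is_hermitian(real_case), is_hermitian(complex_case))
```

In this finite-dimensional setting symmetric and self-adjoint coincide, so the distinction referred to in (ii) only becomes relevant for unbounded operators.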

For further important properties, we need some more theory. First, consider the situation where we regard the group G as generated by a group K defined on the space of an inaccessible variable \(\phi\). This represents no problem if the mapping from \(\phi\) to \(\theta\) is permissible, a case discussed in [5], and in this case, the operators corresponding to several accessible variables can be defined on the same Hilbert space. In the opposite case, we have the following theorem.

Theorem 2

Let H be the subgroup of K consisting of all transformations h such that \(\theta (h\phi )=g\theta (\phi )\) for some \(g\in G\). Then H is the maximal group under which the variable \(\theta\) is permissible.

Proof

Let \(\theta (\phi _1) =\theta (\phi _2)\) for all \(\theta \in \Theta\). Then for \(h\in H\) we have \(\theta (h\phi _1)=g\theta (\phi _1)=g\theta (\phi _2)=\theta (h\phi _2)\), thus \(\theta\) is permissible under the group H. For a larger group, this argument does not hold. \(\square\)

Next look at the mapping from \(\theta\) to \(A^\theta\) defined by (7).

Theorem 3

For \(g\in G\), \(V(g^{-1})AV(g)\), with \(A=A^\theta\), is the operator that (7) assigns to the variable \(\theta '=g\theta\).

Proof

\(V(g^{-1})AV(g)=\)

$$\begin{aligned} \int \theta |g^{-1}\theta \rangle \langle g^{-1}\theta |d\rho (\theta )=\int g\theta |\theta \rangle \langle \theta |d\rho (g\theta ). \end{aligned}$$

Use the left invariance of \(\rho\). \(\square\)
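The covariance property of Theorem 3 can be verified numerically in a finite sketch. The assumptions here are mine and only illustrative: G is the cyclic group \(Z_N\), the unitary representation is diagonal with roots of unity on the diagonal, the reference vector has all components equal, and a value of \(\theta\) is attached to each group element r, with g acting by shifting r to r+g. With these choices the coherent states form an orthonormal basis, so \(\rho\) is the counting measure.

```python
import cmath

N = 4
omega = cmath.exp(2j * cmath.pi / N)


def U(g):
    """Assumed unitary representation of Z_N: diag(omega**(j*g)), j = 0..N-1."""
    return [[omega ** (j * g) if i == j else 0j for j in range(N)] for i in range(N)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def dagger(A):
    return [[A[j][i].conjugate() for j in range(N)] for i in range(N)]


def state(g):
    """Coherent state U(g)|theta0> with |theta0> = (1, ..., 1)/sqrt(N)."""
    return [omega ** (j * g) / N ** 0.5 for j in range(N)]


def operator(values):
    """A = sum_r values[r] |theta(r)><theta(r)|."""
    A = [[0j] * N for _ in range(N)]
    for r in range(N):
        v = state(r)
        for i in range(N):
            for j in range(N):
                A[i][j] += values[r] * v[i] * v[j].conjugate()
    return A


def close(A, B, tol=1e-9):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(N) for j in range(N))


values = [0.5, 1.5, -2.0, 3.0]          # hypothetical values of theta, indexed by r
A = operator(values)
g = 1
lhs = matmul(matmul(dagger(U(g)), A), U(g))           # V(g^{-1}) A V(g)
rhs = operator([values[(r + g) % N] for r in range(N)])  # operator of theta' = g theta

print(close(lhs, rhs))
```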

Further properties of the mapping from \(\theta\) to A may be developed in a similar way. The mapping corresponds to the usual way that the operators are allocated to observables in the quantum mechanical literature. But note that this mapping comes naturally here from the notions of theoretical variables and accessible variables on which group actions are defined.

7 The Main Theorems and Theoretical Results

7.1 The General Case

Up to now, I have assumed an irreducible representation of the group G. A severe problem with this, however, is that the group G in many applications is Abelian, and Abelian groups have only one-dimensional irreducible representations. Then the above theory is trivial.

In [5] this problem is solved by taking as a point of departure two different related maximal accessible variables \(\theta\) and \(\eta\). Recall here the meaning of the word ‘different’ discussed in Section 3. The main result is then as follows.

Theorem 4

Consider a context where there are two different related maximal accessible variables \(\theta\) and \(\eta\). Assume that both \(\theta\) and \(\eta\) are real-valued or real vectors, taking at least two values. Make the following additional assumptions:

  (i) On one of these variables, \(\theta\), there can be defined a transitive group of actions G with a trivial isotropy group and with a left-invariant measure \(\rho\) on the space \(\Omega _\theta\).

  (ii) There exists a unitary multi-dimensional representation \(U(\cdot )\) of the group behind the group actions G such that for some fixed \(|\theta _0\rangle\) the coherent states \(U(g)|\theta _0\rangle\) are in one-to-one correspondence with the values of g and hence with the values of \(\theta\).

Then there exists a Hilbert space \(\mathscr {H}\) connected to the situation, and to every (real-valued or vector-valued) accessible variable there can be associated a symmetric operator on \(\mathscr {H}\).

For conditions under which a symmetric operator is self-adjoint/Hermitian, see [36].

The crucial point in the proof of Theorem 4 is to construct a group N acting on the vector \(\psi =(\theta , \eta )\), and then a representation \(W(\cdot )\) of N which I prove is irreducible. The coherent states \(|v_n\rangle =W(n)|v_0\rangle\) are then in one-to-one correspondence with \(n\in N\). For the details of all this, I refer to Appendix 1 and to [5].

This gives the crucial identity

$$\begin{aligned} \int |v_n\rangle \langle v_n |\mu (dn) = I, \end{aligned}$$
(8)

where \(\mu\) is a left-invariant measure on the group N.

One can show that there is a function \(f_\theta\) on N such that \(\theta =f_\theta (n)\), and a function \(f_\eta\) on N such that \(\eta =f_\eta (n)\). We can now define operators corresponding to \(\theta\) and \(\eta\):

$$\begin{aligned} A^\theta= & {} \int f_\theta (n)|v_n\rangle \langle v_n |\mu (dn), \end{aligned}$$
(9)
$$\begin{aligned} A^\eta= & {} \int f_\eta (n)|v_n\rangle \langle v_n |\mu (dn). \end{aligned}$$
(10)

The properties (i)-(iii) of Subsection 6.2 can now be proved for the operators \(A^\theta\) and \(A^\eta\). All proofs are in Appendix 1.

Note that any pair of related maximal accessible variables may be used as a basis for Theorem 4. Accessible variables that are not maximal, can always be seen as functions of a maximal variable by postulate 4. Hence for these variables, the spectral theorem may be used, and operators constructed as in the last part of Appendix 1.

An essential part of the proof of Theorem 4 is to prove that if \(U(\cdot )\) is a representation of G which is not irreducible, then \(W(\cdot )\) is an irreducible representation of N. In order to carry out this part of the proof, I need a representation \(U(\cdot )\) which is at least three-dimensional, so that it can be reduced to a lower-dimensional space if not irreducible, and similarly the representation of the corresponding group H acting upon \(\eta\) must be at least three-dimensional. (The two-dimensional case, the qubit case, is treated separately in [1].)

The transformation k defining \(\eta =f(k\phi )\) from \(\theta =f(\phi )\) cannot be just the trivial one interchanging \(\theta\) and \(\eta\), taken together with the assumption that the group G acting upon \(\theta\) is just the identity. This is clear, since if such a trivial interchange was allowed in this case, every pair of variables would be related by the above definitions. Note, however, that according to Definition 2, the notion of being related depends on the inaccessible variable \(\phi\). If we for instance take \(\phi\) as the vector (position, momentum), and have a sufficiently large group G connected to \(\theta\)=position (say, the translation group), then an interchange of position and momentum is permitted relative to this \(\phi\). More concretely, this interchange will involve a Fourier transform, as in the ordinary theory.

To complete the construction of the usual Hilbert space formalism from the mathematical model of Sects. 2 and 5, I need a further main theorem.

Theorem 5

Assume that the functions \(\theta (\cdot )\) and \(\eta (\cdot )\) are permissible with respect to a group K acting on \(\Omega _\phi\). Assume that K is transitive and has a trivial isotropy group. Let \(T(\cdot )\) be a unitary representation of K such that the coherent states \(T(t)|\psi _0\rangle\) are in one-to-one correspondence with t. For any transformation \(t\in K\) and any such unitary representation T of K, the operator \(T(t)^\dagger A^\theta T(t)\) is the operator corresponding to \(\theta '\) defined by \(\theta '(\phi )=\theta (t\phi )\).

In addition, if \(\theta\) and \(\eta\) are different, but related through a transformation k of \(\Omega _\phi\), there is a unitary operator S(k) such that \(A^\eta = S(k)^\dagger A^\theta S(k)\).

This is also proved in Appendix 1.

One final remark to the developments above: The above theorems have so far been connected to a single observer A and the mathematical model of Sect. 3. But the same arguments can be used with the following model: Assume a set of communicating observers and assume that these have defined joint variables that may be accessible or inaccessible to the set of observers. Then the same mathematics is valid, and the same mathematical/physical examples of variables may be used.

7.2 The Case Where the Maximal Accessible Variable Takes a Finite Number of Values

I will show here first that if \(\theta\) takes a finite number n of values, then we can choose G and k such that the symmetry assumptions of Theorem 4 are satisfied. This leads to a simplification of the theory. The case \(n=2\), the qubit case, is discussed separately in [1]; see Subsection 4.5.3 and Section 5.2 there. I will here assume \(n\ge 3\), which in fact can be shown to be needed in the proof in Appendix 1 of Theorem 4.

In the finite case, it is crucial that the reducibility of the representation \(U(\cdot )\) is permitted. Concretely, let G be the cyclic group acting on the distinct values \(u_1,...,u_n\) of \(\theta\), that is, the group generated by the element \(g_0\) such that \(g_0 u_i =u_{i+1}\) for \(i=1,..., n-1\) and \(g_0 u_n = u_1\). This is an Abelian group, which only has one-dimensional irreducible representations. However, we can define \(U(\cdot )\) as taking values as diagonal unitary \(n\times n\) matrices with the different complex nth roots of unity on the diagonal. For the specific matrix \(U(g_0)\), take these nth roots in their natural order, and then let every element of G be mapped into the diagonal matrices \(U(\cdot )\) by the corresponding cyclical permutation.

It is easy to see then that the coherent states \(U(g)|\theta _0\rangle\) are in one-to-one correspondence with the group elements \(g\in G\) when \(|\theta _0\rangle\) is a unit vector with one element equal to 1 and the others zero, and this can be generalized to any \(|\theta _0\rangle\). Also, G is transitive in its range and has a trivial isotropy group.
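The finite construction can be explored in a few lines. The sketch below is hedged in several ways: it labels the values of \(\theta\) by 0, ..., n-1 rather than by \(u_1,...,u_n\), it assumes that \(U(g_0^k)=U(g_0)^k\) with \(U(g_0)\) the diagonal matrix of nth roots of unity in natural order, and it tries two reference vectors of my own choosing (a basis vector sitting at a primitive root, and the vector with all components equal).

```python
import cmath

n = 5
omega = cmath.exp(2j * cmath.pi / n)


def g0(i):
    """Generator of the cyclic group acting on the value labels 0, ..., n-1."""
    return (i + 1) % n


def U_diag(k):
    """Diagonal of U(g0**k), assumed equal to U(g0)**k."""
    return [omega ** (j * k) for j in range(n)]


def coherent_states(theta0):
    """The vectors U(g0**k)|theta0> for k = 0, ..., n-1."""
    return [[U_diag(k)[j] * theta0[j] for j in range(n)] for k in range(n)]


def count_distinct(vectors, digits=9):
    keys = {tuple((round(z.real, digits), round(z.imag, digits)) for z in v)
            for v in vectors}
    return len(keys)


def inner(a, b):
    return sum(x.conjugate() * y for x, y in zip(a, b))


# Transitivity and trivial isotropy of the cyclic action on the labels.
orbit_of_0 = set()
i = 0
for _ in range(n):
    orbit_of_0.add(i)
    i = g0(i)

e1 = [0j] * n
e1[1] = 1 + 0j                              # basis vector at the diagonal entry omega
uniform = [complex(n ** -0.5)] * n          # all components equal

print(len(orbit_of_0))
print(count_distinct(coherent_states(e1)), count_distinct(coherent_states(uniform)))
```

For the uniform reference vector the coherent states are, in addition, mutually orthogonal, so they form an orthonormal basis of the n-dimensional space.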

Thus the only assumption of Theorem 4 that is left to verify, is the assumption that \(\eta\) can be found as a related variable to \(\theta\), that is, the existence of an inaccessible variable \(\phi\) and a transformation k in the corresponding space \(\Omega _\phi\) such that \(\eta (\phi )=\theta (k\phi )\).

To this end, let \(\Omega _\phi\) be the three-dimensional unit sphere, plot the values of \(\eta\) along the equator E, and the values of \(\theta\) along the great circle F containing the south pole and the north pole. See Fig. 1.

Fig. 1 The construction of the transformation k

Without loss of generality, we can let the values of \(\theta\) and \(\eta\) be equidistant. (If this is not the case, we can define new variables \(\theta\) and \(\eta\) by taking one-to-one functions. If operators are proved to exist for these functions, operators of the original variables can be constructed from the spectral theorem.) If these values are plotted in a corresponding way, we can transform the values of \(\theta\) onto the values of \(\eta\) by a \(90^\circ\) rotation k of the sphere as indicated in the figure. Thus \(\eta (\phi )=\theta (k\phi )\). This implies that all the symmetry assumptions of Theorem 4 are satisfied, and we simply have the following.

Theorem 6

Assume Postulate 1 to Postulate 4 of Sect. 3, and that there exist two different maximal accessible variables \(\theta\) and \(\eta\), each taking n values, and not being one-to-one functions of each other. Then, there exists an n-dimensional Hilbert space \(\mathscr {H}\) describing the situation, and every accessible variable in this situation will have an associated self-adjoint operator in \(\mathscr {H}\).
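The geometric construction of the transformation k used before Theorem 6 can be sketched in coordinates. The coordinates are my own assumption: the equator E is placed in the xy-plane, the great circle F through the poles in the xz-plane, and k is the rotation by 90 degrees about the x-axis. The n equidistant points on each circle stand for the values of \(\eta\) and \(\theta\), respectively.

```python
import math

n = 6  # number of values (illustrative)


def on_equator(i):
    """i-th equidistant point on the equator E (values of eta)."""
    a = 2 * math.pi * i / n
    return (math.cos(a), math.sin(a), 0.0)


def on_meridian(i):
    """i-th equidistant point on the great circle F through the poles (values of theta)."""
    a = 2 * math.pi * i / n
    return (math.cos(a), 0.0, math.sin(a))


def k(p):
    """Rotation of the sphere by 90 degrees about the x-axis: (x, y, z) -> (x, -z, y)."""
    x, y, z = p
    return (x, -z, y)


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# k carries the i-th point of E to the i-th point of F, which is the content of
# eta(phi) = theta(k phi) for phi on the equator.
max_error = max(dist(k(on_equator(i)), on_meridian(i)) for i in range(n))
print(max_error)
```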

According to Postulate 5 of Sect. 3, there always exists a group K acting on \(\Omega _\phi\). The question is now whether this group can be constructed such that the symmetry assumptions of Theorem 5 also are satisfied. Take as a basis the cyclic group G acting on the values of \(\theta\), and let a corresponding group H act on the values of \(\eta\). Without loss of generality, assume these values to be equidistant. Construct K, acting on two copies of the sphere as in Fig. 2, as follows: Make a grid on both spheres by discretizing the longitude angle \(\lambda\) and the latitude angle \(\phi\) as in the figure (not to be confused with the basic inaccessible variable \(\phi\)). In the first copy, this includes the lines E (representing \(\theta\)) and F (representing \(\eta\)). In the second copy, E (\(\theta\)) and F (\(\eta\)) are switched. The states are the intersections of meridians and latitude circles. In the first copy, let G act on the latitude circles and H act on the meridians. In the second copy, G and H are switched. This gives a version of \(G\otimes H\) acting on both copies. We let K consist of this and the following elements: From the North pole and the South pole of both copies there is a group action element j going from one copy to an arbitrarily chosen state of the other. This state can be chosen by a uniform distribution.

Fig. 2 The construction of the group K, acting on grids on two copies of the sphere

The representation \(U(\cdot )\) of G is taken as above, we construct a similar representation \(V(\cdot )\) of H, and we take \(T(K)=W(K)\) to be the irreducible group used in the proof of Theorem 4, where K is extended as above.

Using this geometry, the following must be proved, but proofs are omitted here:

  (1) K is transitive on its values and has a trivial isotropy group.

  (2) \(\theta (\cdot )\) and \(\eta (\cdot )\) are permissible functions of the state variable \(\phi\) with respect to K.

  (3) Taking as \(|\psi _0\rangle\) one of the points where the great circles E and F intersect, the coherent states \(T(t)|\psi _0\rangle\) are in one-to-one correspondence with the group elements \(t\in K\).

  (4) \(T(\cdot )\) is unitary and irreducible.

From this, the conditions of the first part of Theorem 5 are satisfied, and we have:

Theorem 7

Assume Postulate 1 to Postulate 4 of Sect. 3, and that there exist two different accessible variables \(\theta\) and \(\eta\), each taking n values. Let \(T(\cdot )\) be the unitary representation of K defined above. Then for any transformation \(t\in K\), the operator \(T(t)^\dagger A^\theta T(t)\) is the operator corresponding to \(\theta '\) defined by \(\theta '(\phi )=\theta (t\phi )\).

For the special transformation k above, we have \(A^\eta =S(k)^\dagger A^\theta S(k)\) for some unitary matrix S(k).

From these two theorems follows a rich class of results, as discussed in detail in [1]. (The first part of Theorem 7 is not needed for these results.)

  - Every accessible variable has a self-adjoint operator connected to it.

  - The set of eigenvalues of the operator is equal to the set of possible values of the variable.

  - An accessible variable is maximal if and only if all eigenvalues are simple.

  - The eigenvectors can, in the maximal case, be interpreted in terms of a question together with its answer. Specifically, this means that in a context with several variables, a chosen maximal variable \(\theta\) may be identified with the question ‘What will \(\theta\) be if we measure it?’ and a specific eigenvector of \(A^\theta\), corresponding to the eigenvalue u may be identified with the answer ‘\(\theta =u\)’.

  - In the general case, eigenspaces have the same interpretation.

  - The operators of related variables are connected by a unitary similarity transform.

For the proofs of the second and third statements above, see Appendix 2. The third and fourth statements can be taken as a basis for my proposal of a version of quantum mechanics: A ket vector is only seen as a valid state vector if it is an eigenvector of a physically meaningful operator. This requires a separate discussion of the superposition principle.
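Some of the listed results can be illustrated by a small numerical sketch. The assumptions are mine: \(\theta\) takes three hypothetical distinct values, \(A^\theta\) is diagonal in the standard basis, and the unitary matrix S connecting the two related variables is taken to be the normalized discrete Fourier matrix, which is only one possible choice.

```python
import cmath

n = 3
omega = cmath.exp(2j * cmath.pi / n)


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(A):
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]


u = [-1.0, 0.5, 2.0]                                    # hypothetical values of theta
A_theta = [[complex(u[i]) if i == j else 0j for j in range(n)] for i in range(n)]
S = [[omega ** (i * j) / n ** 0.5 for j in range(n)] for i in range(n)]  # assumed unitary
A_eta = matmul(matmul(dagger(S), A_theta), S)           # unitary similarity transform


def eigen_residual(k):
    """Size of A_eta v - u[k] v, where v is the k-th column of S^dagger."""
    Sd = dagger(S)
    v = [Sd[i][k] for i in range(n)]
    Av = [sum(A_eta[i][j] * v[j] for j in range(n)) for i in range(n)]
    return max(abs(Av[i] - u[k] * v[i]) for i in range(n))


is_hermitian = all(abs(A_eta[i][j] - A_eta[j][i].conjugate()) < 1e-12
                   for i in range(n) for j in range(n))
residuals = [eigen_residual(k) for k in range(n)]
print(is_hermitian, max(residuals))
```

The operator \(A^\eta\) is self-adjoint, it has the same eigenvalues as \(A^\theta\), and since these are distinct, all eigenvalues are simple, in line with the statement about maximal variables.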

In my theory, the only valid state vectors are related to some variable \(\eta\) and the possible values \(v_k\) of this \(\eta\). Assume for simplicity that \(\eta\) is maximal. Then the statement \(\eta =v_k\) corresponds to an eigenvector \(|v_k\rangle\) of \(A^\eta\), with the resolution of the identity

$$\begin{aligned} \sum _k |v_k\rangle \langle v_k |=I. \end{aligned}$$
(11)

This implies a large class of possible new state vectors

$$\begin{aligned} | \psi \rangle = \sum _k |v_k\rangle \langle v_k | \psi \rangle = \sum _k \langle v_k | \psi \rangle | v_k \rangle = \sum _k c_k |v_k\rangle , \end{aligned}$$
(12)

but it does not follow from this that every superposition of the orthogonal ket vectors \(|v_k\rangle\) can be written in this way.

Note that \(|\psi \rangle\) also has an interpretation in terms of some variable \(\theta\) and some statement \(\theta =u\). The physical situation may be that we know something about \(\theta\) or know something about \(\eta\), but we can also be without such knowledge. The most definite statement about knowledge of \(\eta\) is of the form \(\eta =v\), but we may also know probability statements of the form \(\{ \pi (v_k) \}\), which is formalized by a density matrix

$$\begin{aligned} \rho = \sum _k \pi (v_k) |v_k\rangle \langle v_k | \end{aligned}$$
(13)

As is well known, this leads by Born’s formula to probability statements for \(\theta\):

$$\begin{aligned} P[\theta =u]=\langle \psi | \rho |\psi \rangle =\sum _k \pi (v_k) |\langle \psi |v_k\rangle |^2. \end{aligned}$$
(14)
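The two expressions in (14) can be compared numerically. In the sketch below, the orthonormal basis \(\{|v_k\rangle \}\), the probabilities \(\pi (v_k)\) and the vector \(|\psi \rangle\) are hypothetical choices made only for illustration.

```python
import cmath

n = 3
omega = cmath.exp(2j * cmath.pi / n)

# An orthonormal basis |v_k> (here the Fourier basis, an arbitrary choice).
basis = [[omega ** (j * k) / n ** 0.5 for j in range(n)] for k in range(n)]
pi = [0.5, 0.3, 0.2]                      # hypothetical probabilities pi(v_k)
psi = [0.6 + 0j, 0.8j, 0j]                # normalized vector associated with theta = u


def inner(a, b):
    return sum(x.conjugate() * y for x, y in zip(a, b))


# Density matrix rho = sum_k pi(v_k) |v_k><v_k|, cf. (13).
rho = [[sum(pi[k] * basis[k][i] * basis[k][j].conjugate() for k in range(n))
        for j in range(n)] for i in range(n)]

# Left-hand form of (14): <psi| rho |psi>.
p_matrix = sum(psi[i].conjugate() * rho[i][j] * psi[j]
               for i in range(n) for j in range(n))

# Right-hand form of (14): sum_k pi(v_k) |<psi|v_k>|^2.
p_sum = sum(pi[k] * abs(inner(psi, basis[k])) ** 2 for k in range(n))

print(p_matrix.real, p_sum)
```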

Going back to the superposition principle: Let \(| \alpha \rangle\) and \(| \beta \rangle\) be two different state vectors. Then \(| \alpha \rangle\) can be connected to a statement \(\gamma =u\) for some theoretical variable \(\gamma\), or equivalently, \(c(\gamma )=c(u)\) for any bijective c, and \(| \beta \rangle\) can be connected to a statement \(\xi =v\) for some theoretical variable \(\xi\), or equivalently, \(d(\xi )=d(v)\) for any bijective d. By Postulate 4, if \(\gamma\) and \(\xi\) are not maximal, they are functions, say \(f_1\) and \(f_2\), of some maximal variables, say \(\theta\) and \(\eta\), respectively.

$$\begin{aligned} \gamma = f_1(\theta ),\ \ \xi =f_2 (\eta ). \end{aligned}$$
(15)

We have two possibilities. Either \(\theta\) and \(\eta\) are bijective functions of each other. Then every state connected to them can be expressed in the same basis \(\{ | v_k\rangle \}\). Or they are different. Then by Theorem 6 they can be taken to construct the Hilbert space and the necessary operators.

In either case we have by (9) and (10) and the spectral theorem,

$$\begin{aligned} a A^\gamma + b A^\xi = \sum _k (af_1(u_k)+bf_2(v_k)) |v_k\rangle \langle v_k |. \end{aligned}$$
(16)

This is the operator of the theoretical variable \(\lambda =a\gamma +b\xi\), or more generally \(\lambda =ac(\gamma )+bd(\xi )\) for some bijective mappings c and d, and every state \(|w\rangle\) associated with \(\lambda =w\) can be expressed as

$$\begin{aligned} |w\rangle =\sum _k r_k |v_k\rangle . \end{aligned}$$
(17)

Postulate 6

If \(\{u_i \}; i\in I\) are the possible values of the accessible variable \(\gamma\), \(\{ v_j\}; j\in J\) are the possible values of the accessible variable \(\xi\), and \(\lambda = f(\gamma ,\xi )\) is accessible, then the possible values of \(\lambda\) are contained in \(\{f(u_i, v_j)\}; i\in I, j\in J\).

Theorem 8

Assume that \(|\alpha \rangle\) and \(|\beta \rangle\) are possible state vectors, and \(|\alpha \rangle\) can be associated with an event \(f_1(\theta )=u\), and \(|\beta \rangle\) can be associated with an event \(f_2(\eta )=v\). Here \(\theta\) and \(\eta\) are two different meaningful maximal accessible variables. Then \(a|\alpha \rangle +b|\beta \rangle\), \((a\ne 0, b\ne 0)\) is a possible state vector if and only if there exist bijective functions c and d such that \(\lambda = ac(f_1(\theta ))+bd(f_2(\eta ))\) is a meaningful variable.

Proof

As above, if \(|\alpha \rangle\) is interpreted as \(\gamma =u\), and \(|\beta \rangle\) is interpreted as \(\xi =v\), this implies an interpretation of \(a|\alpha \rangle +b|\beta \rangle\) as \(\lambda = ac(\gamma ) + bd(\xi )= w = au+bv\) for some bijective mappings c and d, and this \(\lambda\) is of the desired form. It is left to prove the ‘only if’ part. To this end, assume that \(\theta\), \(\eta\) and \(\lambda\) have the assumed properties. Without loss of generality, let c and d be identities. By Postulate 4 there exists a maximal variable \(\mu\) such that \(\lambda\) is a function of \(\mu\). Since \(\theta\) and \(\eta\) are different, and \(\mu\) must involve \(\eta\) in a non-trivial way, \(\theta\) and \(\mu\) must be different. Then by Theorem 6 we can construct a Hilbert space based upon \(\theta\) and \(\mu\) together with the operator \(A^\mu\), and the operator \(A^\lambda\) corresponding to the meaningful variable \(\lambda\) then exists by the spectral theorem. Let \(\{w_k\}\) be the possible values of \(\lambda\).

Associated with \(\lambda =w_k\) there must according to Postulate 6 be an \(i=i(k)\) and a \(j=j(k)\) such that \(w_k =au_{i(k)}+bv_{j(k)}\). Here \(u_{i(k)}\) is a possible value of \(\gamma\), an eigenvalue of \(A^\gamma\) with a corresponding eigenvector \(|\alpha _{i(k)}\rangle\), and \(v_{j(k)}\) is a possible value of \(\xi\), an eigenvalue of \(A^\xi\) with a corresponding eigenvector \(|\beta _{j(k)}\rangle\). This gives that each \(a|\alpha _{i(k)}\rangle +b|\beta _{j(k)}\rangle\) is a possible state vector, corresponding to \(\lambda =w_k\). \(\square\)

Note the assumption in Theorem 8. For instance, in the Schrödinger cat example, let \(|\alpha _1\rangle\) be the state corresponding to a cat which is known to be dead, and let \(|\beta _1\rangle\) correspond to a cat which is known to be alive. Let \(|\alpha _2\rangle\) and \(|\beta _2\rangle\) correspond to the complements of these two events. Denote the indicators of the mentioned events \(\gamma _i\) and \(\xi _j\).

An observer outside the box knows nothing, and is associated with the state \(|\alpha _2\rangle \otimes |\beta _2\rangle\), corresponding to \(\gamma _2 = \xi _2 =1\). Any superposition of the states \(|\alpha _i\rangle\) and \(|\beta _j\rangle\) is for him meaningless. An observer inside the box, wearing a gas mask, will know the answers, and is associated with the state given by \(\gamma _1=1\) or \(\xi _1=1\). Again, superposition is meaningless. The two observers will agree on the status of the cat once the door to the box is opened.

It is crucial that this full theory follows by assuming only, in addition to the simple Postulates 1 to 4 of Sect. 3, that two different maximal accessible variables exist; in Niels Bohr’s terminology, that two different complementary variables exist. Born’s formula requires additional postulates, as briefly discussed in Sect. 7.6, and more fully in [1].

7.3 The Case of Position and Momentum of a Particle

It is also of interest to develop further the basis of quantum theory for the general case where \(\theta\) and \(\eta\) are continuous theoretical variables, but this is outside the scope of the present paper. It is, however, now fairly straightforward to complete the theory for an important special case: Let \(\theta\) be the theoretical position of some particle and let \(\eta\) be its theoretical momentum. I choose the accessible variables to be such theoretical variables and assume that a measurement consists of a theoretical value plus a measurement error. This is similar to how measurements are modeled in statistics.

The simplest approach is the following: Approximate \(\theta\) with an n-valued variable \(\theta _n\), find an operator \(A_n\) corresponding to \(\theta _n\), and let n tend to infinity. This approach is carried out in Section 5.3 in [1]. It is shown that the Hilbert space for \(\theta\) can be taken to be \(L^2(\mathbb {R}, dx)\), and the transformation k, which gives the operator for momentum, is a Fourier transform on this Hilbert space. The operators connected to \(\theta\) and \(\eta\) are the usual ones.
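The finite-dimensional step of such an approximation can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not the construction of [1]: the grid of position values, the grid of momentum values, and the use of the discrete Fourier matrix `F` in the role of the transformation k are all choices made here for illustration.

```python
import cmath
import math

n = 4                                              # number of values of theta_n (assumed)
positions = [float(j) for j in range(n)]           # assumed grid of position values
momenta = [2 * math.pi * k / n for k in range(n)]  # assumed grid of momentum values


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dagger(m):
    """Conjugate transpose of a matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]


def diag(values):
    """Diagonal matrix with the given diagonal entries."""
    return [[values[i] if i == j else 0.0 for j in range(len(values))]
            for i in range(len(values))]


def sub(a, b):
    """Entrywise difference of two matrices."""
    return [[a[i][j] - b[i][j] for j in range(len(a[0]))] for i in range(len(a))]


def max_abs(m):
    """Largest absolute value among the matrix entries."""
    return max(abs(x) for row in m for x in row)


# Unitary discrete Fourier matrix (illustrative stand-in for the transformation k).
F = [[cmath.exp(-2j * math.pi * j * k / n) / math.sqrt(n) for j in range(n)]
     for k in range(n)]

A_pos = diag(positions)                              # operator for theta_n
A_mom = matmul(dagger(F), matmul(diag(momenta), F))  # operator for the momentum variable

unitarity_error = max_abs(sub(matmul(dagger(F), F), diag([1.0] * n)))

# A discrete plane wave is an eigenvector of A_mom with eigenvalue momenta[1].
wave = [[cmath.exp(2j * math.pi * j / n) / math.sqrt(n)] for j in range(n)]
image = matmul(A_mom, wave)
eigen_error = max(abs(image[j][0] - momenta[1] * wave[j][0]) for j in range(n))

# The two operators do not commute.
commutator_size = max_abs(sub(matmul(A_pos, A_mom), matmul(A_mom, A_pos)))
```

In this sketch the position operator is diagonal, the momentum operator is diagonal after the Fourier transform, and the two do not commute, which mirrors in finite dimension the structure described above.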

A more direct approach, using the general theory here, is to take the group G acting on \(\theta\) to be the translation group, and let the group K acting on \(\phi =(\theta ,\eta )\) be the Heisenberg-Weyl group; see [35]. This will not be further discussed here.

7.4 Quantum Decision Theory

There is a large literature on quantum decision theory; see for instance the survey article [37], the book [38] and the series of articles [39,40,41,42,43] by Yukalov and Sornette. The whole field of quantum decisions can be linked to the theory introduced here, as discussed in [4]. The key is to let my variables \(\theta ,\eta ,\xi ,...\) no longer be physical variables, but decision variables. In the simplest case, a decision variable takes a finite number of values.

Let a person A be in a concrete decision situation. He is among other things faced with the choice between taking actions \(a_1,...,a_n\). Define a decision variable \(\theta\) as equal to j if he chooses to make a decision \(a_j\). If this is linked to my theory, we have to define what is meant by accessible and inaccessible decision variables. Let \(\theta\) be accessible if A really is able to perform all the actions \(a_1,...,a_n\), and is able to make a decision here. If not, we say that \(\theta\) is inaccessible.

To carry out this connection, we have to give meaning to all the Postulates 1 to 4 of Sect. 3. Postulate 1 gives no problem; all variables connected to A satisfy this postulate. Postulate 2 must be assumed. Then we consider, corresponding to the concrete decision associated with \(\theta\), simpler decisions, with decision variables \(\lambda\), such that each \(\lambda\) is a function of \(\theta\). A way to achieve this is to let these simpler decisions be associated with disjoint subsets of the actions \(a_1,...,a_n\). It then seems obvious that the simpler decisions are accessible when the decision connected to \(\theta\) is accessible.
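The relation between a decision variable and a simpler decision can be made concrete in a small sketch. The action names, the labels, and the particular partition below are hypothetical and serve only as an illustration of a variable \(\lambda\) that is a function of \(\theta\).

```python
# Hypothetical actions a_1, ..., a_4; theta = j means that action a_j is chosen.
actions = ["a1", "a2", "a3", "a4"]
theta_values = {j + 1: action for j, action in enumerate(actions)}

# A simpler decision, defined by disjoint subsets of the actions (assumed grouping).
partition = {
    "cautious": {"a1", "a2"},
    "bold": {"a3", "a4"},
}


def simpler_decision(theta):
    """Return lambda = f(theta): the label of the subset containing the chosen action."""
    action = theta_values[theta]
    for label, subset in partition.items():
        if action in subset:
            return label
    raise ValueError("the partition does not cover this action")


# Each value of theta determines exactly one value of lambda.
lambda_values = {theta: simpler_decision(theta) for theta in theta_values}
```

Because the subsets are disjoint, \(\lambda\) is well defined as a function of \(\theta\), so a person who can decide \(\theta\) has thereby also decided \(\lambda\).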

Postulate 3 is a challenge here, but it can be satisfied in the following situation: Assume that A has concrete ideals when making his decisions, and he can imagine that one of these ideals has made similar decisions before, but he does not know this so concretely that he can figure out what the ideal person would have done in his concrete case. Let the inaccessible variable \(\phi\) correspond to the choices that A’s ideal would have made.

Postulate 4 may be justified by appealing to Zorn’s lemma for the partial order defined by taking functions of decision variables. The maximal decisions that can be made by A will have a special place in the proposed quantum decision theory.

If all these assumptions are made, we now have the results of Theorem 6 and Theorem 7, which give a Hilbert space apparatus connected to the situation. We then make the assumption that A really at the same time is confronted with two difficult decisions, each involving decision variables which to him are maximal.

To complete the link to quantum decisions, we must find probabilities connected to the decision variables. For this, one can use the Born formula, which is briefly discussed below; a detailed derivation is given in [1].

I hope to discuss all this further and give concrete examples, elsewhere.

7.5 On Entanglement and EPR

Consider two spin 1/2 particles, originally in the state of total spin 0, then separated, one particle sent to Alice and one particle sent to Bob. This can be described by the entangled singlet state

$$\begin{aligned} |\psi \rangle = \frac{|1+\rangle |2-\rangle - |1-\rangle |2+\rangle }{\sqrt{2}}, \end{aligned}$$
(18)

where \(|1+\rangle\) means that particle 1 has spin component +1 in some fixed direction, and \(|1-\rangle\) means that the component is -1; similarly, for \(|2+\rangle\) and \(|2-\rangle\) with particle 2.

As in David Bohm’s version of the EPR situation, let Alice measure the spin component of her particle in some direction a, and let Bob measure the spin component of his particle in the same direction. As has been described in numerous papers, there seemingly is a strange correlation here: The spin components are always opposite.

I want to couple this with the philosophy of Convivial Solipsism [21]: Every description of the world must be relative to some observer. So let us introduce an observer, Charlie, observing the results of both Alice and Bob. Charlie’s observations are all connected to the entangled state (18).

Let us try to describe all this in terms of accessible and inaccessible variables. The unit spin vectors \(\phi _1\) and \(\phi _2\) of the two particles are certainly inaccessible to Charlie, but it turns out that their dot product \(\eta =\phi _1\cdot \phi _2\) is accessible to him. In fact, Charlie’s observations are forced to be related to the state given by \(\eta =-3\).

Mathematically this is proved as follows. The eigenvalues of the operator \(A^\eta\) corresponding to \(\eta\) are 1 and -3. The eigenvector associated with the eigenvalue -3 is just \(|\psi \rangle\) of (18), while the eigenspace associated with the eigenvalue 1 is three-dimensional. (See for instance Exercise 6.9, page 181 in [28].)
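These eigenvalue statements can be checked directly. The sketch below assumes that \(A^\eta\) is represented as the sum of tensor products of Pauli matrices, \(\sigma _x\otimes \sigma _x+\sigma _y\otimes \sigma _y+\sigma _z\otimes \sigma _z\), in the basis \(|1+\rangle |2+\rangle , |1+\rangle |2-\rangle , |1-\rangle |2+\rangle , |1-\rangle |2-\rangle\); the helper names are introduced here for illustration only.

```python
# Pauli matrices as nested lists.
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker product of two square matrices."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def add(*mats):
    """Entrywise sum of matrices of equal shape."""
    return [[sum(m[i][j] for m in mats) for j in range(len(mats[0][0]))]
            for i in range(len(mats[0]))]


def apply(mat, vec):
    """Matrix times column vector."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in mat]


# Assumed matrix form of the operator for eta = phi_1 . phi_2.
A_ETA = add(kron(SX, SX), kron(SY, SY), kron(SZ, SZ))

s = 2 ** -0.5
singlet = [0, s, -s, 0]                                   # the state (18)
triplet = [[1, 0, 0, 0], [0, s, s, 0], [0, 0, 0, 1]]      # spans the other eigenspace


def is_eigenvector(vec, value, tol=1e-12):
    """Check A_ETA vec = value * vec up to a numerical tolerance."""
    image = apply(A_ETA, vec)
    return all(abs(image[k] - value * vec[k]) < tol for k in range(len(vec)))


singlet_ok = is_eigenvector(singlet, -3)
triplet_ok = all(is_eigenvector(v, 1) for v in triplet)
```

Both checks succeed: the singlet state is an eigenvector with eigenvalue -3, and three mutually orthogonal vectors are eigenvectors with eigenvalue 1.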

What does it mean that \(\eta =\phi _1\cdot \phi _2=-3\)? It means that \(\phi _{1x}\phi _{2x}+\phi _{1y}\phi _{2y}+\phi _{1z}\phi _{2z}=-3\), and since all components here are either -1 or +1, this is only possible if \(\phi _{1x}\phi _{2x}=-1\) etc., which implies \(\phi _{1x}=-\phi _{2x}\), \(\phi _{1y}=-\phi _{2y}\), and \(\phi _{1z}=-\phi _{2z}\). It follows that \(\phi _{1a}=-\phi _{2a}\) in every direction a. It is assumed here that, even though the derivation is concerned with inaccessible variables, the conclusion, which is an accessible conclusion for some experiment, is nevertheless valid.
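The combinatorial step in this argument can be verified by brute force. The sketch below enumerates all assignments of \(\pm 1\) to the three components of each vector along the fixed x, y and z axes; it covers only these three axes, not an arbitrary direction a.

```python
from itertools import product

signs = (-1, 1)

# All pairs (phi_1, phi_2) of +-1 component triples whose dot product equals -3.
solutions = [
    (p1, p2)
    for p1 in product(signs, repeat=3)
    for p2 in product(signs, repeat=3)
    if sum(x * y for x, y in zip(p1, p2)) == -3
]

# In every solution, each component of phi_1 is the negative of that of phi_2.
all_opposite = all(
    all(x == -y for x, y in zip(p1, p2)) for p1, p2 in solutions
)
```

Of the 64 possible pairs, exactly the 8 pairs with \(\phi _2=-\phi _1\) satisfy the condition.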

Note that Charlie can be any person. So, we conclude: To any observing person, the spin components as measured by Alice and Bob must be opposite. This seems to be a necessary conclusion, implied by the fact that the person, relative to his observations, is related to the state given by (18).

For the further limitations of the observer Charlie and the corresponding explanation of why the CHSH inequality can be violated in Bell-type experiments, see [15, 16].

7.6 The Born Rule

Born’s formula is the basis for all probability calculations in quantum mechanics. The version given in most textbooks can be formulated as follows: Given some mixed state \(\rho\) the expectation of the theoretical variable \(\theta\) is given in terms of the associated operator \(A^\theta\) as

$$\begin{aligned} E(\theta )= \textrm{trace} (\rho A^\theta ). \end{aligned}$$
(19)

In textbooks, Born’s formula is usually stated as a separate axiom, but it has also been argued for by using various sets of assumptions [44,45,46]; see also Campanella et al. [47] for some references. In fact, the first argument for the Born formula, assuming that there is an affine mapping from a set of density functions to the corresponding probability functions, is due to von Neumann [48]. Many modern arguments rely on Gleason’s theorem, which is not valid in dimension 2, and also requires some assumptions, in particular the assumption of non-contextuality. In [1] a simple version of Born’s formula is derived under reasonable assumptions from a Gleason-type theorem due to Busch [49], a theorem which also holds in dimension 2. It is shown that (19) can be derived from this simple version for the case where \(\theta\) is maximal. In general, (19) requires an extra assumption: There is a maximal accessible variable \(\lambda\) such that \(\theta\) is a function of \(\lambda\). The distribution of \(\lambda\), given \(\theta\), is uniform.
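As a small numerical illustration of formula (19), the following sketch evaluates \(\textrm{trace} (\rho A^\theta )\) for a two-dimensional example. The operator and the two density matrices are hypothetical choices made here; exact fractions are used so that the results are not affected by rounding.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))


# Hypothetical operator for a variable theta with possible values +1 and -1.
A_THETA = [[1, 0], [0, -1]]

# Mixed state: theta = +1 with probability 4/5, theta = -1 with probability 1/5.
p = Fraction(4, 5)
rho_mixed = [[p, 0], [0, 1 - p]]

# Pure state |v><v| for v = (1, 1)/sqrt(2), an equal superposition.
half = Fraction(1, 2)
rho_pure = [[half, half], [half, half]]

e_mixed = trace(matmul(rho_mixed, A_THETA))  # 4/5 - 1/5 = 3/5
e_pure = trace(matmul(rho_pure, A_THETA))    # 1/2 - 1/2 = 0
```

For the diagonal mixed state the result agrees with the elementary expectation \(p\cdot 1+(1-p)\cdot (-1)\).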

7.7 The Hamiltonian and the Schrödinger Equation

First: In the approach of this article, I have concentrated on the construction of operators connected to accessible variables. From this, state vectors or ket vectors can in some generality be seen as eigenvectors of some operator; see Sect. 10 below and also a further discussion in [1].

During a time when no measurement is done on the system, the ket vectors are known in quantum mechanics to develop according to the Schrödinger equation:

$$\begin{aligned} i \frac{h}{2\pi } \frac{d}{dt}|\psi \rangle _t =H |\psi \rangle _t , \end{aligned}$$
(20)

where H is a self-adjoint operator called the Hamiltonian. (Referring to the general theory above, this is the operator corresponding to the variable \(\theta\)= total energy, in a context where the relevant observer or set of communicating observers also think of the complementary variable time).

In [1] I gave two sets of arguments for the Schrödinger equation, one rough and general, and then one specific related to position. The last argument also includes a discussion of the wave function. The general argument for the Schrödinger equation will be reproduced here.

7.8 The General Argument for the Schrödinger Equation; Unitary Transformations and Entanglement

Assume that the system at time 0 has a state given by the ket \(|\psi \rangle _0\) and at time t by the ket \(|\psi \rangle _t\). Let’s assume that the contexts are given as follows: We can ask an epistemic question about a maximal accessible variable \(\theta\), and the ket corresponding to a specific value of this variable is \(|\theta \rangle _0\) at time 0 and \(|\theta \rangle _t\) at time t. We have the choice between making a perfect measurement at time 0 or at time t. Since there is no disturbance through measurement of the system between these two time points, the probability distribution of the answer must be the same whatever choice is made. Hence according to the simple version of Born’s formula

$$\begin{aligned} |_0\langle \theta |\psi \rangle _0 |^2 =|_t\langle \theta |\psi \rangle _t|^2 . \end{aligned}$$
(21)

Now we refer to a general theorem by Wigner [50], proved in detail by Bargmann [51]: If an equation like (21) holds, then there must be a unitary or antiunitary transformation from \(|\psi \rangle _0\) to \(|\psi \rangle _t\). (An antiunitary U is an antilinear map satisfying \(\langle U\phi |U\psi \rangle =\langle \phi |\psi \rangle ^*\).) By continuity, an antiunitary transformation can be excluded here, so we have

$$\begin{aligned} |\psi \rangle _t =U_t|\psi \rangle _0 \end{aligned}$$

for some unitary operator \(U_t\). Writing \(U_t=\textrm{exp}(\frac{2\pi A_t}{ih})\) for some selfadjoint operator \(A_t\), and assuming that \(A_t\) is linear in t: \(A_t=Ht\), this is equivalent to (20). In fact, assuming that \(\{U_t\}\) forms a strongly continuous group of unitary transformations, the form Ht of \(A_t\) follows from a theorem by Stone; see Holevo [52].
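The equivalence with (20) can be seen by a direct differentiation. The following is a sketch which assumes that H does not depend on t and that the exponential may be differentiated in the usual way (for unbounded H this requires the initial state to lie in the domain of H):

$$\begin{aligned} \frac{d}{dt}|\psi \rangle _t = \frac{d}{dt}\, \textrm{exp}\Big (\frac{2\pi Ht}{ih}\Big )|\psi \rangle _0 = \frac{2\pi H}{ih}\, \textrm{exp}\Big (\frac{2\pi Ht}{ih}\Big )|\psi \rangle _0 = \frac{2\pi }{ih} H|\psi \rangle _t . \end{aligned}$$

Multiplying both sides by \(\frac{ih}{2\pi }\) gives \(i \frac{h}{2\pi } \frac{d}{dt}|\psi \rangle _t =H |\psi \rangle _t\), which is (20).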

Unitary transformations of states play an important role in quantum mechanics. Both in the continuous case and in the discrete case such a transformation can be used to illuminate the state concept as introduced in the present article. More specifically, a unitary change of an operator can quite generally be coupled to a concrete change of the involved theoretical variable; see Theorem 5 and Theorem 7 above. When an operator is changed in this way, its eigenvectors are changed accordingly, hence there is a change of states. Note that, subject to linearity, a unitary operator U always can be written as \(U=\textrm{exp}(\frac{2\pi Ht}{ih})\) for some suitable Hamiltonian H, so these transformations can be seen as closely related to time developments of states.

Consider the discrete case. Let the initial state be \(|a;k\rangle \otimes |b;j\rangle\), corresponding to the answers of two focused questions: \(\theta ^a=u_k^a\) and \(\gamma ^b =v_j^b\). Assume that \(\theta ^a\) and \(\gamma ^b\) are maximal. By a unitary transformation, essentially by a time development, this initial state is transformed into a state which cannot be written as a product of states in this way but is a linear combination of such states. This is an entangled state. Thus, in my terminology, entangled states can at least in some cases be given concrete interpretations: Some fixed time ago they were given as answers to two focused questions. By the inverse unitary transformation, the entangled state may be transformed back to the state \(|a;k\rangle \otimes |b;j\rangle\) again. Thus, we have then a concrete interpretation of the entangled state: Subject to a suitable Hamiltonian, the state can be interpreted as the answer to two focused questions posed at some past time.
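The passage from a product state to an entangled state and back can be illustrated for two two-valued variables. In the sketch below, the particular unitary (a Hadamard transformation on the first factor followed by a controlled-NOT) is an assumption chosen for simplicity, not a transformation singled out by the theory; the criterion used is the standard one that a two-qubit pure state with coefficients \(c_{00}, c_{01}, c_{10}, c_{11}\) is a product state exactly when \(c_{00}c_{11}-c_{01}c_{10}=0\).

```python
s = 2 ** -0.5


def kron(a, b):
    """Kronecker product of two square matrices."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def matmul(a, b):
    """Product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply(mat, vec):
    """Matrix times column vector."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in mat]


def transpose(m):
    """Transpose; for a real unitary matrix this is its inverse."""
    return [list(col) for col in zip(*m)]


def product_test(v):
    """Zero exactly when the two-qubit state v is a product state."""
    return v[0] * v[3] - v[1] * v[2]


H2 = [[s, s], [s, -s]]                      # Hadamard matrix
I2 = [[1.0, 0.0], [0.0, 1.0]]
CNOT = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0, 0.0]]

U = matmul(CNOT, kron(H2, I2))              # assumed unitary transformation

initial = [1.0, 0.0, 0.0, 0.0]              # product state: answers to two focused questions
entangled = apply(U, initial)               # (|00> + |11>)/sqrt(2)
recovered = apply(transpose(U), entangled)  # inverse transformation
```

Here `product_test(initial)` vanishes while `product_test(entangled)` equals 1/2, and the inverse transformation returns the original product state.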

8 A Brief Comparison with Some Other Approaches

As mentioned in the Introduction, there are several rather recent investigations with the purpose of deriving the Hilbert space structure from physical assumptions. Some of the resulting models are called generalized probability models. I will briefly discuss some of these approaches.

The article [6] by Hardy is a pioneering one. This article led to several other investigations, summarized in Hardy [53]. All these investigations start with a set of postulates, stated slightly differently.

Both in [6] and in [53] postulates are stated in terms of two basic numbers N and K. Here, N is the maximum number of states for a given system for which there exists some measurement, which can identify which state from a set in a single shot, while K is the number of probabilities that are entries in the state vectors that are constructed. Hardy presents a set of postulates that characterize both classical systems and quantum systems, and one postulate which distinguishes the two. It is proved that \(K=N\) for classical systems, and \(K=N^2\) for quantum systems. In modern form, Hardy’s postulates are stated as P1, P2, P3, P4’ and P5 of [53]. These lead to either classical mechanics or quantum mechanics, while a variant, P4, singles out quantum mechanics. In fact, it leads to a very general version of quantum mechanics, which is further discussed in [53].

In contrast, my postulates of Sects. 3 and 4 here can be associated with a version of quantum theory where I limit the pure state concept to ket vectors that are eigenvectors of some physically meaningful operator.

Another difference is that Hardy starts his investigations by introducing pure states and measurements, while I start with the notion of theoretical variables and their associated operators. It is an open problem to find a minimal set of postulates which cover all different approaches towards the formalism. It is crucial that my approach also leads to a particular view of the interpretation of quantum mechanics, discussed below.

Among the various articles that are related to the ones by Hardy, I can mention Goyal [9], who relies on the framework of information geometry, and Masanes and Müller [10], who state 5 axioms based on elementary assumptions regarding preparations, transformations, and measurements.

Rovelli’s book [54], which is thought-provoking and very informative, is more concerned with interpretation than with foundation. I find very much in this book that I appreciate, and Rovelli’s interpretation of quantum mechanics is close to mine.

9 Interpretation of Quantum Mechanics

Based on the results above, we can now start to discuss the interpretation of quantum theory. The results were based upon accessible theoretical variables \(\theta\), which were assumed to be connected to an observer or jointly to a group of communicating observers. From a scientific point of view, these variables are the ones for which questions of the form ‘What is \(\theta\)?’ or ‘What will \(\theta\) be if we measure it?’ can be posed. It is tempting here to cite Rovelli [54]: ‘I believe that we need to adapt our philosophy to our science, and not our science to our philosophy.’ I fully agree with this. The mathematical discussion above belongs to the domain of science; the philosophical discussion of quantum interpretation should come after this.

My theory seems to be more connected to our knowledge of reality rather than reality itself. I will call this a general epistemic interpretation of quantum mechanics.

There exist several interpretations of quantum mechanics, and the discussions between the supporters of the different interpretations are still going on. In recent years, a broad range of international conferences on the foundations of quantum mechanics have been held. A great number of interpretations have been proposed; some of them look very peculiar to the layman. The many-worlds interpretation assumes that there exist millions or billions of parallel worlds and that a new world appears every time one performs a measurement; there is also a related many-minds interpretation.

At two of these conferences, an opinion poll was recently taken among the participants [55, 56]. It revealed an astonishing disagreement on many fundamental and simple questions. One of these questions was: Is quantum mechanics a description of the objective world, or is it only a description of how we obtain knowledge about reality? The first of these descriptions is called ontological, and the second is called epistemic. Up to now, most physicists have supported some version of an ontological or realistic interpretation of quantum mechanics, but variants of the epistemic interpretation have received a fresh impetus in recent years.

I look upon my book ‘Epistemic Processes’ [1] and also this article as a contribution to this debate. An epistemic process can denote any process to achieve knowledge. It can be a statistical investigation or a quantum mechanical measurement, but it can also be a simpler process. The book starts with an informal interpretation of quantum states, which in the traditional theory have a very abstract definition. In my opinion, a quantum state can under wide circumstances be connected to a focused question and a sharp answer to this question, see above.

A related interpretation is QBism, or quantum Bayesianism, see Fuchs [57,58,59] and von Baeyer [60]. (What started as a variant of Bayesianism, has now developed into a somewhat wider QBism.) The predictions of quantum mechanics involve probabilities, and a QBist interprets these as purely subjective probabilities, attached to a concrete observer. Many elements in QBism represent something completely new in relation to classical physical theory, in relation to many people’s conception of science in general and to earlier interpretations of quantum mechanics.

QBism has been discussed by several authors. For instance, Hervé Zwirn’s views on QBism, which I largely agree with, are given in [61].

By using group theory, group representation theory, and some simple category theory, I aim to study a general situation involving theoretical variables mathematically, and it seems from this that essential elements of the quantum formulation can be derived under weak conditions. This may be of some scientific relevance. Empirically, it has turned out that the quantum formalism provides a very extensive description of our world as we know it [62], and in physical situations in microcosmos an all-embracing description.

Focus on the case where the accessible variable \(\theta\) takes a discrete set of values. In the case where \(\theta\) takes an infinite discrete set of values, we can still prove that Theorem 6 and Theorem 7 hold; the proof goes by taking a limit of cases where \(\theta\) takes a finite number of values.

The following simple observation should be noted and is in correspondence with the ordinary textbook interpretation of quantum states: Trivially, every vector \(|v\rangle\) is the eigenvector of some operators. Assume that there is one such operator A that is physically meaningful, and for which \(|v\rangle\) is also a non-degenerate eigenvector, say with a corresponding eigenvalue u. Let \(\lambda\) be a physical variable associated with \(A=A^\lambda\). Then \(|v\rangle\) can be interpreted as the question ‘What is the value of \(\lambda\)?’ along with the definite answer ‘\(\lambda =u\)’.

More generally, accepting operators with degenerate eigenspaces (corresponding to observables that are accessible, but not maximally accessible), each eigenspace can be interpreted as a question along with an answer to this question.

Combining these two paragraphs, we can also think of the case where \(\lambda\) is a vector, such that each component \(\lambda _i\) corresponds to an operator \(A_i^{\lambda ^i}\), and these operators are mutually commuting. Then \(A^\lambda =\bigotimes _i A_i^{\lambda ^i}\) has eigenspaces which can be interpreted as a set of questions ‘What are the values of \(\lambda _i,\ i=1,2,...\)?’ together with sharp answers to these questions. In the special case of systems of qubits, Höhn and Wever [63] have recently proved that there is a one-to-one correspondence between sets of question-and-answer pairs and state vectors.

The following is proved in [1, 64] under certain general technical conditions, and also specifically in the case of spin/angular momentum: Given a vector \(|v\rangle\) in a Hilbert space \(\mathscr {H}\) and a number u, there is at most one pair (a, j) such that \(|a;j\rangle =|v\rangle\) modulo a phase factor, and \(|a;j\rangle\) is an eigenvector of an operator \(A^a\) with eigenvalue u.

The main interpretation in [1] is motivated as follows: Suppose the existence of such a vector \(|v\rangle\) with \(|v\rangle =|a;j\rangle\) for some a and j. Then the fact that the state of the system is \(|v\rangle\) means that one has focused on a question (‘What is the value of \(\lambda ^a\)?’) and obtained the definite answer (\(\lambda ^a=u\).) The question can be associated with the orthonormal basis \(\{|a;j\rangle ;j=1,2,...,d\}\), equivalently with a resolution of the identity \(I=\sum _j |a;j\rangle \langle a;j|\). The general technical result of [1] is also valid in the case where \(\lambda ^a\) and u are real-valued vectors.

After this, we are left with the problem of determining the exact conditions under which all vectors \(|v\rangle \in \mathscr {H}\) in the non-degenerate discrete case and all projection operators in the general case can be interpreted as above. This will require a rich index set \(\mathscr {A}\) determining the index a. This problem will not be considered further here, but this is stated as a general question to the quantum community [64]. But from the evidence above, I will in this paper rely on the assumption that each quantum state/ eigenvector space can be associated in a unique way with a question-and-answer pair. Strictly speaking, this requires a new version of quantum mechanics, where we only permit state vectors that are eigenvectors of some physically meaningful operator.

Superposition of quantum states can be introduced in my setting as follows: Take as a point of departure the states \(|a;j\rangle\), each such state interpreted in the way that we know that \(\lambda ^a=u_j^a\) for a maximally accessible variable \(\lambda ^a\). Then consider another maximal variable \(\lambda ^b\) and a hypothetical possible value \(u_i^b\) for \(\lambda ^b\). Since \(\sum _j |a;j\rangle \langle a;j|=I\), we have

$$\begin{aligned} |b;i\rangle =\sum _j |a;j\rangle \langle a;j|b;i\rangle = \sum _j \langle a;j |b;i\rangle |a;j\rangle . \end{aligned}$$
(22)

Here the corresponding operators \(A^a\) and \(A^b\) do not commute, and this is a fairly general linear combination of states \(|a;j\rangle\). Such linear combinations will then be state vectors. The state \(|b;i\rangle\) may be a very hypothetical state, not coupled to the observer’s concrete knowledge. Then (22) corresponds to a ‘do not know’ state.
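As a simple worked instance of (22), consider a spin 1/2 particle and assume, for illustration only, that \(\lambda ^a\) is the spin component in the z-direction and \(\lambda ^b\) the spin component in the x-direction, with the standard phase conventions for the eigenvectors. Then \(\langle a;1|b;1\rangle =\langle a;2|b;1\rangle =1/\sqrt{2}\), and (22) reads

$$\begin{aligned} |b;1\rangle = \langle a;1|b;1\rangle |a;1\rangle + \langle a;2|b;1\rangle |a;2\rangle = \frac{1}{\sqrt{2}}|a;1\rangle + \frac{1}{\sqrt{2}}|a;2\rangle . \end{aligned}$$

An observer who knows that \(\lambda ^b=u_1^b\) is thus in a ‘do not know’ state with respect to \(\lambda ^a\), with equal weights on the two possible answers.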

This discussion of superposition may also be generalized to the double-slit experiment and to more general experiments involving multiple paths; see for instance Rovelli [54] for such experiments. The interference pattern in the double-slit experiment can be explained by a momentum variable in the plane of the slits orthogonal to the two slits, a momentum which by de Broglie’s theory is connected to a wave. What is not known is the position variable in the same direction, in particular, the answer to the question ‘Which slit?’. The answer to a similar question is also unknown in experiments involving multiple paths.

When \(\lambda\) is a continuous scalar or vector variable, we can still interpret the eigenspaces of the operator \(A^\lambda\) as questions ‘What is the value of \(\lambda\)?’ together with answers in terms of intervals or more generally sets for \(\lambda\). This is related to the spectral decomposition of \(A^\lambda\), which gives the resolution of the identity (recall (34))

$$\begin{aligned} I=\int _{\sigma (A^\lambda )} dE(\lambda ). \end{aligned}$$
(23)

This resolution of the identity is tightly coupled to the question ‘What is the value of \(\lambda ?\)’, and it implies projections related to indicators of intervals/sets C for \(\lambda\), that is, yes-no questions of the type ‘Does \(\lambda\) belong to C?’, as

$$\begin{aligned} \Pi (C)=\int _{\sigma (A^\lambda )\cap C} dE(\lambda ). \end{aligned}$$
(24)

This is of course just simple quantum logic. But it can be related to an interpretation if we can agree on the basic assumption of Convivial Solipsism: Every description of reality should be relative to an observer or a group of communicating observers. All theoretical variables of this article are assumed to have such a relation. Thus yes-no questions associated with such a variable should be related to an observer or a group of observers.
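The projections (24) have a simple finite-dimensional analogue in which the integral becomes a sum over the eigenvalues lying in C. The sketch below assumes a hypothetical two-valued variable \(\lambda\) whose operator has eigenvalues +1 and -1 with real eigenvectors \((1,1)/\sqrt{2}\) and \((1,-1)/\sqrt{2}\); it is an illustration only, not a treatment of the continuous case.

```python
s = 2 ** -0.5

# Assumed spectral data: eigenvalue -> normalized real eigenvector.
eigen = {1: [s, s], -1: [s, -s]}


def projector(C):
    """Sum of |v><v| over the eigenvalues that belong to the set C."""
    P = [[0.0, 0.0], [0.0, 0.0]]
    for value, v in eigen.items():
        if value in C:
            for i in range(2):
                for j in range(2):
                    P[i][j] += v[i] * v[j]
    return P


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def max_diff(a, b):
    """Largest absolute entrywise difference."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


P_yes = projector({1})      # 'Does lambda belong to {1}?' answered yes
P_no = projector({-1})      # the complementary answer
P_all = projector({1, -1})  # resolution of the identity

idempotent_error = max_diff(matmul(P_yes, P_yes), P_yes)
identity_error = max_diff(P_all, [[1.0, 0.0], [0.0, 1.0]])
orthogonality_error = max_diff(matmul(P_yes, P_no), [[0.0, 0.0], [0.0, 0.0]])
```

Each `projector(C)` is idempotent, projections for disjoint sets are orthogonal, and the projection for the full spectrum is the identity, in line with (23) and (24).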

Then (24) can be interpreted as connected to a general epistemic interpretation of quantum states and projection operators. A special case is the QBist interpretation, but this interpretation is more general. It can also be seen as a concrete specification of Relational Quantum Mechanics: Accessible variables of a system are seen as relative to other systems, where one of these other systems may be an observer. In the many-worlds interpretation, only variables connected to one world are accessible to a given observer at some fixed time. In the Bohm interpretation, the position of a particle at time t will be accessible, but the path from time \(t_1\) to time \(t_2\) will be inaccessible.

There is a huge literature on interpretations of quantum theory. Some of the proposed interpretations have relationships to my epistemic interpretation, but I will not go into more details with this discussion here.

In general, \(\lambda\) may be seen as a maximal accessible variable associated with the operator \(A^\lambda\). If \(\theta\) is another maximal accessible variable, it will be associated with another operator \(A^\theta\), and \(A^\lambda\) and \(A^\theta\) will not be commuting. We can then say that \(\lambda\) and \(\theta\) are complementary variables in the sense of Bohr. More precisely, it is the questions related to these variables that are complementary, each given by an orthonormal full set of ket vectors. Variables/operators corresponding to the same formal question, but having different sharp answers to this question, are equivalent in this respect. They are given by the same orthonormal basis, and the variables are bijective functions of each other.

In a physical context, Niels Bohr’s complementarity concept has been thoroughly discussed by Plotnitsky [30].

Here is Plotnitsky’s definition of complementarity:

(a) A mutual exclusivity of certain phenomena, entities, or conceptions; and yet

(b) The possibility of applying each one of them separately at any given point; and

(c) The necessity of using all of them at different moments for a comprehensive account of the totality of phenomena that we consider.

This definition points to the physical situation discussed above and has Niels Bohr’s interpretation of quantum mechanics as a point of departure. In [1] I have also tried to couple the complementarity concept to macroscopic situations.

Here is one remark concerning QBism, which can be said to represent a special case of my views: Subjective Bayes probabilities have also been in fashion among groups of statisticians. In my opinion, it can be very fruitful to look for analogies between statistical inference theory and quantum mechanics, but then one must look more broadly at statistics and statistical inference theory, not only focusing on subjective Bayesianism. This is only one of several philosophies that can form a basis for statistics as a science. Studying connections between these philosophies is an active research area today. From such discussions, one might infer that another interesting version of Bayesianism is objective Bayesianism, with a prior based on group actions.

Finally, I need to discuss my epistemic interpretation of quantum mechanics in light of the various no-go theorems in the literature.

For the relationship to Bell’s theorem, I refer to my recent article [16]. It seems possible to avoid the non-locality assumption if we replace it with an assumption that all observers are limited in some specific sense.

My inaccessible variables are more general than what is usually perceived as hidden variables. Nevertheless, the Kochen-Specker theorem may a priori be of some relevance. However, the Kochen-Specker theorem only excludes noncontextual hidden variable theories. I think of my theory of an observer as connected to a fixed physical context at some fixed time.

A greater challenge is the recent Pusey-Barrett-Rudolph (PBR) theorem. This theorem seemingly excludes models where a pure quantum state represents only knowledge about an underlying physical state of the relevant system. A crucial assumption, however, is the existence, for every system, of such an underlying ‘real physical state’. This assumption can be questioned. Also, the arguments against an epistemic interpretation given by Colbeck and Renner [65] rely on certain assumptions. In particular, they assume a list of elements of reality \(\Lambda\) which satisfies a certain Markov condition. A detailed discussion will not be given here.

It is important to note, however, that arguments against a realistic interpretation of quantum states have also been given in the literature [66, 67].

10 Concluding Remarks

The treatment of this paper is not quite complete. Some remaining problems include:

  1. A further development of the case of continuous theoretical variables.
  2. Giving concrete conditions under which the Born formula is applicable in practice. This is in particular relevant in connection to cognitive modeling.
  3. Developing an axiomatic basis in the spirit of quantum logic (see for instance [68]). But note the simple postulates of Sect. 3 above.
  4. A treatment of open quantum systems.
  5. A further discussion of the relationship to other approaches and to other interpretations.
  6. More concrete examples of how this approach can be used to address the conceptual and technical challenges of quantum theory. Some aspects of this are discussed in [17] and in [16].
  7. A discussion of the implications of this approach for the experimental and technological aspects of quantum mechanics.

Group theory and quantum mechanics are intimately connected, as discussed in detail in several books and papers. In this article, it is shown that the familiar Hilbert space formulation can be derived mathematically from a simple basis of groups acting on theoretical variables. The consequences of this are further discussed in [1]. The discussion there also seems to provide a link to statistical inference.

From the viewpoint of pure statistical inference, the accessible variables \(\theta\) discussed in this paper are parameters. In many statistical applications it is useful to have a group of actions G defined on the parameter space; see for instance the discussion in [69]. In the present paper, the basic group G is assumed to be transitive. Hence, tentatively, if we have a group acting on some parameter space that is not transitive, the quantization of quantum mechanics can be derived from the following principle: all model reductions in some given model should be to an orbit of the group.
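
The orbit principle can be illustrated by a standard textbook example, which is not taken from the present paper and is only meant as a sketch. Assume that \(G = SO(3)\) acts on the parameter space \(\mathbb{R}^3\) by rotation, \(\theta \mapsto g\theta\). This action is not transitive, since rotations preserve the norm. The orbits are

\[
\mathcal{O}_r = \{\theta \in \mathbb{R}^3 : \|\theta\| = r\}, \qquad r \ge 0,
\]

that is, the spheres centered at the origin together with the single point \(\mathcal{O}_0 = \{0\}\). Restricted to any one orbit \(\mathcal{O}_r\), the action of G is transitive, so in this example a model reduction to an orbit amounts to fixing the norm \(\|\theta\| = r\).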

It is of some interest that the same criterion can be used to derive the statistical model corresponding to the partial least squares algorithm in chemometrics [70], and this connection also motivates an important case of the recently proposed more general envelope model [71].

The main message of the present paper is that the reconstruction of quantum mechanics from simple assumptions can be connected to an observer, or jointly to a group of communicating observers, by just taking accessible variables as a primitive notion. But some postulates, like Postulate 3 and Postulate 4, seem to be necessary. Then, under weak assumptions, the main condition needed seems to be that there exist two different complementary variables, where the word ‘complementary’ is taken to mean different maximal accessible theoretical variables.

In this paper, the first axioms of quantum theory are derived from reasonable assumptions. As briefly stated in [1], one can perhaps expect that such a relatively simple theoretical basis for quantum theory may facilitate a further discussion of its relationship to relativity theory. One can regard physical variables as theoretical variables that are inaccessible inside black holes. These ideas are further developed in [17].

Crucial in this paper is the existence of an inaccessible variable \(\phi\) such that all accessible variables are functions of \(\phi\). An alternative approach, discussed in an earlier version of this article, is to base the foundation upon some simple concepts from category theory. Then Postulate 3 is replaced by

Postulate 3’: Related to a given physical context there exists an object \(\Omega\) such that for each accessible variable \(\eta\) there exists a morphism from \(\Omega\) onto \(\Omega _\eta \subset \Omega\).

Category theory was founded by Mac Lane [72], and has been used by several physicists [53, 73, 74] in the foundation of quantum mechanics. Bob Coecke has in several papers approached quantum theory from the point of view of category theory; see also [53].

Further aspects of the connection between quantum theory and statistical inference theory are under investigation. Statistical inference relies heavily on decisions, and these decisions can, in my view, largely be modeled by using quantum decision theory.

Finally, I want to point out that some of my previously published papers contain certain errors and inaccuracies. In [5] and [4] it is erroneously stated that the basic group representation U should be irreducible. This is now corrected in [5]. Sloppy formulations in [15] are now cleared up in [16]. Most errors are now corrected in the book [75], but the proof there, p. 33, that the variable \(\theta\) can be written as a function of the group element n, is incorrect. The correct version is in Lemma A2 below. It is my strong hope that all mathematical arguments of the present article are correct.