1 Introduction

Human language is a substance consisting of combinations of concepts giving rise to meaning. We will show that a good model for this substance is that of a gas of entangled bosonic quantum particles as they appear in physics close to a Bose–Einstein condensate. In this respect we also introduce the new notion of ‘cogniton’ as the entity playing the same role within human language as the ‘bosonic quantum particle’ plays for the ‘quantum gas’. There is a gas of bosonic quantum particles that we all know very well, namely the electromagnetic field, which we will also briefly call ‘light’, a substance made of photons. We will often use ‘light’ as an example and inspiration for how we talk and reason about human language, where ‘concepts’ (words), as ‘states of the cogniton’, are then like ‘photons of different energies (frequencies, wavelengths)’. With the new findings we present here, we also make an essential new step forward in the elaboration of our ‘conceptuality interpretation of quantum theory’, where quantum particles are the concepts of a proto-language, in a similar way that human concepts (words) are the quantum particles (cognitons) of human language (Aerts 2009a, 2010a, b, 2013, 2014; Aerts et al. 2018d, 2019c).

There are several new results and insights that we will put forward in the coming sections. We summarize them here, referring also to the earlier work on which they build; the article is nevertheless self-contained, so it is not necessary to have studied these earlier works to understand its content. The reason we can present a self-contained theory of human language here is that most of our earlier results take a simple and transparent form in the boson gas model that we elaborate for human language. Since we also introduce the basics of the physics of a boson gas, our presentation remains self-contained from a physics perspective as well. Throughout the article, we use the terms ‘words’ and ‘concepts’ interchangeably, because their difference does not play a role in the aspects of language we study.

We will see that the state of the gas of bosonic quantum particles which we identify explicitly with the state of a piece of text, such as that of a story, is one of very low temperature, i.e. a temperature in the neighborhood of where the fifth state of matter appears, namely the Bose–Einstein condensate. This means that the interaction between ‘words’, which are the boson particles of language in our description, is mainly one of ‘quantum superposition’ and ‘quantum entanglement’, or more precisely one of ‘overlapping de Broglie wave functions’. This corresponds well with some of our earlier findings when studying the combinations of concepts in human language, namely that superposition and entanglement are abundant, and that the entanglement is deep, in the sense that in addition to Bell’s inequalities it also violates the marginal laws (Aerts 2009b; Aerts et al. 2011; Aerts and Sozzo 2011, 2014; Aerts et al. 2012, 2015a, 2016, 2018a, b, c, 2019a, b; Aerts Arguëlles 2018; Beltran and Geriente 2019).

When we present our model in the next sections, we will see that it contains several new explanations of aspects of human language which we brought up in earlier work. For example, we elaborated an axiomatic quantum model for human concepts, which we called SCoP (state context property system), in which different exemplars of a specific concept are considered as different states of this concept (Gabora and Aerts 2002; Aerts and Gabora 2005a, b; Aerts 2009b; Aerts et al. 2013a, b). In the theory of the boson gas for human language that we develop here, we will not only introduce these states explicitly, but also introduce them as eigenstates of specific values of the energy, and a detailed energy scale will be introduced for all the words appearing in a considered piece of text. If we compare this with the quantum description of light, it means that the cognitons of our piece of text radiate their meaning with different frequencies to the human mind engaging in the meaning of this piece of text.

Let us consider an example of a text, namely the Winnie the Pooh story entitled ‘In Which Piglet Meets a Heffalump’ (Milne 1926), to make this introduction of ‘energy’ in our theory of language more concrete. We define the ‘energy level’ of a word (concept, cogniton) in the story by looking at the number of times this word appears in that story. The most often appearing word, namely 133 times, is the concept And (we denote concepts or words, when they are looked upon as states of a cogniton, in italics and with a capital letter, as in our earlier works) and we attribute to it, for reasons that will become clearer later, the lowest energy level \(E_0\). The second most often appearing word, 111 times, is the concept He, and we attribute to it the second lowest energy level \(E_1\), and so on, until we reach words such as Able, which appears only once. In other words, if we think of a story as a ‘gas of bosonic particles’ in ‘thermal equilibrium with its environment’, these ‘numbers of appearances in the story’ indicate the different energy levels of the particles of the gas, following the ‘energy distribution law governing the gas’, and this is our inspiration for the introduction of ‘energy’ in human language. Remember indeed that each of these words (concepts) is a ‘state of the cogniton’, exactly like photons of different energy levels (different wavelengths of light) are each ‘states of the photon’. Proceeding in this way we arrive at 542 energy levels for the story ‘In Which Piglet Meets a Heffalump’, the values of which are taken to be

$$\begin{aligned} \{E_i = i \ |\ i \in \{0, 1, \ldots , 541, 542\}\} \end{aligned}$$
(1)

We denote by \(N(E_i)\) the ‘number of appearances’ of the word (concept, cogniton) with energy level \(E_i\), and if n denotes the index of the highest energy level, we have that

$$\begin{aligned} N = \sum _{i=0}^n N(E_i) \end{aligned}$$
(2)

is the total number of words (concepts, cognitons) of the considered piece of text, which is 2655 for the story ‘In Which Piglet Meets a Heffalump’.
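To make this counting-and-ranking procedure concrete, here is a minimal sketch in Python; the function name, the tokenization and the alphabetical tie-breaking for equal counts are our own illustrative choices, not prescriptions taken from the article itself.

```python
from collections import Counter
import re

def energy_levels(text):
    """Rank the words of a text by number of appearances.

    Returns a list of (word, E_i, N(E_i)) triples: the most frequent word
    gets energy level E_0 = 0, the next most frequent E_1 = 1, and so on.
    Words with equal counts are ordered alphabetically, as in the article.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    ranked = sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))
    return [(word, i, count) for i, (word, count) in enumerate(ranked)]

# toy usage (not the Winnie the Pooh text itself)
levels = energy_levels("and he went and he thought and thought")
N = sum(count for _, _, count in levels)   # total number of words, Eq. (2)
```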

For each of the energy levels \(E_i\), \(N(E_i)E_i\) is the amount of energy ‘radiated’ by the story ‘In Which Piglet Meets a Heffalump’ at the ‘frequency or wavelength’ connected to this energy level. For example, the energy level \(E_{54} = 54\) is populated by the concept Thought, and the word Thought appears \(N(E_{54})=10\) times in the story. Each of the 10 appearances of Thought radiates with energy value 54, which means that the total radiation at the wavelength connected to Thought in the story ‘In Which Piglet Meets a Heffalump’ equals \(N(E_{54})E_{54} = 10 \cdot 54 = 540\).

The total energy E radiated by the considered piece of text is therefore

$$\begin{aligned} E = \sum _{i=0}^n N(E_i) E_i \end{aligned}$$
(3)

For the story ‘In Which Piglet Meets a Heffalump’ we have \(E = 242{,}891\). Let us now present some of the other findings, which we will describe in more detail in the following sections.
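Continuing the sketch above, the total energy of Eq. (3) and the per-level contribution \(N(E_i)E_i\) of a word such as Thought can be computed as follows; this is a sketch, and the totals 2655 and 242,891 are of course only reproduced when the full story text is used as input.

```python
def total_energy(levels):
    """Total energy E = sum_i N(E_i) * E_i of Eq. (3)."""
    return sum(count * e for _, e, count in levels)

def radiated_by(levels, word):
    """E(E_i) = N(E_i) * E_i for the energy level populated by a given word."""
    for w, e, count in levels:
        if w == word:
            return count * e
    return 0   # the word does not appear in the text

# usage with the 'levels' list built above:
#   total_energy(levels)            -> E of Eq. (3)
#   radiated_by(levels, "thought")  -> N(E_i) * E_i for the level of Thought
```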

When we applied the Bose–Einstein distribution

$$\begin{aligned} N(E_i) = {1 \over {Ae^{{E_i \over B}}-1}} \end{aligned}$$
(4)

to model the data we collected on the story ‘In Which Piglet Meets a Heffalump’, determining the parameters A and B by the two requirements

$$\begin{aligned} \sum _{i=0}^n N(E_i) = 2655 \quad \sum _{i=0}^n N(E_i) E_i = 242{,}891 \end{aligned}$$
(5)

we found an almost complete fit with the data (see Sect. 2, Table 1, Figs. 1a, b and 2). We tested numerous other texts, short stories (see Sect. 3, Table 4, Figs. 3, 4) and long stories of the size of novels (see Sect. 4, Fig. 7b), and each time a modeling by means of a Bose–Einstein statistical energy distribution, as explained above, gave rise to an almost complete fit with the data.
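A simple way to reproduce such a fit is to solve the two constraints (5) numerically for A and B. The sketch below uses scipy.optimize.fsolve; the function name and the starting values are our own choices, not taken from the article.

```python
import numpy as np
from scipy.optimize import fsolve

def fit_bose_einstein(counts):
    """Fit A and B of the Bose-Einstein distribution (4) to word counts.

    counts[i] is the observed N(E_i) for energy level E_i = i.  A and B are
    fixed by the constraints (5): the model must reproduce the total number
    of words N and the total energy E of the text.
    """
    E = np.arange(len(counts))
    N_total = counts.sum()
    E_total = (counts * E).sum()

    def constraints(params):
        A, B = params
        model = 1.0 / (A * np.exp(E / B) - 1.0)
        return [model.sum() - N_total, (model * E).sum() - E_total]

    A, B = fsolve(constraints, x0=[1.01, N_total / 4.0])
    return A, B

# usage: counts = np.array([n for _, _, n in levels]); A, B = fit_bose_einstein(counts)
```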

We started this investigation with the idea that ‘concepts within human language behave like bosonic entities’, an idea we expressed earlier as one of the basic pieces of evidence for the ‘conceptuality interpretation’ (Aerts 2009a). The origin of the idea is the simple direct understanding that if one considers, for example, the concept combination Eleven Animals, then, on the level of the ‘conceptual realm’, each one of the eleven animals is completely ‘identical with’ and ‘indistinguishable from’ each of the other eleven animals. It is also a simple direct understanding that in the case of ‘eleven physical animals’ there will always be differences between the eleven animals, because as ‘objects’ present in the physical world they have an individuality, and as individuals, with spatially localized physical bodies, none of them will be really identical with the others, which means that each one of them will also always be distinguishable from the others. Even if all the animals are horses, simply because they are ‘objects’ and not ‘concepts’, they will not be completely identical and hence they will be distinguishable. The idea is that it is ‘this not being completely identical and hence being distinguishable’ which makes Maxwell–Boltzmann statistics applicable to them. However, when we consider ‘eleven animals’ as concepts, such that their ontological nature is conceptual, they are all ‘completely identical and hence intrinsically indistinguishable’. Within the conceptuality interpretation of quantum theory, where we put forward the hypothesis that quantum entities are ‘conceptual’ and hence are not ‘objects’, their ‘being completely identical and hence intrinsically indistinguishable’ would also be due to their being conceptual instead of objectual entities.

In earlier work we already investigated this idea by looking at simple combinations of concepts with numerals, such as indeed Eleven Animals, and then considering two states of Animal, namely Cat and Dog. We then checked whether the twelve different exemplars that can be formed with these two states, namely Eleven Dogs, One Cat And Ten Dogs, Two Cats And Nine Dogs, ..., Ten Cats And One Dog, Eleven Cats, follow, in their appearance in texts, a Maxwell–Boltzmann or rather a Bose–Einstein statistical pattern. First in a less convincing way, because of a limited collection of data (Aerts 2009a; Aerts et al. 2015b), and later very convincingly, with an abundance of data (Beltran 2019), it was shown that the Bose–Einstein statistics indeed delivers a better model for the data than the Maxwell–Boltzmann statistics.

The result that we put forward in the present article, namely that the Bose–Einstein statistics as explained above models entire texts of any size, is a much stronger one, although it expresses the same idea. Consider any text, and then consider two instances of the word Cat appearing in the text; if one of the concepts Cat is exchanged with the other concept Cat, absolutely nothing changes in the text. Hence, a text contains a perfect symmetry for the exchange of cognitons (concepts, words) in the same state. This is not true for physical reality and its physical objects. Suppose one considers a physical landscape in which there are two cats; exchanging the two cats will always change the landscape, because the cats are not identical and are distinguishable as physical objects. If we introduce a quantum description of the text, the wave function must be invariant under the exchange of the two cats, which would again not be the case if the wave function described the physical landscape containing two cats as objects. This is the result we will present in Sect. 2.

Section 3 is devoted to a self-contained presentation of the phenomenon of Bose–Einstein condensation in physics. We illustrate the aspects of Bose–Einstein condensation valuable for our discussion by means of two examples of Bose gases, the rubidium 87 gas and the sodium gas, which were also historically the first ones used to realize a Bose–Einstein condensate (Anderson et al. 1995; Davis et al. 1995). We compare these Bose–Einstein condensates, and the way their energy level distribution is modeled by the Bose–Einstein distribution function, with our Bose–Einstein modeling of pieces of texts of stories, and point out the correspondences.

Another finding that we will put forward, in Sect. 4, was completely unexpected. The method of attributing an energy level to a word depending on the number of appearances of the word in a text introduces the typical ranking considered in the well-known Zipf’s law analysis of texts (Zipf 1935, 1949). When we look at the \(\log /\log\) graph of ranking as a function of the number of appearances, we indeed see the linear function, or a slight deviation from it, which represents the most common version of Zipf’s law. Zipf’s law is an empirical law which has not yet been given a theoretical foundation; hence our finding of its unexpected connection with Bose–Einstein statistics might provide such a foundation. We also show, in Sect. 4, how the connection with Zipf’s law allows us to develop in more depth the Bose–Einstein model of texts of different sizes, short stories and long stories of the size of novels.

In Sect. 5, we reflect on the issue of ‘identity and indistinguishability’ from the perspective developed in the foregoing sections, taking into account the conundrum this issue still poses in quantum theory with respect to quantum particles (Dieks and Lubberdink 2019). Confronting the theoretical view, in which bosons and fermions are considered to be identical and indistinguishable even when they are in different states, we note that experimentalists take another stance in this respect, considering, for example, photons of different frequencies as distinguishable. A recent experiment shows that if this experimentally accepted possibility to distinguish them is erased by means of a quantum eraser, these different-frequency photons behave as indistinguishable (Zhao et al. 2014). This leads us to put forward the proposal that ‘the way in which we clearly see and understand the identity and indistinguishability of concepts (words, cognitons) in human language’ is also ‘the way in which identity and indistinguishability for quantum particles can be understood’. More specifically, it shows that ‘identity and indistinguishability’ are contextual notions for a quantum particle, depending on the way a measuring apparatus or a heat bath interacts with the quantum particle, similarly to how ‘identity and indistinguishability’ are contextual notions for a human concept, depending on how a mind interacts with the concept. We elaborate this new way of interpreting ‘identity and indistinguishability’ with examples and show how it is a strong confirmation of our conceptuality interpretation of quantum theory.

2 Human Language as a Bose Gas

Let us consider again the Winnie the Pooh story ‘In Which Piglet Meets a Heffalump’ as published in Milne (1926). In Table 1, we present the list of all words that appear in the story (in the column ‘Words concepts cognitons’), with their ‘number of appearances’ (in the column ‘Appearance numbers \(N(E_i)\)’), ordered from lowest energy level to highest energy level (in the column ‘Energy levels \(E_i\)’), where the energy levels are attributed according to these numbers of appearances, lower energy levels corresponding to higher numbers of appearances, and their values are given as proposed in (1).

The word And is the most often appearing word, namely 133 times; hence the cognitons in this state populate the ground state energy level \(E_0\), which as per (1) we put equal to zero. The word He is the second most often appearing word, namely 111 times; hence the cognitons in this state populate the first energy level \(E_1\), which following (1) we put equal to 1. The ‘words’, their ‘energy levels’ and their ‘numbers of appearances’ are thus given in the first three columns of Table 1.

The question can be asked ‘what is the unit of energy in this model that we put forward?’. Is the number ‘1’ that we choose for energy level \(E_1\) a quantity expressed in joules, in electronvolts, or in yet another unit? This question gives us the opportunity to already reveal one of the very new aspects of our approach. Energy will not be expressed in ‘\({\mathrm{kg\,m}}^2/{\mathrm{s}}^2\)’ as is the case in physics. Why not? Well, a human language is not situated somewhere in space, as we believe to be the case for a physical boson gas of atoms or a photon gas of light. Hence ‘energy’ is here, in our approach, a basic quantity, and if we manage to introduce—this is one of our aims in further work—what the ‘human language equivalent’ of ‘physical space’ is, then it will be the other way around, namely this ‘equivalent of space’ will be expressed in units in which ‘energy appears as a fundamental unit’. Hence, the ‘1’ indicating that ‘He radiates with energy 1’, or ‘the cogniton in state He carries energy 1’, stands for a basic measure of energy, just like ‘distance (length)’ is a basic measure in ‘the physics of space and objects inside space’, not to be expressed as a combination of other physical quantities. We used the expressions ‘He radiates with energy 1’ and ‘the cogniton in state He carries energy 1’, and we will keep using this way of speaking about ‘human language within the view of a boson gas of entangled cognitons that we develop here’, in analogy with how we speak in physics about light and photons.

The words The, It, A and To are the four next most often appearing words of the Winnie the Pooh story, and hence the energy levels \(E_2\), \(E_3\), \(E_4\) and \(E_5\) are populated by cognitons in the states The, It, A and To, carrying respectively 2, 3, 4 and 5 basic energy units. Hence, the first three columns in Table 1 describe the experimental data that we extracted from the Winnie the Pooh story ‘In Which Piglet Meets a Heffalump’. As we said, the story contains in total 2655 words, which give rise to 542 energy levels, where energy levels are connected with words. Different words thus radiate with different energies, and the size of these energies is determined by ‘the number of appearances of the words in the story’, the most often appearing words being the lowest energy states of the cogniton and the least often appearing words being the highest energy states of the cogniton. In Table 1, we have not presented all 542 energy levels, because that would lead to too long a table, but we have presented the most important part of the energy spectrum with respect to the further aspects we will point out.

More concretely, we have represented the range from energy level \(E_0\), the ground state of the cogniton, which is the cogniton in state And, to energy level \(E_{78}\), which is the cogniton in state Put. Then we have represented the energy levels from \(E_{538}\), which is the cogniton in state Whishing, to the highest energy level \(E_{542}\) of the Winnie the Pooh story, which is the cogniton in state You’ve.

These five highest energy levels, from \(E_{538}\) to \(E_{542}\), corresponding respectively to the cogniton in states Whishing, Word, Worse, Year and You’ve, all have an appearance number of one in the story. They do however radiate with different energies, but the story does not give us enough information to determine whether Whishing radiates with lower energy than Year or vice versa. Since this does not play a role in our actual analysis, we have ordered them alphabetically. So, different words which radiate with different energies but appear an equal number of times in this specific Winnie the Pooh story are classified from lower to higher energy level alphabetically.

In the column ‘Energies from data \(E(E_i)\)’, we represent \(E(E_i)\), the ‘amount of energy radiated by the Winnie the Pooh story by the cognitons of a specific word, hence of a specific energy level \(E_i\)’. As we mentioned already in the previous section, the formula for this amount is given by

$$\begin{aligned} E(E_i) = N(E_i) E_i \end{aligned}$$
(6)

the product of the number \(N(E_i)\) of cognitons in the state of the word with energy level \(E_i\) and the amount of energy \(E_i\) radiated by a cogniton in that state. In the last row of Table 1, we give the totals, namely in the column ‘Appearance numbers \(N(E_i)\)’ of this last row the total number of words

$$\begin{aligned} \sum _{i=0}^n N(E_i) = N = 2655 \end{aligned}$$
(7)

and in the column ‘Energies from data \(E(E_i)\)’ of the last row we give the total amount of energy

$$\begin{aligned} \sum _{i=0}^n E(E_i) = \sum _{i=0}^n N(E_i) E_i = E = 242{,}891 \end{aligned}$$
(8)

radiated by the Winnie the Pooh story ‘In Which Piglet Meets a Heffalump’. Hence, the columns ‘Words concepts cognitons’, ‘Energy levels \(E_i\)’, ‘Appearance numbers \(N(E_i)\)’ and ‘Energies from data \(E(E_i)\)’ contain all the experimental data of the Winnie the Pooh story ‘In Which Piglet Meets a Heffalump’.

In columns ‘Bose–Einstein modeling’ and ‘Maxwell–Boltzmann modeling’ of Table 1, we give the values of the populations of the different energy states for, respectively, a Bose–Einstein and a Maxwell–Boltzmann model of the data of the considered story. Let us explain what these two models are. As we recalled in the introduction, the Bose–Einstein distribution function is given by

$$\begin{aligned} N(E_i) = {1 \over {Ae^{{E_i \over B}}-1}} \end{aligned}$$
(9)

where \(N(E_i)\) is the number of bosons obeying the Bose–Einstein statistics in energy level \(E_i\) and A and B are two constants that are determined by expressing that the total number of bosons equals the total number of words, and that the total energy radiated equals the total energy of the Winnie the Pooh story ‘In Which Piglet Meets a Heffalump’, hence by the two conditions

$$\begin{aligned}&\sum _{i=0}^n {1 \over {Ae^{{E_i \over B}}-1}} = N = 2655 \end{aligned}$$
(10)
$$\begin{aligned}&\sum _{i=0}^n {E_i \over {Ae^{{E_i \over B}}-1}} = E = 242{,}891 \end{aligned}$$
(11)

We remark that the Bose–Einstein distribution function is derived in quantum statistical mechanics for a gas of bosonic quantum particles, where the notions of ‘identity and indistinguishability’ play the specific role they are attributed in quantum theory (Huang 1987). We will come back to this in Sect. 5, where we analyze our findings with the aim, given our conceptuality interpretation of quantum theory, of better understanding how ‘identity and indistinguishability’ can be explained for a physical Bose gas using our understanding of them in human language.

Fig. 1

In a we represent the ‘number of appearances’ of words in the Winnie the Pooh story ‘In Which Piglet Meets a Heffalump’ (Milne 1926), ranked from lowest energy level, corresponding to the most often appearing word, to highest energy level, corresponding to the least often appearing word as listed in Table 1. The blue graph (Series 1) represents the data, i.e. the collected numbers of appearances from the story (column ‘Appearance numbers \(N(E_i)\)’ of Table 1), the red graph (Series 2) is a Bose–Einstein distribution model for these numbers of appearances (column ‘Bose–Einstein modeling’ of Table 1), and the green graph (Series 3) is a Maxwell–Boltzmann distribution model (column ‘Maxwell–Boltzmann modeling’ of Table 1). In b we represent the \(\log / \log\) graphs of the ‘numbers of appearances’ and their Bose–Einstein and Maxwell–Boltzmann models. The red and blue graphs coincide almost completely in both a and b while the green graph does not coincide at all with the blue graph of the data. This shows that the Bose–Einstein distribution is a good model for the numbers of appearances, while the Maxwell–Boltzmann distribution is not

Since we want to show the validity of the Bose–Einstein statistics for concepts in human language, we compare our Bose–Einstein distribution model with a Maxwell–Boltzmann distribution model, and hence we also introduce the Maxwell–Boltzmann distribution explicitly. It is the distribution described by the following function

$$\begin{aligned} N(E_i) = {1 \over Ce^{{E_i \over D}}} \end{aligned}$$
(12)

where \(N(E_i)\) is the number of classical identical particles obeying the Maxwell–Boltzmann statistics in energy level \(E_i\) and C and D are two constants that will be determined, like in the case of the Bose–Einstein statistics, by the two conditions

$$\begin{aligned}&\sum _{i=0}^n {1 \over Ce^{{E_i \over D}}} = N = 2655 \end{aligned}$$
(13)
$$\begin{aligned}&\sum _{i=0}^n {E_i \over Ce^{{E_i \over D}}} = E = 242{,}891 \end{aligned}$$
(14)

The Maxwell–Boltzmann distribution function is derived for ‘classical identical and distinguishable’ particles, and it can also be shown in quantum statistical mechanics to be a good approximation if the quantum particles are such that their ‘de Broglie waves’ do not overlap (Huang 1987). In the last two columns of Table 1, ‘Energies Bose–Einstein’ and ‘Energies Maxwell–Boltzmann’, we show the ‘energies’ related to the Bose–Einstein modeling and to the Maxwell–Boltzmann modeling, respectively.
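The Maxwell–Boltzmann model is fitted in exactly the same way, now with (13) and (14) as constraints; below is a sketch analogous to the Bose–Einstein one given earlier, with the helper name and starting values again our own.

```python
import numpy as np
from scipy.optimize import fsolve

def fit_maxwell_boltzmann(counts):
    """Fit C and D of the Maxwell-Boltzmann distribution (12) to word counts.

    The constraints (13) and (14) fix the two parameters by the total number
    of words and the total energy of the text.
    """
    E = np.arange(len(counts))
    N_total = counts.sum()
    E_total = (counts * E).sum()

    def constraints(params):
        C, D = params
        model = 1.0 / (C * np.exp(E / D))
        return [model.sum() - N_total, (model * E).sum() - E_total]

    C, D = fsolve(constraints, x0=[1.0 / counts[0], E_total / N_total])
    return C, D
```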

Table 1 An energy scale representation of the words of the Winnie the Pooh story ‘In Which Piglet Meets a Heffalump’ by A. A. Milne as published in Milne (1926)

We have now introduced all that is necessary to announce the principal result of our investigation.

Fig. 2

A representation of the ‘energy distribution’ of the Winnie the Pooh story ‘In Which Piglet Meets a Heffalump’ (Milne 1926) as listed in Table 1. The blue graph (Series 1) represents the energy radiated by the story per energy level (column ‘Energies from data \(E(E_i)\)’ of Table 1), the red graph (Series 2) represents the energy radiated by the Bose–Einstein model of the story per energy level (column ‘Energies Bose–Einstein’ of Table 1), and the green graph (Series 3) represents the energy radiated by the Maxwell–Boltzmann model of the story per energy level (column ‘Energies Maxwell–Boltzmann’ of Table 1)

When we determine the two constants A and B, respectively C and D, in the Bose–Einstein distribution function (9) and the Maxwell–Boltzmann distribution function (12), by putting the total number of particles of the model equal to the total number of words of the considered piece of text, (10) and (13), and by putting the total energy of the model equal to the total energy of the considered piece of text, (11) and (14), we find a remarkably good fit of the Bose–Einstein modeling function to the data of the piece of text, and a large deviation of the Maxwell–Boltzmann modeling function from the data of the piece of text.

The result is expressed in the graphs of Fig. 1a, where the blue graph represents the data, hence the numbers in the column ‘Appearance numbers \(N(E_i)\)’ of Table 1, the red graph represents the quantities obtained by the Bose–Einstein model, hence the quantities in the column ‘Bose–Einstein modeling’ of Table 1, and the green graph represents the quantities obtained by the Maxwell–Boltzmann model, hence the quantities in the column ‘Maxwell–Boltzmann modeling’ of Table 1. We can easily see in Fig. 1a how the blue and red graphs almost coincide, while the green graph deviates strongly from the two other graphs, which shows that Bose–Einstein statistics is a very good model for the data we collected from the Winnie the Pooh story, while Maxwell–Boltzmann statistics completely fails to model these data.

To construct the two models we also considered the energies, expressing as a second constraint, conditions (11) and (14), that the total energy of the Bose–Einstein model and the total energy of the Maxwell–Boltzmann model are both equal to the total energy of the data of the Winnie the Pooh story. The result of both constraints, (10), (13) and (11), (14), on the energy functions that express the amount of energy per energy level—or, to use the language customarily used for light, the frequency spectrum of the light—can be seen in Fig. 2. We see again that the red graph, which represents the Bose–Einstein radiation spectrum, is a much better model for the blue graph, which represents the experimental radiation spectrum, than the green graph, which represents the Maxwell–Boltzmann radiation spectrum.

Both solutions, the Bose–Einstein one shown in the red graph and the Maxwell–Boltzmann one shown in the green graph, have been found by making use of a computer program calculating the values of A, B, C and D such that (10), (11), (13) and (14) are satisfied, which gives the approximate values

$$\begin{aligned} A \approx 1.0078 \quad B \approx 593.51 \quad C \approx 0.0353 \quad D \approx 93.63 \end{aligned}$$
(15)

In the graphs of Fig. 2, we can see that a maximum is reached for the energy level \(E_{71}\), corresponding to the word First, which appears seven times in the Winnie the Pooh story. If we use the analogy with light, we can say that the radiation spectrum of the story ‘In Which Piglet Meets a Heffalump’ has a maximum at First, which would hence be, again in analogy with light, the dominant color of the story (see Footnote 1). We have indicated this radiation peak in Table 1, where we can see that the amount of energy the story radiates at this level, following the Bose–Einstein model, is 522.79.

Due to their shape, the graphs in Fig. 1a are not easily compared. Although the blue and red graphs quite obviously almost overlap, while the blue and green graphs are very different, which shows that the data are well modeled by Bose–Einstein statistics and not by Maxwell–Boltzmann statistics, it is interesting to consider a transformation in which we apply the \(\log\) function to both the x-values (the domain values) and the y-values (the image values) of the functions underlying the graphs. This is a well-known technique to make functions giving rise to this type of graphs more easily comparable.

In Fig. 1b, the graphs can be seen where we have taken the \(\log\) of the x-coordinates and the \(\log\) of the y-coordinates of the graph representing the data, which is again the blue graph in Fig. 1b, of the graph representing the Bose–Einstein distribution model of these data, which is the red graph, and of the graph representing the Maxwell–Boltzmann distribution model of the data, which is the green graph. Readers acquainted with Zipf’s law as it appears in human language will recognize Zipf’s graph in the blue graph of Fig. 1b. It is indeed the \(\log /\log\) graph of ‘ranking’ versus ‘numbers of appearances’ of the text of the Winnie the Pooh story ‘In Which Piglet Meets a Heffalump’, which is the ‘definition’ of Zipf’s graph. As expected, we see Zipf’s law being satisfied: the blue graph is well approximated by a straight line with negative slope close to \(-1\). We see that the Bose–Einstein graph still models this Zipf graph very well, and what is more, it also models the (small) deviation of Zipf’s graph from the straight line. Zipf’s law, and the corresponding straight line when a \(\log /\log\) graph is drawn, is an empirical law. Intrigued by this modeling of the Zipf graph by the Bose–Einstein statistics, we analyze the correspondence in detail in Sect. 4.
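For readers who want to reproduce the \(\log/\log\) comparison, a minimal plotting sketch with matplotlib is given below; shifting the rank by one so that the logarithm of the ground level can be taken is our own plotting choice, not the article’s.

```python
import numpy as np
import matplotlib.pyplot as plt

def zipf_plot(counts, be_model, mb_model):
    """log/log plot of number of appearances versus rank (energy level)."""
    rank = np.arange(1, len(counts) + 1)   # rank 1, 2, ... avoids log(0)
    plt.loglog(rank, counts, label="data (Zipf graph)")
    plt.loglog(rank, be_model, label="Bose-Einstein model")
    plt.loglog(rank, mb_model, label="Maxwell-Boltzmann model")
    plt.xlabel("log(rank / energy level)")
    plt.ylabel("log(number of appearances)")
    plt.legend()
    plt.show()
```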

In the next section, however, we want to describe what a Bose gas is in physics when it is brought near its Bose–Einstein condensate state, with the aim of identifying the physical equivalent of the Winnie the Pooh story ‘In Which Piglet Meets a Heffalump’ and of the other pieces of text that we will also consider.

3 The Bose–Einstein Condensate in Physics

We explain in this section different aspects related to the experimental realization of a Bose gas brought near to being a Bose–Einstein condensate, in which most of the bosons are in the lowest energy state. The awareness of the existence of this special state of a Bose gas came about as a consequence of a peculiar exchange between the Indian physicist Satyendra Nath Bose and Albert Einstein (Bose 1924; Einstein 1924, 1925). Bose devised a new way to derive Planck’s radiation law for light—which has the form of a Bose–Einstein statistics and hence, as we now know, is a consequence of the indistinguishability of the photon as a boson, although that was not known in these pre-quantum theory times—and sent the draft of his calculation to Einstein. Although what Bose did was far from being fully understood at that time, the new method of calculation must have caught Einstein’s full attention right away, because he translated the article from English to German and supported its publication in one of the most important scientific journals of that time (Bose 1924). Einstein himself then, inspired by Bose’s method, worked out a new model and calculation for an atomic gas consisting of bosons, and predicted the existence of what we now call a Bose–Einstein condensate, an amazing accomplishment, taking into account that the difference between bosons and fermions and the Pauli exclusion principle were not yet known (Einstein 1924, 1925). Because of the intense study of Bose–Einstein condensates that took off after their first experimental realizations (Anderson et al. 1995; Bradley et al. 1995; Davis et al. 1995), a lot of new knowledge, experimental but also theoretical, has been obtained, material on which we build for some of the details of the present article (Ketterle and van Druten 1996; Parkins and Walls 1998; Dalfovo et al. 1999; Ketterle et al. 1999; Görlitz et al. 2001; Henn et al. 2008).

The principal idea is still the one foreseen by Einstein, namely to take a dilute gas of boson particles and then stepwise lower its temperature, and as a consequence its total energy, such that at a certain moment there is so little energy in the gas that all boson particles are forced to transition to the lowest energy state. At that moment, all boson particles are in the same state, namely this lowest energy state, and the gas then behaves in a way for which there is no classical equivalent. We will see that, given our conceptuality interpretation of quantum theory and the boson gas model we build here for human language, we will be able to put forward a new way to view the indistinguishability that lies at the heart of a Bose–Einstein condensate (see Sect. 5).

The Bose–Einstein condensates that have been realized so far all consist mainly of massive boson particles, hence generally atoms with integer spin, which makes them bosons. Indeed, the situation of the bosons of light, i.e. of photons, is more complicated, because photons interact so abundantly with matter that their number is never constant, which makes it difficult, albeit not impossible, to realize a thermal equilibrium in this case (Klaers et al. 2010a, b, 2011; Klaers and Weitz 2013). We do want to keep using our analogy of language with light; although the pieces of text that we study contain a fixed number of words, a dynamic use of human language will also give rise to a continuous coming into existence of new words, which means that for such a dynamic situation the example of light is probably even more representative than gases with a fixed number of atoms. In this stage of our analysis, also because they are the easier Bose–Einstein condensates to realize, we focus however on massive bosons, hence atoms with integer spin.

The underlying idea is that the gas consists of atoms that, to a good approximation, do not interact with each other, hence only carry the kinetic energy \(K = p^2 / 2m\) generated by random movements due to the temperature T. It can be shown that in this situation the average kinetic energy of a free particle equals \(K= \pi kT\), where k is Boltzmann’s constant, hence we have

$$\begin{aligned} {\overline{p^2} \over 2m} = \pi kT \end{aligned}$$
(16)

where m is the mass of the atoms and p the absolute value of their momentum. From (16) and de Broglie’s formula \(\lambda = h / p\) we can calculate the ‘thermal de Broglie wavelength’ \(\lambda _{th}\) of the atoms of the gas

$$\begin{aligned} \lambda _{th} = {h \over \sqrt{2 \pi m k T}} \end{aligned}$$
(17)

Let us make things more concrete and calculate this thermal de Broglie wavelength for the atoms that were used in the Bose–Einstein condensates realized by Eric Cornell and Carl Wieman at the University of Colorado at Boulder in their NIST–JILA lab (Anderson et al. 1995), and by the group led by Wolfgang Ketterle at MIT, realizations for which they were jointly awarded the Nobel Prize in physics in 2001. In Boulder, a vapor of rubidium 87 atoms at a number density of \(2.5 \times 10^{12}\) atoms per cubic centimeter was cooled down to a temperature of 170 nanokelvin, at which the condensate fraction appeared, containing an estimated 2000 atoms and preserved for more than 15 seconds. At MIT, a dilute gas of sodium atoms at a number density higher than \(10^{14}\) atoms per cubic centimeter was used to realize the formation of a condensate containing up to 500,000 atoms at a temperature of 2 microkelvin, with a lifetime of 2 seconds.

Let us calculate \(\lambda _{th}\) for both these condensate formations. Besides the values of Planck’s and Boltzmann’s constants, and the value of \(\pi\), we only need the masses of a rubidium 87 atom and of a sodium atom to do the calculation. The atomic masses of a rubidium 87 atom and of a sodium atom are, respectively, 86.909180527 and 22.989769 unified atomic mass units, and given that one such unified atomic mass unit is \(1.66053904 \times 10^{-27}\,{\mathrm{kg}}\) we get

$$\begin{aligned}&m_{Rb} \approx 1.44316 \times 10^{-25}\,{\mathrm{kg}} \end{aligned}$$
(18)
$$\begin{aligned}&m_{Na} \approx 3.81754 \times 10^{-26}\,{\mathrm{kg}} \end{aligned}$$
(19)
$$\begin{aligned}&h \approx 6.62607004 \times 10^{-34}\,{\mathrm{kg\,m}}^2/{\mathrm{s}} \end{aligned}$$
(20)
$$\begin{aligned}&k \approx 1.38065 \times 10^{-23}\,{\mathrm{kg\,m}}^2 / {\mathrm{s}}^2 {\mathrm{K}} \end{aligned}$$
(21)
$$\begin{aligned}&\pi \approx 3.14159 \end{aligned}$$
(22)

Inserting the above values into (17), we obtain for the rubidium gas at 170 nanokelvin and the sodium gas at 2 microkelvin

$$\begin{aligned}&\lambda _{thRb} \approx 4.54195 \times 10^{-7}\,{{\mathrm{m}}} \approx 454\, {\mathrm{nm}} \end{aligned}$$
(23)
$$\begin{aligned}&\lambda _{thNa} \approx 2.57465 \times 10^{-7}\,{{\mathrm{m}}} \approx 257\, {\mathrm{nm}} \end{aligned}$$
(24)
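The values (23) and (24) follow directly from (17) and the constants (18)–(22); here is a short sketch that checks them:

```python
import math

h = 6.62607004e-34      # Planck's constant, kg m^2 / s
k = 1.38065e-23         # Boltzmann's constant, kg m^2 / s^2 K
u = 1.66053904e-27      # unified atomic mass unit, kg

def lambda_th(mass, T):
    """Thermal de Broglie wavelength of Eq. (17)."""
    return h / math.sqrt(2 * math.pi * mass * k * T)

m_Rb = 86.909180527 * u   # rubidium 87
m_Na = 22.989769 * u      # sodium

print(lambda_th(m_Rb, 170e-9))   # ~4.54e-7 m, i.e. ~454 nm, cf. (23)
print(lambda_th(m_Na, 2e-6))     # ~2.57e-7 m, i.e. ~257 nm, cf. (24)
```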

Often one reads that in states of the Bose gas ‘nearing the Bose–Einstein condensate’, the ‘de Broglie waves’ of the particles start to ‘overlap’, and that this is the reason why quantum effects become dominant. There is an interesting measure to express this notion of ‘overlapping de Broglie waves’ in a quantitative way, called the ‘phase space density’ \(\rho _{ps}\) of the boson gas

$$\begin{aligned} \rho _{ps} = n \times \lambda _{th}^3 \end{aligned}$$
(25)

where n is the ‘atom density’ of the gas expressed in ‘number of atoms per cubic centimeter’. From (25) it follows that \(\rho _{ps}\) corresponds to the number of atoms in a region of space the size of the ‘de Broglie wave’ cube. If this number is much smaller than 1, the de Broglie wavelength is much smaller than the distance between the atoms, hence there will be no overlapping and the gas will behave classically. The more this number exceeds 1, the more the de Broglie waves of the atoms overlap, hence the more quantum behavior will increase. It has been shown (Bagnato et al. 1987) that, independent of the trapping device used for the atoms, a box or a magnetic trap—the latter being the one used in actually realized Bose–Einstein condensates—the condensate starts to form whenever the value of \(\rho _{ps}\) is such that

$$\begin{aligned} 2.612 \le \rho _{ps} \end{aligned}$$
(26)

Considering (17) and (25), the value of \(\rho _{ps}\) in the process of formation of a Bose–Einstein condensate is determined by the temperature T and the number density n of the atom gas. In the last stage of the formation, the temperature is lowered by a technique called ‘evaporative cooling under the influence of a radio frequency field’. The effect is that the number density also decreases, hence to attain the quantum regime of overlapping de Broglie wavelengths it is necessary to lower the temperature faster than the gas is diluted. The group at MIT mentions explicitly the number density they reached when the Bose–Einstein condensate formed, namely between \(10^{14}\) and \(4 \times 10^{14}\) atoms per cubic centimeter (Davis et al. 1995). Since the Boulder group identified the formation of their rubidium Bose–Einstein condensate at a temperature of \(170\, {{\mathrm{nK}}}\), we can calculate, taking into account (26), that the number density of the rubidium gas must have been around \(2.8 \times 10^{13}\) atoms per cubic centimeter.
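This estimate of the rubidium number density can be reproduced by inverting the threshold condition (26); in the sketch below the density is converted to atoms per cubic meter so that the units match the wavelength in meters, a bookkeeping choice of our own.

```python
import math

h, k = 6.62607004e-34, 1.38065e-23      # Planck and Boltzmann constants (SI)
m_Rb = 86.909180527 * 1.66053904e-27    # mass of a rubidium 87 atom, kg

def lambda_th(mass, T):
    """Thermal de Broglie wavelength, Eq. (17)."""
    return h / math.sqrt(2 * math.pi * mass * k * T)

# number density at which a rubidium gas at 170 nK reaches the threshold (26),
# obtained by inverting rho_ps = n * lambda_th^3 = 2.612
n_Rb_per_m3 = 2.612 / lambda_th(m_Rb, 170e-9) ** 3
print(n_Rb_per_m3 / 1e6)   # ~2.8e13 atoms per cubic centimeter, as in the text
```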

We give in Table 2 an overview of the energies and lengths that are characteristic of the realization of the sodium condensate at MIT (Ketterle et al. 1999). Because the gas is very dilute and the temperature is very low, the size of the atoms is very small compared to the distance between the atoms, while the thermal de Broglie wavelengths are large, such that they overlap. With each length scale l there is an associated energy scale, namely the kinetic energy of a particle with de Broglie wavelength l, that is

$$\begin{aligned} K \approx {h^2 \over 2ml^2} \end{aligned}$$
(27)

which gives a good indication of the relation between sizes and energies.

A good measure for the size of atoms diluted as in the considered boson gas is the so-called elastic s-wave scattering length \(a = l/2\pi\). For sodium this has been measured to be 3 nanometers, which using (27) corresponds to an energy of 1 millikelvin in temperature (Marte et al. 2002). Around this temperature elastic s-wave scattering between the atoms will be dominant.

Table 2 Energy and length scales of the sodium Bose–Einstein condensate

The separation between the atoms in the gas can be estimated by considering the cubic root \(n^{1 \over 3}\) of the number density, which gives us the number of atoms spread out over 1 centimeter. For sodium, with a number density higher than \(10^{14}\) atoms per cubic centimeter, this gives rise to a spacing between the atoms of around 200 nanometers. The length l can be calculated by making use of (26) which gives us the following estimate for l

$$\begin{aligned} 2.612 \approx n \times \lambda _{th}^3 \Leftrightarrow (2.612)^{1/3} \approx n^{1/3} \times {l \over \sqrt{\pi }} \Leftrightarrow l \approx {\sqrt{\pi } \times (2.612)^{1/3} \over n^{1/3}} \end{aligned}$$
(28)

and hence, by making use of (27), we find that the corresponding energy is around \(2\, \upmu {\mathrm{K}}\).

A temperature of around \(1\, \upmu {\mathrm{K}}\) gives rise to a thermal de Broglie wavelength of around \(300\, {\mathrm{nm}}\).

The largest length scale is related to the confinement characterized by the size of the box potential or by the oscillator length \(a_{HO} = {1 \over 2\pi }\sqrt{h / m \nu }\), which is the typical size of the ground state wave function in a harmonic oscillator potential of frequency \(\nu\) (see “Appendix 2”). With \(\nu = 10\, {\mathrm{Hz}}\), we get a value for \(a_{HO}\) of about \(6.5\, \upmu {{\mathrm{m}}}\). The energy scale related to the confinement is characterized by the harmonic oscillator energy level spacing, given by \(h \nu\). Again, for \(\nu = 10\, {\mathrm{Hz}}\) we get an energy value for the spacing of about \(0.5\, {{\mathrm{nK}}}\).
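These confinement scales follow directly from the expressions for \(a_{HO}\) and \(h\nu\); here is a short sketch with \(\nu = 10\, {\mathrm{Hz}}\), as in the text:

```python
import math

h, k = 6.62607004e-34, 1.38065e-23    # Planck and Boltzmann constants (SI)
m_Na = 22.989769 * 1.66053904e-27     # mass of a sodium atom, kg

def oscillator_length(mass, nu):
    """a_HO = (1 / 2 pi) * sqrt(h / (m nu)): typical ground state size in the trap."""
    return math.sqrt(h / (mass * nu)) / (2 * math.pi)

nu = 10.0                              # trap frequency in Hz
print(oscillator_length(m_Na, nu))     # ~6.6e-6 m, i.e. about 6.5 micrometers
print(h * nu / k)                      # level spacing h*nu as a temperature, ~0.5 nK
```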

In Table 3, we have made the calculations of length and energy scales for the rubidium 87 Bose–Einstein condensate, taking into account that a density of around \(2.8 \times 10^{13}\) atoms per cubic centimeter was realized within the condensate of 2000 atoms.

Table 3 Energy and length scales of rubidium Bose–Einstein condensate

We now want to show that our Bose–Einstein distribution model of the Winnie the Pooh story ‘In Which Piglet Meets a Heffalump’ is well modeled by a Bose gas close to its Bose–Einstein condensate, and we will take the rubidium and sodium gases described above as inspiration. What is important to notice is the difference in order of magnitude between the energy level spacings of the harmonic trap oscillator, which are of the order of \(1\, {\mathrm{nK}}\), and the energies involved in the gas itself, of the order of \(1\, \upmu \mathrm{K}\). The Winnie the Pooh story ‘In Which Piglet Meets a Heffalump’ is not in a Bose–Einstein condensate state, because then all the words of the story would be the word And, populating the zero energy level. It is, however, in a state close to a Bose–Einstein condensate.

We have not yet explained what the parameters A and B of (9) are for the situation of a physical boson gas, for which the Bose–Einstein distribution is often written as

$$\begin{aligned} N(E_i) = {g_i \over {e^{{E_i-\upmu \over kT}}-1}} \end{aligned}$$
(29)

where \(\upmu\) is called the ‘chemical potential’ and \(g_i\) the ‘multiplicity’. The multiplicity \(g_i\) of a specific energy level \(E_i\) is the number of different states that have this same energy \(E_i\). That different states can have the same energy is connected to the symmetries of the configuration, often spatial ones. For example, for the simplest model of the harmonic trap, that of a quantum harmonic oscillator, the multiplicity in s dimensions equals

$$\begin{aligned} {(n + s -1)! \over n!(s-1)!} \end{aligned}$$
(30)

which becomes \((n+1)(n+2) / 2\) in 3 dimensions, \((n+1)\) in 2 dimensions, and 1 in the one-dimensional situation. The different dimensions are relevant for the Bose–Einstein condensates realized in laboratories because, although the boson gas always exists in 3 dimensions, the harmonic traps often give rise to very elongated, cigar-like configurations, such that a quantum description in terms of an effective one-dimensional harmonic oscillator is a better model. For the text of the Winnie the Pooh story we do not have to hesitate about its dimension: pronouncing a text while reading it is certainly one-dimensional. Also a written text, although materialized on a two-dimensional page, is a one-dimensional structure. This means that in the formula for the Bose–Einstein distribution we have rightly taken \(g_i = 1\) for every energy level \(E_i\).
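For completeness, the multiplicity (30) is just a binomial coefficient and can be evaluated directly; here is a small sketch:

```python
from math import comb

def multiplicity(n, s):
    """Multiplicity (n + s - 1)! / (n! (s - 1)!) of level n in s dimensions, Eq. (30)."""
    return comb(n + s - 1, s - 1)

print(multiplicity(3, 3))   # (n+1)(n+2)/2 = 10 in three dimensions
print(multiplicity(3, 2))   # n+1 = 4 in two dimensions
print(multiplicity(3, 1))   # 1 in the one-dimensional situation, hence g_i = 1
```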

What about the ‘chemical potential’ \(\upmu\)? There is another quantity introduced with respect to it, called the ‘fugacity’

$$\begin{aligned} f = e^{\upmu \over kT} = {1 \over A} \end{aligned}$$
(31)

If we look at (29), taking into account that \(g_i = 1\) and \(E_0 = 0\), we get

$$\begin{aligned} N(E_0)&= N_0 = {1 \over {e^{{-{\upmu \over kT}}}-1}} = {f \over 1 - f} \end{aligned}$$
(32)
$$\begin{aligned}\Leftrightarrow & f = {N_0 \over 1+ N_0} \end{aligned}$$
(33)
$$\begin{aligned}\Leftrightarrow & \upmu = kT \log {N_0 \over 1+ N_0} \end{aligned}$$
(34)

which means that the chemical potential and the fugacity are determined by the number \(N_0\) of particles that are in the lowest energy state, hence the number of particles that are in the condensate state. More specifically, for the Winnie the Pooh story we find

$$\begin{aligned} f \approx 0.9923 \quad \upmu \approx -4.581 \end{aligned}$$
(35)

Let us note that from (33) it follows that the fugacity is a number between 1/2 and 1 whenever there is at least one particle in the condensate state, and that the chemical potential is a negative number; they approach 1 and 0, respectively, when the condensate grows in terms of the number of particles in the lowest energy level. For the second constant B, we have

$$\begin{aligned} B = kT \end{aligned}$$
(36)

which means that the second constant B is given by the temperature of the Bose gas.
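Relations (31)–(36) can be evaluated directly from the fitted parameters. The sketch below uses the rounded values of (15), so the last digits differ slightly from (35):

```python
import math

A, B = 1.0078, 593.51    # rounded Bose-Einstein fit values from (15)

f = 1.0 / A              # fugacity, Eq. (31)
N0 = f / (1.0 - f)       # ground state population N_0, from Eq. (32)
mu = B * math.log(f)     # chemical potential, Eq. (34), using kT = B from Eq. (36)

print(f)    # ~0.9923
print(N0)   # ~128 cognitons in state 'And' according to the model
print(mu)   # ~-4.6, close to the value in (35)
```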

The rubidium condensate is the better example for the Winnie the Pooh story, as the number of atoms, 2000, is also of the same order of magnitude as the number of words, 2655, of the Winnie the Pooh story. The energy levels of the trap for the rubidium condensate are of the order of \(1\, {\mathrm{nK}}\), while the temperature of the gas is \(170\, {\mathrm{nK}}\) (Table 3), which is 170 times bigger. For the Winnie the Pooh story we see that, if we take 1 unit of energy for the energy level spacings, we have \(B = kT = 593\), following (15), and hence \({1 \over 2} kT\), a good estimate for the average energy per atom of a one-dimensional gas, gives about 297 for the latter, which means that in this respect also we are in the same order of magnitude for the Winnie the Pooh story and the rubidium condensate. Hence, we can say that the Winnie the Pooh story behaves similarly to a one-dimensional Bose gas of rubidium 87 atoms at a temperature of \(170\, {\mathrm{nK}}\). We will see in Sect. 4, where we consider the text of the novel ‘Gulliver’s Travels’ by Jonathan Swift (Swift 1726), that the sodium condensate is a better example for that text.

Let us introduce a second piece of text in Table 4, namely a story entitled ‘The magic shop’ written by Herbert George Wells (Wells 1903), with which we want to illustrate an aspect of our ‘Bose gas representation of human language’ that we have not yet touched upon. For the Winnie the Pooh story, if we look at Fig. 2 and Table 1, we can see that the ‘energy spectrum’ does not cover the whole range of possible energy values. Indeed, the red graph of Fig. 2 still has a substantial value on the right hand side of the graph and is not at all close to zero. Hence one can wonder what happens to this graph further on, in the higher part of the energy spectrum.

On the low energy side of the spectrum, the amount of radiation increases starting from zero radiation at energy level \(E_0\): for the words captured in the zero energy level of the Bose–Einstein condensate, no radiation emerges, following the considered choice of zero in the energy scale—for the Winnie the Pooh story, the zero energy level is populated by the cogniton in state And. Then the amount of radiation increases steeply: we already have a radiation of 111 energy units (105.84 in the Bose–Einstein model) at \(E_1\) for the cogniton in state He. The energy radiation keeps increasing steeply—182 at \(E_2\) (179.36 in the Bose–Einstein model) for the cogniton in state The, 255 at \(E_3\) (233.36) for the cogniton in state It, 280 at \(E_4\) (274.65) for the cogniton in state A, 345 at \(E_5\) (307.23) for the cogniton in state To, etc.—to reach a maximum at \(E_{71}\) with a radiation level of 522.79 energy units for the cogniton in state First. Then the radiation starts to decrease slowly. But remark that at energy level \(E_{542}\), with the cogniton in state You’ve, which is the highest energy level of Table 1, we still have a radiation of 385.55 energy units, which is more than half of the maximum radiation reached at energy level \(E_{71}\) for the cogniton in state First.

How can we understand this, since we have exhausted in Table 1 all the words of the Winnie the Pooh story and hence seemingly represented all possible energy levels? But is this true? To see this clearly, we have to reflect on the difference between the numbers in the third and fourth columns of Table 1, respectively the ‘numbers of appearances’ of the specific words in the Winnie the Pooh story and the ‘values of the Bose–Einstein distribution that we used to model these numbers of appearances’. The values in the fourth column are of a probabilistic nature and express averages over stories ‘similar’ to the Winnie the Pooh one with respect to the numbers of appearances of the specific words, while the values in the third column express real counts for one specific story. More concretely, by ‘similar’ we actually mean ‘containing the same total number of words and the same total amount of energy’. Remember indeed that the Bose–Einstein distribution function contains only two parameters, which are hence determined by the total number of words and the total amount of energy. Or, to put it even more concretely, suppose we were to collect a vast number of pieces of ‘meaningful’ texts all containing the same total number of words N and the same total energy E; the Bose–Einstein distribution function (9) is then supposed to model a specific type of average over all these texts, and the more numerous these texts, the better this average will correspond with the Bose–Einstein distribution function. The reason is that this function is the consequence of the limit process in statistical mechanics for a micro-canonical ensemble of states of particles with the same N and E (Bose 1924; Einstein 1924, 1925; Huang 1987).

The above reasoning indicates that we can consider introducing a ‘place for words that do not appear in the considered text but could have appeared’. Remark that these new words do not add to the sum N of all words, since they have ‘appearance number zero’, which means that this operation of ‘adding new words’ leaves N unchanged. In the ranking of energy levels, they have to be assigned additional energy levels higher than the highest one we identified so far, i.e. higher than the level of the last alphabetically classified word that appears once in the text. Remark that E also remains unchanged by this adding of words that could have appeared. Indeed, although these newly added words carry high energies, since all of them have appearance number zero they do not add to the total amount of energy, because the product of even a very high energy level with its zero number of appearances equals zero. Since N and E are left unchanged by the adding of these new words that could have appeared, the micro-canonical ensemble and its thermodynamical equilibrium also remain unchanged. However, the adding of the new words does alter substantially the Bose–Einstein distribution function and the Maxwell–Boltzmann distribution function calculated to model the data, because neither has appearance values equal to zero for these words, which means that they contribute to the total number of words and the total energy of the models. Hence, this operation of adding words, such that the energy spectrum completes itself over the whole range, is a necessary operation in the modeling with Bose–Einstein or Maxwell–Boltzmann.
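In a computation, this operation simply extends the array of counts with zeros before re-fitting; here is a sketch, reusing the `fit_bose_einstein` helper sketched in Sect. 1 (the function names are ours):

```python
import numpy as np

def extend_with_potential_words(counts, n_extra):
    """Append n_extra energy levels with zero appearances.

    The totals N = sum(counts) and E = sum(i * counts[i]) are unchanged,
    but the Bose-Einstein and Maxwell-Boltzmann fits over the extended
    energy range assign small non-zero populations to the new levels.
    """
    return np.concatenate([counts, np.zeros(n_extra, dtype=counts.dtype)])

# usage: counts_ext = extend_with_potential_words(counts, 3500 - len(counts))
#        A, B = fit_bose_einstein(counts_ext)
```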

Again more concretely, let us consider the words that appear once in the Winnie the Pooh story, and look for synonyms of these words; a word that now appears once might not have appeared, with its synonym appearing instead. So, the synonyms can be listed in a new set of words to add with zero appearance, as words that ‘could have appeared’, and indeed, the Bose–Einstein distribution function will not be zero for them, which expresses exactly this ‘could have appeared’.

To illustrate the above, we consider the H. G. Wells story ‘The magic shop’ (Wells 1903), for which we have classified the words into energy levels in Table 4. As we can see, the energy level \(E_{1153}\), corresponding to the state of the cogniton characterized by the word Youngster, would have been the highest energy level had we stopped adding energy levels at appearance number one, as we did for the Winnie the Pooh story. For this new story ‘The magic shop’ we have however added words with appearance number zero explicitly, starting with Garden, a word that does not appear in the story and is a synonym of Yard at energy level \(E_{1149}\); we attributed energy level \(E_{1154}\) to the cogniton in the state characterized by Garden. And indeed, in the third column of the row where Garden appears in Table 4 there is a 0, indicating that Garden does not appear in the story ‘The magic shop’. In the fourth column of that row we however have 0.25, the value of the Bose–Einstein distribution function at energy level \(E_{1154}\), and in the fifth column we have 0.07, the value of the Maxwell–Boltzmann distribution function at energy level \(E_{1154}\). Both numbers indicate that ‘Garden could have appeared in a story similar to the H. G. Wells story’, because they are not zero. These numbers are linked to the probability of Garden appearing in a story similar to ‘The magic shop’, in the way we explained above. And indeed these places should not contain zeros, because there is a probability that Garden would appear in such a similar story. We added the word Okay at energy level \(E_{1155}\) as a synonym of Yes at energy level \(E_{1150}\), as a new, non-appearing state of the cogniton that could however appear in a similar story. We continued in the same way, adding Junior as a synonym of Youngster, but there are no synonyms of You’d and You’re, which gives us the occasion to mention that the added words that could appear in a similar story do not have to be synonyms.

Table 4 An energy scale representation of the words of the story ‘The magic shop’ by H. G. Wells as published in Wells (1903)

The only criterion is that 'they appear in a meaningful story with the same total number of words and the same total energy'. Hence, adding synonyms is a simple way to ensure that the whole story remains meaningful, but also a completely new meaningful part of the story can be added with words that are not synonyms.

So, we added many more energy levels, up to the cogniton being in energy level \(E_{3500}\). We have only shown the last seven of these words in Table 4, namely Continued, Adding, Mention, Similar, Criterion, Obviously and Appearing; they have zero number of appearances in the H. G. Wells story, but their values in the Bose–Einstein model, as well as their values in the Maxwell–Boltzmann model, are not zero.

In Fig. 3a, b, we have represented, respectively, the numbers of appearances of the appearing and non-appearing words with respect to the energy levels, a graph that goes down very steeply, and the \(\log /\log\) graphs of these numbers of appearances, where we take the logarithm of both y and x. In Fig. 4, we have represented the amounts of radiated energy with respect to the energy levels, and we see that this time the red graph representing the Bose–Einstein model of the data, after steeply going up and reaching a maximum, goes slowly down to closely approach the zero level of radiated energy for high energy level cognitons. We see again, like in Fig. 1, that the Bose–Einstein distribution function, the red graph, gives an almost complete fit of the data, the blue graph, and definitely a much better fit than the Maxwell–Boltzmann distribution function, the green graph, does; this holds for the amounts of energy in Fig. 4 as well. We see that the maximum amount of radiation is reached at energy level \(E_{70}\), in the state of the cogniton characterized by Door, and the amount is 652.55204 energy units. So the frequency of Door would be the dominant color with which the story 'The magic shop' shines.

Comparing with the Winnie the Pooh story, we have a higher temperature, kT equals 722 instead of 593, a higher fugacity, f equals 0.9951 instead of 0.9923, and a higher chemical potential, \(\upmu\) equals \(-3.576\) instead of \(-4.581\). This will generally be so when we consider longer texts, as will again be illustrated by the text of 'Gulliver's Travels' considered in Sect. 4. We mentioned already that the sodium condensate realized at MIT, which we described above in detail, is a better model for the 'magic shop' story, and indeed, in Table 2 we can see that the harmonic oscillator level spacing for the sodium condensate is around \(0.5\, {\mathrm{nK}}\) while the temperature of the sodium gas is \(1\, \upmu {\mathrm{K}}\), a factor of 2000 in difference of size. In Table 4, we see that we have 3500 energy levels for the story 'The magic shop', which is of the same order of magnitude. The number of atoms in the MIT sodium condensate was estimated to be 500,000, which is still far more than the number of words in the H. G. Wells story 'The magic shop', which is 3934. When we analyze larger texts that come closer to this size, such as the text of Gulliver's Travels in Sect. 4, we find an even better correspondence in magnitudes with the data of the sodium condensate. But before showing this, we have to investigate more in depth another aspect of our modeling, namely the aspect related to the 'global energy level structure'.

Fig. 3
figure 3

In a the numbers of appearances of words in the H. G. Wells story 'The magic shop' (Wells 1903) are represented, ranked from the lowest energy level, corresponding to the most often appearing word, to the highest energy level, corresponding to the least often appearing word, as listed in Table 4. The blue graph (Series 1) represents the data, i.e. the collected numbers of appearances from the story (column 'Appearance numbers \(N(E_i)\)' of Table 4), the red graph (Series 2) is a Bose–Einstein distribution model for these numbers of appearances (column 'Bose–Einstein modeling' of Table 4), and the green graph (Series 3) is a Maxwell–Boltzmann distribution model (column 'Maxwell–Boltzmann modeling' of Table 4). In b the \(\log / \log\) graphs of the appearance number distributions are represented. The red and blue graphs coincide almost completely in both a and b, while the green graph does not coincide at all with the blue graph of the data. This shows that the Bose–Einstein distribution is a good model for the numbers of appearances while the Maxwell–Boltzmann distribution is not

We have not yet revealed the parameters A, B, C and D for the story 'The magic shop'; they have the following values

$$\begin{aligned}&A \approx 1.0005 \quad B \approx 722.05 \quad f \approx 0.9951 \quad \upmu \approx -3.576 \quad C \approx 0.0531 \quad D \approx 208.28 \end{aligned}$$
(37)

There are two quantum models that also in physics are used as an inspiration for the energy level structure of the trapped atoms: one is the 'harmonic oscillator and its variations' (“Appendix 2”) and the other is the 'particle in a box and its variations' (“Appendix 1”). From the harmonic oscillator model it follows that the energy levels are equally (linearly) spaced, which is also the way we have modeled them for the two examples considered so far, the Winnie the Pooh story and the H. G. Wells story. The energy levels of the particle in a box, however, are quadratically spaced. We will see in the remainder of our analysis that, in view of our experimental findings in analyzing numerous texts, the energy levels of the cognitons, depending on the story considered, are spaced following a power law, with a power coefficient which is in principle between 0 and 2, but which for all the stories that we investigated lay between 0.75 and 1.25. This indicates that different energy situations on both sides of the 'harmonic oscillator' are at play, from the 'anharmonic oscillator', with converging spacings between energy levels, to the 'particle in a box', with quadratically spaced energy levels. We will show in the next section how this generalization of the energy spacings strengthens the correspondence with Zipf's law in human language.

Fig. 4
figure 4

A representation of the ‘energy distribution’ of the H. G. Wells story ‘The magic shop’ (Wells 1903) as listed in Table 4. The blue graph represents the energy radiated by the story per energy level (column ‘Energies from data \(E(E_i)\)’ of Table 4), the red graph represents the energy radiated by the Bose–Einstein model of the story per energy level (column ‘Energies Bose–Einstein’ of Table 4), and the green graph represents the energy radiated by the Maxwell–Boltzmann model of the story per energy level (column ‘Energies Maxwell–Boltzmann’ of Table 4)

4 Zipf’s Law and the Bose Gas of Human Language

Zipf’s law is considered to be one of the mysterious structures encountered in language (Zipf 1935, 1949). It was originally noted in its most simple form in the following way. When ranking words according to their numbers of appearances in a piece of text, the product of the rank with the number of appearances is a constant. Hence Zipf’s law was originally stated mathematically as follows

$$\begin{aligned} R \times N = c \end{aligned}$$
(38)

where R is the rank, N the number of appearances, and c is a constant. We have presented in Fig. 5 the products \(R_i \times N_i\) for the text of the Winnie the Pooh story that we investigated in Sect. 2, where \(R_i\) is the i-th rank in Zipf's ranking and \(N_i\) is the number of appearances corresponding to this rank. The x-coordinate of the graphs in Fig. 5 represents the ranks \(R_i\), and the y-coordinate represents the products \(R_i \times N_i\) for the blue graph, and the values of respectively the Bose–Einstein distribution and the Maxwell–Boltzmann distribution for the red and green graphs.
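As a minimal sketch of how such a ranking and the corresponding Zipf products are obtained from raw text (the snippet of text below is invented; the actual analysis uses the full text of a story), one can write:

```python
from collections import Counter
import re

# A hypothetical snippet of text; the actual analysis uses the full text of a story.
text = "the cat saw the dog and the dog saw the cat"
counts = Counter(re.findall(r"[a-z']+", text.lower()))

# Zipf ranking: rank 1 for the most frequent word, rank 2 for the next, and so on.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
for rank, (word, n) in enumerate(ranked, start=1):
    print(rank, word, n, rank * n)   # R_i, word, N_i and the Zipf product R_i * N_i
```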

It is not a coincidence that there is a striking resemblance between the graphs shown in Fig. 5 and the energy distribution graphs of the Winnie the Pooh story as a boson gas shown in Fig. 2. Indeed, the energy levels \(E_i\) that we introduced are very simply related to the Zipf rankings \(R_i\), the only difference being that we started with value zero for the lowest energy level, while Zipf started with value 1 for his first rank. Hence, more concretely, we have

$$\begin{aligned} R_i = E_i +1 \end{aligned}$$
(39)

This means that, although none of the values of the Zipf products in Fig. 5 is equal to the corresponding energies in Fig. 2, the differences are small: since \(R_i\) equals \(E_i +1\), the product \(R_i \times N_i\) exceeds \(E_i \times N_i\) by exactly \(N_i\) at each level.

Fig. 5
figure 5

The blue graph (Series 1) is a representation of the products \(R_i \times N_i\) for the text of the Winnie the Pooh story that we have investigated in Sect. 2, where \(R_i\) is the i-th rank in Zipf's ranking and \(N_i\) is the number of appearances corresponding to this ranking. The x-coordinate represents the ranks \(R_i\), and the y-coordinate represents the products \(R_i \times N_i\). For the red graph (Series 2) and the green graph (Series 3) the values of respectively the Bose–Einstein distribution and the Maxwell–Boltzmann distribution which we developed in Sect. 2 were used as a comparison with the graph in Fig. 2

Consulting Table 1, we can see that the biggest difference is at the zero point of the graph, where on the x-axis \(E_0 = 0\) and \(R_0 = 1\), hence between the product \(R_0 \times N_0\), which equals \((E_0+1) \times N_0 = 1 \times 133 = 133\), and \(E_0 \times N_0 = 0 \times 133 = 0\). This difference cannot easily be seen when comparing the graphs of Fig. 5 and the graphs of Fig. 2, since 133 is still small compared to the values the functions take at \(R_1\) and \(E_1\). Again consulting Table 1, we indeed see that \(R_1 \times N_1 = (E_1+1) \times N_1 = 2 \times 111 = 222\), while \(E_1 \times N_1 = 1 \times 111 = 111\). This means that both the 'product graph' of Fig. 5 and the 'energy distribution graph' of Fig. 2 go quickly up between \(R_0\) and \(R_1\) and between \(E_0\) and \(E_1\), the first from value 133 to value 222, and the second from value 0 to value 111, which is almost the same steepness. Both graphs then keep increasing quite quickly and slowly flatten till they reach their maxima at Zipf rank \(R_{70}\) and energy level \(E_{71}\). From this maximum on, both the Zipf product and the energy distribution slowly decrease to a lower value. More specifically, the maximum value is 522.79 in both cases, and for the last considered Zipf rank \(R_{542}\) and energy level \(E_{542}\) we find values 359.22 and 358.55 respectively. This shows that the Zipf products decrease, rather than remaining constant as Zipf's law predicts.

In the foregoing reasoning on Zipf's law, we have always considered the two graphs, the blue and the red one, in both Figs. 5 and 2. Of course, Zipf did not know of the Bose–Einstein distribution that is represented by the red graph in both figures, and which we used to model the data, represented by the blue graph in both figures. Hence Zipf only had the blue graph in Fig. 5 available to come up with the hypothesis that the product of rank and number of appearances is a constant. If one considers the blue graph in Fig. 5, one could indeed imagine it to vary around a constant function, certainly in the middle part of the graph. The beginning part can then be considered as a deviation, which is also what Zipf did when noting that for the first ranks the law did not hold up well. It was also known to Zipf that the end part of the graph did not behave very well with respect to his law either, as a consequence of how ranks and numbers of appearances behave there, making the product go up and down heavily, and the slight downward slope at the very end was identified by Zipf as well. We see it explicitly pictured by the red graph, representing the Bose–Einstein distribution modeling of the data.

There is however another aspect of the situation which was overlooked by Zipf. It is self-evident that 'if Zipf's law is a law, it has to be a probabilistic law'. Let us specify what we mean by this. Suppose we had a large number of texts available with exactly the same number of different words in them, such that a Zipf analysis would lead to the same total number of ranks for each of the texts. Zipf's graphs, including the 'product graph', i.e. the blue graph in Fig. 5, will then show a statistical pattern for the set of texts on which they are tested. Suppose we compute averages of the numbers of appearances pertaining to the same rank over the available texts, then the function representing these averages will be a distribution function with a steep upward slope in the first ranks going towards a maximum and then a slow downward slope in the ranks after this maximum. It will be a function similar to the Bose–Einstein distribution we have used to model texts as Bose gases, i.e. the red graph. This will be even more so when we add the two constraints that in our case follow naturally from our modeling, namely that the different texts need to count the same total number of words, and that the sum of the products, which in our interpretation of the Bose gas model is the total energy, needs to be the same for each of the texts. What is however more important still is that 'if Zipf's law is a probabilistic law, we should also introduce rankings that represent words with a zero number of appearances', exactly like what we have done for the H. G. Wells story 'The magic shop', for which we have represented the data and the Bose–Einstein model in Table 4, and the graphs representing these data in Figs. 3a, b and 4.
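The averaging over a set of texts with the same total number of words that we described above can be illustrated with the following sketch (the underlying word-probability profile is invented, not fitted to any of the texts discussed here).

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, total_words, n_texts = 200, 1000, 500

# Hypothetical underlying word probabilities (here a Zipf-like profile).
p = 1.0 / np.arange(1, vocab + 1)
p /= p.sum()

# Every simulated 'text' contains the same total number of words; the word
# counts of each text are then ranked from most to least frequent.
counts = rng.multinomial(total_words, p, size=n_texts)
ranked = -np.sort(-counts, axis=1)        # rank 1 = most frequent word of that text

# Averaging the appearance numbers per rank over all texts yields a smooth
# distribution over the ranks, of the kind discussed above.
mean_per_rank = ranked.mean(axis=0)
print(mean_per_rank[:10])
```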

If we look carefully at the energy distribution graph in Fig. 4, we can understand again somewhat better why Zipf came to believe that the products of the ranks and the numbers of appearances are a constant. Indeed, having added the zero appearance numbers until the energy distribution becomes close to zero in the high energy levels, as shown in Fig. 4, we can see how the blue graph first goes far up where the one word appearance cases are, to compensate for the long row of zero appearance cases that take up a great part of the x-axis. So, if one leaves out the zero appearance part, one easily gets the impression that the blue graph represents a constant on average, at least when neglecting the low energy levels at the start, where it goes steeply up.

Fig. 6
figure 6

Representation of the \(\log /\log\) graphs of the Zipf data. The blue graph represents the data (Series 1), the red graph represents the Bose–Einstein model (Series 2), the green graph represents the Maxwell–Boltzmann model (Series 3) and the purple graph represents a straight line (Series 4) that is an ‘as good as possible approximation’ of the other graphs to illustrate that the gradient of the ‘straight line approximation’ is not equal to \(-1\)

Most of the later investigations of Zipf's findings concentrated on the \(\log /\log\) graph representation, where the \(\log\) is taken of the rank as well as of the numbers of appearances, hence the Zipf equivalents of the \(\log /\log\) graphs we considered for our Bose gas modeling, represented in Fig. 1b and in Fig. 3b. As far as Zipf's law expressed in (38) is concerned, the \(\log /\log\) graph of the Zipf product gives rise to a straight line with gradient equal to \(-1\). Indeed, when we take the \(\log\) of both sides of (38) we get

$$\begin{aligned} \log R + \log N = \log c \end{aligned}$$
(40)

which graph, with \(\log R\) on the x-axis and \(\log N\) on the y-axis, is a straight line with gradient equal to \(-1\). It is indeed much easier to see with the naked eye that a \(\log /\log\) graph like those in Fig. 1b and in Fig. 3b can be approximated well by a straight line than to see the constancy of the Zipf products in a graph like the one in Fig. 5, where a constant needs to be fitted to the up and down moving blue graph. However, the focus of Zipf-type investigations on the \(\log /\log\) graphs also has its downside, in the sense that the upper and lower parts of the graph will more easily be considered as slight deviations from the straight line, while, as we see with our Bose–Einstein distribution modeling in its energy graph version, they really represent essential and significant deviations from Zipf's original product law (38). That in both Fig. 1b and Fig. 3b the graphs are slightly bent towards a concave form is the expression of Zipf's law essentially not being satisfied for low ranks and high ranks.

The foregoing analysis is meant to provide evidence that the Bose–Einstein distribution is a better model for the Zipf data than a constant, and also better than later, more complex versions of Zipf's law that keep assuming that the product graph is in good approximation a constant, and the \(\log /\log\) version in good approximation a straight line. There is however another aspect of Zipf's finding that we want to put forward here, since it will be important for our model of a Bose gas for human language.

In Fig. 6, we represented the \(\log /\log\) graphs of the Zipf data (blue graph) and the Bose–Einstein (red graph) and Maxwell–Boltzmann (green graph) distributions which we used to model them, and we added a straight line (purple graph) that approximates the other graphs as well as possible. We can see that the gradient of the straight line is not equal to \(-1\), but to \(-0.94\). Although Zipf himself kept focusing on the straight line with gradient \(-1\), it was noted by many who studied Zipf's law that a generalization was needed to take into account that the gradient of the straight line usually deviates from \(-1\), and hence the \(\log /\log\) version of the law was generalized to

$$\begin{aligned} p\log R + \log N = \log c \end{aligned}$$
(41)

which generalizes the original product of rank and number of appearances to

$$\begin{aligned} R^p \times N = c \end{aligned}$$
(42)

where p is called the ‘power coefficient’ of Zipf’s law.
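A standard way of estimating this power coefficient from data is a least-squares fit of a straight line in the \(\log /\log\) representation. The sketch below uses invented rank/appearance pairs (only the values 133 and 111 echo the two lowest Winnie the Pooh levels of Table 1; the rest are made up).

```python
import numpy as np

# Hypothetical rank/appearance data; in practice these come from the text analysis.
ranks = np.array([1, 2, 3, 4, 5, 10, 20, 50, 100], dtype=float)
appearances = np.array([133, 111, 91, 85, 70, 40, 22, 9, 4], dtype=float)

# Fit log N = log c - p log R; the slope of the fitted line is -p
# (Zipf's original law corresponds to p = 1).
slope, log_c = np.polyfit(np.log(ranks), np.log(appearances), 1)
print(f"power coefficient p = {-slope:.2f}, constant c = {np.exp(log_c):.1f}")
```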

We will apply this ‘power coefficient’ in Zipf’s law also in our modeling. Let us explain why and how we will do so. First of all, there is no a priori reason why the energy levels would be as simple as we presented it in the two examples that we considered, namely such that

$$\begin{aligned} E_i = i (E_1 - E_0)+E_0 \end{aligned}$$
(43)

where \(E_1 - E_0\) is the unit of energy that we introduced. Of course, we have systematically taken \(E_0 = 0\), see (1), which makes the energy levels we have introduced in both stories even simpler, but it is not necessarily so that \(E_0 = 0\) as a rule, which is why we now formulate the 'linear system of energy levels' as in (43). This simple linear system is inspired by the energy levels of the quantum harmonic oscillator (“Appendix 2”), where we have

$$\begin{aligned} E_i = {h \nu \over 2} + i h \nu \end{aligned}$$
(44)

with \(\nu\) being the frequency of the oscillator. But equal spacings between consecutive energy levels, as in the case of the harmonic oscillator, are a very exceptional situation of quantization. For general quantized systems the spacings between consecutive energy levels will not be the same, and both cases exist: for unconfined quantized situations the spacings will decrease, while for confined situations the spacings will increase. For example, for the quantized energy levels of the 'particle in a box' (“Appendix 1”), we have

$$\begin{aligned} E_i = {h^2 \over 8mL^2} + {h^2 \over 8mL^2} i^2 \end{aligned}$$
(45)

which means that the energy levels change quadratically as a function of the unit of energy

$$\begin{aligned} E_i = i^2 (E_1 - E_0) + E_0 \end{aligned}$$
(46)

Remark that in “Appendices 1 and 2” we have used n to indicate the ‘quantum numbers’, because that is the traditional letter used for quantum numbers within standard quantum theory. In the approach we followed we have used i to indicate the ‘energy levels’, because we do not want to make a direct and exclusive reference to standard quantum theory alone, since our aim is to also make a connection with Zipf’s law in language. More generally, we want to elaborate a ‘quantum cognition theory’ for ‘human language and cognition’ from basic principles on a more foundational level than the one where standard quantum theory is situated, building on earlier work in quantum cognition and quantum computer science (Aerts and Aerts 1995; Khrennikov 1999; Atmanspacher et al. 2002; Gabora and Aerts 2002; van Rijsbergen 2004; Aerts and Czachor 2004; Widdows 2004; Bruza and Cole 2005; Busemeyer et al. 2006; Pothos and Busemeyer 2009; Lambert Mogiliansky et al. 2009; Bruza et al. 2009; Busemeyer and Bruza 2012; Dalla Chiara et al. 2012, 2015; Haven and Khrennikov 2013; Melucci 2015; Pothos et al. 2015; Blutner and beim Graben 2016; Moreira and Wichert 2016; Broekaert et al. 2017; Gabora and Kitto 2017; Busemeyer and Wang 2018).

In this we will also be inspired by the global foundational work we have done in our Brussels group (Aerts 1986, 1990, 1999, 2009b; Aerts et al. 2010, 2012, 2013a, 2018a, 2019a, 2011; Aerts and Gabora 2005a, b; Aerts et al. 2013b; Aerts and de Bianchi 2014, 2017; Aerts et al. 2016; Aerts and Sozzo 2011, 2014; Aerts et al. 2015a, 2016; Sassoli de Bianchi 2011, 2013, 2014, 2019; Sozzo 2014, 2015, 2017, 2019; Veloz et al. 2014; Veloz and Desjardins 2015), and by the more specific work on the ‘conceptuality interpretation’ (Aerts 2009a, 2010a, b, 2013, 2014; Aerts et al. 2018d, 2019c). To mention a concrete aspect in need of a more foundational approach, there is yet no well identified spatial domain for human language, which means that we will have to build a ‘quantum cognition’ without reference to space (Aerts 1999; Sassoli de Bianchi 2019).

The ‘harmonic oscillator’ and the ‘particle in a box’ are both special cases where the one-dimensional Schrödinger equation can be solved analytically, but for boson gases power law potentials have been studied as more general models (Bagnato et al. 1987), and hence we will also introduce in our approach a more general variation of the energy levels than the linear one, namely one of a ‘power law change’

$$\begin{aligned} E_i = i^p (E_1 - E_0) + E_0 \end{aligned}$$
(47)
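In computational terms, (43), (46) and (47) differ only in the exponent applied to the level index; a minimal sketch:

```python
import numpy as np

def energy_levels(n_levels, p, e_unit=1.0, e0=0.0):
    """Power-law spaced energy levels E_i = i**p * (E_1 - E_0) + E_0, as in (47)."""
    i = np.arange(n_levels)
    return i**p * e_unit + e0

print(energy_levels(5, p=1.0))    # linear spacing (harmonic oscillator type), as in (43)
print(energy_levels(5, p=2.0))    # quadratic spacing (particle in a box type), as in (46)
print(energy_levels(5, p=1.08))   # an intermediate case, as used below for Gulliver's Travels
```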

Let us show right away how the introduction of a power law for the energy level spacings gives extra strength to the Bose–Einstein modeling of texts of stories in human language. This time we choose a much larger text than the two we investigated before, namely the text of the satirical work Gulliver's Travels by Jonathan Swift (Swift 1726), which contains in total 103184 words, hence of the order of 40 times more than the Winnie the Pooh story and 25 times more than the H. G. Wells story. When analyzed in the same way as the Winnie the Pooh and the H. G. Wells story, with the hypothesis of equally spaced energy levels, or, which is equivalent, with a power coefficient of the energy level spacings equal to 1, we find a total of 8294 energy levels without adding the zero number of appearances levels, and the ten highest numbers of appearances and their corresponding words are The, 5838, Of, 3791, And, 3633, To, 3400, I, 2852, A, 2442, In, 1976, My, 1593, That, 1280 and Was, 1263.

Table 5 The eleven lowest energy levels of the novel Gulliver’s Travels by Jonathan Swift (Swift 1726). The values of the Bose–Einstein model are compared with the data, i.e. the numbers of appearances of the words in the text in (a) without the introduction of a power coefficient and in (b) with the introduction of a power coefficient. The comparison for all energy levels can be seen for (a) in Fig. 7a and for (b) in Fig. 7b

In Fig. 7a, we represented the \(\log /\log\) version of the 'numbers of appearances' graphs for the Gulliver's Travels story, the blue graph representing the data, the red graph the Bose–Einstein model, and the green graph the Maxwell–Boltzmann model. We can see right away that the Bose–Einstein model is again a much better representation of the data than the Maxwell–Boltzmann model, but we can also see that it is a less good representation of the data than was the case for the Winnie the Pooh story and the H. G. Wells story. Indeed, the red graph shows values that are noticeably too high in the low energy levels and too low over a large region of the middle energy levels. In Table 5 (a) we give the values of the Bose–Einstein distribution model for the eleven lowest energy levels, corresponding to the states of the cognitons, i.e. the corresponding words, and compare them with the data; we see that the first ones are too high, while the following ones are too low.

Fig. 7
figure 7

The log/log graph of the frequency distributions of the novel ‘Gulliver’s Travels’ (Swift 1726). In a it is shown how the Bose–Einstein distribution represented by the red graph (Series 2), although still a much better model than the Maxwell–Boltzmann distribution represented by the green graph (Series 3), fails to be as good a model when compared with the Winnie the Pooh story and the H. G. Wells story (Figs. 1b and 3b). Indeed, its values (Table 5) are too high in the lowest energy levels and too low in the middle energy levels, when compared to the data represented by the blue graph (Series 1). However, with addition of the power coefficient 1.08, applied to the spacings between energy levels, in b it is shown how the Bose–Einstein distribution model is again a very good model for the data. See Table 5 for the explicit values of the eleven lowest energy levels

For the lowest energy level, with cognitons in state The, we find the Bose–Einstein distribution to have a value of 16454.07, while The appears only 5838 times in the Gulliver's Travels text. This is indeed a big difference: the Bose–Einstein value is almost three times the experimental value of the number of appearances. We find similar too high values of the Bose–Einstein distribution for the next two states of the cogniton: the state Of has a Bose–Einstein distribution value of 6297.00, while Of appears only 3791 times in the text, and the state And has a Bose–Einstein distribution value of 3893.39, while And appears only 3633 times in the text. For the next states of the cogniton the Bose–Einstein model, however, gives values that are too low with respect to the experimental data. For To the Bose–Einstein distribution value is 2817.73 while To appears 3400 times in the text, for I the Bose–Einstein distribution value is 2207.73 while I appears 2852 times, for A the Bose–Einstein distribution value is 1814.80 while it appears 2442 times, for In the Bose–Einstein distribution value is 1540.59 while it appears 1976 times, for My the Bose–Einstein distribution value is 1338.35 while it appears 1593 times, for That the Bose–Einstein distribution value is 1183.03 and it appears 1280 times, for Was the Bose–Einstein distribution value is 1060.00 and it appears 1263 times, and for Me the Bose–Einstein distribution value is 960.14 while Me appears 991 times in the text of the Gulliver's Travels story.

We will now apply a ‘power law’ to the spacings between the energy levels, as per (47), and will see that we can come to a much better match of the Bose–Einstein distribution with the data. Indeed, after applying the power \(p=1.08\) to the energy spacings between the energy intervals, we found an almost perfect match and represented the \(\log /\log\) version of the graphs in Fig. 7b. The values for the eleven lowest energy levels data compared with the Bose–Einstein model with power coefficient 1.08 are given in Table 5 (b).

Fig. 8
figure 8

A representation of the ‘energy distribution’ of the story of Gulliver’s Travels (Swift 1726). The blue graph (Series 1) represents the energy radiated by the story per energy level, the red graph (Series 2) represents the energy radiated by the Bose–Einstein model of the story per energy level, and the green graph (Series 3) represents the energy radiated by the Maxwell–Boltzmann model of the story per energy level. We have not added the highest energy levels radiation, but the very slowly descending slope after the maximum 18377.11 has been reached at energy level 43.65, shows that many levels will have to be added with zero number of appearance words for the Bose–Einstein function to approximate zero

We have tested the Bose–Einstein model on a large number of stories, short ones and long ones of the size of novels, and when we allow the spacings between the energy levels to vary according to a power law, we have been able to construct a very well matching Bose–Einstein model of the data for all of the considered stories. The power that was needed was in each case situated between 0.75 and 1.25.

We want to emphasize that it is remarkable how the application of the power 1.08 to the linear version of the model for the text of the novel Gulliver's Travels makes the Bose–Einstein model fit the data so well, and we observed the same effect of the introduction of a power on an originally linear version of the model for many of the other example texts that we investigated. We mentioned already how those who studied Zipf's law came to add a power to take into account that the gradient of the best fitting straight line in the \(\log /\log\) version of the graphs was not equal to \(-1\). However, the slightly concave curved nature of the lowest energy level ranks was also noticed, and attempts were made to remedy it by making the law more general still, however in purely ad hoc ways with the only aim of fitting the data (Mandelbrot 1953, 1954; Edmundson 1972). That this slight concave curve appears in the Bose–Einstein distribution as a consequence of adding a power to the spacings between energy levels in exactly such a way as to make it fit the data is in this sense remarkable, and since we saw it happening in many of the other examples for different values of the power, it is a strong indication that the Bose–Einstein model touches onto a fundamental property of human language.

In Fig. 8, we have represented the low energy part of the 'energy distribution' of the story of Gulliver's Travels (Swift 1726). The blue graph represents the energy radiated by the story per energy level, the red graph represents the energy radiated by the Bose–Einstein model of the story per energy level, and the green graph represents the energy radiated by the Maxwell–Boltzmann model of the story per energy level. We have not added the radiation of the highest energy levels because we wanted to show the detail of the low energy distribution, the one where the Bose–Einstein condensate dynamics of the text plays out. The maximum, with a value of 18377.11, is reached at energy value 43.65, i.e. at quantum number 33 (indeed \(33^{1.08} \approx 43.65\)), hence very close to the lowest energy levels. The parameters A, B, C and D of the Bose–Einstein and Maxwell–Boltzmann models are

$$\begin{aligned} A \approx 1.00019 \quad B \approx 19356.22 \quad f \approx 0.9998 \quad \upmu \approx -3.648 \quad C \approx 0.0075 \quad D \approx 1355.31 \end{aligned}$$
(48)

Comparing with the Winnie the Pooh story and with the H. G. Wells story, we have a higher temperature, kT equals 19356 instead of 722 or 593, and a higher fugacity, f equals 0.9998 instead of 0.9951 or 0.9923, while the chemical potential \(\upmu\) equals \(-3.648\) compared to \(-3.576\) and \(-4.581\). As we remarked already when we compared the parameters for the Winnie the Pooh story and the H. G. Wells story, this is generally what we expect to happen for longer texts.

5 Identity and Indistinguishability

We now want to reflect on what the obtained results can teach us about the notions of 'identity' and 'indistinguishability' with respect to how they are used in human language and in quantum theory. We also want to reflect on the way in which these results support the 'conceptuality interpretation of quantum theory' (Aerts 2009a, 2010a, 2013, 2014; Aerts et al. 2018d, 2019c). Before we start our analysis, we repeat that all the words appearing in the stories that we considered are 'states' of the 'cogniton', which is the entity that is for human language what a 'photon' is for light, or what a 'rubidium 87 atom' is for the rubidium gas used to fabricate the Bose–Einstein condensate (Anderson et al. 1995).

Let us first analyze how the issue of 'identity and indistinguishability' appears in quantum theory. It is, structurally speaking, a consequence of the generally adopted mathematical rule that wave functions should be symmetrized or anti-symmetrized, depending on whether the quantum particles in question are bosons or fermions. This entails that a multi-particle wave function is always a superposition of products of the single particle building blocks of the multi-particle wave function, such that the different product pieces are chosen in a way that the total wave function is symmetric or anti-symmetric, depending on whether the composed quantum entity is a boson or a fermion. Let us make concrete what this means when we apply a quantum model to the text of the Winnie the Pooh story. The set of energy levels \(\{E_0, \ldots , E_{542}\}\) shown in Table 1 are in principle the energy levels for a one particle situation in quantum theory, and the many particle situation of a text is then described in a Hilbert space which is the tensor product of, in the case of the Winnie the Pooh story, 2655 Hilbert spaces, each of which describes a one particle situation. The symmetrization is obtained by a superposition of all possible permutations of the original products and a renormalization to make the wave function a unit vector.

Let us consider the very simple version of this symmetrization procedure for two boson quantum particles which we call A and B, to see how challenging it is to try to understand its meaning. Both particles, when not part of a composite system, are described by their wave functions \(\psi _A(x_A)\) and \(\psi _B(x_B)\), where \(x_A\) and \(x_B\) are the variables we consider for particle A and particle B respectively. When the two particles are joined in a single composite system, the latter is described by the symmetrized wave function

$$\begin{aligned} \psi (x_A,x_B)=c(\psi _A(x_A)\psi _B(x_B)+\psi _A(x_B)\psi _B(x_A)) \end{aligned}$$
(49)

where c is the renormalization constant. To see to what type of problems this symmetrization procedure leads, suppose for a moment that \(x_A\) and \(x_B\) are position variables pertaining to separated regions of space \(R_A\) and \(R_B\), such that we can understand \(\psi _A(x_A)\) and \(\psi _B(x_B)\) as the wave functions representing one particle A mainly present in the region of space \(R_A\), and another particle B mainly present in the region of space \(R_B\)—\(\psi _A(x_A)\) and \(\psi _B(x_B)\) are for example wave packets which have negligible values outside the regions \(R_A\) and \(R_B\) of space respectively. The symmetrized wave function \(\psi (x_A,x_B)\) then describes a composite quantum entity which however does not consist of one particle pertaining to the region \(R_A\) and another particle pertaining to the region \(R_B\), because it also predicts the presence of entanglement correlations between measurements performed in both regions \(R_A\) and \(R_B\). This entanglement was originally put into evidence by Einstein and two of his collaborators, Boris Podolsky and Nathan Rosen, and the correlations it produces are now called EPR correlations (Einstein et al. 1935). The theoretical and experimental study of EPR type correlations has been one of the major subjects of quantum theory investigation for the last decades and resulted in showing that these correlations are non-local, so there is no longer any doubt in the physics community that the EPR type correlations predicted by the entanglement carried in symmetrized states such as (49) constitute an intrinsic reality in the quantum world, even if there is still an ongoing debate about how to understand them (Bohm 1951; Bell 1964, 1987; Aerts et al. 2019a).
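For readers who prefer to see the exchange symmetry explicitly, the following sketch (with hypothetical Gaussian wave packets standing in for \(\psi _A\) and \(\psi _B\)) verifies that the symmetrized state of (49) is invariant under exchange of the two particles.

```python
import sympy as sp

x, xA, xB = sp.symbols('x x_A x_B', real=True)

# Hypothetical one-particle wave packets, centred on two separated regions of space.
psi_A = sp.exp(-(x - 5)**2)
psi_B = sp.exp(-(x + 5)**2)

# Symmetrized (bosonic) two-particle state of Eq. (49), up to the constant c.
psi = psi_A.subs(x, xA) * psi_B.subs(x, xB) + psi_A.subs(x, xB) * psi_B.subs(x, xA)

# Exchanging the two particles (swapping x_A and x_B) leaves the state unchanged.
swapped = psi.subs({xA: xB, xB: xA}, simultaneous=True)
print(sp.simplify(psi - swapped) == 0)   # True
```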

Such a symmetrization for bosons and anti-symmetrization for fermions exists, following quantum theory, for all bosons and all fermions, which literally means that all identical quantum particles are entangled in this strong way, giving rise to non-local correlations of the EPR type. This state of affairs is still nowadays a serious unsolved and not understood conundrum for theoretical physics and philosophy of physics (Black 1952; Van Fraassen 1984; French and Redhead 1988; Saunders 2003, 2006; Muller and Seevinck 2009; Krause 2010; Dieks and Lubberdink 2011, 2019), and this stands in great contrast with how experimentalists go along with it; for example, photons pertaining to different energy levels, hence carrying different frequencies, are treated by them as distinguishable (Hong et al. 1987; Knill et al. 2001; Zhao et al. 2014). The way in which experimentalists look at the 'indistinguishability' of photons has been expressed clearly in more recent times, because of the actual importance of the creation of entangled photons for different purposes, e.g. for the fabrication of optically based quantum computers, and hence the focus in quantum optics on how to achieve this. Spontaneous parametric down conversion, which is a nonlinear optical process that converts one photon of higher energy into a pair of photons of lower energy, has historically been the process for the generation of entangled photon pairs for the well-known Bell inequality tests (Aspect et al. 1982; Weihs et al. 1998). Parametric down conversion is however an inefficient process because it has a low probability, and hence physicists looked for other ways to produce entangled photons. When a scheme for using linear optics for the production of qubits was presented (Knill et al. 2001), this gave rise to an abundance of new research. Most of the applications of this new research rely on the two-photon interference effect with two 'indistinguishable photons' entering from different sides of a beam splitter and leaving in the same direction after undergoing the so-called Hong–Ou–Mandel interference effect (Hong et al. 1987). The crucial aspect of Hong–Ou–Mandel interference is the 'indistinguishability of the two photons in the spectral, temporal and polarization degrees of freedom'.

This stimulated the direct study of the 'indistinguishability of photons from different sources', with the finding that 'for photons to behave as indistinguishable bosons neither their frequencies nor their arrival times at the beam splitter can be too different, otherwise they behave as distinguishable quantum particles' (Lettow et al. 2010). What is however most significant for our take on this, and for its value as support of our conceptuality interpretation of quantum theory (Aerts 2009a, 2010a, b, 2013, 2014; Aerts et al. 2018d, 2019c), is the result of an amazing experiment that was performed in the series of attempts of quantum opticians to create entanglement within linear optics by making use of the interference due to two-photon indistinguishability. In this experiment, photons of different frequencies enter the beam splitter, hence given the earlier experiments (Lettow et al. 2010) these photons should not behave as indistinguishable bosons, but on the outgoing side of the beam splitter a setup is realized that 'erases' the information about the different frequencies of the incoming photons. The result of the experiment is that this erasing makes the photons of different frequencies behave as indistinguishable bosons (Zhao et al. 2014). This experiment shows that it is sufficient for the photons to be contextually indistinguishable when they are measured for them to behave as indistinguishable bosons. We should actually not be amazed by this result, because this is what the so-called 'quantum eraser experiments' are all about (Scully and Druhl 1982; Kim et al. 2000; Walborn et al. 2002), and if we carefully read the famous analysis of the double-slit experiment by Richard Feynman (Feynman et al. 1963; Feynman 1965), the dependence of interference on the possibility of the measurement apparatus to 'know or not know about the available alternatives' was already at the center of his analysis. Hence, given the above analysis and our conceptuality interpretation of quantum theory, we can now put forward our view on the issue of 'identity and indistinguishability' as follows.

The way in which we understand in a straightforward way ‘what identity and indistinguishability are with respect to human language and human mind’ teaches us ‘what identity and indistinguishability are in quantum theory’.

Let us formulate the reason why it makes sense to state our view as just expressed above given the conceptuality interpretation of quantum theory. The main hypothesis of the latter is that ‘the role played by the human mind in relation with language is the same as the role played by a measuring apparatus (but also a heat bath and also a context that is perhaps not willingly used by a human being to make a measurement) in relation with a collection of quantum entities’. The statement above in italics follows directly from this hypothesis.

Let us become more concrete and consider the text of the Winnie the Pooh story of which the words can be found in Table 1. We see that—and the reasoning we develop now can be made for any other of the considered words—the word Piglet corresponds to the cogniton being in the state with energy \(E_8\), and it appears 47 times in the text of the story. In the quantum wave function that represents the story, which is a multipartite wave function formed by 2655 parts (the total number of words), Piglet is the state associated with 47 of its parts, or components. It is straightforward that each of the Piglet components can be interchanged with any other of the Piglet components without the story being changed in the slightest way. This means, in physics jargon, that the wave function is symmetric (or anti-symmetric) with respect to the interchange of all these Piglet components. And the symmetry (or anti-symmetry) is a consequence of their 'absolute indistinguishability'. It is also easy to understand that this 'absolute indistinguishability' is due to Piglet being a concept, and not an object. Indeed, let us imagine for a moment, just to make the above clearer still, that the scenery of the story would be pictured in some physical theatrical form with real piglets in the places where now the concept Piglet appears in the text. If we interchanged these real piglets, this would of course influence the physical scenery of the story. It is indeed not possible to 'interchange a real physical piglet with another real physical piglet without changing the whole of the physical scenery'. That is why real piglets when put in baskets will follow Maxwell–Boltzmann statistics and not Bose–Einstein statistics as conceptual piglets do. The 'interchanging of concepts in a piece of text', hence in the components of the wave function representing this piece of text, is an intrinsically different operation than the 'interchange of objects in space', and the basic hypothesis of the conceptuality interpretation of quantum theory consists in believing that quantum particles are like concepts, and that the reason why we find their behavior not understandable is because we think of them as objects. One of the crucial difficulties when thinking of quantum particles as objects comes to the surface exactly in their behavior as indistinguishable entities, as for objects this is something impossible to understand, while for concepts it is something straightforward and natural.
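A toy illustration of this difference (the sentence below is invented, not taken from the story) is that tokens of the same concept are literally identical, so exchanging them leaves the text unchanged, whereas labelled physical objects are not identical, so exchanging them changes the arrangement.

```python
# Tokens of the same concept are identical: exchanging them leaves the text unchanged.
text = ("Piglet", "went", "to", "see", "Piglet")
swapped_text = (text[4], *text[1:4], text[0])              # exchange the two Piglet tokens
print(text == swapped_text)                                # True: nothing has changed

# Two real, distinct piglets: exchanging them produces a different physical scenery.
objects = ("piglet#1", "went", "to", "see", "piglet#2")
swapped_objects = (objects[4], *objects[1:4], objects[0])
print(objects == swapped_objects)                          # False: the arrangement differs
```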

Let us now show how we can also easily understand the difference we indicated above between theoretical physicists, who are struggling with the issue that, following quantum theory, all photons should be identical, and experimental physicists, who pragmatically consider photons of different frequency as distinguishable and hence not identical. Consider again the Winnie the Pooh story: although we all understand right away that all concepts in the Piglet state are 'absolutely indistinguishable', we are also convinced that two different energy states of the cogniton are distinguishable. For example, energy state \(E_{43}\), which is the concept Robin, appearing 12 times in the text, is distinguishable from Piglet. It is even very important for the meaning carried by the story that these two states are distinguishable. In a very similar way, for any measuring apparatus that is sensitive to the frequency of light, it is very important that a red photon is distinguishable from a blue photon, e.g. for our eyes, but also, we suppose, for plants practicing photosynthesis. It is even the 'essence of the measuring apparatus' to 'distinguish these two states'. However, when a special purpose apparatus is fabricated such that, when we read the Winnie the Pooh story, the points where Piglet appears are made no longer distinguishable from the points where Robin appears—and there is a multitude of ways we can imagine this to be done—the two cognitons that are still read by us will be indistinguishable. Again, such an operation, consisting of completely erasing the Piglet nature and Robin nature of both concepts, can only work 'because both are concepts and not objects'. Underneath all of the words of the Winnie the Pooh text lies indeed the more abstract notion of Concept, and hence we can bring all words into this abstract state of just being an unspecified concept in the text, which would make all of them indistinguishable. There are different ways of 'erasing', some closer to the ontology of the concepts, others closer to the measuring itself, and that is also why the quantum eraser effect can be understood very well within the conceptuality interpretation (see Aerts 2009a, Section 4.4).

Does the above mean that 'words in different states are distinguishable' and 'words in the same state are indistinguishable', and that this clarifies the whole issue? Not yet; let us proceed in refining our analysis. It certainly does not mean that 'words in different states are objects'; they are concepts, and hence behave like concepts, and not like objects. And since they are concepts, when being in different states, their 'distinguishability' is not what 'distinguishability' means for objects. We have to return to the main subject of our investigation to find this more subtle form of behavior of words that, in different states, are distinguishable as concepts, and that lies at the origin of the disagreements between theoreticians and experimentalists when it comes to considering photons of the same frequency and photons of different frequencies. To start with, let us not forget that the radiation law for photons, including photons of different frequencies, is derived in statistical mechanics by considering these photons to obey Bose–Einstein statistics. Since in the foregoing sections we showed that Bose–Einstein statistics is valid for pieces of texts of stories containing a mixture of distinguishable and indistinguishable words, it should be possible to identify what happens differently with distinguishable concepts as compared to distinguishable objects, and what can lead to distinguishable concepts obeying Bose–Einstein statistics while distinguishable objects obey Maxwell–Boltzmann statistics. Let us start our analysis by considering a very typical and simple situation used commonly to illustrate the difference between Bose–Einstein statistics and Maxwell–Boltzmann statistics.

Fig. 9
figure 9

Three typical configurations of two particles in two states

In Fig. 9 we have represented two particles, the balls, in two states, the boxes, and three different configurations of this situation. The first configuration consists of the two particles in the first state, the second configuration of the two particles in the second state, and the third configuration consists of one particle in one state and the other particle in the other state. If the two particles are indistinguishable in the way quantum indistinguishability is customarily looked upon, which is also the reason that this example is often displayed, the probabilities attached within a Bose–Einstein statistics model are 1/3, 1/3 and 1/3 for the three configurations. However, if the two particles are distinguishable, as classical particles are, the probabilities attached within Maxwell–Boltzmann statistics are 1/4, 1/4 and 1/2 (the counting is made explicit in the sketch following this paragraph). The reason is that the last configuration, of one particle in one state and the other particle in the other state, is realized classically in two ways: one way and its permuted way are different realities. Within 'quantum indistinguishability' these two are not different realities, and given our conceptuality interpretation this would be explained by them indeed not being different realities if they are concepts. What happens, however, in case we consider the three configurations of Fig. 9 for distinguishable states of the cogniton, hence for distinguishable concepts? To make things more concrete, suppose we consider the concepts Cat and Dog and the configurations Two Cats, Two Dogs and A Cat And A Dog. Let us remark that this is exactly the situation we have already studied in great detail, showing Bose–Einstein statistics to be a better representation than Maxwell–Boltzmann statistics (Aerts 2009a; Aerts et al. 2015b; Beltran 2019). How can we understand that even for distinguishable concepts Bose–Einstein is a better statistics than Maxwell–Boltzmann? The reason is the presence of 'entanglement' and 'superposition' also for distinguishable concepts like Cat and Dog. Indeed, the probabilities 1/3, 1/3, 1/3 with Bose–Einstein, versus 1/4, 1/4, 1/2 with Maxwell–Boltzmann, actually mean that for Maxwell–Boltzmann there are more microstates in the third configuration than in each of the first two configurations, actually twice as many. When there is no entanglement and no superposition, and hence Cat and Dog are 'separated', we can understand this. This 'is' what happens when Cat and Dog are objects, hence a real cat and a real dog. Let us make this concrete: suppose we visit a farm with a lot of cats and dogs living at the farm, equal in number, and we receive as a present two of them, randomly chosen for us by the farmer; then the chance that the gift will be a cat and a dog is twice the chance that it will be two cats or two dogs. What, however, if a child is promised that he or she can have two pets and can choose for each pet whether it is a cat or a dog? The microstates that come into play in this case exist in the child's conceptual world, and there is no reason that within this conceptual world there will be twice as many microstates for the choice of a cat and a dog as compared to the choices of two cats or two dogs.
If there are two children that each choose one pet separately and do this independently of each other, Maxwell–Boltzmann statistics will again be the better one, because the number of microstates of the combination of the two choices will be double the number of microstates playing a role for each child separately. This situation was investigated by us in many different and more complex configurations of this type, with the result that Bose–Einstein is a better statistics than Maxwell–Boltzmann to model the situation (Aerts 2009a; Aerts et al. 2015b; Beltran 2019). Actually, we noticed already in our study of quantum entanglement with concept combinations that the violation of Bell's inequalities comes about due to the combined exemplars (microstates) being exemplars of the combined concept directly (giving rise to the Bose–Einstein situation) and not being exemplars of the concepts apart that are only afterwards combined (giving rise to the Maxwell–Boltzmann situation) (Aerts and Sozzo 2011, 2014; Aerts et al. 2019b, a). In our investigation of quantum superposition with concept combinations the situation is even more Bose–Einstein, because the exemplars of the combined concepts that play a role (microstates) are no longer combinations of exemplars of the single concepts, which means that their number will on average be equal to the number of exemplars of the single concepts, the situation hence fulfilling the basic requirement to be modeled by Bose–Einstein statistics (Aerts and Gabora 2005b; Aerts et al. 2010, 2012; Aerts 2011; Sozzo 2014; Aerts et al. 2015a; Sozzo 2015; Aerts et al. 2017). The insight that also combined distinguishable concepts tend to give rise to Bose–Einstein rather than Maxwell–Boltzmann statistics explains why it is so important for the thermal de Broglie wavelengths to be large with respect to the distance between the quantum particles for the Bose–Einstein statistics to be applicable, the equivalent condition for human language always being fulfilled, and why the original Rayleigh–Jeans radiation law for light, which is the Maxwell–Boltzmann version of the Planck radiation law, is satisfied for low frequencies.
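Returning to the simple two-particle, two-state situation of Fig. 9, the counting behind the probabilities 1/3, 1/3, 1/3 versus 1/4, 1/4, 1/2 can be made explicit in a few lines; this is only the standard textbook counting for Fig. 9, not a model of the concept experiments cited above.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

states = ("state 1", "state 2")

# Maxwell-Boltzmann: distinguishable particles, so all 4 ordered assignments are
# equally likely; grouping them by configuration gives 1/4, 1/4 and 1/2.
mb = Counter(tuple(sorted(assignment)) for assignment in product(states, repeat=2))
for config, ways in mb.items():
    print("MB", config, ways / 4)

# Bose-Einstein: only the occupation numbers count, so each of the 3 configurations
# of Fig. 9 is assigned the same weight 1/3.
be = list(combinations_with_replacement(states, 2))
for config in be:
    print("BE", config, 1 / len(be))
```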

We have not yet reflected on 'identity' in itself. With respect to 'the identity' of a quantum particle, it can be proven that, when the wave function of two identical quantum particles is considered, there does not exist a self-adjoint operator in the Hilbert space of their states that can represent a measurement that would identify one of the quantum particles (French and Redhead 1988; Butterfield 1993). Can a concept be said to have an identity? Not in the way we understand identity for an object. What can be attributed to a concept is a 'number' indicating 'the number of times it is', and that, one could say, is what substitutes for what identity is for an object. The fact that also a 'number of times it is' can be attributed to a quantum particle is again a support for the hypothesis of our conceptuality interpretation.

Taking into account our above analysis, what we can understand about the nature of reality goes further than what we have formulated till now, in case we interpret quantum theory following the conceptuality interpretation. As we mentioned already, we showed in earlier work that 'combinations of concepts' give rise to quantum superposition (Aerts et al. 2015a). Every sentence in a text is a combination of concepts. Also every paragraph in a text is a combination of concepts, since sentences, as combinations of concepts, combine amongst each other to form paragraphs. Depending on the nature of the text, this process, of increasingly larger pieces of the text being essentially 'combinations of concepts', keeps going on, certainly up to the level of stories, where the overall meaning content of a story glues all its concepts together in specific combinations. This implies that superpositions will also form for large subsets of combined concepts, and we believe that this is exactly the mechanism which we call 'understanding' when the human mind is engaging with these pieces of text. More concretely, suppose the human mind reads a piece of text. When reading, there is no direct focus on the single words as a collection; on the contrary, when the words are read, a 'new state is being formed', which integrates 'the meaning carried by the combination of all the concerned concepts'. This new state, carrying the meaning of the piece of text formed by the combination of these words, is exactly the superposition state which we identified already in earlier work (Aerts et al. 2015a), and it is these superposition states, forming again and again by combining concepts of sentences or paragraphs that again superpose in the course of the reading of the whole text, that lead to the understanding of the whole piece of text. A similar process takes place when talking, thinking or writing, albeit in general in a more discontinuous and complex way than when reading. We believe that what happens with a physical Bose gas close to its Bose–Einstein condensate state can be understood similarly. The role played by the human mind with respect to the text is now played by the heat bath and the measuring apparatuses applied to the Bose gas. When the temperature is low enough and the diluteness of the gas is such that the phase space density (25) satisfies (26), hence the thermal de Broglie wave length (17) is larger than the distance between the atoms, this process of superposition formation starts to happen. Indeed, the de Broglie waves of the different atoms will overlap heavily and give rise to these superpositions, which means that the process which we call 'understanding' when the human mind and text are involved takes place in the Bose gas with the heat bath. These superpositions are new emergent states that no longer pertain to one of the atoms, but represent several atoms joining in a new entity, just like several combined concepts represent an emergent meaning. The more the temperature is lowered, while the density of the gas is kept such that the de Broglie waves overlap on larger and larger regions of the gas, the more new states are formed containing a synthetic material reality different from single atoms. The Bose–Einstein condensate is an ultimate state where all the atoms have been gathered in the lowest energy state, so that for the whole gas a single new state has emerged.
The stories that we have studied are in states close to this Bose–Einstein condensate state, where synthetic parts of combined concepts emerge in superposition states, and the sizes of these parts are determined by the human mind's state of understanding of the stories.