Introduction

Claude E. Shannon, the pioneer of information theory, presented in 1952 a “maze-solving machine” as one of the first proper technical cognitive systems [1]. It comprises a maze in the form of a rectangular board partitioned into discrete cells that are partially separated by removable walls, and a magnetized “mouse” (nicknamed “Theseus”, after the ancient Greek hero) as a cognitive agent. As its actuator, the mouse possesses a motorized electromagnet beneath the maze board, which pulls the mouse through the maze. Sensation and memory are implemented by a circuit of relays that switch their states after encounters with a wall. In this way, Shannon technically realized a simple, non-hierarchical perception-action cycle (PAC) [2, 3], quite similar to the more sophisticated version depicted in Fig. 1 as a viable generalization of a cybernetic feedback loop.

Perception Action Cycle

In general, the PAC forms the core of cognitive dynamic systems [4]. Such systems describe the interaction of a cognitive agent with a dynamically changing world, as shown in Fig. 1. The agent is equipped with sensors for the perception of its current state in the environment and with actuators allowing for active state changes. A central control prescribes goals and strategies for problem solving that can be trained either by trial-and-error learning, as in Shannon’s construction, or, more generally, by reinforcement learning [4, 5].

Fig. 1

Hierarchical bidirectional perception-action cycle (PAC) for a cognitive dynamic system after [3]. The scope of the present paper is indicated by the dashed boundary

In Shannon’s mouse-maze system, the motor (the actuator) pulls the mouse along a path until it bumps into a wall which is registered by a sensor. This perception is stored by switching a relay, subsequently avoiding the corresponding action. The behavior control prescribes a certain maze cell where the agent may find a “piece of cheese” as a goal. When the goal is eventually reached, no further action is necessary. In a first run, the mouse follows an irregular path according to a trial-and-error strategy, while building up a memory trace in the relay array. In every further run, the successfully learned path is pursued at once. However, when the operator modifies the arrangement of walls, the previously learned path becomes useless and the agent has to learn from the very beginning. Therefore, [1, p. 1238] concludes:

The maze-solver may be said to exhibit at a very primitive level the abilities to (1) solve problems by trial and error, (2) repeat the solutions without the errors, (3) add and correlate new information to a partial solution, (4) forget a solution when it is no longer applicable.
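These four abilities can be sketched in a few lines of code. The following is a minimal sketch, not Shannon’s actual relay circuit: we assume (our own encoding) a grid maze given as a set of blocked transitions, and a dictionary playing the role of the relay array that stores, per cell, the last successful exit direction. A second run with the learned memory repeats the solution without the errors, and changed walls simply trigger renewed trial and error in the affected cells.

```python
def run_mouse(walls, start, goal, memory=None, max_steps=1000):
    """Trial-and-error maze run in the spirit of Shannon's Theseus.

    walls:  set of blocked transitions {(cell, direction), ...}
    memory: dict cell -> last successful exit direction (the 'relay array')
    Assumes every cell has at least one open direction.
    """
    dirs = ["N", "E", "S", "W"]
    step = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
    memory = {} if memory is None else memory
    pos, path = start, [start]
    for _ in range(max_steps):
        if pos == goal:                         # goal cell: no further action
            return path, memory
        d = memory.get(pos, "N")
        while (pos, d) in walls:                # bump: the sensor fires ...
            d = dirs[(dirs.index(d) + 1) % 4]   # ... and the strategy rotates
        memory[pos] = d                         # relay stores the good direction
        dx, dy = step[d]
        pos = (pos[0] + dx, pos[1] + dy)
        path.append(pos)
    return path, memory
```

Passing the returned `memory` into a second call reproduces the learned path at once, mirroring abilities (1), (2) and (4) above.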

In Shannon’s seminal approach, the mouse learns by trial-and-error whenever it bumps into a wall. Its PAC comprises a simple stimulus-response loop as originally outlined in [2]. More sophisticated cognitive dynamic systems require hierarchically organized PAC such as that depicted in Fig. 1 where only two levels of analysis/synthesis and interpretation/articulation are indicated. [3] presented a multilayer PAC inspired by cortical connectivity. Additionally, both perception and actuation are characterized by bottom-up and top-down pathways, establishing a recurrent network architecture. Figure 1 reflects this by means of bidirectional connections for the prediction of subsequent states along the sensory pathway and the expectation of performed actuator movements along the motoric pathway. Finally, the PAC in Fig. 1 could be augmented by an “internal stage” as a mental model of the world [6]. Such PAC architectures are much more powerful than behavioristic stimulus-response systems and are also able to employ predictive coding strategies [7, 8], either by considering action as coding [5], or by anticipating events in the environment [9, 10].

Moreover, elaborated cognitive dynamic systems should be able to draw logical inferences and to communicate either with each other or with an external operator [11]. This requires higher levels of mental representation such as formal logics and grammars. Below we consider the symbolic representation system of language as a means of communication. In order to fulfill this function, word symbols are used as semantic representatives for the description of facts. The simplest case in which communication between two partners can succeed is natural communication about facts in a shared, direct environment. In this scenario, linguistic utterances can be placed in an environmental context so that their meaning can be determined in a truth-functional sense. In order to meet this requirement using a hierarchically organized PAC, a first information transformation of signal time series into ordered symbolic representations (analysis) and vice versa (synthesis) is required, which can be implemented using subsymbol-symbol transducers (SST) [12]. This provides access to the theoretical foundation of “Physical Symbol Systems” (PSS), with which—according to the “Physical Symbol System Hypothesis” (PSSH)—cognitive processes can be designed as information processing [13]. It includes a second information transformation from ordered symbol sequences into semantic representations (interpretation) and vice versa (articulation), as well as logical processing, with which higher cognitive functions such as problem solving, logical reasoning, or behavior control can be implemented. Note that with each transformation, the information is converted into a more useful form. For example, the behavior of technical cognitive systems can be explained on the level at which logical information is available.
In this study we focus on the representation of syntactic structures using vector symbolic architectures and therefore refer to the lower part of the hierarchically organized PAC in Fig. 1.

Consider, e.g., the operator’s utterance:

$$\begin{aligned} \mathtt{the} \ \mathtt{mouse} \ \mathtt{ate} \ \mathtt{cheese} \end{aligned}$$
(1)

(note that symbols will be set in typewriter font in order to abstract from their conventional meaning in the first place). In a rather traditional PAC framework as shown in Fig. 1, the acoustic signal first has to be analyzed in order to obtain a symbol-like phonetic representation. To understand its meaning, the agent must then process the utterance grammatically through syntactic parsing. Finally, the syntactic representation, e.g. in the form of a phrase structure tree, must be interpreted as a semantic representation which the agent can ultimately understand [14]. Depending on such understanding, the agent can draw logical inferences and derive the appropriate behavior for controlling the actuators. In the case of verbal behavior [16], the agent therefore computes an appropriate response, first as a semantic representation, which is articulated into a symbolic syntactic and phonetic form and finally synthesized as an acoustic signal. In any case, high-level representations are symbol-like and their processing is approximately rule-driven, in contrast to low-level sensation and actuation, where physical signals are essentially continuous.

Cognitive dynamic systems can be coupled together through communication channels in such a way that the action of one PAC (the sender) is registered as perception by the other PAC (the receiver) [6]. In principle, there are two possibilities: the action of the sender may either perturb or modify the behavior of the receiver. Both cases can be described in the framework of interactive computation [15], where the computation of finite input strings is replaced by the processing of (in principle) unbounded data streams. In the case of modification, the perception changes the intrinsic autonomous dynamics of the receiver, whereas in the case of perturbation, only the current state is transformed by means of a state space operator assigned to the perception. If perception is given as verbal behavior (B), the meaning of a perceived utterance becomes a mapping from antecedent states A to consequent states C. This so-called antecedent-behavior-consequent (ABC) schema [16] provides the foundation of dynamic semantics [17,18,19].

[17] discussed several kinds of propositional dynamic semantics. In its simplest form, it codifies the ABC schema of [16] in such a way that a cognitive agent is in an epistemic state A comprising the set of all logical propositions the agent believes to be true. Then, another agent utters a proposition B that acts on the belief state A by means of set-algebraic unification, such that the consequent state becomes the union \(C = A \cup \{B\}\). [17] further elaborated this simple scheme of dynamic semantics in different directions, including belief revision and Bayesian update semantics. Another extension, called dynamic predicate logic by [18], made dynamic semantics more compositional in the sense of traditional AI (cf. [19]). The latest developments consider internal algebraic structures of the perturbing perceptional stream [20]. A treatment in terms of algebraic representation theory, analogous to that in quantum mechanics, then becomes necessary [21, 22].
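In code, this simplest update semantics is a one-liner: the meaning of an utterance B is the operator mapping an antecedent belief state A to the consequent \(C = A \cup \{B\}\). A minimal sketch with invented example propositions:

```python
def meaning(B):
    """The meaning of an uttered proposition B as a state-space operator
    mapping an antecedent belief state A to the consequent C = A ∪ {B}."""
    return lambda A: A | {B}

# Invented example propositions (for illustration only)
A = frozenset({"mouse(theseus)", "in(theseus, maze)"})
C = meaning("ate(theseus, cheese)")(A)
```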

Neural Networks

Perception-action cycles such as that depicted in Fig. 1 have been implemented through neural networks by several researchers [9, 10, 23]. [10] presented a neural network PAC for a robot’s visually guided reaching and grasping behavior that is inspired by the cortical organization of the human visual-motor system. The study [23] solves the reinforcement learning problem of Shannon’s mouse-maze system through an architecture of coupled feed-forward and long short-term memory networks (LSTM, [24, 25]). In [9], a theory of event coding for PAC is neurally implemented by a recurrent network architecture that is able to anticipate effects in the environment, thereby ultimately allowing for predictive coding as well [7, 8]. Predictive coding, in turn, has been neurally modeled by [26] through multiple timescale recurrent neural networks (MTRNN) of leaky-integrator units [27].

The deployment of neural networks in language technology has also become increasingly important in recent years. Beginning with hard-wired recurrent neural architectures [22, 28, 29, 30], the advent of deep learning algorithms has led to state-of-the-art language processing through recursive neural networks (RNN, [31]), through LSTM networks, and through convolutional neural networks (CNN, [32, 33]), with their most recent improvements, capsule networks [34, 35]; for a survey consult [36,37,38,39]. Particularly interesting are the latest attempts of Smolensky and collaborators to train tensor product recurrent networks (TPRN, [40, 41, 42]), which are able to directly learn symbol-like representations by end-to-end training under a special quantization regularization constraint. Moreover, such networks provide an instantaneous association of linguistic input with semantic representations, thereby circumventing syntactic analysis and parsing.

Despite these impressive achievements, DNN are intrinsically black-box models, propagating input patterns through their hidden layers toward the associated output patterns. The hidden layers may have several hundred thousand up to some billions of synaptic weight parameters that are trained by regularized gradient descent algorithms. After training, the network develops a hidden representation of the input features and the computational rules to transform them into output. Yet these representations are completely opaque and nobody can really explain how input is mapped onto output by a DNN [32].

Explainability

Therefore, following [43], the next-generation AI must be explainable, robust and trustworthy. Creating explainable AI (XAI) is an important challenge for current research [44,45,46]. For this aim, it is mandatory not only to develop new algorithms and network architectures, such as, e.g., TPRN [40,41,42], but also a conceptual understanding of their formal structures. According to [47]: “A theory of explainable AI, with a formal and universally agreed definition of what explanations are, is lacking.” A first step in this direction has been achieved by [48], who distinguish between opaque, interpretable, and comprehensible systems as follows:

Opaque systems. A system where the mechanisms mapping inputs to outputs are invisible to the user. It can be seen as an oracle that makes predictions over an input, without indicating how and why predictions are made. [...] Similarly, systems relying on genuine “black box” approaches, for which inspection of the algorithm or implementation does not give insight into the system’s actual reasoning from inputs to corresponding outputs, are classified as opaque. [48]

That is, computations carried out by an opaque system are neither understandable nor explainable. On the one hand, DNN that are “black boxes” are opaque in this sense [45]. On the other hand, linear systems such as support vector machines (SVM) appear to be interpretable:

Interpretable systems. A system where a user cannot only see, but also study and understand how inputs are mathematically mapped to outputs. This implies model transparency, and requires a level of understanding of the technical details of the mapping. [...] SVMs and other linear classifiers are interpretable insofar as data classes are defined by their location relative to decision boundaries. But the action of deep neural networks, where input features may be automatically learned and transformed through non-linearities, is unlikely to be interpretable by most users. [48]

Interestingly, [46, p. 84] pointed out that interpretability is a more “passive characteristic of a model referring to the level at which a given model makes sense for a human observer.” This contrasts with the model’s explainability, which can be regarded as an “active characteristic of a model, denoting any action or procedure taken by a model with the intent of clarifying or detailing its internal functions” to the human observer. The most important trait of an interpretable system is its transparency; and classical symbolic AI [49, 50, 51] is intrinsically transparent and hence interpretable [52]. However, interpretability usually requires expert knowledge for understanding and explanation, since the internal representations and computations of such systems are expressed in terms of formal grammars or languages [52].

Comprehensible systems. A comprehensible system emits symbols along with its output [...]. These symbols (most often words, but also visualizations, etc.) allow the user to relate properties of the inputs to their output. The user is responsible for compiling and comprehending the symbols, relying on her own implicit form of knowledge and reasoning about them. This makes comprehensibility a graded notion, with the degree of a system’s comprehensibility corresponding to the relative ease or difficulty of the compilation and comprehension. [...] Different users may have different tolerances in their comprehension: some may be willing to draw arbitrary relationships between objects while others would only be satisfied under a highly constrained set of assumptions. ([48])

Therefore, a comprehensible system could be a black box, yet one producing symbolic output that a user is able to reason over. However, as has been emphasized by [45, 46], different users may belong to different audiences with different levels of individual skills or interests. Thus, the notion of comprehensibility is highly contextual.

Finally, [48, p. 7] define an explainable system as a comprehensible one that is augmented by a “reasoning engine” that is able to justify the emitted symbolic output. Explainable systems call for “confidence, safety, security, privacy, ethics, fairness and trust” [52, p. 3]. The meaning of “trust” has been circumscribed by [53, p. 211] “as a psychological state in which an agent willingly and securely becomes vulnerable, or depends on, a trustee, having taken into consideration the characteristics of the trustee.”

Vector Symbolic Architectures

Originally, in his maze solving machine, Shannon used an array of relays as the agent’s memory. This has later been termed the “learning matrix” by [54]. Learning matrices and vector symbolic architectures (VSA) provide viable interfaces between hierarchically organized symbolic data structures such as phrase structure trees or semantic representations in traditional AI technology on the one hand [49,50,51] and continuous state space approaches as required for deep neural networks on the other hand [32, 55]. Beginning with seminal studies by [56] and [57], and later pursued by [58, 59], and [60] among many others, those architectures have been dubbed VSA by [61] (cf. also [62]).

In a VSA, symbols and variables are represented as filler and role vectors of some underlying linear spaces [36, 63], respectively. When a symbol is assigned to a variable, the corresponding filler vector is bound to the corresponding role vector. Different filler-role bindings can be bundled together to form a data structure [62], such as a list, a frame, or a table of a relational database [64]. These structures can be recursively bound to other fillers and further bundled together to yield arbitrarily complex data structures [59]. In order to avoid the “curse of dimensionality” [49] induced by the binding process, vector symbolic architectures usually employ some data compression and subsequent clean-up algorithms, which restrict their memory capacity by the signal-to-noise ratio [58]. However, lossless VSA can also be devised by making use of infinite-dimensional functional representations, as in quantum automata [65] or neural field architectures [66].
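The two basic operations can be illustrated with uncompressed tensor product binding, the faithful variant elaborated later in this paper. A minimal numpy sketch with invented fillers and roles; since the role vectors are chosen orthonormal, unbinding by contraction is exact and no clean-up module is needed:

```python
import numpy as np

# One-hot filler vectors for symbols and role vectors for structural slots
fillers = {s: np.eye(3)[i] for i, s in enumerate(["mouse", "ate", "cheese"])}
roles   = {r: np.eye(3)[i] for i, r in enumerate(["subj", "verb", "obj"])}

def bind(filler, role):
    """Binding as the tensor (outer) product of a filler and a role vector."""
    return np.outer(filler, role)

def bundle(*bindings):
    """Bundling as superposition (vector addition) of filler-role bindings."""
    return sum(bindings)

def unbind(structure, role):
    """Contracting with a role vector recovers the filler exactly,
    because the role vectors form an orthonormal basis."""
    return structure @ role

sentence = bundle(bind(fillers["mouse"], roles["subj"]),
                  bind(fillers["ate"], roles["verb"]),
                  bind(fillers["cheese"], roles["obj"]))
```

A lossy VSA would replace the outer product by, e.g., circular convolution and recover fillers only approximately.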

In recent applications, VSA have been employed for semantic spaces [63, 67], logical inferences [68,69,70], data base queries [64, 71], non-projective formal grammars [72, 73], and autoassociative memories [74, 75]. [6] developed a VSA model for cognitive representations and their induction in Shannon’s mouse-maze system.

In this study, we focus on transparent and interpretable cognitive dynamic systems that are originally described as traditional symbolic AI processors. For the sake of simplicity, we discuss context-free grammars (CFG) and push-down automata for formal languages [51]. Focusing on the dashed region in Fig. 1, we elaborate on earlier approaches for VSA language processors [22, 59, 73]. We rigorously prove a representation theorem for vector space representations of uncompressed filler-role bindings. To this end, we propose a novel normal form for CFG that allows CFG parse trees to be expressed as terms over a symbolic term algebra. Rule-based derivations over that algebra are then faithfully represented as transformation matrices in Fock space [76, 77]. In contrast to black-box DNN, uncompressed VSA are therefore transparent and hence interpretable. Thus, our approach would be a first step towards the abovementioned “theory of explainable AI” [47] that could lead to future developments of new machine learning algorithms and the construction of reasoning engines [48] appropriate for the desired audiences [45, 46].

Methods

We start from a symbolic, rule-based system that can be described in terms of formal grammar and automata theory. Specifically, we choose context-free grammars (CFG) and push-down automata as their processors here for ease of exposition [51]. Note that an even more involved, non-projective grammar formalism [72], namely minimalist grammar [78], has been treated previously in [73].

In a second step, we reformulate these languages through term algebras and their processing through partial functions over term algebras. We introduce a novel normal form for CFG, called term normal form, and prove that any CFG can be transformed into term normal form. We then introduce a vector symbolic architecture by assigning basis vectors of a high-dimensional linear space to the respective symbols and their roles in a phrase structure tree. We propose a recursive function for mapping CFG phrase structure trees onto representation vectors in Fock space and prove a representation theorem for the partial rule-based processing functions. Finally, we present a software toolbox, FockBox, for handling Fock space VSA representations [79].

Context-Free Grammars

Consider again the simple sentence (1) as a motivating example. According to linguistic theory, sentences such as (1) exhibit a hierarchical structure, indicating a logical subject-predicate relationship, that can be illustrated by different means, e.g. through dependency graphs [72] or phrase structure trees [51]. For the sake of simplicity, we use the latter representation here, because phrase structure trees are directly related to context-free grammars and their accepting push-down automata. In (1), “the mouse” appears as the subject and the phrase “ate cheese” as the predicate, which is further organized into a transitive verb “ate” and its direct object “cheese”. The hierarchical structure of sentence (1) can therefore be expressed either through regular brackets, as in (2)

$$\begin{aligned} \mathtt{[[[the] [mouse]] [ate [cheese]]]} \,, \end{aligned}$$
(2)

or, likewise, as a phrase structure tree as in Fig. 2.

Fig. 2

Phrase structure tree of example sentence (1)

In Fig. 2 every internal node of the tree denotes a syntactic category: S stands for “sentence”, NP for “noun phrase”, the sentence’s subject, VP for “verbal phrase”, the predicate, D for “determiner”, N for “noun”, and V for “verb”.

The phrase structure tree Fig. 2 immediately gives rise to a context-free grammar (CFG) by interpreting every branch as a rewriting rule in Chomsky normal form [51, 81]

$$\begin{aligned} \mathtt{S}&\rightarrow \mathtt{NP} \quad \mathtt{VP} \end{aligned}$$
(3)
$$\begin{aligned} \mathtt{NP}&\rightarrow \mathtt{D} \quad \mathtt{N} \end{aligned}$$
(4)
$$\begin{aligned} \mathtt{VP}&\rightarrow \mathtt{V} \quad \mathtt{N} \end{aligned}$$
(5)
$$\begin{aligned} \mathtt{D}&\rightarrow \mathtt{the} \end{aligned}$$
(6)
$$\begin{aligned} \mathtt{N}&\rightarrow \mathtt{mouse} \end{aligned}$$
(7)
$$\begin{aligned} \mathtt{V}&\rightarrow \mathtt{ate} \end{aligned}$$
(8)
$$\begin{aligned} \mathtt{N}&\rightarrow \mathtt{cheese} \end{aligned}$$
(9)

where one distinguishes between syntactic rules (3)–(5) and lexical rules (6)–(9), respectively. More abstractly, a CFG is given as a quadruple \(G = (T, N, \mathtt {S}, R)\), such that in our example \(T = \{ \mathtt{the}, \mathtt{mouse}, \mathtt{ate}, \mathtt{cheese} \}\) is the set of words or terminal symbols, \(N = \{ \mathtt{S}, \mathtt{NP}, \mathtt{VP}, \mathtt{D}, \mathtt{N}, \mathtt{V} \}\) is the set of categories or nonterminal symbols, \(\mathtt{S} \in N\) is the distinguished start symbol, and \(R \subset N \times (N \cup T)^*\) is a set of rules. A rule \(r = (A, \gamma ) \in R\) is usually written as a production \(r : A \rightarrow \gamma\) where \(A \in N\) denotes a category and \(\gamma \in (N \cup T)^*\) a finite string of terminals or categories of length \(n = |\gamma |\).
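As a data structure, the quadruple \(G = (T, N, \mathtt{S}, R)\) of our example is small enough to write down literally. The following sketch (variable and function names are ours) also checks the condition \(R \subset N \times (N \cup T)^*\):

```python
# The example CFG G = (T, N, S, R) written out as plain Python data
T = {"the", "mouse", "ate", "cheese"}            # words (terminal symbols)
N = {"S", "NP", "VP", "D", "N", "V"}             # categories (nonterminals)
S = "S"                                          # distinguished start symbol
R = [("S", ("NP", "VP")), ("NP", ("D", "N")), ("VP", ("V", "N")),   # syntactic
     ("D", ("the",)), ("N", ("mouse",)), ("V", ("ate",)),           # lexical
     ("N", ("cheese",))]

def is_valid_cfg(T, N, S, R):
    """Check S ∈ N and R ⊂ N × (N ∪ T)*."""
    return S in N and all(A in N and all(x in N | T for x in gamma)
                          for A, gamma in R)
```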

Context-free grammars can be processed by push-down automata [51]. Regarding psycholinguistic plausibility, the left-corner (LC) parser is particularly relevant because input-driven bottom-up and expectation-driven top-down processes are tightly intermingled with each other [82], resembling a predictive coding algorithm [8]. An LC parser possesses, like any other push-down automaton, two memory tapes: firstly a working memory, called stack, operating in a last-in-first-out (LIFO) fashion, and secondly an input tape storing the sentence to be processed. In the simplest cases, when a given CFG does not contain local ambiguities (as in (3)–(9) for our example (1)), an LC parser can work deterministically. The LC parsing algorithm operates in four different modes: i) if nothing else is possible and if the input tape is not empty, the first word of the input is shifted into the stack; ii) if the first symbol in the stack is the left corner of a syntactic rule, the first stack symbol is rewritten by a predicted category (indicated by square brackets in Table 1) followed by the left-hand side of the rule (project); iii) if a category in the stack was correctly predicted, the matching symbols are removed from the stack (complete); iv) if the input tape is empty and the stack only contains the start symbol of the grammar, the automaton moves into the accepting state; otherwise, syntactic language processing has failed. Applying the LC algorithm to our example CFG leads to the symbolic process shown in Table 1.

Table 1 Left-corner parser processing the example sentence (1). The stack expands to the left
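The four modes can be condensed into a small deterministic procedure. This is a minimal sketch under our own conventions (stack top at index 0, i.e. expanding to the left as in Table 1, mode priority accept > complete > project > shift), with the example rules (3)–(9) hard-coded:

```python
RULES = [("S", ("NP", "VP")), ("NP", ("D", "N")), ("VP", ("V", "N")),
         ("D", ("the",)), ("N", ("mouse",)), ("V", ("ate",)), ("N", ("cheese",))]

def lc_parse(words):
    """Deterministic left-corner parse; returns True iff the sentence is accepted."""
    stack, buf = [], list(words)
    while True:
        if stack == ["S"] and not buf:                       # iv) accept
            return True
        if len(stack) >= 2 and stack[1] == "[" + stack[0] + "]":
            stack = stack[2:]                                # iii) complete
            continue
        rule = next(((A, g) for A, g in RULES
                     if stack and g[0] == stack[0]), None)
        if rule is not None:                                 # ii) project:
            A, g = rule                                      # left corner ->
            stack = ["[" + c + "]" for c in g[1:]] + [A] + stack[1:]
            continue
        if buf:                                              # i) shift
            stack = [buf.pop(0)] + stack
            continue
        return False                                         # failure
```

Since the example grammar is not locally ambiguous, the `next(...)` lookup always finds at most one applicable projection, so the loop never has to backtrack.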

The left-corner parser shown in Table 1 essentially operates autonomously in the modes project, complete and accept, but interactively in shift mode. Thus, we can significantly simplify the parsing process through a mapping from one intermediary automaton configuration to another, mediated by the interactively shifted input word perturbing the intrinsic structure-building dynamics [15]. Expressing the configurations as temporary phrase structure trees then yields the symbolic computation in Fig. 3.

Fig. 3

Interactive LC parse of the example sentence (1)

The first tree in Fig. 3, denoted \(\emptyset\), simply corresponds to the empty stack in Table 1, step 0. The second tree is generated by applying the CFG LC rules project (6) and (4) in steps 2 and 3. Accordingly, the third tree is obtained from the second one after processing steps 4 to 6, and so on.

According to our previous definitions, the states of the processor are the automaton configurations in Table 1 or the temporary phrase structure trees in Fig. 3, that are both interpretable in terms of LC parsing and language processing for an informed expert observer. Moreover, the processing steps in the last column of Table 1 and also the interactive mappings Fig. 3 are interpretable for the observer. In principle, one could augment the left-corner parser with a “reasoning engine” [48] that translates the formal language used in those symbolic representations into everyday language for any desired audience [45, 46]. The result would be something like the (syntactic) “meaning” \([\![ w ]\!]\) of a word w that can be regarded as the operator mapping a tree in Fig. 3 to its successor. This interactive interpretation of meaning is well-known in dynamic semantics [17,18,19].

Algebraic Description

In order to prepare the construction of a vector symbolic architecture (VSA) [56,57,58,59,60,61] and its representation theory in the next steps, we need an algebraically more sophisticated description. This is provided by the concept of a term algebra [81]. A term algebra is defined over a signature \(\Sigma = (F, \mathrm {rank})\) where F is a finite set of function symbols and \(\mathrm {rank}: F \rightarrow \mathbb {N}_0\) is an arity function, assigning to each symbol \(f \in F\) an integer indicating the number of arguments that f has to take.

To apply this idea to a CFG, we introduce a new kind of grammar normal form that we call term normal form in the following. A CFG \(G = (T, N, \mathtt {S}, R)\) is said to be in term normal form when for every category \(A \in N\) holds: if A is expanded through \(n\in \mathbb {N}\) rules, \(r_1 : A \rightarrow \gamma _1\) to \(r_n : A \rightarrow \gamma _n\), then \(|\gamma _1| = \ldots = |\gamma _n|\).

It can be easily demonstrated that every CFG can be transformed into a weakly equivalent CFG in term normal form, where weak equivalence means that two different grammars derive the same context-free language. A proof is presented in Appendix 6.1.
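The defining condition of term normal form is easy to check mechanically. A minimal sketch (function name is ours), taking a grammar as a list of rule pairs \((A, \gamma)\):

```python
from collections import defaultdict

def is_term_normal_form(rules):
    """Term normal form: every category A is only expanded by right-hand
    sides of one common length, so that A gets a well-defined arity."""
    lengths = defaultdict(set)
    for A, gamma in rules:
        lengths[A].add(len(gamma))
    return all(len(ls) == 1 for ls in lengths.values())
```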

Obviously, the rules (3)–(9) of our example above are already in term normal form: the only category with more than one expansion, N, is expanded by the two lexical rules (7) and (9) of equal length. Thus, we define a term algebra by regarding the set of variables \(V = N \cup T\) as a signature with arity function \(\mathrm {rank}: V \rightarrow \mathbb {N}_0\) such that i) \(\mathrm {rank}(a) = 0\) for all \(a \in T\), i.e. terminals are nullary symbols and hence constants; ii) \(\mathrm {rank}(A) = |\gamma |\) for categories \(A \in N\) that are expanded through rules \(A \rightarrow \gamma\). Moreover, when G is given in Chomsky normal form, \(\mathrm {rank}(A) = 1\) for all categories \(A \in N\) appearing exclusively in lexical rules, i.e. lexical categories (D, N, V) are unary functions, whereas \(\mathrm {rank}(A) = 2\) for all categories \(A \in N\) that appear exclusively in syntactic rules, which are hence binary functions.

For a general CFG G in term normal form, we define the term algebra \(\mathfrak {T}(G)\) inductively: i) every terminal symbol \(a \in T\) is a term, \(a \in \mathfrak {T}(G)\). ii) Let \(A \in N\) be a category with \(\mathrm {rank}(A) = k\) and let \(t_0, \dots , t_{k-1} \in \mathfrak {T}(G)\) be terms, then \(A(t_0, \dots , t_{k-1}) \in \mathfrak {T}(G)\) is a term. Additionally, we want to describe LC phrase structure trees as well. To this end, we extend the signature by the predicted categories \(P = \{ [\mathtt {N}], [\mathtt {VP}] \}\), that are interpreted as constants with \(\mathrm {rank}(C) = 0\) for \(C \in P\). The enlarged term algebra is denoted by \(\mathfrak {T}_\mathrm {LC}(G)\). We also allow for \(\emptyset \in \mathfrak {T}_\mathrm {LC}(G)\).

In the LC term algebra \(\mathfrak {T}_\mathrm {LC}(G)\), we encode the tree of step 1 in Fig. 3 (beginning with the empty tree \(t_0 = \emptyset\) in step 0) as term

$$\begin{aligned} t_1 = \mathtt{NP}(\mathtt{D}(\mathtt{the}), \mathtt{[N]}) \end{aligned}$$
(10)

because \(\mathrm {rank}(\mathtt{NP}) = 2\), \(\mathrm {rank}(\mathtt{D}) = 1\), and \(\mathrm {rank}(\mathtt{the}) = \mathrm {rank}(\mathtt{[N]}) = 0\). Likewise we obtain

$$\begin{aligned} t_2 = \mathtt{S}(\mathtt{NP}(\mathtt{D}(\mathtt{the}), \mathtt{N}(\mathtt{mouse})), \mathtt{[VP]}) \end{aligned}$$
(11)

as the term representation of the succeeding step 2 in Fig. 3.
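Terms of \(\mathfrak{T}_\mathrm{LC}(G)\) can be represented, e.g., as nested Python tuples whose heads are function symbols; the rank function then doubles as a well-formedness check. A sketch under this encoding (our own convention, not the paper’s toolbox):

```python
# Arity (rank) of each function symbol in the signature of T_LC(G)
RANK = {"the": 0, "mouse": 0, "ate": 0, "cheese": 0,   # terminals: constants
        "[N]": 0, "[VP]": 0,                           # predicted categories
        "D": 1, "N": 1, "V": 1,                        # lexical: unary
        "S": 2, "NP": 2, "VP": 2}                      # syntactic: binary

def is_term(t):
    """A term is a constant (string of rank 0) or a tuple
    (A, t_0, ..., t_{k-1}) where rank(A) = k and all t_i are terms."""
    if isinstance(t, str):
        return RANK[t] == 0
    head, *args = t
    return RANK[head] == len(args) and all(is_term(a) for a in args)

t1 = ("NP", ("D", "the"), "[N]")                           # term (10)
t2 = ("S", ("NP", ("D", "the"), ("N", "mouse")), "[VP]")   # term (11)
```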

Next, we define several partial functions over \(\mathfrak {T}_\mathrm {LC}(G)\) as follows [56, 83].

$$\begin{aligned} \mathrm {cat}(A(t_0, \dots , t_k))&= A \end{aligned}$$
(12)
$$\begin{aligned} \mathrm {ex}_i(A(t_0, \dots , t_k))&= t_i \end{aligned}$$
(13)
$$\begin{aligned} \mathrm {cons}_k(A, t_0, \dots , t_k)&= A(t_0, \dots , t_k) \,. \end{aligned}$$
(14)

Here, the function \(\mathrm {cat}: \mathfrak {T}_\mathrm {LC}(G) \rightarrow N\) yields the category, i.e. the function symbol A of the term \(A(t_0, \dots , t_k) \in \mathfrak {T}_\mathrm {LC}(G)\). The functions \(\mathrm {ex}_i: \mathfrak {T}_\mathrm {LC}(G) \rightarrow \mathfrak {T}_\mathrm {LC}(G)\) for term extraction and \(\mathrm {cons}_k: N \times \mathfrak {T}_\mathrm {LC}(G)^{k + 1} \rightarrow \mathfrak {T}_\mathrm {LC}(G)\) for term construction are only partially defined: \(A(t_0, \dots ,t_k) \in \mathrm {Dom}(\mathrm {ex}_i)\) if \(k = \mathrm {rank}(A) - 1\) and \(i \le k\), and \((A, t_0, \dots , t_k) \in \mathrm {Dom}(\mathrm {cons}_k)\) if \(k = \mathrm {rank}(A) - 1\).

By means of the term transformations (12)–(14) we can express the action of an incrementally and interactively shifted word \(a \in T\), which perturbs the intrinsic autonomous structure-building dynamics of a cognitive system [15], through a term operator \([\![ a ]\!] : \mathfrak {T}_\mathrm {LC}(G) \rightarrow \mathfrak {T}_\mathrm {LC}(G)\). For the transition from, e.g., LC tree 1 to LC tree 2 in Fig. 3 we obtain

$$\begin{aligned}{}[\![ \mathtt{mouse} ]\!] (t_1) = \mathrm {cons}_2(\mathtt{S}, \mathrm {cons}_2(\mathrm {cat}(t_1), \mathrm {ex}_0(t_1), \mathtt{N}(\mathtt{mouse})), \mathtt{[VP]}) = t_2 \,. \end{aligned}$$
(15)

The term operator \([\![ \mathtt{mouse} ]\!]\) corresponds to a computer program that sequentially extracts the category and the left subtree from the input data structure \(t_1\) and merges the results together into the output structure \(t_2\). Therefore, the (syntactic) meaning of the word “mouse” is its impact on the symbolic term algebra. Note that the function \([\![ \cdot ]\!]\) is known as the interpretation function in dynamic semantics [17,18,19]. Thus, term algebraic computation is transparent and straightforwardly interpretable [52].
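Under a nested-tuple encoding of terms (our own convention), the transformations (12)–(14) and the word operator \([\![ \mathtt{mouse} ]\!]\) of (15) take only a few lines:

```python
def cat(t):
    """Transformation (12): the category (head symbol) of a term."""
    return t[0]

def ex(i, t):
    """Transformation (13): extract the i-th subterm."""
    return t[1 + i]

def cons(A, *ts):
    """Transformation (14): construct the term A(t_0, ..., t_k)."""
    return (A,) + ts

def meaning_mouse(t):
    """The term operator [[mouse]] of Eq. (15), composed from (12)-(14)."""
    return cons("S", cons(cat(t), ex(0, t), ("N", "mouse")), "[VP]")

t1 = ("NP", ("D", "the"), "[N]")   # LC tree 1 as a nested tuple, cf. (10)
t2 = meaning_mouse(t1)             # LC tree 2, cf. (11)
```

Applying `meaning_mouse` to `t1` indeed extracts the category and the left subtree and merges them with the new lexical material, exactly as in (15).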

Vector Symbolic Architectures

In vector symbolic architectures (VSA) [56,57,58,59,60,61] hierarchically organized complex data structures are represented as vectors in high-dimensional linear spaces. The composition of these structures is achieved by two basic operations: binding and bundling. While bundling is commonly implemented as vector superposition, i.e. addition, different VSA realize binding in particular ways: originally through tensor products [56, 57], through circular convolution in holographic reduced representations (HRR) [58], through XOR spatter code [84], or through Hadamard products [62]. While HRR, spatter code, Hadamard products, or a combination of tensor products with nonlinear compression [83] are lossy representations that require a clean-up module (usually an attractor neural network, cf. [60]), uncompressed tensor product representations of basis vectors are faithful, thereby allowing a transparent representation theory. It is the aim of the present study to first elaborate the mathematical theory of uncompressed tensor product representations. In this context, we suggest referring to such systems as interpretable VSA [48].
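
For readers unfamiliar with these schemes, the following numpy sketch contrasts two lossy binding operations with bundling as superposition; the random hypervectors and the dimension are illustrative assumptions, not tied to any particular VSA library.

```python
import numpy as np

# Sketch of two lossy VSA binding operations on random hypervectors.
rng = np.random.default_rng(42)
d = 1024
a = rng.standard_normal(d) / np.sqrt(d)
b = rng.standard_normal(d) / np.sqrt(d)

# HRR binding: circular convolution, computed via FFT
hrr_bound = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

# Hadamard binding: elementwise product
hadamard_bound = a * b

# Bundling: plain superposition (vector addition)
bundle = a + b
```

Both bound vectors live in the same d-dimensional space as their arguments, which is the source of the lossiness that necessitates clean-up; the uncompressed tensor product studied below avoids this by moving to the product space.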

Coming back to our linguistic example, we construct a homomorphism \(\psi : \mathfrak {T}_{\mathrm {LC}}(G) \cup N \rightarrow \mathcal {F}\) from the term algebra, unified with its categories N, to a vector space \(\mathcal {F}\) in such a way that the structure of the transformations (12)–(14) is preserved. The resulting images \(\psi (t)\) for terms \(t \in \mathfrak {T}_\mathrm {LC}(G)\) become vector space operators, i.e. essentially matrices acting on \(\mathcal {F}\).

Again, we proceed inductively. First we map the symbols in \(\mathfrak {T}_\mathrm {LC}(G) \cup N\) onto vectors. To each atomic symbol \(s \in T \cup N \cup P\) we assign a so-called filler basis vector \(\vert s \rangle = \psi (s) \in \mathcal {F}\), calling the subspace \(\mathcal {V}_F = \mathrm {span}(\psi (T \cup N \cup P))\) the filler space. Its dimension \(n = \dim \mathcal {V}_F\) corresponds to the number of atomic symbols in \(T \cup N \cup P\), which is \(n = 13\) in our example.

Let further \(m = \max (\{ |\gamma | \,|\, (A \rightarrow \gamma ) \in R \})\) be the maximal length of a production's right-hand side in grammar G. Then, we define \(m + 1\) so-called role vectors \(\vert i \rangle\), spanning the role space \(\mathcal {V}_R = \mathrm {span}(\{ \vert i \rangle \,|\, 0 \le i \le m \})\). Note that we employ the so-called Dirac notation from quantum mechanics, which allows a coordinate-free and hence representation-independent description here [79, 85]. The role \(\vert 0 \rangle\) denotes the 1st daughter node, \(\vert 1 \rangle\) the 2nd daughter, and so on, until the last daughter \(\vert m - 1 \rangle\). The remaining role \(\vert m \rangle\) binds the mother node in the phrase structure trees of grammar G. In our example, because G has Chomsky normal form, we have \(m = 2 = \dim \mathcal {V}_R - 1\) such that there are three roles for positions in a binary branching tree: left daughter \(\vert 0 \rangle\), right daughter \(\vert 1 \rangle\), and mother \(\vert 2 \rangle\). For binary trees, we also use a more intuitive symbolic notation: left daughter \(\vert / \rangle\), right daughter \(\vert {\setminus } \rangle\), and mother \(\vert {\wedge } \rangle\).

Let \(A(t_0, \dots , t_k) \in \mathfrak {T}_\mathrm {LC}(G)\) be a term. Then, we define its tensor product representation in vector space \(\mathcal {F}\) recursively as follows

$$\begin{aligned} \psi (A(t_0, \dots , t_k)) = \vert A \rangle \otimes \vert m \rangle \oplus \psi (t_0) \otimes \vert 0 \rangle \oplus \cdots \oplus \psi (t_k) \otimes \vert m - 1 \rangle \,. \end{aligned}$$
(16)

As a shorthand notation, we suggest the Dirac expression

$$\begin{aligned} \vert A(t_0, \dots , t_k) \rangle = \vert A \rangle \otimes \vert m \rangle \oplus \vert t_0 \rangle \otimes \vert 0 \rangle \oplus \cdots \oplus \vert t_k \rangle \otimes \vert m - 1 \rangle \,. \end{aligned}$$
(17)

Here the symbol “\(\otimes\)” refers to the (Kronecker) tensor product, mapping two vectors onto another vector, in contrast to the dyadic (outer) tensor product, which yields a matrix, hence being a vector space operator. In addition, “\(\oplus\)” denotes the (outer) direct sum that is mandatory for the superposition of vectors from spaces with different dimensionality in order to avoid interference effects.
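
The distinction between these three operations can be made concrete in a small numpy sketch (our illustration; the vectors and dimensions are arbitrary):

```python
import numpy as np

# Kronecker tensor product vs. dyadic outer product vs. direct sum.
filler = np.array([1.0, 0.0])          # a 2-dim "filler" vector
role = np.array([0.0, 1.0, 0.0])       # a 3-dim "role" vector

kron = np.kron(filler, role)           # vector in the 6-dim product space
outer = np.outer(filler, role)         # 2x3 matrix, i.e. an operator
dsum = np.concatenate([filler, role])  # direct sum: 5-dim vector, the two
                                       # summands occupy disjoint sectors
```

The Kronecker product yields a vector again, the outer product a matrix, and the direct sum simply stacks vectors from differently sized spaces so that their components cannot interfere.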

Obviously, the (in principle) infinite recursion of the mapping \(\psi\) leads to an infinite-dimensional representation space

$$\begin{aligned} \mathcal {F} = \bigoplus _{p = 0}^\infty \left( \mathcal {V}_F \otimes \mathcal {V}_R^{\otimes ^p} \right) \oplus \mathcal {V}_R \,, \end{aligned}$$
(18)

that is known as Fock space from quantum field theory [59, 76, 77, 86].

In quantum field theory, there is a distinguished state \(\vert \mathbf{0} \rangle \ne 0\), the vacuum state, spanning a one-dimensional subspace, the vacuum sector that is isomorphic to the underlying number field. According to (18), this sector is contained in the subspace spanned by filler and role spaces, \(\mathcal {V}_F \oplus \mathcal {V}_R\). Therefore, we could represent the empty tree in Fig. 3 by an arbitrary role; a suitable choice is the mother role \(\psi (\emptyset ) = \vert m \rangle \cong \vert \mathbf{0} \rangle\), hence symbolizing the vacuum state. Using the tensor product representation (16), we can recursively compute the images of our example terms above. For (10) we obtain

$$\begin{aligned} \vert t_1 \rangle = \vert \mathtt{NP}(\mathtt{D}(\mathtt{the}), \mathtt{[N]}) \rangle = \vert \mathtt{NP} \rangle \otimes \vert 2 \rangle \oplus \vert \mathtt{D}(\mathtt{the}) \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{[N]} \rangle \otimes \vert 1 \rangle = \nonumber \\ \vert \mathtt{NP} \rangle \otimes \vert 2 \rangle \oplus ( \vert \mathtt{D} \rangle \otimes \vert 2 \rangle \oplus \vert \mathtt{the} \rangle \otimes \vert 0 \rangle ) \otimes \vert 0 \rangle \oplus \vert \mathtt{[N]} \rangle \otimes \vert 1 \rangle = \nonumber \\ \vert \mathtt{NP} \rangle \otimes \vert 2 \rangle \oplus \vert \mathtt{D} \rangle \otimes \vert 2 \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{the} \rangle \otimes \vert 0 \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{[N]} \rangle \otimes \vert 1 \rangle = \nonumber \\ \vert \mathtt{NP} 2 \rangle \oplus \vert \mathtt{D} 2 0 \rangle \oplus \vert \mathtt{the} 0 0 \rangle \oplus \vert \mathtt{[N]} 1 \rangle = \nonumber \\ \vert \mathtt{NP} {\wedge } \rangle \oplus \vert \mathtt{D} {\wedge } / \rangle \oplus \vert \mathtt{the} / / \rangle \oplus \vert \mathtt{[N]} {\setminus } \rangle \,, \end{aligned}$$
(19)

where we used the compressed Dirac notation \(\vert a \rangle \otimes \vert b \rangle = \vert a b \rangle\) in the last steps. The last line is easily interpretable in terms of phrase structure: It simply states that NP occupies the root of the tree, D appears as its immediate left daughter, the is the left daughter's left daughter and a leaf, and finally [N] is a leaf bound to the right daughter of the root. Note that the Dirac kets have to be interpreted from right to left (as in the Arabic reading direction). The vector \(\vert t_1 \rangle\) belongs to a Fock subspace of dimension

$$\begin{aligned} q = n \frac{m^{p + 1} - 1}{m - 1} + m \end{aligned}$$
(20)

where \(n = \dim (\mathcal {V}_F) = 13\), \(m = \dim (\mathcal {V}_R) = 3\) (note that m denotes the role space dimension in this formula, in contrast to its usage above), and p is the embedding depth of the phrase structure tree in step 1 of Fig. 3. This leads to \(q_1 = 172\) for \(\vert t_1 \rangle\).
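
The formula (20) is easy to verify numerically; the following sketch (function name ours) reproduces the quoted dimensions for n = 13 fillers and role space dimension 3:

```python
# Check of the Fock subspace dimension formula (20):
# q = n (m^(p+1) - 1) / (m - 1) + m
def fock_dim(n, m, p):
    return n * (m ** (p + 1) - 1) // (m - 1) + m

q1 = fock_dim(13, 3, 2)  # embedding depth p = 2 gives 172
q2 = fock_dim(13, 3, 3)  # embedding depth p = 3 gives 523
```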

Similarly, we get for (11)

$$\begin{aligned} \vert t_2 \rangle = \vert \mathtt{S}(\mathtt{NP}(\mathtt{D}(\mathtt{the}), \mathtt{N}(\mathtt{mouse})), \mathtt{[VP]}) \rangle = \nonumber \\ \vert \mathtt{S} \rangle \otimes \vert 2 \rangle \oplus \vert \mathtt{NP}( \mathtt{D}( \mathtt{the} ), \mathtt{N}(\mathtt{mouse})) \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{[VP]} \rangle \otimes \vert 1 \rangle = \nonumber \\ \vert \mathtt{S} \rangle \otimes \vert 2 \rangle \oplus ( \vert \mathtt{NP} \rangle \otimes \vert 2 \rangle \oplus \vert \mathtt{D}(\mathtt{the}) \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{N}(\mathtt{mouse}) \rangle \otimes \vert 1 \rangle ) \otimes \vert 0 \rangle \oplus \vert \mathtt{[VP]} \rangle \otimes \vert 1 \rangle = \nonumber \\ \vert \mathtt{S} \rangle \otimes \vert 2 \rangle \oplus \vert \mathtt{NP} \rangle \otimes \vert 2 \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{D}(\mathtt{the}) \rangle \otimes \vert 0 \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{N}(\mathtt{mouse}) \rangle \otimes \vert 1 \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{[VP]} \rangle \otimes \vert 1 \rangle = \nonumber \\ \vert \mathtt{S} \rangle \otimes \vert 2 \rangle \oplus \vert \mathtt{NP} \rangle \otimes \vert 2 \rangle \otimes \vert 0 \rangle \oplus (\vert \mathtt{D} \rangle \otimes \vert 2 \rangle \oplus \vert \mathtt{the} \rangle \otimes \vert 0 \rangle ) \otimes \vert 0 \rangle \otimes \vert 0 \rangle \oplus \nonumber \\ ( \vert \mathtt{N} \rangle \otimes \vert 2 \rangle \oplus \vert \mathtt{mouse} \rangle \otimes \vert 0 \rangle ) \otimes \vert 1 \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{[VP]} \rangle \otimes \vert 1 \rangle = \nonumber \\ \vert \mathtt{S} \rangle \otimes \vert 2 \rangle \oplus \vert \mathtt{NP} \rangle \otimes \vert 2 \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{D} \rangle \otimes \vert 2 \rangle \otimes \vert 0 \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{the} \rangle \otimes \vert 0 \rangle \otimes \vert 0 \rangle \otimes \vert 0 \rangle \oplus \nonumber \\ \vert \mathtt{N} \rangle \otimes \vert 2 \rangle \otimes \vert 1 \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{mouse} \rangle \otimes \vert 0 \rangle \otimes \vert 1 \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{[VP]} \rangle \otimes \vert 1 \rangle = \nonumber \\ \vert \mathtt{S} 2 \rangle \oplus \vert \mathtt{NP} 2 0 \rangle \oplus \vert \mathtt{D} 2 0 0 \rangle \oplus \vert \mathtt{the} 0 0 0 \rangle \oplus \vert \mathtt{N} 2 1 0 \rangle \oplus \vert \mathtt{mouse} 0 1 0 \rangle \oplus \vert \mathtt{[VP]} 1 \rangle = \nonumber \\ \vert \mathtt{S} {\wedge } \rangle \oplus \vert \mathtt{NP} {\wedge } / \rangle \oplus \vert \mathtt{D} {\wedge } / / \rangle \oplus \vert \mathtt{the} / / / \rangle \oplus \vert \mathtt{N} {\wedge } {\setminus } / \rangle \oplus \vert \mathtt{mouse} / {\setminus } / \rangle \oplus \vert \mathtt{[VP]} {\setminus } \rangle \,, \end{aligned}$$
(21)

where we have again utilized the more intuitive branching notation in the last line which can be straightforwardly interpreted in terms of tree addresses as depicted in Fig. 3 (step 2). Computing the dimension of the respective Fock subspace according to (20) yields \(q_2 = 523\) for \(\vert t_2 \rangle\).
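
The recursive expansions (19) and (21) can be emulated by a small Python sketch that stores a Fock space vector sparsely as a map from role address strings (the digit sequences of the compressed Dirac kets, in the printed left-to-right order) to filler symbols; the nested-tuple term encoding is our illustrative assumption.

```python
M = 2  # mother role |2> for the binary-branching example grammar

def psi(t, address=''):
    """Recursive tensor product representation (16): map a nested-tuple
    term onto its set of Dirac kets {role address: filler symbol}."""
    if isinstance(t, str):                       # leaf symbol
        return {address: t}
    kets = {str(M) + address: t[0]}              # |A> bound to mother role
    for i, sub in enumerate(t[1:]):              # psi(t_i) bound to role |i>
        kets.update(psi(sub, str(i) + address))
    return kets

t1 = ('NP', ('D', 'the'), '[N]')
# psi(t1) == {'2': 'NP', '20': 'D', '00': 'the', '1': '[N]'},
# matching the kets |NP 2>, |D 2 0>, |the 0 0>, |[N] 1> of Eq. (19)
```

Applied to the term for \(t_2\), the same recursion reproduces the seven kets of Eq. (21).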

In Fock space, the interactive and incremental action of a word \(a \in T\) is then represented as a matrix operator \([\![ a ]\!] _\psi : \mathcal {F} \rightarrow \mathcal {F}\). For the transition from (10) to (11) we obtain

$$\begin{aligned}{}[\![ \mathtt{mouse} ]\!] _\psi \vert t_1 \rangle = [\![ \mathtt{mouse} ]\!] _\psi ( \vert \mathtt{NP} {\wedge } \rangle \oplus \vert \mathtt{D} {\wedge } / \rangle \oplus \vert \mathtt{the} / / \rangle \oplus \vert \mathtt{[N]} {\setminus } \rangle ) = \nonumber \\ \vert \mathtt{S} {\wedge } \rangle \oplus \vert \mathtt{NP} {\wedge } / \rangle \oplus \vert \mathtt{D} {\wedge } / / \rangle \oplus \vert \mathtt{the} / / / \rangle \oplus \vert \mathtt{N} {\wedge } {\setminus } / \rangle \oplus \vert \mathtt{mouse} / {\setminus } / \rangle \oplus \vert \mathtt{[VP]} {\setminus } \rangle = \vert t_2 \rangle \,. \end{aligned}$$
(22)

In order to prove that \(\psi\) is a homomorphism, we define the following linear maps on \(\mathcal {F}\):

$$\begin{aligned} \mathbf{cat} (\vert u \rangle )&= (\,\mathbf{1} \otimes \langle m|) \vert u \rangle \end{aligned}$$
(23)
$$\begin{aligned} \mathbf{ex} _i(\vert u \rangle )&= (\,\mathbf{1} \otimes \langle i|) \vert u \rangle \end{aligned}$$
(24)
$$\begin{aligned} \mathbf{cons} _k(\vert a \rangle , \vert u_0 \rangle , \dots , \vert u_k \rangle )&= \vert a \rangle \otimes \vert m \rangle \oplus \vert u_0 \rangle \otimes \vert 0 \rangle \oplus \cdots \oplus \vert u_k \rangle \otimes \vert k \rangle \,, \end{aligned}$$
(25)

Here, \(\,\mathbf{1}\) denotes the unit operator (i.e. the unit matrix), and the Dirac “bra” vectors \(\langle k|\) are linear forms from the dual role space \(\mathcal {V}_R^*\), adjoint to the role “ket” vectors \(\vert k \rangle\), such that \(\langle i|k\rangle = \delta _{ik}\), with Kronecker's delta \(\delta _{ik} = 1\) for \(i = k\) and \(\delta _{ik} = 0\) otherwise.
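
A minimal numpy sketch, assuming canonical (hence orthonormal) basis vectors for fillers and roles, may clarify how the un-binding operator \((\,\mathbf{1} \otimes \langle i|)\) of (23)–(24) acts on a bound filler-role pair:

```python
import numpy as np

# Un-binding (1 ⊗ <i|) applied to |filler> ⊗ |role>; canonical basis
# vectors are an illustrative assumption.
n_f, n_r = 4, 3
fillers = np.eye(n_f)                 # filler kets |s_0>, ..., |s_3>
roles = np.eye(n_r)                   # role kets |0>, |1>, |2>

u = np.kron(fillers[2], roles[1])     # the bound pair |s_2> ⊗ |1>

def unbind(u, i):
    """(1 ⊗ <i|) u: contract away role i, using <i|k> = delta_ik."""
    return u.reshape(n_f, n_r) @ roles[i]

# unbind(u, 1) recovers |s_2>; unbind(u, 0) yields the zero vector
```

Because the roles are orthonormal, contracting with the correct bra recovers the filler exactly, while contracting with any other role annihilates the component; this is the mechanism behind the faithfulness of the uncompressed representation.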

By means of these homomorphisms we compute the meaning of “mouse” as Fock space operator through

$$\begin{aligned}{}[\![ \mathtt{mouse} ]\!] _\psi \vert t_1 \rangle = \mathbf{cons} _2(\vert \mathtt{S} \rangle , \mathbf{cons} _2(\mathbf{cat} (\vert t_1 \rangle ), \mathbf{ex} _0(\vert t_1 \rangle ), \vert \mathtt{N}(\mathtt{mouse}) \rangle ), \vert \mathtt{[VP]} \rangle ) = \vert t_2 \rangle \,. \end{aligned}$$
(26)

Therefore, \([\![ \mathtt{mouse} ]\!] _\psi\) is a holistic matrix operator instantaneously mapping the representation vector \(\vert t_1 \rangle\) of the partial phrase structure tree \(t_1\) onto the representation vector \(\vert t_2 \rangle\) of tree \(t_2\) by inserting the relevant fragments that have been extracted from the tree data structure \(t_1\) in terms of the symbolic functions in Eq. (15).

Inserting (23)–(25) yields

$$\begin{aligned}{}[\![ \mathtt{mouse} ]\!] _\psi \vert t_1 \rangle = \mathbf{cons} _2(\vert \mathtt{S} \rangle , \mathbf{cons} _2((\,\mathbf{1} \otimes \langle 2|) \vert t_1 \rangle , (\,\mathbf{1} \otimes \langle 0|) \vert t_1 \rangle , \vert \mathtt{N}(\mathtt{mouse}) \rangle ), \vert \mathtt{[VP]} \rangle ) = \nonumber \\ \mathbf{cons} _2(\vert \mathtt{S} \rangle , (\,\mathbf{1} \otimes \langle 2|) \vert t_1 \rangle \otimes \vert 2 \rangle \oplus (\,\mathbf{1} \otimes \langle 0|) \vert t_1 \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{N}(\mathtt{mouse}) \rangle \otimes \vert 1 \rangle , \vert \mathtt{[VP]} \rangle ) = \nonumber \\ \vert \mathtt{S} \rangle \otimes \vert 2 \rangle \oplus ((\,\mathbf{1} \otimes \langle 2|) \vert t_1 \rangle \otimes \vert 2 \rangle \oplus (\,\mathbf{1} \otimes \langle 0|) \vert t_1 \rangle \otimes \vert 0 \rangle \oplus \vert \mathtt{N}(\mathtt{mouse}) \rangle \otimes \vert 1 \rangle ) \otimes \vert 0 \rangle \oplus \vert \mathtt{[VP]} \rangle \otimes \vert 1 \rangle = \nonumber \\ \vert \mathtt{S} \rangle \otimes \vert 2 \rangle \oplus ((\,\mathbf{1} \otimes \langle 2|) \vert t_1 \rangle \otimes \vert 2 \rangle \oplus (\,\mathbf{1} \otimes \langle 0|) \vert t_1 \rangle \otimes \vert 0 \rangle \oplus ( \vert \mathtt{N} \rangle \otimes \vert 2 \rangle \oplus \vert \mathtt{mouse} \rangle \otimes \vert 0 \rangle ) \otimes \vert 1 \rangle ) \otimes \vert 0 \rangle \oplus \nonumber \\ \vert \mathtt{[VP]} \rangle \otimes \vert 1 \rangle = \vert t_2 \rangle \,, \end{aligned}$$
(27)

where we have expanded \(\vert \mathtt{N}(\mathtt{mouse}) \rangle\) as in (21) above. Note that the meaning of “mouse” crucially depends on the given state \(\vert t_1 \rangle\) subjected to the operator \([\![ \mathtt{mouse} ]\!] _\psi\), making meaning highly contextual. This is an important feature of dynamic semantics as well [17,18,19]. Locally, we can build the transformation matrix in analogy to Hebbian learning [87] for neural networks:

$$\begin{aligned}{}[\![ \mathtt{mouse} ]\!] _\psi (\vert t_1 \rangle ) = \vert t_2 \rangle \langle t_1| \end{aligned}$$
(28)

such that

$$[\![ \mathtt{mouse} ]\!] _\psi (\vert t_1 \rangle )\vert t_1 \rangle = \vert t_2 \rangle \langle t_1|t_1\rangle = \vert t_2 \rangle$$

since \(\langle t_1|t_1\rangle = 1\) for normalized states built from linearly independent filler and role vectors in uncompressed tensor product representations. Here, \([\![ \mathtt{mouse} ]\!] _\psi (\vert t_1 \rangle )\) is the local value of the transformation matrix at Fock space vector \(\vert t_1 \rangle\). Evaluating \([\![ \mathtt{mouse} ]\!] _\psi\) globally for all admissible arguments yields a piecewise affine-linear but overall nonlinear transformation [56].
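
The local Hebbian construction can again be sketched with toy vectors (ours, standing in for the actual high-dimensional Fock space states):

```python
import numpy as np

# Local Hebbian transformation matrix |t2><t1| of Eq. (28), with
# normalized toy stand-ins for the Fock space vectors.
t1 = np.array([1.0, 0.0, 0.0])
t2 = np.array([0.0, 1.0, 0.0])

op = np.outer(t2, t1)   # |t2><t1|
out = op @ t1           # |t2> <t1|t1> = |t2>
```

Applied to any state orthogonal to \(\vert t_1 \rangle\), the same matrix yields zero, which illustrates why the operator is only valid locally at \(\vert t_1 \rangle\).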

The problem of learning such operators in a general VSA framework by end-to-end training has not been convincingly solved yet. However, the latest attempts of Smolensky and collaborators to train tensor product recurrent networks (TPRN) are a promising approach to this endeavor [40,41,42]. By contrast, neural automata need not be trained at all, since their synaptic connectivity is explicitly constructed from the machine table of the underlying automaton [22].

Results

The main result of this study is a Fock space representation theorem for vector symbolic architectures of context-free grammars that follows directly from the definitions (23)–(25) and is proven in Appendix 6.2.

The tensor product representation \(\psi : \mathfrak {T}_{\mathrm {LC}}(G) \cup N \rightarrow \mathcal {F}\) is a homomorphism with respect to the term transformations (12)–(14). It holds that

$$\begin{aligned} \mathbf{cat} (\vert A(t_0, \dots , t_k) \rangle )&= \vert \mathrm {cat}(A(t_0, \dots , t_k)) \rangle \end{aligned}$$
(29)
$$\begin{aligned} \mathbf{ex} _i(\vert A(t_0, \dots , t_k) \rangle )&= \vert \mathrm {ex}_i(A(t_0, \dots , t_k)) \rangle \end{aligned}$$
(30)
$$\begin{aligned} \mathbf{cons} _k(\vert A \rangle , \vert t_0 \rangle , \dots , \vert t_k \rangle )&= \vert \mathrm {cons}_k(A, t_0, \dots , t_k) \rangle \,. \end{aligned}$$
(31)

Therefore, the representation (16) mediates as an intertwiner between the matrix operators \(\mathbf{cat}\), \(\mathbf{ex}\), and \(\mathbf{cons}\) on Fock space (23)–(25) and their symbolic counterparts \(\mathrm {cat}\), \(\mathrm {ex}\), and \(\mathrm {cons}\) (12)–(14), entailing a commutative diagram. Hence, the uncompressed tensor product representation is faithful and interpretable.

For the particular example discussed above, we obtain the Fock space trajectory in Table 2.

Table 2 Fock space representation of LC parser processing the example sentence (1). The computational steps correspond to the shift operations in Table 1 and to the iterations of the interactive LC parse in Fig. 3

Table 2 shows for each processing step in the first column the resulting Fock space vector in Dirac notation [79, 85] in the second column and the dimension of the hosting Fock subspace (third column). Column four relates these processing steps to the underpinning symbolic computation presented in Table 1 for the push-down recognizer and in Fig. 3 for the interactive LC tree generator.

Each Fock space vector is a unique point in a high-dimensional linear space that is instantaneously transformed by the Fock space representation operators associated to the shifted words. These matrices provide the interpretation of the meanings of those words.

In order to visualize the results of Fock space computation, we project the trajectory from Table 2 into a three-dimensional space using principal component analysis (PCA) [49]. PCA is a common method for data compression that utilizes the directions of maximal variance in a cloud of data points. In our example, the data cloud is the trajectory of the VSA representations computing the left-corner phrase structure trees of the example sentence (1). To this aim, we use FockBox, a MATLAB toolbox provided by [79], for the efficient calculation of Fock space representations. The result is shown in Fig. 4 as an illustration.
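
Since FockBox itself is a MATLAB toolbox, we sketch the analogous PCA projection in Python; the data here are random placeholders (not the actual parser trajectory), and the dimensions are illustrative.

```python
import numpy as np

# PCA via SVD: project a trajectory of high-dimensional state vectors
# onto its first three principal components.
rng = np.random.default_rng(0)
trajectory = rng.standard_normal((6, 523))    # 6 steps, padded to 523 dims

X = trajectory - trajectory.mean(axis=0)      # center the data cloud
U, S, Vt = np.linalg.svd(X, full_matrices=False)
pc3 = X @ Vt[:3].T                            # first three PC coordinates
```

The rows of Vt are the directions of maximal variance; projecting onto the first three yields the kind of low-dimensional trajectory plotted in Fig. 4.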

Fig. 4

Principal component (PC) projection of the LC parser’s Fock space representation. Shown are the first three PCs

The trajectory is initialized by the vacuum state, translated into the origin by the PCA in Fig. 4. The point labeled “the” is the result of applying the meaning operator \([\![ \mathtt{the} ]\!] _\psi\) to the vacuum state \(\vert {\wedge } \rangle\), which yields the first Dirac vector \(\vert \mathtt{D} {\setminus } {\wedge } / \rangle \oplus \vert \mathtt{NP} {\setminus } {\wedge } \rangle \oplus \vert \mathtt{[N]} {\setminus } \rangle \oplus \vert \mathtt{the} / / \rangle\) in Table 2. From there, the processing trajectory moves into a different direction by applying the meaning operator \([\![ \mathtt{mouse} ]\!] _\psi\) to the latter state, entailing \(\vert \mathtt{D} {\setminus } {\wedge } / / \rangle \oplus \vert \mathtt{NP} {\setminus } {\wedge } / \rangle \oplus \vert \mathtt{N} {\setminus } {\wedge } {\setminus } / \rangle \oplus \vert \mathtt{S} {\setminus } {\wedge } \rangle \oplus \vert \mathtt{[VP]} {\setminus } \rangle \oplus \vert \mathtt{mouse} / {\setminus } / \rangle \oplus \vert \mathtt{the} / / / \rangle\) afterwards. This illustrates the nonlinearity and contextuality of meaning operators, which is further supported by the last two processing steps. In any case, even the three-dimensional compression by means of PCA is at least approximately interpretable as symbolic computation, because each region in state space corresponds to one distinguished VSA representation. Similarly, the connecting transitions have an interpretation as symbol-like computations as well.

Discussion

In this article we developed a representation theory for context-free grammars and push-down automata in Fock space as a vector symbolic architecture (VSA). We presented rigorous proofs for the representations of suitable term algebras. To this end, we suggested a novel normal form for CFG that allows expressing CFG parse trees as terms over a symbolic term algebra. Interactive computations, such as the predictions of a left-corner parser, are transparently interpreted by partially recursive operators acting upon the linguistic term algebra. In order to construct a faithful representation of the term algebra in VSA Fock space, we encoded filler and role vectors by linearly independent basis vectors and employed uncompressed tensor product binding. As a result, rule-based derivations over the term algebra are represented by transformation matrices in Fock space, such that computations in VSA are transparent and hence interpretable as well.

For the implementation of rule-based symbolic computations in cognitive dynamic systems, such as neural networks, VSA provide a viable approach. We have proven that uncompressed tensor product representations are transparent and interpretable. Thus, our results contribute a formally sound basis for a future “theory of explainable AI” [47] and for subsequent research and engineering. In contrast to current black-box approaches, our method is essentially transparent and hence interpretable. Explainable AI applications could then be constructed by deriving suitable “reasoning engines” [48] and cognitive user interfaces (CUI) [88, 89] from such systems.

Conclusion

We reformulated context-free grammars (CFG) through term algebras and their processing through push-down automata by partial functions over term algebras. We introduced a novel normal form for CFG, called term normal form, and proved that any CFG in Chomsky normal form can be transformed into term normal form. Finally, we introduced a vector symbolic architecture (VSA) by assigning basis vectors of a high-dimensional linear space to the respective symbols and their roles in a phrase structure tree. We suggested a recursive function for mapping CFG phrase structure trees onto representation vectors in Fock space and proved a representation theorem for the partial rule-based processing functions. We illustrated our findings by an interactive left-corner parser and used FockBox, a freely accessible MATLAB toolbox, for the generation and visualization of Fock space VSA. Our approach directly encodes symbolic, rule-based knowledge into the hyperdimensional computing framework of VSA and can thereby supply substantial insights into the future development of explainable artificial intelligence (XAI).