Invariants for neural automata

Computational modeling of neurodynamical systems often deploys neural networks and symbolic dynamics. One particular way of combining these approaches within the framework of vector symbolic architectures leads to neural automata. An interesting research direction we have pursued under this framework is the mapping of symbolic dynamics onto neurodynamics, represented as neural automata. This representation theory enables us to ask questions such as how the brain could implement Turing computations. Specifically, in this representation theory neural automata result from the assignment of symbols and symbol strings to numbers, known as Gödel encoding. Under this assignment, symbolic computation becomes represented by trajectories of state vectors in a real phase space, which allows for statistical correlation analyses with real-world measurements and experimental data. However, these assignments are usually completely arbitrary. Hence, it makes sense to ask which aspects of the dynamics observed under such a representation are intrinsic to the dynamics and which are not. In this study, we develop a formally rigorous mathematical framework for the investigation of symmetries and invariants of neural automata under different encodings. As a central concept we define patterns of equality for such systems. We consider different macroscopic observables, such as the mean activation level of the neural network, and ask for their invariance properties. Our main result shows that only step functions that are defined over those patterns of equality are invariant under recodings, while the mean activation is not. Our work could be of substantial importance for related regression studies of real-world measurements with neurosymbolic processors, for avoiding confounding results that depend on a particular encoding and are not intrinsic to the dynamics.


Introduction
Computational cognitive neurodynamics deals to a large extent with statistical modeling and regression analyses between behavioral and neurophysiological observables on the one hand and neurocomputational models of cognitive processes on the other hand (Gazzaniga et al 2002, Rabinovich et al 2012). Examples of experimentally measurable observables are response times (RT), eye-movements (EM), event-related brain potentials (ERP) in the domain of electroencephalography (EEG), event-related magnetic fields (ERF) in the domain of magnetoencephalography (MEG), or the blood-oxygen-level-dependent signal (BOLD) in functional magnetic resonance imaging (fMRI).
For carrying out statistical correlation analyses between experimental data and computational models one has to devise observation models, relating the microscopic states within a computer simulation (e.g. the spiking of a simulated neuron) with the above-mentioned macroscopically observable measurements. In decision making, e.g., a suitable observation model is first passage time in a drift-diffusion model (Ratcliff 1978, Ratcliff and McKoon 2007). In the domain of neuroelectrophysiology, local field potentials (LFP) and EEG can be described through macroscopic mean-fields, based either on neural compartment models (beim Graben and Rodrigues 2013, Martínez-Cañada et al 2021, Mazzoni et al 2008), or neural field theory (beim Graben and Rodrigues 2014, Jirsa et al 2002). For MRI and BOLD signals, particular hemodynamic observation models have been proposed (Friston et al 2000, Stephan et al 2004).
In the fields of computational psycholinguistics and computational neurolinguistics (Arbib and Caplan 1979, Crocker 1996, beim Graben and Drenhaus 2012, Lewis 2003) a number of studies employed statistical regression analysis between measured and simulated data. To name only a few of them, Davidson and Martin (2013) modeled speed-accuracy data from a translation-recall experiment among Spanish and Basque subjects through a drift-diffusion approach (Ratcliff 1978, Ratcliff and McKoon 2007). Lewis and Vasishth (2006) correlated self-paced reading times for English sentences of different linguistic complexity with the predictions of an ACT-R model (Anderson et al 2004). Huyck (2009) devised a Hebbian cell assembly network of spiking point neurons for a related task. Using an automaton model for formal language (Hopcroft and Ullman 1979), Stabler (2011) argued how reading times could be related to the automaton's working memory load. Similarly, Boston et al (2008) compared eye-movement data with the predictions of an automaton model for probabilistic dependency grammars (Nivre 2008).
Correlating human language processing with event-related brain dynamics became an important subject of computational neurolinguistics in recent years. Beginning with the seminal studies of beim Graben et al (2000; 2004), similar work has been conducted by numerous research groups (for an overview cf. Hale et al (2022)). Again to name only a few of them, Hale et al (2015) correlated different formal language models with the BOLD response of participants listening to speech. Similarly, Frank et al (2015) used different ERP components in the EEG, such as the N400 (a deflection of negative polarity appearing about 400 ms after stimulus onset as a marker of lexical-semantic access), for such statistical modeling. Beim Graben and Drenhaus (2012) correlated the temporally integrated ERP during the understanding of negative polarity items (Krifka 1995) with the harmony observable of a recurrent neural network (Smolensky 2006), thereby implementing a formal language processor as a vector symbolic architecture (Gayler 2006, Schlegel et al 2021). Another neural network model of the N400 ERP-component is due to Rabovsky and McRae (2014), and to Rabovsky et al (2018), who related this marker to neural prediction error and semantic updating as observation models. Similar ideas have been suggested by Brouwer and Crocker (2017), Brouwer et al (2017), and Brouwer et al (2021), who considered a deep neural network of layered simple recurrent networks (Cleeremans et al 1989, Elman 1990), where the basal layer implements lexical retrieval, thus accounting for the N400 ERP-component, while the upper layer serves for contextual integration. Processing failures at this level are indicated by another ERP-component, the P600 (a deflection of positive polarity occurring around 600 ms after stimulus onset). Their neurocomputational model thereby implemented a previously suggested retrieval-integration account (Brouwer and Hoeks 2013, Brouwer et al 2012).
In the studies of beim Graben et al (2000; 2004; 2008), a dynamical systems approach was deployed, later dubbed cognitive dynamical modeling by beim Graben and Potthast (2009). This denotes a three-tier approach starting firstly with symbolic data structures and algorithms as models for cognitive representations and processes. These symbolic descriptions are secondly mapped onto a vectorial representation within the framework of vector symbolic architectures (Gayler 2006, Schlegel et al 2021) through filler-role bindings and subsequent tensor product representations (Mizraji 1989; 2020, Smolensky 1990; 2006). In a third step, these linear structures are used as training data for neural network learning. More specifically, symbol strings and formal language processors can be mapped through Gödel encodings to dynamical automata (beim Graben and Potthast 2014, beim Graben et al 2000; 2004; 2008). Quite recently, Carmantini et al (2017) have demonstrated how to realize those devices parsimoniously as modular recurrent neural networks, called neural automata (NA), henceforth.^1 Carmantini et al (2017) also showed how neural automata can be used for neurolinguistic correlation studies. They implemented a diagnosis-repair parser (Lewis 1998, Lewis and Vasishth 2006) for the processing of initially ambiguous subject relative and object relative sentences (Frisch et al 2004, Lewis and Vasishth 2006) through an interactive automata network. As an appropriate observation model they exploited the mean activation of the resulting neural network (Amari 1974) as a synthetic ERP (Barrès et al 2013, beim Graben et al 2008) and obtained a model for the P600 component in this attempt.
For all these neurocomputational models symbolic content must be encoded as neural activation patterns. In vector symbolic architectures, this procedure involves a mapping of symbols onto filler vectors and of their possible binding sites in a data structure onto role vectors (beim Graben and Potthast 2009). Obviously, such an encoding is completely arbitrary and could be replaced at least by any possible permutation of a chosen code. Therefore, the question arises to what extent neural observation models remain invariant under permutations of an arbitrarily chosen code. Even more crucially, one has to face the problem of whether a statistical correlation analysis depends on only one particularly chosen encoding, or not. Only if statistical models are invariant under recoding can they be regarded as reliable methods of scientific investigation.
It is the aim of the present study to provide a rigorous mathematical treatment of invariant observation models for the particular case of dynamical and neural automata and their underlying shift spaces. The article is structured as follows. In Sect. 2 we introduce the general concepts and basic definitions about invariants in dynamical systems, focusing later in Sect. 2.1 on the special case of neurodynamical ones. In Sect. 2.2 we focus our attention on symbolic dynamics. After introducing the basic notation, we introduce the tools and facts that are needed: about rooted trees in Sect. 2.2.1 and about Gödel encodings in Sect. 2.2.2. In Sect. 2.2.3 we relate these concepts to cylinder sets in order to finally describe the invariant partitions for different Gödelizations of strings in Sect. 2.2.4. Then, in Sect. 2.3 we describe the architecture of neural automata and how to pass from single strings to dotted sequences. Finally, in Sect. 2.3.1 we describe a symmetry group defined by Gödel recodings of alphabets for neural automata, and we define a macroscopic observable that is invariant under this symmetry, based on the invariants described in Sect. 2.2.4. In the end, in Sect. 3, we apply our results to a concrete example with a neural automaton constructed to emulate a parser for a context-free grammar. We demonstrate that the given macroscopic observable is invariant under Gödel recodings, whereas Amari's mean network activity is not. Section 4 provides a concluding discussion. All the mathematical proofs of the facts claimed throughout the paper are collected in an appendix.

^1 Note that neural automata are parsimonious implementations of universal computers, especially of Turing machines. These are not to be confused with the neural Turing machines appearing in the framework of deep learning approaches (Graves et al 2014).

2. Invariants in dynamical systems
We consider a classical time-discrete and deterministic dynamical system in its most generic form as an ordered pair Σ = (X, Φ), where X ⊂ R^n is a compact Hausdorff space as its phase space of dimension n ∈ N and Φ : X → X is an invertible (generally nonlinear) map (Atmanspacher and beim Graben 2007). The flow of the system is generated by the time iterates Φ^t, t ∈ Z, i.e., t ↦ Φ^t is a one-parameter group for the dynamics with time t ∈ Z, obeying Φ^t ∘ Φ^s = Φ^{t+s} for t, s ∈ Z. The elements of the phase space x ∈ X refer to the microscopic description of the system Σ and are therefore called microstates. After preparation of an initial condition x_0 ∈ X the system evolves deterministically along a trajectory T = {x(t) = Φ^t(x_0) | t ∈ Z}.
A bounded function f : X → R is called an observable, with f(x) ∈ R as measurement result in microstate x. The function space B(X) = {f : X → R | ‖f‖ < ∞}, endowed with pointwise function addition (f + g)(x) = f(x) + g(x), function multiplication (fg)(x) = f(x)g(x), and scalar multiplication (λf)(x) = λf(x) (for all f, g ∈ B(X), λ ∈ R), is called the observable algebra of the system Σ with norm ‖·‖ : B(X) → R^+_0. Restricting the function space B(X) to the bounded continuous functions C^0(X) yields the algebra of microscopic observables, which describe ideal measurements for uniquely distinguishing among different microstates within certain regions of phase space.
By contrast, complex real-world dynamical systems only allow the measurement of macroscopic properties. The corresponding macroscopic observables belong to the larger algebra of bounded functions^2 B(X) and are usually defined as large-scale limits of so-called mean-fields (Hepp 1972, Sewell 2002). Examples of macroscopic mean-field observables in computational neuroscience are discussed below.
The algebra of macroscopic observables B(X) contains step functions and particularly the indicator functions χ_A for proper subsets A ⊂ X, which are not continuous over the whole of X. Because χ_A(x) = χ_A(y) for all x, y ∈ A, the microstates x and y are not distinguishable by means of the macroscopic measurement of χ_A. Thus, Jauch (1964) and Emch (1964) called them macroscopically equivalent.^3 The class of macroscopically equivalent microstates forms a macrostate in the given mathematical framework (Emch 1964, Jauch 1964, Sewell 2002). Hence, a macroscopic observable induces a partition of the phase space of a dynamical system Σ into macrostates.
The algebras of microscopic observables, C^0(X), and of macroscopic observables, B(X), respectively, are linear spaces with their additional algebraic products. As vector spaces, they allow the construction of linear homomorphisms ϕ : B(X) → B(X), which form vector spaces as well. An important subspace of the space of linear homomorphisms is provided by the space of linear automorphisms, Aut(B(X)), which contains the invertible linear homomorphisms. The space Aut(B(X)) is additionally a group with respect to function composition, (ϕ ∘ η)(f) = ϕ(η(f)), called the automorphism group of the algebra B(X).
Next, let G be a group possessing a faithful representation α in the automorphism group Aut(B(X)) of the dynamical system Σ. Then, for a ∈ G, α_a ∈ Aut(B(X)) maps an observable f ∈ B(X) onto its transformed α_a(f) ∈ B(X), such that for two a, b ∈ G it holds that α_{a*b}(f) = (α_a ∘ α_b)(f), where '*' denotes the group product in G. The group G is called a symmetry of the dynamical system Σ (Sewell 2002). Moreover, if the representation of G commutes with the dynamics of Σ,

α*_a(Φ(x)) = Φ(α*_a(x))    (1)

for all x ∈ X, the group G is called a dynamical symmetry (Sewell 2002). In Eq. (1), the map α*_a : X → X is the result of lifting α_a from the observables to phase space through

α_a(f)(x) = f(α*_a(x)).    (2)

As an example consider the macroscopic observable χ_A, i.e. the indicator function for a proper subset A ⊂ X again. Choosing α*_a in such a way that α*_a(x) ∈ A for all x ∈ A leaves χ_A invariant:

α_a(χ_A)(x) = χ_A(α*_a(x)) = χ_A(x).    (3)

More generally, we say that an observable f ∈ B(X) is invariant under the symmetry group G if α_a(f) = f for all a ∈ G. It is the aim of the present study to investigate such invariants for particular neurodynamical systems, namely dynamical and neural automata (Carmantini et al 2017, beim Graben et al 2000; 2004; 2008).
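The invariance of an indicator observable χ_A under a phase-space transformation that maps A into itself can be sketched as follows. The region A and the transformation alpha_star below are hypothetical toy choices, not taken from the text:

```python
import numpy as np

def chi(predicate):
    """Indicator function chi_A of a region A, given as a predicate on microstates."""
    return lambda x: 1.0 if predicate(x) else 0.0

def in_A(x):
    """A hypothetical region A: the left half of the unit square."""
    return x[0] < 0.5

def alpha_star(x):
    """A hypothetical phase-space transformation mapping A into itself
    (reflection of the second coordinate)."""
    return np.array([x[0], 1.0 - x[1]])

chi_A = chi(in_A)

# invariance of chi_A: chi_A(alpha*(x)) equals chi_A(x) for all tested microstates
for x in [np.array([0.2, 0.7]), np.array([0.8, 0.1])]:
    assert chi_A(alpha_star(x)) == chi_A(x)
```

Since alpha_star only moves points within A (and within X \ A), the macroscopic measurement χ_A cannot detect it, which is exactly the sense in which such microstates are macroscopically equivalent.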
2.1. Neurodynamics. Neurodynamical systems are essentially recurrent neural networks consisting of a large number, n ∈ N, of model neurons (or units) that are connected in a complex graph (Arbib 1995, Hertz et al 1991, LeCun et al 2015, Schmidhuber 2015). Under a suitable normalization, the activity of a unit, e.g. its spike rate, can be represented by a real number in the unit interval [0, 1] ⊂ R. Then, the microstate of the entire network becomes a vector in the n-dimensional hypercube, x ∈ X = [0, 1]^n ⊂ R^n. The microscopic observables are projectors on the individual coordinate axes, f_i(x) = x_i for 1 ≤ i ≤ n. For discrete time, the network dynamics is generally given as a nonlinear difference equation

x(t + 1) = Φ_W(x(t)).    (4)

Here x(t) ∈ X is the activation vector (the microstate) of the network at time t and Φ_W is a nonlinear map, parameterized by the synaptic weight matrix W ∈ R^{n²}. Often, the map Φ_W is assumed to be of the form

Φ_W(x) = F(Wx)    (5)

with a nonlinear squashing function F = (F_i)_{1≤i≤n} : X → X as the activation function of the network. For F_i = Θ (where Θ denotes the Heaviside jump function), equations (4, 5) describe a network of McCulloch-Pitts neurons (McCulloch and Pitts 1943). Another popular choice for the activation function is the logistic function describing firing rate models (cf., e.g., beim Graben (2008)). Replacing Eq. (5) by the map

Φ_W(x) = (1 − ∆t) x + ∆t F(Wx)    (6)

yields a time-discrete leaky integrator network (beim Graben and Rodrigues 2013, beim Graben et al 2009, Wilson and Cowan 1972). For numerical simulations using the Euler method, ∆t < 1 is chosen for the time step.

^3 Cf. the related concept of epistemic equivalence used by beim Graben and Atmanspacher (2006; 2009).
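A minimal numerical sketch of these network dynamics, assuming a logistic activation function and a hypothetical random weight matrix (the network size and weights are illustrative, not from the text):

```python
import numpy as np

def logistic(x):
    """Elementwise logistic squashing function F."""
    return 1.0 / (1.0 + np.exp(-x))

def step(x, W):
    """One iterate of the firing-rate map: x(t+1) = F(W x(t))."""
    return logistic(W @ x)

def leaky_step(x, W, dt=0.1):
    """One iterate of the time-discrete leaky-integrator map:
    x(t+1) = (1 - dt) x(t) + dt F(W x(t))."""
    return (1.0 - dt) * x + dt * logistic(W @ x)

rng = np.random.default_rng(0)
n = 4                          # number of units (hypothetical small network)
W = rng.normal(size=(n, n))    # hypothetical synaptic weight matrix
x = rng.uniform(size=n)        # initial microstate in [0, 1]^n

for _ in range(100):
    x = leaky_step(x, W)
# the microstate stays inside the hypercube X = [0, 1]^n
assert np.all((0.0 <= x) & (x <= 1.0))
```

Because the leaky-integrator update is a convex combination of the previous state and the squashed input, the trajectory never leaves the hypercube.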
For correlation analyses of neural network simulations with experimental data from neurophysiological experiments one needs a mapping from the high-dimensional neural activation space X ⊂ R^n into a much lower-dimensional observation space that is spanned by p ∈ N macroscopic observables f_k : X → R (1 ≤ k ≤ p). A standard method for such a projection is principal component analysis (PCA) (Elman 1991). If PCA is restricted to the first principal axis, the resulting scalar variable can be conceived as a measure of the overall activity in the neural network. In the realm of computational neurolinguistics, PCA projections were exploited by beim Graben et al (2008).
Another important scalar observable, e.g. used by beim Graben and Drenhaus (2012) as a neuronal observation model, is Smolensky's harmony (Smolensky 1986),

H(x) = x⁺ W x    (7)

with x⁺ as the transposed activation state vector and the synaptic weight matrix W from above. Brouwer et al (2017) suggested the "dissimilarity" between the actual microstate and its dynamical precursor, i.e.

f(x(t)) = ‖x(t) − x(t − 1)‖,    (8)

as a suitable neuronal observation model.
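These two scalar observables are simple to compute; a sketch with a toy weight matrix follows (the concrete distance used for the dissimilarity is taken here, as one possible choice, to be the Euclidean norm):

```python
import numpy as np

def harmony(x, W):
    """Smolensky's harmony H(x) = x^T W x."""
    return float(x @ W @ x)

def dissimilarity(x_t, x_prev):
    """Distance between the current microstate and its dynamical precursor
    (taken here, as one concrete choice, to be the Euclidean norm)."""
    return float(np.linalg.norm(x_t - x_prev))

W = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # toy symmetric weight matrix
x = np.array([0.5, 0.5])
assert harmony(x, W) == 0.5
assert dissimilarity(x, x) == 0.0
```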

2.2. Symbolic dynamics.
A symbolic dynamics arises from a time-discrete but space-continuous dynamical system Σ through a partition of its phase space X into a finite family of m disjoint subsets totally covering the space X (Lind and Marcus 1995). Hence

X = ⋃_{k=1}^{m} A_k, with A_i ∩ A_j = ∅ for i ≠ j.    (9)

Such a partition could be induced by a macroscopic observable with finite range. By assigning the index k of a partition set A_k as a distinguished symbol s_t to a state x(t) when x(t) ∈ A_k, a trajectory of the system is mapped onto a two-sided infinite symbolic sequence. Correspondingly, the flow map of the dynamics Φ becomes represented by the left shift σ through σ(s_t) = s_{t+1}.
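The passage from a trajectory to a symbolic sequence can be sketched as follows, using the doubling map on [0, 1) with a binary partition as a hypothetical example (this system is not discussed in the text; it merely illustrates the symbolization step):

```python
def symbolize(trajectory, partition):
    """Map a trajectory to a symbol sequence: state x gets the index k of
    the partition cell A_k that contains it."""
    return [next(k for k, (lo, hi) in enumerate(partition) if lo <= x < hi)
            for x in trajectory]

# hypothetical example: the doubling map x -> 2x mod 1 with a binary partition
partition = [(0.0, 0.5), (0.5, 1.0)]
x, traj = 0.3, []
for _ in range(5):
    traj.append(x)
    x = (2.0 * x) % 1.0
assert symbolize(traj, partition) == [0, 1, 0, 0, 1]
```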
Following beim Graben et al (2004; 2008) and Carmantini et al (2017), a symbol is meant to be a distinguished element of a finite set A, which we call an alphabet. A sequence of symbols w ∈ A^l is called a word of length l, denoted l = |w|. The set of all words w of finite length |w| ≥ 0, also called the vocabulary over A, is denoted A* (for |w| = 0, w = ε denotes the "empty word").
2.2.1. Rooted trees. One can visualize the set of all words over the alphabet A as a regular rooted tree, T, where each vertex is labeled by, and corresponds to, a word formed over this alphabet. Let us assume that A has m letters for some m ∈ N, that is, A = {a_1, . . ., a_m}. Then, the tree T is inductively constructed as follows:

(i) The root of the tree is a vertex labeled by the empty word ε.

(ii) Assume we have constructed the vertices of step n; then we construct the vertices of step n + 1 as follows. Suppose that we have k vertices at step n that are labeled by the words w_1, . . ., w_k. Then
• for each i = 1, . . ., k and each a_j ∈ A we add a new vertex labeled by w_i a_j;
• for each i = 1, . . ., k and j = 1, . . ., m we add an edge from w_i to w_i a_j.
This construction generates a regular rooted tree. Following the aforementioned construction, the root is typically placed at the top. Subsequently, the root is joined by edges to the vertices corresponding to the words of length 1, that is, to the symbols of A. Then, iteratively, each vertex labeled by a letter of A is joined to the words of length two starting with that letter, and so on. Assuming that A = {a_1, . . ., a_m}, this construction yields an infinite tree as in Fig. 1. Each vertex of the tree corresponds to a word over the alphabet A; that is, the set of vertices of the tree is A*. On the other hand, each infinite ray starting from the root corresponds to an infinite sequence of symbols over A, and the set of such rays constitutes the boundary of the tree. We denote this boundary by ∂T; as mentioned, viewed as a set it is equal to A^N.
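The levels and edges of this tree are easy to enumerate programmatically; a small sketch (the three-letter alphabet is just an example):

```python
from itertools import product

def tree_level(alphabet, n):
    """All vertices of the rooted tree T at depth n, i.e. the words of length n."""
    return ["".join(w) for w in product(alphabet, repeat=n)]

def children(word, alphabet):
    """Edges of the tree: each vertex w is joined to w a_j for every a_j in A."""
    return [word + a for a in alphabet]

A = ["a", "b", "c"]                      # example alphabet with m = 3 letters
assert tree_level(A, 0) == [""]          # the root, labeled by the empty word
assert len(tree_level(A, 2)) == 3 ** 2   # m^n vertices at depth n
assert children("ab", A) == ["aba", "abb", "abc"]
```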
The construction of the tree is unique up to the particular ordering of the symbols in A we chose. Thus, in principle, if γ : A → {0, . . ., m − 1} is a particular ordering (i.e. a bijection) of the alphabet, where an element a is denoted as a_i if γ(a) = i − 1, then the tree should be denoted by T_γ, as it depends on that particular ordering of the alphabet.
Let us denote by T the regular rooted tree over the alphabet {0, 1, . . ., m − 1} with the natural order induced by N (see Fig. 2). Henceforth we will denote by M the alphabet {0, . . ., m − 1} and, as before, by T the tree corresponding to the alphabet M under the usual ordering on N.
When we say that the construction is unique up to reordering of symbols, we mean that both trees are isomorphic as graphs, where an isomorphism of graphs is a bijection between vertices preserving incidence. Indeed, for any bijection γ : A → M, the tree T_γ is isomorphic to T as a graph.
Lemma 2.1. Let γ : A → M be an ordering of the alphabet A. Then T_γ and T are isomorphic.
Since being isomorphic is transitive, this lemma shows that for any two alphabets A_1 and A_2 of the same cardinality, and any two orderings γ_1 and γ_2 of those alphabets, the corresponding trees T_{γ_1} and T_{γ_2} are isomorphic as graphs.
2.2.2. Gödel encodings. Let A^N be the space of one-sided infinite sequences over an alphabet A containing |A| = m symbols, let s = a_1 a_2 . . . be a sequence in this space, with a_k being the k-th symbol in s, and let γ : A → {0, . . ., m − 1} be an ordering. Then a Gödelization is the mapping ψ : A^N → [0, 1] ⊂ R defined as

ψ(s) = Σ_{k=1}^{∞} γ(a_k) m^{−k}.    (10)

By Lemma 2.1 we know that for each Gödelization of A induced by γ there is an isomorphism of graphs between T_γ and T. Since the choice of the ordering of the alphabet (in other words, the choice of γ) is arbitrary and leads to different Gödel encodings, we are interested in finding invariants across such encodings.
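For finite words, the Gödel encoding is just a truncated base-m expansion; a sketch with an example three-letter alphabet and two different orderings:

```python
def godelize(word, gamma, m):
    """Gödel encoding of a finite prefix: sum of gamma(a_k) * m**(-k)
    over the symbols a_k of the word."""
    return sum(gamma[a] * m ** -(k + 1) for k, a in enumerate(word))

# gamma : A -> {0, ..., m-1} is an arbitrary bijection (an ordering)
gamma = {"a": 0, "b": 1, "c": 2}
assert abs(godelize("b", gamma, 3) - 1 / 3) < 1e-12
assert abs(godelize("ab", gamma, 3) - 1 / 9) < 1e-12   # 0/3 + 1/9

# a different ordering yields a different Gödel number for the same word
gamma2 = {"a": 2, "b": 1, "c": 0}
assert godelize("ab", gamma, 3) != godelize("ab", gamma2, 3)
```

The last assertion is precisely the arbitrariness discussed in the text: the same symbolic object receives different numerical representatives under different orderings.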
One can define a metric on the boundary of the tree in the following way: given any two infinite rays of the tree p = a_1 a_2 a_3 . . . and q = b_1 b_2 b_3 . . ., we define

d(p, q) = m^{−n(p,q)},

where n(p, q) is the length of the longest common prefix of p and q (and d(p, q) = 0 if p = q). This defines an ultrametric on the boundary, that is, a metric that satisfies a stronger version of the triangular inequality, namely

d(p, r) ≤ max{d(p, q), d(q, r)}.

When we encode the infinite strings under the Gödel encoding, we are sending rays that are close to each other under this ultrametric to points that are close in the [0, 1] interval under the usual metric.
Lemma 2.2. Let p = a_1 a_2 a_3 . . . and q = b_1 b_2 b_3 . . . be two infinite strings over A. Then for any Gödel encoding ψ we have that |ψ(p) − ψ(q)| ≤ d(p, q).

Note that the lemma does not mean that points that are close (with respect to the usual metric) on the [0, 1] interval come from rays that were close on the tree. For example, if the alphabet has 3 letters, the points 1/3 − ε and 1/3 are as close as we want for any ε > 0, but the corresponding rays are at maximal distance from each other on the tree, as they already differ in their first symbol. What the lemma does provide, for each n ∈ N, is a partition of the interval such that, if two points representing infinite strings lie in the same interval of the partition corresponding to n, then they come from two rays sharing a common prefix of length at least n.
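The Lipschitz-type bound of Lemma 2.2 can be checked numerically on finite prefixes; a sketch (the particular strings and the three-letter alphabet are illustrative):

```python
def tree_distance(p, q, m):
    """Ultrametric on the boundary: d(p, q) = m**(-n) with n the length of
    the longest common prefix (d = 0 if p = q)."""
    if p == q:
        return 0.0
    n = 0
    while n < min(len(p), len(q)) and p[n] == q[n]:
        n += 1
    return m ** (-n)

def godelize(word, gamma, m):
    """Gödel encoding of a finite prefix."""
    return sum(gamma[a] * m ** -(k + 1) for k, a in enumerate(word))

gamma = {"a": 0, "b": 1, "c": 2}
p, q = "abca", "abcb"           # common prefix of length 3
d = tree_distance(p, q, 3)
assert d == 3 ** -3
# Lemma 2.2: |psi(p) - psi(q)| <= d(p, q)
assert abs(godelize(p, gamma, 3) - godelize(q, gamma, 3)) <= d
```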

2.2.3. Cylinder sets.
In symbolic dynamics, a cylinder set (McMillan 1953) is a subset of the space A^N of infinite sequences over an alphabet A that agree in a particular building block of length l ∈ N. Thus, let w ∈ A* be a finite word a_1 a_2 . . . a_l of length l; we define the cylinder set

[w] = {s ∈ A^N | s_k = a_k for 1 ≤ k ≤ l}.    (11)

We can also see the cylinder sets on the tree depicted in Fig. 3. In fact, for each level of the tree (where level refers to the vertices corresponding to words of a certain fixed length) we get a partition of the interval [0, 1]. The vertices hanging from each vertex on that level land in their corresponding interval of the partition. Thus, from a rooted-tree viewpoint, a cylinder set corresponds to a whole subtree hanging from a vertex. Concretely, the cylinder set [w] for the word w ∈ A* is the subtree hanging from the vertex labeled by w. Two different Gödel codes ψ, ϕ can only differ with respect to their assignments γ_1, γ_2 : A → {0, . . ., m − 1}. Thus, we call a permutation π ∈ S_m (with S_m as the symmetric group) a Gödel recoding if γ_2 = π ∘ γ_1.

2.2.4. Invariants. The ultimate goal of our study is to find invariants under Gödel recodings. Observe that, under the notation of Lemma 2.1, g_{γ_1} : T_{γ_1} → T and g_{γ_2} : T_{γ_2} → T are two graph isomorphisms. In fact, they induce a graph automorphism of T, g_π = g_{γ_2} ∘ g_{γ_1}^{−1} : T → T, and this automorphism sends the vertices encoded by γ_1 to the ones encoded by γ_2.
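The correspondence between cylinder sets and subintervals of [0, 1] under Gödelization can be sketched as follows (the alphabet and ordering are illustrative choices):

```python
def cylinder_interval(w, gamma, m):
    """Under Gödelization, the cylinder set [w] for a word of length l maps
    onto an interval of length m**(-l) inside [0, 1]."""
    lo = sum(gamma[a] * m ** -(k + 1) for k, a in enumerate(w))
    return lo, lo + m ** -len(w)

gamma = {"a": 0, "b": 1, "c": 2}
lo, hi = cylinder_interval("b", gamma, 3)   # all sequences starting with "b"
assert abs(lo - 1 / 3) < 1e-12 and abs(hi - 2 / 3) < 1e-12

# any extension of "b" is Gödelized inside that interval
x = sum(gamma[a] * 3 ** -(k + 1) for k, a in enumerate("bca"))
assert lo <= x < hi
```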
As Lemma 2.2 shows, a Gödel recoding preserves the size of cylinder sets after permuting vertices. However, the way a reordering of the alphabet permutes the rays of the tree is even more restrictive than merely preserving the size of cylinder sets. In fact, under the action of a reordering, each vertex can only be mapped to certain vertices and is forbidden from being sent to others. This is captured by the following central definition.
Definition 2.3. Let w = a_{i_1} a_{i_2} . . . a_{i_l} ∈ M^l be a string of length l after an ordering γ. We define a partition P_w of the set of integers {1, 2, . . ., l} by requiring that

j and k belong to the same class of P_w ⟺ a_{i_j} = a_{i_k}.    (12)

For any word w ∈ M* we call P_w the pattern of equality of w.
Equipped with the aforementioned formalisms, we are now in a position to formulate the first main finding of our study as follows.
Theorem 2.4. Let w ∈ M^l. For any other vertex u ∈ T there exists a Gödel recoding π such that g_π(w) = u if and only if

u ∈ M^l and P_u = P_w.    (13)

Theorem 2.4 states that each vertex can be mapped to any vertex having the same pattern of equality, and nowhere else.
Example 2.5. Let A = {a, b, c} and consider w = aaabcabc ∈ A^8. Then we have P_w = {{1, 2, 3, 6}, {4, 7}, {5, 8}}, which gives us all the possible words to which w can be mapped. This is the list of all the possibilities:

bbbacbac, cccbacba, aaacbacb, bbbcabca, cccabcab, aaabcabc.

So we have only 6 possible vertices out of 3^8 = 6561. And of course, this proportion shrinks as we go deeper into the tree.
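Example 2.5 can be reproduced mechanically: compute the pattern of equality and enumerate the images of w under all alphabet permutations. A sketch:

```python
from itertools import permutations

def pattern_of_equality(w):
    """Pattern of equality (Definition 2.3): partition of {1, ..., l}
    grouping the positions that carry the same symbol."""
    classes = {}
    for pos, sym in enumerate(w, start=1):
        classes.setdefault(sym, set()).add(pos)
    return sorted(map(frozenset, classes.values()), key=min)

w = "aaabcabc"
P = pattern_of_equality(w)
assert [sorted(c) for c in P] == [[1, 2, 3, 6], [4, 7], [5, 8]]

# images of w under all Gödel recodings (permutations of the alphabet)
A = "abc"
images = {"".join(dict(zip(A, perm))[s] for s in w)
          for perm in permutations(A)}
assert len(images) == 6
assert "bbbacbac" in images            # one of the words listed in Example 2.5
# every image shares the same pattern of equality, as Theorem 2.4 asserts
assert all(pattern_of_equality(u) == P for u in images)
```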
In terms of Gödelization into the [0, 1] interval, we illustrate the implications by an example. Let us assume that m = 3 and l = 3. Then, in Fig. 4, a cylinder set of a certain color can only be mapped through a recoding to a cylinder set of the same color and nowhere else. Figure 5 shows the corresponding partition of the interval [0, 1], where the intervals in each color may be mapped to another of the same color by a different assignment map and nowhere else.

2.3. Neural automata. For two-sided sequences, the dot "." is simply used as a mnemonic sign, indicating that the index 0 is to its right. Carmantini et al (2017) interpreted the dot as a meta-symbol which can be concatenated with two words v_1, v_2 ∈ A* through v = v_1.v_2. Let Â* denote the set of these dotted words. Moreover, let Z− = {i | i < 0, i ∈ Z} and Z+ = {i | i ≥ 0, i ∈ Z} denote the sets of negative and non-negative indices. We can then reintroduce the notion of a dotted sequence as follows. Let s ∈ A^Z be a bi-infinite sequence of symbols such that s = w_α v w_β, with v ∈ Â* a dotted word v = v_1.v_2, and w_α v_1 ∈ A^{Z−} and v_2 w_β ∈ A^{Z+}. Through this definition, the indices of s are inherited from the dotted word v and are thus not explicitly prescribed.
A versatile shift (VS) was defined by Carmantini et al (2017) as a pair M_VS = (A^Z, Ω), with A^Z being the space of dotted sequences, and Ω : A^Z → A^Z defined by

Ω(s) = σ^{F(s)}(s ⊕ G(s))    (15)

with

G(s) = G|_{Â*}(v_1.v_2),    (16)

where the operator "⊕" substitutes the dotted word v_1.v_2 ∈ Â* in s with a new dotted word v̂_1.v̂_2 ∈ Â* specified by G, while F(s) = F|_{Â*}(v_1.v_2) determines the number of shift steps, as for Moore's generalized shifts (Carmantini et al 2017).
A nonlinear dynamical automaton (NDA) is a triple M_NDA = (Y, P, Φ), where P is a rectangular partition of the unit square such that each cell is defined as D_(i,j) = I_i × J_j, with I_i, J_j ⊂ [0, 1] being real intervals for each bi-index (i, j), with

⋃_{i,j} D_(i,j) = Y = [0, 1]².    (17)

The couple (Y, Φ) is a time-discrete dynamical system with phase space Y, and the flow Φ : Y → Y is a piecewise affine-linear map such that Φ|_{D_(i,j)} := Φ_(i,j), with Φ_(i,j) having the following form:

Φ_(i,j)(y) = a_(i,j) + Λ_(i,j) y    (18)

with state vector y = (y_1, y_2). Carmantini et al (2017) have shown that, using Gödelization, any versatile shift can be mapped to a nonlinear dynamical automaton. Therefore, one can reproduce the activity of a versatile shift on the unit square Y. In order to do so, the partition (17) is given by the so-called Domain of Dependence (DoD). The Domain of Dependence is a pair (l, r) ∈ N × N which defines the lengths of the strings on the left and right hand sides of the dot in a dotted sequence that are relevant for the versatile shift to act on the phase space. The dynamics of the versatile shift is completely determined by what the string looks like on the Domain of Dependence in each iteration. Then, if the domain is (l, r) and the alphabet A has size m, the partition of the unit square is given by m^r intervals on the y_1 axis and m^l intervals on the y_2 axis, corresponding to cells where the NDA is defined according to the versatile shift. Finally, a neural automaton (NA) is an implementation of an NDA by means of a modular recurrent neural network (Carmantini et al 2017).
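The piecewise affine-linear structure of an NDA flow can be sketched as follows. The two-cell partition and all parameter values below are hypothetical toy choices, not an NDA derived from any particular versatile shift:

```python
import numpy as np

def nda_step(y, cells):
    """One iterate of a piecewise affine-linear NDA flow: on each cell
    D_(i,j), the map is y -> a_(i,j) + Lambda_(i,j) y."""
    for (lo1, hi1, lo2, hi2), (a, Lam) in cells:
        if lo1 <= y[0] < hi1 and lo2 <= y[1] < hi2:
            return a + Lam @ y
    raise ValueError("state outside the partition")

# a toy two-cell partition of the unit square with hypothetical parameters:
# each cell carries an offset vector a and a diagonal matrix Lambda
cells = [
    ((0.0, 0.5, 0.0, 1.0), (np.array([0.5, 0.0]), np.diag([0.5, 1.0]))),
    ((0.5, 1.0, 0.0, 1.0), (np.array([0.0, 0.0]), np.diag([1.0, 1.0]))),
]
y = np.array([0.25, 0.4])
y_next = nda_step(y, cells)
assert np.allclose(y_next, [0.625, 0.4])
```

The branch selection according to the current cell is exactly what the branch selection layer of the neural automaton implements in network form.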
The neural automaton comprises a phase space X = [0, 1]^n, where the two-dimensional subspace Y = [0, 1]² of the underlying NDA is spanned by only two neurons that belong to the machine configuration layer (MCL). The remainder X \ Y is spanned by the neurons of the branch selection layer (BSL) and the linear transformation layer (LTL), both mediating the piecewise affine mapping (18). Given an NDA defined from a versatile shift, each rectangle of the partition is given by the DoD, and the action of the NDA on each rectangle depends on the particular Gödel encoding of the alphabet A that has been chosen. We are interested in macroscopic observables of such automata that are invariant under different Gödel encodings of the alphabet.
Since we are now interested in dotted sequences over an alphabet A, instead of an invariant partition of the interval [0, 1] as in Fig. 5, we will have an invariant partition of the unit square Y = [0, 1]². That is, we will have a partition into rectangles where the machine might or might not be at a certain step of the dynamics. Each color in that partition gives all the possible places where a particular dotted sequence of certain right and left lengths could be under a different Gödel encoding.
For example, assuming that our alphabet has m = 3 letters on both sides of the dotted sequence, and that we are looking at words of length l = 2 on the left hand side of the dot and length r = 3 on the right hand side of the dot, the partition would be like in Figure 6.

Figure 6. Each small square corresponds to a square of the partition given by the dotted sequences of length (2, 3). The squares colored by the same color are those having the same pattern of equality and, thus, those which can be mapped to each other under different Gödel encodings of the alphabet.
Let us assume that we are considering the invariant partition for dotted sequences of length (l, r), meaning that the left hand side has length l and the right hand side length r. Then we know that the partition of the square Y is given by rectangles of side lengths m^{−l} and m^{−r}. Each left corner of a rectangle corresponds to the position of the Gödelization of a dotted sequence of size (l, r). Each point (y_1, y_2) = (i/m^l, j/m^r) has a unique expansion in base m for its coordinates. That is, there are some c_1, . . ., c_l with 0 ≤ c_i ≤ m − 1 such that

i/m^l = Σ_{k=1}^{l} c_k m^{−k}.    (19)

These c_1, . . ., c_l also define a partition of {1, . . ., l} in the same way as given in Definition 2.3; that is, {j_1, . . ., j_k} ∈ P_x if and only if c_{j_1} = · · · = c_{j_k}. This procedure applies similarly to the y_2 coordinate. Hence, the corners defining an invariant piece of the partition will be those sharing the same partition of {1, . . ., l} × {1, . . ., r}.
In other words, we can obtain the corners related to a given y by expanding $y_1$ and $y_2$ in base m and permuting the occurrences of $0, \ldots, m - 1$ in the expansions.
For example, if m = 3 and (l, r) = (2, 3), we have $3^2 \cdot 3^3 = 3^5$ rectangles. Now let us take, for instance, the rectangle $[6/3^2, 7/3^2] \times [10/3^3, 11/3^3]$ and let us find its invariant piece. First we decompose

$6/3^2 = 2/3 + 0/3^2 \quad \text{and} \quad 10/3^3 = 1/3 + 0/3^2 + 1/3^3,$

so the corner corresponds to the digit words (2, 0) and (1, 0, 1). Hence a rectangle in the same invariant piece must have a corner with digit words $(c_1, c_2)$ and $(d_1, d_2, d_3)$ satisfying $c_1 \neq c_2$ and $d_1 = d_3 \neq d_2$. In this way we can construct the partition of the unit square given by the patterns of equality.
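This enumeration can be checked mechanically. The sketch below (hypothetical function names, following the digit-word conventions above) lists all rectangle corners sharing the pattern of equality of the corner $(6/3^2, 10/3^3)$:

```python
from itertools import product

def base_m_digits(k, m, length):
    """Digits of the integer k in base m, most significant first, padded to `length`."""
    digits = []
    for _ in range(length):
        digits.append(k % m)
        k //= m
    return tuple(reversed(digits))

def pattern(digits):
    """Partition of positions induced by equal digits."""
    groups = {}
    for i, d in enumerate(digits):
        groups.setdefault(d, set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())

def same_pattern_corners(m, l, r, i, j):
    """All corners (i', j') whose rectangle belongs to the same invariant
    piece as the rectangle with lower-left corner (i/m**l, j/m**r)."""
    pl, pr = pattern(base_m_digits(i, m, l)), pattern(base_m_digits(j, m, r))
    return [(a, b) for a, b in product(range(m ** l), range(m ** r))
            if pattern(base_m_digits(a, m, l)) == pl
            and pattern(base_m_digits(b, m, r)) == pr]

corners = same_pattern_corners(3, 2, 3, 6, 10)
assert (6, 10) in corners
# 6 left words with c1 != c2 times 6 right words with d1 == d3 != d2:
assert len(corners) == 36
```

The count 36 agrees with the combinatorics: there are $3 \cdot 2$ choices for a left word with two distinct digits and $3 \cdot 2$ for a right word with $d_1 = d_3 \neq d_2$.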
2.3.1. Invariant observables. Our aim now is to define an observable f ∈ B(X), in the sense of Section 2, for neural automata. That is, f : X → R should obey Eq. (3), where the map $\alpha_\pi$ corresponds to a symmetry induced by a Gödel recoding of the alphabets. Here π denotes the permutation of the alphabet needed to pass from one Gödel encoding to the other, as explained later.
Notice that in the previous discussion we assumed that we knew the length of the strings that were encoded. In practice this is not the case, and it may cause problems, since the length of the strings varies at each iteration. For instance, if for the alphabet {a, b} the symbol a is mapped to 0 and the symbol b to 1 under a certain Gödel encoding γ, then the number $x = 1/2 \in [0, 1]$ corresponds to the word $w_r = b a^{r-1} = ba\ldots a$ once we assume that the string has length r, for each r ∈ N. However, if we do not know the length of the encoded string, each $w_r$ will have a different Gödel number under the Gödel encoding that sends b to 0 and a to 1, namely $\sum_{k=2}^{r} 1/2^k$. Thus, encoding a symbol by the number 0 makes some strings indistinguishable under Gödel recoding, because having no symbols is interpreted as having the symbol encoded by 0 arbitrarily many times. This issue can easily be avoided by adding one symbol ⊔ to the alphabet, interpreted as a blank symbol, which is always forced to be encoded as 0 by any Gödel encoding.
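The ambiguity and its fix can be illustrated numerically. This is a sketch under the paper's conventions; `goedel` is a hypothetical helper, and the underscore plays the role of the blank symbol ⊔:

```python
def goedel(word, gamma, m):
    """Gödel encoding of a finite word w_1 ... w_n as sum_k gamma(w_k) / m**k."""
    return sum(gamma[symbol] / m ** (k + 1) for k, symbol in enumerate(word))

# If a symbol is encoded by 0, trailing occurrences of it are invisible:
gamma2 = {"a": 0, "b": 1}
assert goedel("b", gamma2, 2) == goedel("ba", gamma2, 2)  # both equal 1/2

# Reserving 0 for an explicit blank symbol removes the ambiguity:
gamma3 = {"_": 0, "a": 1, "b": 2}  # "_" stands for the blank symbol
assert goedel("b", gamma3, 3) != goedel("ba", gamma3, 3)
```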
Suppose that we have an NDA defined from a versatile shift, under the condition that the blank symbol ⊔ has been added to the alphabet A and is mapped to 0 under any Gödel encoding (footnote 5). We will assume that A has m symbols after adding the blank symbol (that is, we had m − 1 symbols before). Then for any pair (l, r) ∈ N × N, we can divide the unit square Y into the rectangle partition

$\mathcal{R} = \left\{ R_{(i,j)} = \left[\tfrac{i}{m^l}, \tfrac{i+1}{m^l}\right] \times \left[\tfrac{j}{m^r}, \tfrac{j+1}{m^r}\right] : 0 \le i < m^l,\ 0 \le j < m^r \right\}. \qquad (20)$

Next, we extend this partition of the phase space of the NDA, which equals the subspace of the machine configuration layer of the larger NA, to the entire phase space of the neural automaton. This is straightforwardly achieved by defining the partition into the sets $E_{(i,j)} = R_{(i,j)} \times [0, 1]^{n-2}$. Now, for each left corner of $E_{(i,j)}$ we find its pattern of equality $P_{ij}$, assuming that the permutation acts only on the symbols {2, . . ., m} (as the first symbol has to be mapped to 0 under any encoding).
Let us suppose that $\{P_{ij_1}, \ldots, P_{ij_s}\}$ are all the different patterns of equality that appear, and let us define the indicator functions $\chi_k : X \to \{0, 1\}$ by $\chi_k(x) = 1$ if x lies in a set $E_{(i,j)}$ with $P_{ij} = P_{ij_k}$, and $\chi_k(x) = 0$ otherwise. Then, we can choose s different real numbers $c_1, \ldots, c_s \in \mathbb{R}$ and define a macroscopic observable f : X → R as the step function

$f(x) = \sum_{k=1}^{s} c_k\, \chi_k(x). \qquad (23)$

Clearly, we have f ∈ B(X).
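A minimal numerical sketch of such a step function (hypothetical helpers; for the actual simulations see the repository referenced in Section 4) reads the first (l, r) base-m digits of the MCL activations, computes their pattern-of-equality pair, and returns one fixed value per pattern:

```python
def pattern(digits):
    """Partition of positions induced by equal digits."""
    groups = {}
    for i, d in enumerate(digits):
        groups.setdefault(d, set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())

def digits_of(y, m, length):
    """First `length` base-m digits of a number y in [0, 1)."""
    out = []
    for _ in range(length):
        y *= m
        d = int(y)
        out.append(min(d, m - 1))
        y -= d
    return tuple(out)

def make_step_observable(m, l, r, values):
    """Step function f: one real value per pattern-of-equality pair.

    `values` maps pattern pairs to the coefficients c_k; here we fill it
    lazily with fresh numbers instead of choosing the c_k at random."""
    def f(x):
        y1, y2 = x[0], x[1]  # machine configuration layer activations
        key = (pattern(digits_of(y1, m, l)), pattern(digits_of(y2, m, r)))
        return values.setdefault(key, float(len(values)))
    return f

f = make_step_observable(3, 2, 3, {})
y2 = (13 + 0.5) / 27           # right word (1, 1, 1)
a = f([(5 + 0.5) / 9, y2])     # left word (1, 2): two distinct digits
b = f([(7 + 0.5) / 9, y2])     # left word (2, 1): same pattern as (1, 2)
c = f([(4 + 0.5) / 9, y2])     # left word (1, 1): different pattern
assert a == b and a != c
```

The test points are rectangle midpoints, which keeps the floating-point digit extraction away from rectangle boundaries.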
Our aim is to show that this observable is invariant under the symmetry group $S_{m-1} \times S_{m-1}$ of the dynamical system (X, Φ) given by the neural automaton in Eq. (18), where $S_{m-1}$ denotes the symmetric group on m − 1 elements. First of all, we must show that $S_{m-1} \times S_{m-1}$ is a symmetry of the neural automaton.

Footnote 5: In order to make things simpler we assume that we have the same alphabet for the stack and the input symbols. This can always be arranged by taking the union of both alphabets if needed.
Before doing this, we define an auxiliary map. Let $\pi = (\pi_1, \pi_2) \in S_m \times S_m$ be any element of the product that fixes 1 (on the set {1, 2, . . ., m} where $S_m$ acts). Notice that the elements of $S_m$ fixing the first element form a subgroup of $S_m$ that is isomorphic to $S_{m-1}$. Let now x be any point in X, and let $y = (y_1, y_2)$ be the first two coordinates of x, given by the activations of the machine configuration layer of the NA. Then, we can check in which rectangle of the partition $\mathcal{R}$ the point y lies, say $(y_1, y_2) \in [i/m^l, (i+1)/m^l] \times [j/m^r, (j+1)/m^r]$. We can therefore compute the base-m expansion of the corner and read off its coefficients as words over the alphabet M = {0, 1, . . ., m − 1}, say $c_1 \ldots c_l \in M^l$ and $d_1 \ldots d_r \in M^r$. Then, we compute $g_{\pi_1}(c_1 \ldots c_l)$ and $g_{\pi_2}(d_1 \ldots d_r)$ and encode these words by the canonical Gödel encoding (that is, the one given by the identity map on M). Thus, we obtain a new corner of some rectangle in our partition of the phase space, which defines a map $\rho_\pi : Y \to Y$. This map can obviously be extended to a map from X to X acting as the identity on the remaining coordinates. Abusing notation, we also refer to this extension as $\rho_\pi$. Informally speaking, the map $\rho_\pi$ rigidly permutes the squares of the partition $\mathcal{R}$ according to the action of $g_{\pi_1}$ and $g_{\pi_2}$ on the words representing the corners. Now, we can define $\alpha_\pi : B(X) \to B(X)$ as follows: for any f ∈ B(X), we set $(\alpha_\pi f)(x) = f(\rho_\pi(x))$. Thus, we finally obtain our main result.
Theorem 2.6. Let f ∈ B(X) be a macroscopic observable on the phase space of a neural automaton as defined in (23). Then f is invariant under the symmetric group $S_{m-1} \times S_{m-1}$ of Gödel recodings of the automaton's symbolic alphabet.
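The theorem can be checked by brute force for small parameters: any permutation of the alphabet that fixes the blank symbol 0 leaves the pattern of equality of every digit word unchanged, so a step function of the form (23) cannot distinguish a state from its recoding. A sketch, assuming the digit-word conventions above:

```python
from itertools import permutations, product

m, l = 3, 2  # alphabet size (blank included) and word length

def pattern(digits):
    """Partition of positions induced by equal digits."""
    groups = {}
    for i, d in enumerate(digits):
        groups.setdefault(d, set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())

# All alphabet permutations fixing the blank symbol 0: a copy of S_{m-1}.
blank_fixing = [dict(enumerate((0,) + p)) for p in permutations(range(1, m))]
assert len(blank_fixing) == 2  # |S_2| = 2 for m = 3

# Recoding never changes a word's pattern of equality; hence any
# observable defined over patterns of equality is invariant.
for digits in product(range(m), repeat=l):
    for sigma in blank_fixing:
        recoded = tuple(sigma[d] for d in digits)
        assert pattern(recoded) == pattern(digits)
```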
It is worth mentioning that this procedure yields infinitely many different invariant observables. In fact, each choice of (l, r) ∈ N × N gives a finer invariant partition and, correspondingly, a sharper observable.

Neurolinguistic application
As an instructive example we consider a toy model of syntactic language processing as often employed in computational psycholinguistics and computational neurolinguistics (Arbib and Caplan 1979, Crocker 1996, beim Graben and Drenhaus 2012, Hale et al 2022, Lewis 2003).
In order to process the sentence given by beim Graben and Potthast (2014) in example 3.1, linguists often derive a context-free grammar (CFG) from a phrase structure tree (Hopcroft and Ullman 1979).
Example 3.1. the dog chased the cat

In our case, the CFG consists of rewriting rules such as S → NP VP and VP → V NP, where the left-hand side always presents a nonterminal symbol to be expanded into a string of nonterminal and terminal symbols on the right-hand side. Omitting the lexical rules (27 - 29), we regard the symbols NP, V, denoting 'noun phrase' and 'verb', respectively, as terminals and the symbols S ('sentence') and VP ('verbal phrase') as nonterminals.
Then, a versatile shift processing this grammar through a simple top-down recognizer (Hopcroft and Ullman 1979) can be defined as in (30), where the left-hand side of the tape is now called 'stack' and the right-hand side 'input'. In (30), a stands for an arbitrary input symbol. Note the reversed order of the stack left of the dot. The first two operations in (30) are predictions according to a rule of the CFG, while the last one is an attachment of subsequent input to already predicted material. This machine then parses the well-formed sentence NP V NP as shown in Table 1 from beim Graben and Potthast (2014). We reproduce this table here as Tab. 1.
Table 1. Sequence of state transitions of the versatile shift processing the well-formed string from Example 3.1, i.e. NP V NP. The operations are indicated as follows: "predict (X)" means prediction according to rule (X) of the context-free grammar; "attach" means cancellation of successfully predicted terminals from both stack and input; and "accept" means acceptance of the string as well-formed.
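The predict/attach regime of Table 1 can be sketched as a small recognizer. This is a toy reimplementation under stated assumptions, not the neural automaton itself; the rule set is the CFG fragment S → NP VP, VP → V NP discussed above:

```python
RULES = {"S": ["NP", "VP"], "VP": ["V", "NP"]}  # nonterminal -> right-hand side

def parse(tokens):
    """Top-down recognizer: returns the (stack, input) trace and acceptance."""
    stack, inp = ["S"], list(tokens)
    trace = [(tuple(stack), tuple(inp))]
    while stack and inp:
        if stack[0] in RULES:                    # predict: expand the top symbol
            stack = RULES[stack[0]] + stack[1:]
        elif stack[0] == inp[0]:                 # attach: cancel predicted terminal
            stack, inp = stack[1:], inp[1:]
        else:
            return trace, False                  # mismatch: reject
        trace.append((tuple(stack), tuple(inp)))
    return trace, not stack and not inp          # accept iff both are empty

trace, accepted = parse(["NP", "V", "NP"])
assert accepted
assert len(trace) == 6  # matches the six configurations of Table 1
```

Here the list head plays the role of the symbol adjacent to the dot, mirroring the reversed stack order noted above.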
Once we have obtained the versatile shift, an NA simulating it can be generated. When we do so, we choose a particular Gödel encoding of the symbols. Suppose we choose two different Gödelizations $\gamma = (\gamma_1, \gamma_2)$ and $\delta = (\delta_1, \delta_2)$ of the symbols. We define the step function f : X → R as in (23), choosing (l, r) = (2, 3) and the $c_i$ randomly. The neural automaton consists of n = 72 neurons, i.e. the phase space is given by the hypercube $X = [0, 1]^{72}$. Running the neural network with both encodings and computing the step function f at each iteration i = 1, . . ., 6, we see in Fig. 7 that f is indeed invariant under Gödel recoding. The step function clearly distinguishes different states (where "different" here means with different patterns of equality), but returns the same value for states corresponding to the same pattern of equality, that is, states that differ only in the Gödel encoding, as desired.
In contrast, if we use Amari's observable, Eq. (9), for the same simulation, we obtain a very different picture: as shown in Fig. 8, this observable is not invariant under Gödel recoding. Obviously, it strongly depends on the particular Gödel encoding chosen.
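The failure of invariance is elementary: the mean activation moves whenever a recoding moves the state vector. A toy two-neuron sketch (the encodings γ: a → 1, b → 2 and δ: a → 2, b → 1 are illustrative assumptions, not the ones used in the simulation):

```python
def amari(x):
    """Amari-style macroscopic observable: the mean network activation."""
    return sum(x) / len(x)

# The same word "ab" encoded under two recodings of {a, b} (m = 3, blank -> 0):
x_gamma = [1 / 3 + 2 / 9, 0.0]   # gamma: a -> 1, b -> 2
x_delta = [2 / 3 + 1 / 9, 0.0]   # delta: a -> 2, b -> 1
assert amari(x_gamma) != amari(x_delta)  # mean activation is not invariant
```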

Discussion
In this study we have presented a way of finding particular macroscopic observables for nonlinear dynamical systems that are generated by Gödel encodings of symbolic dynamical systems, such as nonlinear dynamical automata (NDA: beim Graben and Potthast (2014), beim Graben et al (2000; 2004; 2008)) and their respective neural network implementation, namely neural automata (NA: Carmantini et al (2017)). Specifically, we have investigated under which circumstances such observables are invariant under any particular choice of Gödel encoding.
When mapping symbolic dynamics to a real phase space, the numbering of the symbols is usually arbitrary. Therefore, it makes sense to ask which information about the dynamics is preserved, or can be recovered from what we see in phase space, under the different possible choices. In this direction, we have provided a complete characterisation, in terms of patterns of equality, of the strings that are and are not distinguishable under a given Gödel encoding. We have proven a partition theorem for such invariants.
In the concrete case of NA constructed as in Carmantini et al (2017), which can emulate any Turing machine, we have a dynamical system for a neural automaton. This system depends entirely on the choice of the Gödel numbering for the symbols of the NA's alphabet. Based on the invariant partition mentioned before, we were able to define a macroscopic observable that is invariant under any Gödel recoding. The definition of this observable is based on an invariant partition according to the lengths of the strings on the left- and right-hand sides of a dotted sequence comprising the machine tape of the NA. This means that each choice of those string lengths provides a sharper invariant, making strings with different patterns of equality completely distinguishable. It is also important to mention that macroscopic observables in general are not invariant under Gödel recoding. As a particular example, we computed the mean neural network activation, originally suggested by Amari (1974) and later employed by Carmantini et al (2017) as a modeled "synthetic ERP" (Barrès et al 2013) in neurocomputing.
In fact, any observable that is invariant under Gödel recoding must take equal values at points of the phase space corresponding to Gödelizations of strings sharing the same pattern of equality. This constraint could prove important for finding other invariant macroscopic observables.
Theoretically, one could run a neural automaton under all (or many) possible Gödel encodings and check which observables are preserved by the dynamics and which are not. This could provide important information about the performance of the neural network architecture that is intrinsic to the dynamical system and not dependent on the choice of numbering for the codification of the symbols. In practice, the number of permutations of the alphabet grows with the factorial of the alphabet's cardinality, and the computation of invariant partitions grows even with powers of that number for longer strings. This, of course, presents practical constraints for large alphabets and sharp invariant observables.
Our results could be of substantial importance for related approaches in the field of computational cognitive neurodynamics. All models that rely upon the representation of symbolic mental content by means of high-dimensional activation vectors as training patterns for (deep) neural networks (Arbib 1995, Hertz et al 1991, LeCun et al 2015, Schmidhuber 2015), and vector symbolic architectures (Gayler 2006, Mizraji 1989; 2020, Schlegel et al 2021, Smolensky 1990; 2006) in particular, face the problem of arbitrary symbolic encodings. As long as one is only interested in building inference machines for artificial intelligence, this does not really matter. However, when activation states of neural network simulations have to be correlated with real-world data from experiments in the domains of human or animal cognitive neuroscience and psychology, the given encoding may play a role. Thus, the investigation of invariant observables in regression analyses and statistical modeling becomes mandatory for avoiding possible confounds that could result from a particular choice of encoding.
These results also have implications for mathematical and computational neuroscience, where the aim is to explain, by means of mathematical theories and computational modelling, neurophysiological processes as observed in in-vitro and in-vivo experiments via instrumentation devices. Our results force us to ask to what extent (if any) observations that motivate the development of models in the literature (e.g. spiking models) are epiphenomenal. To conclude, we express the hope that our study paves the way towards more comprehensive research in computational cognitive neurodynamics and in mathematical and computational neuroscience, where the study of macroscopic observations and their invariant formulation can lead to interesting new insights.

4.1. Reproducibility. All numerical simulations presented in Section 3 may be reproduced using the code available at the GitHub repository https://github.com/TuringMachinegun/Turing_Neural_Networks. The repository contains the code to build the architecture of a neural automaton as introduced in Carmantini et al (2017), together with particular examples. The code that computes the invariant partitions given by equality patterns can also be found in the repository. The code allows the user to implement various observables (e.g. the step function or Amari's observable) in order to test further cases and to exploit and further develop our framework.
We must show that it defines a bijection between vertices that preserves incidence. It is easy to prove that it is a bijection. Namely, if $w = a_{i_1} a_{i_2} \ldots a_{i_n}$ and $u = b_{i_1} \ldots b_{i_k}$ are any two vertices of the tree $T_\gamma$, then $g_\gamma(w) = g_\gamma(u)$ implies that both strings have the same length, hence n = k. And since $\gamma(a_{i_j}) = \gamma(b_{i_j})$ and γ is a bijection, we must have $a_{i_j} = b_{i_j}$, so that w = u. Moreover, for any $v = l_1 \ldots l_n \in T$ there is $z = a_{\gamma^{-1}(l_1)} \ldots a_{\gamma^{-1}(l_n)}$, which is clearly mapped to v through $g_\gamma$.
The only thing left to show is that $g_\gamma$ preserves incidence, that is, that given w ∈ A* and a ∈ A, we have $g_\gamma(wa) = g_\gamma(w)l$ for some l ∈ {0, . . ., m − 1}. But this is also clear from the definition of $g_\gamma$.
Proof of Lemma 2.2. Let us suppose that $d(p, q) \le 1/m^n$. This means that at least the first n symbols of both strings are equal. Then, if ψ is a Gödel encoding defined by the assignment γ : A → M, we have $\psi(p) = \sum_{i=1}^{\infty} \gamma(a_i)\, m^{-i}$. Let us put $r = \sum_{i=1}^{n} \gamma(a_i)\, m^{-i} = k/m^n$ for some integer k; since the tail $\sum_{i=n+1}^{\infty} \gamma(a_i)\, m^{-i}$ is at most $1/m^n$, we get that $\psi(p) \in [k/m^n, (k+1)/m^n]$. Since q equals p in at least the first n symbols, we also have $\psi(q) = r + \sum_{i=n+1}^{\infty} \gamma(b_i)\, m^{-i}$, and for the same reason it lies in the same interval.
For the other implication, suppose we have two real numbers ψ(p) and ψ(q) obtained by encoding some infinite strings p and q; we want to show that if they lie in an interval of the type $[k/m^n, (k+1)/m^n]$, then they share a prefix of length at least n. We can always write these numbers as $\psi(p) = k/m^n + r_1$ and $\psi(q) = k/m^n + r_2$ with $r_1, r_2 < 1/m^n$. If we write the number k in its m-adic expansion, it is uniquely determined by $l_1, \ldots, l_n \in \{0, \ldots, m-1\}$, and each number $r_i$ can be written as a series $r_i = \sum_{j=n+1}^{\infty} l_{ji}\, m^{-j}$, for i = 1, 2. Taking the inverse images $\gamma^{-1}(l_i) = a_i$, we obtain that $p = a_1 \ldots a_n a_{n+1} \ldots$ and $q = a_1 \ldots a_n b_{n+1} \ldots$. That is, they are at distance at most $1/m^n$.
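The correspondence in Lemma 2.2 can be probed numerically. This is a sketch with truncated series; the helper name `psi` and the concrete sequences are illustrative:

```python
import math

def psi(seq, gamma, m, depth=30):
    """Truncated Gödel encoding of a one-sided infinite sequence."""
    return sum(gamma[c] / m ** (k + 1) for k, c in enumerate(seq[:depth]))

gamma = {"_": 0, "a": 1, "b": 2}
m, n = 3, 4
p = "aabb" + "ab" * 13   # shares a prefix of length 4 with q ...
q = "aabb" + "ba" * 13
r = "aaba" + "ab" * 13   # ... but not with r

# Same length-n prefix  <=>  same interval [k/m**n, (k+1)/m**n):
assert math.floor(psi(p, gamma, m) * m ** n) == math.floor(psi(q, gamma, m) * m ** n)
assert math.floor(psi(p, gamma, m) * m ** n) != math.floor(psi(r, gamma, m) * m ** n)
```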
To show the other direction, it suffices to define π so that $g_\pi(w) = u$. That is, if $u = c_1 \ldots c_n$, let us define $\pi(a_i) = c_i$ and let us send the $a_j \in M$ not appearing in w to the $c_j$'s not appearing in u in a bijective way. This can be done: it is well defined by condition Eq. (14), and it defines a bijection on M by construction.
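This construction of π can be sketched directly (hypothetical helper; returning `None` signals that no recoding exists, i.e. the patterns of equality differ):

```python
def build_permutation(w, u, alphabet):
    """Try to build a bijection pi on `alphabet` with pi(w_i) = u_i for all i.

    Succeeds exactly when w and u share the same pattern of equality;
    otherwise returns None."""
    pi = {}
    for a, c in zip(w, u):
        if a in pi:
            if pi[a] != c:
                return None        # pi would not be well defined
        elif c in pi.values():
            return None            # pi would not be injective
        else:
            pi[a] = c
    # Extend bijectively to the symbols not appearing in w.
    rest_src = [a for a in alphabet if a not in pi]
    rest_dst = [c for c in alphabet if c not in pi.values()]
    pi.update(zip(rest_src, rest_dst))
    return pi

assert build_permutation("aba", "bcb", "abc") == {"a": "b", "b": "c", "c": "a"}
assert build_permutation("aba", "abb", "abc") is None  # patterns differ
```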
Note that since we have enlarged our alphabet with the ⊔ symbol, and since both our original encoding and the one permuted by π send this symbol to 0, if $(y_1, y_2)$ is decoded as $(c_1, \ldots, c_l)$ and $(d_1, \ldots, d_r)$, the possible 0s appearing at the end of each word (indicating that the string is shorter than l and/or r) will remain 0s, and therefore the point will not be mapped to a point encoding longer strings.
After applying $\rho_\pi$, we obtain $\rho_\pi((y_1, y_2)) \in E_{(i',j')}$ for some i', j'. However, since $\rho_\pi$ is defined through $g_{\pi_1}$ and $g_{\pi_2}$, both $E_{(i,j)}$ and $E_{(i',j')}$ have the same pattern of equality $P_{ij}$. Hence, by the definition of f we obtain that $f(x) = f(\rho_\pi(x))$, and we are done.

Figure 1. The vocabulary A* as a rooted tree.

Figure 3. Cylinder set corresponding to w, seen on the tree.

Figure 4. Invariant partition of the cylinder sets according to their patterns of equality.

Figure 7. The macroscopic observable f, given by the step function (23), is invariant under Gödel recoding. The figure shows the result of 'measuring' f on a neural automaton encoded by γ (top) and on the same machine encoded by δ (bottom).