Mathematical Derivation of Wave Propagation Properties in Hierarchical Neural Networks with Predictive Coding Feedback Dynamics

Sensory perception (e.g., vision) relies on a hierarchy of cortical areas, in which neural activity propagates in both directions, to convey information not only about sensory inputs but also about cognitive states, expectations and predictions. At the macroscopic scale, neurophysiological experiments have described the corresponding neural signals as both forward and backward-travelling waves, sometimes with characteristic oscillatory signatures. It remains unclear, however, how such activity patterns relate to specific functional properties of the perceptual apparatus. Here, we present a mathematical framework, inspired by neural network models of predictive coding, to systematically investigate neural dynamics in a hierarchical perceptual system. We show that stability of the system can be systematically derived from the values of hyper-parameters controlling the different signals (related to bottom-up inputs, top-down prediction and error correction). Similarly, it is possible to determine in which direction, and at what speed neural activity propagates in the system. Different neural assemblies (reflecting distinct eigenvectors of the connectivity matrices) can simultaneously and independently display different properties in terms of stability, propagation speed or direction. We also derive continuous-limit versions of the system, both in time and in neural space. Finally, we analyze the possible influence of transmission delays between layers, and reveal the emergence of oscillations.


Introduction
The brain's anatomy is characterized by a strongly hierarchical architecture, with a succession of brain regions that process increasingly complex information. This functional strategy is mirrored by the succession of processing layers found in modern deep neural networks (and for this reason, we use the term "layer" in this work to denote one particular brain region in this hierarchy, rather than the laminar organization of cortex that is well-known to neuroscientists). The hierarchical structure is especially obvious in the organization of the visual system (Felleman and Van Essen 1991), starting from the retina through primary visual cortex (V1) and various extra-striate regions, and culminating in temporal lobe regions for object recognition and in parietal regions for motion and location processing.
In this hierarchy of brain regions, the flow of information is clearly bidirectional: there are comparable numbers of fibers sending neural signals down (from higher to lower levels of the hierarchy) as there are going up (Bullier 2001). While the bottom-up or "feed-forward" propagation of information is easily understood as integration of sensory input (and matches the functional structure found in artificial deep learning networks), the opposite feedback direction of propagation is more mysterious, and its functional role remains unknown.
Predictive coding is one dominant theory to explain the function of cortical feedback (Rao and Ballard 1999). Briefly, the theory states that each layer in the cortical hierarchy generates predictions about what caused its own activity; these predictions are sent to the immediately preceding layer, where a prediction error can be computed, and carried forward to the original layer, which can then iteratively update its prediction. Over time (and as long as the sensory input does not change), the system settles into a state where top-down predictions agree with bottom-up inputs, and no prediction error is transmitted. Like any large-scale theory of brain function, the predictive coding theory is heavily debated (Millidge et al. 2021). But macroscopic (EEG) experiments have revealed characteristic propagation signatures that could be hallmarks of predictive coding. For instance, Alamia and VanRullen (2019) showed evidence for alpha-band (7-15 Hz) oscillatory travelling waves propagating in both directions (feed-forward and feedback); the oscillation frequency and dynamics were compatible with a simplistic hierarchical model that included a biologically plausible time delay for transmitting signals between layers, and were also confirmed by a rudimentary mathematical model. In another study, Bastos et al. (2012, 2015) found that beta (15-30 Hz) and gamma-frequency (30-100 Hz) oscillations could reflect, respectively, the prediction and prediction-error signals carried by backward and forward connections.
More recently, predictive coding has been explored in the context of deep neural networks (Wen et al. 2018; Choksi et al. 2021; Pang et al. 2021). For instance, Choksi et al. (2021) augmented existing deep convolutional networks with feedback connections and a mechanism for computing and minimizing prediction errors, and found that the augmented system displayed more robust perception, better aligned with human abilities. In another study, Pang et al. (2021) used a similar system and reported the emergence of illusory contour perception comparable to what humans (but not standard deep neural networks) would typically perceive.
While the concept of predictive coding is potentially fundamental for understanding brain function, and its large-scale implementation in deep artificial neural networks provides empirical support for its potential functional relevance, there is a gap of theoretical knowledge about the type of brain activity that predictive coding could engender, and the potential conditions for its stability. Here, we propose a mathematical framework where a potentially infinite number of neuronal layers exchange signals in both directions according to predictive coding principles. The stable propagation of information in such a system can be explored analytically as a function of its initial state, its internal parameters (controlling the strength of inputs, predictions, and error signals) and its connectivity (e.g., convolution kernels). Our approach considers both a discrete approximation of the system, as well as continuous abstractions. We demonstrate the practical relevance of our findings by applying them to a ring model of orientation processing. Finally, we extend our analytical framework to the case where communication delays between successive layers are included. This gives rise to oscillatory signals at frequencies consistent with those observed in the brain.

Model Description
Our initial model is inspired by the generic formulation of predictive coding proposed in the context of deep learning models by Choksi et al. (2021). This formulation considers different update terms at each time step: feed-forward inputs, a memory term, and feedback and feed-forward prediction-error corrections. By modulating the hyper-parameters controlling each of these terms, the model can be reconciled with different formulations of predictive coding (for instance, the Rao and Ballard (1999) model by setting the feed-forward input term to zero) or other models of hierarchical brain function (e.g., similar to Heeger's model (Heeger 2017) by setting the feed-forward error correction to zero). Indeed, our objective is precisely to characterize the propagation dynamics inside the network as a function of the relative value of these hyper-parameters, which in turn alters the model's functionality.
We consider the following recurrence equation for the representations $E^n_j \in \mathbb{R}^d$ at time step $n$ and layer $j$. Here, $W_f \in M_d(\mathbb{R})$ is a $d \times d$ square matrix representing the weights of feedforward connections, which we assume to be the same for each layer, such that $\beta W_f E^{n+1}_{j-1}$ models an instantaneous feedforward drive from layer $j-1$ to layer $j$, controlled by hyper-parameter $\beta$. The term $F^n_{j-1}$ encodes a feedforward error correction process, controlled by hyper-parameter $\alpha$, whereby the reconstruction error $R^n_{j-1}$ at layer $j-1$, defined as the square error between the representation $E^n_{j-1}$ and the predicted reconstruction $W_b E^n_j$, that is,

$$R^n_{j-1} := \frac{1}{2}\left\| E^n_{j-1} - W_b E^n_j \right\|^2,$$

propagates to layer $j$ to update its representation. Here, $W_b \in M_d(\mathbb{R})$ is a $d \times d$ square matrix representing the weights of feedback connections, which we assume to be the same for each layer. Following (Rao and Ballard 1999; Choksi et al. 2021; Wen et al. 2018; Alamia and VanRullen 2019), the contribution $F^n_{j-1}$ is then taken to be the (descent direction of the) gradient of $R^n_{j-1}$ with respect to $E^n_j$, that is,

$$F^n_{j-1} = -\nabla_{E^n_j} R^n_{j-1} = W_b^T \left( E^n_{j-1} - W_b E^n_j \right).$$

On the other hand, $B^n_j$ incorporates a top-down prediction to update the representation at layer $j$. This term thus reflects a feedback error correction process, controlled by hyper-parameter $\lambda$. Similar to the feedforward process, $B^n_j$ is defined from the gradient of $R^n_j$ with respect to $E^n_j$, that is,

$$B^n_j = -\nabla_{E^n_j} R^n_j = -\left( E^n_j - W_b E^n_{j+1} \right).$$

As a consequence, our model reads, for each $j = 1, \ldots, J-1$ and $n \ge 0$,

$$E^{n+1}_j = \beta W_f E^{n+1}_{j-1} + \alpha W_b^T E^n_{j-1} + \left( (1-\beta-\lambda) I_d - \alpha W_b^T W_b \right) E^n_j + \lambda W_b E^n_{j+1}, \qquad (1)$$

where we denote by $I_d$ the identity matrix of $M_d(\mathbb{R})$. We supplement the recurrence Eq. (1) with the following boundary conditions at layer $j = 0$ and layer $j = J$. First, at layer $j = 0$, we impose

$$E^n_0 = S^n_0, \quad n \ge 0,$$

where $S^n_0 \in \mathbb{R}^d$ is a given source term, which can be understood as the network's constant visual input. At the final layer $j = J$, there is no possibility of an incoming top-down signal, and thus the feedback term $\lambda B^n_J$ is dropped from the corresponding update. The system is initialized with $E^0_j = H_j$ for some given initial sequence $(H_j)_{j=0,\ldots,J}$. For instance, in Choksi et al. (2021), $H_j$ was initialized by a first feedforward pass through the system, i.e., $\beta > 0$ and $\alpha = \lambda = 0$.
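As an illustration, one update of this recurrence can be sketched numerically as follows. This is a minimal sketch under the assumptions stated above (shared weight matrices across layers, input layer clamped to the source, no top-down term at the top layer); the function name and array layout are illustrative choices, not from the original model code.

```python
import numpy as np

def predictive_coding_step(E, W_f, W_b, s0, alpha, beta, lam):
    """One update step of the hierarchical predictive-coding recurrence.

    E: array of shape (J+1, d) holding the layer representations E^n_j.
    Layer 0 is clamped to the source s0; the top layer J receives no
    top-down (feedback) correction.
    """
    J = E.shape[0] - 1
    E_new = np.empty_like(E)
    E_new[0] = s0  # boundary condition at the input layer
    for j in range(1, J + 1):
        memory = (1.0 - beta) * E[j]
        # feedforward error correction (strength alpha): pushes E_j so that
        # its prediction W_b E_j better reconstructs the layer below
        ff_err = alpha * W_b.T @ (E[j - 1] - W_b @ E[j])
        # feedback error correction (strength lam): pulls E_j toward the
        # prediction W_b E_{j+1} coming from the layer above
        if j < J:
            fb_err = -lam * (E[j] - W_b @ E[j + 1])
        else:
            fb_err = np.zeros_like(E[j])
        # instantaneous feedforward drive (strength beta) uses the already
        # updated state of the layer below (the implicit term)
        E_new[j] = beta * (W_f @ E_new[j - 1]) + memory + ff_err + fb_err
    return E_new
```

With $W_f = W_b = I_d$, this update decouples componentwise and reduces to the scalar recurrence studied in the identity case below.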
Throughout, we assume the following natural compatibility condition between the source term and the initial condition, namely $H_0 = S^0_0$. Regarding the hyper-parameters of the problem, we assume that

$$0 \le \beta < 1, \quad \text{with} \quad 0 \le \alpha + \lambda \le 1. \qquad (6)$$
Our key objective is to characterize the behavior of the solutions of the above recurrence Eq. (1) as a function of the hyper-parameters and the feedforward and feedback connections matrices W f and W b . We would like to stay as general as possible to encompass as many situations as possible, keeping in mind that we already made strong assumptions by imposing that the weight matrices of feedforward and feedback connections are identical from one layer to another and that we only consider a linear model although deep neural networks are intrinsically nonlinear. Motivated by concrete applications, we will mainly consider matrices W f and W b which act as convolutions on R d .

The Identity Case
It turns out that we will gain much information by first treating the simplified case where $W_f$ and $W_b$ are both the identity. That is, from now on and throughout this section, we assume that

$$W_f = W_b = I_d.$$

That is, each neuron in a layer is only connected to the corresponding neuron in the immediately preceding and following layers, with unit weight in each direction. Under such a setting, the recurrence Eq. (1) reduces to a scalar equation (Fig. 1), that is,

$$e^{n+1}_j = \beta e^{n+1}_{j-1} + \alpha e^n_{j-1} + (1-\beta-\lambda-\alpha) e^n_j + \lambda e^n_{j+1}, \quad j = 1, \ldots, J-1, \qquad (7)$$

with this time the unknown $e^n_j \in \mathbb{R}$, together with $e^n_0 = s^n_0$ for $n \ge 0$, the corresponding top-layer condition at $j = J$, and the initial condition $e^0_j = h_j$.

Fig. 1 Schematic illustration of the network structure of model (7), where each point represents a given neuronal layer index j (x-axis) at a particular time step n (y-axis), and the red arrows indicate the contributions leading to the update of e^{n+1}_j (Color figure online)

Wave Propagation on an Infinite Depth Network
It will be first useful to consider the above problem set on an infinite domain, that is, the recurrence

$$e^{n+1}_j = \beta e^{n+1}_{j-1} + \alpha e^n_{j-1} + (1-\beta-\lambda-\alpha) e^n_j + \lambda e^n_{j+1}, \quad j \in \mathbb{Z}, \; n \ge 0, \qquad (11)$$

given some initial sequence $(h_j)_{j\in\mathbb{Z}}$. This situation has no direct equivalent in practical deep neural networks (nor in the brain), where the number of hierarchically connected layers is necessarily finite; but it is a useful mathematical construct. Indeed, such recurrence equations set on the integers $\mathbb{Z}$ are relatively well understood by the mathematical numerical analysis community. The behavior of the solution sequence $(e^n_j)_{j\in\mathbb{Z}}$ can be read out from the so-called amplification factor function, defined as

$$\rho(\theta) := \frac{\alpha e^{-i\theta} + (1-\beta-\lambda-\alpha) + \lambda e^{i\theta}}{1 - \beta e^{-i\theta}}, \quad \theta \in [-\pi, \pi], \qquad (12)$$

and which relates spatial and temporal modes. Indeed, formally, the sequence $(\rho(\theta)^n e^{ij\theta})_{j\in\mathbb{Z}}$ is an explicit solution to (11) for each $\theta \in [-\pi, \pi]$. Actually, one can be much more precise and almost explicit, in the sense that one can relate the expression of the solutions to (11) starting from some initial sequence $(h_j)_{j\in\mathbb{Z}}$ to the properties of $\rho$ in a systematic way that we now briefly explain. Let us first denote by $G^n = (G^n_j)_{j\in\mathbb{Z}}$ the sequence which is the fundamental solution of (11) in the special case where the initial sequence is the Dirac delta sequence $\delta$, defined as $\delta_0 = 1$ and $\delta_j = 0$ for all $j \in \mathbb{Z}\setminus\{0\}$. As a consequence, we have $G^0 = \delta$, and for each $n \ge 0$, $G^{n+1}$ is obtained from $G^n$ by applying (11). The starting point of the analysis is the following representation formula, obtained via inverse Fourier transform, which reads

$$G^n_j = \frac{1}{2\pi} \int_{-\pi}^{\pi} \rho(\theta)^n e^{ij\theta}\, d\theta, \quad j \in \mathbb{Z}, \; n \ge 0.$$

Then, given any initial sequence $(h_j)_{j\in\mathbb{Z}}$, the solution $(e^n_j)_{j\in\mathbb{Z}}$ to (11) can be represented as the convolution product between the initial sequence and the fundamental solution, namely

$$e^n_j = \sum_{l\in\mathbb{Z}} G^n_{j-l}\, h_l, \quad j \in \mathbb{Z}, \; n \ge 0. \qquad (14)$$

That is, having characterized the fundamental solution for a simple input pattern ($\delta$), with a unitary impulse provided to a single layer, we can now easily generalize to any arbitrary input pattern by applying the (translated) fundamental solution to each layer.
Our aim is to understand under which conditions on the hyper-parameters we can ensure that the solutions of (11) given through (14) remain bounded for all $n \ge 1$, independently of the choice of the initial sequence $(h_j)_{j\in\mathbb{Z}}$. More precisely, we introduce the following terminology. We say that the recurrence equation is stable if for each bounded initial sequence $(h_j)_{j\in\mathbb{Z}} \in \ell^\infty(\mathbb{Z})$, the corresponding solution $(e^n_j)_{j\in\mathbb{Z}}$ given by (14) satisfies

$$\sup_{n\ge0}\, \sup_{j\in\mathbb{Z}} |e^n_j| < +\infty.$$

On the other hand, we say that the recurrence equation is unstable if one can find a bounded initial sequence $(h_j)_{j\in\mathbb{Z}} \in \ell^\infty(\mathbb{Z})$ such that the corresponding solution $(e^n_j)_{j\in\mathbb{Z}}$ given by (14) satisfies

$$\limsup_{n\to+\infty}\, \sup_{j\in\mathbb{Z}} |e^n_j| = +\infty.$$

Finally, we say that the recurrence equation is marginally stable if there exists a universal constant $C > 0$ such that for each bounded initial sequence $(h_j)_{j\in\mathbb{Z}} \in \ell^\infty(\mathbb{Z})$, the corresponding solution $(e^n_j)_{j\in\mathbb{Z}}$ given by (14) satisfies

$$\sup_{n\ge0}\, \sup_{j\in\mathbb{Z}} |e^n_j| \le C \sup_{j\in\mathbb{Z}} |h_j|.$$

It turns out that one can determine the stability properties of the recurrence equation by solely looking at the amplification factor function. Indeed, from Riesz and Nagy (1955), we know that

$$\left\| G^n \ast \cdot \right\|_{\ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})} = \rho_{\max}^n, \quad n \ge 0,$$

where we have set

$$\rho_{\max} := \max_{\theta \in [-\pi,\pi]} |\rho(\theta)|.$$

As a consequence, we directly deduce that the recurrence equation is stable when $\rho_{\max} < 1$ and unstable when $\rho_{\max} > 1$. The intermediate case is more subtle: suppose that there exist finitely many points $\theta_1, \ldots, \theta_K \in [-\pi, \pi]$ such that for all $\theta \in [-\pi, \pi]\setminus\{\theta_1, \ldots, \theta_K\}$ one has $|\rho(\theta)| < 1$ and $|\rho(\theta_k)| = 1$ for each $k = 1, \ldots, K$. Furthermore, assume that there exist $c_k \in \mathbb{R}$, $\sigma_k \in \mathbb{C}$ with $\mathrm{Re}(\sigma_k) > 0$ and an integer $\mu_k \ge 1$ such that

$$\rho(\theta_k + \theta) = \rho(\theta_k) \exp\left( -i c_k \theta - \sigma_k \theta^{2\mu_k} + O\left(|\theta|^{2\mu_k + 1}\right) \right), \quad \theta \to 0.$$

Then the recurrence equation is marginally stable.
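These stability criteria are easy to probe numerically. The sketch below evaluates the amplification factor that follows from the scalar recurrence on a fine grid of θ and classifies stability from its maximal modulus; note that since ρ(0) = 1 for this model, the strictly stable case never occurs, and the grid-based maximum is an assumption standing in for the exact supremum.

```python
import numpy as np

def rho(theta, alpha, beta, lam):
    # amplification factor of the infinite-depth scalar recurrence
    num = alpha * np.exp(-1j * theta) + (1 - beta - alpha - lam) + lam * np.exp(1j * theta)
    return num / (1 - beta * np.exp(-1j * theta))

def classify(alpha, beta, lam, n_grid=20001, tol=1e-10):
    # classify stability from the maximal modulus of rho over [-pi, pi]
    theta = np.linspace(-np.pi, np.pi, n_grid)
    rho_max = np.abs(rho(theta, alpha, beta, lam)).max()
    if rho_max < 1 - tol:
        return "stable"
    if rho_max > 1 + tol:
        return "unstable"
    return "marginally stable"
```

For instance, hyper-parameters satisfying α + λ ≤ 1 yield marginal stability, while α + λ > 1 yields instability, in line with the analysis of the next subsection.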
Based on the above notions of stability/instability, we see that the only interesting situation is when the recurrence equation is marginally stable, and thus when the amplification function is contained in the unit disk with finitely many tangent points to the unit circle with prescribed asymptotic expansions. This is also the only interesting situation from a biological standpoint, as it ensures that the network remains active, yet without runaway activations.

Study of the Amplification Factor Function
Since we assumed that $0 \le \beta < 1$, the denominator in (12) never vanishes and $\rho$ is well-defined. Next, we crucially remark that we always have

$$\rho(0) = \frac{\alpha + (1-\beta-\lambda-\alpha) + \lambda}{1-\beta} = 1.$$

We will now check under which conditions $|\rho(\theta)| \le 1$ for all $\theta \in [-\pi, \pi]$, to guarantee marginal stability of the recurrence equation.
To assess stability, we compute

$$|\rho(\theta)|^2 = \frac{\left[(1-\beta-\alpha-\lambda) + (\alpha+\lambda)\cos\theta\right]^2 + (\lambda-\alpha)^2 \sin^2\theta}{1 - 2\beta\cos\theta + \beta^2},$$

such that $|\rho(\theta)|^2 \le 1$ is equivalent to

$$4\alpha\lambda\cos^2\theta + 2\left[(1-\beta-\alpha-\lambda)(\alpha+\lambda) + \beta\right]\cos\theta + (1-\beta-\alpha-\lambda)^2 + (\lambda-\alpha)^2 - 1 - \beta^2 \le 0.$$

The left-hand side vanishes at $\cos\theta = 1$, so it contains a factor $1 - \cos\theta$, and since $1 - \cos\theta \ge 0$ we need to ensure

$$4\alpha\lambda\cos\theta + 4\alpha\lambda + 2\left[(1-\beta-\alpha-\lambda)(\alpha+\lambda) + \beta\right] \ge 0,$$

and evaluating at $\pm\pi$ the above inequality we get

$$(1-\beta-\alpha-\lambda)(\alpha+\lambda) + \beta \ge 0.$$

But we remark that the above expression can be factored as

$$(1-\beta-\alpha-\lambda)(\alpha+\lambda) + \beta = (\alpha+\lambda+\beta)(1-\alpha-\lambda).$$

As a consequence, $|\rho(\theta)|^2 \le 1$ if and only if $\lambda + \alpha \le 1$. This is precisely the condition that we made in (6). We can actually track the cases of equality, which are those values of $\theta \in [-\pi, \pi]$ for which we have

$$(1-\cos\theta)\left( 4\alpha\lambda(1+\cos\theta) + 2(\alpha+\lambda+\beta)(1-\alpha-\lambda) \right) = 0.$$

We readily recover that at $\theta = 0$ we have $|\rho(0)| = 1$. So, now assuming $\theta \neq 0$, we need to solve

$$4\alpha\lambda(1+\cos\theta) + 2(\alpha+\lambda+\beta)(1-\alpha-\lambda) = 0,$$

and since both terms are non-negative, we necessarily get that both $1 + \cos\theta = 0$ and $1 - \lambda - \alpha = 0$ must be satisfied. As a consequence, $|\rho(\pm\pi)| = 1$ if and only if $1 = \lambda + \alpha$.

Fig. 2 Illustration of the amplification factor function ρ(θ) for θ ∈ [−π, π] in the cases β = 0 and β ≠ 0. a and b Amplification factor function ρ(θ) (blue curve) with a unique tangency point on the unit circle at z = 1, corresponding to θ = 0. c and d When α + λ = 1, the function ρ(θ) (blue curve) has two tangency points on the unit circle: z = 1, corresponding to θ = 0, and z = −1, corresponding to θ = ±π (Color figure online)
As a summary, we have obtained that the recurrence equation is marginally stable if and only if $\alpha + \lambda \le 1$: when $\alpha + \lambda < 1$, the only tangency point with the unit circle is $\rho(0) = 1$, while when $\alpha + \lambda = 1$ there is a second tangency point $\rho(\pm\pi) = -1$. We present in Fig. 2 several representative illustrations of the spectral curves $\rho(\theta)$ for various values of the hyper-parameters, recovering the results explained above. Furthermore, near $\theta \sim 0$, we get that $\rho$ admits the following asymptotic expansion:

$$\rho(\theta) = \exp\left( -i c_0 \theta - \sigma_0 \theta^2 + O(|\theta|^3) \right), \quad c_0 = \frac{\alpha + \beta - \lambda}{1-\beta}, \quad \sigma_0 = \frac{1}{2}\left[ \frac{\alpha+\lambda+\beta}{1-\beta} + \frac{\beta^2 - (\lambda-\alpha)^2}{(1-\beta)^2} \right].$$

In fact, since $-(\lambda-\alpha)^2 \ge -(\lambda+\alpha)^2$, as both $\alpha$ and $\lambda$ are positive, we remark that

$$\sigma_0 \ge \frac{(\alpha+\lambda+\beta)(1-\alpha-\lambda)}{2(1-\beta)^2} \ge 0.$$

Finally, we remark that when $\alpha + \lambda = 1$, we have the analogous expansion near $\theta \sim \pm\pi$:

$$\rho(\pm\pi + \theta) = -\exp\left( -i c_\pi \theta - \sigma_\pi \theta^2 + O(|\theta|^3) \right), \quad c_\pi = \frac{\alpha - \beta - \lambda}{1+\beta}, \quad \sigma_\pi = \frac{1}{2}\left[ \frac{1-\beta}{1+\beta} + \frac{\beta^2 - (\lambda-\alpha)^2}{(1+\beta)^2} \right].$$

From now on, we use the notations $(c_0, \sigma_0)$ and $(c_\pi, \sigma_\pi)$, and we always assume that $\sigma_0 > 0$ and $\sigma_\pi > 0$, which is equivalent to assuming that $0 < \alpha < 1$ and $0 < \lambda < 1$.
Here, $(c_0, \sigma_0)$ and $(c_\pi, \sigma_\pi)$ are derived, respectively, from the asymptotic expansions of the amplification factor function $\rho(\theta)$ near $\theta = 0$ and $\theta = \pi$, as defined above. The quantity $c_0$ reflects the propagation speed of the solution associated with $\rho(0)$, while $\sigma_0$ can be understood as its spatio-temporal spread (and similarly for the solution potentially associated with $\rho(\pi)$). In the following, we explore the fundamental solutions of this system for various values of its hyper-parameters.

Turning Off the Instantaneous Feedforward Connections: Case β = 0
We first investigate the case where there are no instantaneous feedforward connections in the network, that is, we set β = 0. This case, although less generic, is compatible with the prominent Rao-Ballard formulation of predictive coding (Rao and Ballard 1999), in which feedforward connections (after contributing to setting the initial network activity) only convey prediction errors, as captured by the hyper-parameter α. In that case, the model is fully explicit: the update at time step n + 1 only depends on the internal states at the previous step n, since we simply have

$$e^{n+1}_j = \alpha e^n_{j-1} + (1-\lambda-\alpha) e^n_j + \lambda e^n_{j+1}, \quad j \in \mathbb{Z}, \; n \ge 0.$$

As we assumed that $\alpha + \lambda \le 1$, the right-hand side of the recurrence equation is a positive linear combination of elements of the sequence $(e^n_j)$, such that we have a positivity principle for the solution, namely: if $h_j \ge 0$ for all $j \in \mathbb{Z}$, then $e^n_j \ge 0$ for all $j \in \mathbb{Z}$ and $n \ge 0$. Furthermore, since the recurrence equation is explicit, we have finite speed of propagation, in the following sense. Recall that when $\beta = 0$, the fundamental solution $G^n$ is the solution to

$$G^{n+1}_j = \alpha G^n_{j-1} + (1-\lambda-\alpha) G^n_j + \lambda G^n_{j+1}, \quad j \in \mathbb{Z}, \; n \ge 0,$$

starting from $G^0 = \delta$. Finite speed of propagation then refers to the property that $G^n_j = 0$ for $|j| > n$. When $\lambda < \alpha$ there is a rightward propagation along a Gaussian profile whose leading order is given by

$$\frac{1}{\sqrt{4\pi\sigma_0 n}} \exp\left( -\frac{(j - c_0 n)^2}{4\sigma_0 n} \right),$$

while when $\alpha < \lambda$ there is a leftward propagation along a Gaussian profile with the same leading order. Finite speed of propagation in turn implies that necessarily $c_0 \in (-1, 1)$, which is readily seen from the explicit formula $c_0 = \alpha - \lambda$ in that case. Actually, it is possible to be more precise and to give a general expression for the fundamental solution. Roughly speaking, each $G^n_j$ resembles a discrete Gaussian distribution centered at $j = c_0 n$, and we refer to the recent theoretical results of Diaconis and Saloff-Coste (2014), Randles and Saloff-Coste (2015), Coeuret (2022) and Coulombel and Faye (2022) for a rigorous justification.
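Both structural properties of the explicit case (positivity and finite speed of propagation) are easy to check numerically; in the sketch below, the window size and hyper-parameter values are arbitrary choices for illustration.

```python
import numpy as np

def fundamental_beta0(n, alpha, lam, half_width):
    # iterate the explicit (beta = 0) recurrence starting from the Dirac
    # delta at j = 0; np.roll wraps around, which is harmless while n stays
    # below half_width (the support never reaches the window edges)
    G = np.zeros(2 * half_width + 1)
    G[half_width] = 1.0
    for _ in range(n):
        G = alpha * np.roll(G, 1) + (1 - alpha - lam) * G + lam * np.roll(G, -1)
    return G

n, alpha, lam = 20, 0.3, 0.2
G = fundamental_beta0(n, alpha, lam, half_width=64)
j = np.arange(-64, 65)  # layer indices corresponding to the window
```

Since ρ(0) = 1, the total mass of the fundamental solution is conserved, and its center of mass drifts at exactly c0 = α − λ per step.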
Essentially, the results can be divided into two cases depending on whether or not α + λ = 1. As can be seen above, the special case α + λ = 1 results in a cancellation of the "memory" term, such that a neuronal layer j's activity does not depend on its own activity at the previous time step, but only on the activity of its immediate neighbors j − 1 and j + 1. More precisely, we have the following:

• Case: 0 ≤ λ + α < 1. The fundamental solution can be decomposed as

$$G^n_j = \frac{1}{\sqrt{4\pi\sigma_0 n}} \exp\left( -\frac{(j - c_0 n)^2}{4\sigma_0 n} \right) + \mathcal{R}^n_j,$$

where the remainder term satisfies an estimate of the form

$$|\mathcal{R}^n_j| \le \frac{C}{n} \exp\left( -\kappa \frac{|j - c_0 n|^2}{n} \right),$$

for some universal constants C, κ > 0 which only depend on the hyper-parameters and not on n and j. In Fig. 3a, we represent the fundamental solution $G^n_j$ at different time iterations (circles) in the case λ < α, where there is rightward propagation with $c_0 > 0$, and compare it with the leading-order fixed Gaussian profile centered at $j = c_0 n$ (plain line). On the other hand, in Fig. 4, panels (a-c), we illustrate the above results by presenting a space-time color plot of the fundamental solution rescaled by a factor √n. We observe rightward (respectively leftward) propagation with $c_0 > 0$ (respectively $c_0 < 0$) when λ < α (respectively α < λ), while when α = λ we have $c_0 = 0$ and no propagation occurs.
• Case: λ + α = 1. In this case, we first note that we have $c_0 = c_\pi$ together with $\sigma_0 = \sigma_\pi$, and

$$G^n_j = \left( 1 + (-1)^{n+j} \right) \frac{1}{\sqrt{4\pi\sigma_0 n}} \exp\left( -\frac{(j - c_0 n)^2}{4\sigma_0 n} \right) + \mathcal{R}^n_j,$$

where the remainder term satisfies an estimate of the same Gaussian type, for some universal constants C, κ > 0. In Fig. 3b, we represent the fundamental solution $G^n_j$ at different time iterations (circles) in the case α < λ, where there is leftward propagation with $c_0 < 0$, and compare it with the leading-order fixed Gaussian profile centered at $j = c_0 n$ (plain line). Similarly to the previous case, in Fig. 4, panels (d-f), we illustrate the above results by presenting a space-time color plot of the fundamental solution rescaled by a factor √n. The direction of propagation still depends on the sign of $c_0$, and on whether λ ≶ α. Unlike the case α + λ < 1, we observe a tiled pattern where $G^n_j = 0$ alternately for even or odd integers j at each time step.
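The Gaussian leading-order description of the first case can be checked directly. In the sketch below, σ0 is taken as ((α + λ) − (α − λ)²)/2, the β = 0 value consistent with the expansion of ρ near θ = 0; the parameter values and tolerance are arbitrary illustration choices.

```python
import numpy as np

# fundamental solution of the explicit (beta = 0) recurrence after n steps
alpha, lam, n, W = 0.4, 0.2, 200, 300
G = np.zeros(2 * W + 1)
G[W] = 1.0
for _ in range(n):
    G = alpha * np.roll(G, 1) + (1 - alpha - lam) * G + lam * np.roll(G, -1)

# leading-order Gaussian profile centered at j = c0 * n
j = np.arange(-W, W + 1)
c0 = alpha - lam
sigma0 = ((alpha + lam) - (alpha - lam) ** 2) / 2
gauss = np.exp(-(j - c0 * n) ** 2 / (4 * sigma0 * n)) / np.sqrt(4 * np.pi * sigma0 * n)
max_err = np.abs(G - gauss).max()  # pointwise gap to the leading profile
```

The pointwise gap decays with n, as predicted by the remainder estimate, and the peak of G tracks j = c0 n.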
As a partial intermediate summary, we note that the sign of c 0 (directly related to the sign of α − λ) always indicates in which direction the associated Gaussian profile propagates. Namely if α > λ and c 0 > 0 (resp. α < λ and c 0 < 0) there is rightward (resp. leftward) propagation. Intuitively, this behavior reflects the functional role of each hyper-parameter, with α and λ controlling feed-forward and feed-back prediction error correction, respectively. When α = λ, the two terms are equally strong, and there is no dominant direction of propagation. In addition, when λ + α = 1, the Gaussian profile is oscillating because of the presence of (−1) n+ j . As will be seen later when considering continuous versions of our model, this oscillatory pattern arises here as a consequence of discrete updating.
Finally, we note that the fundamental solution sequence $(G^n_j)_{j\in\mathbb{Z}}$ is uniformly summable for all values of the parameters, that is, there exists some universal constant $C > 0$, depending only on the hyper-parameters, such that

$$\sup_{n\ge0} \sum_{j\in\mathbb{Z}} |G^n_j| \le C.$$

As a consequence, since given any bounded initial sequence $(h_j)_{j\in\mathbb{Z}} \in \ell^\infty(\mathbb{Z})$ the solution $(e^n_j)_{j\in\mathbb{Z}}$ to (11) can be represented as the convolution product between the initial sequence and the fundamental solution, we readily deduce that the solution $(e^n_j)_{j\in\mathbb{Z}}$ is uniformly bounded with respect to n, that is,

$$\sup_{n\ge0}\, \sup_{j\in\mathbb{Z}} |e^n_j| \le C \sup_{j\in\mathbb{Z}} |h_j|.$$

This is exactly our definition of marginal stability.

Fig. 4 Illustration of the evolution of the rescaled solution sequence (√n G^n_j)_{j∈Z} starting from the Dirac delta sequence at j = 0 in the case β = 0. First row: α + λ < 1; second row: α + λ = 1. When λ ≶ α, we observe a rightward/leftward propagation, while when α = λ no propagation occurs. In all panels, the pink curve is given by j = nc_0, clearly illustrating the fact that c_0 measures the propagation speed of the solution. Note that in the case β = 0 and α + λ = 1, we have c_0 = c_π, which results in the tiled patterns observed in panels (d-f) (Color figure online)

Turning On the Instantaneous Feedforward Connections: Case β > 0
We now turn to the general case where β > 0. That is, the feed-forward connections continue to convey sensory inputs at each time step following the network initialization, and β controls the strength of these signals. In that case, the recurrence equation is no longer explicit but implicit, and the positivity property together with the finite speed of propagation no longer hold true in general. Indeed, upon introducing the shift operators

$$(S^{-1} e)_j := e_{j-1}, \quad (S e)_j := e_{j+1}, \quad j \in \mathbb{Z},$$

we remark that Eq. (11) can be written as

$$\left( \mathrm{Id} - \beta S^{-1} \right) e^{n+1} = \left[ \alpha S^{-1} + (1-\beta-\lambda-\alpha)\,\mathrm{Id} + \lambda S \right] e^n,$$

with $e^n = (e^n_j)_{j\in\mathbb{Z}}$. Since $0 < \beta < 1$ and $\| S^{-1} \|_{\ell^q(\mathbb{Z}) \to \ell^q(\mathbb{Z})} = 1$ for any $q \in [1, +\infty]$, the operator $\mathrm{Id} - \beta S^{-1}$ is invertible on $\ell^q(\mathbb{Z})$ for any $q \in [1, +\infty]$, with inverse

$$\left( \mathrm{Id} - \beta S^{-1} \right)^{-1} = \sum_{k \ge 0} \beta^k S^{-k}.$$

As a consequence, the recurrence equation can be recast as a convolution operator across the network layers with infinite support, namely

$$e^{n+1} = \sum_{k\ge0} \beta^k S^{-k} \left[ \alpha S^{-1} + (1-\beta-\lambda-\alpha)\,\mathrm{Id} + \lambda S \right] e^n.$$

From the above expression, we readily deduce that the positivity of the solution is preserved whenever $0 < \beta < 1 - \lambda - \alpha$. Furthermore, for the fundamental solution starting from the Dirac delta sequence, which solves

$$\left( \mathrm{Id} - \beta S^{-1} \right) G^{n+1} = \left[ \alpha S^{-1} + (1-\beta-\lambda-\alpha)\,\mathrm{Id} + \lambda S \right] G^n, \quad G^0 = \delta,$$

we only have that $G^n_j = 0$ for $j < -n$, which implies that $-1 < c_0, c_\pi < +\infty$. Indeed, from the formula of $c_0$ we get that

$$1 + c_0 = \frac{1 + \alpha - \lambda}{1 - \beta} > 0.$$

Once again, as in the case β = 0, we can characterise the behavior of the fundamental solution by using the combined results of Coeuret (2022) and Coulombel and Faye (2022).
• Case: 0 ≤ λ + α < 1. There exist some universal constants C, κ > 0 and L > 0 such that, for j ≤ nL,

$$G^n_j = \frac{1}{\sqrt{4\pi\sigma_0 n}} \exp\left( -\frac{(j - c_0 n)^2}{4\sigma_0 n} \right) + \mathcal{R}^n_j,$$

where the remainder term satisfies a Gaussian estimate

$$|\mathcal{R}^n_j| \le \frac{C}{n} \exp\left( -\kappa \frac{|j - c_0 n|^2}{n} \right),$$

while for j > nL we simply get a pure exponential bound

$$|G^n_j| \le C e^{-\kappa(n + j)}.$$

Inspecting the formula for $c_0$, we notice that when α + β ≶ λ we have $c_0$ ≶ 0, and the wave speed vanishes precisely when α + β = λ. This is illustrated in Fig. 5, where we see that α and β, both propagating signals in the forward (rightward) direction, compete with λ, which carries the feedback (leftward) prediction signals; this competition determines the main direction of propagation of neural activity in the system.

• Case: λ + α = 1. What changes in that case is the existence of a secondary wave with associated wave speed $c_\pi$, whose sign depends on the competition between α and β + λ. When α < β + λ, then we have $c_\pi < 0$, and the competition between λ and β + α will determine the sign of $c_0$, as illustrated in panels (a-c) of Fig. 6. On the other hand, when β + λ < α, implying that $c_\pi > 0$, we note that α + β > λ and thus $c_0 > 0$. In that case, the explicit formulas for $c_\pi$ and $c_0$ show that $0 < c_\pi < c_0$, and the secondary wave associated to $c_\pi$ is slower to propagate into the network; see Fig. 6d. Finally, when β + λ = α, we have $0 = c_\pi < c_0$ and the secondary wave is blocked; see Fig. 6e.
We have summarized in the diagram of Fig. 6f all possible configurations for the sign of the wave speeds $c_0$ and $c_\pi$ when β ∈ (0, 1), as a function of α and λ when α + λ ≤ 1.
As explained previously, $c_0$ changes sign precisely when λ = α + β (blue line), while the secondary wave speed $c_\pi$ only exists when α + λ = 1 and changes sign precisely when β + λ = α. We notably observe that when β is increased, the region of parameter space where $c_0 < 0$ diminishes while the region of parameter space where $c_\pi < 0$ increases, indicating that for high values of β the primary wave is most likely to be forward while the secondary wave is most likely to be backward.

Fig. 6 Effects of turning on β > 0 when α + λ = 1. We now observe a secondary wave with associated wave speed c_π whose sign depends on the competition between α and β + λ. a-c When α < β + λ, the wave speed of the secondary wave always verifies c_π < 0, and the competition between λ and β + α gives the direction of the primary wave, as previously reported in Fig. 5. d When β + λ < α, which always implies that α + β > λ, we have 0 < c_π < c_0, reflecting forward propagation for both waves. We remark that the secondary wave is slower. e When β + λ = α, which also implies that α + β > λ, we get 0 = c_π < c_0 such that the secondary wave is blocked. f Summary of the sign of the wave speeds c_0 and c_π for fixed β > 0 as a function of α and λ with α + λ ≤ 1 (Color figure online)
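For β > 0 the time step is implicit, but since the implicit coupling only reaches back to layer j − 1, one step can be resolved by a single left-to-right sweep over a truncated window (edge activity assumed negligible). The sketch below cross-checks this direct iteration against the inverse-Fourier representation of the fundamental solution; window and grid sizes are arbitrary choices.

```python
import numpy as np

def rho(theta, alpha, beta, lam):
    num = alpha * np.exp(-1j * theta) + (1 - beta - alpha - lam) + lam * np.exp(1j * theta)
    return num / (1 - beta * np.exp(-1j * theta))

def step_implicit(e, alpha, beta, lam):
    # resolve the implicit term beta * e^{n+1}_{j-1} by sweeping j upward;
    # window edges are treated as zero (valid while activity stays inside)
    new = np.zeros_like(e)
    for j in range(1, len(e) - 1):
        new[j] = (beta * new[j - 1] + alpha * e[j - 1]
                  + (1 - beta - alpha - lam) * e[j] + lam * e[j + 1])
    return new

alpha, beta, lam, n, W = 0.2, 0.3, 0.2, 30, 150
G = np.zeros(2 * W + 1)
G[W] = 1.0  # Dirac delta at j = 0
for _ in range(n):
    G = step_implicit(G, alpha, beta, lam)

# G^n_j = (1/2pi) * integral of rho(theta)^n e^{i j theta}, trapezoid rule
theta = np.linspace(-np.pi, np.pi, 8192, endpoint=False)
j = np.arange(-W, W + 1)
symbol = rho(theta, alpha, beta, lam) ** n
G_fourier = np.array([(symbol * np.exp(1j * k * theta)).mean().real for k in j])
```

Here β < 1 − α − λ, so the direct iteration should also remain non-negative, in line with the positivity condition stated above.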

Wave Propagation on a Semi-infinite Network with a Forcing Source Term
Now that we have understood the intrinsic underlying mechanisms of wave propagation for our model (7) set on an infinite domain, we turn to the case where the network is semi-infinite. That is, the network admits an input layer that is only connected to the layer above. The problem now reads

$$\begin{cases} e^{n+1}_j = \beta e^{n+1}_{j-1} + \alpha e^n_{j-1} + (1-\beta-\lambda-\alpha) e^n_j + \lambda e^n_{j+1}, & j \ge 1, \; n \ge 0,\\ e^n_0 = s^n_0, & n \ge 0,\\ e^0_j = h_j, & j \ge 1. \end{cases}$$

We see that the system depends on the source term $s^n_0$ applied to its input layer at each time step, also called a boundary value, and on the starting activation value $(h_j)$ applied to each layer at the initial time point, also called the initial value. In fact, the linearity principle tells us that the solutions of the above problem can be obtained as the linear superposition of the solutions to the following two problems: the boundary value problem, where all layers except the input layer are initialized at zero,

$$\begin{cases} g^{n+1}_j = \beta g^{n+1}_{j-1} + \alpha g^n_{j-1} + (1-\beta-\lambda-\alpha) g^n_j + \lambda g^n_{j+1}, & j \ge 1, \; n \ge 0,\\ g^n_0 = s^n_0, & n \ge 0,\\ g^0_j = 0, & j \ge 1, \end{cases} \qquad (16)$$

and the initial value problem, where the input layer source term is set to zero for all time steps,

$$\begin{cases} f^{n+1}_j = \beta f^{n+1}_{j-1} + \alpha f^n_{j-1} + (1-\beta-\lambda-\alpha) f^n_j + \lambda f^n_{j+1}, & j \ge 1, \; n \ge 0,\\ f^n_0 = 0, & n \ge 0,\\ f^0_j = h_j, & j \ge 1. \end{cases} \qquad (17)$$

Subsequently, the generic solution sequence $(e^n_j)_{j\ge1}$ can be obtained as $e^n_j = f^n_j + g^n_j$ for $j \ge 1$ and $n \ge 1$.
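The superposition principle can be verified numerically on a truncated network; in the sketch below, the finite depth stands in for the semi-infinite network, and the hyper-parameter values are hypothetical illustration choices.

```python
import numpy as np

def run(h, sources, alpha, beta, lam, n_steps):
    # iterate the semi-infinite recurrence on a finite truncation of depth J;
    # the top layer receives no top-down (feedback) term
    e = h.astype(float).copy()
    J = len(e) - 1
    for n in range(n_steps):
        new = np.empty_like(e)
        new[0] = sources[n]  # boundary value at the input layer
        for j in range(1, J):
            new[j] = (beta * new[j - 1] + alpha * e[j - 1]
                      + (1 - beta - alpha - lam) * e[j] + lam * e[j + 1])
        new[J] = beta * new[J - 1] + alpha * e[J - 1] + (1 - beta - alpha) * e[J]
        e = new
    return e

rng = np.random.default_rng(1)
J, n_steps = 20, 40
alpha, beta, lam = 0.2, 0.2, 0.3
h = rng.normal(size=J + 1)
s = 0.7 * np.ones(n_steps)
h[0] = s[0]  # compatibility between the source and the initial condition

full = run(h, s, alpha, beta, lam, n_steps)          # full problem
h_bvp = np.zeros(J + 1); h_bvp[0] = s[0]
g = run(h_bvp, s, alpha, beta, lam, n_steps)         # boundary value problem
h_ivp = h.copy(); h_ivp[0] = 0.0
f = run(h_ivp, np.zeros(n_steps), alpha, beta, lam, n_steps)  # initial value problem
```

By linearity of the update, the decomposition holds exactly (up to floating-point round-off).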

The Initial Value Problem (17)
It is first natural to investigate the initial value problem (17), since it is really close to the infinite network case of the previous section. Here, we consider the effect of the initial value assigned to each layer $j \ge 1$ at the first time step ($n = 0$), while the input layer ($j = 0$) is set to zero. The dynamics of (17) is still read out from the amplification factor function ρ defined in (12), and once again the solutions to (17) can be obtained as the convolution of the initial sequence with the fundamental solution associated to the problem. For $j_0 \ge 1$, we denote by $\delta^{j_0}$ the Dirac delta sequence defined as $\delta^{j_0}_{j_0} = 1$ and $\delta^{j_0}_j = 0$ for all $j \ge 1$ with $j \neq j_0$. Correspondingly, we denote by $G^n_{\mathrm{ivp}}(\cdot, j_0) = (G^n_{\mathrm{ivp}}(j, j_0))_{j\ge1}$ the solution to (17) starting from $\delta^{j_0}$, and let us remark that the solutions to (17) starting from any initial condition $(h_j)_{j\ge1}$ can be represented as

$$f^n_j = \sum_{j_0 \ge 1} G^n_{\mathrm{ivp}}(j, j_0)\, h_{j_0}.$$

Combining the results of Coulombel and Faye (2022) and Coeuret (2022) together with those of Coulombel and Faye (2021), Coeuret (2023) and Tadmor (1985, 1987), which precisely deal with recurrence equations with boundary conditions, one can obtain very similar results as in the previous case. The very first obvious remark that we can make is that for all $j, j_0 \ge 1$ and $1 \le n < j_0$ we have

$$G^n_{\mathrm{ivp}}(j, j_0) = G^n_{j - j_0},$$

Fig. 7 Space-time evolution of the rescaled solution sequence (√n G^n_ivp(j, j_0))_{j≥1} to (17), starting with a Dirac delta sequence at j_0 = 25, in different cases with leftward propagation
meaning that it takes $n = j_0$ iterations before the solution arrives at the boundary $j = 0$, and for $1 \le n < j_0$ the problem is similar to the one set on the infinite network. This behavior is illustrated in Fig. 7 for several values of the hyper-parameters, where we represent the spatio-temporal evolution of the rescaled solution sequence $(\sqrt{n}\, G^n_{\mathrm{ivp}}(j, j_0))_{j\ge1}$. We clearly observe a Gaussian behavior before the solution reaches the boundary. And for all $n \ge j_0$, we can write

$$G^n_{\mathrm{ivp}}(j, j_0) = G^n_{j - j_0} + G^n_{\mathrm{bl}}(j, j_0),$$

where $G^n_{\mathrm{bl}}(j, j_0)$ is a remainder term generated by the boundary condition at $j = 0$. It is actually possible to bound $G^n_{\mathrm{bl}}(j, j_0)$ in each of the cases treated above. When β = 0 and α + λ < 1 with α < λ, such that $c_0 < 0$, then $G^n_{\mathrm{bl}}(j, j_0)$ is well approximated by the image term

$$-\left( \frac{\lambda}{\alpha} \right)^{j_0} \frac{1}{\sqrt{4\pi\sigma_0 n}} \exp\left( -\frac{(j + j_0 - c_0 n)^2}{4\sigma_0 n} \right);$$

this is illustrated in Fig. 8 in the case $c_0 < 0$. On the other hand, for α + λ = 1 with α < λ, such that $c_0 < 0$, $G^n_{\mathrm{bl}}(j, j_0)$ is well approximated by the same expression with the Gaussian profile carrying the additional oscillatory factor $(1 + (-1)^{n+j})$.

Fig. 8 Boundary layer of the initial value problem (17) in the case where β = 0 and α + λ < 1 with α < λ. a Visualization of the solution G^n_ivp(j, j_0) (circles) at different time iterations. The plain lines correspond to the Gaussian approximation; remark the presence of a boundary layer (seen as a mismatch between the circles and the Gaussian approximation). b We represent the boundary layer (circles) and compare it to our boundary layer approximation (plain line)

When 0 < β < 1 and α + λ < 1, the approximations are similar to those for the case β = 0. We thus need to discuss three cases.
• Case −1 < c_π < c_0 < 0. In that case, both waves travel toward the boundary: for 1 ≤ j ≤ j_0, the boundary-layer correction is again of Gaussian type, with an exponential bound for j > j_0. This situation is presented in Fig. 7c.
• Case −1 < c_π < 0 < c_0. In this case, the secondary wave travels toward the boundary, where it is absorbed, while the primary wave propagates forward into the network.

The Boundary Value Problem (16)
We now turn our attention to the boundary value problem (16), where the network is initialized with zero activity for all layers except the input. Motivated by applications, we will only focus on the case where $s^n_0 = s_0 \in \mathbb{R}$ for all $n \ge 0$ (i.e., a constant sensory input), and thus study

$$\begin{cases} g^{n+1}_j = \beta g^{n+1}_{j-1} + \alpha g^n_{j-1} + (1-\beta-\lambda-\alpha) g^n_j + \lambda g^n_{j+1}, & j \ge 1, \; n \ge 0,\\ g^n_0 = s_0, & n \ge 0,\\ g^0_j = 0, & j \ge 1. \end{cases} \qquad (18)$$

Case β = 0. Here, the stimulus information $s_0$ does not directly propagate through the network via its feedforward connections (since β = 0), but may still propagate towards higher layers j > 0 via the feedforward prediction error correction mechanism, governed by parameter α. When α + λ ≤ 1, we distinguish between three cases.
Here and throughout, we denote by erf the error function, defined by erf(x) = (2/√π) ∫_0^x e^{−u²} du.
• Case α < λ. In this case, the sequence (s_0 (α/λ)^j)_{j≥1} is a stationary solution to (18), and we have uniform convergence at exponential rate toward this stationary solution. We illustrate this uniform convergence in Fig. 9a and d.
• Case α = λ. In this case, we observe a slow convergence to the steady state s_0: for each δ ∈ (0, 1/2), we have sup_{1≤j≤n^δ} |g^n_j − s_0| → 0 as n → +∞. The propagation is thus diffusive, along j ∼ √n. This can be seen in Fig. 9b and e.
• Case λ < α. In this case, we deduce local uniform convergence towards the steady state s_0; in fact, we have spreading at speed c_0. More precisely, for any c ∈ (0, c_0), the solution converges to s_0 uniformly on {1 ≤ j ≤ cn}, while for any c > c_0 we get lim_{n→+∞} sup_{j≥cn} |g^n_j| = 0.
We refer to Fig. 9c and f for an illustration. The figure clearly shows the competition between hyperparameters α and λ, with forward propagation of the sensory input only when α ≥ λ.
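This competition can be checked numerically. The sketch below iterates an identity-case update of the form g^{n+1}_j = βg^{n+1}_{j−1} + (1 − α − β − λ)g^n_j + αg^n_{j−1} + λg^n_{j+1}, with constant input s_0 at j = 0. This explicit form is our reconstruction (an assumption, chosen to be consistent with the steady state (s_0((α+β)/λ)^j)_{j≥1} and the transport speed ν(β+α−λ)/(1−β) given in the text), not a verbatim copy of Eq. (18).

```python
import numpy as np

def simulate_bvp(alpha, lam, beta, s0=1.0, J=200, N=400):
    """Iterate the assumed identity-case recurrence with constant input s0 at j = 0."""
    g = np.zeros(J + 1)
    g[0] = s0                       # boundary condition g_0^n = s0 for all n
    for _ in range(N):
        new = np.empty_like(g)
        new[0] = s0
        for j in range(1, J):
            # implicit feedforward term beta*new[j-1] (computed sequentially in j)
            new[j] = (beta * new[j - 1] + (1 - alpha - beta - lam) * g[j]
                      + alpha * g[j - 1] + lam * g[j + 1])
        new[J] = new[J - 1]         # crude absorbing condition at the far end
        g = new
    return g

# alpha < lam: the steady state decays like (alpha/lam)^j with depth
g_dec = simulate_bvp(alpha=0.1, lam=0.4, beta=0.0)
print(g_dec[3], (0.1 / 0.4) ** 3)   # both close to 0.0156

# alpha > lam: local uniform convergence to s0 behind a front of speed c0 = alpha - lam
g_prop = simulate_bvp(alpha=0.4, lam=0.1, beta=0.0)
print(g_prop[5])                    # close to 1.0
```

The two runs reproduce the dichotomy of the three cases above: exponential decay with depth when the feedback term λ dominates, forward invasion at speed c_0 when α dominates.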
Case 0 < β < 1. Here, the stimulus information s_0 propagates through the network not only via its feedforward connections (governed by β > 0) but also via the feedforward prediction error correction mechanism, governed by parameter α. In the case where α + λ ≤ 1, the results from the case β = 0 remain valid; the only differences come from the fact that the above approximations in the case λ ≤ α are only valid for 1 ≤ j ≤ Ln for some large constant L > 0 (with exponentially localized bounds for j ≥ Ln), and that the steady state is now (s_0 ((α+β)/λ)^j)_{j≥1} whenever α + β < λ. This confirms that the feedforward propagation of the input s_0 now depends on both terms α and β, jointly competing against the feedback term λ. Let us remark that when 0 < β < α − λ, and in the special case α + λ = 1 where a second stable point exists for the amplification factor function at ρ(π), we can get a slightly more accurate description of the solution, with a remainder term satisfying an exponential estimate. This is illustrated in Fig. 10. It should be noted here that, while the main wavefront travelling at speed c_0 is a generic property of our network in the entire range of validity of parameters 0 ≤ β < 1 and α + λ ≤ 1, the second oscillatory pattern reflecting c_π only appears in the special case of β = 0 and α + λ = 1. This oscillation is, in fact, an artifact of the discrete formulation of our problem, as will become evident in the next section, where we investigate continuous formulations of the problem.
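The special role of the point θ = π can be made concrete with a short numerical check. The formula below for the amplification factor is our reconstruction for the identity case (an assumption, consistent with the transport speed c_0 = (α+β−λ)/(1−β) obtained in the continuous limit below): ρ(θ) = ((1−α−β−λ) + αe^{−iθ} + λe^{iθ})/(1 − βe^{−iθ}).

```python
import numpy as np

def rho(theta, alpha, lam, beta):
    """Assumed amplification factor of the identity-case recurrence (a reconstruction)."""
    num = (1 - alpha - beta - lam) + alpha * np.exp(-1j * theta) + lam * np.exp(1j * theta)
    den = 1 - beta * np.exp(-1j * theta)
    return num / den

# Generic property: rho(0) = 1 for any admissible hyper-parameters.
print(rho(0.0, 0.3, 0.2, 0.1))            # ≈ 1

# Special case alpha + lam = 1, beta = 0: a second marginal point rho(pi) = -1,
# responsible for the oscillatory pattern associated with c_pi.
print(rho(np.pi, 0.6, 0.4, 0.0))          # ≈ -1

# Numerical wave speed from the phase of rho near theta = 0.
h = 1e-6
alpha, lam, beta = 0.3, 0.2, 0.1
c0_num = -np.angle(rho(h, alpha, lam, beta)) / h
print(c0_num, (alpha + beta - lam) / (1 - beta))   # the two values agree
```

Under this reconstruction, |ρ(π)| = 1 is only attained on the special parameter set, in line with the artifact discussed above.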

Towards Continuous Predictive Models
Starting from a discrete approximation of our system made sense, not only for mathematical convenience but also because artificial neural networks and deep learning systems implementing similar predictive coding principles are intrinsically discrete. Nonetheless, it can be useful to discard this discrete approximation and investigate our system in the continuous limit. Note that in the following, we will explore continuous extensions of our model in both time and space. Biological neural networks, like any physical system, operate in continuous time and thus it is more biologically accurate to relax the temporal discretization assumption. This is what we do in the first part of this section. In the spatial domain, however, the discretization of our system into successive processing layers was not just an approximation, but also a reflection of the hierarchical anatomy of the brain. Nonetheless, we can still represent neuronal network depth continuously, even if only as a mathematical abstraction. This is what we will do in the subsequent part of this section. Understanding such continuous limits can allow us to test the robustness of our framework, and to relate it to canonical models whose dynamics have been more exhaustively characterized.

Continuous in Time Interpretation
As a first step, we present a continuous-in-time interpretation of the model (15). We let Δt > 0 be a parameter representing a time step and reformulate the recurrence equation accordingly. We now interpret e^n_j as the approximation of some smooth function of time e_j(t) evaluated at t_n := nΔt, that is, e^n_j ∼ e_j(t_n). As a consequence, in the limit Δt → 0 we obtain the lattice ordinary differential equation (20). When defined on the infinite lattice Z, one can represent the solutions, starting from the initial sequence e_j(t = 0) = (h_j)_{j∈Z}, in terms of (G_j(t))_{j∈Z}, the fundamental solution to (20) starting from the Dirac delta sequence δ. Once again, each G_j(t) can be represented by an inverse Fourier transform involving a function ν(θ), which serves as an amplification factor function for the time-continuous Eq. (20). To ensure stability, one needs to impose that Re(ν(θ)) ≤ 0 for each θ ∈ [−π, π]. From its formula, we deduce that Re(ν(0)) = 0 and Re(ν(θ)) < 0 for all θ ∈ [−π, π]\{0}. In particular, it is now evident that, contrary to the discrete case, ν(π) cannot yield a stable oscillatory solution for the continuous system (except in the trivial case where all hyper-parameters α, β, λ are zero). This confirms that the previously observed oscillations associated with ρ(π) in specific cases were merely an artifact of the temporal discretization. We note that, near the tangency point at θ = 0, the function ν(θ) admits an asymptotic expansion from which a Gaussian approximation of the fundamental solution can be proved; following for example (Besse et al. 2022), the approximation holds up to an error controlled by some universal constants C > 0 and κ > 0.
Here, the continuous-limit speed c̃_0 and spread σ̃_0 are linked to c_0 and σ_0 (the propagation speed and spread of the solution in the case of the discrete model). We also note that the spatially homogeneous solutions of (20) are trivial, in the sense that if we assume that e_j(t) = e(t) for all j ∈ Z, then e(t) satisfies a simple scalar ordinary differential equation. Finally, we conclude by noticing that in this continuous-in-time regime, no oscillations are possible either in space or time: the fundamental solution always resembles a fixed Gaussian profile advected at wave speed c̃_0. The formula for c̃_0 highlights the intuitive functional relation between the propagation (or advection) direction and the "competition" between the feedforward influences α + β and the feedback influence λ.
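These stability and advection properties can be verified numerically. The symbol used below, ν(θ) = (α+β)e^{−iθ} + λe^{iθ} − (α+β+λ), is our reconstruction of the continuous-time limit of the identity-case model (an assumption, chosen so that the drift speed equals α + β − λ, i.e., (1−β)c_0).

```python
import numpy as np

alpha, lam, beta = 0.3, 0.2, 0.1

def nu(theta):
    """Assumed symbol of the lattice ODE obtained in the continuous-time limit."""
    return ((alpha + beta) * np.exp(-1j * theta) + lam * np.exp(1j * theta)
            - (alpha + beta + lam))

theta = np.linspace(-np.pi, np.pi, 2001)
re = nu(theta).real
# Re(nu) = (alpha+beta+lam)*(cos(theta) - 1): zero only at theta = 0, negative elsewhere,
# so no stable point can exist at theta = pi (no discretization-induced oscillation).
print(re.max())                 # ≈ 0, attained at theta = 0

# Drift speed from the imaginary part near 0: c0_tilde = alpha + beta - lam
h = 1e-6
c0_tilde = -nu(h).imag / h
print(c0_tilde, alpha + beta - lam)
```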

Fully Continuous Interpretation: Both in Time and Depth
In this section, we give a possible physical interpretation of the discrete model (15) via continuous transport equations, in which both time and space (i.e., neuronal network depth) are made continuous. Let us introduce Δt > 0, Δx > 0 and set ν := Δx/Δt. As before, we can view Δt as a time step for our system; additionally, Δx can be viewed as a spatial step in the (continuous) neuronal depth dimension, and thus ν becomes akin to a neural propagation speed or conduction velocity. We then rewrite the recurrence equation accordingly. The key idea is to assume that e^n_j represents an approximation of some smooth function e(t, x) evaluated at t_n := nΔt and x_j := jΔx, that is, e^n_j ∼ e(t_n, x_j). Then, passing to the limit Δt → 0, Δx → 0 with Δx/Δt = ν > 0 fixed and assuming that β + α ≠ λ, one gets the partial differential equation (21), with boundary condition e(t, 0) = s_0(t) and initial condition e(0, x) = h(x) satisfying the compatibility condition s_0(0) = h(0), where s_0(t) is a smooth function such that s_0(t_n) = s^n_0 and h(x_j) = h_j. The above partial differential equation is a transport equation with associated speed ν(β+α−λ)/(1−β) = νc_0. Depending on the sign of c_0, we have a different representation for the solutions of (21).
• Case c_0 < 0. The solution is given by e(t, x) = h(x − νc_0 t). Let us remark that when c_0 < 0, the trace of the solution at x = 0 is entirely determined by the initial data h(x), since e(t, 0) = h(−νc_0 t) with −νc_0 t > 0. Intuitively, this reflects the dominance of backward (leftward) propagation in this network, with solutions determined entirely by the initial value h(x), even at x = 0 (the source term s_0(t) having no influence in this case).
• Case c_0 > 0. The solution is given by e(t, x) = s_0(t − x/(νc_0)) for 0 ≤ x < νc_0 t, and e(t, x) = h(x − νc_0 t) for x > νc_0 t. Intuitively, this reflects the dominance of forward (rightward) propagation in this network, with both the source term s_0(t) and the initial values h(x) transported at constant velocity νc_0.
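The two representations follow from the standard method of characteristics; the sketch below evaluates them directly (the particular choices of s_0 and h are illustrative only, subject to the compatibility condition s_0(0) = h(0)).

```python
import numpy as np

def transport_solution(t, x, nu, c0, s0, h):
    """Solution of e_t + nu*c0*e_x = 0 on x >= 0 with e(0, x) = h(x), e(t, 0) = s0(t)."""
    speed = nu * c0
    if speed <= 0:
        # backward propagation: everything is determined by the initial data h
        return h(x - speed * t)
    # forward propagation: the boundary data invades the region x < speed*t
    return np.where(x < speed * t, s0(t - x / speed), h(x - speed * t))

s0 = lambda t: 1.0 + 0.0 * t          # constant source (illustrative)
h = lambda x: np.exp(-x)              # initial profile; h(0) = s0(0) = 1 (compatibility)

x = np.linspace(0.0, 10.0, 11)
sol = transport_solution(2.0, x, nu=1.0, c0=0.5, s0=s0, h=h)
print(sol)   # for x < 1 the transported source s0 = 1; beyond, the shifted profile h

solb = transport_solution(2.0, x, nu=1.0, c0=-0.5, s0=s0, h=h)
print(solb[0])   # trace at x = 0 equals h(nu*|c0|*t): the source has no influence
```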
Thanks to the explicit form of the solutions, we readily obtain many qualitative properties of the solution e(t, x). Boundedness and positivity of the solutions are inherited from the functions s_0(t) and h(x). In the case where β + α = λ (i.e., with balanced feedforward and feedback influences), the limiting equation is slightly different. Indeed, in this case, introducing δ := Δx²/Δt, which can be interpreted as an effective diffusivity, and letting Δt → 0, Δx → 0 with δ > 0 fixed, one obtains a heat equation with boundary condition e(t, 0) = s_0(t) and initial condition e(0, x) = h(x). Let us remark that when s_0(t) = s_0 ∈ R is constant for all t ≥ 0, the solution simplifies to an expression involving the error function erf. In conclusion, this section extended our discrete model towards a continuous limit in both space and time. In the temporal domain, it allowed us to understand our stable solution as an advection behavior, and alerted us that the other, apparently oscillatory solutions previously observed in specific cases were mainly due to our discretization approximation. In the spatio-temporal domain, the continuous limit (21) allowed us to realize that our main Eq. (7) was merely a discrete version of a transport equation.
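In the balanced case, the error-function profile of the half-line heat equation can be checked against the discrete scheme. The sketch below again uses our reconstructed identity-case recurrence (an assumption): with β = 0 and α = λ it reduces to the standard explicit heat scheme g^{n+1}_j = g^n_j + α(g^n_{j−1} − 2g^n_j + g^n_{j+1}), whose solution with constant input s_0 and zero initial data approaches the classical profile s_0·erfc(j/(2√(αn))).

```python
import numpy as np
from math import erfc, sqrt

alpha = 0.25          # balanced case: beta = 0, lam = alpha
s0, J, N = 1.0, 120, 400

g = np.zeros(J + 1)
g[0] = s0
for _ in range(N):
    new = g.copy()
    # explicit heat scheme = the assumed identity-case recurrence with beta=0, lam=alpha
    new[1:J] = g[1:J] + alpha * (g[0:J-1] - 2.0 * g[1:J] + g[2:J+1])
    g = new

# continuum prediction for the half-line heat equation with Dirichlet data s0
for j in (5, 10, 20):
    print(j, g[j], s0 * erfc(j / (2.0 * sqrt(alpha * N))))
```

The discrete and continuum profiles agree to within a few percent at this resolution, illustrating the diffusive (j ∼ √n) propagation of the balanced case.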
In the following sections, we will systematically return to discrete implementations (with gradually increasing functionality), before considering, again, their continuous formulations.

Beyond the Identity Case
In the previous section we have studied in depth the case where W f and W b are both the identity matrix: each neuron in any given layer directly conveys its activation value to a single corresponding neuron in the next layer, and to a single neuron in the previous layer. Motivated by concrete implementations of the model in deep neural networks (Wen et al. 2018;Choksi et al. 2021), we aim to investigate more realistic situations with more complex connectivity matrices. While the generic unconstrained case (i.e., two unrelated and dense connection matrices W f and W b ) does not easily lend itself to analytical study, we will consider here two situations of practical interest: in the first one, the forward and backward connection matrices are symmetric and identical; in the second case, each matrix is symmetric, but the two are not necessarily identical.

The Symmetric Rao and Ballard Case
Following the pioneering work of Rao and Ballard (1999), we will assume in this section that the forward and backward connectivity matrices are equal and symmetric, W^f = W^b ∈ S_d(R), where we denote by S_d(R) the set of symmetric matrices on R^d. The underlying interpretation is that, if a strong synaptic connection exists from neuron a to neuron b, then there is also a strong connection from b to a. This assumption, which can find a possible justification in Hebbian plasticity rules ("neurons that fire together wire together"), does not capture all of the diversity of possible connectivity patterns, but it can be considered a good first approximation and has already been used in many computational studies, notably in the context of predictive coding models (Rao and Ballard 1999). Even where this symmetry hypothesis is not biologically plausible, the setting remains very instructive from the mathematical point of view, since one can still carry out a complete and exhaustive study, as in the identity case of the previous section.

Neural Basis Change and Neural Assemblies
The spectral theorem ensures that W^f (and thus W^b) is diagonalizable in an orthonormal basis. Namely, there exist an orthogonal matrix P ∈ M_d(R), with PPᵗ = PᵗP = I_d, and a diagonal matrix D ∈ M_d(R) such that W^f = PDPᵗ. We denote by γ_p ∈ R, p = 1, ..., d, the diagonal elements of D and, without loss of generality, we may assume that they are ordered: γ_1 ≥ γ_2 ≥ ... ≥ γ_d. Thanks to this diagonalization, we can now perform a change of basis for our neuronal space. We set U^n_j := PᵗE^n_j, so that E^n_j = PU^n_j. Each component of U^n_j can now be understood as a neural assembly, reflecting one of the principal components of the weight matrix W^f = W^b. Importantly, although assemblies may overlap, activity updates induced by feedforward or feedback connections to one given assembly do not affect the other assemblies, since the matrix P is orthogonal. Therefore, our problem is much simplified when considering activity update equations at the level of these neural assemblies U^n_j rather than across individual neurons E^n_j. Note that, because all matrices in the transformed equation are diagonal, our model (1) totally decouples the d components of the vector U^n_j. More precisely, denoting by u^n_{j,p} the pth component of U^n_j, one needs to study the scalar recurrence (23), where γ ∈ R is a given parameter. Here, γ can be thought of as the connection strength across layers (both feedforward and feedback, since we assumed here symmetric connectivity) of the neural assembly under consideration. By construction, each assembly in a given layer is only connected to the corresponding assembly in the layer above, and similarly in the layer below. Note that when γ = 1, we encounter again the exact situation that we studied in the previous section (15), but now with neural assemblies in lieu of individual neurons.
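The change of basis is a standard spectral decomposition; a minimal numpy sketch (with an arbitrary symmetric W for illustration) verifies the decoupling property:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
M = rng.standard_normal((d, d))
W = (M + M.T) / 2.0                     # symmetric connectivity, W = Wf = Wb

gamma, P = np.linalg.eigh(W)            # W = P diag(gamma) P^T, with P orthogonal
print(np.allclose(P @ np.diag(gamma) @ P.T, W))   # True
print(np.allclose(P.T @ P, np.eye(d)))            # True

E = rng.standard_normal(d)              # activity of one layer
U = P.T @ E                             # neural assemblies: coordinates on eigenvectors
# the action of W decouples: each assembly is simply scaled by its own gamma_p
print(np.allclose(P.T @ (W @ E), gamma * U))      # True
```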

Wave Speed Characterization
In the previous section (The Identity Case), we proved that the direction of propagation was given by the signs of c_0 and c_π, whenever they exist, which can be read off from the behavior of ρ_γ(θ) near θ = 0 or θ = ±π. We have reported the values of c^γ_0 and c^γ_π for different values of γ in Table 1. For example, in Fig. 12 we illustrate the changes in propagation speed and direction of c^γ_0 for the case (λ, β) in Region (III) (as defined in Fig. 11a), but the calculations remain valid for the other regions.
It is worth emphasizing that, for fixed values of the hyper-parameters α, β and λ, varying γ can give rise to different propagation speeds, or even different directions. As each neural assembly u_{j,p} in a given layer j is associated with its own connection strength γ_p, it follows that different speeds, and even different directions of propagation, can be obtained concurrently in a single network, one for each assembly. For instance, in a given network with hyper-parameters α = 0.2, β = 0.2 and λ = 0.3 (region III), a neural assembly with a connection strength of γ = 1 would propagate forward at a relatively slow speed, while another with γ = 2.5 would propagate in the same direction at a much faster speed, and yet another assembly with γ = γ_0⁻ ≈ −2.09 would simultaneously propagate in the opposite, backward direction.

Fig. 13 Stability/instability regions and their boundaries as a function of (α, γ) for (24), for any (λ, β) fixed. The shaded orange region corresponds to an instability for (24), while the purple region corresponds to a stability for (24). The boundaries of the stability/instability regions are given by the intersections of the parametrized curves γ = ±1 (blue curves) and γ = ±(λ+β)/α (gray curves), where Eq. (24) is marginally stable (color figure online).
As a consequence, the stability analysis in this case is very simple and depends only on the relative position of γ with respect to ±1 and ±(λ+β)/α. It is summarized in Fig. 13. The simple behavior illustrated in Fig. 13 for our continuous system contrasts with the number and diversity of behaviors obtained for the discrete version of the same system (Fig. 11). A number of points are worth highlighting. For instance, although the values of β and λ were critical for the discrete system (to define the regions (I)-(V)), they do not affect the qualitative behavior of the continuous system. Furthermore, some observations in the continuous system appear to contradict the conclusions made previously in the discrete case. We see that stability can still be obtained with high values of the connection weight γ ≫ 1, but this time the stable regions coincide with high α values, whereas it was the opposite in Fig. 11 panels (b, f). This qualitative difference in behavior can be taken as a point of caution, to remind us that a discrete approximation of the system can be associated with important errors of interpretation. Finally, we note that, while stability regions are qualitatively different in the continuous case compared to the discrete approximation, the speed and direction of propagation of neural signals (reflected in the variables c_0 and c_π when they exist) remain comparable.

A Class of Examples
In this section, we provide a class of examples of W^f amenable to a complete analysis. Namely, we consider W^f given by the linear combination W^f = ζI_d + ξA for some ζ, ξ ∈ R, where A ∈ M_d(R) is nothing but the discrete Laplacian (the tridiagonal matrix with −2 on the diagonal and 1 on the two off-diagonals), so that W^f acts as a convolution operator on R^d. More precisely, W^f combines a convolution term with a residual connection term, as in the well-known ResNet architecture (He et al. 2016). Let us also note that the spectrum of A is well known and given by Spec(A) = {−4 sin²(pπ/(2(d+1))), p = 1, ..., d}.
Next, for any p = 1, ..., d, the eigenvector U_p corresponding to the eigenvalue −4 sin²(pπ/(2(d+1))) has components (U_p)_k = sin(kpπ/(d+1)) (up to normalization); U_p is the projection vector that corresponds to the pth neural assembly u_{j,p} as defined above. Along U_1, the recurrence equation reduces to (23) with γ = 1, while along U_d, it reduces to (23) with γ = −1, and we can apply the results of the previous section (the identity case). In between (for all 1 ≤ p ≤ d), we see that the eigenvalues of our connection matrix W^f span the entire range between −1 and 1, that they can be explicitly computed, and thus that the stability, propagation speed and direction of activity in the corresponding neural assembly can all be determined.
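These spectral formulas are easy to verify numerically. In the sketch below, the choice ζ = 1, ξ = 1/2 is illustrative only (an assumption); with it, the eigenvalues of W^f become cos(pπ/(d+1)) and indeed span the open interval (−1, 1).

```python
import numpy as np

d = 8
# discrete Laplacian with Dirichlet boundary conditions
A = -2.0 * np.eye(d) + np.eye(d, k=1) + np.eye(d, k=-1)

p = np.arange(1, d + 1)
theory = -4.0 * np.sin(p * np.pi / (2 * (d + 1))) ** 2     # Spec(A)
print(np.allclose(np.sort(np.linalg.eigvalsh(A)), np.sort(theory)))   # True

# eigenvalues of Wf = zeta*I + xi*A follow immediately from those of A
zeta, xi = 1.0, 0.5                                         # illustrative choice
gammas = zeta + xi * theory                                 # = cos(p*pi/(d+1))
print(gammas)                                               # values spanning (-1, 1)
```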

Fully Continuous Interpretation in Time, Depth and Width
For the same class of examples (connection matrix composed of a convolution and a residual term), we now wish to provide a fully continuous interpretation of model (1), in the special case ζ = 1 and with ξ adjusted accordingly. By fully continuous, we mean that we explore the limit of our model when not only time t, but also network depth x and neuronal layer width y are considered as continuous variables. Although we already presented a model that was continuous in both time and depth in Sect. 3.3.2, the layers in that model only comprised a single neuron, and had no intrinsic spatial dimension. We now introduce this third continuous dimension. The starting point is to see E^n_{j,k}, the kth element of E^n_j, as an approximation of some continuous function E(t, x, y) evaluated at t_n = nΔt, x_j = jΔx and y_k = kΔy, for some Δt > 0, Δx > 0 and Δy > 0. Let us first remark that the action of A on E^n_j can be seen as a discrete approximation of ∂²_y E(t_n, x_j, y_k), up to a scaling factor of order Δy². Once again, setting ν = Δx/Δt and introducing κ = Δy²/Δt, we may rewrite the recurrence accordingly. Now, letting Δt → 0, Δx → 0 and Δy → 0 with ν and κ fixed, we obtain a partial differential equation which is a diffusion equation along the y dimension and a transport equation along the x direction. As such, it is only well defined (or stable) when the diffusion coefficient in front of ∂²_y E(t, x, y) is positive. This depends on the signs of ξ and β + λ − α, which need to verify ξ(β + λ − α) > 0. In that case, the system diffuses neural activity along the dimension y, such that the entire neuronal layer converges to a single, uniform activation value when t → ∞.

The General Symmetric Case
Finally, we now wish to relax some of the assumptions made in the previous Rao-Ballard case. Thus, the last case that we present is one where we assume that (i) W^f and W^b are both symmetric, and (ii) they commute (W^fW^b = W^bW^f), but we do not necessarily impose that W^f = (W^b)ᵗ as in Rao and Ballard's previous case. Let us already note that examples of matrices verifying the above conditions are the residual convolution matrices introduced in (25), with possibly different coefficients for W^f and W^b. Under assumptions (i) and (ii), W^f and W^b can be diagonalized in the same orthonormal basis, meaning that there exist an orthogonal matrix P ∈ M_d(R), with PPᵗ = PᵗP = I_d, and two diagonal matrices D^f and D^b such that W^f = PD^fPᵗ and W^b = PD^bPᵗ. For future reference, we denote by γ^f_p and γ^b_p their respective diagonal elements. Once again, we can use the matrix P to apply an orthonormal basis change and create neural assemblies U^n_j := PᵗE^n_j. With E^n_j = PU^n_j, the recurrence equation becomes diagonal; because all matrices in the transformed equation are diagonal, we have again totally decoupled the d components of the vector U^n_j. More precisely, denoting by u^n_{j,p} the pth component of U^n_j, that is U^n_j = (u^n_{j,1}, ..., u^n_{j,d})ᵗ, one needs to study the scalar recurrence (29), where γ_1, γ_2 ∈ R are now two given parameters. As before, γ_1 and γ_2 can be thought of as the connection strengths across layers of the neural assembly under consideration. By construction, each assembly in a given layer is only connected to the corresponding assembly in the layer above, and similarly in the layer below, with γ_1 for the feedforward direction and γ_2 for the feedback direction. Note that γ_1 = γ_2 would then correspond to the Rao-Ballard situation studied previously.

Study of the Amplification Factor Function
Repeating the previous analysis, one needs to understand the amplification factor function ρ_{γ_1,γ_2}(θ) associated with (29). We already note a symmetry property of the amplification factor function: as a consequence of it, whenever ρ_{γ_1,γ_2}(0) = ±1, one has ρ_{−γ_1,−γ_2}(±π) = ±1 for the same values of the parameters. Then, we note that ρ_{γ_1,γ_2}(0) = 1 precisely when γ_1 = χ(γ_2), where the function χ(x), depending only on the hyper-parameters, is given explicitly. Thus, using the above symmetry, we readily deduce that ρ_{γ_1,γ_2}(±π) = 1 ⟺ γ_1 = −χ(−γ_2).
More interestingly, we can investigate the dependence of the wave speed on the parameters γ_1 and γ_2. For example, when γ_1 = χ(γ_2), the associated wave speed can be computed explicitly, and its sign may vary as γ_2 is varied. We refer to the forthcoming Sect. 4.2.4 for a practical example (see Fig. 18).
Whereas, when βγ_1 + (α + λ)γ_2 < 0, we observe a different behavior. As a consequence, the stability regions are determined by the locations of the parabolas γ_1 = (αγ_2² − (α + λ)γ_2 + β + λ)/β and γ_1 = −(αγ_2² + (α + λ)γ_2 + β + λ)/β in the plane (γ_1, γ_2). We observe that they never intersect and are oriented in opposite directions; we refer to Fig. 15 for a typical configuration.
[Fig. 15 caption: Stability/instability regions of (29) for any (α, λ, β) fixed with α = λ + β. The shaded orange region corresponds to an instability for (29), while the purple region corresponds to a stability for (29). The boundaries of the stability/instability regions are given by the parabolas γ_1 = (αγ_2² − (α + λ)γ_2 + β + λ)/β, where Eq. (29) is marginally stable. We represented the line γ_1 = γ_2 (blue curve), which corresponds to the case studied in Fig. 13 (color figure online).]
Here, we see that the system is stable for a very large range of values of both γ_1 and γ_2. In particular, for large enough values of the feedback connection weight (e.g., |γ_2| > 3), stability is guaranteed regardless of the value of the feedforward connection weight γ_1 (within a reasonable range, e.g., γ_1 ∈ (−10, 10)). This is the opposite of the behavior obtained for the discrete system in Fig. 14, where stability was impossible under the same conditions for γ_1, γ_2. This highlights again the errors of interpretation that can potentially be caused by a discrete approximation of a continuous system.

Fully Continuous Interpretation When
one can once again identify E^n_{j,k} as the approximation of some smooth function E(t, x, y) at t_n = nΔt, x_j = jΔx and y_k = kΔy, along the three dimensions of time, network depth and neuronal layer width. We may rewrite (1) in this case such that, in the limit Δt → 0, Δx → 0 and Δy → 0 with ν and κ fixed, we obtain a partial differential equation. As before, this is a diffusion equation along the y dimension, whose stability depends on the positivity of the diffusion coefficient, i.e., βξ_f + (λ − α)ξ_b ≥ 0.

Application to a Ring Model of Orientations
Going back to our discrete system, in this section we consider the case where neurons within each layer encode a given orientation in [0, π]. Here, we have in mind visual stimuli made of a fixed elongated black bar on a white background, with a prescribed orientation. We introduce the matrix A_per ∈ M_d(R), which is nothing but the discretization of the Laplacian with periodic boundary conditions. Indeed, for each E^n_j ∈ R^d, we assume that neuron E^n_{j,k} encodes orientation kπ/d for k = 1, ..., d. We readily remark that 0 ∈ Spec(A_per), with corresponding eigenvector U_1 = (1, ..., 1)ᵗ ∈ R^d. Furthermore, we have:
• if d = 2m + 1 is odd, then λ_p = −4 sin²(pπ/d) with p = 1, ..., m is an eigenvalue of A_per of multiplicity 2, with associated eigenvectors U_{2p} and U_{2p+1} of components (U_{2p})_k = cos(2pkπ/d) and (U_{2p+1})_k = sin(2pkπ/d);
• if d = 2m is even, then λ_p = −4 sin²(pπ/d) with p = 1, ..., m − 1 is an eigenvalue of A_per of multiplicity 2, with associated eigenvectors U_{2p} and U_{2p+1} as above.
In the even case, λ = −4 is additionally a simple eigenvalue of A_per, with associated eigenvector U_d = (−1, 1, −1, 1, ..., −1, 1)ᵗ ∈ R^d.
Fig. 16 We plot the eigenvectors U_{2p} and U_{2p+1} for p = 1 and p = 2, as a function of kπ/d for k = 1, ..., d. We note that U_{2p} and U_{2p+1} encode the first Fourier modes. Here we have set d = 2⁵ (color figure online).
It may be interesting to note that any linear combination aU_{2p} + bU_{2p+1} can always be written as a translate of a fixed profile, with amplitude A = √(a² + b²) > 0 and phase ϕ = −arctan(b/a) ∈ (−π/2, π/2), whenever a ≠ 0 and b ≠ 0. This means that U_{2p} and U_{2p+1} span all possible translations (modulo [0, π]) of a fixed profile. We refer to Fig. 16 for a visualization of the first eigenvectors. In short, these eigenvectors U_i implement a Fourier transform of the matrix A_per. We now set W^b so that it acts as a convolution with local excitation and lateral inhibition. From now on, to fix ideas, we will assume that d = 2m is even. Collecting the corresponding eigenvalues in a diagonal matrix D^b, we have the decomposition W^b = PD^bPᵗ; the function χ, introduced in (27), is then applied to the diagonal elements of D^b, as follows.
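The spectrum of A_per can likewise be checked numerically (circulant discrete Laplacian, d even):

```python
import numpy as np

d = 32                                   # d = 2m even, as in the text
A_per = -2.0 * np.eye(d) + np.eye(d, k=1) + np.eye(d, k=-1)
A_per[0, -1] = A_per[-1, 0] = 1.0        # periodic boundary conditions

eigs = np.sort(np.linalg.eigvalsh(A_per))

# theory: 0 (simple), -4*sin^2(p*pi/d) with multiplicity 2 for p = 1..d/2-1, and -4 (simple)
p = np.arange(1, d // 2)
theory = np.sort(np.concatenate(([0.0, -4.0],
                                 np.repeat(-4.0 * np.sin(p * np.pi / d) ** 2, 2))))
print(np.allclose(eigs, theory))         # True
```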

Now, for given values of the hyper-parameters, we define the diagonal matrix D^f by applying χ to the diagonal elements of D^b, and we set W^f := PD^fPᵗ. We refer to Fig. 17 for an illustration of the structures of the matrices W^f and W^b. For a large set of values of the hyper-parameters, W^f still presents a band structure with positive elements on the diagonals, indicating that W^f can also be interpreted as a convolution with local excitation. For the values of the hyper-parameters fixed in Fig. 17, the feedforward matrix W^f is purely excitatory.
Reproducing the analysis developed in the previous Sect. 4.2, we perform a change of orthonormal basis to express neural activities in terms of the relevant assemblies U^n_j := PᵗE^n_j. With E^n_j = PU^n_j, the recurrence equation becomes diagonal. If we denote by γ_p the pth diagonal element of D^b, then for each p = 1, ..., d the recurrence reduces to (30), where u^n_{j,p} is the pth component (or neural assembly) of U^n_j. For each p = 1, ..., d, the associated amplification factor function ρ_p can be written explicitly, and with our specific choice of function χ we have ρ_p(0) = 1, with an associated wave speed c^p_0. From now on, we assume that we have tuned the hyper-parameters such that |ρ_p(θ)| < 1 for all θ ∈ [−π, π]\{0} and each p = 1, ..., d. This can in fact be systematically checked numerically for a given set of hyper-parameters. We report in Fig. 18 the shape of p ↦ c^p_0, for the same values of the hyper-parameters as in Fig. 17 and d = 2⁵. We first remark that p ↦ c^p_0 is a monotone decreasing map, and that in our specific case the wave speeds come in pairs for p = 2, ..., d − 1, which is reminiscent of the spectral properties of A_per (m − 1 eigenvalues of multiplicity 2 when d = 2m is even). Given a fixed input E_0 ∈ R^d presented at j = 0 to the network continually at each time step, we can deduce which components of E_0 will be able to propagate forward through the network. More precisely, we can decompose E_0 along the basis (U_1, ..., U_d) of eigenvectors, with real coefficients a_p for p = 1, ..., d. Assuming that the network was initially at rest, the dynamics along each eigenvector (or neural assembly) is decoupled, and the solution is readily expressed in terms of the u^n_{j,p}, where each u^n_{j,p} is a solution to (30).
As a consequence, the monotonicity of the map p ↦ c^p_0 indicates that the homogeneous constant mode U_1 is the fastest to propagate forward into the network, with associated spreading speed c^1_0; it is then followed by the modes (U_2, U_3), propagating at speed c^2_0 = c^3_0. In our numerics, we have set the parameters such that c^1_0 ≈ c^2_0 = c^3_0, with a significant gap to the other wave speeds. Let us remark that all modes U_p with p ≥ 8 are not able to propagate into the network (see Fig. 19). Thus, our architecture acts as a mode filter. Even more precisely, let us remark that the sequence (a_p ((αγ_p + βχ(γ_p))/(λγ_p))^j)_{j≥1} is a stationary solution of (30), which remains bounded whenever p is such that the associated wave speed is negative, that is c^p_0 < 0, since in that case one has αγ_p + βχ(γ_p) < λγ_p. The solution E^n_j can then be approximated accordingly. This is illustrated by a first example simulation in Figs. 20 and 21. We present at j = 0 a fixed input E_0 generated as the superposition of a tuning curve at ϑ = 0 (blue) with some fixed random noise: namely, we select a_1 = 0, a_2 = 1, a_3 = 0, and all other coefficients a_p for p = 4, ..., d are drawn from a normal law with an amplitude pre-factor of magnitude ε = 0.1. The shape of the input E_0 is shown in Fig. 20a. The profiles of E^n_j at time iteration n = 200, along the first layers of the network j ∈ {1, 2, 3, 4, 5}, are given in Fig. 20b-f, respectively. We first observe that the network indeed acts as a filter, since across the layers of the network the solution profile E^n_j presents less noise and gets closer to the tuning curve at ϑ = 0. Let us also remark that the filtering is more efficient for layers away from the boundary, and less efficient for those near the boundary. This is rather natural, since the impact of the input E_0 is stronger on the first layers. We see that already at layer j = 5, we have almost fully recovered the tuning curve at ϑ = 0 (see Fig. 20f).
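The filtering principle can be sketched with a plain spectral truncation: decompose a noisy tuning curve on the eigenmodes of A_per and keep only the low modes (those that, in the network, would have c^p_0 > 0). This is an illustration of the principle only; the cutoff at 7 modes and the tuning curve are hypothetical choices, since the actual set of propagating modes depends on the hyper-parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32
A_per = -2.0 * np.eye(d) + np.eye(d, k=1) + np.eye(d, k=-1)
A_per[0, -1] = A_per[-1, 0] = 1.0

vals, P = np.linalg.eigh(A_per)
order = np.argsort(-vals)                 # constant mode first, then low Fourier modes
P = P[:, order]

theta = np.arange(1, d + 1) * np.pi / d   # encoded orientations k*pi/d
clean = np.exp(np.cos(2.0 * (theta - np.pi / 2)))   # illustrative tuning curve at pi/2
noisy = clean + 0.3 * rng.standard_normal(d)

a = P.T @ noisy                           # coefficients on the neural assemblies
a_filtered = np.where(np.arange(d) < 7, a, 0.0)     # keep only the (assumed) propagating modes
denoised = P @ a_filtered

err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print(err_noisy, err_denoised)            # the truncated profile is closer to the clean curve
```

The cutoff at 7 keeps whole eigenspaces (the constant mode plus three degenerate Fourier pairs), so the truncation is well defined despite the multiplicity-2 eigenvalues.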
On the other hand, in Fig. 21 we show the time evolution of E^n_j at a fixed layer far away from the boundary, here j = 10. Initially, at n = 0, the layer is inactive (see Fig. 21a); after several time iterations, the solution profile E^n_j starts to be activated. It is first weakly tuned (see Fig. 21b-d), and then becomes progressively fully tuned, converging to the tuning curve at ϑ = 0 (see Fig. 21e, f).
In a second example simulation (Fig. 22), we highlight the dynamics of the different modes in a situation where the input is a narrow Gaussian profile (close to a Dirac function), that is, a superposition of many Fourier modes. As expected from the different values of the propagation speeds c^p_0 (Fig. 18), we see that the mode associated with the first Fourier component is the first to reach layer j = 10, later followed by the modes associated with subsequent Fourier components. In other words, this hierarchically higher layer j = 10 first receives information about the coarse spatial structure of the input signal, and then gradually about finer and finer spatial details.

Summary
In this section, we saw that the results obtained initially (The Identity Case) with the amplification factor function can be extended to more realistic situations with forward and backward connection matrices, for instance implementing (residual) convolutions or orientation processing. When we consider neural assemblies capturing the principal components of the connection matrices, each assembly can be treated independently in terms of stability and of signal propagation speed and direction. The exact behavior of the system will depend on the actual connection matrices (and thus on the function that they implement in the neural network), but the important point is that our generic framework can always be applied in practice. In some example cases (the ring model of orientations), we saw that only a few assemblies support signal propagation (implying that the system acts as a filter on its inputs), and that these assemblies propagate information at different speeds (implementing a coarse-to-fine analysis). In other cases (e.g., Fig. 12), we have even seen that distinct assemblies can simultaneously propagate information in opposite directions, with one assembly supporting feedforward propagation while another entails feedback propagation. We have extended our equations to the continuous limit in time, and found that the amplification factor function can give rise to qualitatively different stability regions compared to the discrete model. This serves as a cautionary note for situations where a discrete implementation must be chosen; in that case, using smaller time steps will be preferable, because it makes such discrepancies less likely.
Finally, we also showed that it is possible to consider fully continuous versions of our dynamic system, where not only time but also network depth and neural layer width are treated as continuous variables. This gives rise to diffusion equations, whose stability can also be characterized as a function of hyperparameter values.
In the following, we address possible extensions of the model to more sophisticated and more biologically inspired neural architectures, taking into account the significant communication delays between layers.
(Displaced figure caption: activity profiles at iterations n ∈ {…, 20, 40, 60, 80, 100, 200, 300, 400} with fixed input E_0 at layer j = 0 (blue); the input E_0 is a Gaussian centered at ϑ = π/2; all profiles are plotted as a function of kπ/d for k = 1, …, d. Color figure online.)

Extension of the Model: Taking into Account Transmission Delays
Deep feedforward neural networks typically implement instantaneous updates, as we did in Eq. (1) with our feedforward term E^{n+1}_j = βW^f E^{n+1}_{j-1} + ⋯. Similarly, artificial recurrent neural networks sequentially update their activity from one time step to the next, as we did with the other terms in Eq. (1) (the memory term and the feedforward and feedback prediction error correction terms). In the brain, however, there are significant transmission delays whenever neural signals travel from one area to another, and these delays could modify the system's dynamics and its stability properties. Therefore, in this section we modify model (1) by assuming that it takes k time steps to receive information from a neighboring site in the feedback/feedforward dynamics; that is, we consider a recurrence equation in which k ≥ 1 is some given fixed integer (see Fig. 23 for an illustration with k = 1), and we refer to Pang (2022) for the justification of the derivation of the model. (Note in particular that we did not modify the instantaneous nature of our feedforward updating term E^{n+1}_j = βW^f E^{n+1}_{j-1} + ⋯. This is because, as motivated in Choksi et al. (2021) and Pang (2022), we aim for the feedforward part of the system to be compatible with state-of-the-art deep convolutional neural networks, and merely wish to investigate how adding recurrent dynamics can modify its properties.) We may already notice that when k = 0, we recover our initial model (1). In what follows, for the mathematical analysis, we restrict ourselves to the identity case W^f = W^b = Id, with the model set on Z: our intention is to briefly explain the main new propagation properties that emerge when transmission delays are included. Let us also note that the resulting system (32) depends on a "history" of 2k + 1 time steps; one therefore needs to impose 2k + 1 initial conditions, given by 2k + 1 sequences (h^m_j)_{j∈Z} with m = 0, …, 2k.
To proceed with the analysis, we first introduce a new vector unknown capturing each layer's recent history over the past 2k + 1 time steps, such that the above recurrence (32) can be rewritten in first-order form as (33), where the matrices Q_1, Q_0, Q_{−1} ∈ M_{2k+1}(R) are defined accordingly, with Q_{±1} having a single nonzero element, located on their last row.
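To make this rewriting concrete, here is a minimal sketch of the same delay-embedding trick on a hypothetical scalar toy recurrence (the coefficients a, b, c are placeholders, not the actual system (32)): a recurrence with a 2k-step delay becomes a first-order recurrence on a (2k+1)-dimensional history vector, with matrices playing the roles of Q_1, Q_0, Q_{−1}.

```python
import numpy as np

def companion_matrices(a, b, c, k):
    """Matrices Q1, Q0, Qm1 for the toy delayed recurrence
    u_j^{n+1} = a*u_j^n + b*u_{j-1}^{n-2k} + c*u_{j+1}^{n-2k},
    acting on the history vector U_j^n = (u_j^{n-2k}, ..., u_j^n)."""
    m = 2 * k + 1
    Q0 = np.zeros((m, m))
    Q0[:-1, 1:] = np.eye(m - 1)      # upper rows: shift the history window by one step
    Q0[-1, -1] = a                   # memory term acts on the current value
    Q1 = np.zeros((m, m)); Q1[-1, 0] = b    # delayed term from layer j-1 (last row only)
    Qm1 = np.zeros((m, m)); Qm1[-1, 0] = c  # delayed term from layer j+1 (last row only)
    return Q1, Q0, Qm1

def step(U, Q1, Q0, Qm1):
    """One update U_j^{n+1} = Q1 U_{j-1}^n + Q0 U_j^n + Qm1 U_{j+1}^n (zero boundaries),
    for U of shape (number of layers, 2k+1)."""
    J, m = U.shape
    Z = np.zeros(m)
    Up = np.zeros_like(U)
    for j in range(J):
        left = U[j - 1] if j > 0 else Z
        right = U[j + 1] if j < J - 1 else Z
        Up[j] = Q1 @ left + Q0 @ U[j] + Qm1 @ right
    return Up
```

The last row of each matrix reproduces the scalar recurrence, while the upper rows of Q_0 simply shift the stored history, which is why Q_{±1} have a single nonzero entry, on their last row.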

Mathematical Study of the Recurrence Eq. (33)
We now postulate an ansatz of the form ρ^n e^{iθj} E for some nonzero vector E ∈ C^{2k+1}, and obtain a characteristic equation (34). This equation has 2k + 1 roots in the complex plane, which we denote ρ_m(θ) for m = 1, …, 2k + 1. We remark that at θ = 0, ρ = 1 is always a root of the equation. By convention, we assume that ρ_1(0) = 1, and we further note that E_1 = (1, …, 1)^t is the associated eigenvector. As usual, we can perform a Taylor expansion of ρ_1 near θ = 0, and we obtain that the associated wave speed c_0^k is this time obtained explicitly and depends on the delay k. We readily conclude that:
• When α < λ, then c_0^k is well defined for all values of k. Furthermore, the amplitude of the wave speed, k ↦ |c_0^k|, decreases as k increases, with |c_0^k| → 0 as k → +∞. That is, the activity waves may go forward or backward (depending on the hyperparameter values), but the transmission delay always slows down their propagation.
• When α = λ, then c_0^k = β/(1 − β) > 0 is independent of the delay k. This is compatible with our implementation choice, in which the initial feedforward propagation term (controlled by β) is not affected by transmission delays.
• When λ < α, then c_0^k is well defined whenever k ≠ (1 − β)/(α − λ). Furthermore, the wave speed satisfies c_0^k > 0 for 1 ≤ k < (1 − β)/(α − λ) and increases with the delay k on that interval. That is, in this parameter range neural activity waves propagate forward and, perhaps counterintuitively, accelerate when the transmission delay increases. On the other hand, c_0^k < 0 for k > (1 − β)/(α − λ), and k ↦ |c_0^k| decreases as k increases on that domain, with |c_0^k| → 0 as k → +∞. In this parameter range, waves propagate backward and decelerate when the transmission delay increases.
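The displayed formula for c_0^k did not survive transcription; a speed of the form c_0^k = (β + α − λ)/(1 − β − k(α − λ)) is consistent with all three regimes above (it reduces to β/(1 − β) when α = λ and is singular at k = (1 − β)/(α − λ)). Treating this expression as a reconstruction rather than a verbatim formula, a quick numerical check of the three bullet points:

```python
def wave_speed(alpha, beta, lam, k):
    """Delayed wave speed c_0^k, in the form reconstructed from the three regimes
    described in the text (an assumption, not a verbatim formula from the paper)."""
    return (beta + alpha - lam) / (1.0 - beta - k * (alpha - lam))

# Regime alpha < lam: defined for all k, |c_0^k| strictly decreasing towards 0.
backward = [abs(wave_speed(0.1, 0.2, 0.4, k)) for k in range(1, 40)]

# Regime alpha = lam: c_0^k = beta/(1-beta), independent of the delay k.
constant = [wave_speed(0.3, 0.2, 0.3, k) for k in range(1, 10)]

# Regime lam < alpha: forward and accelerating for k < (1-beta)/(alpha-lam) = 8,
# backward and decelerating for k > 8 (singular exactly at k = 8).
forward = [wave_speed(0.3, 0.2, 0.2, k) for k in range(1, 8)]
reversed_ = [wave_speed(0.3, 0.2, 0.2, k) for k in range(9, 20)]
```

With these example hyperparameters, `constant` is 0.25 throughout, `forward` is positive and increasing, and `reversed_` is negative with decreasing amplitude, matching the three bullets.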
Coming back to (35), we can look for other potential roots lying on the unit circle, i.e., marginally stable solutions: we look for ω ∈ (0, 2π) such that ρ = e^{iω}.
Case k = 1. When k = 1, coming back to (35), we see that the two other roots at θ = 0 are real and given by −λ/(2(1 − β)) ± √(λ² + 4α(1 − β))/(2(1 − β)), such that when α + β + λ = 1 the negative root is precisely −1, and ω = π is a solution. We assume, without loss of generality, that this is the second root, that is, ρ_2(0) = −1 whenever α + β + λ = 1. In this specific case, the associated eigenvector is E_{−1} = (1, −1, 1)^t. Recall that E reflects the history of activity across the 2k + 1 = 3 preceding time steps; the eigenvector E_{−1} thus corresponds to a rapid alternation of activity, i.e., an oscillation. We refer to Fig. 24a for an illustration of the spectral configuration in that case. We can perform a Taylor expansion of ρ_2 near θ = 0, which provides an associated wave speed c̃_0. As a consequence of the above analysis, if G^n_j denotes the fundamental solution of (33) starting from a Dirac delta mass centered at j = 0 along the direction E ∈ R³, then we have the following representation for G^n_j:
• If α + β + λ ≠ 1, then G^n_j is asymptotically a Gaussian profile propagating at speed c_0^k, where π_1 is the spectral projection of R³ along the direction E_1 and ⟨·, ·⟩_{R³} is the usual scalar product. Here σ_0^k is some positive constant that can be computed explicitly from the higher-order expansion of ρ_1(θ).
• If α + β + λ = 1, then G^n_j contains an additional oscillatory component propagating at speed c̃_0, where π_{−1} is the spectral projection of R³ along the direction E_{−1}. Here σ̃_0 is some positive constant that can be computed explicitly from the higher-order expansion of ρ_2(θ).
In Fig. 25, we illustrate the previous results in the case where α + β + λ = 1. In panel (a), we have set E = E_1 (a constant history of activity over the previous 3 time steps), such that π_1(E_1) = E_1 and π_{−1}(E_1) = 0_{R³}, so that we only observe a Gaussian profile propagating at speed c_0^k. On the other hand, in panel (b) we have set E = E_{−1} (an oscillating history of activity over the previous 3 time steps), such that π_1(E_{−1}) = 0_{R³} and π_{−1}(E_{−1}) = E_{−1}, so that we only observe an oscillating (in time) Gaussian wave profile propagating at speed c̃_0. Note that in this case, the period of the oscillation is necessarily equal to 2k, i.e., twice the transmission delay between layers. Finally, in panel (c) we observe a superposition of the two Gaussian profiles, propagating at speeds c_0^1 and c̃_0.
Case k ≥ 2. Studying the above system (36) in full generality is a very difficult task. We refer to Fig. 24c and d for an illustration in the case k = 2, with three tangency points associated with θ = 0 lying on the unit circle. Increasing the delay k while keeping the other hyperparameters (α, β, λ) fixed will generically tend to destabilize the spectrum (as shown in Fig. 26).
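The θ = 0 root structure quoted above for k = 1 can be checked numerically: the two extra roots are those of the quadratic (1 − β)ρ² + λρ − α = 0 (the quadratic whose solutions are exactly the quoted expressions), and the negative root equals −1 precisely when α + β + λ = 1. A short sketch:

```python
import math

def extra_roots_k1(alpha, beta, lam):
    """The two real roots -lam/(2(1-beta)) +/- sqrt(lam^2 + 4*alpha*(1-beta))/(2(1-beta))
    quoted in the text for k = 1 at theta = 0."""
    disc = math.sqrt(lam**2 + 4 * alpha * (1 - beta))
    return ((-lam + disc) / (2 * (1 - beta)),
            (-lam - disc) / (2 * (1 - beta)))

# On the marginal manifold alpha + beta + lam = 1, the negative root is exactly -1,
# which produces the oscillatory eigenvector E_{-1} = (1, -1, 1)^t.
alpha, beta, lam = 0.3, 0.2, 0.5
rp, rm = extra_roots_k1(alpha, beta, lam)
```

Perturbing the hyperparameters away from the constraint α + β + λ = 1 moves the negative root off −1, so the rapidly alternating mode is no longer marginally stable.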

Continuous in Time Interpretation
As done before, we now re-examine our model (with transmission delays) in the time-continuous limit. First, we recall our notation for the scaled parameters α̃ and λ̃, where Δt > 0 is some time step. Next, we introduce the rescaled time delay (representing the transmission time for neural signals between adjacent areas) τ := kΔt.
Identifying e^n_j as the approximation of some continuous function e_j(t_n) at t_n = nΔt, we readily derive a delayed version of (20). In what follows, we first investigate the case of homogeneous oscillations, which are now possible because of the presence of time delays in the equation. Then, we turn our attention to oscillatory traveling waves.

Homogeneous Oscillations
One key difference between the above delayed equation and (20) is that spatially homogeneous solutions (i.e., solutions e_j(t) that are independent of the layer j) may now have non-trivial dynamics, such as a broadly synchronized oscillation resembling rhythmic brain activity. Indeed, looking for solutions that are independent of j, we get a delayed ordinary differential equation. Looking for purely oscillatory exponential solutions e(t) = e^{iωt} for some ω ∈ R, and taking real and imaginary parts, leads to a system of two equations. Introducing Λ := λ̃/α̃ > 0 and using trigonometric identities, the first equation can be factorized as 0 = (1 − cos(τω))(Λ − 1 − 2 cos(τω)).
We distinguish several cases. If Λ > 3, then the above equation has solutions if and only if τω = 2kπ for k ∈ Z; inspecting the second equation, we see that necessarily k = 0, so that ω = 0 is the only possible solution. When Λ = 3, we notice that the equation reduces to 0 = (1 − cos(τω))², and the solutions are again given by τω = 2kπ for k ∈ Z, which yields ω = 0 because of the second equation. Now, if Λ ∈ (0, 3), we deduce that either τω = 2kπ for k ∈ Z, or cos(τω) = (Λ − 1)/2.
Injecting the relation cos(τω) = (Λ − 1)/2 into the right-hand side of the second equation yields ω² = α̃²(4 − (Λ − 1)²), and thus necessarily τ = (2kπ − arccos((Λ − 1)/2))/ω for k ∈ Z (taking ω > 0). We recover the fact that the system (37) is invariant under ω → −ω. Since arccos((Λ − 1)/2) ∈ [0, π], we deduce that the smallest positive τ is always achieved at k = 1. We computed, for several values of α̃, the corresponding values of τ and ω (for k = 1) as a function of Λ; they are presented in Fig. 27a and b. We observe that for values of Λ in the range (1/2, 1), the corresponding time delay τ takes values between 12 and 23 ms for values of 1/α̃ ranging from 5 to 10 ms. Correspondingly, in the same range of Λ, the frequency ω/2π takes values between 30 and 60 Hz.
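Assuming the relations cos(τω) = (Λ − 1)/2 and ω = α̃√(4 − (Λ − 1)²) (reconstructed from the garbled display equations, so treat them as assumptions) together with the smallest-delay branch k = 1, the pair (τ, ω) follows in closed form. A sketch reproducing the ranges reported for Fig. 27:

```python
import math

def homogeneous_oscillation(Lam, alpha):
    """(tau [ms], frequency [Hz]) for the k = 1 branch, using
    cos(tau*omega) = (Lam - 1)/2 and omega = alpha*sqrt(4 - (Lam - 1)^2)
    (reconstructed relations; alpha is the rate alpha~ in 1/ms, 0 < Lam < 3)."""
    omega = alpha * math.sqrt(4.0 - (Lam - 1.0) ** 2)              # rad/ms
    tau = (2.0 * math.pi - math.acos((Lam - 1.0) / 2.0)) / omega   # smallest positive delay
    return tau, 1000.0 * omega / (2.0 * math.pi)

tau, freq = homogeneous_oscillation(0.75, 1 / 7.5)  # Lam = 3/4, 1/alpha~ = 7.5 ms
```

For Λ = 3/4 and 1/α̃ = 7.5 ms this gives τ ≈ 17 ms and a frequency of roughly 42 Hz, inside the γ-band range quoted above; sweeping Λ over (1/2, 1) and 1/α̃ over 5–10 ms stays in the biologically plausible delay/γ-frequency window.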
This tells us that when the time delay τ is around 10–20 ms, compatible with biological values for communication delays between adjacent cortical areas, and when the hyperparameters α̃ and λ̃ are suitably chosen (α̃ in particular must be strong enough to allow rapid feedforward error correction updates, i.e., around 1/α̃ < 8 ms, while λ̃ can be chosen more liberally, as long as it stays below 3α̃), then the network produces globally synchronized oscillations, comparable to experimentally observed brain rhythms in the γ-band regime. In this context, it is interesting to note that theoretical and neuroscientific considerations have suggested that error correction in predictive coding systems is likely to be accompanied by oscillatory neural activity in this same γ-frequency regime (Bastos et al. 2012).
Of course, a globally-synchronized gamma oscillation across all layers is a mathematical abstraction that has no equivalent in the brain. In electrophysiological experiments, gamma-band activity is typically found to be locally generated; according to the Communication-Through-Coherence (CTC) theory (Fries 2015), gamma oscillations can sometimes synchronize their phase across two or more successive processing stages. Our observations provide theoretical insight about the parameter range that would enable the emergence of between-layer gamma synchronization in neural network models, compatible with CTC.

Oscillatory Traveling Waves
However, experimental and computational studies have also suggested that oscillatory signatures of predictive coding could be found at lower frequencies, in the so-called α-band regime, around 7–15 Hz. Furthermore, these oscillations are typically not homogeneous over space, as assumed in the previous section, but behave as forward- or backward-travelling waves with systematic phase shifts between layers (Alamia and VanRullen 2019). To explore this idea further, we now investigate the possibility of oscillatory traveling wave solutions of the form e_j(t) = e^{i(ωt + jθ)} for some ω ∈ R (representing the wave's temporal frequency) and θ ∈ [0, 2π) (representing the wave's spatial frequency, i.e., its phase shift across layers), and we are especially interested in deriving conditions under which one can ensure that θ ≠ 0 (since otherwise we would again be facing the homogeneous oscillation case). We only focus on the case β = 0 (as postulated, e.g., in Rao and Ballard's (1999) work) and leave the case β > 0 for future investigations. As a consequence, the equation simplifies; plugging in the ansatz e_j(t) = e^{i(ωt + jθ)} and taking real and imaginary parts, we obtain a system of two equations. Once again, we introduce Λ := λ̃/α̃ ≥ 0, where we implicitly assume that we always work in the regime α̃ > 0. We then note that the right-hand side of the first equation can be factorized. As a consequence, either sin((θ − ωτ)/2) = 0, that is, ωτ = θ + 2kπ for some k ∈ Z, which then leads, from the second equation, to ω = 0 and θ = 0 (since we restrict θ ∈ [0, 2π)); or sin((θ − ωτ)/2) ≠ 0. In the latter case, assuming that ωτ ≠ θ + 2kπ for all k ∈ Z, we get that 0 = Λ sin((θ − ωτ)/2) + sin((θ + 3ωτ)/2).
We will now study several cases. Case Λ = 0. (In other words, this case implies λ̃ = 0, that is, a system with no feedback error correction.) From sin((θ + 3ωτ)/2) = 0, we deduce that θ = −3ωτ + 2kπ for some k ∈ Z, and reporting into the second equation of the system, we end up with ω = 2α̃ sin(2ωτ).
We always have the trivial solution ω = 0 with θ = 0. In fact, when 4α̃τ ≤ 1, ω = 0 is the only solution of the above equation. On the other hand, when 4α̃τ > 1, there can be multiple non-trivial solutions. At the very least, for each (α̃, τ) such that 4α̃τ > 1, there always exists a unique solution ω_c(α̃, τ) ∈ (0, π/(2τ)) of the above equation. This gives a corresponding θ^k_c = −3ω_c(α̃, τ)τ + 2kπ with k ∈ Z, and retaining the corresponding value of θ in the interval [0, 2π), we have θ_c = −3ω_c(α̃, τ)τ + 2π. We refer to Fig. 28 for an illustration of the solutions (ω, θ) for several values of the parameters. Interestingly, we see that for values of the time delay τ between 10 and 20 ms, consistent with biological data, the observed oscillation frequency is lower than in the previous case, and now compatible with the α-frequency regime (between 10 and 20 Hz). Furthermore, the phase shift θ between layers varies roughly between 2 and 4 radians. As phase shifts below π or above π radians indicate, respectively, backward- or forward-travelling waves, we see that the exact values of the parameters τ and α̃ critically determine the propagation direction of the travelling waves: stronger feedforward error correction (lower values of 1/α̃) and longer communication delays τ will tend to favor backward-travelling waves; vice-versa, weaker feedforward error correction (higher values of 1/α̃) and shorter communication delays τ will favor forward-travelling waves.
Case Λ ≠ 0. Now we assume that λ̃ ≠ 0, that is, the system includes feedback error correction. At first, we consider the simpler case Λ = 1, that is, α̃ = λ̃, where the equation can also be solved easily. Indeed, we have either sin((θ + ωτ)/2) = 0 or cos(ωτ) = 0, which is equivalent to either θ = −ωτ + 2kπ or ωτ = π/2 + kπ for some k ∈ Z. Assume first that θ = −ωτ + 2kπ for some k ∈ Z; then the second equation of the system gives ω = 0 (since α̃ = λ̃ when Λ = 1), and thus we end up with θ = 0.
Now, if ωτ = π/2 + kπ for some k ∈ Z, the second equation leads to ω = −2α̃ cos(θ + kπ), from which we deduce that necessarily (2k + 1)π/(−4α̃τ) = cos(θ + kπ), for k ∈ Z.
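Both explicit cases can be checked numerically. The sketch below works in ms units with symbols as in the text, and takes the Λ = 0 equation ω = 2α̃ sin(2ωτ) and the Λ = 1 admissibility condition cos(θ + kπ) = −(2k + 1)π/(4α̃τ) as given (reconstructed) relations: it solves the Λ = 0 case by bisection on (0, π/(2τ)) and enumerates admissible Λ = 1 solutions.

```python
import math

def case_lambda0(alpha, tau):
    """Lambda = 0: solve omega = 2*alpha*sin(2*omega*tau) on (0, pi/(2*tau)) by bisection
    (requires 4*alpha*tau > 1), then theta_c = -3*omega*tau + 2*pi.
    Returns (frequency [Hz], theta_c [rad]); alpha in 1/ms, tau in ms."""
    assert 4 * alpha * tau > 1, "only the trivial solution omega = 0 exists"
    g = lambda w: w - 2 * alpha * math.sin(2 * w * tau)
    lo, hi = 1e-9, math.pi / (2 * tau)   # g(lo) < 0 since the slope 4*alpha*tau > 1; g(hi) > 0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    omega = 0.5 * (lo + hi)
    return 1000.0 * omega / (2 * math.pi), 2 * math.pi - 3 * omega * tau

def case_lambda1(alpha, tau, kmax=5):
    """Lambda = 1: omega*tau = pi/2 + k*pi with cos(theta + k*pi) = -(2k+1)*pi/(4*alpha*tau);
    returns one admissible (frequency [Hz], theta [rad]) pair per valid k
    (empty list when 4*alpha*tau < pi)."""
    sols = []
    for k in range(kmax + 1):
        c = -(2 * k + 1) * math.pi / (4 * alpha * tau)
        if abs(c) <= 1.0:
            omega = (math.pi / 2 + k * math.pi) / tau
            theta = (math.acos(c) - k * math.pi) % (2 * math.pi)
            sols.append((1000.0 * omega / (2 * math.pi), theta))
    return sols
```

With 1/α̃ = 10 ms and τ = 15 ms, case_lambda0 gives a frequency near 14 Hz with θ_c ≈ 2.3 rad (below π, hence a backward wave), while case_lambda1 returns no non-trivial solution when 4α̃τ < π and a single one just above that threshold.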
Case of general Λ. This time, one needs to look at the positive maxima of the map ω ↦ sin(2ωτ)/(2Λ cos(2ωτ) + Λ² + 1), which are attained at ω_0 = π/(2τ) − (1/(2τ)) arccos(2Λ/(1 + Λ²)) + kπ/τ for k ∈ Z; at such maxima, one gets sin(2ω_0τ)/(2Λ cos(2ω_0τ) + Λ² + 1) = 1/(Λ² − 1). As a consequence, if 4α̃τ < π − arccos(2Λ/(1 + Λ²)), then there is no other solution than (ω, θ) = (0, 0). To obtain at least one non-trivial positive solution, one needs to impose 4α̃τ ≥ π − arccos(2Λ/(1 + Λ²)). Once again, this condition is consistent with the condition 4α̃τ ≥ π derived in the case Λ = 1. We can also derive a second simple condition ensuring the existence of a non-trivial solution by looking at the behavior near ω ∼ 0: if 4α̃τ > (1 + Λ)/(1 − Λ), then there exists at least one positive solution ω ∈ (0, π/(2τ)) of the above equation (and also one negative solution in (−π/(2τ), 0) by symmetry). Note that the condition 4α̃τ > (1 + Λ)/(1 − Λ) is consistent with the condition 4α̃τ > 1 derived in the case Λ = 0.
Examples. To illustrate the different scenarios and their possible interpretations in terms of brain oscillations, we take two distinct examples corresponding to the situations described above. We report in Fig. 29 the global structure of the branches of solutions (ω, θ), which terminate at a value of Λ = Λ_0 ∼ 1.06 (see Fig. 29b). The branch of solutions which exists for all values of Λ ∈ [0, Λ_0] has an associated spatial frequency which is almost constant, with a value around θ ∼ 2.82 ∈ (0, π). On the other hand, the branch of solutions which only exists for values of Λ ∈ (Λ_c, Λ_0] has an associated spatial frequency which lies in (π, 2π).
(Displaced caption of Fig. 30: space-time plot of cos(ωt + θj) for values of (ω, θ) corresponding to the orange and dark red points of Fig. 29, lying respectively on the blue and light blue curves; time t is in ms. In (a), the temporal frequency is ω/2π ∼ 19 Hz, while in (b) it is ω/2π ∼ 13.3 Hz. In (a), since θ ∈ (0, π), we observe backward propagation of the wave, while in (b) we have forward propagation since θ ∈ (π, 2π). Color figure online.)
Let us remark that at Λ = 1, the spatial frequencies of the two solutions are different and symmetric with respect to π. Furthermore, at Λ = Λ_0 ∼ 1.06, where the two branches collide, the associated spatial frequency is θ ∼ π. Let us finally note that for Λ ∈ [1, Λ_0], the spatial frequencies of the two branches are almost identical, although the secondary branch lies slightly above the primary one.
Correspondingly, we illustrate in Fig. 30 the space-time plots for two points along the two different branches, which correspond to the orange and dark red points in Fig. 29. The corresponding values are (ω, θ) ∼ (0.12, 2.82) and (ω, θ) ∼ (0.08, 4.60), associated to the same value of Λ ∼ 0.633. In the first panel (Fig. 30a), which corresponds to the point on the branch of solutions defined for all Λ ∈ [0, Λ_0], the spatial frequency satisfies θ ∈ (0, π), so we observe an apparent backward propagation. In the second panel (Fig. 30b), we observe a forward propagation; this corresponds to the point on the lower branch of solutions, defined for values of Λ ∈ (Λ_c, Λ_0], with associated spatial frequency θ ∈ (π, 2π). From a biological point of view, this indicates that the more interesting parameter range is Λ ∈ (Λ_c, Λ_0], along the branch of solutions that emerges at Λ = Λ_c from the trivial solution (ω, θ) ∼ (0, 0), since in this case we obtain an oscillatory traveling wave propagating forward through the network.
In Fig. 31, we show the global structure of the branches for a second example, with fixed values of the time delay τ = 12 ms and of 1/α̃ = 12 ms, which are still biologically relevant values. We observe that the two branches terminate at a value of Λ_0 ∼ 3.03, with a crossing at Λ = 1. For Λ ∈ [1, Λ_0], the primary branch (blue curve) has a temporal frequency below that of the secondary branch (light blue curve); the difference in frequencies is almost 5 Hz for values of Λ ∼ 2. Even more interestingly, we see that the corresponding spatial frequencies along the secondary branch decrease from 2π to a final value below π at Λ_0, indicating that by increasing the value of Λ we can reverse the direction of propagation, from forward to backward oscillatory traveling waves. The transition occurs for Λ ∼ 1.65, that is, for values of 1/λ̃ ∼ 7–8 ms. We further notice that the associated temporal frequencies in the backward regime, around 25 Hz (β-frequency regime), are much higher than for forward traveling waves, whose temporal frequencies range from 0 to 20 Hz (and include the α-frequency regime).

Summary
In this section, we saw that including temporal delays in the time-continuous version of the system produces non-trivial dynamics that can be characterized analytically. Contrary to the discrete version of the system, which can only be analyzed for a few discrete time delays k = 1, 2, …, the continuous version is informative for a wide range of delays, compatible with biological data, and the resulting frequencies are very diverse. In particular, we observed homogeneous synchronized oscillations in the gamma band (30–60 Hz) that emerged when the feedforward error correction term α̃ was strong enough (roughly, with 1/α̃ < 8 ms). But we also found situations in which the oscillatory activity was not homogeneous, but propagated as a travelling wave through the network. With biologically plausible values for the various parameters, the waves could propagate forward in the alpha-band (7–15 Hz) frequency range, and when the feedback error correction term λ̃ was strong enough (e.g., 1/λ̃ < 8 ms while 1/α̃ = 12 ms), they started moving backward at a faster frequency in the beta-band (15–30 Hz). Altogether, this pattern of results is compatible with various (sometimes conflicting) observations from the neuroscience literature (Alamia and VanRullen 2019; Bastos et al. 2012), and informs us about the conditions in which the corresponding dynamic behaviors might emerge in such bio-inspired neural networks with predictive coding dynamics incorporating communication delays.

Contributions
We proposed a mathematical framework to explore the properties and stability of neural network models of the visual system comprising a hierarchy of visual processing areas (or "layers"), mutually connected according to the principles of predictive coding. Using a discrete model, as is typically done in the recent deep learning literature, we introduced the amplification factor function, which serves to characterize the interesting (i.e., "marginally stable") regions as a function of the model hyperparameters. When considered on an infinite domain, we showed that the response of our linear neural network to a Dirac delta initialization presents a universal behavior given by a Gaussian profile with fixed variance and which spreads at a given speed. Both speed and variance could be explicitly characterized in terms of the model hyperparameters. This universal Gaussian profile was then the key to understand the long-time dynamics of the linear neural network set on a semi-infinite domain with a fixed constant source term at the left boundary of the network.
At first, we ignored the influence of neuronal selectivity and used feed-forward and feedback connection matrices set to the identity matrix. When β = 0 (no feedforward update after the network initialization), we observed that hyperparameters α and λ compete for forward and backward propagation, respectively. When β > 0, the constant feedforward input makes things more complex, with λ (feedback error correction) now competing with β + α (feedforward drive and feedforward error correction). In the special case when α + λ = 1, a second (but spurious) mode of propagation with rapidly alternating activity can emerge, whose direction is determined by the competition between α and β + λ.
Next, to evaluate the influence of a more complex and functionally relevant connectivity matrix, we defined neural assemblies reflecting the eigenvectors of the matrix. Each of these neural assemblies can be analyzed separately, and its behavior depends on the corresponding eigenvalue (in addition to the hyperparameters α, β and λ, as explained above). Different assemblies can simultaneously support different dynamics, so that some may propagate information forward, others may not propagate at all (acting as a filter on the inputs), while yet others might propagate backward (e.g., carrying "priors" set by preceding activations). We again saw a number of cases where "fringe" or spurious behavior arose, e.g., rapid alternations in activity, and understood that this could be caused by the discrete nature of our model, when the time steps defining the model's temporal resolution are too coarse.
The time-continuous version of the model helped us overcome this issue, and characterize dynamics in the limit of infinitely small time steps. The amplification factor function is still crucial in this situation, but it produces more robust results, without fringe behavior or spurious oscillations. In particular, the analysis of stability and propagation direction/speed was greatly simplified in this continuous case.
The same time-continuous model also allowed us to investigate the inclusion of communication delays between layers. In this case, we demonstrated the emergence of genuine oscillatory dynamics and travelling waves in various frequency bands compatible with neuroscientific observations (alpha-band from 7 to 15 Hz, beta-band from 15 to 30 Hz and gamma-band from 30 to 60 Hz).
Finally, we considered fully continuous versions of the model, not only in time but also in space, both across network depth (across neuronal layers) and width (across neurons in the same layer). This mathematical abstraction revealed that our model could be understood as a transport equation, and that it produced diffusion dynamics.

Biological Interpretations
The mathematical framework that we proposed naturally lends itself to interpretation in biological terms. The model's hyperparameters reflect the strength of feedforward and feedback signalling in the brain. These are determined not only by axonal density and synaptic strength (that vary slowly throughout development and learning), but can also be gated by other brain regions and control systems, e.g., through the influence of neurotransmitters, and thus vary much more dynamically. For instance, the feedforward drive β could be more active to capture sensory information immediately after each eye movement, and decrease over time until the next eye movement (Knoell et al. 2011); similarly, feedback error correction λ could dominate over the feedforward error correction α for one given second (e.g., because top-down attention drives expectation signals) and decrease in the next second (e.g., because unexpected sensory inputs have been detected) (Tschantz et al. 2022). In this dynamic context, it is fundamental to be able to characterize the dependence of the system's behavior on the exact hyperparameter values. Fortunately, our framework reveals that when the hyperparameters vary, the stability of the system, and its ability to propagate signals and maintain activity, change in predictable ways. Some hyperparameter combinations would not support signal propagation at all; others would render the system unstable, e.g., because of runaway excitation. Under the assumption that the brain behaves as a predictive coding system, our equations inform us about the parameter regimes compatible with this paradigm.
Using our time-continuous model, we found that predictive coding dynamics associated with inter-areal communication delays result in oscillatory activity. This finding resonates with both experimental observations and neuroscientific theories (Bastos et al. 2012; Alamia and VanRullen 2019). Bastos et al. (2012, 2015) suggested that feedforward error correction could be accompanied by gamma-band oscillations; this suggestion was verified in our model, with synchronized gamma rhythms appearing when the corresponding hyperparameter α was strong enough (and with a frequency that monotonically increased from 30 to 60 Hz when the value of 1/α̃ decreased from 10 to 5 ms). However, considering that the communication delay τ between two adjacent brain regions is a fixed property of the system (a reasonable first approximation), our analysis shows that this oscillatory mode will only happen for a narrow range and a very precise combination of the hyperparameter values α and λ (see Fig. 27). Similarly, in the brain, synchronized gamma-band oscillations between areas are sometimes observed during electrophysiological recordings (Bastos et al. 2015), and sometimes not (Ray and Maunsell 2015).
By relaxing the phase delay between layers, our equations also revealed the potential emergence of oscillatory travelling waves across the network, similar to those observed in human EEG experiments (Alamia and VanRullen 2019; Pang et al. 2020; Alamia et al. 2020, 2023). Again, for a fixed communication delay τ, these waves may only happen for specific values and combinations of the hyperparameters α and λ. In certain regimes (see, e.g., Fig. 31 with 1/α̃ = 1/λ̃ = 12 ms), two waves might coexist at the same frequency, but going in opposite directions. This matches experimental reports of co-occurring feedforward and feedback waves in the brain (Alamia et al. 2020, 2023). Upon increasing the feedback strength λ, we saw that an initial alpha-band (7–15 Hz) feedforward wave could accelerate (towards the beta-band, 15–30 Hz) and eventually reverse its direction, producing a feedback wave. Similar reversal phenomena have also been reported for oscillatory waves in the human brain (Pang et al. 2020; Alamia et al. 2020, 2023).

Limitations and Future Extensions
"All models are wrong, but some are useful" (Box et al. 1979). Our model, like all mathematical models, is based on simplifications, approximations and assumptions, and can only be valid under those assumptions. Some (if not all) of these assumptions are questionable, and future work will need to determine the robustness of the model, or its potential modifications, when relaxing these assumptions.
Even though we assumed that our bio-inspired neural network follows the general principles of predictive coding (Rao and Ballard 1999), our system's hyperparameters can in fact be modulated to accommodate many variants of this framework (Wen et al. 2018; Choksi et al. 2021; Heeger 2017; Tschantz et al. 2022). Another important assumption that we made was to simplify the connectivity matrices between neuronal layers, which determine the selectivity of each neuron and thus the functionality of the entire system. Even when we moved past the "identity" assumption, the connection matrices that we adopted were constrained to be symmetric and, most importantly, were assumed to be similar from one layer to the next. This made our equations tractable, but it constitutes a clear restriction, and a departure from practical applications of such bio-inspired deep neural networks, that will need to be addressed in future extensions. As already emphasized in the introduction, another important limitation that we wish to relax in future work is the fact that we have considered a linear model, although real biological networks and deep neural networks are intrinsically nonlinear. Going beyond the linear analysis presented here would require the development of new theoretical techniques, which constitutes a major open problem to be addressed in forthcoming works.
This, as well as higher-order interaction models, possibly including "hub" regions like the thalamus that would be mutually interconnected with all layers in the hierarchy (Hwang et al. 2017), are promising directions for follow-up studies.

Conclusion
The mathematical framework proposed here, guided by both computational considerations and neuroscientific inspiration, can be of use to both fields. In machine learning, the framework may serve to provide guarantees about the stability of a predictive coding system given its chosen hyperparameters, or to choose a valid range for these hyperparameters. For neuroscientists, our equations can be used directly to understand biological vision and to make predictions about biological behavior in various situations compatible with predictive coding. But this general mathematical framework (a number of hierarchically connected layers with source terms, boundary conditions, feedforward and feedback connectivity matrices, analyzed via its amplification factor function) may also be adapted to fit other models of biological perception and cognition beyond predictive coding. We hope that the various derivations made in the present work can serve as a template for future applications in this direction. And more generally, that this study may be helpful to the larger computational neuroscience community.