Emulating complex networks with a single delay differential equation

A single dynamical system with time-delayed feedback can emulate networks. This property has made delay systems useful tools for machine learning applications. Here we describe several possible setups that allow emulating multilayer (deep) feed-forward networks, as well as recurrent networks of coupled discrete maps with arbitrary adjacency matrices, by a single system with delayed feedback. While the network size can be arbitrary, the generating delay system can have a small number of variables, including the scalar case.

Machine learning is another rapidly developing application area of delay systems [29-47]. It has been shown recently that DDEs can successfully realize a reservoir computing setup, both theoretically [41-44, 46, 48-50] and in optoelectronic hardware implementations [30,32,39]. In time-delay reservoir computing, a single DDE with one or a few variables is used to build a ring network of coupled maps with fixed internal weights and fixed input weights. In a certain sense, the network structure emerges by properly unfolding the temporal behavior of the DDE. In this paper, we explain how such an unfolding appears, not only for the ring network used in reservoir computing but also for arbitrary networks of coupled maps. In [51], a training method is proposed that modifies the input weights while the internal weights remain fixed.
Among the most closely related previous publications, Hart and collaborators unfold networks with arbitrary topology from delay systems [42,48]. Our work generalizes this unfolding to multilayer feed-forward networks and to recurrent networks with arbitrary adjacency matrices.

From delay systems to multilayer feed-forward networks

Delay systems with modulated feedback terms
Multiple delays are required for the construction of a network with arbitrary topology by a delay system [42,48,50]. In such a network, the connection weights are emulated by a modulation of the delayed feedback signals [50]. Therefore, we consider a DDE of the following form

$$\dot{x}(t) = f\bigl(x(t),\, z(t),\, M_1(t)x(t-\tau_1), \dots, M_D(t)x(t-\tau_D)\bigr) \qquad (1)$$

with D delays τ_1, …, τ_D, a nonlinear function f, a time-dependent driving signal z(t), and modulation functions M_1(t), …, M_D(t). System (1) is a non-autonomous DDE, and the properties of the functions M_d(t) and z(t) play an important role for unfolding a network from (1). To define these properties, a time quantity T > 0 is introduced, called the clock cycle. Further, we choose a number N of grid points per T-interval and define θ := T/N. We define the clock cycle intervals I_ℓ := ((ℓ−1)T, ℓT] and their sub-intervals I_{ℓ,n} := ((ℓ−1)T + (n−1)θ, (ℓ−1)T + nθ], n = 1, …, N. Property (I): The delays are integer multiples of the node distance θ, i.e. τ_d = n_d θ, with 0 < n_1 < · · · < n_D < 2N. Consequently, it holds 0 < τ_1 < · · · < τ_D < 2T. Property (II): The functions M_d(t) are step functions, which are constant on the intervals I_{ℓ,n}. We denote these constants as v_{d,ℓ,n}, i.e. M_d(t) = v_{d,ℓ,n} for t ∈ I_{ℓ,n}.
In the following sections, we show that one can consider the intervals I_ℓ as layers with N nodes of a network arising from the delay system (1) if the modulation functions M_d(t) fulfill certain additional requirements. The n-th node of the ℓ-th layer is defined as

$$x_n^\ell := x((\ell-1)T + n\theta), \qquad (2)$$

which corresponds to the solution of the DDE (1) at the time point (ℓ−1)T + nθ. The solution at later time points x_{n'}^{ℓ'}, with either ℓ' > ℓ, or n' > n for ℓ' = ℓ, depends, in general, on x_n^ℓ, thus providing the interdependence between the nodes. Such a dependence can be found explicitly in some situations. The simplest way is to use a discretization for small θ, and we consider such a case in the following Sec. 2.2. Another case, when θ is large, can be found in [50]. Let us remark on the initial state for DDE (1). According to the general theory [2], in order to solve an initial value problem, an initial history function x_0(s) must be provided on the interval s ∈ [−τ_D, 0], where τ_D is the maximal delay. In terms of the nodes, one needs to specify x_n^ℓ for n_D "history" nodes. However, the modulation functions M_d(t) can weaken this requirement. For example, if M_d(t) = 0 for t ≤ τ_d, then it is sufficient to know the initial state x(0) = x_0^1 = x_0 at a single point, and no history function is required at all. In fact, the latter special case has been employed in [50] for various machine learning tasks.
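To illustrate how the node grid (2) unfolds from a delay system, the following minimal sketch integrates a scalar example of the form (1), ẋ(t) = −x(t) + tanh(z(t) + M(t)x(t−τ)), with a single delay τ = T by a coarse Euler scheme (one step per grid point) and reads off the nodes x_n^ℓ. All parameter values and the choice of nonlinearity are hypothetical.

```python
import numpy as np

# Sketch (hypothetical parameters): integrate a scalar DDE of the form (1),
# x'(t) = -x(t) + tanh(z(t) + M(t) * x(t - tau)), with one delay tau = T,
# and read off the network nodes x_n^l = x((l-1)T + n*theta) from Eq. (2).
N, L, theta = 5, 3, 0.1           # nodes per layer, layers, node distance
T = N * theta                     # clock cycle
n_d = N                           # delay in grid units: tau = n_d * theta = T
steps = L * N                     # one Euler step per grid point (coarse)

x = np.zeros(steps + 1)           # x[k] approximates x(k*theta); x[0] = x_0
M = np.ones(steps + 1)            # step-function modulation (constant here)
M[:N + 1] = 0.0                   # Property (IV): no feedback on first layer
z = np.zeros(steps + 1)
z[1:N + 1] = 0.5                  # input drive on the first clock cycle only

for k in range(steps):
    delayed = x[k + 1 - n_d] if k + 1 - n_d >= 0 else 0.0
    x[k + 1] = x[k] + theta * (-x[k] + np.tanh(z[k + 1] + M[k + 1] * delayed))

# nodes[l-1, n-1] = x_n^l, the n-th node of the l-th layer
nodes = x[1:].reshape(L, N)
```

With τ = T (n_d = N), each node of layer ℓ is fed by the node with the same index in layer ℓ−1, i.e. a single delay produces one parallel bundle of inter-layer connections.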

Disclosing network connections via discretization of the DDE
Here we consider how a network of coupled maps can be derived from DDE (1).
Since the network nodes have already been introduced in Sec. 2.1 as x_n^ℓ by Eq. (2), it remains to describe the connections between them, i.e., the functional relations (maps) between the nodes. For simplicity, we restrict ourselves to the Euler discretization scheme, since the obtained network topology is independent of the chosen discretization. Similar network constructions by discretization from ordinary differential equations have been employed in [52-54].
We apply a combination of the forward and backward Euler methods: the instantaneous system states of (1) are approximated by the left endpoints of the small step intervals of length θ (forward scheme); the driving signal z(t) and the delayed system states are approximated by the right endpoints of the step intervals (backward scheme). Such an approach leads to simpler expressions. We obtain

$$x_n^\ell = x_{n-1}^\ell + \theta f\bigl(x_{n-1}^\ell,\, z(t_n^\ell),\, M_1(t_n^\ell)x(t_n^\ell-\tau_1), \dots, M_D(t_n^\ell)x(t_n^\ell-\tau_D)\bigr) \qquad (3)$$

for n = 2, …, N, where t_n^ℓ := (ℓ−1)T + nθ, and

$$x_1^\ell = x_N^{\ell-1} + \theta f\bigl(x_N^{\ell-1},\, z(t_1^\ell),\, M_1(t_1^\ell)x(t_1^\ell-\tau_1), \dots, M_D(t_1^\ell)x(t_1^\ell-\tau_D)\bigr) \qquad (4)$$

for the first node in the I_ℓ-interval. According to Property (I), the delays satisfy 0 < τ_d < 2T. Therefore, the delay-induced feedback connections with target in the interval I_ℓ can originate from one of the following intervals: I_ℓ, I_{ℓ−1}, or I_{ℓ−2}. In other words, the time points t_n^ℓ − τ_d can belong to one of these intervals. Formally, with τ_d = n_d θ, it can be written as

$$x(t_n^\ell - \tau_d) = \begin{cases} x_{n-n_d}^\ell & \text{if } t_n^\ell - \tau_d \in I_\ell, \\ x_{N+n-n_d}^{\ell-1} & \text{if } t_n^\ell - \tau_d \in I_{\ell-1}, \\ x_{2N+n-n_d}^{\ell-2} & \text{if } t_n^\ell - \tau_d \in I_{\ell-2}. \end{cases} \qquad (5)$$

We limit the class of networks to multilayer systems with connections between neighboring layers only. Such networks, see Fig. 4b, are frequently employed in machine learning tasks, e.g. as deep neural networks [50,55-58]. Using (5), we can formulate a condition on the modulation functions M_d(t) which ensures that the delay terms x(t − τ_d) induce only connections between subsequent layers. For this, we set the modulation functions' values to zero whenever the originating time point t_n^ℓ − τ_d of the corresponding delay connection does not belong to the interval I_{ℓ−1}. This leads to the following assumption on the modulation functions. Property (III): The modulation functions M_d(t) vanish on all intervals I_{ℓ,n} for which t_n^ℓ − τ_d ∉ I_{ℓ−1}, i.e.

$$v_{d,\ell,n} = 0 \quad \text{if } N + n - n_d < 1 \text{ or } N + n - n_d > N. \qquad (6)$$

In the following, we assume that condition (III) is satisfied. Expressions (3)-(4) contain the interdependencies between the nodes x_n^ℓ, i.e., the connections of the network. We explain these dependencies and present them in a more explicit form in the following. Our goal is to obtain the multilayer network shown in Fig. 4b. In all cases, the connections induced by one delay τ_d are parallel.
Since the delay system possesses multiple delays 0 < τ_1 < … < τ_D < 2T, the parallel connection patterns overlap, as illustrated in Fig. 4b, leading to a more complex topology. In particular, a fully connected coupling pattern between subsequent layers appears for D = 2N − 1 and τ_d = dθ, d = 1, …, D.

Modulation of connection weights
With the modulation functions satisfying Property (III), the Euler scheme (3)-(4) simplifies to the following map:

$$x_n^\ell = x_{n-1}^\ell + \theta f\bigl(x_{n-1}^\ell,\, z(t_n^\ell),\, v_{1,\ell,n}\, x_{N+n-n_1}^{\ell-1}, \dots, v_{D,\ell,n}\, x_{N+n-n_D}^{\ell-1}\bigr), \quad n = 2, \dots, N, \qquad (7)$$

$$x_1^\ell = x_N^{\ell-1} + \theta f\bigl(x_N^{\ell-1},\, z(t_1^\ell),\, v_{1,\ell,1}\, x_{N+1-n_1}^{\ell-1}, \dots, v_{D,\ell,1}\, x_{N+1-n_D}^{\ell-1}\bigr), \qquad (8)$$

where Eq. (6) implies v_{d,ℓ,n} = 0 if N + n − n_d < 1 or N + n − n_d > N. In other words, the right-hand sides of (7)-(8) contain only nodes from the (ℓ−1)-th layer. Moreover, the numbers v_{d,ℓ,n} determine the strengths of the connections from x_{N+n−n_d}^{ℓ−1} to x_n^ℓ and can be considered as network weights. By reindexing, we can define weights w_{nj}^ℓ connecting node j of layer ℓ−1 to node n of layer ℓ. These weights are given by the equation

$$w_{nj}^\ell = \sum_{d=1}^{D} v_{d,\ell,n}\, \delta_{n_d,\, N+n-j}$$

and define the entries of the weight matrix W^ℓ = (w_{nj}^ℓ) ∈ R^{N×(N+1)}, except for the last column, which is defined below and contains the bias weights. The symbol δ_{nj} is the Kronecker delta, i.e. δ_{nj} = 1 if n = j, and δ_{nj} = 0 otherwise.
The time-dependent driving function z(t) can be utilized to realize a bias weight b_n^ℓ for each node x_n^ℓ. For details, we refer to Sec. 2.5. We define the last column of W^ℓ as the bias column, i.e. w_{n,N+1}^ℓ := b_n^ℓ. The weight matrix is illustrated in Fig. 3. This matrix W^ℓ is in general sparse, where the degree of sparsity depends on the number D of delays. If D = 2N − 1 and τ_d = dθ, d = 1, …, D, we obtain a dense connection matrix. Moreover, the positions of the nonzero and zero entries are the same for all matrices W^2, …, W^L, but the values of the nonzero entries are in general different.
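To make the banded structure of W^ℓ concrete, the following minimal sketch shows how each delay with grid index n_d fills one diagonal of the internal N × N part of the weight matrix via the relation n_d = N + n − j. The values of N, the chosen delays, and the constant step heights are hypothetical.

```python
import numpy as np

# Illustrative sketch: how D delays tau_d = n_d * theta fill the internal
# N x N part of the weight matrix W^l. A delay with grid index n_d connects
# source node j of layer l-1 to target node n of layer l whenever
# n_d = N + n - j, so each delay contributes one diagonal of the matrix.
N = 4
delays = [3, 4, 5, 7]             # hypothetical n_d values, 0 < n_d < 2N
v = {n_d: 1.0 for n_d in delays}  # step heights v_{d,l,n} (constant here)

W = np.zeros((N, N))
for n in range(1, N + 1):         # target node index in layer l
    for j in range(1, N + 1):     # source node index in layer l-1
        n_d = N + n - j
        if n_d in v:
            W[n - 1, j - 1] = v[n_d]

# n_d = N (= 4) gives the main diagonal; taking all n_d = 1, ..., 2N-1
# (D = 2N - 1 delays) would fill the matrix densely.
```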

Interpretation as multilayer neural network
The map (7)-(8) can be interpreted as the hidden-layer part of a multilayer neural network, provided we define suitable input and output layers. The input layer determines how a given input vector u ∈ R^{M+1} is transformed into the state of the first hidden layer x(t), t ∈ I_1. The input u contains M input values u_1, …, u_M and an additional entry u_{M+1} = 1. In order to ensure that x(t), t ∈ I_1, depends exclusively on u and the initial state x(0) = x_0, and does not depend on a history function x(s), s < 0, we set all modulation functions to zero on the first hidden-layer interval. This leads to the following Property (IV): The modulation functions satisfy M_d(t) = 0, t ∈ I_1, d = 1, …, D.
The dependence on the input vector u ∈ R^{M+1} can be realized by the driving signal z(t). Property (V): The driving signal z(t) on the interval I_1 is the step function given by

$$z(t) = [f_{\mathrm{in}}(W_{\mathrm{in}} u)]_n \quad \text{for } t \in I_{1,n},$$

where f_in(W_in u) ∈ R^N is the preprocessed input, W_in ∈ R^{N×(M+1)} is an input weight matrix, and f_in is an element-wise input preprocessing function. For example, f_in(a) = tanh(a) was used in [50]. As a result, the following holds for the first hidden layer:

$$\dot{x}(t) = f(x(t), z(t), 0, \dots, 0),$$

which is just a system of ordinary differential equations and requires only an initial condition at a single point, x(0) = x_0, for solving it in positive time. This yields the coupled map representation

$$x_n^1 = x_{n-1}^1 + \theta f\bigl(x_{n-1}^1,\, [f_{\mathrm{in}}(W_{\mathrm{in}} u)]_n,\, 0, \dots, 0\bigr), \quad n = 1, \dots, N,$$

with x_0^1 := x_0. For the hidden layers I_2, I_3, …, the driving function z(t) can be used to introduce a bias as follows. Property (VI): The driving signal z(t) on the intervals I_ℓ, ℓ ≥ 2, is the step function z(t) = b(t) given by b(t) = b_n^ℓ for t ∈ I_{ℓ,n}, ℓ ≥ 2.
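The input-layer construction of Property (V) can be sketched as follows: the drive z(t) on the first clock cycle is a θ-step function holding the preprocessed input. The dimensions, the random W_in, and the choice f_in = tanh are assumptions for the example.

```python
import numpy as np

# Sketch of Property (V): on the first clock cycle, z(t) is a theta-step
# function holding f_in(W_in u). W_in and u are hypothetical; f_in = tanh
# is the example preprocessing function mentioned in [50].
rng = np.random.default_rng(0)
M_in, N, theta = 3, 5, 0.1
u = np.append(rng.normal(size=M_in), 1.0)   # input with bias entry u_{M+1} = 1
W_in = rng.normal(size=(N, M_in + 1))       # input weight matrix

z_steps = np.tanh(W_in @ u)                 # z(t) = [f_in(W_in u)]_n on I_{1,n}

def z(t):
    """Step function: z(t) = z_steps[n-1] for t in ((n-1)*theta, n*theta]."""
    n = int(np.ceil(t / theta))
    return z_steps[n - 1]
```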
Assuming the properties (I)-(VI), Eqs. (7)-(8) imply

$$x_n^\ell = x_{n-1}^\ell + \theta f\bigl(x_{n-1}^\ell,\, b_n^\ell,\, v_{1,\ell,n}\, x_{N+n-n_1}^{\ell-1}, \dots, v_{D,\ell,n}\, x_{N+n-n_D}^{\ell-1}\bigr)$$

for the hidden layers ℓ = 2, …, L, with x_0^ℓ := x_N^{ℓ−1}. Let us finally define the output layer, which transforms the node states x_1^L, …, x_N^L of the last hidden layer into an output vector ŷ ∈ R^P. For this, we define a vector x^L := (x_1^L, …, x_N^L, 1)^T ∈ R^{N+1}, an output weight matrix W_out ∈ R^{P×(N+1)}, and an output activation function f_out: R^P → R^P. The output vector is then defined as

$$\hat{y} = f_{\mathrm{out}}(W_{\mathrm{out}} x^L).$$

Figure 4 illustrates the whole construction process of the coupled map network.
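The output layer can be sketched in the same way: the last hidden layer's node states are extended by a constant 1 (bias entry) and mapped through W_out and f_out. The choice of softmax for f_out and all concrete numbers are assumptions for the example.

```python
import numpy as np

# Sketch of the output layer: x^L = (x^L_1, ..., x^L_N, 1)^T is mapped to
# y_hat = f_out(W_out x^L). Softmax as f_out is an example choice, e.g.
# for classification tasks; the weights below are hypothetical.
def softmax(a):
    e = np.exp(a - a.max())     # shift for numerical stability
    return e / e.sum()

N, P = 4, 3
x_L = np.array([0.2, -0.1, 0.5, 0.3])       # node states of the last layer
x_ext = np.append(x_L, 1.0)                 # append bias entry
W_out = np.full((P, N + 1), 0.1)            # hypothetical output weights
y_hat = softmax(W_out @ x_ext)              # output vector in R^P
```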
We summarize the main result of Sec. 2: the delay system (1), with modulation functions and driving signal satisfying Properties (I)-(VI), emulates a multilayer feed-forward neural network whose hidden layers are given by the maps (7)-(8).

Constructing a recurrent neural network from a delay system
System (1) can also be considered as a recurrent neural network. To show this, we consider the system on the time interval [0, KT], for some K ∈ N, which is divided into intervals I_k := ((k − 1)T, kT], k = 1, …, K. We use k instead of ℓ as the index for the intervals to make clear that the intervals do not represent layers here. The state x(t) on an interval I_k is interpreted as the state of the recurrent network at time k. More specifically,

$$x_n(k) := x((k-1)T + n\theta)$$

is the state of node n at the discrete time k. The driving function z(t) can be utilized as an input signal for each k-th time step. Property (VII): z(t) is the θ-step function with

$$z(t) = [f_{\mathrm{in}}(W_{\mathrm{in}} u(k))]_n \quad \text{for } t \in I_{k,n},$$

where W_in ∈ R^{N×(M+1)} is an input weight matrix and f_in is an element-wise input preprocessing function. Each input vector u(k) contains M input values u_1(k), …, u_M(k) and a fixed entry u_{M+1}(k) := 1, which is needed to include bias weights in the last column of W_in. The main difference of Property (VII) from (VI) is that it allows for information input through z(t) in all intervals I_k. Another important difference is related to the modulation functions, which must be T-periodic in order to implement a recurrent network. This leads to the following assumption. Property (VIII): The modulation functions M_d(t) are T-periodic θ-step functions with

$$M_d(t) = v_{d,n} \quad \text{for } t \in I_{k,n}.$$

Note that the value v_{d,n} is independent of k due to the periodicity of M_d(t).
When assuming the Properties (I), (III), (IV), (VII), and (VIII), the map equations (7)-(8) become

$$x_n(k) = x_{n-1}(k) + \theta f\bigl(x_{n-1}(k),\, [f_{\mathrm{in}}(W_{\mathrm{in}} u(k))]_n,\, v_{1,n}\, x_{N+n-n_1}(k-1), \dots, v_{D,n}\, x_{N+n-n_D}(k-1)\bigr)$$

and can be interpreted as a recurrent neural network with the input matrix W_in and the internal weight matrix W = (w_nj) ∈ R^{N×N} defined by

$$w_{nj} = \sum_{d=1}^{D} v_{d,n}\, \delta_{n_d,\, N+n-j}.$$

[Figure: A recurrent network emulated by the delay system (1) with two delays. The delays τ_1 < T and τ_2 > T induce connections with opposite directions (colored arrows). Moreover, the nodes of the recurrent layer are linearly locally coupled (black arrows). All nodes of the recurrent layer are connected to the input and output layers.]

When we choose the number of delays to be D = 2N − 1, we can realize any given connection matrix W ∈ R^{N×N}. For that, we need to choose the delays τ_d = dθ, d = 1, …, 2N − 1; then the relation n_d = N + n − j
provides for all entries w_{nj} of W exactly one corresponding v_{d,n}. Therefore, an arbitrary matrix W can be realized by choosing appropriate step heights for the modulation functions. In the setting of Sec. 3, the resulting network is an arbitrary recurrent network.
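The choice of step heights realizing an arbitrary internal weight matrix can be checked numerically. The following sketch assumes n_d = d, d = 1, …, 2N − 1, and uses the source-index relation j = N + n − n_d; the target matrix is random example data.

```python
import numpy as np

# Sketch: with D = 2N - 1 delays and grid indices n_d = d, any internal
# weight matrix W can be realized by choosing the modulation step heights
# v_{d,n} = w_{n, N+n-d} (whenever that column index lies in 1..N), since
# delay d connects source node j = N + n - d to target node n.
rng = np.random.default_rng(1)
N = 4
W_target = rng.normal(size=(N, N))   # arbitrary desired weight matrix

D = 2 * N - 1
v = np.zeros((D + 1, N + 1))         # v[d, n], 1-based indices for clarity
for d in range(1, D + 1):
    for n in range(1, N + 1):
        j = N + n - d
        if 1 <= j <= N:
            v[d, n] = W_target[n - 1, j - 1]

# Reassemble W from the step heights to verify the construction
W_check = np.zeros((N, N))
for n in range(1, N + 1):
    for j in range(1, N + 1):
        W_check[n - 1, j - 1] = v[N + n - j, n]
```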
Summarizing, the main message of Sec. 3 is as follows: with T-periodic modulation functions and a per-clock-cycle input signal, the delay system (1) emulates a recurrent network of coupled maps whose internal weight matrix can be chosen arbitrarily.

Networks from delay systems with linear instantaneous part and nonlinear delayed feedback
Particularly suitable for the construction of neural networks are delay systems with a stable linear instantaneous part and a feedback given by a nonlinear function of an affine combination of the delay terms and a driving signal. Such DDEs are described by the equation

$$\dot{x}(t) = -\alpha x(t) + f(a(t)), \qquad (29)$$

where α > 0 is a constant time scale, f is a nonlinear function, and

$$a(t) := z(t) + \sum_{d=1}^{D} M_d(t)\, x(t - \tau_d) \qquad (30)$$

is the activation signal. Ref. [8] studied this type of equation for the case D = 1, i.e. for one delay.
An example of (29) is the Ikeda system [59] where D = 1, i.e. a(t) consists of only one scaled feedback term x(t − τ ), signal z(t), and the nonlinear function f (a) = sin(a). This type of dynamics can be applied to reservoir computing using optoelectronic hardware [33]. Another delay dynamical system of type (29), which can be used for reservoir computing, is the Mackey-Glass system [30], where D = 1 and the nonlinearity is given by f (a) = ηa/(1 + |a| p ) with constants η, p > 0. In the work [50], system (29) is used to implement a deep neural network.
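For concreteness, the two nonlinearities mentioned above can be written down directly as instances of f in Eq. (29); the constants η and p in the Mackey-Glass nonlinearity are example values.

```python
import numpy as np

# The two example nonlinearities for f in Eq. (29):
# Ikeda: f(a) = sin(a); Mackey-Glass: f(a) = eta * a / (1 + |a|^p).
def f_ikeda(a):
    return np.sin(a)

def f_mackey_glass(a, eta=2.0, p=2.0):   # eta, p > 0 are example constants
    return eta * a / (1.0 + np.abs(a) ** p)
```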
Even though the results of the previous sections are applicable to (29)-(30), the special form of these equations allows for an alternative, more precise approximation of the network dynamics.

Interpretation as multilayer neural network
It is shown in [50] that one can derive a particularly simple map representation for system (29) with activation signal (30). We do not repeat the derivation here and only present the resulting expressions. By applying a semi-analytic Euler discretization and the variation-of-constants formula, the following equations connecting the nodes in the network are obtained:

$$x_n^1 = e^{-\alpha\theta} x_{n-1}^1 + \alpha^{-1}(1 - e^{-\alpha\theta}) f(a_n^1), \quad n = 1, \dots, N,$$

with x_0^1 := x_0 and a^1 := f_in(W_in u), for the first hidden layer. The hidden layers ℓ = 2, …, L are given by

$$x_1^\ell = e^{-\alpha\theta} x_N^{\ell-1} + \alpha^{-1}(1 - e^{-\alpha\theta}) f(a_1^\ell), \qquad (33)$$

$$x_n^\ell = e^{-\alpha\theta} x_{n-1}^\ell + \alpha^{-1}(1 - e^{-\alpha\theta}) f(a_n^\ell), \quad n = 2, \dots, N, \qquad (34)$$

with the activation vector a^ℓ := W^ℓ x^{ℓ−1}.
One can also formulate the relation between the hidden layers in a matrix form. For this, we define the local coupling matrix A ∈ R^{N×N}, which carries e^{−αθ} on its subdiagonal and zeros elsewhere, the state vector x^ℓ := (x_1^ℓ, …, x_N^ℓ)^T, the activation vector a^ℓ := W^ℓ x^{ℓ−1}, and the first unit vector e_1 := (1, 0, …, 0)^T. Then, for ℓ = 2, …, L, the equations (33)-(34) become

$$x^\ell = A x^\ell + e^{-\alpha\theta} x_N^{\ell-1} e_1 + \alpha^{-1}(1 - e^{-\alpha\theta}) f(a^\ell), \qquad (41)$$

where f is applied component-wise. By subtracting A x^ℓ from both sides of Eq. (41) and multiplying by the matrix (I − A)^{−1}, where I is the identity matrix, we obtain a matrix equation describing the ℓ-th hidden layer:

$$x^\ell = (I - A)^{-1}\bigl(e^{-\alpha\theta} x_N^{\ell-1} e_1 + \alpha^{-1}(1 - e^{-\alpha\theta}) f(W^\ell x^{\ell-1})\bigr).$$

The neural network obtained from the delay system (29)-(30) can be trained by gradient descent [50]. The training parameters are the entries of the matrices W_in and W_out, the step heights of the modulation functions M_d(t), and the bias signal b(t).
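The equivalence of the sequential recursion (33)-(34) and the matrix form can be verified numerically. The weights, the choice f = tanh, and all parameter values below are assumptions for the sketch.

```python
import numpy as np

# Sketch of the matrix-form hidden-layer update for the semilinear system:
# x^l_n = e^{-alpha*theta} x^l_{n-1} + (1 - e^{-alpha*theta})/alpha * f(a^l_n),
# with x^l_0 := x^{l-1}_N carried over from the previous layer. The local
# coupling is the subdiagonal matrix A; solving (I - A) x^l = r gives x^l.
alpha, theta, N = 1.0, 0.1, 4
eps = np.exp(-alpha * theta)
phi = (1.0 - eps) / alpha

A = np.diag(np.full(N - 1, eps), k=-1)   # local (next-neighbour) coupling
W = 0.5 * np.eye(N)                      # hypothetical layer weight matrix
x_prev = np.array([0.1, 0.2, 0.3, 0.4])  # previous layer x^{l-1}

r = phi * np.tanh(W @ x_prev)            # f = tanh (example choice)
r[0] += eps * x_prev[-1]                 # carry-over term e^{-a*th} x^{l-1}_N
x_l = np.linalg.solve(np.eye(N) - A, r)  # matrix-form layer state

# Cross-check against the sequential recursion (33)-(34)
x_seq = np.zeros(N)
carry = x_prev[-1]
for n in range(N):
    x_seq[n] = eps * carry + phi * np.tanh(0.5 * x_prev[n])
    carry = x_seq[n]
```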

Network for large node distance θ
In contrast to the general system (1), the semilinear system (29) with activation signal (30) does not only emulate a network of nodes for small node distance θ. It is also possible to choose a large θ. In this case, we can approximate the nodes given by Eq. (2), up to exponentially small terms, by the map limit

$$x_n^\ell = \alpha^{-1} f(a_n^\ell), \qquad (44)$$

where a^ℓ = W^ℓ x^{ℓ−1} for ℓ > 1 and a^1 = f_in(W_in u).
The reason for this limit behavior lies in the nature of the local couplings. Considering Eq. (29), one can interpret the parameter α as a time scale of the system, which determines how fast information about the system state at a given time point decays while the system evolves. This phenomenon is related to the so-called instantaneous Lyapunov exponent [60-62], which equals −α in this case. As a result, the local coupling between neighboring nodes emerges only when a small amount of time θ passes between the nodes. Hence, by increasing θ, one can reduce the local coupling strength until it vanishes up to a negligibly small value. For a rigorous derivation of Eq. (44), we refer to [50].
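The vanishing of the local coupling for large θ can also be checked numerically: the carry-over factor e^{−αθ} becomes exponentially small, and the layer update collapses to a classical perceptron layer. Parameters and the choice f = tanh are again example values.

```python
import numpy as np

# Sketch of the map limit (44): for large theta, e^{-alpha*theta} is
# exponentially small, so the recursion with local coupling collapses to
# the perceptron layer x^l = (1/alpha) f(W x^{l-1}), here with f = tanh.
alpha, theta, N = 1.0, 50.0, 4
eps = np.exp(-alpha * theta)             # ~ 2e-22, negligible local coupling
W = 0.5 * np.eye(N)                      # hypothetical layer weight matrix
x_prev = np.array([0.1, 0.2, 0.3, 0.4])

# exact recursion with carry-over vs. the map limit
x_exact = np.zeros(N)
carry = x_prev[-1]
for n in range(N):
    x_exact[n] = eps * carry + (1 - eps) / alpha * np.tanh(0.5 * x_prev[n])
    carry = x_exact[n]

x_limit = np.tanh(W @ x_prev) / alpha    # classical perceptron layer (44)
```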
The apparent advantage of the map limit case is that the obtained network matches a classical multilayer perceptron. Hence, known methods such as gradient descent training via the classical back-propagation algorithm [63] can be applied to the delay-induced network [50].
The downside of choosing large values for the node separation θ is that the overall processing time of the system scales linearly with θ. We need a period of time T = N θ to process one hidden layer. Hence, processing a whole network with L hidden layers requires the time period LT = LN θ. For this reason, the work [50] provides a modified back-propagation algorithm for small node separations to enable gradient descent training of networks with significant local coupling.

Conclusions
We have shown how networks of coupled maps with arbitrary topology and arbitrary size can be emulated by a single (possibly even scalar) DDE with multiple delays. Importantly, the coupling weights can be adjusted by changing the modulations of the feedback signals. The network topology is determined by the choice of time-delays. As shown previously [30,33,34,39,50], special cases of such networks are successfully applied for reservoir computing or deep learning.
As an interesting conclusion, it follows that the temporal dynamics of DDEs can unfold arbitrary spatial complexity, which, in our case, is reflected by the topology of the unfolded network. In this respect, we shall mention previously reported spatio-temporal properties of DDEs [7,[64][65][66][67][68][69][70][71][72]. These results show how in some limits, mainly for large delays, the DDEs can be approximated by partial differential equations.
Further, we remark that similar procedures have been used for deriving networks from systems of ordinary differential equations [52-54]. However, in those approaches, one needs an N-dimensional system of equations to implement layers with N nodes. This is in contrast to the DDE case, where the construction is possible with just a single scalar equation.
As a possible extension, a realization of adaptive networks using a single node with delayed feedback would be an interesting open problem. In fact, the application to deep neural networks in [50] realizes an adaptive mechanism for the adjustment of the coupling weights. However, this adaptive mechanism is specially tailored for DNN problems. Another possibility would be to emulate networks with dynamical adaptivity of connections [73]. The presented scheme can also be extended by employing delay differential-algebraic equations [74,75].