Since previous methods either relied on hand-tuning, neglected advanced CPG features, or could not scale to larger CPG networks, we present the following approach, which is the first to successfully combine all of these characteristics using a biologically plausible network architecture. From a general perspective, the architecture consists of three main components. The first is the neural phase generator (NPG), adapted from [15] with minor modifications. In analogy to the biological model in [21], this part plays the role of the rhythm generator. The second is the pattern forming network (PFN), which consists of networks with random parameters. At this level, spiking patterns appear in a rhythmic fashion with respect to the NPG's output. Finally, the last part is the pool of motor neurons, which activates the actuated muscles through spike activity. Each of these components is discussed in more detail below.
Figure 1 shows the proposed architecture for a CPG that encodes only two phases; for this reason, the architecture is almost symmetric when the randomness in the PFNs is ignored. When more phases are required, all three parts of the architecture are extended. At the NPG level, additional modules are added to match the new number of phases (Fig. 2). Similarly, at the second level, one random network is created per desired phase. Finally, in the pool of motor neurons, the number of neurons may also change, although this is not strictly necessary. In the following, each of these components is described separately in terms of its inner connections, functionality, and role.
Neural Phase Generator
As already mentioned, this part is adapted from [15]. Briefly, each NPG module consists of three neurons: H, Q, and T (see Fig. 2). The H neuron represents the module's activity: whenever a module's H neuron is spiking, the phase this module represents is active. The Q neuron guarantees that other NPG modules are inactive when required, by inhibiting their H neurons. The T neurons are responsible for the transition from module to module. In contrast to the NPG in [15], our modified version receives an external tonic input, which starts and ends the activity by exciting and inhibiting the neurons of the NPG. This input represents the modulating signals issued from the brain to control the CPG's activity. Once the NPG is active, cutting the external input will not stop its activity; instead, the network keeps oscillating because of its internal mutual connections. This behavior matches the nature of CPGs, which can oscillate even in the absence of input from the brain. To guarantee this property, one constraint is required: the auto-synapses of the H neurons must have weights high enough to ensure continuous spiking until the respective T neurons fire, which then triggers a phase transition. Moreover, when the tonic input has a higher frequency, the H neuron fires more frequently, resulting in a faster transition to the next phase. This modification gives the NPG control over the speed and phase properties of the produced gaits, which is consistent with the proposed biological model, where this control resides at the level of the rhythm generator. In [15], this ability is instead given to the second layer of the architecture, the motor output shaping stage, which in our architecture would correspond to giving control to the PFNs.
Another modification to the NPG of [15] concerns the number of T neurons in each module. In the previous work, two T neurons were present in every module; in ours, this number is variable and depends on the duration of the phase the module is responsible for: the longer the duration, the larger the number. All T neurons except the last one receive the same external tonic input as the H neuron, which is important since each of them excites a part of the PFN. These intermediate T neurons and their tonic inputs are omitted from Fig. 2 for the sake of simplicity.
Finally, the number of modules in our version of the NPG is not fixed to two, but instead is flexible depending on the number of desired phases, as illustrated in Fig. 2. This flexibility allows the architecture to be general and adaptable for different types of locomotion.
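To make the wiring concrete, the following sketch builds the synapse list of an NPG with an arbitrary number of phases. It is a minimal illustration under our own assumptions: the weight values, the neuron naming, and the choice of exciting the next module's Q neuron at a transition are placeholders. Only the qualitative structure described above (a strong excitatory auto-synapse on H, a chain of T neurons per module, Q neurons inhibiting the H neurons of the other modules, and tonic input to H and the intermediate T neurons) is taken from the text.

```python
def build_npg(n_phases, n_t):
    """Synapse list (pre, post, weight) for an NPG ring. n_t[i] is the number
    of T neurons of module i (longer phase -> more T neurons). Weights are
    illustrative placeholders, not values from the experiments."""
    syn, tonic = [], []
    for i in range(n_phases):
        H, Q = f"{i}:H", f"{i}:Q"
        Ts = [f"{i}:T{k}" for k in range(n_t[i])]

        # Strong auto-synapse: H keeps spiking until its last T neuron fires.
        syn.append((H, H, +2.0))
        # H drives its own chain of T neurons.
        syn += [(H, T, +1.0) for T in Ts]
        # Q silences the H neurons of all other modules while its module runs.
        syn += [(Q, f"{j}:H", -3.0) for j in range(n_phases) if j != i]

        # The last T neuron triggers the transition to the next module.
        nxt = (i + 1) % n_phases
        syn.append((Ts[-1], f"{nxt}:H", +1.5))
        syn.append((Ts[-1], f"{nxt}:Q", +1.5))

        # Tonic input (from the brain) excites H and all T neurons except the
        # last; it starts the rhythm, which then self-sustains without it.
        tonic += [H] + Ts[:-1]
    return syn, tonic
```

Increasing the tonic input frequency makes the H and intermediate T neurons fire more often, so the last T neuron of each module is reached sooner and the phase transitions accelerate, as described above.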
Pattern Forming Networks (PFN)
This part of the architecture consists of several separate networks, whose number equals the number of phases of the desired CPG. Each of these networks corresponds to a module of the NPG and is excited by its H and intermediate T neurons. The neurons within each network have randomly initialized properties, leading to richer dynamics and a higher learning capacity.
In order to simplify the learning procedure, a minor restriction is enforced: neurons of the PFNs are required to spike only once during each CPG cycle. This is achieved by creating, for every PFN, a second network with the same number of neurons, named the inhibiting network (IN). The two networks are connected in a one-to-one fashion, with excitation from the PFN to the IN and inhibition in the opposite direction. INs have no synapses connecting their own neurons; instead, every IN neuron has an excitatory auto-synapse ensuring that it keeps spiking once activated. To illustrate how this works: each time a neuron in a PFN spikes, it excites the corresponding neuron in the IN, which then spikes continuously and reciprocally inhibits the very PFN neuron that made it fire. This concept is illustrated in Fig. 3. To ensure that the IN is reset for the next cycle, it is inhibited by the Q neuron of an NPG module belonging to a different phase.
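The spike-once mechanism can be sketched as follows. The weight magnitudes and the helper name are our own illustrative choices; the connectivity pattern (one-to-one excitation from PFN to IN, one-to-one inhibition back, excitatory auto-synapses inside the IN, and a resetting inhibition from a Q neuron of a different phase) is the one described above.

```python
def couple_pfn_in(pfn_ids, in_ids, reset_q):
    """One-to-one PFN/IN coupling enforcing the spike-once constraint.
    reset_q is the Q neuron of an NPG module of a different phase, which
    clears the IN at the start of the next cycle. Weights are placeholders."""
    syn = []
    for p, n in zip(pfn_ids, in_ids):
        syn.append((p, n, +1.0))        # a PFN spike activates its IN partner
        syn.append((n, n, +2.0))        # auto-synapse: the IN neuron keeps firing
        syn.append((n, p, -3.0))        # ...and continuously silences that PFN neuron
        syn.append((reset_q, n, -5.0))  # reset by a Q neuron of another phase
    return syn
```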
Additionally, neurons within a given PFN are connected by random mutual inhibition at a \(10\%\) connectivity rate (i.e., the number of synapses is \(10\%\) of that of an all-to-all connectivity scheme). This ensures that the spikes produced by different neurons within the network are well distributed over the whole duration of the corresponding phase, rather than concentrated in a short window right after the NPG spikes. The temporal distribution of PFN spikes is crucial for learning: the more widely the spikes are distributed, the larger the set of desired behaviors that can be learned. The intuition is simple: if no PFN spike occurs in a time interval \([t,t+\epsilon]\), then it is unlikely that any spike occurs in that same interval in the pool of motor neurons, which are excited only by PFN neurons. Consequently, PFN spikes should be scattered as widely as possible over the phase duration. To enforce this, the size of the network can be increased to obtain a higher spiking rate. Conceptually, the role of the PFN can be seen as mapping the NPG spikes into a higher-dimensional space in which spikes are well distributed in time.

Finally, the external control inputs of the NPG are also propagated to this layer. When the frequency of these signals is increased, the spiking frequency of the H and intermediate T neurons in the NPG also increases. The neurons of the PFN thus fire sooner, and the gait speeds up accordingly. The relation between the spiking frequency at the NPG level and the spike times in the PFN is not a perfect mapping; resolving it is left to the learning. Concerning the learning, it is important to note that its success depends on the PFN's size: the larger the PFN, the more probable it is that the learning method converges, although convergence then requires a longer training time. Throughout our experiments, we used PFNs with sizes ranging from 150 to 500 neurons, with the exact number also depending on each phase's duration.
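The random mutual inhibition described at the start of this passage can be generated as in the following sketch; the weight value and the use of a dense random mask are our own illustrative choices, and only the \(10\%\) connectivity figure comes from the text.

```python
import numpy as np

def random_mutual_inhibition(n, density=0.10, w=-0.5, seed=None):
    """Random inhibitory weight matrix among the n neurons of one PFN, with
    ~10% of the synapses of an all-to-all scheme and no self-connections."""
    rng = np.random.default_rng(seed)
    mask = rng.random((n, n)) < density  # keep ~10% of all possible pairs
    np.fill_diagonal(mask, False)        # no self-inhibition
    return np.where(mask, w, 0.0)
```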
Pool of Motor Neurons
This last part of the architecture is where the rhythmic patterns really matter, since the spikes at this level are responsible for muscle activation and thus for producing gaits for legged locomotion. In biology, the precise way these spikes are decoded into muscle activation is not fully known, but research indicates that population rate coding is used [9, 16]. In other words, the activation of a muscle is encoded by a population of neurons whose spike rate is related to the level of activation of the muscle. The number of neurons in the pool therefore depends on the desired locomotion and on the number of actuated muscles. A similar scheme was adopted in the robotic experiment performed within this work. Neurons within the pool are not interconnected; they receive as input the spikes produced by the PFNs, through synapses of both excitatory and inhibitory types. The weights of these synapses are the only ones that are learned (Fig. 4); they are adapted to obtain the desired locomotion behavior, as discussed below. Finally, the control signals (external tonic input) applied at the NPG are also propagated to the neurons in this pool: when the tonic input has an increased frequency, the spike pattern produced by the PFN speeds up, which ideally accelerates the spiking behavior of the motor neurons as well.
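As an illustration of the population rate code, the sketch below converts the recent spike count of one motor-neuron pool into a normalized muscle activation. The averaging window and the calibration rate are assumed values for illustration, not parameters from our experiments, and the exact decoding used in biology is, as noted above, not fully known.

```python
def muscle_activation(spike_times, t, window=0.05, max_rate=200.0):
    """Population rate coding: the activation of one muscle is the recent
    firing rate of its motor-neuron pool, normalized to [0, 1].

    spike_times : list of spike-time lists, one per neuron in the pool
    t, window   : current time and averaging window (seconds)
    max_rate    : assumed per-neuron rate (Hz) mapped to full activation
    """
    recent = sum(1 for train in spike_times
                 for s in train if t - window < s <= t)
    rate = recent / (len(spike_times) * window)  # mean rate per neuron (Hz)
    return min(1.0, rate / max_rate)
```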
Learning
When designed according to the constraints mentioned above, the described architecture is by itself capable of producing rhythmic spiking patterns in the pool of motor neurons. However, this rhythmic spiking behavior is random and does not correspond to any locomotive behavior. The network must therefore be trained to reproduce the desired spiking behavior before it can be used in real robot locomotion tasks. To this end, the Remote Supervision Method (ReSuMe) [18] was employed. Learning is applied only to the synapses connecting the PFNs to the pool of motor neurons.
ReSuMe works as follows [18]: for each learning neuron (in our case, each motor neuron), a teacher signal with predetermined timing is associated, representing the desired spiking behavior. This signal is not delivered to the learning neuron, but it plays an important role in the weight updates of the synapses terminating at that neuron. The modification is based on two rules: the first depends on the correlation between the presynaptic and the desired spike times, while the second depends on the correlation between the presynaptic and the postsynaptic spike times [18]. The weight of the synapse between a presynaptic neuron k (from the PFN) and a postsynaptic neuron i (from the motor neurons) is modified according to:
$$\begin{aligned} \frac{d}{dt}w_{ki}(t)&=S^d(t)\Big[a^d+\int_{0}^{\infty}W^d(s^d)\,S^{in}(t-s^d)\,ds^d\Big]\\ &\quad +S^l(t)\Big[a^l+\int_{0}^{\infty}W^l(s^l)\,S^{in}(t-s^l)\,ds^l\Big] \end{aligned}$$
(1)
where \(S^d\), \(S^l\), and \(S^{in}\) are respectively the target, postsynaptic, and presynaptic spike trains; \(a^d\) and \(a^l\) determine the so-called non-Hebbian processes of weight modification; and \(s^l\) and \(s^d\) denote, respectively, the difference between the spike time of the postsynaptic neuron and that of the presynaptic neuron, and the difference between the time of the teacher signal and the spike time of the presynaptic neuron. For excitatory synapses, \(a^d\) is positive and \(a^l\) negative (consistent with setting \(a^d=-a^l=a\) below); for inhibitory synapses, the signs are reversed. As for \(W^d\) and \(W^l\), they represent the learning windows and are formulated as follows:
$$\begin{aligned} W^d(s^d)=\begin{cases} +A^d\exp\left(\frac{-s^d}{\tau^d}\right), & \text{if } s^d>0\\ 0, & \text{if } s^d\le 0 \end{cases} \end{aligned}$$
(2)
$$\begin{aligned} W^l(s^l)=\begin{cases} -A^l\exp\left(\frac{-s^l}{\tau^l}\right), & \text{if } s^l>0\\ 0, & \text{if } s^l\le 0 \end{cases} \end{aligned}$$
(3)
where \(W^d\) and \(W^l\) are respectively the learning windows associated with the target and the postsynaptic spikes. The constants \(A^d\), \(A^l\), \(\tau^d\), and \(\tau^l\) are such that \(A^d\) and \(A^l\) are positive for excitatory synapses and negative otherwise, while \(\tau^d\) and \(\tau^l\) are always positive. When setting \(a^d=-a^l=a\), \(\tau^d=\tau^l\), and \(A^d=A^l\), equation (1) takes the following form:
$$\begin{aligned} \frac{d}{dt}w_{ki}(t)=\big[S^d(t)-S^l(t)\big]\Big[a+\int_{0}^{\infty}W^d(s)\,S^{in}(t-s)\,ds\Big] \end{aligned}$$
(4)
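To see this reduction explicitly, note that \(A^d=A^l\) and \(\tau^d=\tau^l\) imply \(W^l(s)=-W^d(s)\), so with \(a^d=-a^l=a\) the second bracket of (1) is the negative of the first:

$$\begin{aligned} \frac{d}{dt}w_{ki}(t)&=S^d(t)\Big[a+\int_{0}^{\infty}W^d(s)\,S^{in}(t-s)\,ds\Big]\\ &\quad -S^l(t)\Big[a+\int_{0}^{\infty}W^d(s)\,S^{in}(t-s)\,ds\Big], \end{aligned}$$

which factors into (4).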
Intuitively, whenever a motor neuron is desired to spike, the method strengthens (weakens) the excitatory (inhibitory) synapses incoming to that neuron through which a spike was transmitted within the learning window defined by \(\tau\); this first rule pushes the motor neurons to spike at the desired times. Conversely, whenever the motor neuron actually spikes, the method weakens (strengthens) the excitatory (inhibitory) synapses incoming to it through which a spike was transmitted within the same learning window; this second rule prevents motor neurons from spiking at undesired times.
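As a concrete illustration, the following is a minimal discrete-time sketch of rule (4) for a single motor neuron, under assumptions of our own: spike trains are binned at the simulation time step, only the most recent presynaptic spike contributes to the learning window, and the parameter values are placeholders rather than values used in our experiments.

```python
import numpy as np

def resume_step(w, pre_spiked, post_spiked, target_spiked, last_pre_t, t,
                a=0.01, A=0.05, tau=0.02):
    """One discrete-time application of rule (4) to the synapses feeding a
    single motor neuron.

    w             : array of weights, one per presynaptic PFN neuron
    pre_spiked    : bool array, True where the PFN neuron spiked at time t
    post_spiked   : bool, the motor neuron spiked at time t
    target_spiked : bool, the teacher signal requests a spike at time t
    last_pre_t    : most recent presynaptic spike times (-inf if none yet)
    For inhibitory synapses, the constants a and A would be negative.
    """
    last_pre_t = np.where(pre_spiked, t, last_pre_t)
    s = t - last_pre_t                 # time since last presynaptic spike
    window = A * np.exp(-s / tau)      # learning window W^d(s) for s >= 0
    if target_spiked:                  # teacher spike: push toward firing
        w = w + a + window
    if post_spiked:                    # actual spike: suppress if undesired
        w = w - a - window
    return w, last_pre_t
```

Note that when a teacher spike and an actual spike coincide, the two updates cancel, mirroring the factor \([S^d(t)-S^l(t)]\) in (4).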