A neuroplasticity-inspired neural circuit for acoustic navigation with obstacle avoidance that learns smooth motion paths

Abstract

Acoustic spatial navigation is relevant for mobile robots when reliable visual information about the target to be localised is unavailable. Reactive robot navigation in such goal-directed phonotaxis tasks requires generating smooth motion paths towards the acoustic target while simultaneously avoiding obstacles. We have previously reported a neural circuit for acoustic navigation that learned stable motion paths for a simulated mobile robot. However, in complex environments, the learned motion paths were not smooth. Here, we extend our earlier architecture with a path-smoothing behaviour that generates smooth motion paths for a simulated mobile robot. This allows the robot to learn to navigate smoothly towards a virtual sound source while avoiding randomly placed obstacles in the environment. We demonstrate through five independent learning trials in simulation that the proposed extension learns motion paths that are not only smooth but also shorter than those generated both without learning and by our earlier architecture.

Introduction

There are several application scenarios in which navigating towards a sound source is relevant. Survivors of natural disasters such as earthquakes are often buried among rubble and large debris, rendering them undetectable to vision-based sensors. Search-and-rescue robots attempting to locate such invisible targets must therefore rely on sensory modalities other than vision. Audition can be a highly relevant modality for localisation in such cases because auditory cues, unlike visual ones, can be perceived outside the field of view. This can allow mobile robots to navigate in such cluttered environments. In the human-robot interaction context, the ability of a mobile robot in a home setting to autonomously navigate towards a human speaker in response to speech commands is highly desirable. A common problem among the elderly, who increasingly live alone, is injury sustained in accidental falls at home. Such injuries may be severe enough to immobilise the person. Robotic companions equipped with sound localisation capabilities could provide assistance in such scenarios by navigating towards an immobilised person, assuming that the person is able to vocalise.

Mobile robot navigation

Mobile robot navigation is one of the oldest investigated problems within the field of mobile robotics. Robot navigation entails the mobile robot’s ability to determine its own location within a chosen frame of reference, to plan a collision-free path towards a goal location and, if required, to construct a map of the environment in which it is navigating. In this article we focus on the issue of path-planning. A variety of control architectures for mobile robot navigation have been developed over the last few decades. A comprehensive review of the different architectures can be found in [27]. These control architectures can be broadly classified into three categories—deliberative, reactive and hybrid (where elements from deliberative and reactive architectures are combined to varying degrees). The primary differences between these architectures lie in the levels of sensing, planning and motor actions performed [1].

Deliberative control architectures depend on precise a priori knowledge about the working environment at the global level. This information can be either provided by the user prior to navigation, extracted using sensor information during navigation or any combination of the two. A path-planner then generates motion paths based on the available a priori knowledge regarding the location of the goal and any obstacles placed in the environment. These paths satisfy one or more pre-defined constraints such as planning time, execution time, maximum distance traversed or maximum autonomy achieved. Thanks to the global nature of available world information, deliberative control architectures can generate optimal robot motion paths. This optimality is, however, achieved at the expense of significant computational power, since the search for optimal paths must cover the entire world model. Such architectures are also not robust to dynamically changing environments where new obstacles and/or goals may appear during navigation. Reactive control architectures, on the other hand, rely solely on local sensor information obtained during navigation to perform path-planning and obstacle avoidance online. Because path-planning is tightly coupled to the robot’s own movements and relies on local sensor data that is inherently uncertain due to sensor noise, the resulting motion paths are rarely optimal. This lack of optimality is compensated by the relatively low computational resources required for successful navigation. Such architectures are furthermore robust to dynamically changing environments. Such controllers also tend to employ behaviour-based architectures [3] such as the widely known subsumption architecture [6]. The subsumption architecture is a layered architecture where each layer is responsible for a single behaviour and higher layers employ the behaviours of the layers below them. However, fine-tuning the behaviours of all but the lowest layers requires expert domain knowledge.
This implies that extensive hand-tuning by a human designer is critical for success. Hybrid control architectures attempt to generate the “best possible” motion paths that are optimal to a certain extent as well as robust to dynamic changes in the environment. However, this compromise is also achieved at the expense of significant computational resources. Neurobiological findings on specialised neurons called grid cells in the hippocampus of rodents have sparked significant research in neurobiologically inspired map-based navigation for mobile robots in the last decade. A comprehensive review of these approaches can be found in [38]. A more general review of various biomimetic approaches to mobile robot navigation can be found in [14].

Smooth path-planning

Since path-planning and obstacle avoidance are fundamental components in any mobile robot navigation architecture, they have been thoroughly investigated by the mobile robotics community. Path-planning can be categorised into two types depending on the extent of knowledge available about the working environment—global or local. In global path-planning, the environment is fully known a priori in terms of its dimensions and complexity, i.e. the relative location and sizes of obstacles, walls and so on. In local path-planning, the available information about the complexity of the robot’s surroundings is restricted to that perceived by the available sensors, within their respective ranges, on the robot. Naturally, global path-planning is more suitable for offline processing due to its significant computational requirements, while local path-planning can be executed online due to relatively lower computational requirements. Conventional smooth path-planners determine a path in two steps. First, a general path from the start to the goal position is determined by a global planner. Various global path-planning algorithms have been summarised in [23]. Second, any sharp, angular turns are rendered smooth by a local planner. A number of approaches have been developed to achieve smoothness. These approaches use various geometric objects such as Bézier curves [7], canonical curves [22], line arcs and cubic spirals [18], clothoids [13] as well as algebraic techniques such as splines [20, 24] and quintic polynomials [36]. A thorough summary of various path-smoothing approaches can be found in [11]. Most of these approaches typically ignore the mobile robot’s kinematics, thus reducing it to a point object, and modify the overall path into a curvilinear one. This has the drawback of transforming straight sections of the path into curved sections, which may bring these sections undesirably close to obstacles such as walls.
Some recent approaches address this drawback by taking into account the mobile robot’s kinematic constraints [28] or selectively transforming only sharp angular sections of the path into smooth ones [32]. However, all these approaches are performed offline on the motion paths generated by the global path-planner.

Acoustic navigation

Reactive acoustic navigation or phonotaxis for mobile robots with obstacle avoidance has been relatively well studied. A number of approaches, employing varying numbers of microphones for phonotaxis as well as varying numbers and types of distance sensors for obstacle avoidance, have been reported in the literature [2, 4, 16, 17, 40, 41]. All these approaches use multi-microphone arrays with more than two microphones to extract sound direction information from the perceived time-of-arrival-difference via cross-correlation. Mobile robots with reactive behaviour-based controllers that must perform phonotaxis in the presence of obstacles must deal with two challenges. First, obstacles that are smaller in size than the sound wavelengths do not block the sound due to sound diffraction. Without any obstacle detection, a higher-level goal-directed steering behaviour will simply steer the mobile robot towards the acoustic target and may cause collision with such obstacles. Adding a low-level obstacle avoidance behaviour will force the robot to alter its heading temporarily while negotiating the obstacle. This manoeuvre may, however, temporarily direct the robot away from the acoustic target, thereby creating conflict between the two behaviours, which is the second challenge. We have previously developed a neural circuit that minimises this conflict and learns stable motion paths via unsupervised learning [35].

Contribution of the present work

The adaptive neural circuit for acoustic navigation presented here has the advantage of being a computationally inexpensive reactive controller that uses only two sound sensors and learns smooth motion paths online without supervision.

Our approach builds upon our previously reported [35] purely reactive control architecture implementing path-planning with obstacle avoidance using two sound sensors and a distance sensor. Our original architecture is in the form of an adaptive neural circuit that implements two behaviours—high-level phonotaxis and low-level obstacle avoidance. There are two main differences between our original architecture and the subsumption architecture. First, the high-level behaviour does not subsume the low-level behaviour. Instead, the low-level behaviour inhibits the motor commands of the high-level behaviour when necessary. Second, the obstacle avoidance behaviour modulates the parameters of the phonotaxis behaviour via unsupervised learning. Explicit hand-tuning of the phonotaxis behaviour is therefore not necessary. This also minimises the triggering of the obstacle avoidance behaviour and consequently sharp turns during navigation. However, the main drawback of our earlier approach is that in complex, cluttered environments, triggering of the obstacle avoidance behaviour cannot be completely prevented. This implies that sharp turns cannot be completely avoided, even though the overall motion paths are smoother than those obtained when learning is disabled and the parameters of the acoustic steering behaviour are fine-tuned by hand.

The neural circuit presented here extends our earlier architecture by allowing a simulated mobile robot with non-holonomic kinematic constraints to learn smooth motion paths online towards an acoustic target. The target is located 3 m away and emits a continuous tone with a sound frequency of 2.2 kHz. We extract sound direction information using a previously reported model of the lizard peripheral auditory system [37]. This model has been extensively investigated through various robotic implementations [34]. The extracted sound direction is mapped to the robot’s wheel velocities via Braitenberg sensorimotor mappings [5]. These sensorimotor mappings realise the phonotaxis behaviour. The parameters of these mappings are modulated via input correlation (ICO) learning [29] during the obstacle avoidance behaviour. ICO learning is a closed-loop, unsupervised, correlation-based learning algorithm adapted from differential Hebbian learning [19, 21]. In this manner, we explicitly exploit interaction with obstacles to fine-tune the parameters of the phonotaxis behaviour.

The main difference between our previously reported architecture and the architecture presented here is the inclusion of an online path-smoothing mechanism. This mechanism independently adapts the motor commands at the output of the phonotaxis behaviour, via another instance of the ICO learning algorithm, to ensure smooth obstacle avoidance. The circuit learns to appropriately enhance the motor commands of the phonotaxis behaviour such that the resultant changes in the robot’s movement eventually prevent the obstacle avoidance behaviour from being triggered at all. In this manner, the conflict between the high-level behaviour and the low-level behaviour is completely eliminated. This has the advantage of ensuring that after learning, the wheel velocities of the mobile robot are modulated appropriately when approaching obstacles so as to generate motion paths that smoothly bend around the obstacles.

We validate the neural circuit in simulation via independent trials in multiple environments, each with randomly placed obstacles of random dimensions. For each environment, we compare the generated motion paths with those obtained in two control conditions—(a) when all learning is disabled and parameters for the behaviours are fine-tuned by hand and (b) when only the path-smoothing behaviour is disabled but parameters of the Braitenberg sensorimotor cross-couplings are learned.

This article is organised in the following manner. We briefly describe the lizard peripheral auditory system model and its response characteristics in Sect. 2. We also describe heterosynaptic and non-synaptic plasticity in this section. We describe the neural circuit and the experimental set-up in Sect. 3. We present and discuss simulation results in Sect. 4 and Sect. 5. We summarise the research in Sect. 6 and outline further work.

Background

Lizard peripheral auditory system

The lizard peripheral auditory system is depicted on the left in Fig. 1a. It comprises two eardrums (TM) that are internally joined to each other via air-filled Eustachian tubes (ET) that open into a central cavity. Externally impinging sound causes the eardrums to vibrate, creating sound waves inside the ETs that interfere with each other. This implies that the two eardrums are acoustically linked. This acoustical link maps minuscule phase differences between the sound waves arriving externally at either eardrum into relatively larger differences of up to 20 dB between vibration amplitudes of the two eardrums. These minuscule phase differences correspond to interaural time differences (ITDs) in the \(\upmu \)s scale that encode sound direction information. Since the relative direction from which sound is approaching the system corresponds to the magnitude of the phase difference, the magnitude of vibration amplitude difference also encodes sound direction information. In simpler terms, for any relevant sound signal, the eardrum that is closer to it vibrates more strongly than the eardrum farther from it.

Fig. 1
figure1

Sound direction information driving phonotaxis behaviour (taken from [35]). a Cross section [8] (left) of a lizard (genus Sceloporus) peripheral auditory system and its electrical equivalent [12] (right). b The outputs \(\left| i_{\mathrm {I}}\right| \) (left) and \(\left| i_{\mathrm {C}}\right| \) (right) as given by (1)

The internal acoustic filtering effects of the lizard peripheral auditory system are mimicked by an equivalent electrical model [12] as depicted on the right in Fig. 1a. \(P_{\mathrm {I}}\) and \(P_{\mathrm {C}}\) represent sources exerting sound pressures \(V_{\mathrm {I}}\) and \(V_{\mathrm {C}},\) respectively, experienced by the ipsilateral (facing towards the acoustic target) and contralateral (facing away from the acoustic target) eardrum. This generates current flow through impedances \(Z_{\mathrm {r}}\) and \(Z_{\mathrm {v}}\) that, respectively, represent the net acoustic filtering due to the mass of the eardrum as well as stiffness of the ET and the central cavity. Currents \(i_{\mathrm {I}}\) and \(i_{\mathrm {C}},\) respectively, represent the overall vibrations of the ipsilateral and contralateral eardrums in response to \(V_{\mathrm {I}}\) and \(V_{\mathrm {C}}\). The current flow through \(Z_{\mathrm {v}}\) represents the propagation of sound waves inside the central cavity as sound pressure \(V_{\mathrm {cc}}\) inside it changes. \(V_{\mathrm {cc}}\) is a result of the superposition of internal sound pressures experienced from either end. Sound direction information extracted by either ear is encoded in the outputs \(\left| i_{\mathrm {I}}\right| \) and \(\left| i_{\mathrm {C}}\right| \) (see Fig. 1b) of the peripheral auditory model. Relative sound direction can then be determined by comparing \(\left| i_{\mathrm {I}}\right| \) and \(\left| i_{\mathrm {C}}\right| \). Mathematically, this can be formulated as

$$\begin{aligned} \left| i_{\mathrm {I}}\right|= & {} \left| G_{\mathrm {I}}\cdot V_{\mathrm {I}} + G_{\mathrm {C}}\cdot V_{\mathrm {C}}\right| \equiv 20\log \left| i_{\mathrm {I}}\right| \text{ dB } \text{ and } \nonumber \\ \left| i_{\mathrm {C}}\right|= & {} \left| {G_{\mathrm {C}}\cdot V_{\mathrm {I}} + G_{\mathrm {I}}\cdot V_{\mathrm {C}}}\right| \equiv 20\log \left| i_{\mathrm {C}}\right| \text{ dB. } \end{aligned}$$
(1)

\(G_{\mathrm {I}}\) and \(G_{\mathrm {C}},\) respectively, are ipsilateral and contralateral gain terms experimentally derived via laser vibrometry [8] measurements of eardrum vibrations in vivo. \(G_{\mathrm {I}}\) and \(G_{\mathrm {C}}\) vary with sound frequency (1–2.2 kHz) and are implemented as fourth-order digital infinite impulse response bandpass filters. If we define the subscripts I and C to, respectively, signify left and right sides of the median, then the condition \(\left| i_{\mathrm {I}}\right| > \left| i_{\mathrm {C}}\right| \) holds true for a sound signal arriving from the left, while \(\left| i_{\mathrm {C}}\right| > \left| i_{\mathrm {I}}\right| \) holds true for a sound signal arriving from the right. Since the model is symmetric, \(\left| i_{\mathrm {I}}\right| \) and \(\left| i_{\mathrm {C}}\right| \) vary symmetrically with sound direction. This variation is nonlinear with respect to the sound direction within the relevant range of \([-\,90^\circ ,+\,90^\circ ]\). The lizard peripheral auditory system, including its equivalent electrical model and response characteristics of the same, has been reported earlier in detail [34] and is summarised here for the sake of clarity.
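The directional behaviour of (1) can be illustrated with a single-frequency phasor sketch. The scalar gains below are hypothetical stand-ins for the fourth-order IIR bandpass filters of the actual model, chosen only so that the acoustically coupled outputs favour the ipsilateral side; the ear separation (13 mm) and frequency (2.2 kHz) follow the article.

```python
import cmath
import math

# HYPOTHETICAL scalar gains at 2.2 kHz; the real model implements G_I and
# G_C as fourth-order IIR bandpass filters derived from laser vibrometry.
G_I = 1.0
G_C = 0.6j  # assumed phase-shifted contralateral gain

def eardrum_outputs(theta_deg, d=0.013, c=343.0, f=2200.0):
    """Return (|i_I|, |i_C|) of (1) for a plane wave from angle theta_deg."""
    tau = d * math.sin(math.radians(theta_deg)) / c   # ITD in seconds
    phase = 2.0 * math.pi * f * tau
    V_I = cmath.exp(+1j * phase / 2.0)  # ipsilateral pressure phasor (leads)
    V_C = cmath.exp(-1j * phase / 2.0)  # contralateral pressure phasor (lags)
    i_I = G_I * V_I + G_C * V_C
    i_C = G_C * V_I + G_I * V_C
    return abs(i_I), abs(i_C)

iI, iC = eardrum_outputs(45.0)  # sound arriving from the ipsilateral side
```

With these assumed gains, \(\left| i_{\mathrm {I}}\right| > \left| i_{\mathrm {C}}\right| \) for sounds from the ipsilateral side, the two outputs are equal for a frontal sound, and the response is symmetric about the median, mirroring the qualitative behaviour described above.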

Neuroplasticity and learning in the biological brain

In biological nervous systems, neurons communicate with each other to process sensory information and generate appropriate motor commands. This communication occurs via electrical and chemical signalling mechanisms. The structure through which this communication occurs is referred to as a synapse, which acts as a bridge between two neurons (see Fig. 2a).Footnote 1 These two neurons are referred to as pre-synaptic and post-synaptic neurons. The pre-synaptic neuron sends a signal as output, and the post-synaptic neuron receives that signal as input. The likelihood of a post-synaptic neuron firing in response to synaptic input from a pre-synaptic neuron also depends on changes in the transfer properties of these synapses. Synapses tend to strengthen or weaken over time, in response to increase or decrease in their activity. The phenomenon of activity-dependent changes in a synapse is referred to as synaptic plasticity [31].

Fig. 2
figure2

Neuroplasticity and learning. a Illustration showing heterosynaptic and non-synaptic plasticity sites in the brain. b Hypothetical neural circuit before and after ICO learning (taken from [35]). During learning, temporally correlated activities of sensory neurons A and B lead to gradual increase in strength of synapse (depicted as a thick line) between neuron A and a motor neuron C. This leads to stronger activation of neuron C, causing correspondingly stronger behavioural response. After learning, the behavioural response is strong enough to minimise activation of neuron B

There are many forms of synaptic plasticity. Changes in the synaptic strength between two neurons often (but not exclusively) depend on the activity of the pre-synaptic neuron. This phenomenon is referred to as homosynaptic plasticity. However, changes in synaptic strength may also be brought about by the release of chemical neuromodulators from a nearby third neuron due to its electrical activity. This phenomenon of synaptic changes between two neurons brought about by the firing of a third neuron is referred to as heterosynaptic plasticity. Neuroplasticity can also be observed within the neuron itself. A neuron essentially integrates all incoming synaptic inputs, of which some may be excitatory and others inhibitory. This synaptic integration can be modulated by biochemical changes inside the neuron, resulting in changes in the neuron’s intrinsic excitability (i.e. sensitivity to synaptic input). These changes can alter the firing characteristics of the neuron. This phenomenon of changes in the intrinsic excitability of neurons themselves is referred to as non-synaptic plasticity. Non-synaptic plasticity has been observed across many species and brain areas [39]. These various synaptic plasticity mechanisms are considered to be regulated globally by homeostatic plasticity. Homeostatic plasticity refers to the ability of neurons to regulate their own excitability relative to network activity over a relatively longer timescale.

An organism’s interaction with its environment generates multimodal sensory stimuli that induce the aforementioned plastic changes. These changes allow the organism to learn new sensorimotor associations in an unsupervised manner. Hebbian theory [15] proposes a mechanism for synaptic changes underlying such unsupervised associative learning, and its corresponding algorithmic implementation is called Hebbian learning. Hebbian theory postulates that highly correlated firing of pre-synaptic and post-synaptic neurons strengthens the synaptic connection between them. One form of Hebbian-based learning is ICO learning [29], an online unsupervised learning algorithm that implements heterosynaptic plasticity. Here, temporally correlated activities of all pre-synaptic neurons projecting onto a post-synaptic neuron drive changes in synaptic strength between them (e.g. the three-neuron sub-circuit shown in Fig. 2b). ICO learning is computationally cheap, inherently stable by design and has been utilised to successfully generate adaptive, timing-dependent behaviour in real robots [26, 30].
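The core of the ICO rule is compact: the weight of a predictive input changes in proportion to that input times the temporal derivative of the reflex input. A minimal sketch, with invented signal shapes and an assumed learning rate, is:

```python
import math

mu = 0.01   # learning rate (assumed value)
w = 0.0     # plastic weight of the predictive pathway

# Invented signals: a decaying predictive trace that PRECEDES a reflex pulse.
T = 100
u_pred = [math.exp(-(t - 20) / 10.0) if t >= 20 else 0.0 for t in range(T)]
x_reflex = [1.0 if 30 <= t < 40 else 0.0 for t in range(T)]

prev = 0.0
for t in range(T):
    dx = x_reflex[t] - prev        # temporal derivative of the reflex signal
    w += mu * u_pred[t] * dx       # ICO update: dw/dt = mu * u_pred * dx/dt
    prev = x_reflex[t]
```

Because the predictive trace is larger at the reflex onset (positive derivative) than at its offset (negative derivative), the weight grows net-positively; once the learned behaviour keeps the reflex from being triggered, the derivative is zero and the weight stays fixed, which is the source of the rule's inherent stability.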

Materials and methods

The task of acoustic navigation is defined as follows. A simulated mobile robot must learn a smooth motion path to navigate from a pre-defined start location and orientation towards an acoustic target placed at an unknown location, while avoiding obstacles in the environment. Here, we consider smoothness to be purely kinematic, i.e. described only in terms of the lack of sharp angular turns. Dynamic smoothness, where the linear and angular velocities of the mobile robot must also vary smoothly, is not considered here.

We use the kinematics of a differential-drive mobile robot with two wheels, which imposes non-holonomic constraints on the movement, to model the simulated mobile robot (see Fig. 3). We arbitrarily choose the distance l between the centres of the two wheels to be 16 cm. The mobile robot receives auditory signals from the acoustic target towards which it must navigate via two virtual acoustic sensors that mimic microphones only in function. We set the physical separation d between the acoustic sensors to 13 mm because the animal from which the parameters of the lizard peripheral auditory model have been derived has a 13 mm physical separation between its ears. It is critical that the acoustic sensor separation matches this value; otherwise, there would be a mismatch between the ITD cues to which the peripheral auditory model is tuned and the actual ITD cues. This mismatch can result in the lizard peripheral auditory model generating inaccurate responses [33]. The parameters of the lizard auditory model can, however, be tuned to match a given physical separation between the acoustic sensors. This tuning shifts the frequency response of the model, and this shift is inversely proportional to the physical separation. Greater physical separation implies a shift towards lower sound frequencies, while smaller physical separation implies a shift towards higher sound frequencies.

Fig. 3
figure3

Mobile robot kinematics (modified from [35])

The lizard peripheral auditory model processes the incoming auditory signals, and its outputs (\(\left| i_{\mathrm {I}}\right| \) and \(\left| i_{\mathrm {C}}\right| \)) are fed as auditory inputs to the neural circuit described in the next paragraph. The auditory signals alone drive phonotaxis behaviour. A distance measurement sensor located at the centre of the mobile robot measures the distances as well as relative locations (i.e. whether a sensed obstacle is to the left or the right) of obstacles within a frontal \(180^\circ \) field of view. This sensor mimics a laser range finder from a purely functional perspective. A laser range finder is a common distance measurement sensor used in mobile robots for navigation purposes. We added white Gaussian noise at, respectively, 20 dB and 3 dB to the auditory signals emanating from the acoustic target and to the distance measurement sensor output to simulate noisy sensors. We used the standard forward kinematic model for differential-drive mobile robots [10] as given by (2) to determine the pose \([x,y,\theta ]\) of the robot given the left and right wheel velocities \(v_{\mathrm {left}}\) and \(v_{\mathrm {right}},\) respectively, where (xy) are the two-dimensional position coordinates and \(\theta \) is the orientation.

$$\begin{aligned} \begin{aligned}&\begin{bmatrix} x \\ y \\ \theta \end{bmatrix} = \begin{bmatrix} \cos (\omega \delta t)&\quad -\,\sin (\omega \delta t)&\quad 0 \\ \sin (\omega \delta t)&\quad \cos (\omega \delta t)&\quad 0 \\ 0&\quad 0&\quad 1 \end{bmatrix} \begin{bmatrix} D\sin (\theta ) \\ -D\cos (\theta ) \\ \theta \end{bmatrix} + \begin{bmatrix} x - D\sin (\theta ) \\ y + D\cos (\theta ) \\ \omega \delta t \end{bmatrix} \end{aligned} \end{aligned}$$
(2)

where angular velocity \(\omega = \frac{(v_{\mathrm {right}} - v_{\mathrm {left}})}{l}\) and distance D from instantaneous centre of curvature = \(\frac{l}{2} \frac{(v_{\mathrm {right}} + v_{\mathrm {left}})}{(v_{\mathrm {right}} - v_{\mathrm {left}})}\).
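Equation (2) can be implemented directly. The sketch below uses the 16 cm wheelbase from this article and an assumed simulation time step, and handles separately the straight-line case where \(v_{\mathrm {right}} = v_{\mathrm {left}}\) and D is undefined:

```python
import math

def step_pose(x, y, theta, v_left, v_right, l=0.16, dt=0.1):
    """One forward-kinematics step of Eq. (2) for a differential-drive
    robot. l is the 16 cm wheelbase used in the article; dt is an
    assumed time step. Velocities and lengths share consistent units."""
    if abs(v_right - v_left) < 1e-9:
        # Equal wheel speeds: D is undefined, motion is a straight line
        return (x + v_left * dt * math.cos(theta),
                y + v_left * dt * math.sin(theta),
                theta)
    omega = (v_right - v_left) / l
    D = (l / 2.0) * (v_right + v_left) / (v_right - v_left)
    # Instantaneous centre of curvature (ICC)
    icc_x, icc_y = x - D * math.sin(theta), y + D * math.cos(theta)
    dtheta = omega * dt
    xn = math.cos(dtheta) * (x - icc_x) - math.sin(dtheta) * (y - icc_y) + icc_x
    yn = math.sin(dtheta) * (x - icc_x) + math.cos(dtheta) * (y - icc_y) + icc_y
    return xn, yn, theta + dtheta
```

As sanity checks, equal wheel velocities translate the robot along its heading, while opposite velocities (\(v_{\mathrm {right}} = -v_{\mathrm {left}}\)) give \(D = 0\) and a pure rotation in place.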

The mobile robot is placed in a simulated arena to evaluate the neural circuit for adaptive acoustic navigation. Wheel slip is not modelled to reduce complexity. The initial pose of the robot is set to \([0,0,0^\circ ]\). Obstacles are modelled as solid, circular shapes placed at random locations inside the arena around the target as well as between the mobile robot and the target. The obstacles are of randomly chosen dimensions but all dimensions are set to a value below a threshold that matches the wavelength of a sinusoidal acoustic signal of frequency 2.2 kHz. This is the acoustic signal emitted continuously by the target, which is located 3 m away from the start location. The threshold is computed as \(\dfrac{\text{ Speed } \text{ of } \text{ sound } \text{ in } \text{ air } \text{ in } \text{ cm/s }}{\text{ Sound } \text{ frequency } \text{ in } \text{ Hz }}=\dfrac{34{,}000}{2200} \approx 15.45\) cm.
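The obstacle set-up described above might be sketched as follows; the arena dimensions, obstacle count and minimum obstacle size are assumptions for illustration, while the wavelength threshold follows the computation above:

```python
import random

SPEED_OF_SOUND_CM_S = 34000.0
FREQ_HZ = 2200.0
threshold_cm = SPEED_OF_SOUND_CM_S / FREQ_HZ  # wavelength, about 15.45 cm

# Obstacle diameters stay below the wavelength so sound diffracts around
# them; arena span (3 m to the target) and count of 8 are assumed here.
random.seed(0)
obstacles = [
    {
        "x": random.uniform(0.0, 300.0),
        "y": random.uniform(-150.0, 150.0),
        "diameter": random.uniform(1.0, threshold_cm),
    }
    for _ in range(8)
]
```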

Adaptive neural circuit

The adaptive neural circuit for reactive acoustic navigation with obstacle avoidance is illustrated in Fig. 4. Sound direction information encoded within the auditory inputs \(\left| i_{\mathrm {I}}\right| \) and \(\left| i_{\mathrm {C}}\right| \) in decibels is extracted as described earlier from the lizard peripheral auditory model. This information is mapped onto the motor outputs (i.e. linear wheel velocities in cm/s) via Braitenberg sensorimotor cross-couplings. The cross-couplings imply that the ipsilateral auditory input proportionally drives the contralateral motor output. The robot consequently turns towards the side on which the auditory input is greater in magnitude, i.e. left when \(\left| i_{\mathrm {I}}\right| > \left| i_{\mathrm {C}}\right| \) and right when \(\left| i_{\mathrm {C}}\right| > \left| i_{\mathrm {I}}\right| \). The instantaneous radius of the turn is proportional to the relative instantaneous magnitudes of the auditory signals. When the robot is oriented directly towards the target, the two acoustic sensors are equidistant from the target. There is consequently zero phase difference between the auditory signals arriving at the two acoustic sensors. Thus, the two outputs of the peripheral auditory model are identical. This renders the wheel velocities identical, resulting in the robot moving in a straight line. The sensorimotor cross-couplings generate phonotaxis behaviour and are realised as two sub-circuits each with a local interneuron with a nonlinear sigmoid transfer function. The transfer functions are, respectively, defined as

$$\begin{aligned} v_{\mathrm {l}} = \dfrac{4}{1+\beta _{\mathrm {r}}e^{-\left| i_{\mathrm {C}}\right| }} \text{ and } v_{\mathrm {r}} = \dfrac{4}{1+\beta _{\mathrm {l}}e^{-\left| i_{\mathrm {I}}\right| }}. \end{aligned}$$
(3)

The transfer functions are defined to impose finite limits between 0 and 4 cm/s, arbitrarily chosen as the effective range for the linear wheel velocities \(v_\mathrm{l}\) and \(v_\mathrm{r}\). The parameters \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) determine the amount by which the respective transfer functions can be horizontally shifted to the left or right. This horizontal shift in the transfer functions models the intrinsic excitability of the individual sensorimotor interneurons. Shifting the transfer function, respectively, to the left or right causes an increase or decrease in the corresponding linear wheel velocity. The strength of a sensorimotor cross-coupling can be viewed as being proportional to the amount of horizontal shift in its transfer function. Rightward shifts imply a relatively stronger cross-coupling while leftward shifts imply a relatively weaker cross-coupling. During learning, the two identical sub-circuits that implement ICO learning, described in the next section, modify \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) and shift these transfer functions to generate appropriate linear wheel velocities.
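A minimal sketch of the sensorimotor interneurons of (3); the cross-coupling (ipsilateral input drives the contralateral wheel) and the 0–4 cm/s range follow the article:

```python
import math

def wheel_velocities(i_I, i_C, beta_l, beta_r):
    """Sigmoid sensorimotor mappings of Eq. (3); velocities in cm/s.
    Note the cross-coupling: |i_C| drives the left wheel, |i_I| the right."""
    v_l = 4.0 / (1.0 + beta_r * math.exp(-abs(i_C)))
    v_r = 4.0 / (1.0 + beta_l * math.exp(-abs(i_I)))
    return v_l, v_r
```

For example, a larger ipsilateral (left-ear) input speeds up the right wheel, turning the robot left towards the source, while increasing a \(\beta \) parameter shifts the corresponding sigmoid to the right and lowers that wheel's velocity, i.e. weakens the cross-coupling.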

Fig. 4
figure4

Adaptive neural circuit for reactive navigation with obstacle avoidance embodied in the environment. The sensorimotor mappings between the auditory inputs \(\left| i_{\mathrm {I}}\right| \) and \(\left| i_{\mathrm {C}}\right| \) and the motor outputs \(v_{\mathrm {l}}\) and \(v_{\mathrm {r}}\) (dotted blue lines) generate phonotaxis behaviour. Two sub-circuits implementing ICO learning (solid lines inside the shaded areas) update the mapping strengths (\(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\)), respectively, by \(\delta \beta _{\mathrm {l}}\) and \(\delta \beta _{\mathrm {r}}\). Path-smoothing is realised as a third sub-circuit implementing ICO learning that modifies wheel velocities by computing a secondary velocity component v. The obstacle avoidance behaviour is implemented by the sensorimotor mappings (dashed red lines), with manually fixed strengths, between the distance sensor inputs and motor outputs (colour figure online)

The obstacle avoidance behaviour is realised as two identical sub-circuits, one forcing the mobile robot to turn left and the other forcing it to turn right. These sub-circuits, respectively, project excitatory and inhibitory synaptic connections on to the outputs of the contralateral and ipsilateral sensorimotor interneurons. The obstacle avoidance behaviour is triggered in the form of a reflex action when the mobile robot detects an approaching obstacle within a collision threshold. This reflex behaviour is initiated by overriding the motor velocities of the wheels. The motor velocity of the wheel on the same side as the detected obstacle is set to a relatively high but constant value of 4 cm/s and the motor velocity of the opposite wheel to a relatively low but constant value of 0.1 cm/s. This forces the mobile robot to turn sharply away from the obstacle either to the left or to the right, depending, respectively, on whether the obstacle is on the right or the left. The collision threshold (see Fig. 3) is an imaginary circular boundary centred on the mobile robot, with a radius of 20 cm.
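The reflex override can be sketched as follows; the `'left'`/`'right'` side encoding is an assumption of this sketch, while the velocity constants are those stated above.

```python
COLLISION_RADIUS = 20.0  # cm, collision threshold around the robot
V_NEAR = 4.0             # cm/s, wheel on the same side as the obstacle
V_FAR = 0.1              # cm/s, wheel on the opposite side

def avoidance_reflex(v_l, v_r, obstacle_dist, obstacle_side):
    # Trigger the reflex only when the obstacle crosses the threshold.
    if obstacle_dist > COLLISION_RADIUS:
        return v_l, v_r
    # Near-side wheel fast, far-side wheel slow: sharp turn away.
    if obstacle_side == 'left':
        return V_NEAR, V_FAR   # turn right, away from the obstacle
    return V_FAR, V_NEAR       # turn left, away from the obstacle
```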

To smooth the motion paths, we include a third sub-circuit that also implements ICO learning to learn the appropriate value for an additional velocity component v. This velocity component is added to that calculated by the Braitenberg sensorimotor interneurons to modify the wheel velocities. v is added to \(v_\mathrm{l}\) when the obstacle is to the left and to \(v_\mathrm{r}\) when the obstacle is to the right. Path-smoothing is activated when the obstacle avoidance behaviour is triggered and remains active only for the duration of the obstacle avoidance manoeuvre. This sub-circuit also models heterosynaptic plasticity.

Circuit operation and learning

At each simulation time step, sound direction information is extracted by the lizard peripheral auditory model and the individual linear wheel velocities are computed via the sigmoid transfer functions. Based on the current wheel velocities, the new pose of the robot is computed via (2). If at any time step the mobile robot encounters an obstacle within its collision threshold, the obstacle avoidance reflex behaviour overrides the current linear wheel velocities as described earlier. The obstacle avoidance behaviour persists as long as the obstacle is sensed to lie within the mobile robot’s collision threshold. At each time step during the obstacle avoidance manoeuvre, the sub-circuits implementing ICO learning update the parameters \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) of the sensorimotor interneurons. The updates for \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) are mathematically defined as

$$\begin{aligned} \begin{aligned} \dfrac{\delta \beta _{\mathrm {l}}}{\delta t}&= w_{\mathrm {l}} \left| i_{\mathrm {I}}\right| + {\mathrm {dL}},\quad \text{ where } \dfrac{\delta w_{\mathrm {l}}}{\delta t} = \eta _{\mathrm {1}} \left| i_{\mathrm {I}}\right| \dfrac{\delta {\mathrm {dL}}}{\delta t},\quad \text{ and } \\ \dfrac{\delta \beta _{\mathrm {r}}}{\delta t}&= w_{\mathrm {r}} \left| i_{\mathrm {C}}\right| + {\mathrm {dR}},\quad \text{ where } \dfrac{\delta w_{\mathrm {r}}}{\delta t} = \eta _{\mathrm {1}} \left| i_{\mathrm {C}}\right| \dfrac{\delta {\mathrm {dR}}}{\delta t}. \end{aligned} \end{aligned}$$
(4)

The updates for \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) in (4) are computed as follows. First, the respective temporal cross-correlations between the ipsilateral and contralateral output signals of the lizard peripheral model and the first-order derivative of the distance sensor signal are computed \(\left( \left| i_{\mathrm {I}}\right| \dfrac{\delta {\mathrm {dL}}}{\delta t}\,{\hbox {and}}\,\left| i_{\mathrm {C}}\right| \dfrac{\delta {\mathrm {dR}}}{\delta t}\right) \). Next, these cross-correlation terms are used to update respective synaptic weights \(w_{\mathrm {l}}\) and \(w_{\mathrm {r}}\). Finally, \(w_{\mathrm {l}}\) and \(w_{\mathrm {r}}\) are used to compute the updates for \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) as summations of distance information and sound direction information weighted by the synaptic weights (\(w_{\mathrm {l}} \left| i_{\mathrm {I}}\right| + {\mathrm {dL}}\) and \(w_{\mathrm {r}} \left| i_{\mathrm {C}}\right| + {\mathrm {dR}}\)).
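In discrete time, one ICO step of (4) for a single side can be sketched as below; the Euler discretisation and the 1 s step are assumptions consistent with the simulation setup described later.

```python
def ico_update_beta(beta, w, i_abs, d, d_prev, eta=0.01, dt=1.0):
    # Cross-correlate the rectified auditory input with the derivative
    # of the distance signal to change the weight w (heterosynaptic
    # plasticity), then shift beta by the weighted sum of sound and
    # distance information, as in Eq. (4).
    d_dot = (d - d_prev) / dt
    w += eta * i_abs * d_dot
    beta += (w * i_abs + d) * dt
    return beta, w
```

The same function would be called once per time step for each side, with (\(\left| i_{\mathrm {I}}\right| \), dL) for the left parameter and (\(\left| i_{\mathrm {C}}\right| \), dR) for the right.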

Initially, \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) are set to random values between 0 and 1, while the synaptic weights \(w_{\mathrm {l}}\) and \(w_{\mathrm {r}}\) are set to random values between 0 and 0.1. These sub-circuits themselves model heterosynaptic plasticity in that the activity of the neuron providing distance information drives the synaptic weight change between the neuron providing sound direction information and the neuron integrating these two pieces of information together. The overall mechanism of updating the intrinsic excitability of the sensorimotor interneurons via these sub-circuits models non-synaptic plasticity.

Equation (4) enforces positive weight updates only. This implies that \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) always increase for the duration of the obstacle avoidance behaviour. Allowing the ICO learning algorithm to execute without restraint may therefore result in an uncontrolled increase in \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\). This would continuously shift the transfer function curves to the right, and as a consequence the linear wheel velocities \(v_{\mathrm {l}}\) and \(v_{\mathrm {r}}\) would eventually become vanishingly small, with the undesired effect of continuously reducing the linear velocity of the mobile robot. To avoid this, \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) are exponentially decreased as a function of time at each time step during learning. This is achieved by scaling \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) with a time-varying scaling factor (i.e. synaptic scaling, a form of homeostatic plasticity) defined as \(e^{(-t/{\mathrm {k}})}\), where t is the current time step. This continuously shifts the transfer function curves slightly to the left. The parameter k is determined via trial-and-error to be 60,000 to ensure that \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) do not decrease faster than they increase. This is necessary because any decrease in \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) implies that the neural circuit partially “forgets” what it has learned so far. This compensatory mechanism prevents the uncontrolled rightward shift in the transfer functions and maintains the linear wheel velocities, and thus the mobile robot’s linear velocity, within a reasonable effective range. It models homeostatic plasticity and stabilises the overall effects of the various concurrent ICO learning mechanisms, thereby maintaining the neural circuit in homeostasis.
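One reading of the synaptic-scaling rule is sketched below; whether the factor is applied to the current values at each step (as here) or to stored baseline values is not fully specified in this section, so this is an assumption.

```python
import math

K = 60_000  # homeostatic decay constant, determined via trial-and-error

def scale_betas(beta_l, beta_r, t, k=K):
    # Exponentially decay both excitability parameters at time step t,
    # counteracting the purely positive updates of Eq. (4) so that the
    # transfer functions drift slightly back to the left.
    s = math.exp(-t / k)
    return beta_l * s, beta_r * s
```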

To smooth the motion paths online, the mobile robot must initiate a relatively gentle turn away from an approaching obstacle while that obstacle is still outside the collision threshold. This gentle turn is initiated by adding a second velocity component v to the linear wheel velocity \(v_\mathrm{l}\) when the obstacle is to the left, and to \(v_\mathrm{r}\) when it is to the right, of the mobile robot. This causes the wheel closer to the obstacle to rotate faster than the one further away, turning the robot further away from the obstacle. We therefore define a second, outer circular boundary centred on the mobile robot with a radius of 40 cm that is used as an “initiation” threshold to decide when to initiate this gentler turn. At each simulation time step during the obstacle avoidance manoeuvre, the path-smoothing sub-circuit computes the temporal cross-correlation between the maximum distance signal \({d}_\mathrm{max}\) and the first-order derivative of the minimum distance signal \({d}_\mathrm{min}\). \({d}_\mathrm{max}\) is defined as the distance between the robot and the obstacle when the obstacle is within the initiation threshold. \({d}_\mathrm{min}\) is defined as the distance between the robot and the obstacle when the obstacle is within the collision threshold. The cross-correlation term is used to update the synaptic weight w, which is then used to compute the additional velocity component v as a weighted sum of \({d}_\mathrm{max}\) and \({d}_\mathrm{min}\). The update for v is mathematically defined as

$$\begin{aligned} v = w \, {d}_\mathrm{max} + {d}_\mathrm{min},\quad \text{ where } \dfrac{\delta w}{\delta t} = \eta _{\mathrm {2}} \, {d}_\mathrm{max}\,\, \dfrac{\delta {d}_\mathrm{min}}{\delta t}. \end{aligned}$$
(5)
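A discrete-time sketch of this rule, assuming an Euler step with dt = 1 s and normalised distance signals:

```python
def smoothing_step(w, d_max, d_min, d_min_prev, eta=0.2, dt=1.0):
    # Correlate d_max with the derivative of d_min to update the
    # weight w, then form the secondary velocity component v as the
    # weighted sum of the two distance signals, as in Eq. (5).
    d_min_dot = (d_min - d_min_prev) / dt
    w += eta * d_max * d_min_dot
    v = w * d_max + d_min
    return w, v
```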

As learning progresses, v will eventually be large enough to allow the robot to turn fast enough away from the approaching obstacle such that the obstacle never crosses the collision threshold. At this point, the obstacle avoidance behaviour will no longer be triggered. Since the additional velocity component v for path-smoothing is only learned during the obstacle avoidance behaviour, the updates to v will also cease to occur. Therefore, there is no possibility of an uncontrollable increase in v and thus no compensatory mechanism is required to stabilise v. The free parameters governing the circuit operation are summarised below.

  • \(\eta _{\mathrm {1}}\) The learning rate for the neural sub-circuit that modifies the Braitenberg sensorimotor couplings. This is initialised to 0.01.

  • \(\eta _{\mathrm {2}}\) The learning rate for the neural sub-circuit that realises path-smoothing. This is initialised to 0.2.

  • k A parameter that determines the speed of the exponential leftward shift of the activation functions in the Braitenberg sensorimotor couplings. This is initialised to 60,000.

To keep the simulation time reasonable, the learning is limited to a maximum of 50 iterations and the maximum number of time steps per learning iteration is set to 1000. A single simulation time step is defined as being equivalent to an elapsed time of 1 s. The learning is terminated when either one of two conditions is met: (a) when all 50 learning iterations are complete, or (b) when the robot reaches within a radius of 10 cm around the acoustic target without triggering the obstacle avoidance behaviour.
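The termination logic can be sketched as below, where `run_iteration` is a hypothetical stand-in for one simulated navigation episode returning whether the target was reached and whether the avoidance reflex fired.

```python
MAX_ITERATIONS = 50
MAX_STEPS = 1000  # time steps per learning iteration; 1 step = 1 s

def run_learning(run_iteration):
    # Terminate when all iterations are spent, or as soon as the robot
    # reaches the target without triggering the avoidance reflex.
    for iteration in range(1, MAX_ITERATIONS + 1):
        reached_target, reflex_triggered = run_iteration()
        if reached_target and not reflex_triggered:
            return iteration
    return MAX_ITERATIONS
```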

Results

We evaluate the performance of the adaptive neural circuit in simulation in terms of its ability to learn to suppress the obstacle avoidance behaviour completely to generate smooth motion paths as well as in terms of the overall reduction in path length. We conduct the navigation trials in five different environments, each with randomly placed obstacles of random dimensions. For each environment, we compare the performance of the adaptive neural circuit to that obtained in two control conditions: (a) when all learning is disabled and (b) when only the path-smoothing behaviour is disabled but learning of the parameters of the Braitenberg sensorimotor cross-couplings is allowed. Under the first condition, the transfer functions of the sensorimotor interneurons are not used to compute wheel velocities. Instead, the auditory inputs \(\left| i_{\mathrm {I}}\right| \) and \(\left| i_{\mathrm {C}}\right| \) are, respectively, mapped 1:1 to the wheel velocities \(v_\mathrm{r}\) and \(v_\mathrm{l}\). Therefore, these are simply static Braitenberg sensorimotor cross-couplings. Furthermore, the secondary velocity component v is set to zero. Under the second condition, the transfer functions of the sensorimotor interneurons are used to compute wheel velocities. The parameters of these transfer functions are learned via ICO learning. The secondary velocity component v is set to zero under this condition as well. This renders the adaptive neural circuit identical to our previous architecture [35].

Fig. 5

Acoustic navigation with the proposed neural circuit. a–c Robot motion paths without learning (dashed blue line in a), learned without path-smoothing (solid green line in b) and learned with path-smoothing (solid red line in c). The numbered circles along the motion paths indicate the approximate instants at which the obstacle avoidance behaviour is activated. d–f Wheel velocities (black and magenta lines) computed during navigation. The shaded areas indicate the approximate intervals during which the obstacle avoidance behaviour is active when an obstacle is detected to lie within the collision threshold (as indicated by a burst in the distance from the obstacle, shown here for aesthetic purposes as a normalised signal in green). This overrides the wheel velocities with predetermined values (colour figure online)

Figure 5 illustrates the learned motion paths of the neural circuit in three independent trials in a common environment. Motion paths learned in the other four environments are illustrated in Online Resources 14–17 (respectively, titled “ESM_14.eps”, “ESM_15.eps”, “ESM_16.eps” and “ESM_17.eps”), which are included in the supplementary material. Figure 5a, b highlights the potential limitation of our previously proposed neural circuit in generating motion paths. It is evident that there is no significant difference between the motion paths generated when all learning is disabled (see Fig. 5a) and when the path-smoothing behaviour is disabled (i.e. v is set to zero) but learning of parameters \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) of the Braitenberg sensorimotor cross-couplings is allowed (see Fig. 5b). In both cases, the robot is steered in a relatively straight line towards the target due to the strong and symmetrical Braitenberg sensorimotor cross-couplings, hand-tuned in the former case and learned through interaction with the obstacles in the latter. Strong and symmetrical Braitenberg cross-couplings tend to enforce straight-line motion paths that are sub-optimal and difficult to maintain in complex environments. This implies that the robot can encounter obstacles in its path during navigation, since the placement of the obstacles in the environment is unknown to the robot. These relatively straight motion paths are heavily modified by the obstacle avoidance behaviour (see Fig. 5d, e) that is triggered when obstacles cross the robot’s collision threshold. The obstacle avoidance behaviour overrides the wheel velocities and forces the robot to execute sharp turns while avoiding obstacles and reorienting itself towards the target. The robot is therefore unable to learn to completely suppress its obstacle avoidance behaviour after learning.

When the path-smoothing behaviour is enabled, the robot is able to learn to completely suppress its obstacle avoidance behaviour (see Fig. 5f) during interaction with obstacles. This is due to the modification of the wheel velocities with the second velocity component v via the ICO learning algorithm during interaction with the obstacles. The neural circuit learns an appropriate value for v such that the robot initiates relatively gentle turns before the obstacle crosses its collision threshold. This minimises sharp turns during navigation and generates smooth motion paths. Early in learning, the robot tends to generate very winding motion paths (grey motion paths in Fig. 5c). This is because the strengths of the Braitenberg sensorimotor cross-couplings are initialised to random values that are not necessarily identical and the turning radius of the robot depends on the parameters \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) of the transfer functions of the Braitenberg sensorimotor cross-couplings. As the learning progresses, the sensorimotor cross-couplings as well as the second velocity component gradually adapt to the environment. Once these are properly adapted, the robot generates smooth motion paths (red motion paths in Fig. 5c). The motion paths generated by the simulated mobile robot during navigation as shown in Fig. 5a–c can be observed in the video provided as Online Resource 18 (titled “ESM_18.wmv”) in the supplementary material. Videos showing the motion paths for the other four environments are provided as Online Resources 19–22 (respectively, titled “ESM_19.wmv”, “ESM_20.wmv”, “ESM_21.wmv” and “ESM_22.wmv”) in the supplementary material.

We also compared the overall lengths of the motion paths generated during navigation for the three cases in the five environments. For each environment, we conducted ten trials for each case and computed the average path length as well as the standard deviation in path length over the ten trials. The path lengths were computed as the sum of the linear distances travelled per time step. Figure 6 summarises this comparison. It is evident that learning to smooth the motion paths ultimately decreases path lengths by varying amounts. The extent to which the paths are shortened depends on the placement and dimensions of the obstacles in the environment, as these two factors affect the interaction the mobile robot experiences with the obstacles during navigation. Furthermore, the learned smooth paths are also relatively more stable in terms of path length, given their relatively lower standard deviation in path length.
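The path-length metric, summing the linear distance travelled per time step, can be computed as:

```python
import math

def path_length(positions):
    # Sum of linear distances travelled between consecutive time steps,
    # with positions given as (x, y) robot coordinates per time step.
    return sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))
```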

Fig. 6

Comparison of total motion path length without learning, learning without path-smoothing and learning with path-smoothing. Bars for trial 1 correspond to the motion paths depicted in Fig. 5a–c

Acoustic navigation in complex environments

The proposed architecture is able to generate smooth motion paths in relatively simple environments. Here, we present the motion paths learned for relatively complex environments, where the target is located within a confined area. An example of such an environment can be an indoor location. The acoustic target such as a human speaker may be located in a room, and the robot, starting from a location outside the room, must navigate to the target. The complexity of this task lies in the fact that the robot must navigate along walls and corridors. This forces the robot to navigate in a direction away from the target for extended periods of time. For such environments, a deliberative navigation control architecture is preferable, since it incorporates a global planner that can compute optimal motion paths as discussed in Sect. 1.1.

Figure 7 shows motion paths learned in two relatively complex environments. The first environment (Fig. 7a) simulates a situation where the target is located within a confined area such as a room. The robot is initially located in another similarly confined area. In this case, the robot is able to successfully navigate towards the target. However, the motion path generated is not very smooth. This is because our purely reactive Braitenberg behaviour constantly forces the robot to steer towards the target, while the path-smoothing behaviour forces the robot to avoid the wall by turning away from it. This antagonistic control results in the slightly zig-zag motion of the robot. The second environment (Fig. 7b) simulates a situation where the neural circuit must resolve the problem of choosing one of two possible directions at a bifurcation point in the motion path. This bifurcation point occurs at the intersection of two walls in a corridor-like environment. In this case, the simulation was manually terminated after 25 iterations because the neural circuit was unable to learn a suitable motion path. The robot spent a significant number of time steps at the bifurcation point, where turning left to avoid the wall positions the target to its right and turning right to avoid the wall positions the target to its left. The Braitenberg controller then steers the robot, respectively, to the right and to the left. This results in numerous overlapping figure-of-eight motion path loops at the bifurcation point. The robot eventually exits these loops due to sensor noise. Since the obstacle avoidance behaviour is triggered constantly at the bifurcation point, the parameters \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) are also constantly updated. The constant updates shift the activation functions significantly to the right, decreasing the wheel velocities to an extent that forces extremely wide turns.
Such wide turns eventually generate motion paths that do not approach any obstacle, and thus no further learning takes place. This implies that the robot may never reach the target.

Fig. 7

Motion paths learned in complex environments. a The robot is able to successfully navigate from one room to an acoustic target located in another room. b The robot fails to resolve the direction in which to turn at a bifurcation point at the intersection of two walls in a corridor-like environment

In some cases, sound emitted by the acoustic target may be completely blocked by sufficiently large obstacles in the environment where the robot is operating. This scenario is simulated in Fig. 8, in which there are random and unknown locations in the environment, or acoustic “dead zones”, where no detectable auditory signal is available. Inside the dead zones, sound information is unavailable for a randomly selected number of simulation time steps within the range [1, 10]. When the robot enters such a dead zone, it uses the last detected auditory signal, i.e. the signal detected just before entering the dead zone, as input to the lizard auditory model for as long as sound information is unavailable. As evident, the proposed architecture is able to learn a motion path even when encountering dead zones where no sound information is available.
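This hold-last-sample strategy can be sketched as below; encoding an undetectable sample as `None` is an assumption of this sketch.

```python
def fill_dead_zones(signals):
    # Wherever the auditory signal is undetectable (None), substitute
    # the last signal detected before entering the dead zone.
    held = None
    out = []
    for s in signals:
        if s is not None:
            held = s
        out.append(held)
    return out
```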

Fig. 8

Motion paths learned when the acoustic signal is completely occluded for a randomly chosen number of time steps at randomly chosen locations in the environment. The robot learns to successfully navigate from one room to an acoustic target located in another room

Analysis

In this section, we empirically analyse the navigation performance of the neural circuit from various perspectives.

Navigation with static sensorimotor couplings and learned path-smoothing

The parameters \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) directly influence the motor velocities \(v_\mathrm{l}\) and \(v_\mathrm{r}\). Therefore, \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) influence the motion path of the robot towards the target. For example, when \(\beta _{\mathrm {l}}>\beta _{\mathrm {r}}\), the left sensorimotor coupling is weaker than the right. This implies that the right auditory input \(i_{\mathrm {C}}\) has less influence on the motor velocity of the left wheel as compared to the influence of the left auditory input \(i_{\mathrm {I}}\) on the motor velocity of the right wheel. For a target to the left of the robot, this forces the robot to generate curvilinear motion paths that approach the target from its right. In our approach, we regard the sensorimotor couplings as interneurons exhibiting intrinsic plasticity. We then learn the sensorimotor coupling parameters \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) independently. It is not strictly necessary to learn these two parameters to achieve phonotaxis behaviour since the Braitenberg cross-coupling architecture enforces such behaviour. However, the sensorimotor couplings have to be properly tuned to achieve successful phonotaxis behaviour where straight motion paths are generated. As an alternative to tuning the parameters via online learning, one could also hand-tune the parameters. However, without prior knowledge of the placement of the obstacles in the environment, hand-tuning the parameters is a non-trivial task. There exist multiple combinations of \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) that can generate smooth motion paths while avoiding obstacles. Determining \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) beforehand would require searching through a large parameter space. One could reduce the parameter space by assuming \(\beta _{\mathrm {l}} = \beta _{\mathrm {r}} = \beta \) and choosing a suitable value for \(\beta \). 
However, this does not necessarily generate optimal motion paths as exemplified in Fig. 9. The smoothness of the motion paths is not dependent on \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\). Therefore, smoothness is not influenced by whether these parameters are hand-tuned or learned online. Motion paths learned in the other four environments are illustrated in Online Resource 23, titled “ESM_23.eps”.

Fig. 9

Acoustic navigation for hand-tuned sensorimotor couplings but learning smooth obstacle avoidance. The motion paths generated during learning are coloured as a gradient starting from grey for the first iteration to blue for the last iteration (colour figure online)

Learning a common sensorimotor coupling

In the previous section, we asserted that manually choosing parameters \(\beta _{\mathrm {l}} = \beta _{\mathrm {r}} = \beta \) does not necessarily generate successful phonotaxis behaviour or optimal motion paths. Here, we investigate the impact of learning \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) jointly as the common parameter \(\beta \) on the navigation performance. This essentially renders the sensorimotor couplings identical to one another. While this steering strategy will still generate successful phonotaxis behaviour, it favours straight trajectories. Initially, when the robot is pointing away from the target, the left auditory input is greater than the right. This steers the robot towards the target. Once the left and right auditory inputs are similar, the motor velocities are similar as well, steering the robot in a straight line towards the target. If the distribution of obstacles between the robot and target allows the robot to turn both left and right, the generated motion paths do not vary much from those generated when learning \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) independently. This is evident in Fig. 10a as well as in a and b in Online Resource 24, titled “ESM_24.eps”. However, when the distribution of obstacles between the robot and target allows the robot to turn only left or only right, the generated motion paths vary significantly from those generated when learning \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) independently. In fact, in this case the motion paths learned can be relatively sub-optimal or fail to smoothly avoid obstacles. This is evident in Fig. 10b and in c in Online Resource 24.

Fig. 10

Motion paths generated when learning a common sensorimotor coupling (red) versus learning the sensorimotor couplings separately (green). a There is no significant change in motion paths. b The learned motion path is sub-optimal as compared to that learned when \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) are learned separately. The motion paths generated during learning are coloured as a gradient starting from grey for the first iteration to blue for the last iteration (colour figure online)

As a general example, we compare the motion paths generated when using the common parameter \(\beta _{\mathrm {l}} = \beta _{\mathrm {r}} = \beta \) to those generated when using different values for \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\). As evident from Fig. 11, using a common parameter \(\beta \) for the sensorimotor couplings does not necessarily generate optimal motion paths or successful phonotaxis for all values of \(\beta \). This limitation is overcome when using different values for \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\). Therefore, it makes sense to learn \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) separately.

Fig. 11

Effect of using joint versus separate sensorimotor couplings on motion paths. The values for \(\beta \), \(\beta _{\mathrm {l}}\) and \(\beta _{\mathrm {r}}\) are chosen manually and assumed to have been learned

Influence of homeostatic plasticity parameter k

We investigate the impact of the free parameter k on the performance of the neural circuit. This parameter determines how quickly the transfer function curves of the interneurons shift leftwards, strengthening the sensorimotor couplings and thereby increasing the wheel velocities. Smaller values of k imply stronger couplings and greater wheel velocities, while greater values of k imply weaker couplings and smaller wheel velocities. Since k is common to both interneurons, the wheel velocities are increased by equal amounts. This equal increase implies that the learning should be independent of k, as evident in Fig. 12b. The speed of the robot increases with decreasing k, thus reducing the time taken to reach the target, as evident in Fig. 12a.

Fig. 12

Effect of varying the free parameter k on learning and navigation. a The time taken to reach the target increases with k. b There is no significant effect of varying k on the number of iterations needed to learn a smooth motion path to the target

The learned motion path depends on the ratio of the velocities of the left and the right wheels. Since the free parameter k does not influence this velocity ratio, the motion paths should be independent of k as evident in the example shown in Fig. 13. This is also observed for the other four environments.

Fig. 13

Effect of varying the free parameter k on motion paths. k is varied from 10,000 (a) to 60,000 (f) in steps of 10,000

Conclusion

We have presented an adaptive neural circuit for reactive acoustic navigation with obstacle avoidance in complex environments. A simulated mobile robot agent embodying the neural circuit successfully navigated towards an acoustic target, located 3 m away and emitting a continuous 2.2 kHz tone, while avoiding randomly placed obstacles of random dimensions in the environment. The circuit utilises simple Braitenberg acoustomotor cross-couplings that map sound direction information, extracted via a model of the lizard peripheral auditory system, to wheel velocities for phonotaxis behaviour. The parameters of these cross-couplings were learned through interaction with obstacles during the obstacle avoidance behaviour.

The proposed approach extends our previously reported approach. Although our previous neural circuit was able to learn stable and consistent motion paths in simple environments, the learned paths were not smooth when the robotic agent was placed in complex, cluttered environments. This is because the conflicting motion paths enforced by the phonotaxis behaviour and the obstacle avoidance behaviour could not be resolved. We have extended our previous neural circuit with a path-smoothing mechanism based on heterosynaptic plasticity that resolves this conflict by learning a secondary velocity component that pre-emptively modulates the wheel velocities when approaching an obstacle. We have recreated the acoustic navigation task in five different randomly generated environments.

We have demonstrated that the proposed extension allowed the robotic agent to circumvent the obstacle avoidance behaviour completely by progressively learning to turn away from obstacles earlier. This simple extension allows the mobile robot to learn smoother motion paths than under two control conditions: (a) when all learning is disabled and the parameters for the behaviours are fine-tuned manually and (b) when only the path-smoothing behaviour is disabled but the parameters of the Braitenberg sensorimotor cross-couplings are learned during navigation. We have also compared the motion paths in all cases in terms of total path length. In all five environments, the learned smooth paths are relatively shorter than those generated under the two control conditions. We also evaluated the performance of the proposed approach in more complex, indoor-like environments. Finally, we analysed the performance of the neural circuit from various perspectives. As the next immediate step, we aim to validate the adaptive neural circuit on a mobile robot in real-world conditions. In the next few paragraphs, we highlight the advantages of our approach over other online approaches and discuss improvements to be considered in future work.

Advantages over other online approaches

Purely reactive approaches such as ours are inherently inferior to deliberative, online path-planning approaches in terms of path optimality. The latter typically find an optimal trajectory by maximising information metrics such as entropy or Fisher information, or by minimising cost metrics such as distance covered. These metrics are often task-specific and are chosen based on detailed knowledge of both the task and the environment. Deliberative approaches typically rely on some combination of detailed environment maps, mathematical models of physical phenomena such as wheel slip, and models of sensor noise to optimise the chosen metric. They handle dynamic changes in the environment, such as the sudden appearance of obstacles, via simple reactive strategies at lower levels of motor control. Our approach has the advantage of not requiring any such metric to be determined. Furthermore, it requires no a priori knowledge in the form of detailed environment maps or mathematical models of wheel slip and sensor noise. Owing to these relaxed requirements, the computational burden of our approach is minimal when executed online.
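To make the contrast concrete, a reactive controller of the kind we build on can be as simple as a Braitenberg-style cross-coupling from binaural signal intensities to wheel velocities. The sketch below is a generic illustration of such a cross-coupling with hypothetical gains, not the exact circuit used in this work:

```python
def braitenberg_step(s_left, s_right, v_base=0.5, gain=1.0):
    """Cross-coupled excitatory Braitenberg controller:
    the louder side speeds up the opposite wheel, steering the
    robot towards the sound source. Gains are illustrative."""
    v_left = v_base + gain * s_right
    v_right = v_base + gain * s_left
    return v_left, v_right

# Source to the robot's left: left input louder, so the right
# wheel spins faster and the robot turns left, towards the source.
print(braitenberg_step(0.8, 0.2))  # approximately (0.7, 1.3)
```

The per-step cost is a handful of arithmetic operations with no map lookups or model evaluations, which is what keeps the online computational burden minimal.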

Future work

The adaptive neural circuit is able to learn a smooth motion path in a given environment with a given placement of obstacles, and to later recreate that path in the same environment. However, it would be interesting to see how well the neural circuit generalises its learned performance to an environment in which it has not been trained. The parameters of the sensorimotor couplings, as well as the secondary velocity component, are learned only during interactions with obstacles. The placement and dimensions of the obstacles therefore play a significant role in determining the final values of the learned parameters, suggesting that the neural circuit may not generalise well beyond the environments in which it has been trained. This hypothesis will be tested, and any necessary improvements proposed, in future work. One possibility is to apply actor-critic reinforcement learning for generalisation [9, 25].

Furthermore, as mentioned earlier, smoothness is considered here to be purely kinematic, i.e. described only in terms of the absence of sharp angular turns. The wheel velocities computed by the phonotaxis behaviour are directly, and often heavily, modified by the learned secondary velocity component. These sudden and rapid changes in wheel velocities result in jerky movements. One way to minimise such movements could be to also learn an appropriate radius for the “initiation” threshold via ICO learning, rather than keeping it constant. The robot could then initiate a turn at an appropriate distance from an approaching obstacle, minimising the need to significantly modify the wheel velocities. This could potentially allow for dynamic smoothness, in which the wheel velocities remain constant or vary smoothly while the robot generates smooth bends around obstacles, a significant and desirable improvement.
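The ICO (input correlation) learning rule referred to here, due to Porr and Wörgötter [29], changes the weight of an early predictive input in proportion to the correlation between that input and the temporal derivative of a later reflex input, so that the learned response eventually fires before the reflex is triggered. The following is a minimal sketch of the rule; the signal shapes, learning rate and variable names are illustrative and not the actual distance signals used in our circuit:

```python
# Minimal ICO learning sketch: the predictive weight grows whenever
# the predictive signal correlates with the temporal derivative of
# the (later) reflex signal.
mu = 0.1          # learning rate (illustrative)
w_pred = 0.0      # learned weight of the predictive input
w_reflex = 1.0    # fixed weight of the reflex input

def ico_step(x_pred, x_reflex, x_reflex_prev, dt=0.1):
    """One time step of ICO learning; returns the combined output."""
    global w_pred
    dx_reflex = (x_reflex - x_reflex_prev) / dt
    w_pred += mu * x_pred * dx_reflex * dt   # ICO weight update
    return w_pred * x_pred + w_reflex * x_reflex

# The predictive signal precedes the reflex signal by one step.
pred   = [0.0, 1.0, 1.0, 0.0, 0.0]
reflex = [0.0, 0.0, 1.0, 1.0, 0.0]
for t in range(1, len(pred)):
    ico_step(pred[t], reflex[t], reflex[t - 1])
print(w_pred)  # positive: the predictive pathway has been strengthened
```

Because learning is driven by the reflex derivative, the weight stops changing once the learned anticipatory response suppresses the reflex, which is the stabilising property exploited by the path-smoothing behaviour.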

Notes

  1.

    Modified from https://en.wikipedia.org/wiki/Nonsynaptic_plasticity#/media/File:Neurons_big1.jpg.

References

  1. Alves S, Rosario J, Ferasoli Filho H, Rincon L, Yamasaki R (2011) Conceptual bases of robot navigation modeling, control and applications. In: Barrera A (ed) Advances in robot navigation. InTech, London. https://doi.org/10.5772/20955

  2. Andersson S, Shah V, Handzel A, Krishnaprasad P (2004) Robot phonotaxis with dynamic sound source localization. In: Proceedings of IEEE international conference on robotics and automation, 2004. ICRA ’04, vol 5, pp 4833–4838. https://doi.org/10.1109/ROBOT.2004.1302483

  3. Arkin R (1998) Behavior-based robotics. MIT Press, Cambridge

  4. Bicho E, Mallet P, Schöner G (2000) Target representation on an autonomous vehicle with low-level sensors. Int J Robot Res 19(5):424–447. https://doi.org/10.1177/02783640022066950

  5. Braitenberg V (1984) Vehicles: experiments in synthetic psychology. MIT Press, Cambridge

  6. Brooks R (1986) A robust layered control system for a mobile robot. IEEE J Robot Autom 2(1):14–23. https://doi.org/10.1109/JRA.1986.1087032

  7. Choi JW, Curry R, Elkaim G (2008) Path planning based on Bézier curve for autonomous ground vehicles. In: Advances in electrical and electronics engineering—IAENG special edition of the world congress on engineering and computer science 2008, pp 158–166. https://doi.org/10.1109/WCECS.2008.27

  8. Christensen-Dalsgaard J, Manley G (2005) Directionality of the lizard ear. J Exp Biol 208(6):1209–1217

  9. Dasgupta S, Wörgötter F, Manoonpong P (2014) Neuromodulatory adaptive combination of correlation-based learning in cerebellum and reward-based learning in basal ganglia for goal-directed behavior control. Front Neural Circuits 8:126. https://doi.org/10.3389/fncir.2014.00126

  10. Dudek G, Jenkin M (2010) Computational principles of mobile robotics, 2nd edn. Cambridge University Press, New York

  11. Farin G (2001) Curves and surfaces for CAGD: a practical guide. The Morgan Kaufmann series in computer graphics. Elsevier, Amsterdam

  12. Fletcher N, Thwaites S (1979) Physical models for the analysis of acoustical systems in biology. Q Rev Biophys 12(1):25–65

  13. Fraichard T, Scheuer A (2004) From Reeds and Shepp’s to continuous-curvature paths. IEEE Trans Robot 20(6):1025–1035. https://doi.org/10.1109/TRO.2004.833789

  14. Franz M, Mallot H (2000) Biomimetic robot navigation. Robot Auton Syst 30(1):133–153. https://doi.org/10.1016/S0921-8890(99)00069-X

  15. Hebb D (2005) The organization of behavior: a neuropsychological theory. Psychology Press, London

  16. Huang J, Supaongprapa T, Terakura I, Wang F, Ohnishi N, Sugie N (1999) A model-based sound localization system and its application to robot navigation. Robot Auton Syst 27(4):199–209. https://doi.org/10.1016/S0921-8890(99)00002-0

  17. Hwang BY, Park SH, Han JH, Kim MG, Lee JM (2014) Sound-source tracking and obstacle avoidance system for the mobile robot. Springer, Cham, pp 181–192. https://doi.org/10.1007/978-3-319-05711-8_19

  18. Kanayama Y, Hartman B (1997) Smooth local-path planning for autonomous vehicles. Int J Robot Res 16(3):263–284. https://doi.org/10.1177/027836499701600301

  19. Klopf A (1988) A neuronal model of classical conditioning. Psychobiology 16(2):85–125

  20. Komoriya K, Tanie K (1989) Trajectory design and control of a wheel-type mobile robot using B-spline curve. In: Proceedings of IEEE/RSJ international workshop on intelligent robots and systems ’89. IROS ’89. The autonomous mobile robots and its applications, pp 398–405. https://doi.org/10.1109/IROS.1989.637937

  21. Kosko B (1986) Differential Hebbian learning. AIP Conf Proc 151(1):277–282

  22. Lamiraux F, Laumond JP (2001) Smooth motion planning for car-like vehicles. IEEE Trans Robot Autom 17(4):498–501. https://doi.org/10.1109/70.954762

  23. LaValle S (2006) Planning algorithms. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511546877

  24. Magid E, Keren D, Rivlin E, Yavneh I (2006) Spline-based robot navigation. In: 2006 IEEE/RSJ international conference on intelligent robots and systems, pp 2296–2301. https://doi.org/10.1109/IROS.2006.282635

  25. Manoonpong P, Kolodziejski C, Wörgötter F, Morimoto J (2013) Combining correlation-based and reward-based learning in neural control for policy improvement. Adv Complex Syst 16(02n03):1350015. https://doi.org/10.1142/S021952591350015X

  26. Manoonpong P, Wörgötter F (2009) Adaptive sensor-driven neural control for learning in walking machines. In: Neural information processing: proceedings of the 16th international conference, ICONIP 2009, Bangkok, Thailand, 1–5 December 2009, part II. Springer, Berlin, pp 47–55

  27. Nakhaeinia D, Tang S, Noor S, Motlagh O (2011) A review of control architectures for autonomous navigation of mobile robots. Int J Phys Sci 6(2):169–174

  28. Ön S, Yazici A (2011) A comparative study of smooth path planning for a mobile robot considering kinematic constraints. In: 2011 international symposium on innovations in intelligent systems and applications, pp 565–569. https://doi.org/10.1109/INISTA.2011.5946138

  29. Porr B, Wörgötter F (2006) Strongly improved stability and faster convergence of temporal sequence learning by utilising input correlations only. Neural Comput 18(6):1380–1412

  30. Porr B, Wörgötter F (2007) Fast heterosynaptic learning in a robot food retrieval task inspired by the limbic system. Biosystems 89(1–3):294–299 (Selected papers presented at the 6th international workshop on neural coding)

  31. Purves D, Augustine G, Fitzpatrick D, Hall W, LaMantia A, White L (2012) Synaptic plasticity. In: Neuroscience, 5th edn. Sinauer Associates, Sunderland, pp 163–182

  32. Ravankar A, Ravankar A, Kobayashi Y, Emaru T (2016) Path smoothing extension for various robot path planners. In: 2016 16th international conference on control, automation and systems (ICCAS), pp 263–268. https://doi.org/10.1109/ICCAS.2016.7832330

  33. Shaikh D, Hallam J, Christensen-Dalsgaard J (2010) Modifying directionality through auditory system scaling in a robotic lizard. Springer, Berlin, pp 82–92. https://doi.org/10.1007/978-3-642-15193-4_8

  34. Shaikh D, Hallam J, Christensen-Dalsgaard J (2016) From ear to there: a review of biorobotic models of auditory processing in lizards. Biol Cybern 110(4):303–317. https://doi.org/10.1007/s00422-016-0701-y

  35. Shaikh D, Manoonpong P (2017) A neural circuit for acoustic navigation combining heterosynaptic and non-synaptic plasticity that learns stable trajectories. Springer, Cham, pp 544–555. https://doi.org/10.1007/978-3-319-65172-9_46

  36. Takahashi A, Hongo T, Ninomiya Y, Sugimoto G (1989) Local path planning and motion control for AGV in positioning. In: Proceedings of IEEE/RSJ international workshop on intelligent robots and systems ’89. IROS ’89. The autonomous mobile robots and its applications, pp 392–397. https://doi.org/10.1109/IROS.1989.637936

  37. Wever E (1978) The reptile ear: its structure and function. Princeton University Press, Princeton

  38. Zeno P, Patel S, Sobh T (2016) Review of neurobiologically based mobile robot navigation system research performed since 2000. J Robot. https://doi.org/10.1155/2016/8637251

  39. Zhang W, Linden D (2003) The other side of the engram: experience-driven changes in neuronal intrinsic excitability. Nat Rev Neurosci 4(11):885–900. https://doi.org/10.1038/nrn1248

  40. Zu L, Yang P, Zhang Y, Chen L, Sun H (2009) Study on navigation system of mobile robot based on auditory localization. In: 2009 IEEE international conference on robotics and biomimetics (ROBIO), pp 321–326. https://doi.org/10.1109/ROBIO.2009.5420665

  41. Zuojun L, Guangyao L, Peng Y, Feng L, Chu C (2012) Behavior based rescue robot audio navigation and obstacle avoidance. In: Proceedings of the 31st Chinese control conference, pp 4847–4851

Author information

Corresponding author

Correspondence to Danish Shaikh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

This research was supported with a Grant for the SMOOTH Project (Project Number 6158-00009B) by Innovation Fund Denmark.

About this article

Cite this article

Shaikh, D., Manoonpong, P. A neuroplasticity-inspired neural circuit for acoustic navigation with obstacle avoidance that learns smooth motion paths. Neural Comput & Applic 31, 1765–1781 (2019). https://doi.org/10.1007/s00521-018-3845-y

Keywords

  • Smooth path-planning
  • Reactive navigation
  • Behaviour-based robotics
  • Phonotaxis
  • Lizard peripheral auditory system
  • Neuroplasticity
  • Correlation-based learning