1 Introduction

Research on autonomous navigation systems (ANS) for unmanned vehicles has become a popular topic, particularly for sailboats, whose primary source of propulsion is wind: a free, abundant, and eco-friendly resource. Sailboats have shown great potential for long-term navigation and marine monitoring applications in which they cannot return to land for extended periods, making the energy efficiency of their subsystems essential. However, designing an ANS for sailboats is challenging due to the complexity of sailboat dynamics and the variability of wind and waves [1, 2]. Several authors have proposed ANS controllers that require deep knowledge of sailboat dynamics. Abrougui et al. [1] designed an automatic control system for heading and sail opening based on sliding mode control. Melin et al. [3] designed a sailing control system for small-scale sailboats inspired by the field potential control strategy. However, acquiring comprehensive knowledge of dynamic sailboat parameters is complex [2]. Therefore, some works have proposed control strategies that do not require dynamic models. Viel et al. [4] proposed a position-keeping controller based on geometric laws. Junior et al. [5] used the Q-Learning reinforcement learning algorithm to solve the path planning problem. Cheng et al. [6] combined a coarse-to-fine strategy with a Q-Learning algorithm in an obstacle avoidance controller. Our work belongs to this category of controllers.

Spiking neural networks (SNNs) have been widely used in neuroscience and, more recently, in robotics. Unlike conventional artificial neural networks, SNNs communicate using short electrical pulses distributed over time, known as action potentials or spikes, making their behavior similar to that of biological neurons [7, 8]. SNNs are considered a promising solution for various control challenges in robotics since they realistically mimic the underlying mechanisms of the brain while saving energy and, in some cases, allowing for simple hardware implementation [7, 9]. Recently, research groups and semiconductor vendors have developed specialized neuromorphic hardware, such as Loihi, SpiNNaker, and TrueNorth, to run SNNs efficiently [10]. These platforms allow large SNNs to run with minimal response latency and power consumption, making SNNs an AI technique with potential in applications where energy and latency are limiting factors, such as sailboat control tasks [11]. Furthermore, the use of SNNs presents an excellent opportunity to move towards a greener artificial intelligence paradigm [10].

Several works in robotics have applied SNN-based controllers to various control tasks. In mobile robotics, Chao et al. [12] used a biologically based recurrent SNN with the leaky integrate and fire (LIF) neuron model [8], the spike-time-dependent plasticity (STDP) learning rule [13], and rate coding to solve the path planning problem for a drone. Bing et al. [14] used a 32×2 feed-forward SNN with the LIF neuron model, the Reinforcement STDP (RSTDP) learning rule [15, 16], and rate coding to control a two-wheeled vehicle in a lane-keeping application. Feng et al. [17] used a feed-forward SNN with the LIF neuron model, the STDP learning rule, and population coding [18] to implement a pain mechanism for the humanoid robot Nao, solving two tasks: alerting to actual injury and preventing potential injury. These works demonstrated that SNNs offer a promising solution for controlling robots with high biological plausibility and good performance. However, due to their complex construction and optimization, SNNs can be challenging to use in a given robotic application, and they have therefore not yet been extended to many potential applications. It is essential to highlight that there is still no unified framework for the design of SNNs [19]: for each application, it is possible to choose different topologies, neural models, learning rules, and coding methods. To the best of our knowledge, no work has addressed navigation control systems for sailboats using SNNs. In this context, our work is novel in that we applied SNNs, trained with a reinforcement learning rule, to a task in which they had not been previously used. This approach allowed us to train SNNs without knowing the dynamic sailboat parameters and without the need for a sailing database.

In this study, our objective was to devise a control system for sailboats using SNNs and to evaluate its effectiveness in simulation. To achieve this, we introduced a design methodology and used it to construct various SNN-based ANS. After training and testing these systems, we compared the most effective one with the algorithms of Viel [2] and USVSim [20]. We found that our control system is operational and improves on the deviation error of the USVSim algorithm, but further refinement is necessary to match more advanced algorithms such as Viel's. The primary contributions of this study are our design methodology, the application of SNNs to sailboat control, and the obtained results, which provide a foundation for future research in this area.

The paper is structured as follows: Sect. 2 details the methodology utilized to implement the system. In Sect. 3, we provide a description of the sailing problem. Section 4 discusses the simulation environment. In Sects. 5 and 6, we present the architecture of the SNN and the SNN-based control strategy, respectively. Sections 7 and 8 showcase the experimental setup and simulation results. Finally, in Sect. 9, we discuss our conclusions and future research.

2 Methodology

In this paper, we introduce an SNN-based ANS for sailboats, along with the simulation environment used for training and testing. Our work comprises the following steps:

  1. We developed a simulation environment by integrating the USVSim simulator [20] with our proposed control environment.

  2. We defined the SNN architecture, control strategy, and training methodology.

  3. We established the training and testing scenarios and explored the design space of various hyper-parameters related to the SNN architecture and control strategy.

  4. We trained multiple SNN-based ANS controllers and evaluated their performance in terms of deviation error, total sailing time, and total input neurons.

The initial stage of this project involved creating a simulation environment. To achieve this, we made some modifications to certain files in the USVSim simulator [20] and integrated it with a control environment that we developed using the BindsNET library [21]. A more comprehensive explanation of the simulation environment is presented in Sect. 4.

Table 1 External variables to the control system

After ensuring the simulation environment was operational, we proceeded to define the SNN-based controllers required to implement the ANS using the available actuators in the sailboat: the sails and the rudder. This involved defining the SNN’s architecture and learning method and designing the control strategy. We specified various SNN characteristics, including the neuron model, topology, input encoding, and output decoding. In addition, we employed the MSTDP learning rule [16] to train the SNN controllers. Finally, we established reward functions for each SNN based on the desired maneuvers for the sailboat. A detailed explanation of the SNN’s architecture is provided in Sect. 5.

While designing the SNN-based controllers, we discovered several hyperparameters that influenced the behavior of the ANS controller. Therefore, we explored the design space of these parameters to identify a set of controllers that minimized both the deviation error and the total sailing time. A more detailed explanation of the control strategy is provided in Sect. 6.

As a last step, we created training and testing scenarios for the SNN-based ANS and used them to carry out the design space exploration. For each design point, we trained and tested each pair of controllers, varying the hyper-parameters to obtain different performances. We eliminated design points where the controllers did not complete the training or testing sequence within a specific time frame. Next, we evaluated the performance of the remaining controllers by identifying the set of Pareto optimal controllers. Finally, we chose our best controller and compared it with other sailboat control algorithms. We conducted these experiments on a workstation using Docker v4.3.2 [22], with multiple containers running instances of the simulation environment. A more detailed explanation of the experiments is provided in Sect. 7.

3 Problem description

An autonomous navigation system (ANS) presents a control challenge in which a vehicle must perform tasks such as following a route or detecting and avoiding obstacles. For the purpose of this work, we limit our focus to the first task: following a route. A route in our study comprises a set of coordinates that the sailboat must reach sequentially. To solve the proposed ANS problem, two critical elements of sailing must be controlled: the rudder, which alters the sailboat’s heading, and the sails, which harness energy from the wind to propel the sailboat. To achieve this, we implemented two SNN-based controllers: one to control the rudder and the other to control the sails. Table 1 shows the input variables (setpoint), sensed variables (feedback), and position orders (control actions) used in our control system.

Besides the variables listed in Table 1, it is crucial to establish the values of \(\theta \) and \( \vert \Delta {\varvec{r}} \vert \). These quantities represent the desired heading and the distance between the sailboat and the target point, respectively. We can express these values in terms of the variables given in Table 1, as shown in Eqs. (1) and (2).

$$\begin{aligned}&\vert \Delta {\varvec{r}} \vert = \sqrt{(x-x_{1})^{2}+(y-y_{1})^{2}}, \end{aligned}$$
(1)
$$\begin{aligned}&\theta = {{\,\textrm{atan2}\,}}(y-y_{1},x-x_{1}),\ -\pi \le \theta < \pi . \end{aligned}$$
(2)

With these variables, we can describe the problem of autonomous navigation mathematically. The aim is to move a sailboat located at \((x_{1},y_{1})\) to a position \((x,y)\) under a global true wind \(\varvec{\tau }\) within a specific control simulation time t. To accomplish this, the sailboat’s heading must approach the desired heading (ideally \(\phi = \theta \)), or the sailboat must perform the tacking or gybing maneuvers, by executing actions \(\alpha _{1}\) and \(\alpha _{2}\) on the rudder and sails, respectively. We assume that the sailboat has reached the target if \( \vert \Delta {\varvec{r}} \vert \le k_{r}\), where \(k_{r}\) is a constant parameter. Figure 1 depicts a sailboat with all the aforementioned variables.

Fig. 1 Sailboat with its different environment variables

3.1 Sailing maneuvers

Depending on the true wind direction and the target point’s position, the sailboat may face six primary sailing scenarios, as depicted in Fig. 2. Our aim was to train the SNN-based controllers to enable the sailboat to move in any direction, and we used these scenarios to define the training and testing scenarios.

To train the SNN-based ANS, we relied on conventional sailing strategies rather than proposing novel ones. As shown in Fig. 2, these strategies fall into two groups: if the sailboat’s heading towards the target point is in the upwind or downwind zones, the sailboat pursues a straight trajectory to the target. If the sailboat’s heading towards the target point is in the no-go zones, it performs tacking or gybing maneuvers to reach the target, because a straight trajectory is infeasible [23].

Fig. 2 Sailing scenarios and regions

3.2 True and apparent wind

Understanding the concepts of true wind and apparent wind is fundamental in sailing. The relationship between the true wind \(\varvec{\tau }\), which is the wind perceived by a stationary observer, the apparent wind \({\varvec{a}}\), which is the wind perceived by an observer aboard the sailboat [24], and the sailboat velocity \({\varvec{v}}\) is presented in Eq. (3).

$$\begin{aligned} {\varvec{a}} = \varvec{\tau } - {\varvec{v}}. \end{aligned}$$
(3)

Using Eq. (3) and applying trigonometric and vector laws, we can derive Eqs. (4) and (5) to calculate the apparent wind speed a and direction \(\gamma _{a}\) experienced by the sailboat.

$$\begin{aligned}&a = \sqrt{\tau ^{2}+v^{2}-2\tau v \cos (\phi - \gamma _{\tau })}, \end{aligned}$$
(4)
$$\begin{aligned}&\gamma _{a} = \arccos \Big (\dfrac{\tau \cos (\gamma _{\tau })-v\cos (\phi )}{a}\Big ). \end{aligned}$$
(5)
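To make Eqs. (4) and (5) concrete, the following Python sketch computes the apparent wind from the true wind and the boat state; the function name is ours, and angles are assumed to be in radians.

```python
import math

def apparent_wind(tau, gamma_tau, v, phi):
    """Apparent wind speed a and direction gamma_a (Eqs. (4) and (5)).

    tau: true wind speed; gamma_tau: true wind direction (rad);
    v: sailboat speed; phi: sailboat heading (rad).
    """
    # Eq. (4): law of cosines applied to the vector difference a = tau - v
    a = math.sqrt(tau**2 + v**2 - 2 * tau * v * math.cos(phi - gamma_tau))
    if a == 0.0:
        return 0.0, 0.0  # degenerate case: no apparent wind
    # Eq. (5); the argument is clamped to [-1, 1] against round-off error
    cos_ga = (tau * math.cos(gamma_tau) - v * math.cos(phi)) / a
    gamma_a = math.acos(max(-1.0, min(1.0, cos_ga)))
    return a, gamma_a
```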

3.3 Reinforcement learning

Reinforcement learning is an artificial intelligence technique that differs from supervised and unsupervised learning as it aims to learn what actions to take based on a numerical reward signal. To develop and understand our control strategy, we defined some reinforcement learning concepts, which are drawn from [25]:

  • Agent   The agent represents the actuator controller in terms of control theory. It is the learner and decision-maker. We define two different agents in this paper: the rudder controller and the sails controller.

  • Environment   Everything external to the agent that can interact with it.

  • Action   The action represents the control signal in terms of control theory. It is the decision chosen by the agent for a given environment state. In this paper, \(\alpha _{1}\) represents the rudder control action, and \(\alpha _{2}\) represents the main and jib sails control action (both sails use the same control action).

  • Environment state   The environment state represents an environment feedback signal in terms of control theory. It is an indicator that provides information about the environment at a given time. In this paper, \(\varTheta _{1}\) represents the rudder environment state, and \(\varTheta _{2}\) represents the sails environment state.

  • Policies   A policy generates actions based on the perceived environment states. It defines the way the agent behaves at a given time. In this paper, the policies are the set of all synaptic weights of the SNN-based controllers.

  • Reward   The reward is a numeric value that aims to rate how good or bad the agent’s actions are within the context of the problem to be solved. We have denoted \(R_{1}\) and \(R_{2}\) as rewards for the rudder and sail controllers, respectively.

4 Simulation environment

The simulation environment serves as the software infrastructure for training and testing the SNN controllers within the context of an ANS for a sailboat, enabling us to train and run SNNs while also modeling the sailboat and environmental forces acting on it.

For this purpose, we opted for USVSim, an open-source simulator for unmanned surface vehicles (USVs) developed by Paravisi et al. [20]. USVSim employs Python 2.7, ROS Kinetic, and Gazebo 7.0. Among the sailboat simulators available, USVSim was selected for its highly detailed physical simulation, including the modeling of environmental disturbances such as winds, water currents, and waves. We customized the default sailboat model provided by USVSim to resemble the physical sailboat we have for future real-world implementation. A list of the modifications is presented below.

  • We added a second sail for the sailboat.

  • We changed the sailboat’s hull dimensions and mass.

  • We changed the sailboat’s rudder dimensions and mass.

  • We changed the sailboat’s sails dimensions and mass.

  • We changed the sailboat’s environment.

  • We changed the USVSim launch characteristics.

On the other hand, we used Python 3 and the BindsNET library [21] to implement our controller environment. BindsNET is a Python 3 library used to simulate SNNs on CPUs or GPUs using PyTorch Tensor functionality. We chose BindsNET for its high-level abstraction, which enables us to describe the behavior of SNNs directly. Below is a list of the tasks performed within our controller environment.

  • Build the SNN-based controllers with the BindsNET library.

  • Execute the control system presented in Sects. 5 and 6.

  • Generate the target points of the training and testing scenarios.

  • Execute and save relevant information from the different experiments.

We had to isolate USVSim and our controller environment due to the incompatibility between the Python versions they use. To establish communication between them, we developed a communication link via Socat [26]. Finally, we loaded the input data through a configuration file, which contains necessary information to configure our SNN-based ANS, such as control hyper-parameters and SNN topology.

The simulation environment operates as follows: Input data is loaded, the controller environment is configured, and Socat communication is established. At each simulation time step, data arrives from USVSim, and a controller environment step is executed, which can be a training or inference step. This step involves encoding the sensed variables (Sects. 5.2 and 6), calculating training rewards (Sects. 6.1.3 and 6.2.3), training (or inferring) the SNNs with the encoded variables, decoding the control actions at the SNNs output neuron (Sects. 5.4 and 6), and sending them back to USVSim. Figure 3 presents the block diagram of our simulation environment.
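A minimal sketch of this per-step loop follows; `usvsim_link`, `encode_state`, `compute_reward`, and `decode_output` are illustrative placeholders rather than USVSim or BindsNET API, and the 500 ms window matches Sect. 5.2.

```python
# Hypothetical per-step control loop of the simulation environment.
while not route_completed():
    sensed = usvsim_link.receive()        # sensed variables from USVSim (via Socat)
    spikes = encode_state(sensed)         # one-hot Poisson encoding (Sects. 5.2 and 6)
    reward = compute_reward(sensed)       # reward functions (Sects. 6.1.3 and 6.2.3)
    # train (or infer, with learning disabled) the SNN over the 500 ms window
    network.run(inputs={"Input": spikes}, time=500, reward=reward)
    action = decode_output(network)       # rate decoding at the output neuron (Sect. 5.4)
    usvsim_link.send(action)              # control action back to USVSim
```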

Fig. 3 Block diagram of the developed simulation environment

The developed simulation environment and the modified USVSim files are available in the following repository: https://github.com/nsantiagogiraldo/Sailboat_simulator.

5 SNN-based controllers

We developed two SNN-based controllers, one for the rudder and another for the sails, as described in Sect. 3. Both SNNs were built using the same approach, which is detailed in this section.

5.1 Neuron model

The neuroscience community has proposed various neuron models for SNNs with different trade-offs between biological plausibility and computational complexity. We chose the leaky integrate and fire (LIF) model [8] due to its simplicity and previous use in other robotics applications [14, 27, 28]. Both SNNs in our study used the LIF model with the default parameters set by BindsNET.

In the LIF neuron model, the axon membrane is represented by an electrical circuit comprising a capacitor C in parallel with a resistor R, which models the cell membrane’s capacitance and leakage resistance. An input current \(I_{\textrm{ext}}\), which is the sum of \(I_{C}\) (current through the cell membrane) and \(I_{R}\) (ion diffusion leakage current) components, is applied to the circuit [8]. This behavior is described by Equation (6).

$$\begin{aligned} I_{\textrm{ext}}=C\dfrac{\textrm{d}V(t)}{\textrm{d}t}+\dfrac{V(t)}{R}. \end{aligned}$$
(6)

In this model, the action potential form is not explicitly described. Instead, spikes are formal events characterized by a “firing time” \(t^{f}\). The firing time \(t^{f}\) is determined by a threshold criterion as shown in Eq. (7), and immediately after \(t^{f}\), the potential resets to a value \(V_\textrm{rest}\) less than the threshold potential \(\vartheta \) [8], as shown in Eq. (8).

$$\begin{aligned}&t^{f}: V(t^{f})=\vartheta , \end{aligned}$$
(7)
$$\begin{aligned}&\lim _{t\rightarrow t^{f}; t> t^{f}} V(t)=V_{\textrm{rest}}: V_{\textrm{rest}} < \vartheta . \end{aligned}$$
(8)
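As an illustration of Eqs. (6)–(8), a forward-Euler sketch of one LIF update step is shown below; the parameter values are illustrative, not the BindsNET defaults used in our experiments.

```python
def lif_step(V, I_ext, dt=1.0, C=1.0, R=10.0, threshold=1.0, V_rest=0.0):
    """One forward-Euler step of the LIF neuron: integrate Eq. (6),
    then apply the threshold (Eq. (7)) and reset (Eq. (8))."""
    V = V + dt * (I_ext - V / R) / C   # C dV/dt = I_ext - V/R
    if V >= threshold:                 # firing time t^f: V(t^f) = threshold
        return V_rest, True            # spike emitted, potential reset
    return V, False
```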

5.2 Encoding technique

We used an encoding technique to transform the input data into spike trains that can be processed by the SNN. Specifically, we transformed the values of the environment variables \(\varTheta _{1}\) and \(\varTheta _{2}\) into spike trains using the state encoding approach proposed by Fremaux et al. [29] and Mahadevuni et al. [27]. This coding scheme is a form of one-hot coding [30], where only one “hot” set of spiking neurons is excited at any given time. We describe the encoding scheme mathematically in general terms, considering that variables with subscript \(i=1\) belong to the rudder, and with \(i=2\) belong to the sails.

Let us assume that our state variable \(\varTheta _{i}\) (Sect. 3.3) has a finite number of possible values and can only be in one value at a given time. We define the ascending ordered set \(S_{i}\) and its index \(n_{i} \in {\mathbb {Z}}^{+}\) (starting from zero), which contains all the possible values of the variable \(\varTheta _{i}\). To each state value, we associate a set of two input spiking neurons and use the \(n_{i}\) value to decide which pair of neurons is excited with a spike train. For instance, if the rudder SNN has four input neurons, \(n_{1}\) can take the values 0 and 1. If \(n_{1}=0\), neurons 0 and 1 are excited, and if \(n_{1}=1\), neurons 2 and 3 are excited. Thus, at any time, only two input neurons are activated. To excite a neuron, we generated a train of Poisson spikes at a rate of 240 Hz in a time window of 500 ms. A Poisson spike train is a set of spikes distributed in time, whose firing times are drawn from a Poisson probability distribution [10]. In this paper, \(\varTheta _{i}\) provides information about the sailboat’s current state and depends on the sensed variables. We explain how these concepts apply to our study problem in Sects. 6.1.1 and 6.2.1.
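The sketch below illustrates this one-hot Poisson encoding; the helper name is ours, and we assume BindsNET's `poisson` encoder is available with the signature shown.

```python
import torch
from bindsnet.encoding import poisson  # assumed import path

def encode_one_hot(n_i, num_states, rate=240.0, time=500):
    """One-hot state encoding (Sect. 5.2): the state index n_i excites
    the neuron pair (2*n_i, 2*n_i + 1) with a 240 Hz Poisson spike
    train over a 500 ms window; all other input neurons stay silent."""
    rates = torch.zeros(2 * num_states)   # two input neurons per state value
    rates[2 * n_i] = rate
    rates[2 * n_i + 1] = rate
    return poisson(datum=rates, time=time)  # spike tensor of shape [time, 2*num_states]
```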

5.3 SNN topology

Figure 4 depicts the architecture of the SNNs, which consist of two fully connected feed-forward layers. The input layer of each SNN is composed of \(2 \vert S_{1} \vert \) and \(2 \vert S_{2} \vert \) neurons, corresponding to the rudder and sails, respectively. The output layer comprises a single neuron that generates the control action to be executed by the agent.
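As a sketch of how such a network can be declared in BindsNET (layer names and weight bounds are our own assumptions):

```python
from bindsnet.network import Network
from bindsnet.network.nodes import Input, LIFNodes
from bindsnet.network.topology import Connection
from bindsnet.learning import MSTDP

def build_controller(num_states):
    """Two-layer feed-forward SNN: 2*|S_i| input neurons fully
    connected to a single LIF output neuron, trained with MSTDP."""
    net = Network()
    inp = Input(n=2 * num_states)
    out = LIFNodes(n=1)                    # default LIF parameters (Sect. 5.1)
    conn = Connection(source=inp, target=out,
                      update_rule=MSTDP,   # reward-modulated STDP (Sect. 5.5)
                      wmin=0.0, wmax=1.0)  # illustrative weight bounds
    net.add_layer(inp, name="Input")
    net.add_layer(out, name="Output")
    net.add_connection(conn, source="Input", target="Output")
    return net
```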

Fig. 4 Final block diagram of our SNN architecture

5.4 Decoding technique

To use the SNN’s output as a control action, we need to decode the spike train into a scalar. We adopted a rate-coding approach [9] for this purpose. Kaiser et al. [28] proposed a decoding method based on the output spike rate O of a neuron and the maximum spike rate of the same neuron \(O_{M}\). They used the ratio of O to \(O_{M}\) to obtain a number between 0 and 1, as shown in Eq. (9).

$$\begin{aligned} c = \dfrac{O}{O_{M}}, ~~0 \le O \le O_{M}. \end{aligned}$$
(9)

As explained in Sect. 5.2, for any given environment state, only one set of two neurons fires at a time in each SNN. With this in mind, the value of \(O_{M}\) is calculated as follows:

  • Create an SNN with the topology described in Sect. 5.3 and the maximum default weights defined by BindsNET.

  • Feed a set of two input neurons with spikes.

  • Count the number of output spikes, which is \(O_{M}\).

  • Randomize the SNN’s weights and start training.

The \(O_{M}\) calculation was performed only once before training since it is a constant value in both training and inference stages. We explained how to convert the number c into the control actions \(\alpha _{1}\) and \(\alpha _{2}\) in Sects. 6.1.2 and 6.2.2.
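A minimal sketch of this decoding step (Eq. (9)), with \(O\) and \(O_{M}\) as spike counts over the same time window:

```python
def decode_rate(O, O_M):
    """Rate decoding (Eq. (9)): map the output neuron's spike count O
    to a scalar c in [0, 1] using the calibrated maximum count O_M."""
    return min(max(O / O_M, 0.0), 1.0)  # clamped for numerical safety
```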

5.5 SNN learning

The learning rule selected for training the SNN-based controllers was dopamine-modulated spike-timing-dependent plasticity (MSTDP), as presented by Florian [16] and Izhikevich [15]. This reinforcement learning rule has been used in various robot control applications, such as those developed by Evans [31] and Clawson et al. [32].

MSTDP enables the learning of SNNs by modifying the synaptic weight \(W_{ab}\) between a presynaptic neuron (source) a and a postsynaptic neuron (target) b. Mathematically, the change in the synaptic weight \(W_{ab}\) results from modulating the STDP learning rule [13] by a scalar R, known as the reward [16]. The behavior of this learning rule can be observed in Eq. (10), where the variation of the synaptic weight \(W_{ab}\) is presented in terms of the change of the synaptic weights \(P_{ab}\) calculated by STDP. Our work used the MSTDP learning rule provided by the BindsNET library without any modifications to the default values assigned by the library for the STDP hyper-parameters.

$$\begin{aligned} \Delta W_{ab}(t) = R \cdot \Delta P_{ab}(t). \end{aligned}$$
(10)

6 Control strategy

To develop the rudder and sails controllers, we defined various sailing scenarios that the sailboat must navigate, as well as designs for the rudder and sails controllers, along with training and testing scenarios for the experiments.

6.1 Rudder controller

In this paper, the rudder controller is based on an SNN with the architecture explained in Sect. 5. In this section, we defined \(\varTheta _{1}\), \(\alpha _{1}\) and the reward mechanism used.

6.1.1 Input state

We defined the state variable \(\varTheta _{1}\) based on the input variable of the low-level controller proposed by Viel et al. [2]. Their controller positions the rudder to compensate for heading disturbances caused by waves and wind, using the difference between the current heading \(\phi \) and the desired heading \(\theta \) as an input variable. Therefore, we set \(\varTheta _{1} = \theta - \phi \), where \(\theta \) is calculated as shown in Eq. (2).

As explained in Sect. 5.2, the neurons to be fired depend on the value of \(n_{1}\); thus, we derived an equation to calculate it. Assume that \(-\varTheta _{1M}\) and \(\varTheta _{1M}\) represent the minimum and maximum possible values of \(\varTheta _{1}\), respectively. We set \(n_{1}=0\) when \(\varTheta _{1}=-\varTheta _{1M}\) and \(n_{1}=\vert S_{1}\vert -1\) when \(\varTheta _{1}=\varTheta _{1M}\), where \(\vert S_{1}\vert \) is the cardinality of the set \(S_{1}\) (Sect. 5.2). In Eq. (11), we present a rounded linear model that satisfies these conditions. We rounded the equation to ensure that \(n_{1} \in {\mathbb {Z}}^{+}\).

$$\begin{aligned} n_{1} = \left\lfloor \dfrac{(\varTheta _{1}+\varTheta _{1M})(\vert S_{1} \vert -1)}{2 \cdot \varTheta _{1M}} \right\rfloor , ~~0 \le n_{1} < \vert S_{1}\vert . \end{aligned}$$
(11)

In this paper, \(\vert S_{1}\vert \) represents the number of possible values of \(\varTheta _{1}\). For instance, if \(\vert S_{1}\vert \) = 3 and \(\varTheta _{1M}\) = 90, then \(\varTheta _{1}\) can take on the values \(\{-90, 0, 90\}\), and \(n_{1}\) can take on the values \(\{0, 1, 2\}\), respectively. It is important to note that the value of \(\vert S_{1}\vert \) can impact the controller’s performance, and we, therefore, considered it a controller hyper-parameter.
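A sketch of this state-to-index mapping (Eq. (11)); the clamp makes the edge case \(\varTheta _{1}=\varTheta _{1M}\) explicit:

```python
import math

def state_index(theta, theta_max, num_states):
    """Map a state value in [-theta_max, theta_max] to an index
    n in {0, ..., num_states - 1} via the rounded linear model of Eq. (11)."""
    n = math.floor((theta + theta_max) * (num_states - 1) / (2 * theta_max))
    return min(max(n, 0), num_states - 1)
```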

6.1.2 Output

In Sect. 5.4, we explained that the output variable c represents the normalized control action calculated by the SNN. To convert c to the rudder control action \(\alpha _{1}\), we use the following method.

Let \(-\alpha _{1M}\) and \(\alpha _{1M}\) denote the minimum and maximum possible values of \(\alpha _{1}\), respectively. If we divide the interval \([-\alpha _{1M},\alpha _{1M}]\) into \(J_{1}\) sub-intervals, the size of each sub-interval \(\beta \) is given by Eq. (12).

$$\begin{aligned} \beta = \dfrac{2 \cdot \alpha _{1M}}{J_{1}}. \end{aligned}$$
(12)

To ensure that the possible values of \(\alpha _{1}\) correspond to the mean value of each sub-interval, it was necessary to restrict c to only take \(J_{1}\) possible values. To achieve this, a new variable \(c_{1}\) was introduced, which is defined in Eq. (13).

$$\begin{aligned} c_{1} = \left\{ \begin{array}{ll} \lfloor c\cdot J_{1}\rfloor , &{}~\text {if}\ 0 \le c < 1, \\ J_{1}-1, &{}~\text {if}\ c = 1, \end{array} \right. \end{aligned}$$
(13)

To determine the value of \(\alpha _{1}\) for a given interval \(c_{1}\), we can use the following expressions: \(N_{c_{1}} = -\alpha _{1M}+\beta \cdot c_{1}\) and \(N_{c_{1}+1} = -\alpha _{1M}+\beta \cdot (c_{1}+1)\), which correspond to the minimum and maximum points of the interval \(c_{1}\), respectively. Then, the expression for \(\alpha _{1}\) is given by Eq. (14).

$$\begin{aligned} \begin{aligned} \alpha _{1}&= \dfrac{N_{c_{1}+1}+N_{c_{1}}}{2} \\&= \dfrac{-\alpha _{1M}+\beta \cdot (c_{1}+1)-\alpha _{1M}+\beta \cdot c_{1}}{2} \\&= \dfrac{(2c_{1}+1)\beta -2\alpha _{1M}}{2}. \end{aligned} \end{aligned}$$
(14)

By substituting Eqs. (12) into (14), we obtained a simplified expression for computing \(\alpha _{1}\), as presented in Eq. (15). We specify that \(J_{1}\) should be an odd number, as it allows for \(\alpha _{1}=0\) to be a possible value.

$$\begin{aligned} \alpha _{1} = \dfrac{(2c_{1}+1-J_{1})\alpha _{1M}}{J_{1}}. \end{aligned}$$
(15)

In this paper, \(J_{1}\) represents the number of possible rudder control actions and \(c_{1}\) represents the index predicted by the SNN. For instance, if \(J_{1}=3\) and \(\alpha _{1M}=90\), then \(\alpha _{1}\) can take on the values \(\{-60, 0, 60\}\). If the SNN predicts \(c_{1}=2\), then \(\alpha _{1} = 60\). It is important to note that the value of \(J_{1}\) can impact the controller’s performance. Therefore, we considered it as a controller hyper-parameter.
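A sketch combining Eqs. (13) and (15); the same code serves the sails controller with \(J_{2}\) and \(\alpha _{2M}\) (Sect. 6.2.2):

```python
import math

def decode_control_action(c, alpha_max, J):
    """Convert the normalized SNN output c in [0, 1] into a control
    action: select the sub-interval index c_1 (Eq. (13)) and return
    the mid-point of that sub-interval (Eq. (15))."""
    c1 = J - 1 if c >= 1.0 else math.floor(c * J)
    return (2 * c1 + 1 - J) * alpha_max / J
```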

6.1.3 Reward strategy

As explained in Sect. 5.5, our SNN-based controllers were trained using the MSTDP algorithm, which required us to derive an equation for the reward value \(R_{1}\). To do so, we referred to the results obtained by Florian [16]. In that study, an SNN with a rate-decoded output neuron was trained to solve the XOR problem, and the reward was defined as \(R \in \{-1,0,1\}\), where \(R=1\) indicated a desired increase in the firing rate of the output neuron, \(R=-1\) a desired decrease, and \(R=0\) that no change in the firing rate was desired. Based on this, we defined \(R_{1}\in [-1,1]\).

To derive an equation for \(R_{1}\), we first defined the ascending ordered set \(E_{1}\) (named the error set) and its index \(e_{1} \in {\mathbb {Z}}^{+}\) (starting from zero), which contains all pairwise differences of the possible values of \(\alpha _{1}\). For instance, if \(J_{1}=3\) and \(\alpha _{1M}=90\), then \(\alpha _{1}\) can take on the values \(\{-60, 0, 60\}\), resulting in \(E_{1} = \{-120,-60,0,60,120\}\). Note that \(\vert E_{1} \vert =2J_{1}-1\) since the possible values of \(\alpha _{1}\) are separated by a fixed distance (Sect. 6.1.1). If the elements in \(E_{1}\) represent the possible errors between the current heading and its desired value, then \(R_{1}\) must try to make the error zero. If \(e_{z} = J_{1}-1\) represents the value of \(e_{1}\) corresponding to the error zero, we expect that \(R_{1}=1\) if \(e_{1}-e_{z} = J_{1}-1\) and \(R_{1}=-1\) if \(e_{1}-e_{z} = -(J_{1}-1)\), due to symmetry with respect to zero. We present a linear model satisfying these conditions in Eq. (16).

$$\begin{aligned} R_{1} = \dfrac{e_{1}-J_{1}+1}{J_{1}-1}. \end{aligned}$$
(16)

To derive an equation for \(e_{1}\), we introduced the variable \(\Delta G_{1}\), which represents the difference between the actual heading and the desired heading, and a constant \(I_{1}\), which denotes the maximum allowable error for \(\Delta G_{1}\). Therefore, if \(\Delta G_{1} \ge I_{1}\), then \(e_{1}\) must be at its maximum (\(2J_{1}-2\)). Similarly, if \(\Delta G_{1} \le -I_{1}\), then \(e_{1}\) must be at its minimum (0). For all other cases, we used a rounded linear model (to ensure \(e_{1} \in {\mathbb {Z}}^{+}\)). With the above considerations, we presented an equation to compute \(e_{1}\) that fulfills the aforementioned conditions, as displayed in Eq. (17).

$$\begin{aligned} e_{1} = \left\{ \begin{array}{ll} \left\lfloor \dfrac{J_{1}-1}{I_{1}}(\Delta G_{1}+I_{1}) \right\rfloor ,&{}~\text {if}\ \Delta G_{1} \in (-I_{1},I_{1}),\\ 2J_{1}-2,&{}~\text {if}\ \Delta G_{1} \ge I_{1},\\ 0,&{}~\text {if}\ \Delta G_{1} \le -I_{1}. \end{array} \right. \end{aligned}$$
(17)

In this paper, we calculated \(\Delta G_{1} = \phi - \theta \), allowing the controller to learn a policy to follow the desired heading. For instance, if we set \(J_{1}=3\), \(I_{1}=60\) and \(\Delta G_{1}\) takes values of \(\{-50, 0, 60\}\), then \(e_{1}\) and \(R_{1}\) can take on the values \(\{0, 2, 4\}\) and \(\{-1, 0, 1\}\), respectively. It is important to note that the value of \(I_{1}\) can impact the controller’s performance, and we therefore considered it as a controller hyper-parameter.
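A sketch of the reward computation (Eqs. (16) and (17)); with \(\Delta G_{2}\), \(I_{2}\), and \(J_{2}\) it also serves the sails controller (Sect. 6.2.3):

```python
import math

def mstdp_reward(delta_g, I, J):
    """Map the tracking error delta_g to an error index e (Eq. (17))
    and then to a reward R in [-1, 1] (Eq. (16)); R = 0 at zero error."""
    if delta_g >= I:
        e = 2 * J - 2
    elif delta_g <= -I:
        e = 0
    else:
        e = math.floor((J - 1) * (delta_g + I) / I)
    return (e - J + 1) / (J - 1)
```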

6.2 Sails controller

In this paper, the sails controller is based on an SNN with the architecture explained in Sect. 5. In this section, we defined \(\varTheta _{2}\), \(\alpha _{2}\) and the reward mechanism used.

In contrast to the rudder controller, we derived an approximate model of the behavior of a sail to define \(\varTheta _{2}\) and to reward the SNN. This model determines the angle \(\bar{\alpha }_2\) that maximizes the sailboat’s acceleration in the heading direction \(\phi \). We assumed that the sailboat depicted in Fig. 1 has a rigid sail and moves at a fixed heading \(\phi \) and velocity \({\varvec{v}}\).

The first step in deriving the model was to find an equation for the magnitude of the apparent wind force \(F_{\phi }\) in the heading direction. We based our approach on the work of Melin et al. [3]. Equation (18) shows the force \({\varvec{F}}_{s}\) acting on the sail, where \(\rho \) is the sail lift coefficient, \(\sigma \) is the sail opening angle with respect to the x-axis, \({\hat{\varPhi }}\) is a unit normal vector to the sail, \(\gamma _{a}\) is the apparent wind direction, and a is the apparent wind speed (see Sect. 3.2).

$$\begin{aligned} {\varvec{F}}_{s} = \rho \cdot a \sin (\gamma _{a}-\sigma ) {\hat{\varPhi }} = F \cdot {\hat{\varPhi }}. \end{aligned}$$
(18)

Note that \({\hat{\varPhi }}\) is always normal to the sail for any angle \(\sigma \). For this to hold true, \({\hat{\varPhi }}\) must have cylindrical (azimuthal) symmetry. By using the transformation equations from cylindrical to Cartesian vectors [33], we derived Eq. (19). This represents the force of the apparent wind on the sail in the global coordinate system of Fig. 1.

$$\begin{aligned} {\varvec{F}}_{s} = F \cdot (-\sin (\sigma ) {\hat{x}} + \cos (\sigma ) {\hat{y}}). \end{aligned}$$
(19)

By applying the transformation equations from Cartesian to cylindrical vectors [33] to Eq. (19) and considering the heading \(\phi \) as the opening angle of the coordinate system, we obtain Eq. (20). In this equation, \({\hat{\rho }}\) and \({\hat{\psi }}\) are unit vectors parallel and perpendicular, respectively, to \(\phi \). Therefore, Eq. (21) shows the force magnitude in the heading direction.

$$\begin{aligned}&{\varvec{F}}_{s} = F \cdot (\sin (\phi -\sigma ) \ {\hat{\rho }} + \cos (\phi -\sigma ) \ {\hat{\psi }}), \end{aligned}$$
(20)
$$\begin{aligned}&F_{\phi } = F \cdot \sin (\phi -\sigma ) = \rho \cdot a \sin (\gamma _{a}-\sigma ) \sin (\phi -\sigma ). \end{aligned}$$
(21)

The second step in deriving the model was to calculate the derivative of Eq. (21) with respect to \(\sigma \) and set it equal to zero. By applying the laws of trigonometry and solving for \(\sigma \), we obtain Eq. (22). This model maximizes the sailboat’s acceleration in the heading direction, meaning that Eq. (22) gives the sail opening that best propels the sailboat along its heading.

$$\begin{aligned} \sigma = \left\{ \begin{array}{ll} \arctan \left( \dfrac{\sin (\gamma _{a}+\phi )}{\cos (\gamma _{a}+\phi ) - 1} \right) ,\qquad ~~\text {if}\ \gamma _{a}+\phi \ne k\pi , \\ \pm \dfrac{\pi }{2},~~\text {if}\ \gamma _{a}+\phi = k\pi ,~k\in {\mathbb {Z}}. \end{array} \right. \end{aligned}$$
(22)

Finally, to calculate the angle \(\bar{\alpha }_2\), we used the operation shown in Eq. (23). In this equation, \(\alpha _{2M}\) represents the maximum possible value of \(\alpha _{2}\). It is important to ensure that both \(\sigma - \phi \) and \(\sigma - \phi + \pi \) are within the interval \([-\pi ,\pi )\).

$$\begin{aligned} \bar{\alpha }_2 = \left\{ \begin{array}{ll} \sigma - \phi ,&{}~\text {if}\ -\alpha _{2M} \le \sigma - \phi \le \alpha _{2M},\\ \sigma - \phi + \pi ,&{}~\text {if}\ -\alpha _{2M} \le \sigma - \phi + \pi \le \alpha _{2M}. \end{array} \right. \end{aligned}$$
(23)
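The sketch below implements the sail model (Eqs. (22) and (23)); the angle wrapping into \([-\pi ,\pi )\) required by Eq. (23) is made explicit, and the function name is ours:

```python
import math

def wrap(angle):
    """Wrap an angle into [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def ideal_sail_angle(gamma_a, phi, alpha_max):
    """Sail opening that maximizes acceleration along the heading phi
    (Eq. (22)), mapped into the actuator range by Eq. (23). Returns
    None if neither branch of Eq. (23) is feasible."""
    s = gamma_a + phi
    if math.isclose(math.sin(s), 0.0, abs_tol=1e-9):
        sigma = math.pi / 2                        # degenerate case: s = k*pi
    else:
        sigma = math.atan(math.sin(s) / (math.cos(s) - 1.0))
    for candidate in (wrap(sigma - phi), wrap(sigma - phi + math.pi)):
        if -alpha_max <= candidate <= alpha_max:
            return candidate
    return None
```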

6.2.1 Input state

We based the definition of the state variable \(\varTheta _{2}\) for the sails controller on Eq. (22). As \(\gamma _{a}+\phi \) is the input variable in this equation, we set \(\varTheta _{2} = \gamma _{a}+\phi \).

To derive an expression for \(n_{2}\), we followed the same procedure described in Sect. 6.1.1 and obtained Eq. (24). In this equation, \(\varTheta _{2M}\) represents the maximum possible value of \(\varTheta _{2}\) and \(\vert S_{2}\vert \) represents the cardinality of the set \(S_{2}\) (Sect. 5.2). Similar to the rudder controller, \(\vert S_{2}\vert \) denotes the number of possible values of \(\varTheta _{2}\), and was considered as a controller hyper-parameter.

$$\begin{aligned} n_{2} = \left\lfloor \dfrac{(\varTheta _{2}+\varTheta _{2M})(\vert S_{2}\vert -1)}{2\cdot \varTheta _{2M}} \right\rfloor , ~0 \le n_{2} < \vert S_{2}\vert . \end{aligned}$$
(24)

6.2.2 Output

Using the same procedure as in Sect. 6.1.2, we derived Eqs. (25) and (26). In these equations, \(\alpha _{2M}\) represents the maximum possible value of \(\alpha _{2}\), and c represents the normalized control action calculated by the sails output neuron. Similarly to the rudder controller, \(J_{2}\) represents the number of possible control actions, and was considered as a controller hyper-parameter.

$$\begin{aligned}&c_{2} = \left\{ \begin{array}{ll} \lfloor c\cdot J_{2}\rfloor , &{}~\text {if}\ 0 \le c < 1, \\ J_{2}-1, &{}~\text {if}\ c = 1, \end{array} \right. \end{aligned}$$
(25)
$$\begin{aligned}&\alpha _{2} = \dfrac{(2c_{2}+1-J_{2})\alpha _{2M}}{J_{2}}. \end{aligned}$$
(26)

6.2.3 Reward strategy

Using the same procedure as in Sect. 6.1.3, we derived Eqs. (27) and (28). In these equations, \(\Delta G_{2}\) represents the error between the sails control action and the ideal sails control action, and \(I_{2}\) represents the maximum allowable error for \(\Delta G_{2}\). Similar to the rudder controller, we considered \(I_{2}\) as a controller hyper-parameter.

$$\begin{aligned}&e_{2} = \left\{ \begin{array}{ll} \left\lfloor \dfrac{J_{2}-1}{I_{2}}(\Delta G_{2}+I_{2}) \right\rfloor ,&{}~\text {if}\ \Delta G_{2} \in (-I_{2},I_{2}),\\ 2J_{2}-2,&{}~\text {if}\ \Delta G_{2} \ge I_{2},\\ 0,&{}~\text {if}\ \Delta G_{2} \le -I_{2}, \end{array} \right. \end{aligned}$$
(27)
$$\begin{aligned}&R_{2} = \dfrac{e_{2}-J_{2}+1}{J_{2}-1}. \end{aligned}$$
(28)

In this paper, we calculated \(\Delta G_{2}\) as \((\alpha _{2}-\bar{\alpha }_2)_{t-1}\). The subscript \(t-1\) indicates that the value of \(\alpha _{2}-\bar{\alpha }_2\) is calculated in the previous simulation instant. Thus, the controller learns a policy by approximating the model presented in Eq. (22).

6.3 Tacking and gybing

Tacking and gybing maneuvers are performed when the sailboat is sailing upwind (tacking) or downwind (gybing) and its intended heading falls within the corresponding no-go zone. If the tacking and gybing no-go zones are defined by angles \(\sigma _{1}\) and \(\sigma _{2}\), respectively, then the sailboat has its intended heading in the no-go zones if conditions (29) and (30) are met, for tacking and gybing, respectively. In these equations, \(\Delta w_{1} = \theta -\gamma _{\tau }\), where \(\theta \) is the desired heading and \(\gamma _{\tau }\) is the true wind angle (see Sect. 3).

$$\begin{aligned}&\vert \Delta w_{1}(t)\vert >\pi -\dfrac{\sigma _{1}}{2}, \end{aligned}$$
(29)
$$\begin{aligned}&\vert \Delta w_{1}(t)\vert < \dfrac{\sigma _{2}}{2}. \end{aligned}$$
(30)

To determine the sailboat’s scenario, we use Eqs. (29) and (30). If we replace \(\Delta w_{1}\) with \(\Delta w_{2}\), where \( \Delta w_{2} =\phi -\gamma _{\tau }\), and note that the full angular size of the upwind and downwind zones is \(\pi \) radians (see Fig. 2), then Eqs. (31) and (32) provide a way to identify the sailboat’s sailing scenario.

$$\begin{aligned}&\text {upwind}, ~~\text {if}\ \vert \Delta w_{2}\vert \ge \dfrac{\pi }{2}, \end{aligned}$$
(31)
$$\begin{aligned}&\text {downwind}, ~~\text {if}\ \vert \Delta w_{2}\vert < \dfrac{\pi }{2}. \end{aligned}$$
(32)

Based on the previous equations, we have established the activation conditions for tacking and gybing maneuvers. To activate tacking, Eqs. (29) and (31) must be satisfied. To activate gybing, Eqs. (30) and (32) must be satisfied. To perform these maneuvers, it is necessary to calculate the desired heading \(\theta \) in a different way than the approach described in Sect. 3. We calculated \(\theta \) using the methods presented in [1] and [2], where \(\delta \) represents the desired sailboat heading relative to the true wind. Equations (33) and (34) allow us to calculate \(\theta \), where \(\delta _{1}\) and \(\delta _{2}\) represent the variable \(\delta \) for tacking and gybing, respectively.

$$\begin{aligned}&\theta = \pi + \gamma _{\tau } \pm \delta _{1}, \end{aligned}$$
(33)
$$\begin{aligned}&\theta = \gamma _{\tau } \pm \delta _{2}. \end{aligned}$$
(34)

To execute the maneuvers, we employed the following strategy: upon detecting the need to tack or gybe, the controller assigns a value of \(\theta \) that is closest to the sailboat’s heading \(\phi \), and switches to the next \(\theta \) when the speed limit (\(v_{t}\) for tacking or \(v_{g}\) for gybing) is reached. For the remainder of the trajectory, heading adjustments are generated whenever the velocity limit is surpassed and \(\Delta w_{1}\) changes sign.
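A sketch of the maneuver-activation logic (conditions (29)–(32)) combined with the heading selection of Eqs. (33) and (34); `wrap` is the angle-wrapping helper from the sketch in Sect. 6.2:

```python
import math

def maneuver_heading(theta, phi, gamma_tau, sigma_1, sigma_2, delta_1, delta_2):
    """Return the desired heading adjusted for tacking or gybing, or
    theta unchanged when no maneuver is required."""
    dw1 = abs(wrap(theta - gamma_tau))   # input to conditions (29) and (30)
    dw2 = abs(wrap(phi - gamma_tau))     # scenario detection (31) and (32)
    upwind = dw2 >= math.pi / 2
    if upwind and dw1 > math.pi - sigma_1 / 2:          # tack: (29) and (31)
        candidates = (math.pi + gamma_tau + delta_1,
                      math.pi + gamma_tau - delta_1)    # Eq. (33)
    elif not upwind and dw1 < sigma_2 / 2:              # gybe: (30) and (32)
        candidates = (gamma_tau + delta_2,
                      gamma_tau - delta_2)              # Eq. (34)
    else:
        return theta
    # choose the candidate heading closest to the current heading phi
    return min((wrap(c) for c in candidates), key=lambda c: abs(wrap(c - phi)))
```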

6.4 Controller training

Figure 5 illustrates the target points for the sailboat controller in the training scenario. The sailboat training problem involves reaching all the points indicated in Fig. 5 from the origin point \((x_{0},y_{0})\). We divided the training into two stages: downwind and upwind. In both cases, we consider the target point reached when \(\vert \Delta {\varvec{r}}\vert \le 2\) m. This parameter value is reasonable considering the positioning error of some GPS devices.

Fig. 5 Training scenario for SNN controllers

  • Downwind   In this stage, the SNN-based ANS is trained to learn a suitable policy for moving in the downwind sailing scenario. Points 1–10 in Fig. 5 correspond to this stage.

  • Upwind   In this stage, the SNN-based ANS is trained to learn a suitable policy for moving in the upwind sailing scenario. Points 11–13 in Fig. 5 correspond to this stage.

To better understand the following explanation, please refer to Fig. 6. To avoid large deviations from the sailboat’s ideal trajectory during the training scenario, we defined a reset action, which returns the sailboat to the origin point. When the sailboat deviates from the desired trajectory by a distance of \(0.5\omega \), this action is triggered and a learning episode ends. In Eq. (35), we present the logical activation condition for the reset action, where \(l = \vert 0.5\omega \sec (\theta )\vert \), \(\theta = \arctan (m)\), and \(m = (y-y_{0})(x-x_{0})^{-1}\). If the controller detects a tack or gybe, the point \((x,y)\) is changed to a point in the \(\theta \) direction (Eqs. (33) and (34)).

$$\begin{aligned} y_{1} > y+m(x_{1}-x)+l \vee y_{1} < y+m(x_{1}-x)-l. \end{aligned}$$
(35)
Fig. 6 Parallel lane for reset condition

To begin the training process, we randomly initialize all weights \(W_{ab}\) of both SNNs. We start with the downwind stage, where the sailboat is positioned at the origin \((x_{0},y_{0})\) with \(\phi = 0\). If the sailboat deviates a distance of \(0.5\omega \) from the ideal trajectory, we trigger the reset action. Similarly, if the sailboat reaches the target point, we trigger the reset action and assign the controller another point \((x,y)\) until the downwind stage is completed. Once the downwind stage is finished, we start the upwind stage, where the sailboat is at the origin \((x_{0},y_{0})\) with \(\phi = \dfrac{3\pi }{4}\). Again, if the sailboat reaches the target point, we trigger the reset action and assign the controller another point \((x,y)\) until the upwind stage is completed. In both stages, we randomly select the sailboat’s next target point.

In the downwind stage, we set \(\sigma _{2} = 0\) to ensure that the sail controller responds appropriately when \(\theta -\phi =0\). For the upwind stage, we chose a small value for \(v_{t}\) and a large value for \(\delta _{1}\) to make the tacking turn slow, enabling the sail controller to learn how to respond over a wide range of angles with few points. Specifically, we set \(v_{t} = 0.2\), \(\sigma _{1} = \pi \), \(\delta _{1} = \dfrac{2\pi }{45}\), \(\tau =1\), and \(\gamma _{\tau } = 0\).

6.5 Controller testing

In Fig. 7, we present the target points used to test the sailboat controllers. The sailboat testing problem involves reaching all the points shown in Fig. 7, following the direction of the arrows. We proposed twelve segments, two for each region of Fig. 2.

The testing process is as follows: the sailboat is initially positioned at point 1 with a heading of \(\phi = \dfrac{3\pi }{4}\), and the controller is assigned point 2 as the first target. Once the sailboat reaches a target, the next point in the trajectory is assigned until the sailboat has traveled through all twelve defined trajectories. Similar to the training environment, we consider a target point reached if \(\vert \Delta {\varvec{r}}\vert \le 2\). For this scenario, we selected the following values: \(\sigma _{1} = 0.5\pi \), \(\sigma _{2} = \dfrac{\pi }{6}\), \(\delta _{1}=\delta _{2}=\frac{\pi }{4}\), as these values are commonly used for tacking and gybing maneuvers [23, 34]. Additionally, we selected \(v_{t} = 0.47\), \(v_{g} = 0.8\), \(\tau =1\), and \(\gamma _{\tau } = 0\).

Fig. 7 Testing scenario for controllers

7 Experiments

As a first step for our simulation experiments, we needed to determine the values for the control hyper-parameters. Initially, we were uncertain about what values to assign to them. Therefore, we performed a manual calibration until we obtained a functional SNN-based ANS. The SNN-based ANS we found has the following parameters: \(J_{1} = 11, J_{2}=15, I_{1} = I_{2} = 40^{\circ }, \vert S_{1}\vert =5,\vert S_{2}\vert =18\).

For the hyper-parameters \(J_{1}\) and \(J_{2}\), which can only be odd (Sect. 6.1.2), we chose four values: the calibration value, one value above it, and two values below it. We selected four values for \(I_{1}\) and \(I_{2}\): the calibration value and three higher values, each separated by \(10^{\circ }\). Finally, we decided that the variables \(\vert S_{1}\vert \) and \(\vert S_{2}\vert \) should take two values: the calibration value and its double, in order to double the number of neurons in the input layer and explore more complex SNNs. Next, we present the specific values for each hyper-parameter.

  • \(J_{1} = \{5,9,11,13\}\).

  • \(J_{2} = \{11,13,15,17\}\).

  • \(I_{1} = \{70^{\circ },60^{\circ },50^{\circ },40^{\circ }\}\).

  • \(I_{2} = \{70^{\circ },60^{\circ },50^{\circ },40^{\circ }\}\).

  • \(\vert S_{1}\vert = \{5,10\}\).

  • \(\vert S_{2}\vert = \{36,18\}\).

To find out how the behavior of the SNN-based ANS is influenced by different combinations of hyper-parameters, we explored the design space of the SNN-based ANS using the previously selected values. Our aim was to examine all 1024 possible combinations of hyper-parameters to identify the SNN-based ANS that executes the testing scenario in the shortest possible time, with the smallest deviation error and the fewest input neurons.

We assigned an integer value l between 1 and 1024 to each possible hyper-parameter combination. These were ordered according to the sequence \((J_{1},J_{2},I_{1},I_{2},\vert S_{1}\vert ,\vert S_{2}\vert )\). To generate the combinations, we systematically varied all possible values of the hyper-parameters, starting with \(\vert S_{2}\vert \) and moving towards \(J_{1}\). Combination \(l=1\) corresponds to \((5,11,70^{\circ },70^{\circ },5,36)\), and combination \(l=1024\) corresponds to \((13,17,40^{\circ },40^{\circ },10,18)\).
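A sketch of this enumeration; `itertools.product` varies the last factor fastest, which reproduces the ordering described above:

```python
from itertools import product

J1 = [5, 9, 11, 13]
J2 = [11, 13, 15, 17]
I1 = [70, 60, 50, 40]
I2 = [70, 60, 50, 40]
S1 = [5, 10]
S2 = [36, 18]

# l = 1 -> (5, 11, 70, 70, 5, 36); l = 1024 -> (13, 17, 40, 40, 10, 18)
combinations = {l: combo for l, combo in
                enumerate(product(J1, J2, I1, I2, S1, S2), start=1)}
assert combinations[1] == (5, 11, 70, 70, 5, 36)
assert combinations[1024] == (13, 17, 40, 40, 10, 18)
```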

To evaluate the behavior of various SNN-based ANS in a testing scenario, it is necessary to first train them. Consequently, each experiment entails the training and testing of a single SNN-based ANS. Finally, a Docker image was created to contain the simulation environment for conducting the design space exploration. The exploration was executed on a workstation capable of running up to five experiments simultaneously. Figure 8 illustrates the execution scheme for the design space exploration.

Fig. 8 Design space exploration execution scheme

Fig. 9 Testing time for the completed simulation points

8 Results and discussion

Our design space exploration took approximately 13 days to perform the 1024 experiments required to explore the different SNN-based ANS. Out of the 1024 experiments conducted, 88 failed the testing scenario, 511 failed the training scenario, and 425 completed both scenarios correctly. An experiment fails a scenario when it does not reach all target points within 105 min for training or 45 min for testing. It should be noted that controllers that failed a scenario do not necessarily fail to work; they simply did not complete the proposed task within the defined time interval and were therefore not considered among the best.

To process the data generated by the design space exploration, we defined three optimization goals:

  • Sailing time (\(t_{s}\)): total time to reach the target in the testing scenario.

  • Deviation error (\(D_{e}\)): mean absolute error between the path traveled by the sailboat and the ideal path in all trajectories except no-go zones.

  • SNN size (S): total number of input neurons \(S = 2(\vert S_{1}\vert +\vert S_{2}\vert )\) (as discussed in Sect. 5).

The results of the \(t_{s}\) metric are depicted in Fig. 9 as a histogram. Each bar in the histogram represents a specific time range. The numbers on the time axis indicate the starting point of the range, and the numbers above the bars represent the total number of experiments. The figure reveals that most test scenarios were completed in under 600 s. Moreover, there were 14 experiments that finished in less than 400 s, making them potential candidates for the SNN-based ANS with the best time.

Figure 10 displays the mean absolute errors (MAE) for the trajectories depicted in Fig. 7 (excluding the no-go zones), aiming to observe the behavior of the SNN-based ANS in different trajectories. Most of the trajectories exhibit MAE between 0.3 m and 2.1 m, while the downwind 1 trajectory has the highest errors, with a considerable number of results positioned to the right of the value 2.1. This indicates the need for further training for downwind 1 trajectories. Notably, some SNN-based ANS exhibit errors per trajectory below 0.4, indicating minimal deviation from the ideal path.

Fig. 10 MAE distribution for the different testing trajectories

To identify the best controllers of the design space exploration, we calculated the Pareto points [35] by minimizing the metrics \(t_{s}\), S, and \(D_{e}\) as explained earlier. Figure 11 presents the Pareto frontier points, where N_time represents the normalized \(t_{s}\) variable, N_error denotes the normalized \(D_{e}\) variable, and N_states reflects the normalized S variable. Table 2 presents the values of the three target metrics for each Pareto frontier point.
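A sketch of the Pareto-point selection; `results` is assumed to map each experiment index l to its metric tuple \((t_{s}, D_{e}, S)\), all minimized:

```python
def pareto_front(results):
    """Return the experiment indices whose metric tuples are not
    dominated: no other tuple is <= in every metric and < in at
    least one."""
    def dominates(p, q):
        return all(a <= b for a, b in zip(p, q)) and \
               any(a < b for a, b in zip(p, q))
    return [l for l, p in results.items()
            if not any(dominates(q, p) for k, q in results.items() if k != l)]
```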

Fig. 11 Graphical Pareto frontier representation

After analyzing the results in Table 2, we have determined that experiment \(l=923\) is the best performing SNN-based ANS. This is because it belongs to the set of experiments with \(t_{s}<400\), has the lowest \(D_{e}\) among this set, and also has one of the lowest S values.

Table 2 Pareto frontier results

8.1 Comparison with other control algorithms

In this section, we present comparisons between our SNN-based ANS and other state-of-the-art control algorithms on the same sailing task.

In Fig. 12, we present the path followed by our \(l=923\) SNN-based ANS in the testing scenario (blue line). The different maneuvers performed can be seen in trajectories \(2\rightarrow 3\), \(11\rightarrow 12\), \(5\rightarrow 6\), and \(8\rightarrow 9\), where the sailboat tacked and gybed properly as it had to sail in the no-go zones. For the rest of the trajectories, the sailboat reached the target point following the heading \(\theta \) with small deviations from the green line (low \(D_{e}\)). Based on these observations, we can conclude that our \(l=923\) SNN-based ANS learned a suitable sailboat control policy, and the developed simulation environment is useful for training SNNs.

Fig. 12 Comparison of the paths followed by the different control algorithms

For comparison, we selected Viel’s low-level control algorithm [2] and the default sailing algorithm of USVSim [20]. Viel’s algorithm operates on a geometric approximation of the sailboat’s behavior and corrects perturbations in the sailboat’s heading. The USVSim control algorithm is a proportional-integral (PI) controller calibrated for the original USVSim sailboat. We implemented both algorithms in our simulation environment and ran the testing scenario for each one.

Table 3 Algorithm comparison metrics

In Fig. 12 and Table 3, we present the results obtained by each control system in the testing scenario. All algorithms successfully completed the scenario. Viel’s controller outperformed the other algorithms, with the smallest travel time and deviation error with respect to the ideal path. While the USVSim algorithm had a better travel time than the SNN-based ANS, the SNN-based ANS had a lower deviation error. These results suggest that although the SNN-based ANS does not perform better than a robust controller like Viel’s, it may be a viable alternative to a PI controller in tasks where low deviation error is important.

It is important to note that this is our first attempt at developing an SNN-based ANS. We employed a simple architecture and specific training, learning, and testing approaches. While our results do not exhibit significant improvements over state-of-the-art controllers, other SNN architectures and training methods may enhance performance in sailing tasks. These findings can thus provide a foundation for further exploration and development of SNN-based ANS designs.

9 Conclusion

In this work, we developed an SNN-based ANS for sailboat control. We formulated the sailing problem, identified the SNN features, developed a control strategy, and established training and testing scenarios. We conducted a design space exploration through simulated experiments to minimize testing time, deviation error, and total input neurons. Our experiments produced 425 controllers that successfully navigated the testing scenario. Our best controller achieved a testing time of 396 s and a deviation error of 0.55 m, outperforming the USVSim controller in deviation error. However, it performed worse than Viel’s controller, which completed the testing scenario in 309 s with an error of 0.51 m, indicating a need to reevaluate aspects of our methodology. One potential change is to use a reinforcement learning algorithm with an eligibility trace instead of the MSTDP algorithm, as it would enable more advanced reward strategies. Other possibilities include exploring recurrent SNNs to incorporate information about past events, as well as conducting a more comprehensive hyper-parameter search to find optimal values for our sailing task. As future work, we will implement the \(l=923\) SNN-based ANS on a real small-scale sailboat to validate its performance under real conditions.