Leveraging Online Racing and Population Cloning in Evolutionary Multirobot Systems

  • Fernando Silva
  • Luís Correia
  • Anders Lyhne Christensen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9598)

Abstract

Online evolution of controllers on real robots typically requires a prohibitively long time to synthesise effective solutions. In this paper, we introduce two novel approaches to accelerate online evolution in multirobot systems. We introduce a racing technique to cut short the evaluation of poor controllers based on the task performance of past controllers, and a population cloning technique that enables individual robots to transmit an internal set of high-performing controllers to robots nearby. We implement our approaches over odNEAT, which evolves artificial neural network controllers. We assess the performance of our approaches in three tasks involving groups of e-puck-like robots, and we show that they facilitate: (i) controllers with higher performance, (ii) faster evolution in terms of wall-clock time, (iii) more consistent group-level performance, and (iv) more robust, well-adapted controllers.

Keywords

Online evolution, Multirobot systems, Racing, Population cloning

1 Introduction

Online evolution is one of the most open-ended approaches to adaptation and learning in robotic systems. In online evolution, an evolutionary algorithm is executed onboard robots during task execution to continuously optimise behavioural control. The main components of the evolutionary algorithm, namely evaluation, selection, and reproduction, are performed by the robots without any external supervision. In this way, robots may continuously self-adapt and modify their behaviour in response to changes in the task or in the environmental conditions.

In 1994, Floreano and Mondada [1] conducted the first study on online evolution using a single mobile robot. In 2002, Watson et al. [2] developed embodied evolution, an approach in which the evolutionary algorithm is distributed across a collective of robots. The key objective behind the use of multiple robots was to speed up evolution by having robots evolve controllers in parallel and exchange partial solutions to the task. This approach allows for a form of knowledge transfer that has been shown to accelerate the evolutionary process and to facilitate more effective collective problem solving [3, 4].

Over the past years, research in online evolution has led to the development of a number of different approaches. Examples include the \((\mu + 1)\)-online evolutionary algorithm by Haasdijk et al. [5], r-ASiCo by Prieto et al. [6], mEDEA by Bredeche et al. [7], and odNEAT by Silva et al. [3], among others. However, there are currently a number of key issues that must be addressed before online evolution becomes a feasible approach to adaptation and learning in real robots. One major impediment to widespread adoption is the long time that online evolution requires to synthesise solutions to any but the simplest of tasks (several hours or days), which currently renders the approach infeasible on real robots [8].

In the vast majority of online evolution algorithms, see [5, 7] for examples, each controller is assessed for a fixed, predefined amount of time. Because the evaluation of inferior controllers amounts to poor task performance, previous studies have focused on how to optimise the evaluation period online via: (i) self-adaptation of the evaluation period [9], (ii) roulette wheel-based selection [10], (iii) a stochastic heuristic rule that monitors the performance of controllers to adjust the evaluation period [10], and (iv) a racing technique that runs independently on every robot in a group [11]. Despite their potential, such techniques have a number of inherent limitations: (i) they require the experimenter to decide, for instance, on the maximum evaluation period [9, 10, 11], which may be infeasible in practice, (ii) they are significantly sensitive to the parameter settings [9, 10, 11, 12], and (iii) they may require a controller to be reevaluated multiple times [10].

In this paper, we propose two novel approaches to speed up online evolution of robotic controllers in multirobot systems. Firstly, we introduce a racing approach for multirobot systems that relies on the task performance of controllers alone, and therefore does not require the definition of a maximum evaluation period. Because the racing approach relies on a modified version of the non-parametric Hoeffding’s bounds [13], it can be applied to an unrestricted set of tasks and algorithms. We then extend racing with a population cloning approach, which enables each individual robot to clone and transmit a set of high-performing controllers stored in its internal population to other robots nearby. The underlying motivation is to effectively leverage the genetic information accumulated by multiple robots that evolve together.

We implement our approaches over odNEAT [3], which optimises artificial neural network (ANN) controllers online in a distributed and decentralised manner. One of the main advantages of odNEAT is that it evolves both the weights and the topology of ANNs, thereby bypassing the inherent limitations of fixed-topology algorithms [3]. odNEAT is used here as a representative efficient algorithm that has been successfully used in a number of simulation-based studies related to adaptation and learning in robot systems, see [3, 4, 14, 15] for examples. We assess the performance of our proposed approaches in three tasks involving groups of e-puck-like robots [16], namely in two foraging tasks with differing complexity and in a dynamic phototaxis task. Our results show that racing and population cloning facilitate: (i) synthesis of controllers with higher performance, (ii) faster evolution in terms of wall-clock time, (iii) more consistent group-level performance, and (iv) more robust, well-adapted controllers.

2 Related Work

In this section, we review the background on racing approaches in machine learning and in evolutionary computation, we discuss current approaches to the exchange of genetic information between robots and the principles behind population cloning, and we describe odNEAT.

2.1 Racing

The general racing framework was originally introduced by Maron and Moore [17] as a technique for model selection in machine learning. The key principle behind racing is to iteratively test multiple models in parallel, use the error values of each model to discard those that are statistically inferior as soon as there is enough evidence, and then concentrate the computational effort on the remaining models. In this way, models race against each other.

Given the similarity between model selection in machine learning and parameter tuning in meta-heuristics, such as evolutionary algorithms, previous contributions have assessed how evolutionary techniques could benefit from racing procedures [18, 19, 20]. In [11], Haasdijk et al. studied how racing could be used to cut short the evaluation of poor controllers in online evolution. The authors used \((\mu + 1)\)-online [5], an encapsulated algorithm in which there is no exchange of genetic information among robots. In the native \((\mu + 1)\)-online algorithm, a new controller is produced at regular time intervals, and operates for a fixed amount of time called the evaluation period. When the evaluation period elapses, a new controller is synthesised and its evaluation starts. In the racing version of \((\mu + 1)\)-online, the current controller is compared with those previously evaluated as it operates. If the fitness score of the controller is below a lower bound, the evaluation is aborted. The lower bound is computed based on a modified version of Hoeffding’s bounds [13] taking into account the fitness score of the worst controller in the population.

The study by Haasdijk et al. [11] was the first demonstration of how racing techniques could be used to speed up online evolution. There are, however, a number of disadvantages to the authors’ approach. Firstly, algorithms such as \((\mu + 1)\)-online can lead to incongruous group behaviour and poor performance in collective tasks due to the periodic substitution of controllers [3]. Secondly, \((\mu + 1)\)-online is an encapsulated algorithm, meaning that an isolated instance runs independently on each robot. In this way, \((\mu + 1)\)-online does not benefit from the parallelism in multirobot systems or from the exchange of genetic information between robots, which can effectively speed up online evolution [3]. Thirdly, Haasdijk et al.’s approach was tailored to the elitist dynamics of \((\mu + 1)\)-online, as the lower bound of performance assumes that the worst fitness score in the population does not decrease. The approach may thus be subject to backtracking when applied to non-elitist algorithms. Finally, even with the racing technique, the combined approach still requires the experimenter to decide on the maximum evaluation period, and is significantly sensitive to such parameter settings [12].

2.2 Exchange of Genetic Information in Online Evolution

The exchange of genetic information between robots is a crucial feature in distributed, online evolutionary algorithms. This process can be viewed as a set of inter-robot reproduction events, in which reproduction is implemented using other robots in the same group [2]. In traditional evolutionary algorithms, selection precedes reproduction and is accomplished by [2]: (i) more-fit individuals becoming parents and supplying genes, (ii) less-fit individuals being replaced by the offspring, or (iii) a combination of the two. In online evolution, this amounts to individual robots transmitting to neighbouring robots either part of a genome [2] or a complete genome [3, 7]. That is, the genome is the unit in the selection process, and the population of robots is a distributed substrate across which genetic information can spread.

In our population cloning approach, see Sect. 3.2 for a description, we take a novel approach to the exchange of genetic information between robots. We place the selection and reproductive processes at a higher level: we consider the elements in the selection process to be the internal populations of individual robots. A robot can therefore transmit to neighbouring robots a copy of any part of its population (e.g. a single genome or a set of genomes representing high-performing controllers) or of the complete population. In this way, robots have the potential to leverage the genetic information they have accumulated, and to enable a more effective knowledge transfer to solve the current task.

2.3 Online Evolution with odNEAT

This section provides an overview of odNEAT; a comprehensive description of odNEAT can be found in [3]. odNEAT is distributed across multiple robots that exchange candidate solutions to the task. Specifically, the evolutionary process is implemented according to a physically distributed island model. Each robot optimises an internal population of genomes (directly encoded artificial neural networks) through intra-island variation, and genetic information between two or more robots is exchanged through inter-island migration. In this way, each robot is potentially self-sufficient and the evolutionary process opportunistically capitalises on the exchange of genetic information between multiple robots for collective problem solving [3, 4].

One of the key features of odNEAT is that it starts with minimal artificial neural networks (ANNs) with no hidden neurons, that is, with each input neuron connected to every output neuron. Throughout evolution, topologies are gradually complexified by adding new neurons and new connections through mutation. In addition, the internal population of each robot implements a niching scheme comprising speciation and fitness sharing, which allows each robot to maintain a healthy diversity of candidate solutions with differing topologies. In this way, odNEAT is able to evolve a suitable degree of complexity for the current task, and an appropriate ANN topology is the product of the evolutionary process [3].

During task execution, each robot is controlled by an ANN that represents a potential solution to the task. Each controller maintains a virtual energy level reflecting its individual task performance. The fitness score is defined as the mean energy level, sampled at regular time intervals. When the virtual energy level reaches a minimum threshold, the current controller is considered unfit for the task. A new controller is then synthesised via selection of a parent species and two genomes from that species (the parents), crossover of the parents’ genomes, and mutation of the offspring. Mutation is both structural and parametric, as it adds new neurons and new connections, and optimises parameters such as connection weights and neuron bias values. A new controller is guaranteed a maturation period during which the controller is not replaced.
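
As a rough sketch of this lifecycle (our simplification, not odNEAT's actual implementation), the per-controller bookkeeping could look as follows; the sampling interval and the handling of the energy threshold are assumptions.

```python
class ControllerState:
    """Bookkeeping for the controller currently executing on a robot."""

    def __init__(self, initial_energy=100.0, min_energy=0.0):
        self.energy = initial_energy     # virtual energy level
        self.min_energy = min_energy     # threshold below which the controller is unfit
        self.energy_samples = []         # energy sampled at regular time intervals
        self.age = 0                     # control cycles since the controller started

    def sample_fitness(self):
        """Fitness is defined as the mean virtual energy level over the samples."""
        self.energy_samples.append(self.energy)
        return sum(self.energy_samples) / len(self.energy_samples)

    def is_unfit(self, maturation_cycles):
        """The controller is replaced when its energy reaches the minimum
        threshold, but never during its maturation period."""
        return self.age > maturation_cycles and self.energy <= self.min_energy
```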

odNEAT has been successfully used in a number of simulation-based studies related to long-term adaptation and learning in robot systems. Previous studies have shown key features of odNEAT, including: (i) adaptivity, as odNEAT effectively evolves controllers for robots that operate in dynamic environments [15], (ii) scalability, in the sense that odNEAT allows groups of different size to leverage their multiplicity [4], (iii) robustness, as the controllers evolved can often adapt to changes in environmental conditions without further evolution [3, 14], (iv) fault tolerance, as robots executing odNEAT are able to adapt and learn new behaviours in the presence of sensor faults [3], and (v) extensibility, as odNEAT can incorporate and optimise behavioural building blocks prespecified by the human experimenter [21]. Given previous results and the ability to efficiently optimise ANN weights and topology, odNEAT is used in our study as a representative distributed online evolutionary algorithm.

3 Racing and Cloning in Multirobot Systems

We propose a combined racing and cloning approach to speed up online evolution of robotic controllers in multirobot systems. The objective is twofold: to leverage racing to cut short the evaluation of poor controllers, and to leverage the genetic information accumulated by individual robots evolving in parallel.

3.1 Racing

We extended odNEAT with a racing approach based on a modified version of the non-parametric Hoeffding’s bounds [13]. The major advantage of Hoeffding’s bounds is that they do not require any assumption regarding the underlying fitness distribution. In our racing approach, an evaluation is aborted if the controller’s performance \(F_{current}\) is below a lower bound \(L_b\) given by:
$$\begin{aligned} L_b = M_c(t) - 2\cdot \xi (t) \end{aligned}$$
(1)
$$\begin{aligned} \xi (t) = \sqrt{\frac{(F_{best} - F_{worst})^2 \cdot \log (2/\alpha )}{2 \cdot t}} \end{aligned}$$
(2)
where \(M_c(t)\) is a dynamic fitness threshold henceforth referred to as minimal criterion (see below), t is the current control cycle since the controller started executing, \(F_{worst}\) and \(F_{best}\) are respectively the fitness scores of the worst and best controllers in the internal population, and \(\alpha \) is the significance level of the comparison. The minimal criterion is computed based on the fitness of the internal population. Whenever there is a change to the fitness scores of a given controller in the population (e.g. a robot receives a new controller or the fitness score of a controller is updated), \(M_c(t)\) is computed based on the value \(v_n\), which corresponds to the P-th percentile of the fitness scores in the population:
$$\begin{aligned} M_c(t) = M_c(t-1) + max(0, (v_n - M_c(t-1)) \cdot W) \end{aligned}$$
(3)
where W is a weighting parameter that enables fine-grained control over the magnitude of the changes to the minimal criterion. Because racing approaches require a certain number of measurement points to produce reliable results [20], we take advantage of the maturation period of odNEAT to put a lower boundary on the sample size for the racing approach. That is, racing can only abort the evaluation of a controller after the maturation period has expired.
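
To make Eqs. 1–3 concrete, the sketch below shows how the lower bound and the minimal criterion could be computed at each control cycle. It is a minimal illustration under our own naming and a nearest-rank percentile; the actual implementation used in odNEAT may differ.

```python
import math

def minimal_criterion(mc_prev, population_fitness, percentile=50, weight=1.0):
    """Update M_c(t) (Eq. 3) whenever the fitness scores in the internal
    population change, using the P-th percentile of the scores (v_n)."""
    scores = sorted(population_fitness)
    rank = max(1, math.ceil(percentile / 100.0 * len(scores)))  # nearest-rank percentile
    v_n = scores[rank - 1]
    return mc_prev + max(0.0, (v_n - mc_prev) * weight)

def racing_lower_bound(mc_t, f_best, f_worst, t, alpha):
    """Lower bound L_b (Eqs. 1-2), based on the modified Hoeffding bound."""
    xi = math.sqrt(((f_best - f_worst) ** 2 * math.log(2.0 / alpha)) / (2.0 * t))
    return mc_t - 2.0 * xi

def should_abort(f_current, mc_t, f_best, f_worst, t, maturation_cycles, alpha=0.95):
    """Abort the evaluation only after the maturation period has expired and
    only if the controller's performance falls below the lower bound."""
    if t <= maturation_cycles:
        return False
    return f_current < racing_lower_bound(mc_t, f_best, f_worst, t, alpha)
```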

3.2 Population Cloning

To implement our population cloning technique according to the principles described in Sect. 2.2, we adopt an approach in which internal populations compete when robots meet. Specifically, when two robots are in communication range, a connection link between the robots is created if none of them has been involved in a population cloning process within a predefined period of time \(P_c\). Winner and loser are determined by comparing the \(M_c(t)\) value of each robot, as defined in Eq. 3, which is indicative of the performance of each population. The robot with the highest \(M_c(t)\) value is considered the winner. The genomes injected in the losing robot are those from the population of the winning robot that have a fitness score above \(M_c(t)\).

We consider two variants of the population cloning approach. In one variant, genomes are injected from one robot to another as described above. In a second variant, the internal population of the losing robot is subject to an extinction event. The genomes in the receiving population that yield a fitness score below the \(M_c(t)\) of the winner robot are removed before the injection of new genomes, thus potentially pushing evolution towards higher quality solutions.
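
The sketch below illustrates the cloning step for both variants, assuming each robot exposes its internal population as (genome, fitness) pairs together with its current minimal criterion; the class and function names are ours, and the check on the \(P_c\) interval is left to the caller.

```python
import copy
from dataclasses import dataclass

@dataclass
class RobotState:
    population: list   # list of (genome, fitness) pairs in the internal population
    mc: float          # current minimal criterion M_c(t), Eq. 3

def population_cloning(robot_a, robot_b, extinction=False):
    """Cloning step executed when two robots are in communication range.
    The robot with the higher M_c(t) is the winner; its genomes scoring above
    its M_c(t) are cloned into the loser's population. With extinction=True
    (the racing-ppc-rem variant), the loser's genomes below the winner's
    M_c(t) are removed before the injection."""
    winner, loser = (robot_a, robot_b) if robot_a.mc >= robot_b.mc else (robot_b, robot_a)

    injected = [(genome, fitness) for (genome, fitness) in winner.population
                if fitness > winner.mc]

    if extinction:
        loser.population = [(g, f) for (g, f) in loser.population if f >= winner.mc]

    loser.population.extend((copy.deepcopy(g), f) for (g, f) in injected)
```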

4 Methods

In this section, we define our experimental methodology (see footnote 1), including the simulation platform and robot model, and we describe the three tasks used in the study: two foraging tasks with differing complexity and a dynamic phototaxis task.
Table 1.

Controller details. Light sensors have a range of 50 cm (phototaxis task). Other sensors have a range of 25 cm.

Foraging tasks – controller details
  Input neurons: 25 (4 for IR robot detection, 4 for IR wall detection, 1 for energy level reading, 8 for resource A detection, 8 for resource B detection)
  Output neurons: 3 (2 for left and right motor speeds, 1 for controlling the gripper)

Phototaxis task – controller details
  Input neurons: 25 (8 for IR robot detection, 8 for IR wall detection, 8 for light source detection, 1 for energy level reading)
  Output neurons: 2 (left and right motor speeds)

4.1 Experimental Setup

We use the JBotEvolver platform [22] to conduct our simulation-based experiments. JBotEvolver is an open-source, multirobot simulation platform and neuroevolution framework. The robots are modelled after the e-puck [16], a small circular (7.5 cm in diameter) differential drive robot that can move at speeds of up to 13 cm/s. Similarly to the e-puck, each simulated robot is equipped with infrared sensors that multiplex obstacle sensing and communication between robots at a range of up to 25 cm. The controller details, namely the input and output configurations for the tasks, are listed in Table 1. Each sensor and each actuator is subject to noise, simulated by adding a random Gaussian component within \(\pm \)5 % of the sensor saturation value or of the current actuation value. The controllers are discrete-time ANNs with connection weights in the range [-10,10]. The inputs of the neural network are the readings from the sensors, normalised to the interval [0,1]. The output layer is composed of two neurons whose values are linearly scaled from [0,1] to [-1,1] to set the signed speed of each wheel. In the two foraging tasks, each robot is also equipped with a gripper that enables it to collect the closest resource within a range of 2 cm, if there is any. In these two tasks, a third output neuron sets the state of the gripper: the gripper is activated if the output value of the neuron is higher than 0.5, and deactivated otherwise.
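
The following sketch shows one possible reading of the sensor noise and output scaling described above; the exact noise model (a bounded Gaussian component within the \(\pm \)5 % range), the mapping of the scaled outputs to the e-puck's maximum speed of 13 cm/s, and the function names are our assumptions.

```python
import random

def noisy_reading(value, noise_fraction=0.05):
    """Add a random Gaussian component within +/-5% of the saturation value
    (the reading is assumed normalised to [0, 1], so saturation = 1)."""
    noise = max(-noise_fraction, min(noise_fraction, random.gauss(0.0, noise_fraction / 2.0)))
    return min(1.0, max(0.0, value + noise))

def wheel_speeds(output_left, output_right, max_speed=13.0):
    """Linearly scale the two ANN outputs from [0, 1] to [-1, 1] and then to
    signed wheel speeds (cm/s)."""
    return (2.0 * output_left - 1.0) * max_speed, (2.0 * output_right - 1.0) * max_speed

def gripper_active(output_gripper):
    """Foraging tasks only: the third output activates the gripper above 0.5."""
    return output_gripper > 0.5
```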

Foraging Tasks. In the foraging tasks, robots have to search for and collect objects spread across the environment. Foraging is a canonical task in cooperative robotics, and is evocative of tasks such as search and rescue, harvesting, and toxic waste clean-up [23].

Similarly to [4], we set up a foraging task with different types of resources that have to be collected. Robots spend virtual energy at a constant rate, and gain energy when they collect resources. When a resource is collected by a robot, a new resource of the same type is placed randomly in the environment in order to keep the number of resources constant. We conduct experiments with two variants of the foraging task: (i) in one variant there are only type A resources, henceforth called the standard foraging task, and (ii) in the other variant there are two types of resources, namely type A and type B resources, henceforth called the concurrent foraging task. In the concurrent foraging task, type A and type B resources have to be consumed alternately. In this way, besides having to learn the foraging aspects of the task, robots need to actively decide which type of resource to collect. The virtual energy level E of each controller is initially set to 100 units, and limited to the range [0,1000]. At each control cycle, E is updated as follows:
$$\begin{aligned} \frac{\varDelta E}{\varDelta t} = {\left\{ \begin{array}{ll} reward\_item &{} \text {if the right type of resource is collected} \\ penalty\_item &{} \text {if the wrong type of resource is collected} \\ -0.02 &{} \text {if no resource is consumed} \end{array}\right. } \end{aligned}$$
(4)
where \(reward\_item\) = 10 and \(penalty\_item\) = -10. The constant decrement of 0.02 per control cycle means that a controller that collects no resources will execute for a period of 500 s after it starts operating. Note that the \(penalty\_item\) component applies only to the concurrent foraging task. The number of resources of each type is set to ten times the number of robots.

Phototaxis Task. In the classic phototaxis task, a widely used benchmark in evolutionary robotics, robots have to find and move towards a light source. Following previous studies [3, 4], we set up a dynamic phototaxis task in which the light source is periodically moved to a new random location. The robots thus have to continuously search for and reach the light source, which eliminates controllers that only find the light source by chance. The virtual energy level is limited to the range [0,1000] units, and each controller is assigned an initial value of 100 units. At each control cycle, E is updated as follows:
$$\begin{aligned} \frac{\varDelta E}{\varDelta t} = {\left\{ \begin{array}{ll} S_r &{} \text {if } S_r > 0.5 \\ 0 &{} \text {if } 0 < S_r \le 0.5 \\ \textit{penalty} &{} \text {if } S_r = 0 \end{array}\right. } \end{aligned}$$
(5)
where \(\textit{penalty}\) = -0.01, and \(S_r\) is the maximum value of the readings from the light sensors, between 0 (no light) and 1 (brightest light). Light sensors have a range of 50 cm, meaning that robots are only rewarded if they are close to the light source. The remaining sensors have a range of 25 cm.
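
A minimal sketch of the two virtual-energy update rules (Eqs. 4 and 5), including the clamping of E to [0,1000]; the function names and argument conventions are ours.

```python
def update_energy_foraging(energy, collected, right_type,
                           reward_item=10.0, penalty_item=-10.0):
    """Per-control-cycle update for the foraging tasks (Eq. 4). The penalty
    case only occurs in the concurrent foraging task."""
    if collected:
        delta = reward_item if right_type else penalty_item
    else:
        delta = -0.02
    return min(1000.0, max(0.0, energy + delta))

def update_energy_phototaxis(energy, s_r, penalty=-0.01):
    """Per-control-cycle update for the dynamic phototaxis task (Eq. 5).
    s_r is the maximum light sensor reading, in [0, 1]."""
    if s_r > 0.5:
        delta = s_r
    elif s_r > 0.0:
        delta = 0.0
    else:
        delta = penalty
    return min(1000.0, max(0.0, energy + delta))
```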

4.2 Experimental Parameters and Treatments

We assess the performance of four approaches: (i) standard odNEAT, henceforth called odNEAT, (ii) odNEAT with racing alone, which we simply refer to as racing, and (iii, iv) racing plus population cloning, with and without extinction events (racing-ppc-rem and racing-ppc-norem, respectively). For each task and each algorithm considered, we conduct 30 independent runs. Each run lasts 100 hours of simulated time. Robots operate in a square arena surrounded by walls. The size of the arena is chosen to be 3 \(\times \) 3 meters. odNEAT parameters are set as in previous studies [3], including a population size of 40 genomes per robot. Each robot executes a control cycle every 100 ms. Regarding the minimal criterion for racing, we set P to the 50th percentile of the fitness scores found in the population and \(W = 1\), meaning that \(M_c(t)\) amounts to the median fitness score, and \(\alpha ~=~0.95\). In the population cloning technique, we set \(P_c\) to 100 s of simulated time. These parameter settings are robust to moderate variation, and were found to perform effectively in preliminary experiments.

5 Experimental Results

In this section, we present and discuss the experimental results. We use the two-tailed Mann-Whitney test to compute statistical significance of differences between sets of results because it is a non-parametric test, and therefore no strong assumptions need to be made about the underlying distributions.

5.1 Comparison of Performance

Figure 1 shows the mean fitness score of controllers throughout the simulation trials. In the two foraging tasks, racing-ppc-rem and racing-ppc-norem typically produce high-performing solutions in the early stages of evolution, which contributes to their superior performance. These approaches consistently outperform racing and odNEAT. In addition, the solution-synthesis process of racing with and without population cloning contrasts with that of odNEAT, which synthesises increasingly higher-performing controllers in a more progressive manner. In the dynamic phototaxis task, the differences in performance between the most effective approaches (racing-ppc-rem and racing-ppc-norem) and the less effective ones are further accentuated, which provides additional evidence of the benefits of racing plus population cloning.

Regarding the fitness score of the final controllers, see Fig. 2, racing-ppc-rem and racing-ppc-norem lead to superior collective performance across the three tasks, which is validated by the distribution of the mean fitness of each group of robots (\(p < 0.001\) and \(p < 0.05\) in the standard foraging task, \(p < 0.01\) and \(p < 0.001\) in the concurrent foraging task, \(p < 0.0001\) and \(p < 0.001\) in the dynamic phototaxis task, respectively). Differences between racing-ppc-rem and racing-ppc-norem are not statistically significant across all comparisons.
Fig. 1. Mean fitness score of controllers throughout the simulation trials. Top: standard foraging and concurrent foraging. Bottom: dynamic phototaxis.

Fig. 2. Distribution of the mean group fitness of the final controllers. From left to right: standard foraging, concurrent foraging, and dynamic phototaxis.

Our results show distinct features of the racing and population cloning techniques. Compared with odNEAT alone, the main benefit of racing is a potential speed-up of evolution. The benefits of combining racing with population cloning, on the other hand, are twofold: racing-ppc-rem and racing-ppc-norem significantly speed up the evolutionary process and lead to the synthesis of superior controllers. This result is particularly significant for online evolution because racing-ppc-rem and racing-ppc-norem effectively minimise the time spent assessing the quality of poor controllers, which is time during which robots do not perform adequately at the task for which solutions are sought.

5.2 Analysis of the Evolutionary Dynamics

To better understand the evolutionary dynamics of the multiple approaches, we first analysed how the fitness scores of the final controllers vary within each group. To measure the intra-group fitness variation, we computed the relative standard deviation (RSD) of the fitness scores of each group of robots. Values close to zero indicate similar fitness values within the group. Higher values, on the other hand, indicate increasingly larger variation of fitness scores.
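
For reference, a minimal sketch of the RSD computation as we interpret it (standard deviation of a group's final fitness scores divided by their mean; whether the sample or population standard deviation was used is not stated):

```python
import statistics

def relative_standard_deviation(group_fitness_scores):
    """Intra-group fitness variation: values near zero mean the robots in a
    group reached similar fitness; larger values mean larger variation."""
    mean = statistics.mean(group_fitness_scores)
    return statistics.pstdev(group_fitness_scores) / mean if mean != 0 else float("inf")
```
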
Fig. 3. Distribution of the RSD of the final controllers. From left to right: standard foraging, concurrent foraging, and dynamic phototaxis.

The distribution of RSD values is shown in Fig. 3. Across the three tasks, racing-ppc-rem typically displays the smallest intra-group fitness variation. odNEAT displays significantly larger variation than racing (\(p < 0.001\)). In turn, the variation of racing is also larger than that displayed by both racing-ppc-rem and racing-ppc-norem (\(p < 0.0001\) in the standard foraging and in the dynamic phototaxis tasks, \(p < 0.01\) in the concurrent foraging task, respectively). Population cloning thus leads not only to more capable controllers, but also to groups with more consistent fitness. Overall, the results suggest a boosting effect on the evolutionary process and an interplay between racing and cloning towards stable collective performance (hypothesis 1). In addition, the high RSD values of odNEAT across the three tasks and the relatively higher RSD values of racing in the dynamic phototaxis task suggest that the performance of these approaches may be more sensitive to the task requirements and environmental pressure than the performance of racing-ppc-rem and racing-ppc-norem (hypothesis 2).

We conducted two sets of complementary experiments using the dynamic phototaxis task in order to verify the two hypotheses. In the first set of experiments, we removed the racing component of both racing-ppc-rem and racing-ppc-norem, and we assessed the performance of population cloning in isolation, henceforth ppc-rem and ppc-norem. In the second set of experiments, we studied the relation between evolutionary pressure and evolutionary dynamics by varying the value of the penalty component defined in Eq. 5, applied when the light source is not in a robot's line of sight. The penalty was increased by a factor of 2, 5, 8, and 10, that is, to a value of -0.02, -0.05, -0.08, and -0.10 per control cycle. In this way, a controller unable to find the light source executes for a period of 500, 200, 125, and 100 s, respectively, after it starts operating. These experimental setups are henceforth referred to as the p2, p5, p8, and p10 setups. For each configuration in each set of complementary experiments, we conducted 30 independent runs.
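
These maximum operation times follow from the initial energy of 100 units and the 100 ms control cycle, i.e. ten cycles per second (see Sect. 4.2):

$$\begin{aligned} t_{max} = \frac{100}{|\textit{penalty}| \cdot 10}\,\text {s}, \quad \text {e.g.}\;\; \frac{100}{0.02 \cdot 10} = 500\,\text {s} \quad \text {and} \quad \frac{100}{0.10 \cdot 10} = 100\,\text {s}. \end{aligned}$$
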
Fig. 4. Mean fitness score of controllers throughout the simulation trials for the first set of complementary experiments (see text for details). The ppc-rem and ppc-norem approaches refer to population cloning with and without extinction events, but without the racing component.

Isolating the Effects of Population Cloning. Figure 4 shows the mean fitness score throughout the simulation trials for the first set of complementary experiments. Overall, the results confirm that adding population cloning can effectively boost the evolutionary process and push evolution towards higher-quality solutions. In addition, the median RSD of the final controllers is 1.158 for odNEAT, 0.708 for racing, 0.375 for ppc-rem, 0.829 for ppc-norem, 4.17 \(\cdot 10^{-4}\) for racing-ppc-rem, and 1.89 \(\cdot 10^{-2}\) for racing-ppc-norem, which confirms the interplay between racing and cloning in the evolution of controllers with less disparate performance levels. The key reason for this interplay is that population cloning typically increases the fitness scores of the receiving robot's internal population and therefore its \(M_c(t)\) value, which in turn further increases the lower bound of performance used by racing.

Modifying the Evolutionary Pressure. In the second set of complementary experiments, shown in Fig. 5, both racing-ppc-rem and racing-ppc-norem achieve qualitatively similar performance levels across the setups with varying evolutionary pressure. The remaining two approaches, odNEAT and racing, yield different performance levels as the evolutionary pressure varies. From p5 onwards, racing is the approach that evolves, on average, the controllers with the lowest performance; the mean fitness score of its final controllers is 410.08, 342.16, 371.75, and 265.98 in the p2, p5, p8, and p10 setups, respectively. odNEAT, on the other hand, synthesises controllers with superior performance levels as the evolutionary pressure is increased. The mean fitness score of its final controllers varies from 241.64 in the p2 setup to 924.24 in the p10 setup, outperforming both racing-ppc-rem and racing-ppc-norem in the two most demanding configurations of the dynamic phototaxis task.
Fig. 5. Mean fitness score of controllers throughout the simulation trials for the second set of complementary experiments (see text for details).

One way to understand the responses of racing and of odNEAT to the evolutionary pressure is to study: (i) the operation time (age) of the controllers used by the robots during the experiments, and (ii) the number of controllers produced. The operation time of controllers relates to the robustness of the solutions evolved and to the ability to adapt to changes in the position of the light source. Complementarily, the number of controllers produced indicates how difficult it is for the evolutionary process to adapt the behaviour of the robots.

Figure 6 shows the mean operation time of controllers during the experiments. For racing-ppc-rem and racing-ppc-norem, the operation time typically increases linearly with the simulation time, with a gentle slope, which indicates that new controllers are rarely synthesised after the early stages of evolution. In effect, the final solutions synthesised by racing-ppc-rem and racing-ppc-norem operate, on average, for up to 98 consecutive hours, which indicates that the controllers are robust and well-adapted to the task. This result, combined with the number of controllers produced, showcases the ability of the algorithms to quickly assess and abort the evaluation of inefficient controllers. Racing displays a trend similar to that of racing-ppc-rem and racing-ppc-norem, but yields a relatively lower mean operation time and a higher number of controllers produced, which indicates that it typically requires more evaluations and therefore more time to evolve stable solutions to the task. Complementarily, odNEAT substitutes the controller of each robot more frequently as the evolutionary pressure increases. Specifically, each robot executing odNEAT produces on average 24 controllers in the p2 setup, 120 controllers in the p5 setup, 420 controllers in the p8 setup, and 920 controllers in the p10 setup. This result confirms that, in this particular case, odNEAT's dynamics are more sensitive to the magnitude of the evolutionary pressure than the dynamics of the racing approaches.
Fig. 6. Mean operation time of controllers produced throughout the simulation trials for the second set of complementary experiments (see text for details).

6 Concluding Discussion and Future Work

In this paper, we proposed two novel approaches to speed up online evolution of controllers in multirobot systems: (i) a racing technique, and (ii) a population cloning technique. To implement our approaches, we used odNEAT, a decentralised online evolution algorithm in which robots optimise controllers in parallel and exchange candidate solutions to the task. We conducted experiments with four approaches (odNEAT, racing, racing-ppc-rem, and racing-ppc-norem) in three tasks: (i, ii) two foraging tasks with differing complexity, and (iii) dynamic phototaxis.

We showed the benefits of our racing approach, and of a population cloning technique that allows evolution to effectively leverage the genetic information accumulated by each individual robot. The combined racing plus population cloning approaches typically yielded: (i) the highest task performance in terms of the fitness score, (ii) the fastest evolution of effective solutions to the task, (iii) the most consistent and stable group-level performance, and (iv) the highest degree of robustness as the evolutionary pressure to solve the task increases. However, if the evolutionary pressure is set above a certain limit, algorithms such as odNEAT can, in certain conditions, display superior performance in the long term. One key research question is therefore how to enable robots to find the best evolutionary algorithm for a given task during the actual task execution.

The immediate follow-up work is to investigate the performance of our proposed approaches in real multirobot systems. In this respect, we also intend to investigate: (i) the effects of heterogeneous racing at the algorithm level, that is, of allowing multiple robots to race with different configurations of the online evolutionary algorithm in order to find the most effective one, and (ii) cloning techniques and their limits, including potential robustness vs. stagnation trade-offs, and how sensitive the performance of population cloning is to the frequency of interactions between robots.

Footnotes

  1. The source code of the experiments can be found at: http://fgsilva.com/?page_id=302.

Acknowledgements

This work was partly supported by FCT under grants SFRH/BD/89573/2012, UID/EEA/50008/2013, and UID/Multi/04046/2013.

References

  1. Floreano, D., Mondada, F.: Automatic creation of an autonomous agent: genetic evolution of a neural-network driven robot. In: 3rd International Conference on Simulation of Adaptive Behavior, pp. 421–430. MIT Press, Cambridge (1994)
  2. Watson, R., Ficici, S., Pollack, J.: Embodied evolution: distributing an evolutionary algorithm in a population of robots. Rob. Auton. Syst. 39(1), 1–18 (2002)
  3. Silva, F., Urbano, P., Correia, L., Christensen, A.L.: odNEAT: an algorithm for decentralised online evolution of robotic controllers. Evol. Comput. 23(3), 421–449 (2015)
  4. Silva, F., Correia, L., Christensen, A.L.: A case study on the scalability of online evolution of robotic controllers. In: Pereira, F., Machado, P., Costa, E., Cardoso, A. (eds.) EPIA 2015. LNCS, vol. 9273, pp. 189–200. Springer, Heidelberg (2015)
  5. Haasdijk, E., Eiben, A., Karafotias, G.: On-line evolution of robot controllers by an encapsulated evolution strategy. In: IEEE Congress on Evolutionary Computation, pp. 1–7. IEEE Press, Piscataway (2010)
  6. Prieto, A., Becerra, J., Bellas, F., Duro, R.J.: Open-ended evolution as a means to self-organize heterogeneous multi-robot systems in real time. Rob. Auton. Syst. 58(12), 1282–1291 (2010)
  7. Bredeche, N., Montanier, J.M., Liu, W., Winfield, A.: Environment-driven distributed evolutionary adaptation in a population of autonomous robotic agents. Math. Comput. Model. Dyn. Syst. 18(1), 101–129 (2012)
  8. Silva, F., Duarte, M., Correia, L., Oliveira, S.M., Christensen, A.L.: Open issues in evolutionary robotics. Evol. Comput., in press (2016). http://www.mitpressjournals.org/doi/pdf/10.1162/EVCO_a_00172
  9. Dinu, C.M., Dimitrov, P., Weel, B., Eiben, A.: Self-adapting fitness evaluation times for on-line evolution of simulated robots. In: 15th Genetic and Evolutionary Computation Conference, pp. 191–198. ACM, New York (2013)
  10. Arif, A., Nedev, D., Haasdijk, E.: Controlling maximum evaluation duration in on-line and on-board evolutionary robotics. Evolving Syst. 5(4), 275–286 (2014)
  11. Haasdijk, E., Atta-ul Qayyum, A., Eiben, A.: Racing to improve on-line, on-board evolutionary robotics. In: 13th Genetic and Evolutionary Computation Conference, pp. 187–194. ACM, New York (2011)
  12. Haasdijk, E., Smit, S.K., Eiben, A.E.: Exploratory analysis of an on-line evolutionary algorithm in simulated robots. Evol. Intell. 5(4), 213–230 (2012)
  13. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
  14. Silva, F., Correia, L., Christensen, A.L.: Dynamics of neuronal models in online neuroevolution of robotic controllers. In: Correia, L., Reis, L.P., Cascalho, J. (eds.) EPIA 2013. LNCS, vol. 8154, pp. 90–101. Springer, Heidelberg (2013)
  15. Silva, F., Urbano, P., Christensen, A.L.: Online evolution of adaptive robot behaviour. Int. J. Nat. Comput. Res. 4(2), 59–77 (2014)
  16. Mondada, F., Bonani, M., Raemy, X., Pugh, J., Cianci, C., Klaptocz, A., Magnenat, S., Zufferey, J., Floreano, D., Martinoli, A.: The e-puck, a robot designed for education in engineering. In: 9th Conference on Autonomous Robot Systems and Competitions, pp. 59–65. IPCB, Castelo Branco (2009)
  17. Maron, O., Moore, A.W.: The racing algorithm: model selection for lazy learners. Artif. Intell. Rev. 11(1), 193–225 (1997)
  18. Lobo, F.G.: The Parameter-Less Genetic Algorithm: Rational and Automated Parameter Selection for Simplified Genetic Algorithm Operation. Ph.D. thesis, Universidade Nova de Lisboa, Lisbon, Portugal (2000)
  19. Birattari, M., Stützle, T., Paquete, L., Varrentrapp, K.: A racing algorithm for configuring metaheuristics. In: 4th Genetic and Evolutionary Computation Conference, pp. 11–18. Morgan Kaufmann, San Francisco (2002)
  20. Yuan, B., Gallagher, M.: Combining meta-EAs and racing for difficult EA parameter tuning tasks. In: Lobo, F.G., Lima, F.C., Michalewicz, Z. (eds.) Parameter Setting in Evolutionary Algorithms. Studies in Computational Intelligence, vol. 54, pp. 121–142. Springer, Heidelberg (2007)
  21. Silva, F., Correia, L., Christensen, A.L.: Speeding up online evolution of robotic controllers with macro-neurons. In: Esparcia-Alcázar, A.I., Mora, A.M. (eds.) EvoApplications 2014. LNCS, vol. 8602, pp. 765–776. Springer, Heidelberg (2014)
  22. Duarte, M., Silva, F., Rodrigues, T., Oliveira, S.M., Christensen, A.L.: JBotEvolver: a versatile simulation platform for evolutionary robotics. In: 14th International Conference on the Synthesis and Simulation of Living Systems, pp. 210–211. MIT Press, Cambridge (2014)
  23. Cao, Y., Fukunaga, A., Kahng, A.: Cooperative mobile robotics: antecedents and directions. Auton. Rob. 4(1), 1–23 (1997)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Fernando Silva (1, 2, 4)
  • Luís Correia (4)
  • Anders Lyhne Christensen (1, 2, 3)

  1. BioMachines Lab, Lisboa, Portugal
  2. Instituto de Telecomunicações, Lisboa, Portugal
  3. Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal
  4. BioISI, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
