
1 Introduction

Collective motion is widely documented in groups of animals in nature and has been shown to endow the group with capabilities that are not apparent in any individual group member. Those emergent capabilities include increased environmental awareness [6, 16], protection against predators [21], and gradient sensing [23]. Swarm robotics aims at implementing such collective behaviors in robots to leverage those advantageous emergent properties for engineering purposes. Collective motion can enable robotic swarms to achieve sensing beyond the capabilities of individual agents [15], that is, emergent perception.

The main challenge in designing collective motion behaviors, and swarm robotics systems in general, comes from the fact that designers can only implement robot controllers on individual robots, while the desired behavior is defined at the group level [11]. Therefore, successful group behaviors depend on emergent properties, which are difficult to predict. An approach to this problem is to define a success metric at the group level and to use it as a reward to automatically optimize robot controllers [9, 11, 27, 32]. However, the solutions that emerge through this process tend to overfit their training environment and, consequently, lack flexibility. Therefore, automated design requires a framework that maintains good performance under a variety of environmental conditions. For example, modularization of individual-level control has been proposed as a solution [9].

We believe that, by leveraging heterogeneity, we can achieve a modular framework at the swarm level, as opposed to the prevailing ‘homogeneous designs’ typically found in the swarm robotics literature [9, 11, 27]. Distinct sub-group behaviors can emerge when task specialization is beneficial for the group as a whole [14]. For example, social insects divide their tasks into sub-tasks, assigned to specific group members, in order to improve their efficiency [24]. Maintaining a ‘group identity’ while splitting into sub-groups can be achieved through genetic homogeneity with phenotypic heterogeneity. For example, within an insect colony, members share similar-to-identical genotypes [25]. The various behaviors used by insects are encoded in the same genotype and are activated by external cues, e.g., queen pheromones [12]. This phenomenon, in which the same genotype expresses different phenotypes through gene regulatory mechanisms, is known as phenotypic plasticity. Although it is more commonly associated with task partitioning [8, 24], it has also been found in species exhibiting collective motion [1].

In this paper, we propose to improve the flexibility of automated designs by promoting modularity at the swarm level (as opposed to the individual level) through heterogeneity in self-organized collective behaviors. Our framework is based on phenotypic plasticity: we evolve separate controllers for sub-groups inside the swarm and, through a regulatory mechanism, adjust the phenotypic ratio of each group. More specifically, we consider polyphenism, a specific case of phenotypic plasticity whereby the phenotype is expressed at birth and remains constant throughout life [1]. Our work is novel in that it evolves a heterogeneous swarm at the swarm level without a priori knowledge of how heterogeneity should be leveraged. This makes our method task-agnostic and highly adaptable.

The paper is organized as follows. In Sect. 2, we present the state of the art on automated design for heterogeneous swarms. Our implementation is detailed in Sect. 3: we test our approach in an emergent perception task, considering robots that have limited sensing capabilities (which prevent them from achieving the group task individually) and that are unaware of the specific roles or specializations within the swarm. In Sect. 4, we present our optimization results and re-test our best controller with different sub-group ratios. This analysis enables us to design an online regulatory mechanism dependent on local conditions, where robots automatically switch between the controllers with a probabilistic finite-state machine. In Sect. 5, we discuss the specialization and cooperation of the different sub-groups, and we present our final remarks and future work in Sect. 6.

2 Related Work

Behavioral heterogeneity induces several challenges, the first of which is task allocation, i.e. dynamically adjusting the number of agents assigned to each available task [2]. The problem of achieving this goal in a decentralized manner has been widely addressed in swarm robotics. Approaches to this problem usually focus on the mechanism of task switching and, therefore, use relatively simplistic, manually implemented behaviors. With threshold-based responses, robots initially show some preference for a given task, while simultaneously recognizing deficiencies in the accomplishment of other tasks (e.g. objects accumulating). Beyond some threshold, the robot switches to the corresponding task [17]. In some cases, this switch is probabilistic in order to avoid large-scale population switches, which might leave another task unaddressed [4]. Mathematical modeling of task allocation has also been proposed [31], which allows designers to tune task-allocation parameters to their needs.

In addition to task allocation, task specialization focuses on the emergence of several complementary functions within the swarm. It often refers to physically heterogeneous swarms, i.e. multi-robot systems composed of multiple robot platforms [26]. It should be noted that this impairs a vital advantage of swarms, namely robustness, as in such cases the individual members are not interchangeable with one another. A more robust design is one in which the agents are physically identical but differ in behavioral function [3]. Functional heterogeneity through behavioral specialization does not suffer from this problem [13]: robots with identical bodies can switch behaviors while deployed, which is also called task partitioning. In such a context, maintaining collective behavior at the group level can be tricky, as online regulatory mechanisms tend to switch individual behaviors only [7].

Tuci et al. investigated the evolution of task partitioning (i.e. both task allocation and specialization), with clonal and aclonal evolutionary processes, in a physically homogeneous swarm of five e-pucks. Here, clonal refers to a single genotype being shared by all the agents, whereas, with aclonal, the genotypes all differ from each other. Unsurprisingly, they found that robots performed better with the aclonal approach [28, 29], especially when the controller was optimized with a multi-objective fitness [30], as the robots could address the required sub-tasks in parallel. Notably, this approach results in an efficient optimization process, since each agent in the swarm samples a different genome. Moreover, plasticity was observed, in the sense that individual robots were able to switch tasks according to environmental requirements (including their peers’ behaviors).

Closely related, [8] also addressed task partitioning, with a task that evokes the environmental context of leafcutter ants, which divide their foraging task into two sub-tasks: cutting and dropping leaf fragments into a storage area, on the one hand, and collecting and bringing the fragments back to the nest, on the other. Their experimental environment was composed of a slope separating a nest (below) from a source (above) area, so that the robots could individually retrieve the food objects back to the nest, or deposit them on the slope and rely on other robots to fetch them. Task allocation was evolved through a probabilistic finite-state machine, composed of simplistic pre-programmed behaviors, without specifying a preference for collective, rather than individual, behavior. Their success demonstrates that even homogeneous controllers (i.e. a single phenotype) can handle task partitioning through individual experience, stigmergy, and stochastic switching alone, given enough knowledge of the task to design viable behaviors.

The presented work on heterogeneous swarm optimization requires in-depth knowledge of the specific learning task. Whether it be the design of sub-tasks/goals, pre-defined (modular) behaviors, or finite states, a priori knowledge is required to design these controllers. If such insight is available, more specialized optimization methods can be utilized (e.g. [20, 22]). Unfortunately, in our case such prior knowledge is unavailable, as the capabilities we target are emergent.

We aim to make collective specialized behavior emerge without any explicit reward on the specific sub-tasks. Differently from the aforementioned works, our approach requires minimal insight into the solution, as we do not pre-define sub-tasks or behaviors such as bucket brigading, finite-state machines, or task allocations. Instead, we define a reward on the overall group-level performance of the whole task, which simplifies the design of a good performance metric, and let specialization evolve as part of an optimal solution. Our method is more flexible than a modular design with pre-defined specialized behaviors that presuppose how sub-group interactions affect the overall swarm performance. In addition, we automatically obtain task allocation through our online regulatory mechanism. This straightforward method minimizes the time and effort required, while demonstrably improving overall task performance. Altogether, this work shows that optimal task partitioning can be designed automatically using evolutionary computing, without any specific sub-task knowledge. To the best of our knowledge, no previous work addresses task specialization and allocation in such a context.

3 Methodology

Optimising a heterogeneous swarm controller without specific knowledge of any sub-task requires a flexible approach that can be applied to any type of task. We tested our method in an emergent perception task for gradient sensing, where robots have to find the brightest spot in the center (inspired by fish behavior described in [23]). For this, we utilize black-box optimization in the form of an evolutionary algorithm on Reservoir Neural Networks (RNN), a method that has been used on homogeneous swarms in a similar capacity [32]. The full code base can be found at https://github.com/fudavd/EC_swarm/tree/PPSN_2024

Robot Design

Each robot in the swarm has an identical differential-drive hardware design based on the Thymio II, without any communication capabilities (Bluetooth, radio, WiFi). Our robot consists of a cart with two actuated wheels (max speed \(\pm 14\mathrm{cm/s}\)) in the back and a passive omni-directional wheel in the front. We equip our robots with range-and-bearing sensing in 4 directions (specifics are detailed below) and a local value sensor to measure local light intensity. These sensors are sampled at 10 Hz, so control inputs are based only on current information, i.e. there is no memory of previous states.

It is important to be explicit about the capabilities of our robot design: 1) Robots do not communicate information to each other (e.g. local values at their position, future motor inputs, or any form of message passing); 2) The controller is memoryless; only current local sensor readings are known; 3) There is no notion of specialization inside the controller, meaning robots are ‘unaware’ of specialization inside the swarm. All in all, this grounds the idea of ‘limited sensing’, as a single robot is incapable of estimating the gradient of the light.

Controller Design

For general applicability, we require our controller to be as flexible as possible while capable of learning quickly. For this we opt to use neural networks (which are expressive function approximators) with random functionalities in the form of a reservoir to speed up learning (i.e. RNN, [18]). This reservoir is created by freezing the network weights up to the last layer after random initialization, resulting in a fixed set of functions from which we learn an optimal combination in the final network layer.

To allow specialization, we divide our swarm into two sub-groups, each with a different RNN controller (all members of a sub-group share the same RNN). The two RNNs are randomly initialized with different reservoirs that we save at the start. We describe the swarm genotype as a single vector of weights, of which the first half encodes the last layer of the first RNN and the second half the last layer of the second RNN. The phenotypic plasticity of our single genome is expressed through our sub-group division.

The phenotype of the controller is illustrated in Fig. 1. The RNN has an input layer of 9 neurons that are rescaled to [\(-1\), 1], namely 4 directional sensors (each providing two values: distance and heading) and 1 local value sensor. The 4 directional sensors cover a combined \(360^\circ \) view of the robot’s surroundings (front, back, left, and right quadrants of \(90^\circ \) each). Within each quadrant (i) the sensor obtains the distance (\(d_i\)) and relative heading (\(\theta _i\)) of the nearest neighbor up to a maximum range of 2 m (outside of this range the sensor defaults to \(d_i\)=2.01 and \(\theta _i\)=0). The RNN outputs target speed (\(v\in [-1,1]\)) and angular velocity (\(w\in [-1,1]\)), which are transformed into direct velocity commands for the two wheels.

The RNN architecture is a fully connected neural network with an input layer with 9 neurons (\(\textbf{s}_{in} \in [-1,1]^9\)), 2 hidden ReLU layers of the same size (\({h}_{1}, {h}_{2} \in \mathbb {R}^9\)), and a final output layer with two \(\textrm{tanh}\) neurons, \(RNN \in [-1,1]^2\). All hidden reservoir weights are initialized randomly with a uniform distribution (\(U[-1,1]\)). We set all biases to 0 and only optimize the weights of the output layer during evolution (18 weights per RNN). The final RNN controller can be formalized as follows:

$$\begin{aligned} &RNN = \textrm{tanh}\left( \textbf{W}_{out}\textrm{ReLU}\left( \textbf{W}_{h2}\textrm{ReLU}\left( \textbf{W}_{h1}\textbf{s}_{in}\right) \right) \right) \\ &\qquad \text {with,} \quad \textbf{W}_{h1, h2} \in \mathbb {R}^{9\times 9} \quad \text {and} \quad \textbf{W}_{out} \in \mathbb {R}^{2\times 9} \end{aligned}$$
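A minimal NumPy sketch of this controller and of the genotype decoding described above. The class and function names (`ReservoirController`, `decode_genotype`) are illustrative, not taken from the linked code base:

```python
import numpy as np

class ReservoirController:
    """One sub-group RNN controller: a fixed random reservoir of two
    hidden layers; only the 2x9 output layer is evolved."""

    def __init__(self, seed):
        rng = np.random.default_rng(seed)
        # Frozen reservoir weights, drawn once from U[-1, 1] and saved
        self.W_h1 = rng.uniform(-1.0, 1.0, (9, 9))
        self.W_h2 = rng.uniform(-1.0, 1.0, (9, 9))
        self.W_out = np.zeros((2, 9))  # evolved part (18 weights)

    def act(self, s_in):
        """s_in: the 9 sensor inputs rescaled to [-1, 1].
        Returns (v, w), both bounded in [-1, 1] by the tanh output."""
        h1 = np.maximum(0.0, self.W_h1 @ s_in)   # ReLU
        h2 = np.maximum(0.0, self.W_h2 @ h1)     # ReLU
        return np.tanh(self.W_out @ h2)

def decode_genotype(x, ctrl1, ctrl2):
    """Split the single 36-weight genotype: first half to the output
    layer of the first RNN, second half to the second RNN."""
    ctrl1.W_out = np.asarray(x[:18]).reshape(2, 9)
    ctrl2.W_out = np.asarray(x[18:]).reshape(2, 9)
```

Freezing `W_h1` and `W_h2` after random initialization yields the fixed set of reservoir functions from which evolution only has to learn an optimal output combination.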
Fig. 1. Reservoir Neural Network controller design.

Fig. 2. Experimental setup. (a) The scalar map indicating a random instance of our swarm task. (b) Our swarm with different sub-groups (colored red and green). (Color figure online)

Emergent Perception Task

We aim to enhance the sensing capability of our robots so that, when operating collectively as a swarm, they can perceive the gradient. Our set-up consists of 20 robots that are rewarded for navigating to the brightest spot in the center of a 30\(\,\times \,\)30m arena. Since an individual robot lacks the capacity to sense the direction of the gradient field, we anticipate that collective behavior will emerge. This ‘task’ is not unlike a behavior found in schools of fish, which tend to aggregate in the shadows as this decreases their visibility to predators [23].

We use Isaac Gym from [19] to simulate our swarm(s) (\(\mathrm {dt=0.05s}\)). The task environment is a scalar field map with its maximum value (255) in the center (see Fig. 2a). We randomly place the swarm on a circle at a fixed distance (r = 12 m) from the center. At this position, we randomly place each swarm member within a 3\(\,\times \,\)3m bounding box (shown in red). The swarm is divided into two sub-groups (red and green) of 10 members each, as shown in Fig. 2b.
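The placement procedure can be sketched as follows (a hypothetical reconstruction for illustration; the actual initialization lives in the linked code base):

```python
import numpy as np

def init_swarm_positions(n=20, r=12.0, box=3.0, rng=None):
    """Pick a random point on a circle of radius r around the arena
    center, then scatter the n robots uniformly inside a box x box
    bounding box centered at that point."""
    if rng is None:
        rng = np.random.default_rng()
    angle = rng.uniform(0.0, 2.0 * np.pi)            # random spot on the circle
    center = r * np.array([np.cos(angle), np.sin(angle)])
    offsets = rng.uniform(-box / 2.0, box / 2.0, (n, 2))
    return center + offsets                          # (n, 2) robot positions
```

Because the half-diagonal of the 3 x 3 m box is about 2.1 m, every robot starts roughly 10 to 14 m from the brightest spot.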

Fig. 3. Experimental setup for optimizing swarm controllers, where an evolutionary algorithm evaluates different genotypes in our swarm simulator (big dashed box). Note that our genotype encodes two different controllers (green and white boxes). (Color figure online)

Evolving Swarm Experiment

We optimize our swarm using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES, [10]). We run 10 repetitions of our experiments to obtain the overall best controller. A baseline comparison is made with a homogeneous swarm of the same size that is optimized with only one RNN. CMA-ES is interchangeable with other derivative-free optimization methods (see Fig. 3).

Let us draw a clear distinction between the components within CMA-ES, where individuals in a population are optimized, and our swarm, which refers to an instance of an individual, consisting of robots/members assigned to different sub-groups (see Fig. 3). Within our evolutionary algorithm, we thus have individuals that we want to evaluate. All individuals within this evolving population have two RNNs with the same two reservoirs. Differences between individuals are defined by their genotype (a vector of 36 weights, \({\textbf {x}} = \left[ \textbf{W}_{out_{1:}}, \textbf{W}_{out_{2:}}\right] \) with, \({\textbf {x}} \in \mathbb {R}^{36}\)) that encodes the last-layer weights of the two RNNs. We evaluate an individual by assigning the RNNs to two sub-groups in a single swarm of robots, where each sub-group member (i.e. a robot belonging to a specific sub-group) has the same RNN as the other constituents. After a trial, we calculate a fitness value based on the task performance of the swarm and assign it to the corresponding individual.

CMA-ES is a sampling-based evolutionary strategy that aims to find a distribution in the search space to sample high-performing individuals with high probability. Here, CMA-ES samples new candidates \({\textbf {x}}\) according to a multivariate normal distribution. The covariance matrix of the sampling distribution is updated at each generation (\({\textbf {C}}_{gen}\)) to increase the probability of sampling an individual with higher fitness.

We set our population size to 30 individuals (i.e., 30 different swarms) and evolve for 100 generations. Every individual is tested three times, with the median run representing the final fitness. This reduces the sensitivity to lucky runs that are nonrepeatable and therefore nonviable (Table 1).
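The optimization loop can be sketched as follows. Note that this is a simplified isotropic (mu, lambda) evolution strategy standing in for CMA-ES (it keeps the same ask/evaluate/tell structure but does not adapt a covariance matrix \({\textbf {C}}_{gen}\)), and the simulator is replaced by a toy stub; all names are illustrative:

```python
import numpy as np

def simulate_trial(x):
    """Stub standing in for one Isaac Gym swarm rollout; here a toy
    concave function so the sketch runs end to end."""
    return -float(np.sum(np.asarray(x) ** 2))

def evaluate(x, n_trials=3):
    """Median over repeated trials, as described above, to damp
    lucky, non-repeatable runs."""
    return float(np.median([simulate_trial(x) for _ in range(n_trials)]))

# Simplified evolution-strategy loop standing in for CMA-ES.
rng = np.random.default_rng(0)
mean = np.ones(36)                 # genotype: 36 output-layer weights
sigma, popsize, mu = 0.5, 30, 7    # popsize matches the paper; mu is illustrative
for generation in range(100):
    pop = mean + sigma * rng.standard_normal((popsize, 36))   # "ask"
    fits = np.array([evaluate(x) for x in pop])
    elite = pop[np.argsort(fits)[::-1][:mu]]                  # best mu (maximize)
    mean = elite.mean(axis=0)                                 # "tell" / update
    sigma *= 0.95                                             # simple step-size decay
best_genotype = mean
```

CMA-ES replaces the fixed isotropic sampling and ad-hoc decay with a principled update of the full sampling covariance, which is why it is the method of choice here; the surrounding loop structure is unchanged.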

Table 1. Evolving swarm experiment parameters

Fitness Function. We define the fitness of a swarm as generally as possible, solely based on the ability to follow the increasing gradient of the scalar field defined by the environment (shown in Fig. 2a). This is done by aggregating the average light intensity values of all members over time (see Eq. 1).

$$\begin{aligned} {f} = \frac{\sum _{t=0}^{T}{l_t}}{G_{max}\cdot {T}}\quad \text {and}\quad {l}_t = \frac{\sum _{n=1}^{N}{G_n}}{N} \end{aligned}$$
(1)

where \(G_n\) is the scalar value of the grid cell in which agent n (of all agents, N) is located at time t. The fitness at a specific time (\(l_t\)) is thus the mean scalar light value of all swarm members. The trial fitness (f) is calculated by averaging all \(l_t\) over the total simulation time T. Finally, we normalize using the maximum scalar value \(G_{max}\), equal to 255 in all experiments. The theoretical maximum fitness of 1 can only be achieved if all members of the swarm stack up in the center for the entire run. Fitness is only evaluated at the swarm level, with no distinction between sub-groups or any other task-specific principles that promote specialization. Additionally, we emphasize that our fitness function does not distinguish between robots that sense and follow the increasing gradient collectively or as solitaries, or between sub-groups that cooperate or not.
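Given a log of the per-agent light readings, Eq. 1 reduces to a few lines. A sketch with an illustrative function name and log layout:

```python
import numpy as np

def trial_fitness(light_log, g_max=255.0):
    """Eq. 1: light_log[t, n] holds the scalar grid value G_n under
    agent n at time step t. Average over agents per step (l_t), then
    over time, normalized by the maximum grid value G_max."""
    l_t = np.mean(light_log, axis=1)       # mean light per time step
    return float(np.mean(l_t)) / g_max     # trial fitness f in [0, 1]
```

For example, a swarm that sits on the 255-valued center cell for the whole trial scores exactly 1.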

Validation Experiments

After our evolutionary experiment, we obtain the overall best controller over 10 runs. We re-test this controller to assess the emergent perception capability of the swarm to sense the gradient. Our validation is split into two parts: 1) we are particularly interested in the possible collective (sub-group) behavior(s) and the importance of sub-group interactions, which we elucidate by re-testing the best evolved controller and analysing the swarm's behavior; 2) using our two RNNs, we implement a straightforward online regulatory mechanism to mimic phenotypic plasticity induced by pheromones [12]. We investigate the viability of our (adaptive) heterogeneous swarm through scalability and robustness experiments.

Collective Behavior and Sub-group Interactions: First, we assess whether collective motion has emerged in a single re-test. We examine two different aspects of collective behavior over time: (1) performance as mean scalar light value of the swarm (Eq. 1, \(l_t\)); (2) alignment in terms of order (Eq. 2, \(\varPhi \)), which is defined as follows:

$$\begin{aligned} \begin{aligned} {\varPhi } = \frac{\sum _{n=1}^{N}{\varphi _n}}{N}\quad \text {and} \quad {\varphi _n}=\frac{\Big |\Big |\left( \sum _{p=1}^{P}{\angle {e^{j \theta _p}}}\right) +\angle {e^{j \theta _n}} \Big |\Big |}{P+1} \end{aligned} \end{aligned}$$
(2)

Here, \(\varphi _n\) is the order value calculated for agent n: the length of the average heading direction of agent n (noted \(\angle {e^{j \theta _n}}\)) and all its P perceived neighbors (noted \(\angle {e^{j \theta _p}}\)). The total swarm order \(\varPhi \) is then defined as the average \(\varphi _n\) over all agents in the swarm. The order measure gives a powerful insight into the alignment of the agents' directions of motion: if all agents move in the same direction, the order measure approaches 1; if they move in different directions, the order approaches 0.
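Eq. 2 transcribes directly into code (function names are illustrative):

```python
import numpy as np

def agent_order(theta_n, theta_neighbors):
    """Eq. 2, phi_n: length of the mean heading unit vector of agent n
    and its P perceived neighbors. 1 = fully aligned, -> 0 = incoherent."""
    headings = np.exp(1j * np.asarray([theta_n, *theta_neighbors]))
    return float(np.abs(headings.sum()) / headings.size)  # divide by P + 1

def swarm_order(thetas, neighbor_lists):
    """Eq. 2, Phi: average of phi_n over all agents; neighbor_lists[n]
    holds the indices of the neighbors agent n currently perceives."""
    thetas = np.asarray(thetas)
    return float(np.mean([agent_order(thetas[n], thetas[nbrs])
                          for n, nbrs in enumerate(neighbor_lists)]))
```

Three agents all heading along \(\theta=0\) give \(\varPhi=1\); two agents heading in opposite directions give \(\varPhi=0\).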

Additionally, we investigate the benefits of sub-group interactions on two factors, namely performance and robustness. We retest our best controller in the same environment where we change the sub-group ratios (ratio \(\in \) {4:0, 3:1, 2:2, 1:3, 0:4}). Different sub-group ratios can tell us if one sub-group is mainly responsible for the swarm performance or if sub-group interactions are important. These different ratios are tested by initializing the swarm at different distances from the center (\(\textrm{r}_{ratio}\) \(\in \left\{ 0,0.25,0.5,0.75,1\right\} \) as a ratio of the original training distance, 12 m).

Online Regulatory Mechanism: Based on the results of the sub-group interactions experiment, we can heuristically identify the best-performing sub-group ratios at certain light intensities. Subsequently, we create a probabilistic finite-state machine in which the choice of sub-group controller within a robot is defined such that, at the holistic group level, the optimal sub-group ratios should emerge. This probability depends only on the local light value; thus, no communication is needed to adapt to the best ratio. For example, if we heuristically find a 1:3 ratio to be the best sub-group division at the current local light intensity, the probability of sampling an action from the second reservoir is 75%. We update the probabilistic reservoir state every 5 s for stable behavior (this update frequency was found to be optimal by a parameter sweep over \(\{1, 5, 10, 30, 60, 100\}\) seconds).

Fig. 4. Validation environments. The black striped line indicates the random initialization location of the swarm (similar to Fig. 2). The striped box in (c) indicates the area of random initialization.

Table 2. Validation experiment parameters

Scalability and Robustness: The swarm should be able to operate with a wide range of group sizes (i.e. scalability) and across different types of environments (i.e. robustness), using the same best controller. In our scalability experiment, we initialize the swarm in the same environment but with swarm sizes of 10, 20, and 50 to see the impact on performance. In our robustness experiments, we initialize the swarm (of size 20) in different gradient maps, ‘Bi-modal’, ‘Linear’, and ‘Banana’ (Fig. 4). Each new arena poses different challenges: Bi-modal requires a collective decision on where to go (Fig. 4a); Linear poses a less salient gradient stretched over the full arena (Fig. 4b); Banana, a classic nonlinear minimization problem [5], has a curved shallow bottom. For our collective gradient ascent task, this function is interesting, as it has both a shallow gradient and local maxima (Fig. 4c). The experimental parameters used in all validation experiments are described in Table 2.
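As an illustration of the ‘Banana’ arena, a hypothetical reconstruction based on the classic Rosenbrock (banana) function, negated so its curved valley becomes a ridge of maxima and rescaled to the [0, 255] range of the scalar maps; the exact parameters used in the experiments may differ:

```python
import numpy as np

def banana_field(size=30, resolution=1.0):
    """Hypothetical 'Banana' scalar map: the Rosenbrock function over
    [-2, 2]^2, negated and rescaled to [0, 255]. The curved, shallow
    valley of Rosenbrock becomes the curved, shallow gradient ridge
    that the swarm must ascend."""
    n = int(size / resolution)
    coords = np.linspace(-2.0, 2.0, n)
    x, y = np.meshgrid(coords, coords)
    rosen = (1 - x) ** 2 + 100.0 * (y - x ** 2) ** 2
    field = -rosen                      # minimization problem -> ascent task
    field -= field.min()                # shift to 0
    return 255.0 * field / field.max()  # rescale to [0, 255]
```

The maximum (255) sits at the Rosenbrock optimum (1, 1), reached only along a shallow curved ridge, which is what makes the arena hard for gradient ascent.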

4 Results

Evolutionary Experiment. The data is published at https://doi.org/10.34894/0VSN8Z.

We measure efficacy by the mean and maximum fitness, averaged over the 10 independent evolutionary runs, for each generation. The results of our evolutionary experiment show similar maximum performance between baseline (homogeneous) and heterogeneous control (Fig. 5a). We see an early plateau in the maximum Baseline fitness, while heterogeneous control is still increasing, which indicates possible further improvement with more generations.

At the bottom of Fig. 5a we show the genotypic variation of our population in the best evolutionary run (calculated as the mean standard deviation, STD). The rapid fitness increase after generation 20 coincides with an increase in variation, while the plateauing of fitness after generation 50 coincides with the steepest decline in genotype variation. Interestingly, the two reservoirs seem to adapt their genotype variation in alternation: the first reservoir (green) increases first but stabilizes relatively soon, while the second reservoir (red) remains relatively fixed at first and increases when the first starts to flatten out. This may indicate a concurrent adaptation of the reservoirs due to a learned task distribution.

Fig. 5. a: top, mean\(\pm STD\) and max fitness over 100 generations (averaged over 10 runs); bottom, genotype spread in the population (STD) of the best heterogeneous run. b: top, fitness of a single run with the best controller; bottom, the order (alignment) in the same run. Vertical lines correspond to interesting time frames. (Color figure online)

Fig. 6. Snapshots of the interesting time frames during the re-test experiment. The numbers correspond to the vertical bars in Fig. 5b. (1) At first, the swarm spreads out to search for the gradient. (2) The green sub-group senses a gradient and ‘aligns’ the swarm to the left. (3) The gradient is lost, resulting in the swarm dispersing in different directions (i.e. decreasing alignment). (4) The red sub-group senses the gradient and directs the swarm towards the light source. (5) Red slowly pulls in more green swarm members. (6) The swarm starts to spread around the light source.

Validation Experiments: Collective Behavior and Sub-group Interactions

Figures 5b and 6 show the results of our best controller re-tested in the same environment. A video of this run is provided in the supplementary material. During this trial we measure fitness over time, with a final value of around 0.38. More interestingly, we measure the overall alignment of the swarm and the alignment of each sub-group. We provide snapshots of the swarm in the arena in Fig. 6, with the corresponding time frames represented as dotted vertical lines in Fig. 5b (the snapshots, from left to right, reflect the progression in time).

In Fig. 5b, at the start, we see a low initial swarm alignment (black line) that quickly increases (corresponding to Fig. 6: 1–2). This rapid increase is mainly caused by sub-group 2 (red), whose members tend to align amongst themselves throughout the run. Sub-group 1 (green) does not align as much, but finds the gradient faster at the start (see snapshot 3), as indicated by its higher fitness. Red follows shortly after as a group, given its alignment increase. At a later stage, red dominates the swarm's alignment and behavior, thereby concurrently pulling and pushing green towards the center (corresponding to Fig. 6: 4–5). This is also visible in the oscillating sub-group performance.

In Table 3, we evaluate the same best swarm in different setups: we repeat the evaluation with different sub-group ratios and different starting distances from the gradient center. Every configuration was repeated 60 times. We observe that starting closer to the target results in higher fitness. We find that evenly mixed sub-groups at distances far from the center (\(r_{dist}\ge 0.5\)) perform significantly better than either of the extremes (i.e. fully green/red, \(p\le 0.05, df=118\)).

When visually inspecting the behavior of the swarms at \(r_{dist}=0\), we observe that both extreme sub-group ratios are capable of staying at the center of the gradient, but with different strategies (video in supplementary material). While the swarm made of only red robots seems to circle around the center at some distance, the swarm made of only green robots seems to fully occupy the space around the center, coinciding with a higher fitness. When evaluating the sub-group ratios at their extremes, we see that both independent sub-groups consistently outperform the best controllers of the first generation in our evolutionary experiments (\(\sim 0.3\) at \(r_{dist}=1.0\)). However, a mixed ratio of sub-groups performs better for all \(r_{dist}\ge 0\), indicating an advantage of sub-group interaction. Furthermore, we see a tendency of the second (red) sub-group to correlate with higher performance at \(r_{dist}=1.0\).

Table 3. Average fitness values (\(N=60\)) of retesting the best swarm with different sub-group ratios (green:red) from different starting distances (\(r_{dist}\) = distance to the optimum). Sub-group ratios vary from solely sub-group 1 (green) to solely sub-group 2 (red). Solid red boxes indicate the best ratio at a given \(r_{dist}\), while dashed boxes indicate no statistically significant difference from the maximum.
Table 4. Validation experiments

Online Regulatory Mechanism. Based on the results in Table 3, we heuristically assign the best-performing sub-group ratio to each light-intensity range. That is, we design a probabilistic state machine such that members of the swarm adapt their behavior automatically to reflect the optimal ratio at the swarm level, using only local information. Thresholds are based on the light intensities at \(r_{dist} = \{0.125, 0.375, 0.625, 0.875\}\). This results in the following function:

$$ P_{green}(light) = {\left\{ \begin{array}{ll} \text {1.0} & \quad \text {if}\,\, light\,\, >229\\ \text {0.75} & \quad \text {if}\,\, light\,\, \in (76,229] \\ \text {0.50} & \quad \text {if}\,\, light\,\, \le 76\\ \end{array}\right. } $$
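A sketch of the per-robot mechanism implied by this function; the class and method names are illustrative:

```python
import random

def p_green(light):
    """Probability of expressing the first (green) reservoir as a
    function of the local light reading (0-255), per the piecewise
    function above."""
    if light > 229:
        return 1.0
    elif light > 76:      # covers the interval (76, 229]
        return 0.75
    return 0.5

class PlasticRobot:
    """Per-robot regulatory mechanism: every 5 s the robot resamples
    which of its two evolved controllers to express, using only its
    local light value (no communication)."""

    def __init__(self, update_period=5.0):
        self.period = update_period
        self.timer = 0.0
        self.active = 'green'

    def step(self, dt, light):
        """Advance the state machine by dt seconds; returns the
        currently expressed reservoir ('green' or 'red')."""
        self.timer += dt
        if self.timer >= self.period:
            self.timer = 0.0
            self.active = 'green' if random.random() < p_green(light) else 'red'
        return self.active
```

Because each robot samples independently from the same light-conditioned distribution, the desired sub-group ratio emerges in expectation at the swarm level without any coordination.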

Scalability & Robustness. Retesting the best Baseline and the best Heterogeneous swarm controllers shows a significant improvement in scalability and robustness (\(p\le 0.05\)) for heterogeneous control, with and without our online regulatory mechanism, in 6 different environments (3 scalability and 3 robustness experiments). The results of these experiments are presented in Table 4. For scalability, the performance of the controllers appears to be positively correlated with the size of the swarm, which indicates sensitivity to swarm size. This positive correlation is least apparent in the adaptive controller, whose performance is the highest for each swarm size (denoted with a star; 10: \(p\le 0.05\), 20: \(p\le 0.01\), 50: \(p\le 0.001\)). The robustness results show the same tendency for the adaptive controller to outperform the others, although these differences were only statistically significant between Adaptive and Baseline. In aggregate (\(N=360\)), Adaptive outperforms the other controllers with Bonferroni correction: Baseline \(0.37\pm 0.18\) vs. Best \(0.40\pm 0.18\) vs. Adaptive \(0.45\pm 0.22\), with \(p\le 0.01/\alpha \) for all comparisons (\(\alpha =6\), \(df=718\)).

5 Discussion

We successfully evolved self-organized sub-group specialization in a swarm of robots, a difficult task due to the complexity of swarm dynamics. This becomes even more evident when factoring in the added complexity of optimizing between-group interactions in conjunction with the specializations themselves. The strength of our framework lies in its simple approach to learning these behaviors, which lends itself to broad applicability. We demonstrated our method's effectiveness in the context of an emergent perception task for gradient sensing, which shares similarities with other source localization tasks common in the robotics literature [33]. Transferring our method to more complex tasks is also straightforward, as 1) the task-agnostic nature of RNNs allows them to express a wide array of behaviors, requiring no controller design adaptation; and 2) the simplicity of our group-level fitness function makes it easy to adapt, as its design requires minimal insight into the optimal solution (i.e. no presuppositions about certain behaviors or specific task divisions are required).

The results of our validation experiments show the emergence of specialization in our sub-groups that is more robust and scalable than homogeneous control (Table 4). Closer inspection reveals that sub-group 2 (red) seems to follow the gradient better at low intensities than sub-group 1 (see the start of Fig. 5b) and tends to move in greater coordination (i.e., higher alignment). In contrast, sub-group 1 performs better when initialized near the center and shows consistently lower alignment. This indicates that sub-group 1 is less sensitive to the swarm’s overall behavior and tends to act more greedily as a sub-group, whereas sub-group 2 shows more exploratory behavior, which provides more coordinated movement of the swarm once the gradient is found.

Exploration and exploitation are fundamental principles in optimization. Our task can be interpreted in this light: the swarm learns to act as an optimizer that maximizes its local light value. The behavioral findings above show that the evolved solution converged on a similar exploration-exploitation task division across its sub-group specializations. This task division is interesting because we did not encourage such collective behavior or specialization in our fitness function. Arriving at these fundamental optimization principles in the context of sub-group swarm interactions, without prior knowledge, shows the power of automated design for finding collective behaviors suitable for any user-defined task.
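The swarm-as-optimizer reading can be made concrete with a toy sketch: "exploiter" agents hill-climb the local light field while "explorer" agents take random steps, and the group tracks the best sample found. The light field, step sizes, and agent counts are illustrative assumptions, not our experimental setup:

```python
import random

def light(x, y):
    # Toy radial light gradient peaking at the origin, values in (0, 1].
    return 1.0 / (1.0 + x * x + y * y)

def run_swarm(n_exploit=5, n_explore=5, steps=50, seed=0):
    rng = random.Random(seed)
    agents = [[rng.uniform(-3, 3), rng.uniform(-3, 3)]
              for _ in range(n_exploit + n_explore)]
    best0 = max(light(x, y) for x, y in agents)  # best initial sample
    best = best0
    for _ in range(steps):
        for i, (x, y) in enumerate(agents):
            if i < n_exploit:
                # Exploiters: greedy finite-difference hill climb.
                eps, lr = 1e-3, 0.2
                gx = (light(x + eps, y) - light(x - eps, y)) / (2 * eps)
                gy = (light(x, y + eps) - light(x, y - eps)) / (2 * eps)
                agents[i] = [x + lr * gx, y + lr * gy]
            else:
                # Explorers: random walk covering flat, low-gradient regions.
                agents[i] = [x + rng.uniform(-0.5, 0.5), y + rng.uniform(-0.5, 0.5)]
            best = max(best, light(*agents[i]))
    return best0, best
```

The mix matters for the same reason as in our swarm: exploiters alone stall where the gradient is nearly flat, while explorers alone rarely refine a found peak.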

Specialization provides more robustness and scalability than the baseline. Furthermore, leveraging our specialized controllers reveals that employing only one of the two sub-groups at lower light intensities leads to lower performance than using a mixed ratio (Table 3). This shows that sub-group performance is enhanced by interactions with the other specialization. Collaboration becomes unnecessary when robots are placed near the center of the gradient (\(r_{dist}\le 0.25\)). The online regulatory mechanism furthermore demonstrates the successful division of specialized tasks within our swarm, as it significantly improves the performance of the best controller. The idea of adaptively biasing the swarm’s evolved specializations toward a certain phenotype can be found in nature: phenotypic plasticity emphasizes the expression of specific parts of the genome (i.e., our reservoirs) to obtain higher task competency at the swarm level. In our experiments, we successfully showed that this straightforward implementation of phenotypic plasticity results in higher scalability and significantly better overall performance.
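One way to picture such a regulatory mechanism is as a rule that gates which evolved specialization each robot expresses based on a locally sensed cue, analogous to phenotypic plasticity triggered by the environment. The threshold, cue, and phenotype assignment below are hypothetical illustrations, not the exact rule used in our experiments:

```python
def express_phenotype(local_light, threshold=0.3):
    """Hypothetical regulatory rule: one genotype carries both evolved
    specializations (reservoirs); the expressed sub-group behavior is gated
    by a local sensory cue. Low intensity -> phenotype 2 (coordinated
    gradient-follower), high intensity -> phenotype 1 (greedy exploiter)."""
    return 2 if local_light < threshold else 1

# At the swarm level, the cue shifts the sub-group ratio toward the
# specialization best suited to the current light regime.
readings = [0.05, 0.1, 0.2, 0.5, 0.8]
ratio_p2 = sum(express_phenotype(r) == 2 for r in readings) / len(readings)
```

Because the switch is driven by each robot's own sensing, the phenotype ratio adapts online without any central coordination.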

6 Conclusion

Incorporating specialized behaviors within robot swarms holds the potential to significantly enhance overall swarm efficacy and robustness. However, this endeavor presents formidable challenges. Designing controllers for homogeneous swarms is inherently complex, and extending this to sub-groups within the swarm compounds the difficulty due to the added intricacy of sub-group interactions. In this paper, we show a viable approach to solving this challenge in a (sub)task-agnostic way, by co-evolving heterogeneous swarm controllers while specifying only group-level task performance. We demonstrate that our evolved controllers show clear specialized sub-group behavior, with sub-group interactions that improve the collective behavior. These learned behaviors are effectively used in an online regulatory mechanism to enhance performance and scalability.

In the future, we propose to extend our work to a broader spectrum of tasks, which could reveal other emergent specializations (e.g., communication, line following, and mapping). Additionally, we foresee further improvements through more sophisticated controller designs, in which we evolve the number of sub-groups and possibly the online adaptation rules that regulate phenotypic plasticity. Finally, we would like to test our controllers in a real-world application, which requires the development of sensors to match our setup. Such a milestone would be a first step toward realizing hardware experiments.