Sophisticated collective foraging with minimalist agents: a swarm robotics test

How groups of cooperative foragers can achieve efficient and robust collective foraging is of interest both to biologists studying social insects and engineers designing swarm robotics systems. Of particular interest are distance-quality trade-offs and swarm-size-dependent foraging strategies. Here, we present a collective foraging system based on virtual pheromones, tested in simulation and in swarms of up to 200 physical robots. Our individual agent controllers are highly simplified, as they are based on binary pheromone sensors. Despite being simple, our individual controllers are able to reproduce classical foraging experiments conducted with more capable real ants that sense pheromone concentration and follow its gradient. One key feature of our controllers is a control parameter which balances the trade-off between distance selectivity and quality selectivity of individual foragers. We construct an optimal foraging theory model that accounts for distance and quality of resources, as well as overcrowding, and predicts a swarm-size-dependent strategy. We test swarms implementing our controllers against our optimality model and find that, for moderate swarm sizes, they can be parameterised to approximate the optimal foraging strategy. This study demonstrates the sufficiency of simple individual agent rules to generate sophisticated collective foraging behaviour.


Stigmergy-based foraging in ant colonies
Some ant species coordinate their food collection by leaving pheromone trails when returning from a discovered resource to their nest (Wilson 1962;Hölldobler and Wilson 1990). In these ant species, the deposited pheromone trails serve as a positive feedback mechanism for mass recruitment which guides nest-mates to the location of a discovered source of forage (Sumpter and Pratt 2003). Foraging ants, equipped with pheromone concentration sensors (Thienen et al. 2014), reach food sources by following the deposited pheromone trails with a preference to higher concentration trails (Hangartner 1969;V anV orhisKeyandBaker1982;Choe et al. 2012). The modulation of positive feedback [e.g. as a function of the source quality (Beckers et al. 1993;Portha et al. 2004;Shaffer et al. 2013) or footprint frequency (Devigne et al. 2004)] allows ant colonies to reach various collective patterns, such as selecting the bestquality food source available in the environment (Beckers et al. 1990(Beckers et al. , 1993Reidetal.2012;Shaffer et al. 2013), selecting the shortest path linking the food source to the nest (Goss et al. 1989;Deneubourg et al. 1990), and balancing predation risk and food quality (Nonacs and Dill 1990).
In addition to the ability of collective resource exploitation, adaptation to environmental fluctuations is a critically important ability for many biological organisms (Tsimring 2014), including foraging ants ). The mechanisms behind mass recruitment abilities (i.e. positive feedback) are generally in opposition to those that allow adaptation and flexibility (Tabone et al. 2010;Tsimring 2014); therefore, organisms showing adaptability are generally capable of a more complex behaviour. A remarkably interesting example is offered by Monomorium pharaonis ants which make use of repellent pheromone as a form of negative feedback (Stickland et al. 1999;Robinson et al. 2005Robinson et al. , 2008Detrain and Deneubourg 2006). Ants use this repellent pheromone to mark unrewarding trails and could thus be a strategy to stop the exploitation of trails that lead to depleted food sources. Other evidence of adaptability in ants has been documented by Beckers et al. (1990)whoshowed that Tetramorium caespitum ants are able to refocus their foraging efforts from a previously selected lower-quality food source, to a newly available higher-quality food source. Ants are able to adapt to the environmental changes because, in addition to pheromone-based recruitment, they use tandem running to recruit ants to newly available higher-quality food sources (Beckers et al. 1990). In contrast, Lasius niger ants, using pheromone-based recruitment only, are unable to switch their foraging efforts to the newly available food source. In fact, Lasius niger ants only rely on indirect forms of negative feedback, which may arise from physical constraints at the food source (e.g. overcrowding or food depletion) or within the nest (e.g. filling of food reserve) (Detrain and Deneubourg 2006). Finally, in another study, Shaffer et al. (2013) showed that Temnothorax rugatulus ants employing quality-dependent linear recruitment and quality-dependent abandonment are able to adapt to the environmental changes. T. rugatulus ants select the best-quality food source in case of two unequal-quality sources, exploit equally the two sources if they have equal qualities, and refocus their foraging efforts in case of changes in relative qualities (Shaffer et al. 2013).

Stigmergy-based foraging in swarm robotics
To implement the pheromone-based recruitment mechanism in a robot swarm, an important question concerns the means of implementing pheromone trails; in particular, how the robots deposit pheromone, how the pheromone trails in the environment evolve, and how pheromone can be sensed by the robots. Here, we categorise state-of-the-art work in this area into three main approaches: beacon robots, robots with on-board actuators and sensors, and smart environments.
In the first category of robotic systems, some robots are tasked as static beacon robots (Goss et al. 1992;Werger and Matarić 1996;Payton et al. 2001;Nouyan et al. 2009;Campo et al. 2010;Ducatelle et al. 2011b;H o f fe ta l .2012), which have the functions of storing pheromone levels and communicating with other robots in their neighbourhood. The biggest advantage of this approach is that the system can be implemented with simple robots in largely unknown and unstructured environments. However, there are some limitations: (i) allocation of beacon robots means they are not actively contributing to the main task, such as foraging; (ii) in large environments, the number of beacon robots increases in order to cope with the communication requirements, thereby further limiting the number of robots performing main tasks; (iii) beacon robots become obstacles themselves which restrict the movements of other robot agents. These issues can be overcome by the creation of mobile beacon robots, which can contribute to a main task as well as acting as beacons concurrently (Sperati et al. 2011;Ducatelle et al. 2011a). However, the performance of the latter approach relies on finding the correct balance between the swarm size and the communication range as a function of the environment size.
Researchers have made several attempts to equip robots with on-board actuators and sensors to implement indirect communication. For example, one early solution was to install marker pens on robots so they could draw lines on the path as pheromone trails (Svennebring and Koenig 2004). This method improved robots' performance in the area coverage task; however, it did not incorporate pheromone evaporation or diffusion which are features of real ant trails; evaporation in particular is considered important to avoid runaway positive feedback (Garnier et al. 2007(Garnier et al. , 2013. Another design proposed in (Purnamadjaja and Russell 2007) equipped robots with devices to emit and detect gas, which then provided guidance to robots towards a source area. The main limitation of this design was the high volatility of the chemicals used. In (Mayet et al. 2010), a technique of energising phosphorescent paint using UV-LEDS mounted on E-Puck robots to mark the path, as well as sensors for picking up the glowing paint signal representing the pheromone trail, was presented. Although this allowed emulation of pheromone decay, diffusion could not be emulated. A more recent study (Fujisawa et al. 2008(Fujisawa et al. , 2014 used ethanol for indirect communication signals between robots, with an ethanol pump and an ethanol sensor installed on each robot, which preserved the four characteristics of pheromone: evaporation, diffusion, locality (i.e. pheromone level is only affected by the local environments), and reactivity (i.e. pheromone evolution is based on reactions with the environment).
Perhaps the most popular approach in implementing pheromone communication is through a smart environment (Sugawara et al. 2004;Garnier et al. 2007;Hecker et al. 2012;Garnier et al. 2013;Arvin et al. 2015;Valentini et al. 2018), which has the capability to store and to supply virtual pheromone information to robot agents in real-time. The popularity arises from the fact that this approach is generally low cost and easily adaptable to different sizes of swarm and environment. Smart environments may be difficult to install and use for real applications; rather, such setups are often employed for targeted research experiments. This category can be further divided into three classes: the usage of (i) radio-frequency identification (RFID) tags Zambonelli 2005, 2007;Herianto et al. 2007;H e r i a n t o and Kurabayashi 2009;B o s i e ne ta l .2012; Khaliq et al. 2014); (ii) simulated pheromone environments, using projected light or other custom hardware for virtual pheromone implementations (Sugawara et al. 2004;Garnier et al. 2007Garnier et al. , 2013Arvin et al. 2015;Valentini et al. 2018) , and (iii) augmented reality tools in which a virtual environment is sensed and acted on by robots using virtual sensors and actuators (Reina et al. 2015b.

Optimal foraging theory
Foragers make economic decisions; hence, optimality models need to be based on suitable assumptions about 'currencies of costs' and benefits, as well as on constraints which may originate from features in the environment where foraging takes place (extrinsic) or inherent to the animals (intrinsic) (Stephens and Krebs 1986). It is often assumed that reproductive success (or fitness) and foraging behaviour are linked (Pyke 1984;Houston and McNamara 1999). Regarding currencies, i.e. the quantities to be maximised to achieve optimality, foraging animals often face a trade-off involving energy and time (Houston and McNamara 1999). Typically, an animal gains energy from eating a food item, but it also needs to invest time in handling such an item. Hence, if the quality of the food item is poor, the animal must decide whether to pick it, or to leave it and continue searching for better items. If the animal is a central-place forager (Orians and Pearson 1979;Kacelnik 1984), then its nest is the central place and food needs to be transported from the food source to the nest, where it is consumed.
Traditionally, two different currencies have often been used in foraging theory: the net rate of energy gain and efficiency (Kacelnik 1984;Houston and McNamara 2014). Whereas the net rate of energy gain is computed as the difference between the forager's gross rate of gain and its rate of energy expenditure, efficiency is derived by dividing gross rate of energy gain by rate of energy consumption (Houston and McNamara 2014). In honeybees, for example, there is mounting evidence that maximising energetic efficiency provides a better account of the observed foraging behaviour (Schmid-Hempel et al. 1985;Seeley 1986Seeley , 1994Cox and Myerscough 2003;Houston and McNamara 2014;Baveco et al. 2016). However, optimal foraging theory does not always apply to real systems, as has, for instance, been noted for leaf-cutting ants (Kacelnik 1993). Another study investigating seed-harvester ants, which always carry exactly one seed, made use of a different currency involving the seed mass to study optimal foraging (López 1987). Developing a theory that works for several foraging species seems inherently difficult, as mechanisms underlying foraging can be quite different (Traniello 1989). For example, red harvester ants (Pogonomyrmex barbatus)d o not rely on pheromone trails during foraging; rather, interactions between ants at the nest site regulate their foraging behaviour (Gordon 1991;Greene and Gordon 2003;Pagliara et al. 2018). There are, however, many ant species where the production of pheromone trails is crucial in the foraging process (Wilson 1962;Hölldobler and Wilson 1990;Detrain et al. 1999;Nicolis and Deneubourg 1999;Sumpter and Pratt 2003). In principle, other aspects also need to be considered when a foraging model is developed, which are more generally related to the overall state of the forager (e.g. competing alternative activities) and the conditions characterising the foraging landscape (such as predation risk) (Houston and McNamara 1999).
We emphasise that this section has only touched on the complex nature of foraging behaviour of animals and insect colonies and that it is by no means an exhaustive collection of references. For the latter, we refer to more comprehensive overviews, e.g. see Charnov (1976); Pyke (1984); Stephens and Krebs (1986); Houston and McNamara (1999). In our swarm robotics study, we made use of several aspects of the biological systems discussed above (see Sects. 3 and 4) and we constructed a model for optimal resource collection which is described in Sect. 5 and in Appendix A.
In this section, we formally define the investigated problem and the required capabilities of the robot (Sect. 3.1), then we describe the robotic platform (Sect. 3.2) and the augmented reality technology in use (ARK, Sect. 3.3).

The resource collection task
In this study, we investigate the problem of resource collection by a swarm composed of S robots. The environment has n circular source areas of radius 10 cm, denoted by A i with i ∈{ 1,...,n}, which are scattered around a central depot. Each area A i offers resource items of quality Q i . The quality is a numerical indication of the importance of the resource with respect to the task that will be performed; this is similar to the nutritional value of food items in animal foraging. In this work, we are interested in the foraging process at steady state; therefore, we assume sources which never deplete. If a robot enters a source area, it immediately collects one virtual item (or object) and returns it to the central circular depot (of radius 10 cm). We do not take into account any handling time of the resource item. Also, we do not consider the time spent in the resource patch, as the robot immediately finds an object and returns to the depot (no exploration within the source area). The load carried back to the nest site is always one item at a time. Travelling takes place with the same speed independent of the load carried (i.e. either unloaded or loaded with one object). Keeping these aspects in abstract terms helps to focus the study on the collective motion aspect and allocation of robots to source areas. In fact, this study focuses on strategies to coordinate the robot motion between depot and source areas through decentralised self-organising mechanisms. In particular, we explore how indirect communication in the form of virtual pheromone trails can allow the robot swarm to balance the trade-off between the quality of resource items and the distance between the source area and the central depot.
The robots have limited computational and memory capabilities and need to operate in an unknown environment. Robots are incapable of memorising source areas' locations, instead rely on pheromone trails to find again the previously discovered sources. This form of indirect communication requires the robots to be able to apply and read temporary marks in the environment. Additionally, we assume that robots always know the direction to the depot [similarly to path integration in ants and in other social insects (Collett and Collett 2002;Bregy et al. 2008;Heinzeetal.2018)] and are able to detect walls in front of them. However, robots do not possess any form of direct communication amongst each other and cannot perceive other robots in their surroundings.

The Kilobot robot
This study is conducted using Kilobots (Fig. 1a), which are minimalistic robots widely employed in swarm robotics research, with very limited capabilities provided by a small range of sensors and actuators (Rubenstein et al. 2014a). The Kilobot moves on a flat surface through a pair of vibration motors that allow the robot to perform a slip-stick differentialdrive motion. A Kilobot moves at a speed of v 0 ≈ 1cm/s and rotates at ∼ 40 • /s. It also has an infrared (IR) transceiver to communicate with other Kilobots in a range of 10 cm and to receive messages from an overhead control board (OHC), an RGB LED to display internal states through colours, and an ambient-light sensor. The OHC allows users to quickly program large swarms through wireless IR communication, and in our case, is used to augment the Kilobots with virtual sensors and actuators (see Sect. 3.3). While the Kilobot is quite limited in its capabilities, its simplicity results in a low-cost and easy-to-operate platform which is highly scalable.

Increasing Kilobots' capabilities through augmented reality
To overcome the Kilobot's limitations, researchers implemented open-source technology to extend the Kilobot's capabilities via customisable virtual sensors and actuators Valentini et al. 2018). This technology allows Kilobots to operate in an augmented reality in which, in addition to the real world, the Kilobots can sense and modify a computersimulated environment in real-time (see Sect. 2.2). Two implementations of this technology have been proposed in recent years: the augmented reality for Kilobots (ARK) by Reina et al. (2017) and the Kilogrid by Valentini et al. (2018). In this study, we use the ARK system because of its low installation cost and its ability to automatically perform several house-keeping tasks such as motor calibration, unique ID assignment, and experiment videorecording.
ARK consists of an overhead camera array to track the Kilobots, an IR-OHC to communicate to the Kilobots, and a computer (base control station, BCS) to simulate the virtual environment. The information about the virtual sensors is computed on the BCS and communicated to the specific robot with addressed messages via the OHC. The information about virtual actuators is computed on-board by the Kilobots, communicated with colour-coded messages via LEDs visible by the overhead cameras, and processed by the BCS which updates the virtual environment. Additionally, the BCS updates the temporal dynamics of the virtual environment. In this way, each Kilobot can receive personalised information about its virtual sensors in real-time and autonomously decides when to modify the virtual environment through virtual actuators.
In this study, we employ ARK to allow robots to apply and read virtual pheromone which evaporates and diffuses over time. We equip the Kilobots with five virtual sensors and one virtual actuator. In particular, each robot is equipped with: -area sensor (either depot or source): the Kilobot is able to perceive if it is within the depot or a source area (this information is encoded in 2 bits); -item quality sensor: the Kilobot is able to estimate the quality of the item it retrieves from the source area. Additionally, when the Kilobot enters in the depot, it can estimate the quality of the items that have been collected up to now (this information is encoded in 4 bits); -depot direction sensor: the Kilobot has always knowledge about its relative direction to the depot (this information is encoded in 4 bits); -wall sensor: the Kilobot can sense if there is a wall at a distance of ∼ 5 cm in front of itself; note that this does not allow the Kilobot to sense the presence of other robots (this information is encoded in 4 bits); -pheromone gland actuator: the Kilobot can deposit a drop of pheromone at its location (it expresses this behaviour by blinking its LED blue); -pheromone antennae: the Kilobot can sense the presence of pheromone at a distance of ∼ 3.5 cm from its centre in front of itself (this information is encoded in 4 bits, see To store information about the pheromone, ARK models the environment as a discrete 2D matrix with cells of 6.7 × 6.7mm 2 . Each time-step of length Δt = 0.5 s, ARK updates the pheromone matrix by adding pheromone deposited by the robots (each drop consists of b Kilobots sense via ARK the presence of virtual pheromone in front of themselves at a distance of ∼ 3.5cm in four 45 • -wide sectors. The virtual sensor indicates the presence or absence of pheromone as binary values, therefore, the Kilobot has no information about the pheromone quantity or concentration difference. In this illustration, pheromone is represented as blue circles, and thus, the virtual sensor readings are [1, 0, 1, 0]. When an exploring Kilobot detects pheromone, it interrupts random exploration and moves towards the detected pheromone. If more than one sector has pheromone (as in the illustration), to decide its motion direction the robot compares the sectors' direction with the depot direction (depot illustrated as a house and direction differences as red and green angles), and moves towards the largest angle (green arrow) (Colour figure online) an increment of φ = 250 in the cell under the robot's centre), and computes evaporation and diffusion of the pheromone. Each matrix cell m(i, j) is updated as where the parameters ǫ = 0.1andγ = 0.02 are the evaporation and diffusion rates, respectively. Equation (1) is a discrete realisation of Fick's law of diffusion (Fick 1855), where we introduce the exponential term to take into account the pheromone evaporation consistently with studies from biology (Garnier et al. 2013).

A simple individual behaviour for complex coordination
The individual robot behaviour is relatively simple and can be described by the probabilistic finite state machine (PFSM) illustrated in Fig. 2. The main structure of the behaviour is based on the control software designed by Font Llenas et al. (2018). The behaviour has been enriched by adding a new Obstacle Avoidance state (indicated as AO in Fig. 2), by including an additional form of indirect communication that enables adaptability to different quality scales (as described in Sect. 4.1), and by allowing for probabilistic transitions and tuneable pheromone functions (as described in Sect. 4.2). The robots do not have previous knowledge about the number, location, and items' quality of the source areas. Therefore, a robot starts by exploring the environment to discover source areas (state RW in Fig. 2). Due to the Kilobot's limited capabilities (see Sect.3.2), the exploration is performed via an isotropic random walk which is a simple and efficient method to search for targets in an unknown environment (Dimidov et al. 2016). The random walk consists of alternate straight motion for 10 s and uniformly random rotation in [−π, π]. Upon encounter of a source area, the robot (virtually) picks up an item and transports it to the depot (state GD in Fig. 2). As indicated in Sect. 3.1, we assume that the robots are limited in memory and only able to keep track of the direction towards a single location in the space, in our case the direction to the depot. This assumption is in line with the behaviour of several ants species which rely on path integration to return to the nest (Collett and Collett 2002;Bregy et al. 2008;Heinzeetal.2018). The robots follow the direction to depot to bring back collected items. Instead, to memorise the source locations, the robots rely on their stigmergic coordination which represents a form of collective memory. Therefore, on its way to the depot, the robot lays down virtual pheromone to allow itself, as other robots, to find again the source area. The robot, every four seconds, takes a probabilistic decision to deposit the next pheromone drop using the function P φ (Q i ) which is function of the collected item's quality Q i . 1 The function P φ (Q i ) isgi v enbyEq.(2) and described in details in Sect. 4.2. On arriving to the depot, the robot unloads the item and probabilistically decides [according to Eq. (3)] to turn back to follow the just-formed pheromone trail (state TB in Fig. 2), or to interrupt its exploitation of this source area and to resume exploration through random walk. When a robot senses virtual pheromone via the virtual antennae (composed by four sectors described in Sect. 3.3), the robot follows the trail by moving in the direction of the triggered antennae sector (state FP in Fig. 2). If the robot senses pheromone in more than one direction, e.g. both left and right sectors as in the illustration of Fig. 1b, the robot compares the sensed-pheromone directions with the direction to depot (red and green angles in Fig. 1b) and moves towards the direction with the largest difference (green arrow in Fig. 1b). This decision relies on the assumptions that robots only deposit pheromone in their straight path from a source area to the depot and that they always have access to the depot vector.
Compared with previous studies (Font Llenas et al. 2018), the robot's behaviour has been enriched through the inclusion of obstacle avoidance (state AO in Fig 2). In fact, robots have been equipped with a virtual sensor to detect walls (see Sect. 3.3). The robot reacts to a wall only if sensed in a frontal position, i.e. the two central sectors in the range [− 45 • , 45 • ] of the robot's heading (note that the virtual wall sensor is composed by four sectors equal to the virtual antennae of Fig. 1b). Upon wall detection, the robot turns left or right for about 22.5 • in the opposite direction of the sensed obstacle, then moves straight for 2.5 s, and finally returns to either the random walk (RW) state or the go depot (GD) state, depending on whether it carries an item or not. This behaviour may be triggered multiple times, until no obstacle is sensed in the central sectors. In case of symmetric sensing, i.e. both central sectors sense an obstacle, the robot uses as tie-breaker the lateral obstacle sectors to turn in the freest direction. In the case of complete symmetry, the direction is selected at random.

Adaptivity to relative quality differences
The robots do not have any prior information about the range of the sources' qualities that the unknown environment can offer. In order to allow the swarm to tune its behaviour to an unknown quality range, the individual robots update over time their knowledge on the best currently available quality Q max . Initially, the robot has no prior knowledge about the quality range and thus ranks the first source it finds as the best available. Over time, the robot constantly compares its range (i.e. the best available quality Q max ) with other items collected by other swarm members. The communication between robots is indirect and takes place within the depot. Each time a robot enters the depot, it can see the qualities of the items collected by the swarm until now; thus, the robot compares its information with the best quality and, if higher, updates its Q max accordingly. This mechanism is consistent with Robots start exploring the environment through a random walk (RW); when they find a source area they collect an item and return to the depot (GD) laying pheromone according to Eq. (2). Once arrived at the depot, they either turn back (TB) or resume exploration (RW). When explorer robots detect pheromone, they follow it (FP). When robots detect a wall, they avoid it (AO). Controlling individuals through this simple PFSM leads to sophisticated collective foraging dynamics animal behaviour where individuals can assess the nutrient quality of the swarm's reserves and compare against their own (Dussutour and Simpson 2009;Arganda et al. 2014).
In our study, we consider unlimited item sources to investigate the steady state regime; however, in case of limited sources (i.e. with a limited number of items) the robots may update their quality range by only observing the latest collected items. In this way, we predict the swarm being able to flexibly adapt to appearance or depletion of sources.

Modulation of the individual rules to obtain a plastic behaviour
After collecting an item, the robot returns to the depot laying a pheromone trail. The pheromone trail acts as a form of indirect communication between robots which inform each other about paths connecting depot to discovered sources. Collective contribution to these trails leads to a form of swarm memory which allows the swarm to remember the location of sources in the environment. In fact, our simple robots cannot internally store sources' locations, although the swarm, as a whole, can remember locations through pheromone trails. A pheromone trail is formed by a sequence of drops that the robot deposits via its virtual pheromone gland (see Sect. 3.3). Similar to the approach of Font Llenas et al. (2018), a robot probabilistically decides every four seconds whether to lay the next drop or not. In the previous work, we implemented a simple linear function to map the quality Q i into a pheromone deposition probability, i.e. P φ (Q i ) = Q i /Q max . Linking the pheromone deposition function to perceived source quality allowed the swarm to give priority to better-quality sources over inferior sources.
In this study, we implement a tuneable function to allow the robot to regulate its selectivity on the quality through a single parameter α ≥ 0. The probability to deposit the next pheromone drop is given by The individual robots have access to α in a decentralised way and can alter this value to vary the global response. Using an α>1, the function has an exponential shape on Q i resulting in high selectivity in favour of the highest quality sources. A value of α ≈ 1 leads to (approximately) linear response therefore approximating the function investigated in (Font Llenas et al. 2018), thus having Eq.
(2) as a generalisation of the previous specific function. Finally, decreasing α<1 gradually flattens out the function to a constant value, that at the limit of α = 0 becomes constant P φ (Q i ) = 1; this results in constant pheromone trails irrespective of the sources' qualities.
To further expand the individual robot capabilities to be able to balance the distance-quality trade-off, we introduce a decay function P d (t i ) that robots use, upon arrival in the depot with an item (event indicated with the letter 'a' in Fig. 2), to decide whether to keep exploiting the same source or to start exploring for new sources. P d (t i ) is inspired by similar abandonment behaviours observed in social insects [e.g. foraging ants (Shaffer et al. 2013) and househunting honeybees (Seeley et al. 2012)] and allows the robots to abandon exploiting source A i that required a long travel time t i (either because it is distant or has an overcrowded path). The travel time t i is measured by the robots as the time spent between the item collection (from the source A i ) and the item deposition (in the depot). The function P d (t i ), similarly to P φ (Q i ) of Eq. (2), is modulated by the parameter α as where t max is a parameter indicating robot's prior knowledge on the maximum acceptable time to return from a source. The t max could be adaptively tuned (similarly to Q max in Sect. 4.1), although in this study we do not explore this aspect and we fix t max = 100 s. Assuming a fixed t max is reasonable, because in both biological and artificial systems source areas may be accepted only if they are located within a certain maximum distance (or travel time t i ) from the depot that is decided a priori.
Equations (2)and (3) are linked by the parameter α which the robots can regulate to alter the swarm behaviour. Increasing α>1 has the combined effect of increasing discriminability on quality Q i and flattening P d (t i ) ≈ 0 for any distance; therefore, the swarm ignores distance but selects the higher-quality source. Conversely, small α<1 flattens out quality differences P φ (Q i ) ≈ 1 and accentuates differences on travel time with an exponential abandonment P d (t i ) on high travel times; this leads to a system where the only discriminating factor on source selection is distance due to a combination of evaporation and abandonment on farther sources. Finally, intermediate values α ≈ 1 give a quasi-linear response of P φ (Q i ) and sublinear P d (t i )>0 which allow the swarm to balance the distance-quality trade-off [similarly to what has been reported in Font Llenas et al. (2018)].

An optimal resource collection model
In this section, we model the optimal resource collection by the robot swarm through a mathematical model inspired by general aspects of optimal foraging theory (Kacelnik 1984;Houston and McNamara 2014).
Our model describes the utility gained by collection of resource items discounted by the cost incurred in transporting these items to the depot. The main components of our model are the items' qualities, the allocation of robots to various source areas, and the source-depot travel time. We model the robot allocation as ρ j (with j ∈{1,...,n}) which is the fraction of robots on the trail between central depot and source area A j . All robots that are actively involved in transportation of items from the n sources are called workers; their fraction is denoted by ρ w = n j=1 ρ j . The remaining robots that explore the landscape are called explorers; their fraction is denoted by ρ e = 1 − ρ w . The travel time is a function of the source-depot distance and of the traffic congestion on the path. In fact, crowded paths lead to frequent collisions between robots and result in longer travel times. The model is derived in Appendix A; here, we report the main quantity which is the swarm yield R,definedas where S is the swarm size, q j = Q j /Q max is the normalised quality of source area A j , ρ j is the fraction of robots on the trail between central depot and source area A j , the parameter β j is a fitting parameter characterising the relationship between the number of collected items from source A j and the number of robots on the trail to A j (see Eq. (7) in Appendix A), d j is the distance between source area A j and depot, v 0 = 1cm/s is the Kilobot's speed, and the function T C, j (ρ j S), defined in Eq. (12), models the additional travel time arising from traffic congestion. Therefore, Eq. (4) models traffic congestion as an increase of the travel distance d j by accumulating the additional length of v o T C, j (ρ j S).

Estimation of model parameters from simulation data
As for the model of Appendix A, three free parameters per source area (T 0, j , β j ,andκ j , with j ∈{1,...,n}) need to be estimated from data. To do so, we use the relationship between the number of robots on a path and the number of collected items given in Eq. (7). For the case of two source areas, the results of fitting are depicted in Fig. 3 and summarised in Table 2 (in Appendix A). As shown in Fig. 3, for small-to-medium numbers of robots on a trail, the number of collected items per time interval increases linearly with the number of robots on a trail, whereas for medium-to-large numbers of robots on a trail, we observe a nonlinear decay. This type of curve is widespread in several natural and artificial systems and is often indicated as Universal Scalability Law (Gunther 2000;K r a u s ee ta l .2002; Hamann 2012Hamann , 2013Hamann , 2018a.

Basic properties of the optimal resource collection model
To study the basic properties of the yield function R in Eq. (4), we consider the case of resource collection in an environment with n = 2 source areas. The robot swarm aims at optimally allocating its robots between the two source areas to maximise the yield R.F o r simplicity, we assume that all robots in the swarm are involved in resource collection (i.e. all robots are workers and ρ w = 1, ρ e = 0); the fraction ρ 1 = ρ collects items from source A 1 , and the fraction ρ 2 = 1 − ρ collects items from source A 2 . The yield function in Eq. (4)is then given as Here, we are interested in how the robot swarm allocates its resources. Therefore, we explicitly mention the dependency of R on ρ in Eq. (5); in what follows we derive the optimal value of 0 ≤ ρ ≤ 1 that maximises Eq. (5). Different outcomes are possible. If we consider increasing ρ,whereρ ∈[0, 1],thenwehave -1 global maximum at ρ = 1 (all workers allocated to source area A 1 ), if R(ρ) monotonically increases, Fig. 3 Fits of Eq. (7) to data generated by physics-based simulations in order to obtain the model parameters reported in Table 2 in Appendix A. Fitting is performed in case of n = 2 source areas with different quality and equal distance in a, equal quality and different distance in b, and equal quality and distance in c. Data points are represented using symbols, and fits are represented using lines (circles and solid grey lines show collection from source A 1 , while triangles and dash-doted blue lines show collection from source A 2 ). Error bars represent 95% confidence intervals. There is a linear growth for small-to-medium numbers of robots on a path, and a nonlinear decay for medium-to-large numbers of robots on a path. This type of growth-decay curves on population size is widespread in nature (Krause et al. 2002) as in engineering (Gunther 2000 -1 global maximum at ρ = 0 (all workers allocated to source area A 2 ), if R(ρ) monotonically decreases, or -either 1 global maximum or 2 local maxima (one of which is also the global maximum) where 0 <ρ<1 (workers split between source area A 1 and A 2 ).
For the last case, we can derive the optimal swarm deployment with respect to ρ from ∂ R/∂ρ = 0. We give the full expressions of the first-order derivatives in Eq. (15)i n Appendix B and use a graphical approach in this section to picture the behaviour of R(ρ). Without loss of generality, below we make use of the averaged quantities obtained for d 1 = d 2 = 1mandq 1 = q 2 = 1(reportedinT able2) to demonstrate the basic behaviour of the model.

Equal distances and varying qualities (fixed swarm size)
We expect that the effect of crowding on a trail will lead to different behaviours when the source areas are near the depot compared with the case when they are sufficiently far away that crowding can be neglected. We consider both scenarios with n = 2 source areas at equal-far or equal-near distance and show the corresponding results in Fig. 4a, b for fixed (a) (b) Fig. 4 Model predictions of yield R depending on worker allocation ρ for equally distant sources d 1 = d 2 = 3.5m,ina, and equally nearby sources d 1 = d 2 = 0.6m,inb,forfixedq 1 = 1andvaryingq 2 ∈{0.5, 0.75, 1}. When sources are relatively far (a), it is optimal to allocate all workers to the better-quality source area, whereas for source areas in close proximity (b) the yield is maximised if the trail between the higher-quality option and depot does not become overcrowded. Parameters: β 1 = β 2 =β = 0.965, T 0,1 = T 0,2 =T 0 = 0.029, κ 1 = κ 2 =κ = 2.321, and S = 200 q 1 = 1andvaryingq 2 ∈{0.5, 0.75, 1}, respectively. In case the qualities of items contained in the two different source areas are different, at first glance it seems intuitive to allocate as many workers as possible to the source with the higher-quality items. However, this strategy may lead to frequent collisions on the transport path and hence to traffic congestion that slows down the resource income. This means that there is a limitation on the item collection efficiency which depends on the number of workers and the space available on the transport trail. Figure 4a shows that for sufficiently large distances between source area and depot it is indeed optimal to allocate all workers to the source containing higher-quality items. If the qualities of resource items in the two source areas are also equal then the yield is, albeit only marginally, larger if both source areas are exploited equally. In case both source areas are near the depot then the optimal strategy is different. Exploiting equally both resources does not give the highest yield, instead the best strategy is to avoid traffic congestions on the trail leading to the higher-quality items (low ρ in Fig. 4b). Interestingly, reducing collisions and congestion on the higher-quality source path means allocating more workers to the lowerquality source. Even when both source areas provide items of equal quality, it is better to focus on any of the two available sources to optimise the resource income from the other (Fig. 4b).

Equal qualities and varying distances (fixed swarm size)
Let us now consider the case when both available source areas contain objects of equal quality. Given a fixed swarm size, optimising the transport yield R should then be affected by the distance of each source area to the depot. In Fig. 4, we depict the corresponding yield function for equal qualities q 1 = q 2 = 1, swarm size S = 200, fixed A 1 's distance d 1 = 0.6m, and varying A 2 's distance d 2 ∈{ 0.3m, 0.6m, 0.9m}. The model results of Fig. 5ap r e d i c tt h e effect of overcrowding. The optimal strategy consists of allocating most robots to the more distant source area in order to keep the path to the closer source congestion-free and to allow for more efficient resource collection. Model predictions of yield R depending on workers allocation ρ for equal qualities q 1 = q 2 = 1. In a, we show the effect of distance on R;w efi x e ds w a r ms i z eS = 200 and A 1 's distance d 1 = 0.6m, and varied of A 2 's distance d 2 ∈{ 0.3m, 0.6m, 0.9m}. Due to the effect of overcrowding, the maximum yield is attained when only a limited number of workers, 10-20%, collected from the nearer source in order to keep the path free from traffic congestion. In b, we report the effect of the swarm size S when it is larger, smaller, or equal to the critical size S c . The sources have equal qualities q 1 = q 2 = 1 and depot-source distances d 1 = d 2 = 0.6 m. The critical swarm size S c characterises the effect of overcrowding, i.e. when the swarm is sufficiently large (S > S c ) it is optimal to keep at least one path with less than 50% workers; otherwise, the effect of overcrowding would decrease the income of resources on both paths. Parameters: β 1 = β 2 =β = 0.965, T 0,1 = T 0,2 =T 0 = 0.029, κ 1 = κ 2 =κ = 2.321

Critical swarm size for equal qualities and equal distances
Through our model, we can derive the critical size S c , below which the best predicted strategy is to equally split workers between the two source areas, assuming sources at equal distances and with equal qualities. We analytically derive the expression to obtain S c in Appendix C and we depict in Fig. 5bho wR varies for swarm sizes S larger, smaller, or equal to S c .If the swarm size gets too large (S > S c ), it is optimal to allocate more robots to one source although collection from either source would give the same reward and incur identical costs. This means that the robot swarm should avoid overcrowding both paths to maximise the yield from resource collection. However, compared with the case S < S c , the possible yield for S > S c is smaller, i.e. if the swarm exceeds its critical size S c of collecting workers it cannot achieve the maximum yield it could possibly achieve for a smaller number of workers involved in the resource transportation. This result highlights the importance of controlling the number of workers to maximise the global intake; a strategy implemented in a decentralised fashion by ants (Charbonneau et al. 2015;Pagliara et al. 2018), and recently investigated in the context of swarm robotics (Mayya et al. 2019).

Results
Through physics-based simulations, we systematically tested a variety of experimental conditions to study the performance of the proposed system. We validated some of the simulation results through experiments with up to 200 physical Kilobots. In Sect. 6.1,w ep r e s e n ta set of simulation results that highlight the benefits of having introduced a virtual wall sensor, adaptability to unknown environmental scenarios, and behaviour modulation to balance the distance-quality trade-off. In Sect. 6.2, we compare the model predictions against robot swarm simulations for different swarm sizes. The physics-based simulations were conducted with ARGoS (Pinciroli et al. 2012(Pinciroli et al. , 2018 which is a state-of-the-art swarm robotics simulator that accurately and efficiently simulates the Kilobots and the ARK system via a dedicated plug-in (Pinciroli et al. 2018). The physical robot experiments were run with fully charged Kilobots whose motors have been automatically calibrated through ARK . The videos of these experiments are augmented by superimposing the virtual environment information (see a sample image in Fig. 6) and available as online supplementary material (Online Resource 1-9) and at https://www. youtube.com/playlist?list=PLCGKY9OHLZwMaGeB6cxVfxmHwhBFqKF7a. The robot simulation code is open source and available online at https://github.com/DiODeProject/ PheromoneKilobotSwarmIntell.

Results show tuneable and adaptive swarm responses
We report here the simulation and physical robot results to show evidence of the behaviours obtained through obstacle avoidance, adaptivity, and individual function modulation. Obstacle avoidance Figure 7b shows a screenshot of an experiment inspired by the wellknown study of Goss et al. (1989) which showed that ants are able to exploit the shorter path in double-bridge experiments with branching paths of different lengths. In our system, the individual robots have lower cognitive capabilities than the individual ants. In fact, the Kilobots cannot distinguish pheromone intensity, follow its gradient, nor make decisions with respect to differences in pheromone concentration. Nevertheless, the robot swarm was able to preferentially exploit the shorter path. This outcome was not limited to conditions where the pheromone evaporation was too high to exploit the longer path while sufficient to establish a path on the shorter, but it also applied to scenarios in which both paths were viable. In fact, we tested the swarm in an environment where we blocked the shorter path and only the longer path was active (see Fig. 7a) and the robots exploited the longer path, as illustrated in the plot of Fig. 7c. Similar double-bridge experimental setups have been emulated and investigated in previous swarm robotics studies such as Montes de Oca et al. 2010 andScheidler et al. (2016), in which, however, the swarm behaviour and goal were different.
Our results indicate that, for certain types of experimental conditions, cognitively simpler individuals would suffice to reproduce the collective level behaviour observed in colonies of Fig. 7 A 50 simulated Kilobot swarm experiment inspired by the ants' double-bridge experiment by Goss et al. (1989) in which two paths, a longer path (1.8 m long) and a shorter path (1.4 m long), connected source to depot. When the simulated swarm had access to only the longer path (a), the Kilobots reinforced pheromone on that path and used it for their collections. Instead, when both paths were available (b), the Kilobots disregarded the longer path and (almost exclusively) used the shorter for their collections. c Shows the number of robots on the two paths at the end of one simulated hour (boxes range from 1st to 3rd quartile of the data from 100 simulations and indicate the median with a horizontal line, the whiskers extends to 1.5 IQR). The individual Kilobots cannot follow a pheromone gradient nor detect any difference in pheromone concentration. Despite their limited individual capabilities, the robot swarm shows (in certain experimental conditions) behaviour similar to ants' colonies, which instead rely on much higher cognitive abilities at the individual level more complex ants. However, we believe that the ants, exploiting gradient sensing, are more flexible and can optimise path lengths in a larger range of environments than our robotic system. In fact, our results may vary if we would increase the robots density and/or vary the paths' lengths. However, we cannot ascribe the observed behaviour to the manually tuned maximum travel time t max = 100 s of Eq. (3) because our experiments were conducted with α = 10 which flattens Eq. (3) to zero for every path length. Therefore, the observed dynamics emerged from a more complex interplay between the Kilobots' behaviour and the virtual pheromone dynamics, and resulted in an efficient swarm selection of the shortest path. Adaptivity As described in Sect. 4.1, the swarm is able to adapt to any quality range and have a response that only considers the ratio between qualities rather than the absolute quality values. Figure 8 shows the system's response to three scenarios with n = 2 sources with the same quality ratio (i.e. Q 2 /Q 1 = 0.4) but different absolute quality values (i.e. Q 1 = 15, Q 2 = 6 on the left, Q 1 = 10, Q 2 = 4 in the centre, and Q 1 = 5, Q 2 = 2 on the right of the x-axes of Figs. 8a, b). The results show that the adaptive strategy (white boxplots) adapted to any condition and, as the quality ratio remained the same, also the swarm response remained the same. Instead, the constant range strategy (dark boxplots) reckoned with absolute quantities and led to the desired outcome only when the prior knowledge on the quality range matched the environment's range (central experimental scenario of Fig. 8). The ability to respond to Fig. 8 Simulation results showing the adaptivity of the system. We measured the number of collected items in a and the number of robots on each path in b for the two source areas, the superior A 1 and inferior A 2 , both at equal distance d 1 = d 2 = 1 m. We kept the same quality ratio, i.e. Q 2 /Q 1 = 0.4, but varied the absolute value of the objects (indicated on the x axis). All experiments were conducted with swarms of S = 50 Kilobots and an intermediate value of α = 0.85 in Eqs. (2)and (3). Boxes range from 1st to 3rd quartile of the data from 100 simulations and indicate the median with a horizontal line; the whiskers extend to 1.5 IQR. Having a constant range (dark boxplots) shows good results only if the predefined range matches the actual range of the environment (central experiment). Instead, an adaptive strategy allows the swarm to exploit resources as a function of the their relative qualities in a range adapted to the environment the relative quality of food sources, rather than to an absolute quality range, has been recently documented also in foraging ants (Wendt et al. 2018). Behaviour modulation ViaEqs. (2)and (3), the individual robots can modulate their behaviour to give priority to closer (low α) or better-quality (high α) source areas. This modulation at the individual level translates to different collective responses at the swarm level. We investigated such dynamics in swarms of S = 50 Kilobots operating in an n = 2 sources scenario environment with a superior source area A 1 at distance d 1 = 1 m with Q 1 = 10 and an inferior source area A 2 with Q 2 = 4 and varying distance d 2 ∈[0.5, 1]m. The relatively small swarm size was motivated by preliminary results that we reported in (Font Llenas et al. 2018) which showed that large swarms do not discriminate between sources as there are enough robots to maximally exploit any area. Figure 9 shows the effect of the three tested values of α ∈{ 0, 0.85, 10} on the swarm dynamics. Using α = 0 promoted distance selectivity, in fact the simulated swarm had the highest item collection per minute (panel a) from the closest source (A 2 ) to which the majority of the workers was deployed (panel b). Using α = 10 promoted quality selectivity, in fact the simulated swarm had the highest item collection per minute from the highest quality source (A 1 ) to which the majority of the robots was deployed. Finally, intermediate values of α, e.g. α = 0.85, led to a distance-quality tradeoff where the swarm exploited the nearest inferior-quality source only if it was much closer than the farther superior-quality source.
We ran three experiments with 50 physical robots for each of the two limit cases of quality-selective α = 10 (solid black symbols) and of distance-selective α = 0 (solid light-grey symbols). The videos of these six experiments are available as online supplementary material (Online Resource 1-6) and at https://www.youtube.com/playlist? list=PLCGKY9OHLZwMaGeB6cxVfxmHwhBFqKF7a. Physical robots showed a resource collection less efficient than simulation; despite this, in both cases, the two strategies favoured either the best-quality or the nearest source area, as shown by the simulations. We explain Fig. 9 Effect of the modulation of the parameter α from Eqs. (2)and (3) to favour nearer source areas (α = 0), to favour the best-quality sources (α = 10), or to balance the distance-quality trade-off (0 <α<10). Results of α = 0 are shown in light-grey, α = 0.85 in dark-grey, and α = 10 in black. We report the results for simulations and physical robots experiments of one hour each in scenarios with n = 2 sources. We excluded the initial exploration phase and indicate mean values for the last 30 min. Physical robots results are indicated as solid symbols with vertical bars indicating the 95% confidence intervals of 3 runs for each condition (the symbols are slightly shifted to avoid bar overlaps but all represent results for d 2 = 0.6 m). Lines represent the mean of 100 simulations (shaded areas are 95% confidence intervals). Source A 1 had quality Q 1 = 10 and was located at distance d 1 = 1 m; source A 2 had quality Q 2 = 4 and varying distance d 2 ∈[0.5, 1.0]m. We report the rate of collected items per minute in a, the mean number of robots on each path in b,andthe rate per minute of collected items weighted by the normalised quality q 1 = 1.0andq 2 = 0.5inc. Individual robots can locally modulate the decentralised parameter α to lead the swarm to a range of different collective responses, e.g. selecting almost exclusively the best-quality source (high α) or balancing the distance-quality trade-off (low α). Physical robots are less efficient than simulations; however, ordering between sources is preserved; this confirms the effects of α-modulation observed in simulation the observed difference between reality and simulation (the reality gap) as a motion speed difference between robots and simulation. In fact, the simulation was accurately tuned on the movement of fully charged Kilobots (Pinciroli et al. 2018), but did not take into account that the robot's speed was reduced over time due to the decrease of its battery level. Figure 9c shows the rate per minute of collected items weighted by their normalised qualities (q 1 = 1.0andq 2 = 0.5). We did not include any cost because in our experiments every robot moved constantly and continuously (either as worker or as explorer). Therefore, the swarm incurred a constant cost independent of the collections (this would be different if, as ants do, some individuals would stop exploration to save energy (Charbonneau et al. 2015), or to avoid overcrowding as discussed above). Interestingly, the results show that there was not one α-value that was better than all others; rather the best strategy varied in relation to the environment. For large distance difference, i.e. d 2 ≪ d 1 , the distance-selective strategy Fig. 10 Comparison of model with simulations and experiments: Total yield R as a function of the normalised swarm allocation ρ/ρ w and the number of worker robots ρ w S. We report the predicted yield R from the model of Eq. (4) as a colour heatmap, and we overlay robot simulations for three strategies: distance-selective α = 0 (circles), distance-quality trade-off α = 0.85 (diamonds), and quality-selective α = 10 (triangles). We report simulations for swarm sizes S = 50 (cyan), S = 100 (green), S = 200 (purple) and S = 500 (white). Under the model's assumptions, the simulated robot swarm performs best for S = 200 and α = 0.85 (R = 150.6m −2 ) in a, α = 10 (R = 177.1m −2 )inb and α = 10 (R = 120.4m −2 )inc. Swarms of large size (S = 500) do not achieve good performance as they equally exploit both sources and do not avoid overcrowding. The star symbol in c was obtained from three experiments with 200 Kilobots assuming α = 0.85 (see online videos). Error bars represent 95% confidence intervals. Parameters: β j , T 0, j and κ j are given in Table 2 (Colour figure  online) (α = 0) displayed the highest weighted collection. Conversely, for similar distances, the best strategy consisted of favouring the best-quality source (α = 10), analogously to what has been observed in some species of ants which focused their foraging efforts on the richer of two equally distant sugar sources (Beckers et al. 1993;Shaffer et al. 2013).

Comparison of model and simulation data
Here, we compare the performance of binary resource collections for varying swarm sizes S and varying α which regulates the swarm strategy [as from pheromone deposition in Eq. (2) and trail abandonment in Eq. (3)]. The plot in Fig. 10 shows the yield R as a function of the fraction of workers allocated to source A 1 (with ρ 1 = ρ) divided by the fraction of total workers involved in resource collection ρ w , and of the number of worker robots ρ w S (i.e. involved in collecting resource items).
Best performing swarms have an intermediate size (i.e. S = 200). Relatively small swarms allocate robots more selectively depending on the implemented strategy. For instance, in Fig. 10a, the quality-selective strategy (α = 10 indicated as triangles) shows an allocation of workers predominantly to the best-quality source (ρ/ρ w > 0.8) when S ≤ 200. Instead, large swarms of S = 500 do not discriminate between sources and equally exploit both. The distance-selective strategy (α = 0 indicated as circles) in Fig. 10bhasamuchsmaller deviation and is visible only for the smallest swarm. Observing such a change in the swarm response is not an obvious result because robots cannot perceive each other. The observed change is an emergent property.
In general, simulations and the model show differences especially for swarms of size S = 500. In fact, for large swarms, the model predicts that the best strategy would be to allocate only a limited number of robots to the best path, in order to avoid overcrowding. We suggest that it would be possible to implement such a strategy by allowing the robots to sense and perceive peers (while they do not in this study). In the current strategy, we tried to overcome overcrowding by including the trail abandonment function of Eq. (3), although this did not demonstrate sufficient ability to deviate from a symmetric exploitation for large swarms. The resulting dynamics for S = 500 are an equal split between the two paths ( Fig. 10a), which could be caused by physical 'pushing' between individuals, similarly to what is observed in some experiments of ants' traffic organisation (Dussutour et al. 2004(Dussutour et al. , 2005Fourcassié et al. 2010).
To investigate how collision between individuals affects the collective dynamics, we reproduced the results of Fig. 10 in the collision-free case in which we removed any effect of physical interactions between robots. Figure 11 reports the model results with null traffic congestion contribution, i.e. Eq. (12) becomes T C, j (ρ j S) = 0. We overlay the simulation results with deactivated collisions, i.e. the Kilobots' physical body is not simulated and robots can move through each others.
As expected, the model predicts that for every workers size, ρ w S the best strategy is always to allocate all workers to the best-quality source (Fig. 11a), or to the closest source (Fig. 11b). Some of the simulations approximate such an optimal behaviour. In the case of asymmetric qualities (Fig. 11a), the quality-selective strategy (α = 10 represented as triangles) has high values of ρ. Similarly, the closer area in Fig. 11b is largely exploited by distance-selective strategies (α = 0 represented as circles and α = 0.85 represented as diamonds).

Discussion
Our results show how simple individual agents can collectively forage in a sophisticated manner. We assumed a minimal cognitive architecture including maintenance of a home vector [well evidenced in ants (Collett and Collett 2002;H e i n z ee ta l .2018)], and simple binary detection of pheromone trails and obstacles; our agents are thus much simpler than real ants. Combined with a simple pheromone deposition rule with a single tuneable parameter, however, we are able to qualitatively reproduce classical results such as the shortest path exploitation observed in lab ant colonies (Goss et al. 1989), and able to manage the classical distance-quality trade-off of foraging. We have further derived an optimality model accounting for congestion costs in foraging and examined the effect of resource distribution and colony size on the optimal distribution of foragers over forage patches. While others have previously considered the effect of colony size on recruitment strategy (Planqué et al. 2010;Pagliara et al. 2018;Mayya et al. 2019), our analysis instead assumes the recruitment strategy, and considers the optimal distribution. Our simple heuristic agent controllers are able to approximate the optimal distribution for relatively small swarm sizes, although large swarms depart from optimality. Large swarms cause crowded environments which require strategies Fig. 11 Total yield R as a function of the normalised swarm allocation ρ/ρ w and the number of worker robots ρ w S in the collision-free condition. We removed the effect of physical interactions (i.e. collisions between robots) that may cause traffic congestions, and we report the predicted yield R from model (4) as a colour heatmap, and we overlay robot simulations for three strategies: distance-selective α = 0 (circle), distancequality trade-off α = 0.85 (diamond), and quality-selective α = 10 (triangle). We report simulations for swarm sizes S = 50 (cyan), S = 100 (green), S = 200 (purple) and S = 500 (white). Without collision, the predicted best strategy is allocation of all workers to the best-quality or closest source area. The collisionfree simulations approximate such result when the corresponding strategy is activated, e.g. quality-selective α = 10 (triangle) in panel (a) and the distance-selective α = 0 (circle) in panel (b). Error bars represent 95% confidence intervals. Parameters: β j , T 0, j and κ j are given in Table 2 (Colour figure online) to clear paths in order to reduce traffic congestion. We identify two possible strategies to limit traffic congestion: modifying the abandonment strategy or enriching the individual behaviour with collision-reactive states. In this work, after abandonment, the robots simply resumed exploration. The effects of this abandonment strategy are limited as robots quickly rediscover a path (which may be already congested). We believe that a better abandonment strategy [e.g. to stay at the depot for a period of time before resuming exploration, similar to ants (Pagliara et al. 2018)] could improve the results of the abandonment behaviour introduced in this work. Complementarily, traffic flow can be maintained undisrupted even in relatively crowded conditions by individual ants changing their behaviour as a function of collisions with other ants (Dussutour et al. 2004;Poissonnier et al. 2019). Inspired by these results, the robot behaviour could be enriched with new collision-dependent states.
Our results are complementary to other approaches to minimal controllers necessary for collective behaviour in the swarm robotics field (Gauci et al. 2014;Özdemir et al. 2018). Simple controllers increase the transferability to various robotics platforms thanks to their limited hardware requirements. Additionally, simple behaviours generally reduce the impact of the reality gap and preserve consistent dynamics in reality and simulations, as shown in our experiments where the same control software produced qualitatively similar results.
Our results illustrate the sophisticated collective dynamics that can be generated even by simple agents, which should be of interest to biologists and of practical utility to engineers. Similarly, our study of swarm size, and the scalability of foraging success, should interest both biologists and engineers, although it is worth noting that at least in some species of ants congestion is much less of a problem compared to robots (Hönicke et al. 2015;Poissonnier et al. 2019). In Sect. 6.2, we investigated a case closer to biology in which congestions did not impact the travel time; with model and simulations adapted accordingly. Nevertheless, we argue that taking a unifying perspective on the biology and engineering of collective foraging is illuminating, both through their similarities, and their differences.
where T L j and T U j are the travel times to overcome the distance between central depot and source A j with and without load, respectively.
The cost for one robot in the swarm (unit cost) associated with travelling between depot and source A j to collect resource items is given as where c L j and c U j are the costs per time interval related to travelling with an item (L = loaded) and without (U = unloaded). Using Eqs. (6)and (9), let us define the transport yield (in the steady state) per unit cost as As we consider virtual (weightless) objects, the cost for travelling with and without load is identical, hence c L j = c U j = c 0 . Similarly, travel times with and without load may be assumed to be identical, that is, we have T U j = T L j = T j . We can then express the travel time as where d j is the distance between depot and source area A j ,andv 0 ≈ 1cm/sisthetra v el speed of the Kilobot (see Sect. 3.2). The term T C, j (ρ j S) is a function that takes into account how the travel time is influenced by the number of robots ρ j S on the path to A j . Overcrowded paths cause prolonged travel time due to frequent collisions between robots. We model the additional time arising from traffic congestion as T C, j (ρ j S) = T 0, j exp κ j ρ j S N crit, j − 1 , where T 0, j is a constant which sets the time scale of the additional travel time, κ j is a constant included to fine-tune the nonlinear effect of overcrowding on the path to A j ,andN crit, j is the critical number for which traffic congestion may have a significant effect. This means that T C, j (ρ j S) is negligible if κ j ρ j S ≪ N crit, j . The critical number N crit, j may be obtained using where Z KB = 3.3 cm is the diameter of a Kilobot and ⌊ξ j ⌋ returns the greatest integer less than or equal to ξ j . Normalising Eq. (10) by the experimental constant K = e 0 v 2 0 (T end − T eq )/(4 c 0 ),wecan derive the normalised yield as which is identical to Eq. (4), and mentioned here again for convenience. This means that travel distance d j increases due to traffic congestion between central depot and source area A j by the additional length v o T C, j (ρ j S). The normalised yield, R,i nE q s .( 4)a n d ( 14), respectively, is given in the physical unit m −2 , which seems appropriate as we consider robots looking for source areas in a two-dimensional space.

C Deriving the critical swarm size for equal qualities and equal distances
The critical swarm size, denoted by S c , indicates when it is better to collect items from both source areas when we assume equal qualities and equal distances. The critical size S c may be derived from solving where we usedd 1 =d 2 =d, N crit,1 = N crit,2 = N crit and ρ w = 1 (giving ρ = 1/2). Further to this, we substituted κ 1 = κ 2 =κ = (κ 1 + κ 2 )/2, T 0,1 = T 0,2 =T 0 = (T 0,1 + T 0,2 )/2 and β 1 = β 2 =β = (β 1 + β 2 )/2, where the model parameters are given in Table 2.T h e expression in Eq. (18) follows from the second-order derivative according to The goodness-of-fit is quantified by R 2 GoF, j = 1 − i (y i − y fit i ) 2 / i (y i −ȳ) 2 ,wherey i = ΔU i / min, the y fit i correspond to the fitted values, andȳ represents the mean value of all y i . The index j corresponds to the trail. Mean model parameter values including one standard deviation errors (values in brackets) are given at which point the global maximum at ρ = 1/2 becomes a local minimum and two local maxima (ρ = 1/2) arise with equal yields R for S ≥ S c . The full expression of the secondorder derivative ∂ 2 R/∂ρ 2 is given in Eq. (17) in Appendix B.