Herding stochastic autonomous agents via local control rules and online global target selection strategies

In this paper we propose a simple yet effective set of local control rules to make a group of "herder agents" collect and contain in a desired region an ensemble of non-cooperative stochastic "target agents" in the plane. We investigate the robustness of the proposed strategies to variations in the number of target agents and in the strength of the repulsive force they feel in proximity of the herders. Extensive numerical simulations confirm the effectiveness of the approach and are complemented by a more realistic validation on commercially available robotic agents via ROS.


Introduction
Exploration and rescue, evacuation from dangers, surveillance and crowd control are all examples of multi-agent herding problems in which two kinds of agents interact [1,2]. In these problems, a set of "active" agents (the herders) need to drive a set of "passive" agents (the herd) towards a desired goal region and confine them therein. In most cases, repulsive forces exerted by the herders on the herd are exploited to drive the movements of the passive agents that need to be corralled and, at times, cooperation among the herders (such as attractive forces between them) are used to enhance the herding performance. Notable herding solutions are those proposed in [3,4,5,6,7] for single herders and in [8,9,10,11,12] for multiple herders.
One of the problems to be addressed in the control design of herder agents is deciding, at any given time, which passive agent each herder should target when more than one herder is present. For the sake of comparison with our approach, we now briefly review the most relevant research from the literature addressing multi-agent herding, where more than one herder is required to collect and drive a group of passive agents towards a desired goal region.
Related work. One of the earliest solutions to the herding problem was proposed by Lien et al. in [4] and [8]. The trajectories followed by passive and herder agents were generated using global rule-based roadmaps, abstract representations of the walkable paths given as a directed graph [13]. Numerical simulations showed that multiple herders were successful in coping with increasing sizes of the herd. Nevertheless, the herders' performance worsened as the flocking tendency of the passive agents decreased.
Multi-agent herding scenarios were also considered in [9,14]. Here the authors addressed the problem of controlling a group of herders so as to entrap a group of passive agents in a region from which they could not escape. To solve this problem, each herder was preassigned a region of influence. The targets' motion was then influenced by a specific herder only if they happened to be within its region of influence, travelling otherwise at constant speed with heading aligned to that of their neighbouring agents. The velocities of the herders were regulated according to those of the passive agents with which they interacted, the herders arranging themselves in two opposite rows or in a carousel.
Other multi-agent herding scenarios where many herders are required to collect and patrol a group of passive agents were also proposed in [10]. Inspired by the limited visual field of real sheepdogs and the absence of centralised coordination among them, the latter work proposed a herding algorithm based entirely on local control rules. The dynamics of both herders and passive agents were modelled as the linear combination of potential field-like forces within a sensing area. In addition to this basic dynamics, passive agents were also subject to a repulsive force from the herders. Herders were controlled by an appropriate input selected as a function of their distance from the nearest passive agent and their distance from a desired goal. The result of the proposed shepherding behaviour was the emergence of an arc formation among the herders (a similar formation was instead hard-coded in the algorithm presented earlier in [8]). Numerical simulations showed the effectiveness of the approach under the assumption that passive agents tend to flock together. In this case, herders could indeed collect and herd multiple sub-flocks without any explicit coordination rule.
In Robotics, feedback control strategies have been recently presented to solve multi-agent herding problems and guarantee convergence of the overall system. For instance, in [11] the case of multiple herder agents regulating the mean position of a group of flocking passive agents was investigated. An arc-based strategy was proposed for the herders to surround and drive the targets towards a desired goal region. The proposed control law and its convergence properties were explored by modeling the whole herd as a single unicycle controlled by means of a point-offset technique (see [11] for further details).
A different approach was taken in Cognitive Science [15,12,16,17], where a model of the herding agent was derived from experimental observations of how two human players herd a group of randomly moving agents in a virtual reality setting. It was observed that, at the beginning of the task, all pairs of human players adopted a search and recovery strategy, each player chasing the farthest passive agent in the half of the game field assigned to them and driving it inside the desired containment region. Once all agents were gathered inside the goal region, most pairs of human herders were observed to switch to an entirely different containment strategy, exhibiting an oscillatory movement along an arc around the goal region that effectively creates a "repulsive wall" keeping the passive agents therein [16]. To reproduce this behaviour in artificial agents, a nonlinear model was proposed in [17] in which the switch from search and recovery to the oscillatory containment strategy is induced by a Hopf bifurcation triggered by a change in the distance of the herd agents from the goal region.
With regard to a single herder agent gathering one-by-one a group of passive agents, recent work [18] employed a backstepping control strategy for the single herder to chase one target at a time, with the herder switching among different targets and succeeding in collecting them within a goal region of interest. This idea was further developed in [19,7], where other control strategies and further uncertainties in the herd's dynamics were investigated. An alternative approach is to frame the problem as a pursuit-evasion game, as done for example in [20,21,22], where the case of one passive agent evading one pursuer is solved by computing off-line the optimal solution of a dynamic programming problem; the case of multi-driver and multi-evader agents was more recently analysed in [23].

Contributions of this paper
In this paper, we consider the case of multiple herders chasing a group of passive agents whose dynamics, as often happens with natural agents such as fish, birds or bacteria, is stochastic and driven by Brownian noise. Moreover, contrary to what is usually done in the rest of the literature [9,4,11,10], we do not assume any flocking behaviour between the passive agents, which makes the problem harder to solve, as each target needs to be tracked and collected independently of the others.
To solve the problem, we present a simple, yet effective, dynamic herding strategy based on the combination of local feedback control laws among the agents and a set of global target selection rules that determine how herders decide what targets to follow. With respect to other solutions in the literature [4,11], our approach does not involve ad hoc formation control strategies to force the herders to surround the herd; rather, we enforce cooperation between herders by dynamically dividing the plane among them by means of simple yet effective and robust rules that can be easily implemented in real robots.
We then numerically analyse how robust these strategies are to parameter perturbations, uncertainties and unmodeled disturbances in passive agent dynamics. Moreover, we assess how different choices of the target selection rules affect the overall effectiveness of the methodology we propose. Finally, for the sake of completeness we provide a ROS implementation of our strategy to test its ability to solve the herding problem in a more realistic robotic setting.

The herding problem
We consider the problem of controlling $N_H \ge 2$ herder agents so that they drive a group of $N_T > N_H$ passive agents in the plane ($\mathbb{R}^2$) towards a goal region and contain them therein. We denote by $y^{(j)}$ the position in Cartesian coordinates of the $j$-th herder in the plane and by $x^{(i)}$ that of the $i$-th passive agent, and by $(r^{(j)}, \theta^{(j)})$ and $(\rho^{(i)}, \phi^{(i)})$ their respective positions in polar coordinates, as shown in Fig. 1. We assume the goal of the herders is to drive the passive agents towards a circular containment region $G$ of radius $r^\star$ centred at $x^\star$. Without loss of generality, we set $x^\star$ to be the origin of $\mathbb{R}^2$.
Assuming the herders have their own trivial dynamics in the plane, the herding problem can be formulated as the design of the control action $u^{(j)}$ governing the dynamics of the herders, given by

$m\,\ddot{y}^{(j)}(t) = u^{(j)}(t), \qquad j = 1, \ldots, N_H,$   (1)

where $m$ denotes the mass of the herders, assumed to be unitary, so that the herders can influence the dynamics of the passive agents (whose dynamics will be specified in the next section) and guarantee that

$\exists\, \bar{t} < \infty : \; \|x^{(i)}(t)\| \le r^\star, \quad \forall t \ge \bar{t}, \; \forall i = 1, \ldots, N_T,$

where $\|\cdot\|$ denotes the Euclidean norm; that is, all passive agents are contained, after some finite time $\bar{t}$, in the desired region $G$.
We assume there exists an annular safety region $B$ of width $\Delta r$ surrounding the goal region, which the herders leave between themselves and the region where the targets are contained.
In what follows we also assume that (i) herder and passive agents can move freely in R 2 ; (ii) herder agents have global knowledge of the environment and of the positions of the other agents therein.

Target dynamics
Taking inspiration from [12], we assume that, when interacting with the herders, passive agents are repelled from them and move away in the opposite direction, while in the absence of any external interaction they randomly diffuse in the plane. Specifically, we assume passive agents move according to the following stochastic dynamics

$\mathrm{d}x^{(i)}(t) = V_r^{(i)}(t)\,\mathrm{d}t + \alpha_b\,\mathrm{d}W^{(i)}(t),$   (2)

where $V_r^{(i)}(t)$ describes the repulsion exerted by all the herders on the $i$-th passive agent, $W^{(i)}(t)$ is a 2-dimensional standard Wiener process, and $\alpha_b > 0$ is a constant. We suppose the distance travelled by the passive agents depends on how close the herder agents are, and model this effect by considering a potential field centred on the $j$-th herder given by $v^{(i,j)} = 1/\|x^{(i)} - y^{(j)}\|$, exerting on the passive agents an action proportional to its gradient [11]. Specifically, the dynamics of the $i$-th passive agent is influenced by the reaction term

$V_r^{(i)}(t) = -\alpha_r \sum_{j=1}^{N_H} \nabla_{x^{(i)}}\, v^{(i,j)} = \alpha_r \sum_{j=1}^{N_H} \frac{x^{(i)} - y^{(j)}}{\|x^{(i)} - y^{(j)}\|^3},$   (3)

where $\alpha_r > 0$ is a constant. Possible modelling uncertainties in the repulsive reaction term (3) can be seen as being captured by the additional noisy term in (2). Notice that, according to (3), every passive agent feels the influence of all the herders. Nevertheless, we assume that each herder only chases one target at a time, as explained below. The position of the $i$-th passive agent when it is targeted by the $j$-th herder will be denoted as $x^{(i,j)}$, or in polar coordinates as $(\rho^{(i,j)}, \phi^{(i,j)})$.

Figure 1: Illustration of the spatial arrangement in the herding problem. The herder agent $y^{(j)}$ (yellow square), with polar coordinates $(r^{(j)}, \theta^{(j)})$, must relocate the target agent $x^{(i)}$ (green ball), with polar coordinates $(\rho^{(i)}, \phi^{(i)})$, in the containment region $G$ (solid red circle) of centre $x^\star$ and radius $r^\star$. The buffer region $B$, of width $\Delta r$, is depicted as a dashed red circle.
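As an illustration, the repulsive term (3) can be sketched in a few lines of Python. The function name and pure-Python style are ours; the inverse-cube vector form follows from taking the gradient of the potential $v^{(i,j)} = 1/\|x^{(i)} - y^{(j)}\|$.

```python
import math

def repulsion(x, herders, alpha_r=1.0):
    """Repulsive velocity exerted on one target by all herders.

    x: (px, py) target position; herders: iterable of (hx, hy).
    Each herder contributes alpha_r * (x - y) / ||x - y||^3, i.e. a
    push away from the herder that decays with the squared distance.
    """
    vx, vy = 0.0, 0.0
    for hx, hy in herders:
        dx, dy = x[0] - hx, x[1] - hy
        d3 = math.hypot(dx, dy) ** 3
        vx += alpha_r * dx / d3
        vy += alpha_r * dy / d3
    return (vx, vy)
```

For example, a target at unit distance to the right of a single herder is pushed further right with speed $\alpha_r$, while two symmetric herders produce a purely radial escape direction.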

Herder dynamics and control rules
Our solution to the herding problem consists of two layered strategies: (i) a local control law driving each herder towards the target it has selected, so as to push that target inside the goal region, and (ii) a target selection strategy through which herders decide what target to chase. When the herd is fully gathered, the herders switch to an idling condition, keeping themselves within the safety region surrounding the goal region.

Local control strategy
For the sake of comparison with the strategy presented in [12,17], we express the control law we propose to drive each herder in polar coordinates. Albeit not resulting in the shortest possible path travelled by the herders, the controller expressed in polar coordinates ensures circumnavigation of the goal region, preventing targets already contained therein from being scattered around. Specifically, the control input to the $j$-th herder dynamics (1) is chosen as

$u^{(j)}(t) = u_r^{(j)}(t)\,\hat{e}_r^{(j)} + u_\theta^{(j)}(t)\,\hat{e}_\theta^{(j)},$   (4)

where $\hat{e}_r^{(j)}$ and $\hat{e}_\theta^{(j)}$ are unit vectors in the radial and tangential directions, and its components are chosen as

$u_r^{(j)}(t) = -b_r\,\dot{r}^{(j)}(t) + R(x^{(i,j)}, t), \qquad u_\theta^{(j)}(t) = -b_\theta\,\dot{\theta}^{(j)}(t) + T(x^{(i,j)}, t),$   (5)

with $b_r, b_\theta > 0$, and where the feedback terms $R(x^{(i,j)}, t)$ and $T(x^{(i,j)}, t)$ are elastic forces that drive the herder towards the chased target $i$ and push it towards the containment region $G$. Such forces are chosen as

$R(x^{(i,j)}, t) = k_r \left[ \xi^{(j)}(t)\,\big(\rho^{(i,j)}(t) + \Delta r\big) + \big(1 - \xi^{(j)}(t)\big)\,\big(r^\star + \Delta r\big) - r^{(j)}(t) \right],$   (6)

$T(x^{(i,j)}, t) = k_\theta \left[ \xi^{(j)}(t)\,\phi^{(i,j)}(t) + \big(1 - \xi^{(j)}(t)\big)\,\psi(t) - \theta^{(j)}(t) \right],$   (7)

with $k_r, k_\theta > 0$, and where $\xi^{(j)}(t)$ regulates the switching policy between collecting and idling behaviours. That is, $\xi^{(j)}(t) = 1$ if $\rho^{(i,j)}(t) \ge r^\star$, and $\xi^{(j)}(t) = 0$ if $\rho^{(i,j)}(t) < r^\star$, so that the herder is attracted to the position of the $i$-th chased target $x^{(i,j)}$ (plus a radial offset $\Delta r$) when the current target is outside the containment region ($\xi^{(j)} = 1$), or to the boundary of the buffer region at the idling position $(r^\star + \Delta r, \psi(t))$, in polar coordinates, otherwise ($\xi^{(j)} = 0$). The value of the idling angle $\psi(t)$ depends on the specific choice of the target selection strategy employed, as discussed next. Note that the control laws (4)-(7) are much simpler than those presented in [12], as they do not contain any higher order nonlinear terms, nor are they complemented by parameter adaptation rules (see [12] for further details).
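The switching structure of the elastic forces can be sketched as follows. This is a minimal reconstruction under stated assumptions: the gain names (`b_r`, `b_theta`, `k_r`, `k_theta`) and the exact form of the elastic terms are our reading of the garbled source, not a verbatim transcription.

```python
def herder_control(r_j, theta_j, rdot_j, thetadot_j,
                   rho_t, phi_t, r_star, delta_r, psi,
                   b_r=1.0, b_theta=1.0, k_r=1.0, k_theta=1.0):
    """Radial and tangential control components for one herder.

    When the chased target is outside G (rho_t >= r_star), the herder
    is attracted to the target's polar position plus a radial offset
    delta_r; otherwise it idles at (r_star + delta_r, psi). The -b*dot
    terms provide damping on the herder's own velocity.
    """
    xi = 1.0 if rho_t >= r_star else 0.0           # collect vs idle
    r_ref = xi * (rho_t + delta_r) + (1 - xi) * (r_star + delta_r)
    th_ref = xi * phi_t + (1 - xi) * psi
    u_r = -b_r * rdot_j + k_r * (r_ref - r_j)
    u_th = -b_theta * thetadot_j + k_theta * (th_ref - theta_j)
    return u_r, u_th
```

A stationary herder already sitting at the reference radius behind its target receives zero input, while a herder inside the idle configuration is pulled back towards the buffer-region boundary.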

Target selection strategies
In the case of a single herder chasing multiple agents, the most common strategy in the literature is for it to select as chased target either the farthest passive agent from the goal region or the centre of mass of the flocking herd [3,5,18]. When two or more herders are involved, the problem is usually solved using a formation control approach, letting the herders surround the herd and then drive it towards the goal region [11,4]. Rather than using formation control techniques or solving off-line or on-line optimisation problems as in [8,24], here we present a set of simple, yet effective, target selection strategies that exploit the spatial distribution of the herders, allowing them to cooperatively select their targets without requiring any computationally expensive optimisation problem to be solved on-line. We present four different herding strategies, starting from the simplest case where herders globally look for the target farthest from the goal region. A graphical illustration of the four strategies is reported in Fig. 2 for $N_H = 3$ herders.

Figure 2: Herders are depicted as yellow squares, passive agents as green balls. The colours in which the game field is divided correspond to regions assigned to different herders. Herder $y^{(j)}$ is currently chasing target agent $x^{(i,j)}$, while passive agent $x^{(i)}$ is not chased by any herder.
Global search strategy (no plane partitioning). Each herder selects the farthest passive agent from the containment region that is not currently targeted by any other herder (Fig. 2(a)). Being the simplest possible strategy, we will use it as a benchmark against which to compare the performance of the other strategies considered here.
Static arena partitioning. At the beginning of the trial, and for all of its duration, the plane is partitioned into $N_H$ circular sectors of width $2\pi/N_H$ rad centred at $x^\star$. Each herder is then assigned one sector to patrol and selects the passive agent therein that is farthest from $G$ (Fig. 2(b)). Note that this is the same herding strategy used in [12] for $N_H = 2$ herders.
Dynamic leader-follower (LF) target selection strategy. At the beginning of the trial, herders are labelled from 1 to $N_H$ in anticlockwise order, starting from a randomly selected herder which is assigned the leader role. The plane is then partitioned dynamically in different regions as follows. The leader starts by selecting the farthest passive agent from $G$ whose angular position $\phi^{(i,1)}$ is such that

$\phi^{(i,1)} \in \left( \theta^{(1)}(t) - \frac{\pi}{N_H},\; \theta^{(1)}(t) + \frac{\pi}{N_H} \right],$

where $\theta^{(1)}(t)$ is the angular position of the leader at time $t$. Then, all the other follower herders ($j = 2, \ldots, N_H$), in ascending order, select as their target the passive agent farthest from $G$ such that

$\phi^{(i,j)} \in \left( \theta^{(1)}(t) + \zeta^{(j)} - \frac{\pi}{N_H},\; \theta^{(1)}(t) + \zeta^{(j)} + \frac{\pi}{N_H} \right],$

with $\zeta^{(j)} = 2\pi(j-1)/N_H$. As the leader chases the selected target and moves in the plane, the partition described above changes dynamically, so that a different circular sector of constant angular width $2\pi/N_H$ rad is assigned to each follower at any time instant. Fig. 2(c) depicts the case $N_H = 3$, in which the sector $(\theta^{(1)} - \pi/3, \theta^{(1)} + \pi/3]$ is assigned to the leader herder while the rest of the plane is divided equally between the other two herders.
Dynamic peer-to-peer (P2P) target selection strategy. At the beginning of the trial, herders are labelled from 1 to $N_H$ as in the previous strategy. Denoting by $\zeta_j^+(t)$ the angular difference between the positions of herder $j$ and herder $(j+1) \bmod N_H$ at time $t$, and by $\zeta_j^-(t)$ that between herder $j$ and herder $(j + N_H - 1) \bmod N_H$ at time $t$, herder $j$ selects the farthest passive agent from $G$ whose angular position is such that

$\phi^{(i,j)} \in \left( \theta^{(j)}(t) - \frac{\zeta_j^-(t)}{2},\; \theta^{(j)}(t) + \frac{\zeta_j^+(t)}{2} \right].$

Unlike the previous case, the width of the circular sector assigned to each herder is now also dynamically changing, as it depends on the relative angular positions of the herders in the plane.
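The peer-to-peer rule above amounts to a bisector partition of the circle: each herder claims half of the angular gap towards each of its neighbours. A minimal sketch, assuming targets are given in polar coordinates and treating targets inside $G$ (radius `r_star`) as already contained; all names are illustrative.

```python
import math

def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    a = (a + math.pi) % (2 * math.pi) - math.pi
    return math.pi if a == -math.pi else a

def p2p_select(j, herder_angles, targets, r_star):
    """Peer-to-peer selection for herder j.

    herder_angles: angles theta^(j) listed in anticlockwise order;
    targets: list of (rho, phi) pairs. Returns the index of the
    farthest uncontained target inside herder j's sector, or None.
    """
    n = len(herder_angles)
    th = herder_angles[j]
    # angular gaps to the two neighbours, taken in [0, 2*pi)
    zeta_plus = (herder_angles[(j + 1) % n] - th) % (2 * math.pi)
    zeta_minus = (th - herder_angles[(j - 1) % n]) % (2 * math.pi)
    best, best_rho = None, r_star
    for i, (rho, phi) in enumerate(targets):
        rel = wrap(phi - th)
        if -zeta_minus / 2 < rel <= zeta_plus / 2 and rho > best_rho:
            best, best_rho = i, rho
    return best
```

With three equally spaced herders this reduces to the static $2\pi/3$-wide sectors; as the herders move, the sector boundaries follow the bisectors of the inter-herder gaps.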
The idling angle $\psi(t)$ in (7) is set equal to the angular position $\phi^{(i,j)}$ of the last contained target for the global search strategy; for the other strategies, it is set equal to the angular position of the midpoint of the angular sector assigned at each time to the herder.
A crucial difference between the herding strategies presented above is the nature (local vs. global) and amount of information that herders must possess to select their next target. Specifically, when the global search strategy is used, every herder needs to know the position $x^{(i)}$ of every passive agent in the plane not currently targeted by other herders. In the case of static arena partitioning, instead, a herder needs to know its assigned (constant) circular sector together with the position $x^{(i)}$ of every passive agent in that sector.
For the dynamic target selection strategies, less information is generally required. Indeed, in the dynamic leader-follower strategy the herders, knowing $N_H$, can either self-select the sector assigned to them (if they act as leader) or determine their respective sector by knowing the position of the leader $y^{(1)}(t)$. Similarly, in the dynamic peer-to-peer strategy herders can self-select their sectors using the angles $\zeta_j^+(t)$ and $\zeta_j^-(t)$. Note that in the unlikely event of perfect radial alignment between a herder and its target, the herder might push the target away from, rather than towards, the goal region. Despite its rare occurrence, such an event can be avoided by extending the herder dynamics with the extra term (B.1) described in Appendix B.

Numerical validation
The herding performance of the proposed control strategies has been evaluated through a set of numerical experiments aimed at (i) assessing their effectiveness in achieving the herding goal; (ii) comparing the use of different target selection strategies; (iii) studying the robustness of each strategy to parameter variations. The implementation and validation of the strategies in a more realistic robotic environment is reported in the next section where ROS simulations are included.

Performance Metrics
We defined the following metrics (see Appendix A for their definitions) to evaluate the performance of the different strategies. Specifically, for each of the proposed strategies we computed (i) the gathering time $t_g$, (ii) the average length $d_g$ of the path travelled by the herders until all targets are contained, (iii) the average total length $d_{tot}$ of the path travelled by the herders during the whole herding trial, (iv) the mean distance $D_T$ between the herd's centre of mass and the centre of the containment region, and (v) the herd spread $S_\%$. Note that lower values of $t_g$ correspond to better herding performance, the herders taking a shorter time to gather all the passive agents in the goal region. Also, lower values of $D_T$ and $S_\%$ correspond to a tighter containment of the passive agents in the goal region, while lower values of $d_g$ and $d_{tot}$ correspond to a more efficient herding capability during the gathering and containment of the herd.

Performance analysis
We carried out 50 simulation trials with $N_T = 7$ passive agents and either $N_H = 2$ or $N_H = 3$ herders, starting from random initial conditions. (All simulation parameters and a description of the simulation setup are reported in Appendix B.) The results of our numerical investigation are reported in Tab. 1. As expected, when herders search globally for agents to chase, their average gathering and total path lengths, $d_g$ and $d_{tot}$, are notably longer than when dynamic target selection strategies are used, indicating that this strategy is likely to be the least efficient when implemented.
As regards the aggregation of the herd in terms of $D_T$ and $S_\%$, all strategies presented comparable results. On the other hand, dynamic strategies showed consistently shorter gathering times $t_g$ and travelled distances $d_g$ than the static target selection strategies. In particular, in the case of three herders ($N_H = 3$), the peer-to-peer strategy exhibited values of $t_g$ and $d_g$ which are 50% and 74% smaller, respectively, than those of the static partitioning one. Therefore, we find that, in general, a higher level of cooperation between herders and a more efficient coverage of the plane, as guaranteed by the dynamic strategies, yield an overall better herding performance, which is more suitable for realistic implementations in robots or virtual agents that are bound to move at limited speed.

Robustness analysis
Next, we analysed the robustness of the proposed herding strategies to variations of the herd size and of the magnitude of the repulsive reaction to the herders exhibited by the passive agents (Fig. 3). Specifically, we varied $N_T$ between 3 and 60 and the repulsion parameter $\alpha_r$ in (3) between 0.05 and 2.5, while keeping $N_H = 2$. Strikingly, we found that all strategies succeed in herding up to 60 agents in a large region of parameter values [see the blue areas in Fig. 3(a)].
The global strategy, where herders patrol the entire plane, is found, as expected, to be the least efficient in terms of total distance travelled by the herders (Fig. 3(b)); the dynamic peer-to-peer strategy offers the best compromise between robustness of the containment performance (see Fig. 3(a)) and efficiency (see Fig. 3(b)). To validate these findings we carried out 50 trials where $N_H = 3$ herders were required to herd $N_T = 60$ passive agents, starting from different initial conditions. The resulting performance, averaged over the successful trials, is reported in Tab. 2. Herders adopting the global and peer-to-peer strategies successfully herded all agents in over 50% of the trials. Moreover, herders globally searching for the target to chase spent on average slightly less time gathering the targets ($t_g = 12.96$) and achieved and maintained a lower herd spread ($S_\% = 0.48$), although the path travelled to achieve the goal ($d_{tot}$) was significantly longer than when static or dynamic selection strategies were adopted.

Table 2: Average performance over successful trials of different herding strategies for $N_T = 60$ passive agents.

Validation in ROS environment
To validate the proposed strategies in a more realistic robotic setting, we complemented the numerical simulations presented in Sec. 5 with their ROS implementation, as described below. ROS [25] is an advanced software framework for robot software development that provides tools supporting the user throughout the development cycle, from low-level control and communication to deployment on real robots. We used the Gazebo software package to test the designed control architecture on accurate 3D models of commercial robots, simulating their dynamics and physical interaction with the virtual environment.
We considered a scenario where N T = 3 passive agents need to be herded by N H = 2 robotic herders.
All agents were implemented as Pioneer 3-DX [26], a commercially available two-wheel, two-motor differential drive robot whose detailed model is available in Gazebo (see Fig. 4). The desired trajectories for the robots are generated using equations (2) and (4)-(7) for the passive and herder robots, respectively, and are used as reference signals for the on-board inner control loop that generates the required tangential and angular velocities (see Appendix C for further details).
Examples of ROS simulations are reported in Fig. 5, where all the tested target selection strategies (static arena partitioning, leader-follower, peer-to-peer) were found to be successful, with the herder robots able to gather all the passive robots in the containment region. Fig. 5 also shows that the angular positions of the herders remain within the bounds defining the sector of the plane assigned to them for patrolling. The only exception is found in Fig. 5(e), where the leader-follower strategy is adopted and the follower herder temporarily exceeds the bounds when the leading herder changes its angular position while chasing its target. This is essentially due to the subordinate role of the follower herder with respect to the leader.

Conclusions
We presented a control strategy to solve the herding problem in the scenario where a group of multiple herders chases a group of stochastic passive agents. Our approach is based on the combination of a set of local rules driving the herders according to the targets' positions and a global herding strategy through which the plane is partitioned among the herders, who then select the target to chase in the sector assigned to them, either statically or dynamically. Our results show the effectiveness of the proposed strategy both via numerical simulations and by means of a more realistic implementation in ROS on commercially available robotic agents. We also evaluated the ability of the proposed strategies to cope with an increasing number of passive agents and with variations of the repulsive force the passive agents feel when the herders approach them. We wish to emphasise that, to date, our approach is the only one available in the literature to drive multiple herders to collect and contain a group of agents that do not possess a tendency to flock and whose dynamics is stochastic. A pressing open problem is to derive a formal proof of convergence of the overall control system.
A smaller average distance indicates a better ability of the herders to keep the herd close to the containment region.
Herd spread, measuring how scattered the herd is in the game field. Denote by $\mathrm{Pol}(t)$ the convex polygon defined by the convex hull of the points $x^{(i)}$ at time $t$, that is, $\mathrm{Pol}(t) := \mathrm{Conv}\{x^{(i)}(t),\; i = 1, \ldots, N_T\}$. Then, the herd spread $S$ is defined as the mean in time of the area of this polygon, that is

$S = \frac{1}{T} \int_0^T \mathrm{Area}(\mathrm{Pol}(t))\,\mathrm{d}t.$

Lower values correspond to a more cohesive herd and consequently better herding performance. The herd spread can also be evaluated with respect to the area of the containment region, $A_{cr} = \pi (r^\star)^2$, as $S_\% = S / A_{cr} \cdot 100$.
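A snapshot version of the herd spread (the area of the convex hull of the target positions, and its percentage of the containment-region area) can be computed without any geometry library. This is an illustrative sketch, not the paper's code: the hull is built with Andrew's monotone chain and the area with the shoelace formula.

```python
import math

def _cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def herd_spread(points, r_star):
    """Return (area of Conv(points), S% relative to pi * r_star^2)."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) < 3:          # degenerate herd: zero area
        return 0.0, 0.0

    def chain(seq):           # one half (lower or upper) of the hull
        h = []
        for p in seq:
            while len(h) >= 2 and _cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h[:-1]

    hull = chain(pts) + chain(pts[::-1])
    # shoelace formula over the closed polygon
    area = 0.5 * abs(sum(x0 * y1 - x1 * y0
                         for (x0, y0), (x1, y1)
                         in zip(hull, hull[1:] + hull[:1])))
    return area, area / (math.pi * r_star ** 2) * 100.0
```

Averaging this snapshot over the simulation time steps approximates the time integral defining $S$.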

Appendix B. MATLAB simulations
In all simulations we considered the case of $N_H = 2$ or $N_H = 3$ artificial herders and $N_T = 7$ passive agents. Moreover, we considered a circular containment region of radius $r^\star = 1$ and a buffer region of width $\Delta r = 1$. The numerical integration of the differential equations describing the dynamics of passive agents and herders was carried out using the Euler-Maruyama method [27] over the time interval $[0, T] = [0, 100]$ s with step size $\mathrm{d}t = 0.006$ s.
The initial positions of the passive agents were set outside the containment region as $x_0^{(i)} = 2 r^\star\,[\cos\phi_0^{(i)}, \sin\phi_0^{(i)}]^\top$, $\forall i = 1, \ldots, N_T$, with $\phi_0^{(i)}$ drawn from a uniform distribution over the interval $(-\pi, \pi]$, while the initial positions of the herders were taken on the circle of radius $4 r^\star$ with angular displacement $2\pi/N_H$. Furthermore, collision avoidance forces between passive agents were also considered in the numerical simulations. Specifically, the model (2) is extended by adding the term

$\sum_{k \in \mathcal{N}^{(i)}(t)} \frac{x^{(i)} - x^{(k)}}{\|x^{(i)} - x^{(k)}\|^3},$

where $\mathcal{N}^{(i)}(t) := \{k \ne i : \|x^{(i)}(t) - x^{(k)}(t)\| \le r_c\}$ is the set of all passive agents at time $t$ inside the closed ball centred at $x^{(i)}$ with radius $r_c$.
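One Euler-Maruyama step of the passive-agent model, with herder repulsion and short-range collision avoidance between targets, can be sketched as follows. Parameter values are illustrative, and the inverse-cube form of the collision-avoidance term is our assumption, chosen to mirror the herder-repulsion term (3).

```python
import math
import random

def em_step(x, herders, dt=0.006, alpha_r=0.5, alpha_b=0.1, r_c=0.2):
    """One Euler-Maruyama step of the target dynamics (2).

    x: list of [px, py] target positions; herders: list of (hx, hy).
    Returns the updated list of positions (the input is not modified).
    """
    new_x = []
    for i, (px, py) in enumerate(x):
        vx = vy = 0.0
        # repulsion from every herder (gradient of the 1/distance potential)
        for hx, hy in herders:
            dx, dy = px - hx, py - hy
            d3 = math.hypot(dx, dy) ** 3
            vx += alpha_r * dx / d3
            vy += alpha_r * dy / d3
        # collision avoidance from targets closer than r_c
        for k, (qx, qy) in enumerate(x):
            if k == i:
                continue
            dx, dy = px - qx, py - qy
            d = math.hypot(dx, dy)
            if 0 < d <= r_c:
                vx += dx / d ** 3
                vy += dy / d ** 3
        # Brownian increment: each component ~ N(0, dt)
        sq = math.sqrt(dt)
        new_x.append([px + vx * dt + alpha_b * sq * random.gauss(0, 1),
                      py + vy * dt + alpha_b * sq * random.gauss(0, 1)])
    return new_x
```

Setting `alpha_b = 0` recovers the deterministic drift, which is convenient for sanity checks of the repulsion and collision-avoidance terms.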
To avoid that perfect alignment between the herder and the chased target agent causes the latter to move away from the goal region, a circumnavigation force $u_\perp^{(j)}(t)$ can be added to the dynamics of the herders in (1). This force is orthogonal to the vector $\Delta x_{ij} = x^{(i)} - y^{(j)}$, and its amplitude depends on the angle $\chi_{ij}$ between $\Delta x_{ij}$ and $y^{(j)}$, such that it is maximum when the two vectors are parallel ($\chi_{ij} = \pi$) and zero when they are anti-parallel ($\chi_{ij} = 0$). Specifically, it is defined as

$u_\perp^{(j)}(t) = v\,\bar{U}\,\frac{1 - \cos\chi_{ij}}{2}\,\frac{\Delta x_{ij}^\perp}{\|\Delta x_{ij}\|},$   (B.1)

where $\bar{U} > 0$ is the maximum amplitude, $\Delta x_{ij}^\perp$ denotes $\Delta x_{ij}$ rotated by $\pi/2$, and $v \in \{-1, 1\}$, whose value depends on which half of the assigned sector the herder is currently in, so as to guarantee that the targeted agent is always pushed toward the interior of the sector.

Appendix C. ROS simulations
The mobile robots used for both passive and herder agents have been designed as Pioneer 3-DX robots driven by the differential drive controller provided in the set of ROS packages (gazebo_ros_pkgs) that allows the integration of Gazebo and ROS.
The environment and the robots share information through an exchange of messages that occurs by publishing and subscribing to one or more of the available topics. A ROS node is attached to each herder and passive robot. It subscribes to the /odom topic, implements the agent's dynamics, and publishes a personalised /cmd_vel topic. The passive agents collect odometric information from all the herders in the environment. The herder agents subscribe to the ID of the passive agent to be chased and collect its position. The published message is a velocity control input w.r.t. the robot's reference system for the differential drive of the robot: a translation $v$ along the x-axis and a rotation $\omega$ around the z-axis of the robot. The reference trajectory $y^\star(t) = [r^\star \cos\theta^\star, r^\star \sin\theta^\star]^\top$, generated as in Sec. 3-4, is followed by each robot by means of the Cartesian regulator $v = -k_1 (y - y^\star)^\top [\cos\Phi, \sin\Phi]^\top$, where $\Phi(t)$ denotes the robot orientation w.r.t. the global reference system. The gains $k_1 = 0.125$ and $k_2 = 0.25$ were tuned by trial-and-error to achieve smooth robot movements. The initial positions of the agents were set as in Appendix B.
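The velocity command published on /cmd_vel can be sketched as below. Only the $v$ law and the gains come from the text; the $\omega$ law (proportional to the wrapped heading error towards the target, with gain $k_2$) is our assumption about the unstated second half of the regulator.

```python
import math

def cmd_vel(pose, target, k1=0.125, k2=0.25):
    """Compute (v, omega) for a differential-drive robot.

    pose: (x, y, phi) with phi the robot orientation in the global
    frame; target: (x*, y*) reference point. v projects the position
    error onto the robot's heading; omega steers towards the target.
    """
    x, y, phi = pose
    ex, ey = target[0] - x, target[1] - y
    # v = -k1 * (y - y*)^T [cos(phi), sin(phi)]^T, as in the text
    v = k1 * (ex * math.cos(phi) + ey * math.sin(phi))
    # assumed omega law: proportional heading regulation (gain k2)
    heading = math.atan2(ey, ex)
    err = math.atan2(math.sin(heading - phi), math.cos(heading - phi))
    omega = k2 * err
    return v, omega
```

In a ROS node the pair `(v, omega)` would fill the `linear.x` and `angular.z` fields of a `geometry_msgs/Twist` message before publishing.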
The target selection strategies (Sec. 4.2) are processed in an ad-hoc ROS node. It subscribes to the odometry topic, computes the user-chosen strategy (i.e., global, static arena partitioning, leader-follower or peer-to-peer), and publishes a custom message with the IDs of the targets to be chased on the /herder/chased_target topic. The custom message is an array of integers whose $j$-th element corresponds to the passive agent chased by the $j$-th herder robot.
The Gazebo-ROS simulations were run on Ubuntu 18.04 LTS, hosted on a virtual machine with 10 GB of RAM, with the ROS Melodic distribution and Gazebo 9.13.0.