Bio-Inspired Local Information-Based Control for Probabilistic Swarm Distribution Guidance

This paper addresses a task allocation problem for a large-scale robotic swarm, namely the swarm distribution guidance problem. Unlike most existing frameworks for this problem, the proposed framework uses locally available information to generate its time-varying stochastic policies. As each agent requires only local consistency of information with neighbouring agents, rather than global consistency, the proposed framework offers various advantages, e.g., a shorter timescale for using new information and the potential to incorporate an asynchronous decision-making process. We perform theoretical analysis on the properties of the proposed framework. From the analysis, it is proved that the framework can guarantee convergence to the desired density distribution even when using local information, while maintaining the advantages of global-information-based approaches. The design requirements for these advantages are explicitly listed in this paper. This paper also provides specific examples of how to implement the framework developed. The results of numerical experiments confirm the effectiveness of the proposed framework and its comparability with the global-information-based framework.


I. INTRODUCTION
This paper addresses a task allocation problem for a large-scale multiple-robot system, called a robotic swarm. Robotic swarms have attracted considerable attention because they are regarded as promising solutions for complicated missions that other systems may not be able to manage [1], [2]. Agents in a swarm are assumed to be homogeneous because the swarm is usually realised through mass production [3]. In this context, the task allocation problem reduces to the problem of how to distribute a swarm of agents over given tasks (or bins) so as to satisfy the desired population fraction (or swarm density) for each task. This problem is known as the swarm distribution guidance problem [4]-[6].
For a large number of agents, probabilistic approaches based on Markov chains [4]-[13] or differential equations [14]-[18] have been widely utilised. Since these approaches focus not on individual agents but on the ensemble dynamics, they are also called Eulerian [11]-[13] or macroscopic frameworks [18], [19]. In these approaches, swarm densities for each bin are represented as system states, and a state-transition matrix describes stochastic decision policies, i.e., the probabilities that agents in a bin switch to another. Individual agents in the swarm make decisions based on these policies in a random, independent, and memoryless manner. Open-loop-type frameworks were proposed first [4]-[7], [16]-[18]. Agents under these frameworks are controlled by time-invariant stochastic decision policies. The policies, which make a swarm converge to a desired distribution, are pre-determined by a central controller and broadcast to each agent before the mission is executed. Communication between agents is hardly required during the mission, which reduces communication complexity. However, the agents simply follow the given policies without incorporating any feedback, and thus some agents continue to switch bins unnecessarily even after the swarm reaches the desired distribution. This gives rise to a trade-off between convergence rate and long-term system efficiency [17].
Other works, called closed-loop-type frameworks, have also been proposed [9]-[15]. This type of framework allows agents to adaptively construct their own stochastic decision policies at the expense of sensing the current swarm status through interactions with other agents. Based on such information, agents can synthesise time-inhomogeneous transition matrices to achieve certain objectives and requirements: for example, maximising convergence rates [10], minimising travelling costs [13], and temporarily adjusting given policies when bins are overpopulated or underpopulated beyond certain levels [14], [15]. In particular, Bandyopadhyay et al. [13] recently proposed a closed-loop-type algorithm that exhibits faster convergence as well as fewer undesirable transitions, compared with an open-loop-type algorithm. This algorithm is expected to mitigate the trade-off raised in open-loop-type frameworks.
To the best of our knowledge, most of the existing closed-loop-type algorithms are based on the Global Information Consistency Assumption (GICA) [20]. GICA implies that the necessary information must be consistently known by all agents. We refer to such information as global, because achieving information consistency requires each agent to interact, possibly in a multi-hop fashion, with all the others, and thus it "happens on a global communication timescale" [20].
This paper proposes a framework that requires only the Local Information Consistency Assumption (LICA) [20]. Unlike GICA-based algorithms, the proposed framework requires only local consistency of information with neighbouring agents, not global consistency. Compared with GICA, LICA offers several advantages. Firstly, it "provides a much shorter timescale for using new information because agents are not required to ensure that this information has propagated to the entire team before using it" [20]. Secondly, LICA provides a foundation on which an asynchronous decentralised decision-making process can be developed. Note that the timescales for achieving information consistency between agents can differ depending on their local circumstances. Considering any possibly-extrinsic heterogeneity of agents (e.g., different sensing frequencies due to local communication delays), an asynchronous algorithm is regarded as more realistic for coordinating a robotic swarm, and thus as increasing its system efficiency [21]-[23]. Finally, LICA makes the proposed approach additionally robust against dynamic changes in bins and in agents. Provided that inclusions or exclusions of bins are perceived by neighbouring agents, the proposed approach works well even without requiring other far-away agents to know about the changes.
The LICA-based framework developed in this paper utilises local information as its feedback gains, which is motivated by the recent GICA-based work in [13]. This framework is inspired by the mechanism of decision-making in a fish swarm, in which each fish adjusts its individual behaviour based on those of its neighbours [24]-[27]. Similarly, each agent in the framework developed uses its local status, i.e., the current density of its associated bin relative to those of its neighbour bins, to generate its time-varying stochastic decision policies. The agent is not required to know any global information, and hence the aforementioned advantages of LICA can be exploited.
We prove that, even using local information, the proposed framework asymptotically converges to a desired swarm distribution and retains the advantages of existing closed-loop-type approaches. This paper explicitly presents the design requirements for a time-inhomogeneous Markov chain to achieve these desired features, so that users can apply the requirements when designing their own algorithms. In addition, three specific examples are provided to demonstrate how to implement the proposed framework: 1) minimising travelling cost; 2) maximising convergence rate under upper flux bounds; and 3) generation of quorum-based policies (similar to [14], [15]).
The rest of this paper is organised as follows. Section II introduces the desired features of a swarm distribution guidance framework along with relevant definitions and notations. Section III proposes our framework with its design requirements, the biological inspiration, and an analysis of whether the desired features are satisfied. We provide examples of how to implement the framework for specific problems in Section IV, and an asynchronous implementation in Section V. The results of numerical experiments are shown in Section VI, followed by concluding remarks in Section VII.

Notations
$\emptyset$, $\mathbf{0}$, $I$ and $\mathbf{1}$ denote the empty set, the zero matrix of appropriate size, the identity matrix of appropriate size, and a row vector whose elements are all equal to one, respectively. $v \in \mathbb{P}^n$ is a stochastic (row) vector such that $v \geq 0$ and $v \cdot \mathbf{1}^\top = 1$. $\Pr(E)$ denotes the probability that event $E$ will happen.
$A_k$: Physical motion constraint matrix (Definition 6);
$C_k$: Communicational connectivity matrix (Definition 7);
$\mathcal{N}_k(i)$: A set of (communicationally-connected) neighbour bins of the $i$-th bin (Definition 7);
$\bar{\mu}_k[i]$: Current local swarm density at the $i$-th bin (Eqn. (4));
$\bar{\Theta}[i]$: Locally-desired swarm density at the $i$-th bin (Eqn. (5));
$P^j_k$: Primary guidance matrix (Eqn. (10));
$S^j_k$: Secondary guidance matrix (Eqn. (10));
Primary local-feedback gain (e.g., Eqn. (6));
$G^j_k[i]$: Secondary local-feedback gain (Eqn. (9)).

A. Definitions
This section presents the definitions and assumptions necessary for our proposed framework, which will be shown in Section III. Since most of them are adopted from the recent literature [10], [13], we only briefly provide their essential meanings here.
Definition 1 (Agents and Bins). A set of agents $\mathcal{A}$ is to be distributed over a prescribed region in a state space $\mathcal{B}$. The entire space is partitioned into $n_{bin}$ disjoint bins (subspaces) such that $\mathcal{B} = \cup_{i=1}^{n_{bin}} \mathcal{B}_i$ and $\mathcal{B}_i \cap \mathcal{B}_j = \emptyset$, $\forall i \neq j$. We also regard $\mathcal{B} = \{\mathcal{B}_1, ..., \mathcal{B}_{n_{bin}}\}$ as the set of all the bins. Each bin $\mathcal{B}_i$ represents a predefined range of an agent's state, e.g., position. The total number of agents is time-varying, and its value at time instant $k$ is denoted by $n^{\mathcal{A}}_k = |\mathcal{A}|$. Note that we do not assume that the agents keep track of $n^{\mathcal{A}}_k$.
Definition 2 (Agent's state). Let $a^j_k \in \{0, 1\}^{n_{bin}}$ be the state indicator vector of agent $j \in \mathcal{A}$ at time instant $k$. If the agent's state belongs to bin $\mathcal{B}_i$, then $a^j_k[i] = 1$, otherwise 0.
Definition 3 (Current (global) swarm distribution). The current (global) swarm distribution $\mu_k \in \mathbb{P}^{n_{bin}}$ is a row-stochastic vector such that each element $\mu_k[i]$ is the population fraction (swarm density) of $\mathcal{A}$ in bin $\mathcal{B}_i$ at time instant $k$:
$$\mu_k := \frac{1}{|\mathcal{A}|} \sum_{\forall j \in \mathcal{A}} a^j_k. \tag{1}$$
Definition 4 (Agent's stochastic state and decision policy). Agent $j$'s stochastic state is a row-stochastic vector $x^j_k \in \mathbb{P}^{n_{bin}}$ in which each element $x^j_k[i]$ gives the probability that the agent's state belongs to bin $\mathcal{B}_i$ at time instant $k$:
$$x^j_k[i] := \Pr(a^j_k[i] = 1). \tag{2}$$
The probability that agent $j$ in bin $\mathcal{B}_i$ at time instant $k$ will transition to bin $\mathcal{B}_l$ before the next time instant is called its stochastic decision policy, denoted as:
$$M^j_k[i, l] := \Pr(a^j_{k+1}[l] = 1 \mid a^j_k[i] = 1). \tag{3}$$
Note that $M^j_k \in \mathbb{P}^{n_{bin} \times n_{bin}}$ is a row-stochastic matrix such that $M^j_k \geq 0$ and $M^j_k \cdot \mathbf{1}^\top = \mathbf{1}^\top$, and will be referred to as a Markov matrix.
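As a small illustration of Definitions 2 and 3, the following sketch builds one-hot state indicator vectors for a handful of agents and averages them to obtain the global swarm distribution. The agent-to-bin assignment, the 0-based bin indices, and the function name `swarm_distribution` are hypothetical choices made for this example only.

```python
import numpy as np

def swarm_distribution(agent_bins, n_bin):
    """Return mu_k, the fraction of agents currently in each bin."""
    a = np.zeros((len(agent_bins), n_bin))
    # Row j is the one-hot state indicator vector a_k^j of agent j.
    a[np.arange(len(agent_bins)), agent_bins] = 1.0
    # Averaging the indicator vectors gives the population fractions.
    return a.mean(axis=0)

# Hypothetical example: 8 agents spread over 4 bins (0-based bin indices).
agent_bins = np.array([0, 0, 1, 2, 2, 2, 3, 3])
mu_k = swarm_distribution(agent_bins, n_bin=4)
print(mu_k)
```

The result is a row-stochastic vector, so its elements are non-negative and sum to one regardless of how many agents are present.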
Definition 5 (Desired swarm distribution). The desired swarm distribution $\Theta \in \mathbb{P}^{n_{bin}}$ is a row-stochastic vector such that each element $\Theta[i]$ indicates the desired swarm density for bin $\mathcal{B}_i$.
Assumption 1. For ease of description, we assume that $\Theta[i] > 0$, $\forall i \in \{1, ..., n_{bin}\}$. In practice, there may exist some bins whose desired swarm densities are zero. These bins can be accommodated by adopting any subroutine ensuring that all agents eventually move to and remain in any of the positive-desired-density bins, for example, the escaping algorithm in [13, Section III.C].
Assumption 2 (The number of agents [6], [10], [13], [15], [17]). It is assumed that $n^{\mathcal{A}}_k \gg n_{bin}$ so that the time evolution of the swarm distribution is governed by the stochastic decision policy in Equation (3). Although the finite cardinality of the agents normally causes a residual convergence error, a lower bound on $n^{\mathcal{A}}_k$ that probabilistically guarantees a desired convergence error is analysed in [13, Theorem 6] by exploiting Chebyshev's inequality. Note that this theorem is generally applicable and thus is also valid for our work.
Definition 6 (Physical motion constraint [6], [10], [13]). Motion constraints of agents are denoted by the matrix $A_k \in \{0, 1\}^{n_{bin} \times n_{bin}}$, where $A_k[i, l] = 1$ if agents in bin $\mathcal{B}_i$ at time instant $k$ are allowed to transition to bin $\mathcal{B}_l$ by the next time instant; $A_k[i, l] = 0$, otherwise. It is assumed that $A_k$ is symmetric and irreducible (i.e., strongly-connected); and $A_k[i, i] = 1$ for all agents, bins, and time instants.
Definition 7 (Communicationally-connected). Bins $\mathcal{B}_i$ and $\mathcal{B}_l$ are said to be communicationally-connected if there exists at least one agent in bin $\mathcal{B}_i$ that can directly communicate with some agents in bin $\mathcal{B}_l$, and vice versa. This communicational connectivity over all the bins at time instant $k$ is defined by the matrix $C_k \in \{0, 1\}^{n_{bin} \times n_{bin}}$, where $C_k[i, l] = 1$ indicates that bins $\mathcal{B}_i$ and $\mathcal{B}_l$ are communicationally-connected. Note that $C_k$ is symmetric and all its diagonal entries are set to one. For each bin $\mathcal{B}_i$, we define the set of its (communicationally-connected) neighbour bins as $\mathcal{N}_k(i) = \{\forall \mathcal{B}_l \in \mathcal{B} \mid C_k[i, l] = 1\}$. The set of agents in any of the bins in $\mathcal{N}_k(i)$ is denoted by $\mathcal{A}_{\mathcal{N}_k(i)} = \{\forall j \in \mathcal{A} \mid a^j_k[l] = 1, \forall l : \mathcal{B}_l \in \mathcal{N}_k(i)\}$.
Assumption 3 (Communicational connectivity over bins). The physical motion constraint of a robot is, in general, more stringent than its communicational constraint. From this, it can be assumed that if the transition of agents between bins $\mathcal{B}_i$ and $\mathcal{B}_l$ is allowed within a unit time instant, then both bins are communicationally-connected, i.e., if $A_k[i, l] = 1$ then $C_k[i, l] = 1$. This implies that the matrix $C_k$ is irreducible, as is $A_k$. The communication network over the agents is assumed to be strongly-connected [10], [13]. Using distributed consensus algorithms [10], [11], [28], each agent can access necessary local information in its neighbour bins.
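The following sketch illustrates Definitions 6 and 7 and Assumption 3 on a toy topology. The four-bin line graph, the extra communication link, the agent assignment, and all function names are hypothetical; the irreducibility test is a generic reachability check, not a procedure taken from the paper.

```python
import numpy as np

# Hypothetical motion constraints: four bins in a line (0-based), with self-loops.
A_k = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]])
# Assumed communication graph: everything in A_k plus one longer-range link 0-2.
C_k = A_k.copy()
C_k[0, 2] = C_k[2, 0] = 1

def neighbour_bins(C, i):
    """Indices of bins that are communicationally connected to bin i (N_k(i))."""
    return set(np.flatnonzero(C[i]).tolist())

def agents_in_neighbourhood(C, i, agent_bins):
    """Indices of agents located in any neighbour bin of bin i."""
    nb = neighbour_bins(C, i)
    return [j for j, b in enumerate(agent_bins) if b in nb]

def is_irreducible(M):
    """True if every bin can be reached from every other bin."""
    n = len(M)
    step = (np.asarray(M) > 0).astype(int) + np.eye(n, dtype=int)
    return bool((np.linalg.matrix_power(step, n - 1) > 0).all())

agent_bins = [0, 0, 1, 2, 2, 2, 3, 3]
print(neighbour_bins(C_k, 0), agents_in_neighbourhood(C_k, 3, agent_bins))
```

In this example every allowed transition is also a communication link, which is the content of Assumption 3, and both matrices are symmetric and irreducible.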
Assumption 4 (Pre-known Information [13]). The desired swarm distribution $\Theta$, the motion constraint matrix $A_k$ (also $C_k$), and other pre-determined values such as variables regarding objective functions and user-design parameters (which will be introduced later) are known by all the agents before they begin a mission.
Assumption 5 (Agent's capability [10], [13]). Each agent can determine the bin to which it belongs, and know the locations of neighbour bins so that it can navigate toward any of these bins. The agent is capable of collision avoidance behaviours against other agents or obstacles.

B. Problem Statement
The objective of the swarm distribution guidance problem considered in this paper is to distribute a set of agents $\mathcal{A}$ over a set of bins $\mathcal{B}$ by the Markov matrix $M^j_k$ in a manner that achieves the following desired features:
Desired Feature 1. The swarm distribution $\mu_k$ asymptotically converges to the desired swarm distribution $\Theta$ as time instant $k$ goes to infinity.
Desired Feature 2. Transitions of the agents between the bins are controlled in a way that $M^j_k$ becomes close to $I$ as $\mu_k$ converges to $\Theta$. This implies that the agents settle down after $\Theta$ is achieved, and thus unnecessary transitions can be reduced. Moreover, the agents identify and compensate for any partial loss or failure of the swarm distribution.
Desired Feature 3. For each agent in bin $\mathcal{B}_i$, the information required for generating time-varying stochastic decision policies is not global information (e.g., $\mu_k$) but locally available information within $\mathcal{A}_{\mathcal{N}_k(i)}$. Thereby, the resultant time-inhomogeneous Markov process is based on LICA, and has benefits such as a shorter timescale for obtaining new information (than GICA), the potential for an asynchronous process, etc.
Remark 1. One of our main contributions is to provide Desired Feature 3 while retaining Desired Features 1 and 2 by additionally adopting Assumption 3, which can be derived from other assumptions in the existing literature.

III. A CLOSED-LOOP-TYPE FRAMEWORK USING LOCAL INFORMATION
This section proposes a LICA-based framework for the swarm distribution guidance problem. The framework differs from the recent closed-loop-type algorithms in [10], [13] in that they utilise global information (e.g., the current swarm distribution in Equation (1)) for constructing a time-inhomogeneous Markov matrix, whereas ours uses the local information in Equation (7). We show how, in spite of using such relatively limited information, the desired features described in the previous section can be achieved in the proposed framework. Before that, we introduce the biological idea, concerning the decision-making mechanisms of a fish swarm, that inspires this framework and in particular its attainment of Desired Feature 3. In addition, we explicitly provide the design requirements for a Markov matrix so that prospective users can easily incorporate their own specific objectives into this framework.

A. The Biological Inspiration
For a swarm of fish, it has commonly been assumed that their crowdedness limits their perception ranges over other members, and their cardinality restricts the capacity for individual recognition [25]. The way fish arrive at collective behaviours differs from that of other social species such as bees and ants, which are known to use recruitment signals for the guidance of the entire swarm [29], [30]. Thus, in biology, a question has naturally arisen about the mechanism of fish decision-making in an environment where only local information is available and information transfer between members does not explicitly happen [24]-[27], [31], [32].
It has been experimentally shown that the swimming activities of fish vary depending on their perceivable neighbours. According to [31], fish tend to maintain their statuses (e.g., position, speed, and heading angle) relative to those of other nearby fish, which results in their organised formation structures. In addition, it is shown in [32] that the spatial density of fish influences both the minimum distances between them and the primary orientation of the fish school.
Based on this knowledge, the works in [24]-[27] suggest individual-based models to further understand the collective behavioural mechanisms of fish: for example, their repelling, attracting, and orientating behaviours [24], [26]; how the density of informed fish affects the elongation of the formation structure [25]; and group-size choices [27]. The common and fundamental characteristic of these models is that every agent maintains or adjusts its personal status with consideration of those of other individuals within its limited perception range.
Inspired by this understanding of fish, we believe that there must be an enhanced swarm distribution guidance approach in which each agent only needs to keep its relative status by using local information available from its nearby neighbours. In this approach, agents do not need to know any global information, and thereby the corresponding requirement of extensive information sharing over all the agents can be alleviated.

B. Fundamental Idea of the Proposed Approach
Suppose that each agent in bin $\mathcal{B}_i$ is required to keep its local status $\bar{\mu}_k[i]$, which we refer to as the current local swarm density at bin $\mathcal{B}_i$, at the value of the corresponding locally-desired swarm density $\bar{\Theta}[i]$. They are respectively defined as follows:
$$\bar{\mu}_k[i] := \frac{n_k[i]}{\sum_{\forall l : \mathcal{B}_l \in \mathcal{N}_k(i)} n_k[l]}, \tag{4}$$
$$\bar{\Theta}[i] := \frac{\Theta[i]}{\sum_{\forall l : \mathcal{B}_l \in \mathcal{N}_k(i)} \Theta[l]}, \tag{5}$$
where $n_k[i]$ is the number of agents such that $a^j_k[i] = 1$.
[Figure: In the proposed framework, agents in a bin only need to obtain local information from other agents in its neighbour bins (shaded). Each square indicates a bin, and red arrows are drawn between pairs of bins $\mathcal{B}_i$ and $\mathcal{B}_l$.]
We use the term $\bar{\mu}^j_k[i]$ for the estimate of $\bar{\mu}_k[i]$ by agent $j$, which can be obtained through a distributed information consensus algorithm [10], [11], [28].
The fundamental idea of the proposed approach is to make each agent $j$ in bin $\mathcal{B}_i$: (i) only need to estimate the difference between $\bar{\mu}_k[i]$ and $\Theta[i]$, which are both locally-available information within $\mathcal{N}_k(i)$; and (ii) more reluctant to deviate from the current bin as the difference becomes smaller. This behaviour is governed by the feedback gain in Equation (6), in which $\alpha > 0$ and $\xi > 0$ are design parameters. We call this gain the primary local-feedback gain because it is utilised to control the primary guidance matrix $P^j_k$ (shown in the next subsection).
Remark 2. Equation (4) is equivalent to the $i$-th element of the following vector:
$$\bar{\mu}_k(i) := \frac{1}{|\mathcal{A}_{\mathcal{N}_k(i)}|} \sum_{\forall j \in \mathcal{A}_{\mathcal{N}_k(i)}} a^j_k. \tag{7}$$
Namely, $\bar{\mu}_k[i] = \bar{\mu}_k(i)[i]$. Here, we intentionally introduce Equation (7) for ease of comparison with the information required for feedback gains in the existing literature (e.g., Equation (1)). It follows that, in order for each agent in bin $\mathcal{B}_i$ to estimate $\bar{\mu}_k(i)[i]$ (i.e., the current local swarm density $\bar{\mu}_k[i]$), the set of other agents whose information is necessary is restricted to $\mathcal{A}_{\mathcal{N}_k(i)}$. That is, each agent needs neither a large perception radius nor an extensive information consensus process over all the agents.
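The local quantities discussed in this subsection can be sketched as follows. The sketch assumes that the current local swarm density of a bin is its agent count divided by the agent count over its neighbour bins (the bin itself included), and that the locally-desired density is the same normalisation applied to $\Theta$. The topology, the numbers, and the function names are hypothetical.

```python
import numpy as np

def local_density(n_k, C, i):
    """Agents in bin i divided by agents in all neighbour bins of i (incl. i)."""
    nb = np.flatnonzero(C[i])
    return n_k[i] / n_k[nb].sum()

def locally_desired_density(theta, C, i):
    """Theta[i] renormalised over the neighbour bins of i (incl. i)."""
    nb = np.flatnonzero(C[i])
    return theta[i] / theta[nb].sum()

# Hypothetical line topology of four bins with self-loops.
C = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]])
theta = np.array([0.1, 0.2, 0.3, 0.4])

n_good = np.array([10, 20, 30, 40])   # already proportional to theta
n_bad = np.array([40, 30, 20, 10])    # bin 0 is overpopulated

for i in range(4):
    print(i, local_density(n_bad, C, i), locally_desired_density(theta, C, i))
```

Only counts from the neighbour bins enter each computation, so an agent in bin $i$ never needs the full vector of agent counts.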

C. A LICA-based Closed-loop-type Framework
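This subsection presents our closed-loop-type framework based on locally-available information feedback. The basic form of the stochastic decision policy for agent $j$ in bin $\mathcal{B}_i$ is
$$M^j_k[i, l] := (1 - \omega^j_k[i]) P^j_k[i, l] + \omega^j_k[i] S^j_k[i, l], \tag{8}$$
where $\omega^j_k[i]$ is the weighting factor that assigns different weights to the agent's primary decision policy $P^j_k[i, l] \in \mathbb{P}$ and secondary decision policy $S^j_k[i, l] \in \mathbb{P}$. The weighting factor is defined in Equation (9), in which $\tau^j$ is a design parameter, and it diminishes as time instant $k$ goes to infinity.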
Equation (8) can be represented in matrix form as
$$M^j_k = (I - W^j_k) P^j_k + W^j_k S^j_k, \tag{10}$$
where $P^j_k \in \mathbb{P}^{n_{bin} \times n_{bin}}$ and $S^j_k \in \mathbb{P}^{n_{bin} \times n_{bin}}$ are row-stochastic matrices, called the primary guidance matrix and the secondary guidance matrix, respectively. $W^j_k \in \mathbb{R}^{n_{bin} \times n_{bin}}$ is a diagonal matrix such that $\mathrm{diag}(W^j_k) = (\omega^j_k[1], ..., \omega^j_k[n_{bin}])$. The stochastic state vector of agent $j$ is governed by the Markov process:
$$x^j_{k+1} = x^j_k M^j_k. \tag{11}$$
We claim that, in order for this Markov system to achieve Desired Features 1-3, $P^j_k$ must satisfy the following requirements.
Requirement 1. $P^j_k$ is a matrix with row sums equal to one, i.e.,
$$\sum_{l=1}^{n_{bin}} P^j_k[i, l] = 1, \quad \forall i. \tag{R1}$$
In fact, $P^j_k$ needs to be row-stochastic, for which it should further hold that $P^j_k[i, l] \geq 0$, $\forall i, l$. Note that this constraint is implied by (R4), which will be introduced later.
Requirement 2. All diagonal elements are positive, i.e.,
$$P^j_k[i, i] > 0, \quad \forall i. \tag{R2}$$
Requirement 3. The stationary distribution of $P^j_k$ is the desired swarm distribution $\Theta$, i.e.,
$$\Theta P^j_k = \Theta. \tag{R3}$$
With consideration of (R1), this requirement can be fulfilled by
$$\Theta[i] P^j_k[i, l] = \Theta[l] P^j_k[l, i], \quad \forall i, l.$$
A Markov process satisfying this property is said to be reversible.
Note that $C_k$ is already assumed to be irreducible in Assumption 3.
Requirement 5. $P^j_k$ becomes close to $I$ as $\bar{\mu}^j_k$ converges to $\bar{\Theta}$, i.e.,
$$P^j_k \to I \quad \text{as} \quad \bar{\mu}^j_k \to \bar{\Theta}. \tag{R5}$$
Depending on the objectives of a user, $P^j_k$ can be designed differently under given specific constraints. As long as $P^j_k$ satisfies (R1)-(R5) for all time instants $k$ and all agents $j \in \mathcal{A}$, the aforementioned desired features are achieved. Note that (R1)-(R4) are associated with Desired Feature 1, whereas (R5) is associated with Desired Feature 2. The detailed analysis is given in the next subsection.
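To make the requirements concrete, the sketch below constructs one possible primary guidance matrix and combines it with a secondary matrix. It is an illustration under stated assumptions, not the design used in the paper: the off-diagonal rule `eps * C[i,l] * theta[l] * g[i,l]`, the symmetric pairwise gain `g = max(gain[i], gain[l])`, the gain values, and the convex-combination form of the policy are all hypothetical choices that happen to satisfy row-stochasticity, positive diagonals, and reversibility with respect to $\Theta$.

```python
import numpy as np

def primary_matrix(theta, C, gain, eps=1.0):
    """Reversible row-stochastic matrix whose stationary distribution is theta.

    gain[i] in (0, 1] is a hypothetical per-bin feedback gain; smaller gains
    push the corresponding rows towards the identity matrix.
    """
    g = np.maximum.outer(gain, gain)          # symmetric pairwise gain (assumption)
    P = eps * C * theta[None, :] * g          # candidate off-diagonal entries
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # rows sum to one
    return P

def decision_policy(P, S, omega):
    """Blend primary and secondary matrices row by row: (I - W) P + W S."""
    W = np.diag(omega)
    return (np.eye(len(omega)) - W) @ P + W @ S

theta = np.array([0.1, 0.2, 0.3, 0.4])
C = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]])
gain = np.array([1.0, 0.5, 0.2, 0.05])        # hypothetical gain values
P = primary_matrix(theta, C, gain)
M = decision_policy(P, np.eye(4), omega=np.full(4, 0.5))
print(np.round(P, 3))
```

Because the product `theta[i] * P[i, l]` is symmetric in `i` and `l`, the detailed-balance condition holds and $\Theta$ is stationary; shrinking every gain towards zero moves `P` towards the identity.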
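Every agent executes the following algorithm at every time instant. The details of Lines 6-8 are presented in Section IV, which shows examples of how to implement this framework.
[Algorithm listing; only the final lines are recoverable.]
10: Compute $M^j_k[i, l]$ using (8);
// Individually behave based on the policy
11: Generate a random number $z \in \mathrm{unif}[0, 1]$;
12: Select bin $\mathcal{B}_q$ such that $\sum_{l=1}^{q-1} M^j_k[i, l] \leq z < \sum_{l=1}^{q} M^j_k[i, l]$;
13: Move to the selected bin;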
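The final steps of the algorithm (draw a uniform random number, then pick the bin whose cumulative probability interval contains it) can be sketched as below. The policy row, the function name `select_next_bin`, and the random seed are hypothetical; the selection rule is the standard inverse-CDF sampling of a discrete distribution.

```python
import numpy as np

def select_next_bin(M_row, z):
    """Return the bin index q whose cumulative-probability interval contains z."""
    cdf = np.cumsum(M_row)
    q = int(np.searchsorted(cdf, z, side="right"))
    # Guard against floating-point rounding when z is at (or just below) one.
    return min(q, len(M_row) - 1)

# Hypothetical policy row of an agent: stay, or move to one of two neighbours.
M_row = np.array([0.2, 0.5, 0.3])

rng = np.random.default_rng(0)
draws = [select_next_bin(M_row, rng.uniform(0.0, 1.0)) for _ in range(20000)]
freq = np.bincount(draws, minlength=3) / len(draws)
print(freq)
```

Each agent makes this draw independently and without memory, so for a large swarm the empirical transition frequencies approach the entries of the policy row.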

D. Analysis
We first show that the Markov process in Equation (11) achieves Desired Feature 1 under the assumption that $P^j_k$ satisfies the requirements (R1)-(R4) for each time instant. The stochastic state of agent $j$ at time instant $k \geq k_0$, governed by the Markov process from an arbitrary initial state $x^j_{k_0}$, can be written as:
$$x^j_k = x^j_{k_0} U^j_{k_0,k}, \quad U^j_{k_0,k} := M^j_{k_0} M^j_{k_0+1} \cdots M^j_{k-1}. \tag{12}$$
For ease of analysis, we assume that every agent $j$ knows any necessary information correctly, i.e., $\bar{\mu}^j_k[i] = \bar{\mu}_k[i]$.
Theorem 1. Provided that the requirements (R1)-(R4) are satisfied for all time instants $k \geq k_0$, it holds that $\lim_{k \to \infty} x^j_k = \Theta$ pointwise for all agents, irrespective of the initial condition.
Proof. This claim can be proved by following steps similar to those in the proof of [13, Theorem 4]. The claim is true if $\lim_{k \to \infty} x^j_k = x^j_{k_0} \cdot \lim_{k \to \infty} U^j_{k_0,k} = x^j_{k_0} \cdot \mathbf{1}^\top \Theta = \Theta$. For this, the matrix product $U^j_{k_0,k}$ should (i) be strongly ergodic and (ii) have $\Theta$ as its unique limit vector, i.e., $\lim_{k \to \infty} U^j_{k_0,k} = \mathbf{1}^\top \Theta$. We will show that the two conditions are valid under the assumption that (R1)-(R4) are satisfied.
Lemma 5 in the Appendix describes the characteristics of $M^j_k$ and $U^j_{k_0,k}$, which will be used for the rest of this proof. From this lemma, (a) $U^j_{k_0,k}$ is primitive (thus, regular); (b) there exists a positive lower bound $\gamma$ for $M^j_k$, $\forall k$; and (c) $M^j_k$ is asymptotically homogeneous. Then, from [33, Theorem 4.15, p.150] it follows that $U^j_{k_0,k}$ is strongly ergodic, which fulfils condition (i).
Let $e_k \in \mathbb{P}^{n_{bin}}$ be the unique stationary distribution vector corresponding to $M^j_k$ (i.e., $e_k M^j_k = e_k$). Due to the prior condition (b) and the fact that (d) $M^j_k$ is irreducible for $\forall k \geq k_0$, it follows from [33, Theorem 4.12, p.149] that the asymptotic homogeneity of $M^j_k$ with respect to $\Theta$ (i.e., $\lim_{k \to \infty} \Theta M^j_k = \Theta$) is equivalent to $\lim_{k \to \infty} e_k = e$ and $\Theta = e$, where $e$ is a limit vector. According to [33, Corollary, p.150], under the prior conditions (b) and (d), if $U^j_{k_0,k}$ is strongly ergodic with its unique limit vector $v$, then $v = e$. Hence, the unique limit vector of $U^j_{k_0,k}$ is $\Theta$ (i.e., $\lim_{k \to \infty} U^j_{k_0,k} = \mathbf{1}^\top \Theta$). Thereby, condition (ii) is also fulfilled.
Theorem 1 implies that the stochastic state of any agent eventually converges to the desired swarm distribution, regardless of $S^j_k$, $G^j_k[i]$ and (R5). In other words, even if (R5) is not satisfied, the Markov system can converge to $\Theta$. However, the system then induces unnecessary transitions of agents even after being close enough to the desired swarm distribution, which means that Desired Feature 2 does not hold.
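A quick numerical check of this convergence behaviour is sketched below. It iterates the stochastic state of a single agent under a time-varying policy built from a fixed reversible primary matrix and the identity as secondary matrix. The primary matrix, the exponentially decaying weight `0.5 * exp(-0.05 * k)`, the topology, and the horizon are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

theta = np.array([0.1, 0.2, 0.3, 0.4])
C = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)

# Hypothetical primary matrix: move to neighbour l with probability theta[l],
# otherwise stay. It is row-stochastic and reversible with respect to theta.
P = C * theta[None, :]
np.fill_diagonal(P, 0.0)
np.fill_diagonal(P, 1.0 - P.sum(axis=1))
S = np.eye(4)                      # secondary matrix: stay in the current bin

x = np.array([1.0, 0.0, 0.0, 0.0])  # the agent starts in bin 0 with certainty
for k in range(2000):
    omega = 0.5 * np.exp(-0.05 * k)   # assumed weight that diminishes with k
    M = (1.0 - omega) * P + omega * S
    x = x @ M                         # one step of the Markov process

print(np.round(x, 4))
```

The secondary matrix slows the early transitions but does not change the limit, which matches the observation that convergence does not depend on the secondary policy.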
We now show that Desired Feature 2 can be obtained from (R5) and Theorem 2, which will be described later. Suppose that, for every bin $\mathcal{B}_i$, $\bar{\mu}_k[i]$ converges to and eventually reaches $\bar{\Theta}[i]$ at some time instant $k$. The following shows that at this moment $\mu_k$ also reaches $\Theta$. From Equations (4)-(5) and the supposition, it holds that $\bar{\mu}_k[i] = \bar{\Theta}[i]$, $\forall i$. This can be rearranged as:
$$n_k \cdot B = \mathbf{0}, \tag{13}$$
where the matrix $B \in \mathbb{R}^{n_{bin} \times n_{bin}}$ is expressed in terms of the following quantities: $X \in \mathbb{R}^{n_{bin} \times n_{bin}}$ is a diagonal matrix such that $\mathrm{diag}(X) = (1/\Theta[1], 1/\Theta[2], ..., 1/\Theta[n_{bin}])$; $C_k$ is the communicational connectivity matrix (in Definition 7); and $n_k \in \mathbb{R}^{n_{bin}}$ is a row vector whose $i$-th element is $n_k[i]$, i.e., the number of agents in bin $\mathcal{B}_i$ at time instant $k$.
Lemma 1. Given $n_{bin}$ bins communicationally-connected as a tree-type topology, the rank of its corresponding matrix $B$ in Equation (13) is $n_{bin} - 1$.
Proof. The matrix $B \in \mathbb{R}^{n_{bin} \times n_{bin}}$ can be written as the sum of $n_e$ matrices $B^{(i,j)}$ of the same size, where $n_e$ is the number of edges in the underlying graph of $C_k$. Here, the non-zero entries of $B^{(i,j)}$ are determined by the edge between bins $i$ and $j$, and all the other entries are zero. For example, consider four bins connected as shown in Figure 2. It is trivial that the rank of every $B^{(i,j)}$ is one, and the matrix has only one linearly independent column vector, denoted by $v^{(i,j)}$. Without loss of generality, we consider $v^{(i,j)} \in \mathbb{R}^{n_{bin}}$ as a column vector such that the $i$-th entry is $-\frac{1}{\Theta[i]}$, the $j$-th entry is $\frac{1}{\Theta[j]}$, and the others are zero: for instance, $v^{(1,2)} = (-\frac{1}{\Theta[1]}, \frac{1}{\Theta[2]}, 0, 0)^\top$. It is obvious that $v^{(i,j)}$ and $v^{(k,l)}$ are linearly independent when the bin pairs $\{i, j\}$ and $\{k, l\}$ are different. This implies that the number of linearly independent column vectors of $B$ is the same as that of edges in the topology. Hence, for a tree-type topology of $n_{bin}$ bins, since there exist $n_{bin} - 1$ edges, the rank of the corresponding matrix $B$ is $n_{bin} - 1$.
Lemma 2. Given a strongly-connected topology of bins, the rank of its corresponding matrix $B$ is not affected by adding a new edge that directly connects any two existing bins.
Proof. We will show that this claim is valid even when a tree-type topology is given, as it is a sufficient condition for strong connectivity. Given the tree-type topology in Figure 2(a), suppose that bins $\mathcal{B}_1$ and $\mathcal{B}_4$ are newly connected. Then, the new topology becomes as shown in Figure 2(b), and it has a new corresponding matrix $B_{new}$, where $B_{new} = B + B^{(1,4)}$. As explained in the proof of Lemma 1, the rank of $B^{(1,4)}$ is one and it has only one linearly independent vector $v^{(1,4)}$. However, this vector can be produced as a linear combination of the existing $v$ vectors of $B$ (i.e., $v^{(1,4)} = v^{(1,2)} + v^{(2,4)}$). Thus, the rank of $B_{new}$ is the same as that of $B$. Without loss of generality, this implies that the rank of $B$ of a given strongly-connected topology is not affected by adding a new edge that directly connects any two existing bins.
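The rank arguments of Lemmas 1 and 2 can be checked numerically on the edge vectors $v^{(i,j)}$ described in the proofs. The sketch below stacks these vectors as columns for a hypothetical tree on four bins (0-based indices; the exact edges of Figure 2 are not reproduced here) and then appends the vector of an extra edge. It checks the span of the edge vectors only and does not rebuild the matrix $B$ itself.

```python
import numpy as np

def edge_vector(theta, i, j):
    """Column vector with -1/theta[i] at entry i and 1/theta[j] at entry j."""
    v = np.zeros(len(theta))
    v[i] = -1.0 / theta[i]
    v[j] = 1.0 / theta[j]
    return v

theta = np.array([0.1, 0.2, 0.3, 0.4])

# Hypothetical tree on four bins: bin 1 is connected to bins 0, 2 and 3.
tree_edges = [(0, 1), (1, 2), (1, 3)]
V_tree = np.column_stack([edge_vector(theta, i, j) for i, j in tree_edges])

# Add a new edge between bins 0 and 3, closing a cycle.
V_new = np.column_stack([V_tree, edge_vector(theta, 0, 3)])

rank_tree = np.linalg.matrix_rank(V_tree)
rank_new = np.linalg.matrix_rank(V_new)
print(rank_tree, rank_new)
```

The new edge vector equals the sum of the vectors along the existing path between its endpoints, so the rank stays at $n_{bin} - 1$, and $\Theta$ remains orthogonal to every edge vector.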
Thanks to Lemmas 1 and 2, we arrive at the following corollary and theorem:
Corollary 1. Given $n_{bin}$ bins that are communicationally strongly-connected, the rank of its corresponding $B$ is $n_{bin} - 1$.
Theorem 2. Given $n_{bin}$ bins that are communicationally strongly-connected, convergence of $\bar{\mu}_k$ to $\bar{\Theta}$ is equivalent to convergence of $\mu_k$ to $\Theta$.
Proof. From Equation (5), it can be said that $\Theta \cdot B = \mathbf{0}$. When $\bar{\mu}_k[i]$ is assumed to converge to $\bar{\Theta}[i]$ at some time instant $k$ for every bin $\mathcal{B}_i$, Equation (13) is valid (i.e., $n_k \cdot B = \mathbf{0}$). Since the nullity of $B$ is one, due to Corollary 1, there is only one linearly-independent row vector $a \in \mathbb{R}^{n_{bin}}$ such that $a \cdot B = \mathbf{0}$. Hence, $n_k = c \cdot \Theta$, where $c$ is an arbitrary scalar value. This implies that $\mu_k[i] = n_k[i]/n^{\mathcal{A}}_k = \Theta[i]$, $\forall i : \mathcal{B}_i \in \mathcal{B}$. Therefore, convergence of $\mu_k$ to $\Theta$ is equivalent to convergence of $\bar{\mu}_k$ to $\bar{\Theta}$.
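The equivalence in Theorem 2 can also be checked numerically. The sketch below does not use the matrix $B$ of Equation (13); instead it writes the condition "local density equals locally-desired density in every bin" as a linear system $n \cdot K = 0$ with a hypothetical matrix `K` constructed for this example, assuming both local quantities are normalised over the neighbour bins (the bin itself included). It then verifies that the only agent counts satisfying the system are proportional to $\Theta$.

```python
import numpy as np

theta = np.array([0.1, 0.2, 0.3, 0.4])
# Hypothetical strongly-connected line topology with self-loops.
C = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)

# Column i of K encodes, for bin i:
#   n[i] * sum_{l in N(i)} theta[l] - theta[i] * sum_{l in N(i)} n[l] = 0
K = np.diag(C @ theta) - C * theta[None, :]

rank_K = np.linalg.matrix_rank(K)

# The left null space of K is spanned by the last right-singular vector of K^T.
_, _, vt = np.linalg.svd(K.T)
n_null = vt[-1]
mu_from_local = n_null / n_null.sum()   # normalise the counts to fractions
print(rank_K, np.round(mu_from_local, 4))
```

For this connected topology the system has a one-dimensional solution space, so matching all local densities forces the global distribution to equal $\Theta$.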
From this theorem and (R5), Desired Feature 2 finally holds.
Corollary 2. If $P^j_k$ satisfies (R5), it follows from Theorem 2 that $P^j_k$ becomes $I$ as $\mu_k$ converges to $\Theta$. This is also the case for the Markov matrix $M^j_k$, which therefore satisfies Desired Feature 2.
In order for each agent $j$ in bin $\mathcal{B}_i$ to generate the time-varying stochastic decision policy $M^j_k[i, l]$ in Equation (8), the agent only needs to obtain its local information within $\mathcal{A}_{\mathcal{N}_k(i)}$. Therefore, Desired Feature 3 is also achieved.
Remark 3 (Robustness against dynamic changes of agents and those of bins). The proposed framework is robust against dynamic changes in the number of agents and bins. Similarly to what is claimed in [13, Remark 8], as each agent behaves based on its current bin location and local information in a memoryless manner, Desired Features 1-3 in the proposed framework are not affected by inclusion or exclusion of agents in a swarm. Furthermore, provided that changes in bins are perceived by at least the nearby agents in the corresponding neighbour bins, robustness against those changes also holds in the proposed framework. This is because agents in bin $\mathcal{B}_i$ utilise only local information such as $\bar{\Theta}[i]$ and $\bar{\mu}^j_k[i]$, and are not required to know any information from other far-away bins. Moreover, the proposed framework does not need to recalculate $\Theta$ to reflect such changes in bins so that $\sum_{\forall i} \Theta[i] = 1$, because computing $\bar{\Theta}[i]$ in (5) includes normalisation of $\Theta$.

IV. IMPLEMENTATION EXAMPLES

A. Example I: Minimising Travelling Expenses
This section provides examples on implementations of the framework proposed. In particular, this subsection addresses a problem of minimising travelling expenses of agents during convergence to a desired swarm distribution.
This problem can be defined as: given a cost matrix E k ∈ R n bin ×n bin in which each element E k [i, l] represents the travelling expense of an agent from bin B i to B l , find P j k such that subject to (R1)-(R5) and where  (14). Θ[l] enables agents in bin B i to be distributed over its neighbour bins in proportion to the desired swarm distribution. f (E k [i, l]) ∈ (0, 1] is a scalar that monotonically decreases as E k [i, l] increases (see Equation (29) for instances), encouraging agents in bin B i to avoid spending higher transition expenses. Note that we assume that E k is symmetric; E k [i, l] > 0 if A k [i, l] = 1; and its diagonal entries are zero.
Corollary 3. The optimal matrix $P^j_k$ of the problem (P1) is given by: $\forall i, l \in \{1, ..., n_{\mathrm{bin}}\}$ and $i \neq l$,

Proof. We can prove this by following the proof of [13, Corollary 1]. Suppose that the problem is only subject to (R4) and (14), without (R1)-(R3) and (R5). Then, the off-diagonal elements of an optimal matrix should be their corresponding lower bounds in (14) if $C_k[i, l] = 1$. The diagonal elements of the matrix do not affect the objective function due to the fact that $E_k[i, i] = 0$, $\forall i$. Accordingly, the matrix $P^j_k$ that holds (16) and (17) is also an optimal matrix for the simplified problem.
To reduce unnecessary transitions of agents during this process, it is favourable that agents in bin $B_i$ such that $\bar{\mu}^j_k[i] \le \Theta[i]$ (i.e., underpopulated) do not deviate. To this end, we set $S^j_k = I$ and $G^j_k[i]$ as follows [13]:
The gain value is depicted in Figure 3(a) with regard to $\beta$.

B. Example II: Maximising Convergence Rate under Upper Flux Bounds

This subsection presents an example in which the specific objective is to maximise the convergence rate under upper bounds on the transitions of agents between bins, referred to as upper flux bounds. The bounds can be interpreted as safety constraints in terms of collision avoidance and congestion: heavier congestion may induce more collisions amongst agents, which may degrade system performance. A similar problem is addressed by an open-loop-type algorithm in [17], where transitions of agents are limited only at a desired swarm distribution. That restriction is not intended to account for the aforementioned safety constraints, but rather to mitigate the trade-off between convergence rate and long-term system efficiency.
To impose upper flux bounds during the entire process, we consider the following one-way flux constraint:

$$n_k[i]\, P^j_k[i, l] \le c_{(i,l)}, \quad \forall i,\ \forall l \neq i. \tag{21}$$

This means that the number of agents moving from bin $B_i$ to $B_l$ is upper-bounded by $c_{(i,l)}$. The bound value is assumed to be very small relative to the mission environment, i.e., the number of agents, the number of bins, and their topology. Otherwise, all the agents can be distributed over the bins so quickly that the upper flux bounds become meaningless, and the corresponding problem becomes trivial.
Regarding the convergence rate of a Markov chain, there are respective analytical methods depending on whether it is time-homogeneous or time-inhomogeneous. For a time-homogeneous Markov chain, if the matrix is irreducible, the second largest eigenvalue of the matrix is used as an index indicating its asymptotic convergence rate [34, p.389]. In contrast, for a time-inhomogeneous Markov chain, coefficients of ergodicity can be utilised as a substitute for the second largest eigenvalue, which is not useful in this case [35]. Particularly, this paper uses the following proper coefficient of ergodicity, amongst others:

$$\tau(M) = \max_{s} \max_{i, l} \left| M[i, s] - M[l, s] \right|. \tag{22}$$

The convergence rate of a time-inhomogeneous Markov chain $M_k \in \mathbb{P}^{n \times n}$, $\forall k > 1$, can be maximised by minimising $\tau(M_k)$ at each time instant $k$, thanks to [33, Theorem 4.8, p.137]: $\tau(M_1 M_2 \cdots M_r) \le \prod_{k=1}^{r} \tau(M_k)$. Hence, the objective of the specific problem considered in this subsection can be defined as: find $P^j_k$ such that

$$\min\ \tau(P^j_k) \tag{23}$$

subject to (R1)-(R5) and (21).
Remark 5 (Advantages of the coefficient of ergodicity in (22)). Other coefficients of ergodicity such as $\tau_1$ and $\tau_2$ may have the trivial case such that $\tau_1(P^j_k) = 1$ (or $\tau_2(P^j_k) = 1$) for some time instant $k$, when they are applied to this problem. This is because, given a strongly-connected topology $C_k$, there may exist a pair of bins $B_i$ and $B_l$ such that $P^j_k[i, s] = 0$ or $P^j_k[l, s] = 0$, $\forall s$. To avoid this trivial case, the work in [13] instead utilises $\tau_1((P^j_k)^{d_{C_k}})$ as the proper coefficient of ergodicity, where $d_{C_k}$ denotes the diameter of the underlying graph of $C_k$. However, this implies that agents in bin $B_i$ are required to additionally access information from bins other than those in $N_k(i)$, causing additional communication costs. The coefficient of ergodicity in (22) does not suffer from this issue. Note that $\tau(M) \le \tau_1(M) \le \tau_2(M)$ [33, p. 137].
Finding the optimal solution for the problem (23) is another challenging issue, known as the fastest-mixing Markov chain problem. Since the purpose of this section is to show an example of how to implement our proposed framework, we address this problem heuristically here.
Suppose that a matrix $P^j_k$ satisfying (R1)-(R5) is given, and the topology of bins is not fully connected. Since the matrix is non-negative and there exists at least one zero-valued entry in each column, the coefficient of ergodicity reduces to $\tau(P^j_k) = \max_{\forall i, \forall s} P^j_k[i, s]$. Assuming that $\max_{\forall l \neq i} P^j_k[i, l] \le 1/|N_k(i)|$, which is generally true because $c_{(i,l)}$ is small, each diagonal element of $P^j_k$ is the largest value in its row. Thus, $\tau(P^j_k) = \max_{\forall i} P^j_k[i, i]$. The objective of this problem can therefore be written as $\max \min_{\forall i} \sum_{\forall l \neq i} P^j_k[i, l]$, because minimising the maximum diagonal element of a stochastic matrix is equivalent to maximising the minimum row-sum of its off-diagonal elements.
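The reduction above can be checked numerically. The following is a minimal sketch, assuming that the coefficient of ergodicity takes the column-spread form $\tau(M) = \max_s \max_{i,l} |M[i,s] - M[l,s]|$ (an assumption that is consistent with the reduction described in the text). The 4-bin ring and its transition values are hypothetical and serve only as an illustration.

```python
import numpy as np


def coefficient_of_ergodicity(M):
    """Largest spread within any column: max_s (max_i M[i, s] - min_i M[i, s]).

    Assumed form of the coefficient; see the lead-in paragraph.
    """
    M = np.asarray(M, dtype=float)
    return float(np.max(M.max(axis=0) - M.min(axis=0)))


# Hypothetical off-diagonal transition probabilities on a ring of 4 bins
# (each bin has 2 neighbours, so every off-diagonal entry is below 1/2).
off_diag = np.array([
    [0.00, 0.10, 0.00, 0.05],
    [0.10, 0.00, 0.20, 0.00],
    [0.00, 0.20, 0.00, 0.15],
    [0.05, 0.00, 0.15, 0.00],
])

# Diagonal entries absorb the remaining probability so that rows sum to one.
P = off_diag + np.diag(1.0 - off_diag.sum(axis=1))

tau = coefficient_of_ergodicity(P)
max_diagonal = float(P.diagonal().max())
min_offdiag_row_sum = float(off_diag.sum(axis=1).min())

print(tau, max_diagonal, 1.0 - min_offdiag_row_sum)
```

Because every column of this matrix contains a zero and every diagonal entry dominates its row, the three printed quantities coincide, which is why minimising $\tau$ amounts to maximising the smallest off-diagonal row-sum.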
We turn now to the constraints (R1)-(R5) and (21). In order to comply with (R3), we initially set P j k is a symmetric matrix that we will design now. The constraint (21), (R4), and the symmetricity of Q k are integrated into the following constraint: ∀i, ∀l = i, For (R2) and (R5), we set the diagonal entries of P j k as ∀i. This can be rewritten, with consideration of (R1) (i.e., Then, the reduced problem can be defined as: subject to (24) and (25). The algorithm for this problem is shown in Algorithm 3. If we neglect (25), an optimal solution can be obtained by making Q j k [i, l] equal to its upper bound of (24) (Line 2). However, this solution may not hold (25). Thus, we lower the entries of Q j k to satisfy (25), while keeping them symmetric and as higher as possible (Line 3-9). In details, Line 3 (or Line 6) ensures the constraint (25)  , we obtain the corresponding lowering factor again (Line 5-6). The minimum value is taken for both maintaining Q j k symmetric and satisfying (25) (Line 7). Then, the corresponding stochastic decision policy is generated based on the resultant Q j k (Line 9-10). Note that we set G j k [i] = 0 for all time instants, all bins, and all agents, so M j k = P j k .

C. Example III: Local-information-based Quorum Model
This subsection shows that the proposed framework is able to incorporate a quorum model, which is introduced in [14], [15]. In this model, if a bin is overpopulated above a certain level of predefined threshold called quorum, the probabilities that agents in the bin move to neighbour bins are temporarily increased, rather than following given P j k . This feature eventually brings an advantage to the convergence performance of the swarm.
To this end, we set the secondary guidance matrix $S^j_k$ as follows: $\forall i, l \in \{1, ..., n_{\mathrm{bin}}\}$ and $\forall j \in \mathcal{A}$, This matrix makes agents in a bin equally disseminated over its neighbour bins. In addition, the secondary feedback gain where $\gamma > 0$ is a design parameter, and $q_i > 1$ is the quorum for bin $B_i$. The value of the gain is shown in Figure 3(b), varying depending on $\gamma$ and $q_i$. As $\bar{\mu}^j_k[i]/\Theta[i]$ becomes higher than the quorum, $G^j_k[i]$ gets close to 1 (i.e., $S^j_k[i, l]$ becomes more dominant than $P^j_k[i, l]$). The steepness of the function at the quorum value is regulated by $\gamma$.
The existing quorum models in [14], [15] require each agent to know $\mu_k[i]$, which implies that the total number of agents $n^{\mathcal{A}}_k$ should be tracked in real time. Some agents in a swarm may unexpectedly fail due to internal or external effects during a mission, which hinders the remaining agents from keeping track of $n^{\mathcal{A}}_k$ in a timely manner. In contrast, the quorum model in this subsection has no such requirement, and it works by using the local information available from $\mathcal{A}_{N_k(i)}$.
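The behaviour described in this subsection can be sketched as follows. This is only an illustration under stated assumptions: the gain is modelled as a logistic function of $\bar{\mu}^j_k[i]/\Theta[i]$ centred at the quorum (the exact expression of the gain is not reproduced here, so the logistic form is an assumption), the neighbour set is taken to exclude the bin itself, and the policy is assumed to blend the two guidance matrices row-wise as $(1 - G[i])P[i, l] + G[i]S[i, l]$. The function names and the 3-bin example are hypothetical.

```python
import numpy as np


def quorum_gain(mu_local, theta_local, quorum, gamma):
    """Assumed logistic gain: close to 0 below the quorum, close to 1 above it."""
    ratio = np.asarray(mu_local, dtype=float) / np.asarray(theta_local, dtype=float)
    return 1.0 / (1.0 + np.exp(-gamma * (ratio - quorum)))


def uniform_neighbour_matrix(adjacency):
    """S[i, l] = 1/|N(i)| for each neighbour bin l of bin i, and 0 otherwise."""
    A = np.asarray(adjacency, dtype=float)
    return A / A.sum(axis=1, keepdims=True)


def blend(P, S, G):
    """Assumed row-wise blend: M[i, :] = (1 - G[i]) * P[i, :] + G[i] * S[i, :]."""
    G = np.asarray(G, dtype=float)[:, None]
    return (1.0 - G) * np.asarray(P, dtype=float) + G * np.asarray(S, dtype=float)


# Hypothetical line of 3 bins: bin 0 - bin 1 - bin 2.
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
P = np.array([[0.90, 0.10, 0.00], [0.05, 0.90, 0.05], [0.00, 0.10, 0.90]])
mu_local = np.array([0.8, 0.1, 0.1])      # bin 0 is heavily overpopulated
theta_local = np.array([0.4, 0.3, 0.3])

G = quorum_gain(mu_local, theta_local, quorum=1.3, gamma=30.0)
S = uniform_neighbour_matrix(adjacency)
M = blend(P, S, G)
print(np.round(G, 3))
print(np.round(M, 3))
```

With these values, agents in the overpopulated bin follow almost exclusively the uniform dissemination row of $S$, while agents in the other bins keep following $P$. A larger $\gamma$ makes the switch around the quorum sharper.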

V. ASYNCHRONOUS IMPLEMENTATION
A synchronous process induces extra time delays and inter-agent communications in order to keep all agents, who may have different timescales for obtaining new information and making decisions, in sync. Such unnecessary waiting time and communication may adversely affect mission performance, or may not even be realisable in practice [23].
In the previous sections, it was assumed that the agents of a swarm act synchronously at every time instant. Here we show that the proposed framework allows agents to operate in an asynchronous manner, assuming that the union of underlying graphs of the corresponding Markov matrices across some time intervals is strongly connected frequently and infinitely often.
Suppose that an algorithm to compute $P^j_k$ that satisfies (R1)-(R5) in a synchronous environment is given (e.g., Algorithm 2 or 3). We propose an asynchronous implementation, as shown in Algorithm 5, which substitutes for Line 6 in Algorithm 1.

Algorithm 5 Asynchronous Construction of $P^j_k[i, l]$ (Substitute for Line 6 of Algorithm 1)
2: Compute $P^j_k[i, l]$ as usual, $\forall B_l \in R^+_k \setminus \{B_i\}$;

We refer to the set of bins where agents are ready to use their respective local information (e.g., $\bar{\mu}^j_k[i]$) as $R^+_k$, and the set of the other bins as $R^-_k$. It is assumed that each agent $j$ in bin $B_i \in R^+_k$ also knows the local information of its neighbour bin $B_l \in N_k(i)$ if $B_l \in R^+_k$. As shown in Line 2, the agent follows the existing procedure as long as all information required to generate $P^j_k[i, l]$ is available (e.g., $\bar{\mu}^j_k[i]$ and $\bar{\mu}^j_k[l]$ for Algorithm 2, and $\bar{Q}[i]$ and $\bar{Q}[l]$ for Algorithm 3). In contrast, if any local information of its neighbour bin $B_l \in N_k(i)$ is not available, the probability of transitioning to that bin is set to zero (Line 3). Meanwhile, each agent for whom the necessary local information is not ready does not deviate but remains at the bin it belongs to. Equivalently, $P^j_k[i, i] = 1$ and $P^j_k[i, l] = 0$, $\forall l \neq i$ (Line 6). Hereafter, to differentiate it from the original $P^j_k$ generated in a synchronous environment, we refer to the matrix produced by Algorithm 5 as the asynchronous primary guidance matrix, denoted by $\bar{P}^j_k$. Accordingly, the asynchronous Markov matrix can be defined as: Here, we show that this asynchronous Markov process also converges to the desired swarm distribution.
, ∀l. Proof. The matrixP j k is row-stochastic because of Line 4 and 6 in Algorithm 5. Furthermore, given that P j k satisfies (R2), the property (2) Let us now turn to the property (3). For ∀B i ∈ R − k , it is trivial that . We apply the findings into the following equation: The first term of the right hand side becomes zero because of (i). Due to (ii) and the fact that because of (iii). Putting all of them together, Equation (28) is equivalent to Θ[i] ∀l: Lemma 4. If the union of a set of underlying graphs of {P k1 ,P k1+1 , ...,P k2−1 } is strongly-connected, then the matrix productP k1,k2 :=P k1Pk1+1 · · ·P k2−1 is irreducible.
Theorem 3. Suppose that there exists an infinite sequence of non-overlapping time intervals $[k_i, k_{i+1})$, $i = 0, 1, 2, ...$, such that the union of underlying graphs of $\{\bar{P}_{k_i}, \bar{P}_{k_i+1}, ..., \bar{P}_{k_{i+1}-1}\}$ in each interval is strongly connected. Let the stochastic state of agent $j$ at time instant $k \ge k_0$, governed by the corresponding Markov process from an arbitrary state $x^j_{k_0}$, be $x^j_k = x^j_{k_0} \bar{U}^j_{k_0,k} := x^j_{k_0} \bar{M}^j_{k_0} \bar{M}^j_{k_0+1} \cdots \bar{M}^j_{k-1}$. Then, it holds that $\lim_{k \to \infty} x^j_k = \Theta$ pointwise for all agents, irrespective of the initial condition.
Proof. Thanks to Lemmas 3 and 4, the matrix product $\bar{P}_{k_i,k_{i+1}}$ for each time interval $[k_i, k_{i+1})$ satisfies (R1)-(R4). Therefore, one can prove this theorem by similarly following the proof of Theorem 1.
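The asynchronous construction described in this section can be sketched in a few lines. The sketch below follows the textual description of Algorithm 5: rows of bins that are not ready become identity rows, transitions towards bins that are not ready are set to zero, and the diagonal entry of a ready bin is taken as the remaining probability mass (this last step is inferred from the row-stochasticity argument and is an assumption). The synchronous matrix in the example is hypothetical and is built to satisfy detailed balance with respect to $\Theta$, which is assumed here only so that the stationarity of $\Theta$ can be checked.

```python
import numpy as np


def asynchronous_guidance(P, ready):
    """Build the asynchronous primary guidance matrix from a synchronous one.

    P     : (n, n) row-stochastic synchronous guidance matrix.
    ready : length-n boolean mask, True where the bin's local information is ready.
    """
    P = np.asarray(P, dtype=float)
    ready = np.asarray(ready, dtype=bool)
    n = P.shape[0]
    P_bar = np.zeros_like(P)
    for i in range(n):
        if not ready[i]:
            P_bar[i, i] = 1.0                    # agents in this bin stay put
            continue
        for l in range(n):
            if l != i and ready[l]:
                P_bar[i, l] = P[i, l]            # usual value towards ready bins
            # transitions towards bins that are not ready remain zero
        P_bar[i, i] = 1.0 - P_bar[i].sum()       # remaining mass (inferred step)
    return P_bar


# Hypothetical 3-bin example with a matrix that is reversible w.r.t. theta.
theta = np.array([0.5, 0.3, 0.2])
Q = 0.05 * (np.ones((3, 3)) - np.eye(3))         # symmetric flow matrix
P = Q / theta[:, None]                           # theta[i] * P[i, l] = Q[i, l]
np.fill_diagonal(P, 1.0 - P.sum(axis=1))

ready = np.array([True, True, False])            # bin 2 is not ready
P_bar = asynchronous_guidance(P, ready)
print(np.round(P_bar, 4))
print(np.round(theta @ P_bar, 4))
```

In this example the masked matrix remains row-stochastic and keeps $\Theta$ stationary, because a transition is removed in both directions whenever either endpoint is not ready.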

VI. NUMERICAL EXPERIMENTS
A. Effects of Primary Local-feedback Gain $\bar{\xi}^j_k[i]$

Depending on the shape of the primary feedback gain $\bar{\xi}^j_k[i]$, the performance of the proposed framework changes, especially with respect to convergence rate, fraction of transitioning agents, and residual convergence error. Let us first investigate the effect of changes in the feedback gain using Algorithm 2 with Equation (20).
We consider a scenario where a set of 2,000 agents are supposed to be distributed over an arena consisting of 10 × 10 bins, as depicted in Figure 1. There are vertical and horizontal paths between adjacent bins. Note that the agents are allowed to move at most 3 paths away within a unit time instant. All the agents start from a single bin, which reflects the fact that they are generally deployed from a base station at the beginning of a mission. The desired swarm distribution $\Theta$ is generated uniformly at random for each scenario. The agents are assumed to estimate necessary information correctly, e.g., $\bar{\mu}^j_k$. For rigorous validation, the performance of the proposed algorithm will be compared with that of the GICA-based algorithm [13]. To this end, $f(E_k[i, l])$ is set to be the same as the corresponding coefficient in [13, Corollary 1]: where $E_{k,\max}$ is the maximum element of the travelling expense matrix $E_k$, and E is a user-design parameter. $E_k[i, l]$ is defined as a linear function based on the distance between bin $B_i$ and $B_l$: where $\Delta s_{(i,l)}$ is the minimum required number of paths from $B_i$ to $B_l$; E1 and E0 are user-design parameters. The agents  (18); and $\tau^j = 10^{-6}$ in (9).
As a performance index for the closeness between the current swarm distribution $\mu_k$ and $\Theta$, we use the Hellinger Distance, i.e.,

$$D_H(\Theta, \mu_k) := \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{n_{\mathrm{bin}}} \left( \sqrt{\Theta[i]} - \sqrt{\mu_k[i]} \right)^2}.$$

The Hellinger Distance is known as a "concept of measuring similarity between two distributions" [37] and is utilised as a feedback gain in the existing work [13]. More importantly, to examine the effects of the shape of $\bar{\xi}^j_k[i]$, we set $\alpha$ in (6) as 0.2, 0.4, 0.6, 0.8, 1 and 1.2. Figure 4 reveals that the convergence rate can be traded off against the fraction of transitioning agents and the residual convergence error. As $\bar{\xi}^j_k[i]$ becomes more concave, i.e., as the value of $\alpha$ decreases, the summation of off-diagonal entries of $P^j_k$ becomes higher, leading to more transitioning agents but a faster convergence rate. At the same time, a higher value of $\bar{\xi}^j_k[i]$ even at a low value of $|\Theta[i] - \mu^j_k[i]|$ gives rise to unnecessarily high off-diagonal entries of $P^j_k$. Hence, the swarm tends to be prevented from converging properly to the desired swarm distribution, resulting in a higher residual convergence error.
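For reference, the performance index can be computed as in the following minimal sketch, which assumes the standard definition of the Hellinger distance between two discrete distributions. The function name and the 4-bin example are illustrative only.

```python
import numpy as np


def hellinger_distance(theta, mu):
    """Standard Hellinger distance between two discrete distributions, in [0, 1]."""
    theta = np.asarray(theta, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(theta) - np.sqrt(mu)) ** 2)))


# Illustrative case: a uniform desired distribution over 4 bins, with all
# agents initially located in the first bin.
theta = np.array([0.25, 0.25, 0.25, 0.25])
mu_initial = np.array([1.0, 0.0, 0.0, 0.0])

print(hellinger_distance(theta, mu_initial))
print(hellinger_distance(theta, theta))
```

The distance is zero when the current distribution matches the desired one, and it reaches one only when the two distributions have disjoint supports, so it provides a bounded measure of the convergence error.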

B. Comparison with a GICA-based Method
Let us now compare the LICA-based method for (P1) with the GICA-based method in [13]. The scenario considered is the same as the one in the previous subsection except for $\alpha = 0.6$. Note that Θ in Remark 4 can control convergence rate, but is not discussed in [13]. For a fair comparison, Θ is applied to both methods. We conduct 100 runs of Monte Carlo experiments. Figure 5 presents the results of one representative scenario, and the statistical results of the Monte Carlo experiments are shown in Figure 6. According to Figure 5(a), the convergence rate of the proposed method is slower in the initial phase, but becomes similar to that of the GICA-based method as $D_H(\Theta, \mu_k)$ approaches 0.10. This is confirmed by the statistical results in Figure 6(a), which presents the ratio of the time instants required for converging to $D_H(\Theta, \mu_k) \in \{0.30, 0.28, ..., 0.12, 0.10\}$ in the LICA-based method to those in the GICA-based method. It is worth noting that these convergence rate results are presented with respect to the time instants of each Markov process. As the LICA-based framework may have a much shorter timescale, its convergence performance in practice could be better than that of the GICA-based method. Figure 5(c) shows that the cumulative travel expenses are smaller in the proposed method. The expenses of the proposed method and those of the compared method are $1.72 \times 10^4$ and $1.96 \times 10^4$, respectively, and their ratio is 0.878. This is also confirmed by the statistical result in Figure 6(b). A possible explanation is that when some of the bins do not meet their desired swarm densities, all agents in the GICA-based method obtain higher feedback gains, which might lead to unnecessary transitions. In contrast, this is not the case in the LICA-based method, since agents are only affected by their neighbour bins.

C. Robustness in Asynchronous Environments
This subsection investigates the effects of asynchronous environments on the proposed LICA-based method for (P1) and compares them with those on the GICA-based method in [13]. To this end, a realistic scenario where an asynchronous process is required is considered: it is assumed that agents in some bins cannot communicate for some reason (such bins are called blocked), and thus the agents in normal bins have to perform their own process without waiting for them.
The proportion of blocked bins to all bins is set to different values, i.e., 0%, 10%, 20% and 30%. At each time instant, the corresponding proportion of bins are randomly selected as blocked bins. For the proposed framework, the asynchronous implementation in Section V is built upon Algorithm 2. In the GICA-based method, for comparison purposes only, it is assumed that agents in normal bins obtain $\mu^j_k = \mu_k$ without interacting with agents in the blocked bins. The remaining scenario settings are the same as those in Section VI-B. Figure 7 illustrates the performance of each method: convergence rate, fraction of transitioning agents, and cumulative travel expenses. As the proportion of the blocked bins increases, the GICA-based method tends to have faster convergence speed, whereas it loses Desired Feature 2 and thus increases cumulative travelling expenses (as shown in Figure 7(a), 7(b), and 7(c), respectively). In contrast, the LICA-based method shows a graceful degradation in terms of Desired Feature 2 (as shown in Figure 7(b)). A possible explanation for these results is that the higher feedback gains caused by the communication disconnection induce faster convergence in each method than in the normal situation. This effect is dominant for the GICA-based method because it affects all agents, who use global information. However, in the LICA-based framework, the communication disconnection has only a local influence, so its effect is relatively modest.

D. Demonstration of Example II and III
This subsection demonstrates the LICA-based method for (P2) (i.e., Algorithm 3) and the quorum model (i.e., Algorithm 4). For the former, we consider a scenario where 10,000 agents and an arena consisting of 10 × 10 bins are given. The arena is as depicted in Figure 1, where the agents are allowed to move only one path away within a unit time instant. For each one-way path, the upper flux bound per time instant is set as 20 agents (i.e., $c_{(i,l)} = 20$, $\forall i \neq l$). All the agents start from a single bin, and the desired swarm distribution is generated uniformly at random.
For the latter, we build the quorum model upon the LICA-based method for (P2). This can be a good strategy for a user who wants to achieve not only a faster convergence rate but also fewer unnecessary transitions after equilibrium, which are regulated by the upper flux bounds. Thus, in the same scenario described above, we demonstrate the combined algorithm that computes $S^j_k$ and $G^j_k$ by Algorithm 4 and $P^j_k$ by Algorithm 3. We set $q_i = 1.3$ and $\gamma = 30$ for (27); $\alpha = 1$ and $\xi = 10^{-9}$ for (6); and $\tau^j = 10^{-6}$ for (9). Figures 8(a) and 8(b) show that both approaches make the swarm converge to the desired swarm distribution. It is observed that the number of transitioning agents in the method for (P2) is restricted by the upper flux bound during the entire process. Meanwhile, the quorum-based method very quickly disseminates the agents, who are initially at one bin, over the other bins, and thus the fraction of transitioning agents is very high in the initial phase. After that, the fraction under the quorum-based method drops and remains as low as that under the method for (P2). Figure 8(c) presents the maximum value amongst the numbers of transitioning agents via each (one-way) path. The red line indicates the actual result of the method for (P2), while the green line indicates the corresponding probabilistic value (i.e., $\max_{\forall i} \max_{\forall l \neq i} n_k[i] P^j_k[i, l]$). It is shown that the stochastic decision policies reflect the given upper bound, although this bound is often violated in practice due to the finite cardinality of the agents. However, the result in the same scenario with $|\mathcal{A}| = 100{,}000$ and $c_{(i,l)} = 200$, $\forall i \neq l$ (denoted by Case 2), depicted by the blue and magenta lines in Figure 8(c), suggests that such violation can be mitigated as the number of given agents increases.
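The gap between the probabilistic flux and the realised flux discussed above can be reproduced with a small simulation. The sketch below is illustrative only: it assumes that every agent in a bin independently samples its next bin from the corresponding row of the policy, so that the numbers of agents leaving a bin follow a multinomial distribution. The 3-bin policy, the bound, and the random seed are hypothetical.

```python
import numpy as np


def expected_flux(n_agents, P):
    """Expected number of agents moving from bin i to bin l: n[i] * P[i, l]."""
    flux = np.asarray(n_agents, dtype=float)[:, None] * np.asarray(P, dtype=float)
    np.fill_diagonal(flux, 0.0)                  # staying put is not a transition
    return flux


def sampled_flux(n_agents, P, rng):
    """One realisation in which each agent independently draws its next bin."""
    P = np.asarray(P, dtype=float)
    flux = np.vstack([rng.multinomial(n, row) for n, row in zip(n_agents, P)])
    flux = flux.astype(float)
    np.fill_diagonal(flux, 0.0)
    return flux


# Hypothetical setting: all 1000 agents start in bin 0 and the policy is
# designed so that the expected one-way flux equals the bound c = 20.
n_agents = np.array([1000, 0, 0])
P = np.array([
    [0.96, 0.02, 0.02],
    [0.02, 0.96, 0.02],
    [0.02, 0.02, 0.96],
])
c = 20.0

rng = np.random.default_rng(0)
expected = expected_flux(n_agents, P)
realised = sampled_flux(n_agents, P, rng)
print(expected.max(), realised.max())
```

The expected flux never exceeds the bound, whereas a single realisation fluctuates around it and may exceed it. Scaling the number of agents and the bound by the same factor shrinks the relative size of these fluctuations, which is in line with the observation for Case 2.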

VII. CONCLUSION
This paper proposed a LICA-based closed-loop-type framework for probabilistic swarm distribution guidance. Since the feedback gains can be generated based on local information, agents have shorter and different timescales for using new information, and can incorporate an asynchronous decision-making process. Even using local information, the proposed framework converges to a desired density distribution, while maintaining scalability, robustness, and long-term system efficiency. Furthermore, the numerical experiments have shown that the proposed framework is suitable for a realistic environment where communication between agents is partially and temporarily disconnected. This paper has explicitly presented the design requirements for the Markov matrix to hold all these advantages, and has provided specific problem examples of how to implement this framework.
Future work includes optimisation of $\bar{\xi}^j_k[i]$, which can mitigate the trade-off between convergence rate and residual error. In addition, it is expected that the communication cost required for the proposed framework can be reduced by incorporating a vision-based local density estimation [38].

ACKNOWLEDGEMENT
The authors gratefully acknowledge that this research was supported by International Joint Research Programme with Chungnam National University (No. EFA3004Z). Thanks to Sangjun Bae for supportive discussions.

APPENDIX
A. Regarding the Convergence Analysis in Theorem 1

Definition 9 (Irreducible). A matrix is reducible if and only if its associated digraph is not strongly connected. A matrix that is not reducible is irreducible.
Definition 10 (Primitive). A primitive matrix is a square nonnegative matrix $A$ such that for every $i, j$ there exists $k > 0$ such that $A^k[i, j] > 0$.
Definition 11 (Regular). A regular matrix is a stochastic matrix such that all the entries of some power of the matrix are positive.
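As an illustration of Definition 11, the following sketch tests whether a stochastic matrix is regular by inspecting the zero pattern of its successive powers. It relies on Wielandt's bound, a known result stating that a primitive $n \times n$ matrix has an entrywise positive power of order at most $(n-1)^2 + 1$, so checking powers up to that order is sufficient. The function name and the two example matrices are hypothetical.

```python
import numpy as np


def is_regular(M):
    """Return True if some power of the non-negative matrix M is entrywise positive."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    pattern = (M > 0).astype(int)                # only the zero pattern matters
    power = np.eye(n, dtype=int)
    for _ in range((n - 1) ** 2 + 1):            # Wielandt's bound on the exponent
        power = ((power @ pattern) > 0).astype(int)
        if power.all():
            return True
    return False


# A lazy random walk on a ring of 4 bins: irreducible with positive diagonal.
lazy_ring = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])

# A deterministic swap between 2 bins: irreducible but periodic.
swap = np.array([[0.0, 1.0], [1.0, 0.0]])

print(is_regular(lazy_ring), is_regular(swap))
```

The first matrix is regular because its positive diagonal removes periodicity, whereas the second alternates between two zero patterns and never becomes entrywise positive.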
Definition 12 (Asymptotic Homogeneity) [33, pp. 92, 149], [13]. "A sequence of stochastic matrices $M_k \in \mathbb{M}^{n \times n}$, $k \ge k_0$, is said to be asymptotically homogeneous (with respect to $d$) if there exists a row-stochastic vector $d \in \mathbb{P}^n$ such that $\lim_{k \to \infty} d M_k = d$."

The matrix product $U_{k_0,k} := M_{k_0} M_{k_0+1} \cdots M_{k-1}$, formed from a sequence of stochastic matrices $M_r \in \mathbb{P}^{n \times n}$, $r \ge k_0$, is said to be strongly ergodic if for each $i, l, r$ we get $\lim_{r \to \infty} U_{k_0,r}[i, l] = v[l]$, where $v \in \mathbb{P}^n$ is a row-stochastic vector. Here, $v$ is called its unique limit vector (i.e., $\lim_{r \to \infty} U_{k_0,r} = \mathbf{1} v$).

Lemma 5. Given the requirements (R1)-(R4) are satisfied, $M^j_k$ in Equation (11) has the following properties:
1) row-stochastic;
2) irreducible;
3) all diagonal elements are positive, and all other elements are non-negative;
4) there is a positive lower bound $\gamma$ such that $0 < \gamma \le \min^{+}_{i,l} M^j_k[i, l]$ (note that $\min^{+}$ refers to the minimum of the positive elements);
5) asymptotically homogeneous with respect to $\Theta$.
In addition, $U^j_{k_0,k}$ in Equation (12) has the following properties:
6) irreducible;
7) all diagonal elements are positive, and all other elements are non-negative;
8) primitive [39, Lemma 8.5.4, p.541]