Noise-resistant and scalable collective preference learning via ranked voting in swarm robotics

Swarm robotics studies how to use large groups of cooperating robots to perform designated tasks. Given the need for scalability, individual members of the swarm usually have only limited sensory capabilities, which can be unreliable in noisy situations. One way to address this shortcoming is collective decision-making, which uses peer-to-peer local interactions to enhance the behavioral performance of the whole swarm of intelligent agents. In this paper, we address a collective preference learning scenario, where agents seek to rank a series of given sites according to a preference order. We propose and test a novel ranked voting-based strategy to perform the designated task, and use two variants of a belief fusion-based strategy as benchmarks. We compare the considered algorithms in terms of the accuracy and precision of their decisions as well as their convergence time, under various noise levels, evidence rates, and swarm sizes. We conclude that the proposed ranked voting approach is significantly cheaper and more accurate, at the cost of less precision and longer convergence time. It is especially advantageous compared to the benchmark when facing high noise or large swarm size.


Introduction and related works
Swarm robotics refers to a design paradigm that employs the intelligent collective behavior of a group of robots to achieve a designated task (Brambilla et al., 2013). Inspirations are taken from natural intelligent swarms, such as insect colonies and fish schools, which can effectively pool information from agents with poor individual capabilities and display complex collective behaviors without centralized control mechanisms (Camazine et al., 2020). Collective decision-making is a field within the study of swarm intelligence, which focuses on the process where a swarm of intelligent agents achieves a global decision via only local interactions among each other and with the environment. This field of study has its roots in attempting to model and understand natural intelligent swarms (Garnier et al., 2007) and has also been increasingly utilized to construct decision-making strategies for artificial intelligent swarms (Dorigo et al., 2021).
All data generated or analyzed during this study are included in this published article.
Best-of-n problems refer to collective decision-making problems that focus on discrete consensus forming (Valentini et al., 2017). Site selection is a long-studied scenario among best-of-n problems. It takes inspiration from the house-hunting behaviors of honey bees (Garnier et al., 2007), which is an example of decentralized decision-making in natural intelligent swarms. Similar scenarios are used to gauge the capabilities of artificial intelligent swarms in collective decision-making. The experimental setup of site selection problems started with binary environments with two sites in the arena (Parker & Zhang, 2009, 2011) and has gradually evolved into multi-site environments (Lee et al., 2018; Talamali et al., 2019). Recently, there has been a trend in the broader collective decision-making research to move away from simple binary decision-making scenarios and toward enabling the agents to make more complicated decisions, such as multicolor collective perception (Ebert et al., 2018; Bartashevich & Mostaghim, 2021), collective estimation (Strobel et al., 2018), and the aforementioned multi-option site selection. In swarm robotics, learning the ranking of a number of options according to their relative preference is an important operation that has many real-life applications and can also serve as a building block for more complex behaviors. To perform such preference learning in a swarm robotics setting, the robots need to converge to a consensus regarding the ranking of available options using a distributed and localized strategy. In this paper, we tackle such a collective preference learning problem, where the group of agents is tasked with ranking the available options from best to worst. A similar problem has been addressed in a non-physics-based environment by Crosscombe and Lawry (2021), who proposed a belief fusion-based algorithm to achieve consensus in the ranking.
Another potential source of inspiration for collective decision-making strategies is the election process, where the voters collectively decide among the available candidates (Tideman, 2017). However, for the distributed decision-making processes of intelligent swarms, a centralized tallying of all the ballots cannot be performed. Thus, in swarm intelligence settings, majority-voting-based decision-making strategies usually implement local-scale voting among small groups of agents, such as in Direct Modulation of Majority-based Decisions (Valentini et al., 2015, 2016), which performs well in various binary decision-making scenarios. However, in more complex scenarios, simple majority voting tends to be insufficient. Here, we focus on a ranked voting system, Borda count, which was proposed by Jean-Charles de Borda in the eighteenth century (Emerson, 2013). Consensus formation using iterative pairwise voting has been studied in the context of social networks (Hassanzadeh et al., 2013; Brill et al., 2016; Guha & Dasgupta, 2021), where its ability to bring agents with different opinions to a consensus was proven. Similar ranked voting techniques have also already been utilized in another collective decision-making scenario, discrete collective estimation.
In this paper, we seek to apply a ranked voting-based decision-making strategy to perform collective preference learning, with the aforementioned belief fusion strategy as a baseline. We test the considered algorithms in physics-based simulated environments under different noise levels, evidence rates, and swarm sizes.
The structure of the paper is as follows. In Sect. 2, we will introduce the collective preference learning problem we are investigating in this paper and the background of the algorithms investigated. In Sect. 3, we will show the two considered algorithms in this paper in detail. Section 4 includes the experimental results. Section 5 is our analysis and discussion on the experimental results. And finally, Sect. 6 is the conclusion.

Problem statement
We investigate a preference learning problem inspired by classical site selection scenarios as well as the collective preference learning scenario investigated in Crosscombe and Lawry (2021). An illustration of the environment is shown in Fig. 1. There are K sites distributed over the arena. A swarm of $N_{robot}$ robots, shown in red, roams the experimental environment and is tasked with ranking the sites from best to worst.
In our experimental settings, each site is associated with a fixed index and quality, the latter indicated by the intensities of the gray color in Fig. 1. When an agent is over a site area, it has a probability of detecting the site per control loop, referred to as the evidence rate $r_e$. When a site is detected, the agent records the index and the quality of the site. The former is measured accurately, while the latter is subject to additive Gaussian noise with mean 0 and standard deviation equal to the noise level $\sigma_{noise}$.
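The observation model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `observe` and its argument names are hypothetical, and the detection test is assumed to be a simple uniform draw compared against the evidence rate.

```python
import random

def observe(site_index, true_quality, r_e, sigma_noise, rng=random):
    """Return (index, noisy quality) if the site is detected in this control loop, else None.

    Hypothetical helper: detection succeeds with probability r_e; the index is
    reported exactly, the quality with additive zero-mean Gaussian noise.
    """
    if rng.random() >= r_e:
        return None  # no detection in this control loop
    return site_index, true_quality + rng.gauss(0.0, sigma_noise)

# With r_e = 1 and no noise the reading is exact; with r_e = 0 nothing is detected.
exact = observe(3, 5.0, r_e=1.0, sigma_noise=0.0)
missed = observe(3, 5.0, r_e=0.0, sigma_noise=0.0)
```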
An agent has limited computational and memory resources and can only record the indices and measured qualities of two sites. The agent will thus obtain a pairwise comparison between them. Depending on the decision-making strategy, the agent records its own computed ranking among all sites. In addition, an agent has a limited communication radius and can only broadcast messages to, and receive messages from, its peers within that radius.
We use the belief fusion algorithm proposed by Crosscombe and Lawry (2021) as a baseline. They encode the full ranking in a K × K matrix that indicates pairwise comparison between all available pairs of sites. An element $o_{i,j}$ in the matrix can take one of three values: 0 and 1 mean that the agent believes that $quality_i < quality_j$ and $quality_i > quality_j$, respectively, while 1/2 means that the comparison is unknown. Their experiments have been conducted in a non-physics-based environment. At every control loop, every agent tries to obtain an unknown pairwise comparison between two sites with a success probability. Two agents from the swarm also perform belief fusion and combine their beliefs together. In our paper, we have applied this algorithm to our aforementioned environment.
Fig. 1 Illustration of the simulated experimental environment used in this paper; gray circular areas represent the K = 8 sites, their color intensities represent their qualities; red dots indicate mobile robots roaming the arena (Color figure online)
Additionally, we seek to apply a ranked voting-based decision-making strategy to this problem. We have chosen Borda count (Emerson, 2013) as a promising ranked voting system. The original voting system works as follows. Each voter ranks all candidates according to its own preference, from best to worst. During the tallying process, every candidate receives a number of points from every ballot. If there are n candidates, the most preferred candidate on a ballot receives n points, the next most preferred n − 1 points, and the least preferred 1 point. These points are added up for all candidates, and the candidate with the most points wins the vote. The same voting system can also be used to obtain a consensus in the ranking of the available candidates by looking at the ranking of the final tallied points. We use this approach in the collective preference learning problem. The details of our algorithm are shown in the Methodology section.
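As a small illustration of the classical Borda tally described above, the following Python sketch counts points for complete ballots. The function name `borda_count`, the example ballots, and the tie-breaking rule (alphabetical order) are assumptions made for this example only.

```python
from collections import defaultdict

def borda_count(ballots):
    """Tally complete ranked ballots; each ballot lists candidates from best to worst."""
    points = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for position, candidate in enumerate(ballot):
            points[candidate] += n - position  # best gets n points, worst gets 1
    # Consensus ranking: most points first; ties broken by name (assumption)
    ranking = sorted(points, key=lambda c: (-points[c], c))
    return ranking, dict(points)

ballots = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
ranking, points = borda_count(ballots)
# points: A = 3 + 2 + 3 = 8, B = 2 + 3 + 1 = 6, C = 1 + 1 + 2 = 4
```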

Methodology
In this section, we describe the algorithms considered in detail. We start with how the robots obtain pairwise quality comparisons from the raw quality reading. Then, we cover the state-of-the-art approach in solving similar collective preference learning problems. After that, we will introduce our proposed ranked voting algorithm. Finally, we define the computation method we use for the performance metrics.

Obtaining pairwise quality comparisons
In both considered algorithms, we use the same assumptions as Crosscombe and Lawry (2021) on the cognitive capabilities of the robots considered. Robot n keeps track of the indices and qualities of two sites, expressed as follows:

Indices: $D_n = \{d_{n,1}, d_{n,2}\}$
Qualities: $Q_n = \{q_{n,1}, q_{n,2}\}$

All four variables are initialized to −1. At every control loop, every robot that is in the area of a site has a probability (evidence rate $r_e$) of detecting the site, in which case it updates its recorded pairwise comparison using Algorithm 1.

Algorithm 1 (excerpt)
11: if $q_{n,1} < q_{n,2}$ then
12: Switch($d_{n,1}$, $d_{n,2}$); Switch($q_{n,1}$, $q_{n,2}$)
13: end if
14: end procedure

The detected site index $d^*$ and measured quality $q^*$ are recorded. If $d^*$ is present in $D_n$, the robot updates the associated quality value with $q^*$ (line 3-6). In this paper, the robots do not take repetitive measurements of the noisy site quality to determine the true value, as it is assumed that necessary procedures to minimize the noise have been implemented at low level. If one value in $D_n$ is −1, indicating the position is empty, a new $d^*$ also fills the position (line 3-6). If both values in $D_n$ are filled and are not equal to $d^*$, then one of the two positions is selected at random and filled with $d^*$ and $q^*$ (line 8-9). Finally, the robot always preserves the ordering $q_{n,1} \geq q_{n,2}$, so that if this is no longer the case after updating, the values in both $D_n$ and $Q_n$ are switched (line 12).
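The update rules described in the text can be sketched in Python as follows. This is a reading of the prose description rather than a transcription of Algorithm 1: the list-based storage, the `EMPTY` constant, and the argument names are assumptions for illustration.

```python
import random

EMPTY = -1  # marker for an unfilled slot, matching the -1 initialization in the text

def update_pair(D, Q, d_star, q_star, rng=random):
    """Update the two-slot record in place (D: site indices, Q: measured qualities)."""
    if d_star in D:
        # Site already recorded: overwrite its quality with the new reading
        Q[D.index(d_star)] = q_star
    elif EMPTY in D:
        # An empty slot exists: store the new site there
        slot = D.index(EMPTY)
        D[slot], Q[slot] = d_star, q_star
    else:
        # Both slots hold other sites: replace one of them at random
        slot = rng.randrange(2)
        D[slot], Q[slot] = d_star, q_star
    # Preserve the ordering q_1 >= q_2 by switching both records if needed
    if Q[0] < Q[1]:
        D[0], D[1] = D[1], D[0]
        Q[0], Q[1] = Q[1], Q[0]

D, Q = [EMPTY, EMPTY], [-1.0, -1.0]
update_pair(D, Q, 3, 2.0)   # D = [3, -1], Q = [2.0, -1.0]
update_pair(D, Q, 5, 4.0)   # fills the empty slot, then switches: D = [5, 3]
```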

Benchmark algorithm: belief fusion
The state-of-the-art strategy to solve a collective preference learning problem is a belief fusion-based algorithm proposed by Crosscombe and Lawry (2021). They have experimented on two variants of it, one with an operation that preserves the transitivity in pairwise comparisons, and the other without, producing different results. We use a modified version of it with both variants considered as benchmarks in this paper. The detailed pseudocode is shown in Algorithm 2.

Algorithm 2 (excerpt)
4: if Robot n is on site k and site detected with probability $r_e$ then
…
if $f_t$ (i.e., transitivity needs to be preserved) then

The belief matrix $B_n$ records the pairwise relationship between all possible pairs of sites. Element $B_n[k1, k2]$ can take one of three values: 1 when $quality_{k1} > quality_{k2}$, −1 when $quality_{k1} < quality_{k2}$, and 0 when the pairwise relationship is unknown or when k1 = k2. The overall behavior of the robot is similar to that in Algorithm 2. One important difference is in Algorithm 2 line 7, where the robot updates the belief matrix $B_n$ by modifying the corresponding elements. In the original version of the algorithm, a belief fusion operation changes the belief matrices of both robots, thus requiring bidirectional communication. We have modified this feature to keep the hardware requirements on the same level as our proposed algorithm. The robot broadcasts its belief matrix $B_n$ constantly to its nearby neighbors. In practice, because $B_n[a, b] = -B_n[b, a]$, only half of the matrix needs to be transmitted. At every control loop, it picks up the belief matrix of a random neighbor and performs belief fusion to update its own belief matrix (Algorithm 2 line 11-12). The message transfers are peer-to-peer and pairwise. There are no requirements for the robots to be uniquely identifiable.
Another important operation in the belief fusion algorithm is the preservation of transitivity in pairwise comparisons in the belief matrix (Algorithm 2 line 14-24). Here, $f_t$ is a Boolean variable that marks this setting. The operation makes sure that when the belief matrix $B_n$ records $q_a > q_b$ and $q_b > q_c$, it will automatically also record $q_a > q_c$. Since the operation needs to traverse the whole matrix K times, it is an expensive operation with complexity scaling as $K^3$ and presents a trade-off between performance and computational resources needed. We have thus experimented on the benchmark belief fusion algorithm both with and without operations to preserve transitivity in the decision-making process for a full comparison with the proposed ranked voting algorithm.
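To make the cost of the transitivity step concrete, the sketch below fills in implied comparisons with a Warshall-style transitive closure, which sweeps the K × K matrix once per intermediate site and therefore takes on the order of $K^3$ steps. It is a stand-in written from the prose description, not the procedure in Algorithm 2 line 14-24; in particular, the choice to fill only unknown (0) entries and never overwrite existing ones is an assumption.

```python
def preserve_transitivity(B):
    """Fill implied comparisons in a K x K matrix with entries in {-1, 0, 1}, in place."""
    K = len(B)
    for b in range(K):              # intermediate site
        for a in range(K):
            if B[a][b] != 1:        # need q_a > q_b
                continue
            for c in range(K):
                # q_a > q_b and q_b > q_c imply q_a > q_c; only fill unknown entries
                if B[b][c] == 1 and a != c and B[a][c] == 0:
                    B[a][c], B[c][a] = 1, -1
    return B

# Chain 0 > 1 > 2 > 3 with only adjacent comparisons known
B = [[0, 1, 0, 0],
     [-1, 0, 1, 0],
     [0, -1, 0, 1],
     [0, 0, -1, 0]]
preserve_transitivity(B)
```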

Collective preference learning via ranked voting with Borda count tallying process
We will now introduce the proposed ranked voting-based decision-making strategy. The decision-making behavior of robot n using the proposed strategy is shown in Algorithm 3, while Algorithms 4 and 5 are subroutines used in the algorithm. In this algorithm, robot n encodes the ranking of all known sites in a list ranking_n, which is empty at initialization.
The maximum length of ranking_n is the total number of sites K. At every control loop, the robot attempts to detect a potential site. A site will only be detected when the robot is in the marked area of the site and a random variable satisfies the evidence rate $r_e$ (Algorithm 3 line 4).

Algorithm 3 Collective Preference Learning using Ranked Voting (excerpt)
4: if Robot n is on site k and site detected with probability $r_e$ then
5: update_pair(n, k, sample($\mathcal{N}(quality_k, \sigma_{noise}^2)$)) # shown in Algorithm 1
6: if $d_{n,1}$ >= 0 and $d_{n,2}$ >= 0 then

If a site is detected, the robot updates its internal record of a pairwise comparison using the procedure update_pair (Algorithm 3 line 5) introduced in Sect. 3.1 as Algorithm 1, with the index and measured quality of the detected site as input. After that, if both positions in its pairwise comparison are filled, the robot updates its computed ranking of all sites ranking_n using the recorded pairwise comparison following the procedure update_ranking (Algorithm 3 line 6-7), which is shown in Algorithm 4. In procedure update_ranking, the robot seeks to insert the two sites in its recorded pairwise comparison $D_n$ into its computed ranking ranking_n, while preserving the pairwise relationship (Algorithm 4 line 4,6,10). For example, inserting an element after that of $d_{n,1}$ is done by inserting an element in a random position marked by downward arrows.
If both sites are present in ranking_n, the robot checks whether the rankings comply with the pairwise relationship, and switches the rankings if not (Algorithm 4 line 12). An example of the switching operation is as follows.
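The insertion and switching rules described above can be sketched in Python. This is an interpretation of the prose rather than Algorithm 4 itself: the handling of the case where neither site is yet in the ranking is an assumption (the better site is inserted at a random position and the worse one at a random position after it), and the function signature is hypothetical.

```python
import random

def update_ranking(ranking, d1, d2, rng=random):
    """Make `ranking` (best first) agree with the observation that d1 is better than d2."""
    has1, has2 = d1 in ranking, d2 in ranking
    if has1 and has2:
        i, j = ranking.index(d1), ranking.index(d2)
        if i > j:  # order contradicts the observation: switch the two entries
            ranking[i], ranking[j] = ranking[j], ranking[i]
    elif has1:
        # Insert d2 at a random position after d1 (randint bounds are inclusive)
        ranking.insert(rng.randint(ranking.index(d1) + 1, len(ranking)), d2)
    elif has2:
        # Insert d1 at a random position before d2
        ranking.insert(rng.randint(0, ranking.index(d2)), d1)
    else:
        # Neither site is known yet (assumption): place d1, then d2 somewhere after it
        ranking.insert(rng.randint(0, len(ranking)), d1)
        ranking.insert(rng.randint(ranking.index(d1) + 1, len(ranking)), d2)
    return ranking

# Switching case: site 6 observed to be better than site 4
switched = update_ranking([4, 1, 6], 6, 4)  # -> [6, 1, 4]
```

Only the switching case is deterministic; the insertion cases depend on the random draw, which is the source of the stochasticity the text attributes to this procedure.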
The robot constantly broadcasts its current computed ranking_n to its neighbors in its communication radius. If a site is not detected, it randomly picks up a message sent by a neighbor, if one is present, and performs an election to generate a new ranking_n (Algorithm 3 line 9-11). We keep all interactions among the robots peer-to-peer and pairwise, similar to the benchmark algorithm, such that the communication paradigms of the considered algorithms in this paper are roughly similar. The differences between the message sizes of the considered algorithms depend on how the messages are encoded. For the benchmark belief fusion algorithm, the messages have $3^{K(K-1)/2}$ possible values, while for the proposed ranked voting algorithm, the messages have roughly $(K + 1)!$ possible values. In this paper K = 8, thus the numbers of possible message values are $3^{28}$ and 362,880, respectively. When represented in binary, they can be represented in a minimum of 45 bits and 19 bits, respectively. However, this encoding method needs significant computational and storage resources to decode the messages during the operation of the algorithms. On the other hand, using the simplest method of encoding, where every value used is stored in a short int variable of 16 bits, the message sizes would be $8K(K-1)$ bits and $16K$ bits, respectively, which in this paper are 448 bits and 128 bits. Thus, compared to the benchmark algorithm, the proposed ranked voting algorithm not only has lower requirements on the communication bandwidth in the settings of this paper, but its bandwidth also scales less rapidly when facing a higher number of options.
An election in this context is held with only two voters, the robot n and its chosen neighbor m. The detailed process is shown in Algorithm 5. In the election process, the rankings need to be transformed into the scores of all considered sites, which are stored in score_n and score_m for the two voters, respectively.
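The message-size figures quoted for K = 8 can be checked with a few lines of integer arithmetic. The variable names below are chosen for this illustration only.

```python
import math

K = 8
pairs = K * (K - 1) // 2                 # independent pairwise relationships: 28

# Compact encoding: number of distinct messages and the bits needed to index them
belief_values = 3 ** pairs               # three states per pairwise relationship
ranking_values = math.factorial(K + 1)   # rough count used in the text: 362,880
belief_bits = (belief_values - 1).bit_length()
ranking_bits = (ranking_values - 1).bit_length()

# Simplest encoding: one 16-bit value per transmitted entry
naive_belief_bits = 16 * pairs           # equals 8K(K-1)
naive_ranking_bits = 16 * K

print(belief_bits, ranking_bits, naive_belief_bits, naive_ranking_bits)
# prints: 45 19 448 128
```

Under the simple encoding, the belief-matrix message grows quadratically with K while the ranking message grows linearly, which is the scaling difference the paragraph above refers to.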
The transformation is done in Algorithm 5 line 3-10. The score of a considered site is its position in ranking_n or ranking_m (Algorithm 5 line 5,8). The two score vectors must then be padded to contain the same sites, which are tracked by the vector candidates. The unranked candidates' indices are selected using Boolean indexing in Algorithm 5 line 11-12. This padding differs from how ranked voting is used in real-life elections. When a real-life ranked voting ballot has missing entries, it means that the unranked candidates have lower preferences than all ranked candidates and hence can be given the highest rankings. However, in our algorithm, an unranked site has an unknown quality relative to the ranked sites. Therefore, we assign them a temporary ranking that is half of the number of ranked sites (Algorithm 5 line 11-12), such that the resulting ranking of unranked sites only considers the opinion of the other robot.
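A two-voter election of this kind can be sketched in Python as below. This follows the prose description, not Algorithm 5 line by line: the 1-based position scores, the restriction of the candidate set to sites ranked by at least one voter, and the tie-breaking by site index are all assumptions made for the example.

```python
def election(ranking_n, ranking_m):
    """Combine two rankings (best first) into one; returns (result_ranking, score_total)."""
    candidates = sorted(set(ranking_n) | set(ranking_m))

    def scores(ranking):
        filler = len(ranking) / 2  # temporary score for sites this voter has not ranked
        position = {site: i + 1 for i, site in enumerate(ranking)}  # 1-based (assumption)
        return [position.get(site, filler) for site in candidates]

    score_n, score_m = scores(ranking_n), scores(ranking_m)
    score_total = [a + b for a, b in zip(score_n, score_m)]
    # Lower total score means a better rank; ties broken by site index (assumption)
    order = sorted(range(len(candidates)), key=lambda i: (score_total[i], candidates[i]))
    return [candidates[i] for i in order], score_total

# Robot n ranks sites 2 > 0 > 1; neighbor m ranks sites 0 > 3
result_ranking, score_total = election([2, 0, 1], [0, 3])
# candidates = [0, 1, 2, 3]
# score_n = [2, 3, 1, 1.5], score_m = [1, 1, 1, 2], score_total = [3, 4, 2, 3.5]
# result_ranking = [2, 0, 3, 1]
```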
The following example illustrates the aforementioned operations.
Keeping up with the example above, the following is an example of how result_ranking is produced.
Adding score vectors: score_total = [2, 6, 2, 2, 5, 5.5]
Finally, the election results also need to be checked for compliance with the recorded pairwise comparison using update_ranking (Algorithm 3 line 12-13).
Overall, at the design level, the proposed algorithm uses less communication, storage, and computational resources compared to the benchmark algorithm based on belief fusion, especially the variant of it with the transitivity-preserving operation.

Evaluation metrics
In order to evaluate the performances of the two considered algorithms, we have to unify their outputs to the same format. The proposed ranked voting algorithm encodes the ranking in a vector of length K, while the benchmark belief fusion algorithm records all pairwise relationships using a K × K matrix. Since the conversion from the latter to the former can result in information loss, we convert the rankings produced by the proposed ranked voting algorithm into a same-sized matrix containing all known pairwise relationships. The conversion is done using Algorithm 6. After unifying the outputs from the two considered algorithms, the output is compared to $B^*$, the belief matrix produced by the pairwise relationships of the true values of the sites. The error is defined as follows:
At initialization, all elements in $B_n$ are set to 0; hence, the error at initialization is $Error_0 = K(K-1)$. In this paper K = 8, thus the error at initialization is 56. The maximum error that can theoretically be reached is $2K(K-1)$, where every pairwise relationship in the matrix is the opposite of the correct value. In our paper, this value is 112. The lowest error that can be achieved is 0, where the ranking of every robot is exactly correct.
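One form of the error that reproduces the values stated above is the sum of absolute element-wise differences, $\sum_{i,j} |B_n[i,j] - B^*[i,j]|$; this form is an assumption inferred from the quoted bounds, not taken from the original equation. The Python sketch below also includes a ranking-to-matrix conversion in the spirit of Algorithm 6; both function names are hypothetical.

```python
def ranking_to_matrix(ranking, K):
    """Pairwise matrix: +1 if the row site is ranked above the column site, -1 if below, 0 if unknown."""
    B = [[0] * K for _ in range(K)]
    for i, better in enumerate(ranking):
        for worse in ranking[i + 1:]:
            B[better][worse], B[worse][better] = 1, -1
    return B

def error(B, B_true):
    """Sum of absolute element-wise differences (assumed form of the metric)."""
    return sum(abs(x - y) for row, row_t in zip(B, B_true) for x, y in zip(row, row_t))

K = 8
B_true = ranking_to_matrix(list(range(K)), K)       # true order: site 0 best, site 7 worst
B_empty = ranking_to_matrix([], K)                  # no known relationships
B_reversed = ranking_to_matrix(list(range(K))[::-1], K)

print(error(B_empty, B_true), error(B_reversed, B_true), error(B_true, B_true))
# prints: 56 112 0
```

A swarm-level value would presumably be obtained by averaging this quantity over all robots, which is again an assumption.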
We also pay attention to the level of scatter in the produced decisions within the swarm. We define the quantity Scatter as the average error between the belief matrices computed by every robot and those of every other robot, as follows:
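A minimal Python sketch of such a scatter measure is given below. It assumes that the "error" between two robots' matrices is the same sum of absolute element-wise differences used for the accuracy metric, and that the average is taken over all unordered pairs of distinct robots; both choices are assumptions, and the function names are hypothetical.

```python
from itertools import combinations

def matrix_distance(A, B):
    """Sum of absolute element-wise differences between two equally sized matrices."""
    return sum(abs(x - y) for row_a, row_b in zip(A, B) for x, y in zip(row_a, row_b))

def scatter(matrices):
    """Mean pairwise distance between the belief matrices of all robots."""
    pairs = list(combinations(matrices, 2))
    if not pairs:
        return 0.0  # fewer than two robots: no disagreement to measure
    return sum(matrix_distance(a, b) for a, b in pairs) / len(pairs)

# Three robots, one pairwise relationship: opinions +1, -1 and unknown
A = [[0, 1], [-1, 0]]
B = [[0, -1], [1, 0]]
C = [[0, 0], [0, 0]]
value = scatter([A, B, C])  # distances 4, 2, 2 -> mean 8/3
```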

Experiments and results
In this section, we explain the experimental settings and results in detail. A swarm of $N_{robot}$ robots is simulated in a 3 m × 3 m 2D environment as shown in Fig. 1. The individual robots are programmed with the same low-level control mechanism to perform a random walk in the arena. A robot alternates between two modes of movement, walking forward in a straight line and rotating in place in a random direction. The two modes of movement have durations that are randomly distributed, sampled from exp(40) s and unif(0, 4.5) s, respectively. In order to avoid collisions, a robot moving forward will abort its current movement and start turning if another robot or the edge of the arena is detected in front of it. The robots here are simulated with the mechanical specification of e-puck robots (Mondada et al., 2009) and have a linear speed of 0.16 m/s and a rotational speed of 0.75 rad/s. The control loops, in which the aforementioned decision-making algorithms are executed, are 1 s long.
The K = 8 sites in the experimental environment are in fixed positions of (0.5, 0.5), (1.5, 0.5), (2.5, 0.5), (0.5, 1.5), (2.5, 1.5), (0.5, 2.5), (1.5, 2.5), and (2.5, 2.5), all with radii of 0.3 m. Their qualities are chosen from the array [0, 1, … , 7] randomly in every experimental instance. Noise $\mathcal{N}(0, \sigma_{noise}^2)$ is added to the true qualities of the sites to simulate different levels of inaccuracies in the cognitive abilities of the robots. We observe the performances of considered algorithms in different environments by changing the experimental parameters $\sigma_{noise}$, $r_e$, and $N_{robot}$. We gauge the performances via error and scatter at convergence, which is determined by the lowest error achieved during an experimental instance within a time limit of 2400 s. We compute the convergence time as the time step taken for the whole swarm to reach 90% of its peak performance, i.e., to reach an error lower than $Error_{Conv} + (Error_0 - Error_{Conv}) \times 0.1$.
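The convergence-time rule above can be sketched in Python. The function name, the per-control-loop sampling of the swarm error, and the example error trace are all hypothetical and serve only to illustrate the threshold computation.

```python
def convergence_time(errors, error_0, dt=1.0):
    """Return (time, error at convergence) for an error trace sampled once per control loop.

    The error at convergence is the lowest error in the trace; the convergence
    time is the first time the error drops below
    error_conv + (error_0 - error_conv) * 0.1.
    """
    error_conv = min(errors)
    threshold = error_conv + (error_0 - error_conv) * 0.1
    for step, e in enumerate(errors):
        if e < threshold:
            return step * dt, error_conv
    return None, error_conv  # the trace never improves on the initial error

# Made-up trace for illustration (not experimental data); Error_0 = 56 for K = 8
trace = [56, 40, 20, 10, 6, 8, 6]
t_conv, e_conv = convergence_time(trace, error_0=56)
# threshold = 6 + (56 - 6) * 0.1 = 11, first crossed at step 3
```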

Performances of ranked voting algorithm with respect to noise and evidence rates
The mean error and scatter at convergence, together with the mean convergence time, across 20 experiments at every parameter combination for the proposed ranked voting algorithm at various noise level $\sigma_{noise}$ and evidence rate $r_e$ settings are shown in Table 1.
It can be observed from the mean error and mean scatter results that the noise level has a significant impact on the accuracy and precision of the proposed algorithm. As the noise level $\sigma_{noise}$ increases, there is a very clear increase in both mean error and mean scatter at convergence. However, for most noise level and evidence rate combinations, the mean scatter is consistently higher than the mean error at convergence. This shows an accurate but imprecise decision distribution from the proposed algorithm. At low $r_e$ values below 0.02 and at especially high noise levels, the relationship above can be reversed, and the error can be higher than the scatter at convergence. This is to be expected, as at these $r_e$ values the robots get very few observations. Coupled with a high noise level, erroneous pairwise observations tend not to be challenged, leading to inaccurate results.
At a particular noise level, the lowest mean errors and mean scatters are most likely to be found in the middle range of evidence rates from 0.05 to 0.5, while both too low and too high an evidence rate can negatively affect the decision-making accuracy. Due to the stochasticity in the proposed algorithm's decision-making process, especially the random insertion of observed pairwise relationships in Algorithm 4, the proposed algorithm needs a certain number of pairwise opinion combinations relative to the evidence input to enforce a consensus, which is harder to achieve when the evidence rate is too high.
On the other hand, the mean convergence time is more affected by the evidence rate $r_e$ than by the noise level. When $r_e$ increases from 0.01 to 0.1, there is a very apparent drop in mean convergence time at every noise level. However, beyond an evidence rate of 0.1, the change in mean convergence time is more irregular. This, combined with the evidence rate's effects on error and scatter at convergence, shows that for the proposed algorithm, a lack of evidence can hamper the decision-making process, but too high an influx of evidence does not necessarily have a positive effect.

Comparison with the belief fusion benchmark at different noise levels
The performance distribution across 20 experimental runs of the considered algorithms under different noise levels is shown in Fig. 2. The evidence rate $r_e$ is set to 0.2. The swarm size $N_{robot}$ is set to 30. We have also performed linear regression of the mean performances across all experimental runs at individual parameter settings against noise level and computed the gradient of the best-fitting linear function and the coefficient of determination ($R^2$), the latter of which measures the strength of the linear relationship observed in the data. The results are shown in Table 2. In Fig. 2a, b, we see that all three algorithms produce comparable errors and scatters at convergence when the noise is low at 0 or 0.5. As shown in Fig. 2c, both variants of belief fusion are also able to converge within a shorter time compared to the proposed ranked voting algorithm. Among them, belief fusion with transitivity-preserving operations is the fastest. However, when the noise increases, the advantages of both belief-fusion-based algorithms begin to diminish. As shown in Fig. 2a, when the noise level $\sigma_{noise}$ is in the range between 1 and 3, the error at convergence increases significantly for both variants of belief fusion, as the median error increases from around 0 to 24.4 for belief fusion with the transitivity-preserving operation, and to 17.5 without. The reduction in accuracy in the face of noise is also observed in the proposed ranked voting algorithm; however, the increase in median error at convergence here is much milder, and the median value only reaches 9 at the highest experimented noise level of 3. This is substantiated by the statistical analysis in Table 2, where the proposed ranked voting algorithm obtains the lowest gradient of mean error with respect to noise at 3.77.
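For readers who want to reproduce this kind of analysis, the gradient and $R^2$ of a least-squares line can be computed as in the Python sketch below. The function name is hypothetical, and the input values are synthetic placeholders lying exactly on a line; they are not the data behind Table 2.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = gradient * x + intercept, plus R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    ss_res = sum((b - (gradient * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return gradient, intercept, 1 - ss_res / ss_tot

# Synthetic placeholder values (hypothetical noise levels, perfectly linear response)
noise_levels = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
mean_metric = [2.0 * s + 1.0 for s in noise_levels]
gradient, intercept, r_squared = linear_fit(noise_levels, mean_metric)
```

Because the placeholder data are exactly linear, the fit recovers a gradient of 2 and an $R^2$ of 1 up to floating-point error; real measurements would give $R^2$ below 1.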
On the other hand, as observed in Fig. 2b, the proposed ranked voting algorithm produces a progressively higher scatter than the two variants of belief fusion as the noise level increases, reaching a median value of 10.7. As noted in the previous subsection, the scatter produced by the proposed ranked voting algorithm is consistently on roughly the same scale as the error. However, both variants of belief fusion, although experiencing a significant increase in error, only have a mild increase in scatter, to a median of 2.79 when transitivity is preserved and 3.39 when it is not, as noise increases. This is also shown in Table 2, where the proposed ranked voting algorithm obtains the highest gradient of mean scatter with respect to noise at 3.78.
From the aforementioned experimental data, we can conclude that as noise increases, the proposed ranked voting algorithm experiences a drop in precision, producing a higher scatter as the noise increases. Although the error also increases, it is consistently on the same scale or smaller than the scatter, confirming the fact that the proposed ranked voting algorithm keeps a high accuracy and much of the increasing error can be ascribed to scatter. In contrast, both variants of belief fusion experience a smaller increase in scatter, but they experience a much larger increase in error compared to the proposed ranked voting algorithm, demonstrating the fact that belief fusion can lead to consistent consensus among the swarm but is unable to reliably obtain the correct ranking at high-noise scenarios.
As shown in Fig. 2c, the convergence time for both variants of belief fusion generally increases as the noise level increases. Its variance also rises for both algorithms. At higher levels of noise from 2 to 3, the convergence time of all three algorithms is roughly on the same level, and the advantage of belief fusion in fast convergence no longer holds. As shown in Table 2, the linear relationships between convergence time and noise level are not as strong as for the previous two performance metrics, as shown by lower $R^2$ values. However, the proposed ranked voting algorithm still obtains the lowest gradient at 118.
Taking an integrated look at the performances of the considered algorithms with respect to the noise level, the differences in their performances can be explained by looking at their decision-making mechanisms. Both variants of belief fusion use a deterministic fusion function that encodes every pairwise relationship, making it easy for the whole swarm to converge their individual beliefs. However, it is also vulnerable to being misled by erroneous information at high-noise scenarios. On the other hand, the proposed ranked voting algorithm limits the number of decision variables faced by the individual robots by using a more compact way of encoding the decisions. Its method of opinion combination also introduces a degree of stochasticity into the decision-making process, hence allowing the swarm to correct itself from wrong ordering easily, albeit at a cost of reducing the precision of the decisions made.
To better illustrate the differences in the decision-making mechanisms of the considered algorithms, Fig. 3 shows two toy examples, one for the benchmark belief fusion algorithm and one for the proposed ranked voting algorithm, in which only one pairwise relationship and three robots are considered and there is no evidence input. The three robots are assumed to be within communication distance of each other. Every robot receives a belief message from a random neighbor and performs its decision-making process. The top rows in both subfigures show the initial state in the locality, and the bottom rows show the possible states in the next time step. It can be seen in Fig. 3a for belief fusion that all three possible transitions eliminate the minority opinion −1, and the first two transitions will result in all three robots picking the opinion +1 in the following time steps. In contrast, in Fig. 3b for ranked voting it can be observed that only the first and last outcomes, with a combined probability of 5/16, result in loss of information. In addition, no robots are left with the unknown status of 0, and the spread of a particular single opinion is significantly slowed.

Comparison with the belief fusion benchmark at different evidence rates
We then compare the impact of the evidence rate r_e on the considered algorithms. The performance distribution at different r_e values is plotted in Fig. 4. The noise level is set to 1.5, and the swarm size N_robot is set to 30. The results of linear regression of the mean performances against the natural log of the evidence rate, ln(r_e), are shown in Table 3. From Fig. 4a, we can see that all considered algorithms experience a general reduction in error when the evidence rate increases. The reduction is the least apparent in belief fusion with transitivity-preserving operations. This is substantiated by the statistical analysis shown in Table 3, where belief fusion with transitivity preserved obtains a very weak linear relationship between mean error and ln(r_e) with R² = 0.146, as well as between mean convergence time and ln(r_e) with R² = 0.144. For the proposed ranked voting algorithm, there is also a significant drop in the variance of the error at convergence. Figure 4b shows that both variants of belief fusion see higher scatter in their results as the evidence rate increases. There is also more variance in the scatter observed. However, this feature is not observed in the proposed ranked voting algorithm. Instead, the median scatter decreases when the evidence rate increases from 0.01 to 0.1 and starts increasing beyond that. There is also an observable increase in the variance of the scatter when the evidence rate reduces beyond 0.1. As shown in Table 3, both variants of belief fusion obtain moderately strong linear relationships between mean scatter and ln(r_e), with R² being 0.791 and 0.738, respectively. On the other hand, for the proposed ranked voting algorithm, mean scatter is largely independent of the evidence rate, with G = 0.0204 and R² = 0.007.
In terms of convergence time, all considered algorithms experience a significant increase in decision speed when the evidence rate increases from 0.01 to 0.1. Beyond 0.1, the median convergence time either increases slightly, as in the case of the two variants of belief fusion, or does not change much, as in the case of the proposed ranked voting algorithm. At the same time, both variants of belief fusion experience an increase in the variance of the convergence time at high evidence rates. The same holds true for the proposed ranked voting algorithm when compared with the variance at r_e = 0.1.
Overall, the performances of the proposed ranked voting approach generally improve as the evidence rate increases, with a reducing error and convergence time. It is also more resistant to the effects of low evidence rates in terms of error compared to both variants of belief fusion. Its convergence time also increases at a slower rate than belief fusion without transitivity preserved when the evidence rate reduces, while being more vulnerable in this aspect compared to belief fusion with transitivity preserved. For the belief fusion benchmark, both variants see reducing error when the evidence rate increases, but both also see increasing scatter and a much higher uncertainty in convergence time as evidence rate increases beyond 0.2.
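The regression of a mean performance metric against ln(r_e), as reported in Table 3, can be sketched as an ordinary least-squares fit. The snippet below is only an illustration under stated assumptions: the function name `log_linear_fit` is hypothetical, and the input values are synthetic placeholders constructed to lie exactly on a line, not results from this paper.

```python
import math


def log_linear_fit(x_values, y_values):
    """Least-squares fit of y against ln(x); returns (gradient, intercept, R^2)."""
    log_x = [math.log(x) for x in x_values]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(y_values) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (y - mean_y) for lx, y in zip(log_x, y_values))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    ss_res = sum(
        (y - (gradient * lx + intercept)) ** 2 for lx, y in zip(log_x, y_values)
    )
    ss_tot = sum((y - mean_y) ** 2 for y in y_values)
    r_squared = 1.0 - ss_res / ss_tot
    return gradient, intercept, r_squared


# Synthetic placeholder data (NOT the paper's measurements): y = 2 ln(x) + 1.
rates = [0.01, 0.05, 0.1, 0.2, 0.5, 1.0]
metric = [2.0 * math.log(r) + 1.0 for r in rates]
gradient, intercept, r_squared = log_linear_fit(rates, metric)
print(gradient, intercept, r_squared)
```

A gradient near zero together with a low R², as reported for the scatter of ranked voting, would indicate that the metric is largely independent of the evidence rate. The same fit applies to the regression against ln(N_robot) by passing swarm sizes as `x_values`.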

Performances of ranked voting algorithm with respect to swarm sizes
Afterward, we examine the impact of the swarm size N_robot on the performances of the proposed ranked voting algorithm. The mean performances across 20 experimental runs at every parameter combination are shown in Table 4. It can be observed that for all three metrics, optimal behaviors are more likely to be observed at medium swarm sizes of 50 and 100, while the performances at extreme swarm sizes are often worse. This is similar to the effects produced by varying the evidence rate r_e. However, the worsening of all considered metrics is clearer at higher swarm sizes than at higher evidence rates. This is to be expected, as a higher swarm size not only introduces more evidence but also introduces more agents that need to be brought into convergence for a consensus to form.
Table 4 Performances of the proposed ranked voting algorithm at different noise levels and swarm sizes N_robot; r_e = 0.2

Comparison with the belief fusion benchmark at different swarm sizes
We now compare the impact of the swarm size N_robot on the performances of the considered algorithms, as shown in Fig. 5. The noise level is set to 1.5, and the evidence rate r_e is set to 0.2. The results of linear regression of the mean performance against the natural log of the swarm size, ln(N_robot), are shown in Table 5.
It can be observed that all considered algorithms experience an increase in error when the swarm size increases. This is substantiated by the statistical analysis in Table 5, where the proposed ranked voting algorithm obtains the lowest gradient of mean error against ln(N_robot) at 1.53. Both variants of belief fusion also see a general reduction in scatter as the swarm size increases, while for the proposed ranked voting algorithm, there is still a clear linear relationship between scatter and swarm size. This shows that as the number of agents increases, both variants of belief fusion see a stronger push toward consensus, which produces lower scatter but higher error. For belief fusion with transitivity preserved, this also translates to a lower convergence time, with a gradient of −137. The same effects are not observed in the proposed ranked voting algorithm, whose error scales much more slowly with swarm size. However, this comes at the cost of a higher convergence time that scales with swarm size, with a gradient of 218. The proposed ranked voting algorithm also uses less communication bandwidth, storage, and processing power than the benchmarks, which makes it viable at large swarm sizes.

Discussion
Based on our experiments, we can characterize the performances of the proposed ranked voting algorithm as being, in general, slower and less precise, but more accurate and cheaper than the benchmark belief fusion algorithms. Ranked voting has an especially clear advantage in high-noise and large-swarm scenarios. The differences in their performances are due to the different decision-making mechanisms used. The proposed ranked voting algorithm uses a more compact encoding method to represent the ranking among the sites. It is able to give a compromise result when agents with different opinions combine them, whereas the benchmark belief fusion algorithms revert all entries in conflict back to the initial unknown status of value 0, resulting in information loss. This feature, combined with the mechanism in the belief fusion operation that always assigns any available +1 or −1 entry values to entries with the unknown status, results in a positive feedback loop within the swarm. Thus, swarms using the belief fusion algorithm can come to a consensus rapidly, but once most of the belief matrices are filled, it is very hard for dissenting agents to spread their opinions, even if they hold correct pairwise information.
Most of the classical opinion-based collective decision-making strategies have been built on similar positive feedback mechanisms, such as in Valentini et al. (2015), Valentini et al. (2016), and Ebert et al. (2020). In these decision-making strategies, the adoption of a particular opinion by an agent increases the probability of the same opinion being adopted by other agents. Such positive feedback has also been replicated using a probability fusion algorithm in . However, in these problems, the number of possible options is small. Thus, it is possible to accurately track every single potential option and use positive feedback to create fast consensus.
In contrast, in the collective preference learning problem among eight sites investigated in this paper, there are 8! = 40320 possible results. Therefore, the two algorithms considered in this paper do not seek to accurately track all possible options. Rather, they both approach the collective preference learning problem as an optimization problem, in which the individual agents make incremental changes in the form of single pairwise relationships to approach the true preference order. In such an approach, a positive feedback loop in the decision-making process can cause the swarm to become stuck in a local optimum, where a few agents have more accurate ranking information but cannot overpower the established consensus, leading to premature convergence. The impact of such premature convergence on the accuracy of the consensus depends on two factors: the level of dynamism in the environment and the sensory capabilities of individual agents. In a dynamic environment, the established consensus can potentially prevent the swarm from responding to changes in the environment. In a static environment, premature convergence can also reduce accuracy when the individual agents have poor sensory capabilities, that is, when the environment is noisy or observations are hard to collect, because a consensus is established before the agents can make enough observations. This is substantiated by the performances of the considered algorithms at high noise levels and low evidence rates, respectively.
On the other hand, the proposed ranked voting algorithm employs a degree of stochasticity in its election process. The ordering among options with tied points in the election result is random. Since there are only two voters, ties are fairly common. This leads to a higher scatter in the final result, as conflicting information needs many pairwise robot interactions to be eliminated. However, it also means that dissenting opinions have an opportunity to spread within the swarm. The whole swarm can thus readily shift in opinions and has a much better chance of approaching the true result. It is also less likely for a pairwise robot interaction to result in loss of information, and the swarm can thus avoid being dominated by a single opinion.
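The role of random tie-breaking in a two-voter election can be illustrated with a small Borda count sketch. This is a hedged illustration and not the exact protocol used in the experiments: the function name `borda_merge`, the point scheme (n − 1 points for first place down to 0 for last), and the representation of a ranking as a list of site indices ordered from most to least preferred are all assumptions made for the example.

```python
import random


def borda_merge(ranking_a, ranking_b, rng=random):
    """Combine two rankings by Borda count, ordering tied sites at random."""
    if sorted(ranking_a) != sorted(ranking_b):
        raise ValueError("both rankings must cover the same sites")
    n = len(ranking_a)
    points = {site: 0 for site in ranking_a}
    for ranking in (ranking_a, ranking_b):
        for position, site in enumerate(ranking):
            points[site] += n - 1 - position  # assumed Borda point scheme
    # Shuffle first, then sort stably by points, so that sites with equal
    # points end up in a random relative order.
    sites = list(points)
    rng.shuffle(sites)
    sites.sort(key=lambda site: points[site], reverse=True)
    return sites, points


# Two voters who disagree only on the order of sites 0 and 1.
merged, points = borda_merge([0, 1, 2, 3], [1, 0, 2, 3])
print(merged, points)
```

In this example, sites 0 and 1 each receive 5 points, so either may come first in the merged ranking, while sites 2 and 3 keep their agreed positions. The disputed pair is therefore resolved at random rather than being reset to an unknown state, which is consistent with the higher scatter and the lower information loss discussed above.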

Conclusion
In this paper, we investigate a collective preference learning scenario that can potentially be faced by an autonomous robot swarm. The swarm is tasked with ranking a series of potential sites in order of preference. We have proposed a ranked voting algorithm with Borda count tallying to enable the simulated swarm to perform the designated task. We have then tested the viability of the proposed approach in collective preference learning scenarios with different noise levels, evidence rates, and swarm sizes. We have compared the performances of our proposed approach against those of two variants of a belief fusion-based benchmark algorithm, in terms of accuracy, precision, and speed.
On the design level, our proposed ranked voting algorithm is cheaper in memory usage, processing power required, and communication bandwidth needed. Despite this lower cost, it can outperform the benchmarks in terms of decision accuracy and, in some cases, convergence speed, especially in high-noise and large-swarm situations. Its downsides include a higher scatter of the swarm's results at convergence and a longer convergence time in low-noise situations.
In future work, we aim to implement the proposed ranked voting decision-making strategy in dynamic environments as well as on real robotic systems and investigate its performances. We also plan to further improve the ranked voting strategy so that it can achieve stronger convergence and also deal with cases where the robots are only interested in the rankings of the higher-quality subset of the available sites. In addition, we aim to integrate path planning and active searching by individual agents into the algorithm to further improve the performance.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.