Robot swarm democracy: the importance of informed individuals against zealots

In this paper we study a generalized case of the best-of-n model, which considers three kinds of agents: zealots, individuals who remain stubborn and do not change their opinion; informed agents, individuals that can change their opinion and are able to assess the quality of the different options; and uninformed agents, individuals that can change their opinion but are not able to assess the quality of the different options. We study the consensus in different regimes: we vary the quality of the options, the percentage of zealots and the percentage of informed versus uninformed agents. We also consider two decision mechanisms: the voter model and the majority rule. We study this problem using numerical simulations and mathematical models, and we validate our findings with physical Kilobot experiments. We find that (1) if the number of zealots for the lowest quality option is not too high, the decision-making process is driven toward the highest quality option; (2) this effect can be improved by increasing the number of informed agents, which can counteract the effect of adverse zealots; (3) when the two options have very similar qualities, a higher proportion of informed agents is necessary to keep a high consensus for the best quality option.


Introduction
Collective decision-making is a collective behavior where a group of agents (or swarm) makes a joint decision using only local perception and communication, without any centralized leadership (Valentini et al., 2017). The distinctive feature of any collective decision-making process is that once the decision is made, it is no longer attributable to any of the individual agents participating in the process. The mechanisms underlying this type of process are widely studied in behavioral biology (Camazine et al., 2001), in statistical physics (Bialek et al., 2012; Vicsek et al., 1995; Cavagna et al., 2018), social sciences (Kok et al., 2016), and more recently in behavioral economics (Bose et al., 2017). Collective decision-making is also studied in artificial systems such as robotic swarms, in which relatively simple autonomous robots interact to generate collective responses through self-organization processes (Hamann, 2018). In swarm robotics, examples of contexts where collective decision-making is studied are the following: (1) aggregation behavior, where a swarm has to aggregate either on a site among those available in the environment (Firat et al., 2020), or in any location of environments that do not offer specific aggregation sites (Gauci et al., 2014); (2) collective motion, where the group has to choose, among a virtually infinite number of options, a direction of motion (Couzin et al., 2005); (3) collective perception, where the relative abundance of certain environmental features is assessed by local measurements and communication among the agents (Valentini et al., 2016a).
A specific case of collective decision-making is represented by the best-of-n problem, where n is the number of available options, which can vary with respect to their qualities. Choosing the best quality option among the n available is a challenging task for a group of agents since it is assumed that none of the group members can evaluate the quality of all the n options (Reina et al., 2014, 2015; Valentini et al., 2016b). The best quality is the one associated with the lowest exploitation cost and/or highest benefit. Various studies have shown that the mechanism referred to as "modulation of positive feedback" can generate consensus among interacting agents engaged in the selection of the best option among the n available (see Font Llenas et al., 2018; Valentini et al., 2014, 2016b). This mechanism is based on the following: (1) each agent advertises its currently selected option for a time proportional to the option's quality; (2) each agent can change its currently selected option if influenced by its neighbors. Repeated local interactions among the agents, under the above-mentioned conditions, generate a consensus with all agents achieving a common decision on which option to choose.
In the best-of-n problem, it is normally taken for granted that agents are either all able to measure the quality or all unable to do so. In the latter case, the best-of-n problem reduces to a symmetry breaking scenario, whereby in the absence of environmental heterogeneities the swarm converges to a random option rather than to the best one (Valentini et al., 2017). Recently, a few studies have pointed to the relevance of the individual ability to evaluate the option quality with respect to the collective decision dynamics (Khaluf et al., 2019; Berekméri and Zafeiris, 2020). We refer to this ability as "quality awareness". These studies have called for further investigations to go beyond the "all or nothing" scenarios with respect to quality awareness, and for a deeper understanding of how the variability among the agents in quality awareness bears upon the collective decision-making process. Indeed, this is an issue that may affect the group dynamics in multiple biological and artificial collective systems. Thus, there is an interdisciplinary interest in developing a principled understanding of how this variability affects the collective dynamics in different decision-making contexts. For example, from a social science perspective, the variability in quality awareness may be caused by individual differences in perceiving and assessing the qualities of different options, due to differences in level of education, restricted access to information, etc. From a swarm robotics perspective, variability in quality awareness is not exclusively left to the designer's decision on how to assemble the robotic swarm. Indeed, it can be an ineluctable consequence of the inherent functional differences that are generally observed in seemingly identical hardware components such as sensors. For example, the same type of sensor on different robots may respond differently to the same stimulation. This can prevent some robots, but not others, from correctly evaluating and disseminating the quality of a specific option.
In this paper, we contribute to develop a principled understanding of how the variability in individual quality awareness in the best-of-n problem with n = 2 changes the decision-making dynamics. We study a group of simulated agents required to collectively choose one of two options which differ in their quality. In brief, each agent in the swarm goes through two sequential phases that periodically repeat: the exploration phase and the dissemination phase. During the exploration phase, the agents explore the environment and evaluate the options quality. During the dissemination phase, the agents interact with each other, an interaction that consists of two steps. First, each agent advertises (i.e. communicates via local broadcast) the individually selected option. We call an option opinion when it is the currently selected option by a focal agent. Then, the focal agent may change its current opinion under the influence of the other agents according to the rules of the specific decision mechanism (or voting system). The above phases are executed by all agents in an asynchronous manner. This type of scenario has already been studied in the literature (refer to Prasetyo et al., 2019, for a recent model). Differently from previous work, in this study, groups are made of agents that differ with respect to either their capability to directly evaluate the quality of each option, or their flexibility in changing option through interactions with group mates during the dissemination phases. The original contribution of this research resides in an in-depth analysis of the decision-making dynamics developed by systematically varying the model parameters, and carried out with a large methodological toolkit made of simulation models, mathematical models, and physical robots experiments. Consistently with the literature (Valentini et al., 2017), we consider the two most commonly studied decision mechanisms: the voter model and the majority rule.
We consider swarms made of three different types of agents. First, we have zealots which are characterized by the fact that their chosen option is attributed to them by the designer-rather than through the dynamics of the simulation scenario-and by definition they never change their opinion. In other words, neither the discovery of a better quality option nor the influence of group mates makes them change option. During dissemination, zealots disseminate the option attributed to them for a time proportional to its quality. Second, we have informed agents which are not associated to any specific option. They explore the environment, choose an option, and disseminate their chosen option for a time proportional to its quality. During dissemination, informed agents can eventually change their mind under the influence of group mates. Third, we have uninformed agents which, like informed agents, select their option either through exploration or during dissemination through the influence of other agents. Contrary to informed agents, uninformed agents are not able to properly evaluate the options quality. Thus, during dissemination time, they disseminate their current opinion for a fixed amount of time that does not depend on the quality of the chosen option. As for informed agents, also uninformed agents can eventually change their mind under the influence of group mates. With this experimental design, we generate interesting results that contribute to shed light on the effects of the variability of quality awareness on the dynamics of collective decision-making in best-of-n type of scenarios. For example, we illustrate and discuss the dynamics resulting from scenarios in which the options have different qualities, and the relative proportion of zealots for each option and the proportion of informed versus uninformed agents within the group play a fundamental role in shifting the group consensus to one or the other option.
The paper is organized as follows. The methodologies are described in Sect. 3 with details of the mathematical model illustrated in Sect. 3.1. The results are reported in Sect. 4. The methodology used and the results obtained with physical robots are illustrated in Sect. 5. Finally, in Sect. 6 we draw our conclusions and we illustrate a research agenda for the future.

Related work
The two central concepts explored in this article are those of informed agency and zealotry. In this section, we will review the related work, across different fields, performed around these two concepts.
The role of different levels of information among the agents has been studied in different fields for at least two decades. One of the pioneering works is that of Couzin et al. (2005) in the context of modeling collective motion of biological systems: they studied the effect of implicit leaders, individuals that have a preferred direction of motion but are not seen as leaders by their conspecifics. This seminal study has motivated a body of experimental work in biology with real animals, including fish schools (Leblond & Reebs, 2006) and sheep herds (Pillot et al., 2010). The role of informed agents has been highlighted in the context of multi-agent systems by Yu et al. (2010), who show that one or a few informed agents allow all agents to agree on a decision, acting in this sense as "leaders" of the swarm: the consensus process is essentially determined by the number of informed agents and their confidence levels.
Inspired by Couzin et al. (2005), researchers in robotics have studied the effect of implicit leaders in the collective motion of self-organized robot swarms. One of the earliest studies is that of Celikkanat and Şahin (2010), who introduced implicit leaders within the collective motion model designed by Turgut et al. (2008), which in turn was one of the earliest faithful implementations of Reynolds' boids model (Reynolds, 1987) in swarm robotics. Subsequent studies have extended the study of informed individuals to robot swarms that are able to communicate and are influenced by more than one subgroup of informed individuals with different goals (Ferrante et al., 2014), and to robot swarms with more minimalist individual capabilities that self-organize without exchanging orientation information (Ferrante et al., 2012). More recently, the notion of informed individuals has been ported to a collective behavior other than collective motion, namely self-organized aggregation (Firat et al., 2020). In this paper, we port the notion of informed individuals for the first time to the best-of-n problem.
There are additional recent studies that analyzed the role of informed robots in an interdisciplinary context. Mann (2020) studies how differences in information and differences in preferences among the agents affect the use and efficacy of social information, analyzing the collective behavior generated by rational agents with differing preferences. Another very recent paper (van Veen et al., 2020) studies the impact of information overload on the accuracy and precision of collective decision-making. Berekméri and Zafeiris (2020) focus their attention on the role of the topology of interactions among agents in a collective decision-making process, finding that a fully connected topology promotes consensus, while a hierarchical structure favors accuracy more than consensus.
The effect of zealots on collective decision-making is of interest in different communities. While zealots are sometimes called by different names, like "committed agents" or "stubborns" or "stubborn individuals", their impact has been investigated from a biological perspective, in social physics models, as well as in robotic swarms. In the latter field, zealots have recently been introduced as a mechanism that allows the swarm to cope with changes in the environments (Prasetyo et al., 2019), a setting that is recently gaining momentum (Wahby et al., 2019).
In the context of physics, Colaiori and Castellano (2016) introduced zealots in a model of pairwise social influence for opinion dynamics, showing a rich phase diagram of the possible dynamics in presence of a small percentage of zealots. In the context of Internet social networks, Hunter and Zaman (2018) studied the best placement of zealots that maximizes the impact on the consensus dynamics of the population, showing that a small number of zealots can significantly influence the overall opinion dynamics and influence the consensus of the population over disputed issues, such as Brexit. Mistry et al. (2015), using the naming game as a decision mechanism, showed that even a very small minority can drive the opinion of a large population, if committed agents are more active than the others. However, this effect can be hindered if nodes with the same opinion are more connected with each other than with nodes with different opinion, producing a polarization inside the network. Ghaderi and Srikant (2014) and Mukhopadhyay (2016) studied the impact of zealots in a social network, considering different degrees of zealotry. The focus of Ghaderi and Srikant (2014) is studying the effect of zealotry on the convergence time of the system. Mukhopadhyay (2016), despite having used the majority rule instead of the voter, was able to find similar results as in Prasetyo et al. (2019) and , in which introducing equal number of zealots on both option sides prevents the network from reaching a consensus state. Similarly, Yildiz et al. (2013) proved that the presence of zealots is able to prevent the formation of consensus, introducing instabilities and fluctuations in a binary voter model of a small-world network. A recent study by Bhat and Redner (2019) aimed at studying the influence of zealots on "politically polarized" state vs. 
consensus state and found that higher "influence of zealots" produces more polarization, shorter time to polarization, and conversely less consensus and longer to impossible time to consensus. Xie et al. (2011) showed the presence of a tipping point at which a minority of zealots is able to swing the initial majority opinion in a network. The study described by Masuda (2015) focused on zealots with the voter model to perform peer-to-peer opinion influence; however, differently from our work, zealots were nodes of a complex network. Galam and Jacobs (2007), introducing zealots in a majority model, showed that the system has spontaneous symmetry breaking when zealots numbers are symmetrical for the two options, while consensus toward one option emerged even with minimal unbalance in the number of zealots. In these studies, options did not have an intrinsic quality.
In a biologically inspired model, Couzin et al. (2011) show that strongly opinionated minorities (like groups of zealots) can drive the consensus of other individuals, but uninformed individuals spontaneously inhibit this process returning the consensus to the majority, favoring in this sense a democratic consensus. We found this work very inspiring and also found an interesting parallel between our and their results which we will explore.
Compared to the latest works in swarms (Canciani et al., 2019;Maître et al., 2020;Prasetyo et al., 2019;Primiero et al., 2018), to the best of our knowledge, in this paper we study for the first time the interplay between different option quality, zealot quantity and proportion of informed agents, by extending the preliminary studies in Prasetyo et al. (2020) and , in which either all agents or none of the agents were able to measure the quality of their opinion and disseminate differentially based on that. In particular, we introduce here the explicit distinction between informed and uninformed agents, and study for the first time the case in which these two types of agents co-exist in the swarm at the same time.

Methods
We focus on a classic best-of-n scenario with n = 2, in which a population of agents is required to collectively choose one of two foraging sites: site A and site B. As mentioned above, the distinctive feature of this scenario is the heterogeneity of the population, with zealots, informed, and uninformed agents. The behavior of all agents is determined by the same finite state machine made of four possible states: two exploration states referred to as $E_A$ and $E_B$, and two dissemination states referred to as $D_A$ and $D_B$ (see Fig. 1). The agents behave asynchronously with respect to each other, meaning that at a given time agents may be in any of the above states. This asynchronicity is ensured by having stochastic switching times between states, as explained below. Thus, even if multiple agents start from the same state, they soon break this synchronicity because they will switch states at different times. When in any of the two exploration states, an agent moves randomly within a square arena for a time that is randomly drawn from an exponential distribution. Agents in state $E_A$ are those holding opinion A, while agents in $E_B$ are holding opinion B. In our minimalist simulation scenario, during exploration none of the agents can change opinion. At the end of the exploration state, every agent enters the dissemination state. Zealots and informed agents disseminate their currently held opinion for a time randomly drawn from another exponential distribution, where the time parameter depends on the quality of site A, for those agents in state $D_A$, or on the quality of site B, for those agents in state $D_B$. Unlike zealots and informed agents, uninformed agents disseminate their opinion for a time that is exponentially distributed with a parameter that is fixed to 1. Thus, an uninformed agent always disseminates proportionally to a default quality value of 1, which represents the lack of information on the quality.
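The timing rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' simulation code: the function names are hypothetical, and the parameters `q` (mean exploration time) and `g` (mean dissemination time per unit of quality) borrow the names used later in the mathematical model.

```python
import random


def exploration_time(q, rng):
    """Draw an exploration duration from an exponential distribution with mean q."""
    return rng.expovariate(1.0 / q)


def dissemination_time(agent_type, quality, g, rng):
    """Draw a dissemination duration from an exponential distribution.

    Zealots and informed agents use a mean of g * quality, where quality is
    the quality of the site they currently hold as opinion.
    Uninformed agents ignore the quality and use the default value 1.
    """
    effective_quality = 1.0 if agent_type == "uninformed" else quality
    return rng.expovariate(1.0 / (g * effective_quality))
```

For example, with `g = 10` an informed agent advertising a site of quality 2 disseminates for 20 time units on average, while an uninformed agent advertising the same site disseminates for 10 time units on average.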
At the end of the dissemination state, informed and uninformed agents can change their mind based on the logic of a voting system. In this research work, we compare the dynamics generated by two different voting systems: the majority model and the voter model. When the majority model is in place, an agent samples the opinion of $G - 1$ randomly chosen neighbors, where G is the group size in the majority model, including the focal agent. The agent changes opinion when the majority of the sampled neighbors hold an opinion different from its own. In situations where the agent has fewer than $G - 1$ neighbors, it skips the application of the decision rule and does not change its opinion. In this way, we are sure that the effect of the parameter G is well captured and studied. When the voter model is in place, an agent samples the opinion of one randomly chosen neighbor. It changes opinion when the sampled neighbor holds an opinion different from its own. Unlike informed and uninformed agents, zealots never undergo this opinion changing process. Thus, zealots never change their opinion. The proportions of zealots holding opinion A and of those holding opinion B are set by the simulation designer at the beginning of each simulation. These proportions will be systematically varied to study how they affect the collective decision-making dynamics. Given that the population size is fixed, it follows that the proportions of informed and uninformed agents depend on the number of zealots in the population. Nevertheless, the relative proportion of informed with respect to uninformed agents is also systematically varied by the model designer to study how it affects the collective decision-making dynamics.
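The two decision rules described above can be written as a short Python sketch. It is an illustration, not the original NetLogo code: the function names and the opinion labels `"A"` and `"B"` are hypothetical, and the behavior of the voter rule when no neighbor is in range (keep the current opinion) is an assumption, since the text does not cover that case.

```python
import random


def voter_rule(own_opinion, neighbor_opinions, rng):
    """Voter model: adopt the opinion of one randomly chosen neighbor.

    Assumption: with no neighbor in range the agent keeps its opinion.
    """
    if not neighbor_opinions:
        return own_opinion
    return rng.choice(neighbor_opinions)


def majority_rule(own_opinion, neighbor_opinions, G, rng):
    """Majority model: sample G - 1 neighbors, switch if most of them disagree.

    With fewer than G - 1 neighbors the rule is skipped, as stated in the text.
    """
    if len(neighbor_opinions) < G - 1:
        return own_opinion
    sample = rng.sample(neighbor_opinions, G - 1)
    other = "B" if own_opinion == "A" else "A"
    disagreeing = sum(1 for opinion in sample if opinion == other)
    # Strict majority of the sampled neighbors must hold the other opinion.
    return other if disagreeing > (G - 1) / 2 else own_opinion


def update_opinion(agent_type, own_opinion, neighbor_opinions, rule, G, rng):
    """Apply the chosen decision rule; zealots never change opinion."""
    if agent_type == "zealot":
        return own_opinion
    if rule == "voter":
        return voter_rule(own_opinion, neighbor_opinions, rng)
    return majority_rule(own_opinion, neighbor_opinions, G, rng)
```

For odd G, requiring a strict majority among the $G - 1$ sampled neighbors gives the same outcome as taking the majority of the whole group of G agents with the focal agent included.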
The scenario is modeled using the NetLogo multi-agent simulation software. The agents' world is a 2D square arena divided into a grid of square patches. Each patch is a piece of "ground". Each agent occupies a patch, and Cartesian coordinates are used to indicate the position of each agent. Each agent can perceive the presence of other agents up to a distance of two patches in any direction. The size of the arena is 100 × 100 patches. A simulation run starts with the agents randomly placed within the arena. As for the termination condition, we do not base it on the reaching of consensus, as consensus is not always guaranteed in the presence of zealots. We also do not use time as a termination condition, because convergence times in the presence of zealots depend strongly and nonlinearly on the proportion of zealots, so it would be difficult to select the same termination time for all runs. Instead, we terminate a run when its dynamics have reached a steady state. We define the following protocol to determine whether a steady state has been reached: every 10,000 steps, a check-point is included; the last 10,000 results are split into two sets, and the average and standard deviation of the percentage of consensus to option A of each set are compared; if the difference between the average values is very small (less than 0.004%) and the difference between the standard deviations is less than 0.5, we assume that a steady state has been reached, and the run is terminated. The agents start the experiment in an exploration state. Regardless of the proportion of zealots, a run begins with 50% of the agents holding opinion A and 50% holding opinion B.
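The steady-state protocol can be sketched as a small Python check to be called at every check-point. This is a hedged illustration: the function name is hypothetical, and two details that the text leaves open are assumed here, namely that the two sets are the first and second half of the last 10,000 values and that the population standard deviation is used. Values in `history` are percentages of consensus to option A.

```python
import statistics


def reached_steady_state(history, window=10_000, mean_tol=0.004, std_tol=0.5):
    """Return True if the last `window` values look stationary.

    history: percentage of consensus to option A recorded at every step.
    The window is split into two consecutive halves (assumption) and the
    means and standard deviations of the halves are compared.
    """
    if len(history) < window:
        return False
    recent = history[-window:]
    first, second = recent[: window // 2], recent[window // 2:]
    mean_diff = abs(statistics.fmean(first) - statistics.fmean(second))
    std_diff = abs(statistics.pstdev(first) - statistics.pstdev(second))
    return mean_diff < mean_tol and std_diff < std_tol
```

A run would call this function every 10,000 steps and stop as soon as it returns `True`.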
Regardless of the state in which an agent finds itself (i.e., exploration or dissemination), the agent performs a pseudo-random motion by which, at each time step, it moves for a distance equal to half of a single patch in a randomly selected direction chosen within the range [−30°, 30°] with respect to its current heading. Collisions between agents are not considered, in line with previous work (Valentini et al., 2016b) showing that best-of-n dynamics can be well predicted without collisions (the real-robot validation performed in this paper will further reinforce this point). Only when colliding with the arena wall do the agents make a 180° turn.
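The motion rule lends itself to a compact Python sketch. The constants come from the text (arena of 100 × 100 patches, step of half a patch, turn within ±30°); everything else is an assumption made for illustration, in particular the use of continuous coordinates in [0, 100] and the choice that an agent about to cross a wall stays in place for that step while turning by 180°.

```python
import math
import random

ARENA_SIZE = 100.0   # arena side, in patches
STEP_LENGTH = 0.5    # distance covered per time step, in patches
MAX_TURN_DEG = 30.0  # maximum deviation from the current heading


def move(x, y, heading_deg, rng):
    """Advance one time step and return the new (x, y, heading_deg)."""
    # Pick a new direction within +/- 30 degrees of the current heading.
    heading_deg = (heading_deg + rng.uniform(-MAX_TURN_DEG, MAX_TURN_DEG)) % 360.0
    new_x = x + STEP_LENGTH * math.cos(math.radians(heading_deg))
    new_y = y + STEP_LENGTH * math.sin(math.radians(heading_deg))
    inside = 0.0 <= new_x <= ARENA_SIZE and 0.0 <= new_y <= ARENA_SIZE
    if not inside:
        # Wall collision: stay in place (assumption) and make a 180-degree turn.
        return x, y, (heading_deg + 180.0) % 360.0
    return new_x, new_y, heading_deg
```

Collisions between agents are deliberately not handled, in line with the text.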

Mathematical model
We model the system using an Ordinary Differential Equation (ODE) model. We adapted a previously proposed model that extends the ones in Valentini et al. (2014, 2016b). All the variables are normalized by the total number of agents N. Therefore, we consider only the proportions of agents with opinion A ($x_A$) and B ($x_B$): the full population is represented by $x_A + x_B = 1$. Agents are distinguished based on their opinion (A or B) and on their state (exploring, denoted by e, or disseminating, denoted by d). Therefore, informed agents can be distinguished in $e_{Ai}$, $e_{Bi}$, $d_{Ai}$, $d_{Bi}$, while uninformed agents can be distinguished in $e_{Au}$, $e_{Bu}$, $d_{Au}$, $d_{Bu}$.
The proportions of zealots for options A and B are constant. Each sub-population of zealots is divided between the two states (exploration and dissemination), so the proportion of zealots for A equals $e_{AS} + d_{AS}$ and the proportion of zealots for B equals $e_{BS} + d_{BS}$. The total proportions of agents with opinion A and B are, respectively, $x_A = e_{Au} + d_{Au} + e_{Ai} + d_{Ai} + e_{AS} + d_{AS}$ and $x_B = e_{Bu} + d_{Bu} + e_{Bi} + d_{Bi} + e_{BS} + d_{BS}$. The model includes a number of parameters, which are explained in the following. In principle, the model can also be defined with fewer parameters, if we allow for rescaling of time and space. However, we chose not to do this so that the model better matches the experiments, by letting each parameter of the model have a corresponding one with the same name in the simulations. Some parameters play a crucial role in the mathematical model: the qualities of options A and B, the proportions of zealots for the two options, and the proportion factor of informed agents. The latter is defined as the ratio of informed agents to the total number of non-zealot agents, that is, $(e_{Ai} + e_{Bi} + d_{Ai} + d_{Bi}) / (e_{Ai} + e_{Bi} + d_{Ai} + d_{Bi} + e_{Au} + e_{Bu} + d_{Au} + d_{Bu})$. The remaining two parameters used in the model are the following: g is a factor that is multiplied by the quality (or by 1 in the case of uninformed individuals) and represents the average dissemination time (thus its inverse is the average dissemination rate), while q represents the average exploration time (thus its inverse is the average exploration rate). The terms $p_{AA}$, $p_{AB}$, $p_{BA}$, and $p_{BB}$ are not parameters but represent the probabilities that an agent switches opinion or stays with its current opinion (depending on the specific subscripts); their expressions contain only state variables and depend on the specific decision mechanism (the voter model or the majority rule), as explained at the end of this section. In particular, $p_{AA}$ is the probability to remain with opinion A, while the probability to switch from A to B is simply $p_{AB} = 1 - p_{AA}$. Similarly, $p_{BA}$ is the probability to switch from B to A and it is related to $p_{BB}$ by the relationship $p_{BB} = 1 - p_{BA}$.
The system consists of 12 ODEs with 12 state variables. Equations (1)-(8) describe the dynamics of uninformed agents (Eqs. 1-4) and informed agents (Eqs. 5-8), while Eqs. (9)-(12) describe the dynamics of zealots. Informed agents, uninformed agents and zealots can never change their nature. In Eqs. (1) and (5), the proportion of agents disseminating opinion A increases because of agents returning from the exploration of A at rate $1/q$, and decreases because of agents terminating dissemination at rate $1/g$ in Eq. (1) (for uninformed agents, which have no dependency on quality) and at rate $1/g$ divided by the quality of option A in Eq. (5) (for informed agents, which have a dependency on quality). Similarly, Eqs. (2) and (6) describe the rate of change in the proportion of agents disseminating opinion B. In Eqs. (3) and (7), the proportion of agents exploring site A decreases because of agents finishing the exploration at rate $1/q$, and increases because of two contributions: (1) agents that previously had opinion A and kept the same opinion after the application of the voter/majority model and (2) agents that previously had opinion B but switched to A as a result of the voter/majority model. Similarly, Eqs. (4) and (8) describe how the proportion of agents exploring site B varies. The probabilities $p_{AA}$, $p_{AB}$, $p_{BA}$, and $p_{BB}$ describe the probabilistic outcome of the two decision mechanisms and are described next. The dynamics of zealots are described in a very similar way by Eqs. (9)-(12). The only difference is that a zealot cannot change its opinion after any interaction, so the terms that depend on the decision mechanism are omitted. For zealots, the dissemination time is always proportional to the quality of their own option (A or B).
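The typeset equations are not reproduced in this text. As a reading aid, the system can be reconstructed from the verbal description above. The following is a sketch under stated assumptions, not the published formulation: the symbols $\rho_A$ and $\rho_B$ for the qualities of options A and B are assumed names, and the ordering of the zealot equations (9)-(12) is a guess that mirrors the ordering of Eqs. (1)-(8).

```latex
% Sketch reconstructed from the prose; \rho_A and \rho_B (option qualities) are assumed symbol names.
\begin{align}
\dot{d}_{Au} &= \frac{e_{Au}}{q} - \frac{d_{Au}}{g} \tag{1}\\
\dot{d}_{Bu} &= \frac{e_{Bu}}{q} - \frac{d_{Bu}}{g} \tag{2}\\
\dot{e}_{Au} &= -\frac{e_{Au}}{q} + \frac{d_{Au}\,p_{AA}}{g} + \frac{d_{Bu}\,p_{BA}}{g} \tag{3}\\
\dot{e}_{Bu} &= -\frac{e_{Bu}}{q} + \frac{d_{Bu}\,p_{BB}}{g} + \frac{d_{Au}\,p_{AB}}{g} \tag{4}\\
\dot{d}_{Ai} &= \frac{e_{Ai}}{q} - \frac{d_{Ai}}{\rho_A g} \tag{5}\\
\dot{d}_{Bi} &= \frac{e_{Bi}}{q} - \frac{d_{Bi}}{\rho_B g} \tag{6}\\
\dot{e}_{Ai} &= -\frac{e_{Ai}}{q} + \frac{d_{Ai}\,p_{AA}}{\rho_A g} + \frac{d_{Bi}\,p_{BA}}{\rho_B g} \tag{7}\\
\dot{e}_{Bi} &= -\frac{e_{Bi}}{q} + \frac{d_{Bi}\,p_{BB}}{\rho_B g} + \frac{d_{Ai}\,p_{AB}}{\rho_A g} \tag{8}\\
\dot{d}_{AS} &= \frac{e_{AS}}{q} - \frac{d_{AS}}{\rho_A g} \tag{9}\\
\dot{d}_{BS} &= \frac{e_{BS}}{q} - \frac{d_{BS}}{\rho_B g} \tag{10}\\
\dot{e}_{AS} &= -\frac{e_{AS}}{q} + \frac{d_{AS}}{\rho_A g} \tag{11}\\
\dot{e}_{BS} &= -\frac{e_{BS}}{q} + \frac{d_{BS}}{\rho_B g} \tag{12}
\end{align}
```

Since $p_{AA} + p_{AB} = 1$ and $p_{BA} + p_{BB} = 1$, the right-hand sides sum to zero, so the total population stays normalized to 1, and the zealot totals $e_{AS} + d_{AS}$ and $e_{BS} + d_{BS}$ remain constant, as required by the text.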
Regarding the decision mechanism, for the voter model the probability that the outcome of the decision is A (resp. B) is the probability that, when observing a random agent disseminating, that random agent is disseminating A (resp. B). This is given by the ratio of agents disseminating A to the total number of disseminating agents, so that $p_{AA} = p_{BA} = (d_{Au} + d_{Ai} + d_{AS}) / (d_{Au} + d_{Ai} + d_{AS} + d_{Bu} + d_{Bi} + d_{BS})$, and analogously for B. For the majority model, where each agent switches its opinion to the one held by the majority of its $G - 1$ neighbors, the two probabilities are given by the cumulative sum of probabilities distributed according to a hypergeometric distribution modeling how many neighbors have each of the two opinions (Valentini et al., 2016b).
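To make the voter-model case concrete, the following Python sketch integrates the system described in this section with a plain forward Euler scheme. It is an illustration under several assumptions, not the authors' implementation: the names `rho_a`, `rho_b`, `sigma_a`, `sigma_b` and `informed_ratio` are hypothetical labels for the option qualities, the zealot proportions and the proportion factor of informed agents; the initial condition (all agents exploring, non-zealots split evenly between A and B) and the numerical values in the usage note are chosen only for the example.

```python
def initial_state(sigma_a, sigma_b, informed_ratio):
    """All agents start exploring; non-zealots are split evenly (assumption)."""
    free = 1.0 - sigma_a - sigma_b
    informed = informed_ratio * free
    uninformed = free - informed
    return {
        "e_Au": uninformed / 2, "e_Bu": uninformed / 2, "d_Au": 0.0, "d_Bu": 0.0,
        "e_Ai": informed / 2, "e_Bi": informed / 2, "d_Ai": 0.0, "d_Bi": 0.0,
        "e_AS": sigma_a, "e_BS": sigma_b, "d_AS": 0.0, "d_BS": 0.0,
    }


def voter_euler_step(s, rho_a, rho_b, g, q, dt):
    """One forward Euler step of the 12-variable system with the voter model."""
    d_a = s["d_Au"] + s["d_Ai"] + s["d_AS"]
    d_b = s["d_Bu"] + s["d_Bi"] + s["d_BS"]
    total_d = d_a + d_b
    # Voter model: probability that a random disseminating agent advertises A.
    p_a = d_a / total_d if total_d > 0 else 0.5
    p_b = 1.0 - p_a
    # Flows of agents that finish disseminating.
    out_u = s["d_Au"] / g + s["d_Bu"] / g
    out_ai, out_bi = s["d_Ai"] / (rho_a * g), s["d_Bi"] / (rho_b * g)
    out_as, out_bs = s["d_AS"] / (rho_a * g), s["d_BS"] / (rho_b * g)
    deriv = {
        "d_Au": s["e_Au"] / q - s["d_Au"] / g,
        "d_Bu": s["e_Bu"] / q - s["d_Bu"] / g,
        "e_Au": -s["e_Au"] / q + p_a * out_u,
        "e_Bu": -s["e_Bu"] / q + p_b * out_u,
        "d_Ai": s["e_Ai"] / q - out_ai,
        "d_Bi": s["e_Bi"] / q - out_bi,
        "e_Ai": -s["e_Ai"] / q + p_a * (out_ai + out_bi),
        "e_Bi": -s["e_Bi"] / q + p_b * (out_ai + out_bi),
        # Zealots cycle between exploration and dissemination only.
        "d_AS": s["e_AS"] / q - out_as,
        "d_BS": s["e_BS"] / q - out_bs,
        "e_AS": -s["e_AS"] / q + out_as,
        "e_BS": -s["e_BS"] / q + out_bs,
    }
    return {key: s[key] + dt * deriv[key] for key in s}


def simulate(sigma_a, sigma_b, informed_ratio, rho_a, rho_b, g, q, t_end, dt):
    """Integrate from the initial state up to time t_end."""
    state = initial_state(sigma_a, sigma_b, informed_ratio)
    for _ in range(int(t_end / dt)):
        state = voter_euler_step(state, rho_a, rho_b, g, q, dt)
    return state


def proportion_a(state):
    """Total proportion of agents (zealots included) holding opinion A."""
    return sum(value for key, value in state.items() if key[2] == "A")
```

A possible use is `proportion_a(simulate(0.05, 0.0125, 0.0, 1.0, 1.0, 10.0, 10.0, 3000.0, 0.1))`, which follows a population of uninformed agents with more zealots for A than for B and equal option qualities. The Euler scheme is chosen for brevity; a higher-order integrator would be preferable for quantitative work.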

Results
In order to investigate the effect of heterogeneity in quality awareness in a population of agents engaged in the best-of-n scenario with n = 2, we developed a simulation study based on an experimental design in which we systematically vary: (i) the population size N; (ii) the proportion of zealots disseminating option A; (iii) the relative proportion of informed agents with respect to uninformed agents (when this proportion equals 1, all non-zealot agents are informed; when it equals 0, all non-zealot agents are uninformed); (iv) the relative value of the quality of option B with respect to option A; (v) the number of agents (G) considered when applying the majority rule for changing opinion. We keep the size of the arena fixed to 100 × 100 patches; therefore, while varying N we are effectively varying the agent density as well. However, in previous studies on a similar system (see Prasetyo et al., 2019), we have established experimentally that the density does not play a meaningful role provided it is contained within certain large bounds, so that agents' interactions are not significantly affected (we refer the reader to the original study for more details).
All parameter values explored in the simulation model are listed in Table 1. Two parameters do not vary: the proportion of zealots disseminating option B is set to 0.0125, and the quality of option A is set to 1. Recall that the option qualities bear upon the agents' dissemination time. With the quality of option A set to 1 and the quality of option B defined as a multiple of the quality of option A, with the multiplying ratio varying as indicated in Table 1, we explore conditions in which option B can be of equal or of higher quality than option A. With the proportion of zealots for option B set to 0.0125, and the proportion of zealots for option A varying as indicated in Table 1, we explore conditions in which the proportion of zealots disseminating option A in the population can be smaller (i.e., equal to 0.0) or bigger (i.e., at least 0.05) than the proportion of zealots disseminating option B. With this parameter set, we investigate whether, and for which values of the quality of option B, a progressively higher number of zealots for the less valuable option (i.e., option A) generates a consensus to option A. With the relative proportion of informed agents varying as indicated in Table 1, we investigate whether the proportion of informed and uninformed agents within the population bears upon the decision-making dynamics. Note that the total proportions of informed and uninformed agents within the population are obtained by scaling their relative proportions by the fraction of non-zealot agents in the population. In the remainder of this section, we illustrate the results of the simulations in combination with the predictions of the ODE model.
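As a worked illustration of how the population is composed from these parameters, the following Python sketch computes the share and the head count of each agent type. The function and argument names are hypothetical, the swarm size used in the usage note is only an example, and rounding the proportions to whole agents is an assumption about how a simulation would instantiate them.

```python
def population_composition(n_agents, zealots_a, zealots_b, informed_ratio):
    """Return (proportions, counts) for zealots, informed and uninformed agents.

    zealots_a, zealots_b: proportions of zealots for options A and B.
    informed_ratio: share of informed agents among the non-zealot agents.
    """
    non_zealots = 1.0 - zealots_a - zealots_b
    proportions = {
        "zealots_A": zealots_a,
        "zealots_B": zealots_b,
        "informed": informed_ratio * non_zealots,
        "uninformed": (1.0 - informed_ratio) * non_zealots,
    }
    # Assumption: head counts are obtained by rounding to the nearest integer.
    counts = {name: round(share * n_agents) for name, share in proportions.items()}
    return proportions, counts
```

For instance, `population_composition(400, 0.05, 0.0125, 1.0)` describes a swarm of 400 agents with 20 zealots for A, 5 zealots for B, and all remaining agents informed.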

Decision-making dynamics with voter model
For each combination of the values of the simulation parameters listed in Table 1, we perform 50 runs. In this section, we discuss the results of the simulations and of the ODE model for the conditions in which the agents use the voter model as a voting system. These results are illustrated in Fig. 2, where the graphs indicate the average proportion of agents with opinion A for different proportions of zealots disseminating option A (x-axes in all graphs of Fig. 2) and for different ratios of the two options' qualities (y-axes in all graphs of Fig. 2). The proportion of agents with opinion A is computed, in each run, by counting the proportion of agents disseminating opinion A on the last time step of the run, and is then averaged over the 50 runs. Figure 2a and b refer to the proportion of agents with opinion A when all non-zealots are uninformed agents. Figure 2c and d refer to the proportion of agents with opinion A when all non-zealots are informed agents. Figure 2a and c refer to the results of the simulations, while Fig. 2b and d refer to the results of the ODE model. Looking at these graphs, first we notice that simulations and ODE model generate identical results, which are characterized by the emergence of a single stable solution for each combination of the proportion of zealots for option A and the quality of option B. This holds both when all non-zealots are uninformed (see Fig. 2a and b) and when all non-zealots are informed (see Fig. 2c and d). In all graphs, the areas in blue refer to conditions in which all non-zealot agents choose option B, while red areas refer to conditions in which there is a consensus for option A. The white areas refer to conditions in which each run terminates with roughly half of the non-zealot agents disseminating option A and half of them disseminating option B. For intermediate proportions of informed agents, the white line progressively shifts from the vertical position observed when all non-zealots are uninformed to the inclined position observed when all non-zealots are informed.
The message of these graphs can be summarized as follows: (1) when all non-zealots are uninformed, a proportion of zealots disseminating option A slightly higher than the proportion of zealots disseminating option B can generate a consensus to option A even in the extreme case in which the quality of option B is twice the quality of option A; (2) the nature of the non-zealot agents does change the collective decision-making dynamics. In particular, when all non-zealots are uninformed agents, a sharp transition occurs when increasing the proportion of zealots disseminating option A, independently of the quality ratio (see Fig. 2a and b). When all non-zealot agents are informed, a progressively higher proportion of zealots disseminating option A is necessary to counterbalance the effect of a progressive increase in the quality of option B (see Fig. 2c and d).
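The voter-model dynamics summarized above can be illustrated with a minimal well-mixed sketch. It is not the simulator used in this study: space, robot motion and the exploration state are ignored, and the effect of quality on dissemination time is approximated by weighting the "voice" of zealots and informed agents by the quality of their option, while uninformed agents get a weight of 1. These weights, the function name `simulate_voter` and all of its parameters are assumptions made for illustration only.

```python
import random

def simulate_voter(n=400, zealots_a=0.05, zealots_b=0.0125, q_a=1.0, q_b=1.5,
                   frac_informed=1.0, steps=20000, seed=0):
    """Return the final fraction of non-zealot agents holding opinion A."""
    rng = random.Random(seed)
    n_za, n_zb = round(n * zealots_a), round(n * zealots_b)
    n_free = n - n_za - n_zb               # non-zealot agents
    n_inf = round(n_free * frac_informed)  # informed non-zealots
    n_uni = n_free - n_inf                 # uninformed non-zealots
    # Non-zealot counts by (type, opinion); each type starts split about 50/50.
    count = {("inf", "A"): n_inf // 2, ("inf", "B"): n_inf - n_inf // 2,
             ("uni", "A"): n_uni // 2, ("uni", "B"): n_uni - n_uni // 2}
    keys = list(count)
    for _ in range(steps):
        # Voice of each opinion: zealots and informed agents weigh in with the
        # option quality, uninformed agents with 1 (assumption of this sketch).
        w_a = q_a * (n_za + count[("inf", "A")]) + count[("uni", "A")]
        w_b = q_b * (n_zb + count[("inf", "B")]) + count[("uni", "B")]
        if w_a + w_b == 0:
            break
        heard = "A" if rng.random() < w_a / (w_a + w_b) else "B"
        # Voter model: one uniformly chosen non-zealot adopts the heard opinion.
        kind, old = rng.choices(keys, weights=[count[k] for k in keys])[0]
        count[(kind, old)] -= 1
        count[(kind, heard)] += 1
    return (count[("inf", "A")] + count[("uni", "A")]) / n_free

if __name__ == "__main__":
    for f in (0.0, 0.5, 1.0):
        print(f, simulate_voter(frac_informed=f))
```

Sweeping `zealots_a` and `q_b` with this function gives a rough, qualitative counterpart of one panel of Fig. 2; quantitative agreement with the reported results should not be expected from such a simplified model.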

Decision-making dynamics with majority model
In this section, we discuss the results of the simulations and of the ODE model for the conditions in which the agents use the majority model as voting system. As indicated in Table 1, we tested all combinations of parameters for three different values of the group size G (i.e., G = 3, G = 5 and G = 7). We discuss only the results with G = 3, since for G = 5 and G = 7 we observed very similar collective decision-making dynamics, namely the same number of equilibria, each with the same stability.
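As an illustration of the decision step under the majority rule, the sketch below returns the opinion held by the majority of a group of size G. Whether the agent's own opinion counts toward the group, and what happens when too few messages have been received, are assumptions of this sketch; `majority_vote` is a hypothetical helper, not code from the study.

```python
from collections import Counter

def majority_vote(own, received, g=3):
    """Return the opinion held by the majority of a group of size g.

    The group is the agent's own opinion plus the first g - 1 received
    opinions (assumption). The agent keeps its opinion if it has heard
    fewer than g - 1 messages or if the group is tied.
    """
    if len(received) < g - 1:
        return own
    group = [own] + list(received[:g - 1])
    (top, top_n), *rest = Counter(group).most_common()
    if rest and rest[0][1] == top_n:  # tie, possible only for even g
        return own
    return top

if __name__ == "__main__":
    print(majority_vote("A", ["B", "B"]))       # two B against one A
    print(majority_vote("A", ["A", "B", "B", "B"], g=5))
```

With two options and an odd group size such as the tested values G = 3, 5 and 7, a tie cannot occur, so the tie branch only matters for even values of `g`.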
When this best-of-n scenario with the majority rule is modeled with ODEs, the results suggest that a saddle-node bifurcation can be observed in all tested conditions (see Fig. 3). Two different stable equilibria are observed for relatively low proportions of zealots disseminating option A. The progressive increase of the number of zealots disseminating opinion A leads to a transition point after which only one stable equilibrium remains. The results of the simulations are shown in Fig. 4. The collective decision-making dynamics are qualitatively similar to those observed and discussed in Sect. 4.1, where the agents employ the voter model as voting system, with the only exception of the emergence of a bi-stability region, as predicted by the ODE model. The area characterized by two equilibria corresponds to low proportions of zealots disseminating option A, where the population of non-zealot agents converges with equal probability to consensus to option B (see Fig. 4a-c, blue areas of the graphs) and to consensus to option A (see Fig. 4d-f, red areas of the graphs). As for the results discussed in Sect. 4.1, the ODE model and simulation generate qualitatively and quantitatively similar results (data are not shown for the ODE model). The most important phenomenon to observe is that variations in the nature of the non-zealot agents do change the collective decision dynamics. As in the voter-model scenario, when non-zealots are all informed agents, a progressively higher proportion of zealots disseminating option A is necessary to counterbalance the effect of a progressive increase in the quality of option B (see Fig. 4a-c). Additionally, we can appreciate how the dynamics change with progressively lower proportions of informed agents. When no non-zealot is informed, we observe that the dynamics no longer depend on the quality ratio. Furthermore, the dependency on the quality ratio smoothly decreases for progressively lower proportions of informed agents, as we observe from Fig. 4b/e, where the white line separating the two equilibria becomes progressively more vertical as the proportion of informed agents decreases.
Fig. 4 Best-of-n scenario in simulation with the majority model as voting system. Graphs showing the proportion of agents with opinion A when all non-zealots are uninformed agents (proportion of informed agents equal to 0, panels a and d), when half of the non-zealots are informed agents (proportion equal to 0.5, panels b and e), and when all non-zealots are informed agents (proportion equal to 1, panels c and f). Each point in these graphs is an average over 50 runs. A spline interpolation has been applied to the original plot. Graphs in the top row show the first attractor, while graphs in the bottom row show the second attractor, which exists only in certain regions of the parameter space. The regions in which the second attractor does not exist are indicated in panels d, e and f with a white background and black diagonal lines.
Figure 5 shows an in-depth analysis of the impact of the proportion of informed and uninformed agents on the collective decision-making dynamics in the best-of-n scenario with the majority model. In these graphs, we can observe how the proportion of agents with opinion A varies when the proportion of informed agents progressively increases from 0 (i.e., all non-zealots are uninformed agents) to 1 (i.e., all non-zealots are informed agents), for four different proportions of zealots disseminating option A and 11 values of the quality of option B. The graphs indicate that for very low values of the quality of option B, a high proportion of agents favoring the worst quality option (i.e., option A) is maintained regardless of the nature of the non-zealot agents. For increasing values of the quality of option B, a phase transition occurs, characterized by a drop in the proportion of agents with opinion A that becomes steeper as the quality of option B increases. Such a transition corresponds to crossing the white area vertically in Fig. 4.
The results of these tests indicate that, for a given quality of option B, the proportion of agents with opinion A progressively falls as the proportion of informed agents increases. This is clearly visible in Fig. 5a, b and c from the progression of the lines with different colours. Figure 5a shows the results with the same proportion of zealots for the two options: symmetry breaking is observed in almost all cases. Only in the specific case where the quality of the two options is also the same (blue continuous line) is the symmetry not broken. In the other panels, instead, the proportion of zealots for option A is always larger than the proportion of zealots for option B, which is kept fixed at 0.0125. Nevertheless, when the difference in quality between the two options is small (i.e., for small values of the quality of option B), informed agents tend to take the side of the zealots disseminating option A, while they tend to take the side of the zealots disseminating option B when the difference in quality between the two options increases (i.e., for bigger values of the quality of option B). In the graphs, this latter trend corresponds to the fall of the proportion of agents with opinion A when the quality of option B increases for a given proportion of informed agents. The drop in the proportion of agents with opinion A tends to disappear for progressively higher proportions of zealots for option A (see Fig. 5d). For the remaining intermediate cases, informed agents facilitate the rise in the proportion of agents with the opinion corresponding to the best option. This is shown in Fig. 5b and c.
Fig. 5 Best-of-n scenario in simulation with the majority model as voting system. Graphs showing the proportion of agents with opinion A (y-axes) for different proportions of informed agents among the non-zealot agents, for a proportion of zealots disseminating option A equal to: a 0.0125; b 0.05; c 0.1; d 0.2.
In this condition, the consensus to option B is achieved for a broad range of values of the quality of option B whenever there is a large enough proportion of informed agents, starting from a minimum quality ratio of 1.2 (Fig. 5b) or 1.4 (Fig. 5c).
This result is very relevant and confirms what has been previously observed in Couzin et al. (2011) and Hartnett et al. (2016). Using a biological collective motion model, Couzin et al. (2011) found that uninformed agents (not aware of the quality) can help establish the option held by the majority. The option held by the majority was considered in that study as the democratic one, but it was also the option associated with the lowest weight or strength, or quality in the terms of our paper. The analogy between our work and that of Couzin et al. (2011) can also be put as follows: when there were more informed agents and fewer uninformed agents, the "nondemocratic" choice preferred by a minority with stronger weight prevailed. A comparison of the two papers shows similar results, even though the two collective decision models differ dramatically in their motivation.

Experiments with physical robots
In order to validate the results obtained with simulation and the ODE model, we run further tests with physical robots. For these tests, we use kilobots, which are small, low-cost robots that communicate using infrared transceivers positioned beneath the robot body (Rubenstein et al., 2012) (see Fig. 6a). We run two sets of experiments: set I, in which the voting system is implemented with the voter model, and set II, in which it is implemented with the majority model. In set I, with swarm size N = 20, the kilobots operate in a rectangular arena of 80 × 35 cm², with a relative density of 0.007 robots/cm² (see Fig. 6b). In set II, with swarm size N = 40, the kilobots operate in a larger rectangular arena of 85 × 50 cm², with a relative density of 0.009 robots/cm². In both set I and set II, the arena is divided into 3 zones (see Fig. 6b). The central zone, which measures 37 × 35 cm² in set I and 37 × 50 cm² in set II, represents the nest. The lateral zones, positioned on the left and on the right of the nest, correspond to the exploration sites associated with the quality of option A and of option B, respectively. The robots are controlled by the same finite state machine illustrated above (see also Fig. 1). Each run lasts 20 min, with the kilobots pseudo-randomly placed in the nest at the start. All robots are initialised in the exploration state. At the start of each run, each option is assigned to half of the swarm. Robots in state E_A move to option A, while those in state E_B move to option B. The movement toward and away from the respective option (A or B) is controlled by a light source positioned on the right side of the arena. This light works as a landmark with respect to which the robots develop a phototactic or an anti-phototactic response depending on their state. Robots in state E_A perform phototaxis to reach site A; robots in state E_B perform antiphototaxis to reach site B.
On entering a site, each kilobot assesses the site quality by sensing an infrared signal emitted by an Arduino-based platform placed beneath the transparent arena surface at each site. Each Arduino-based platform continuously emits messages containing the site type (i.e., A or B) and the quality associated with the site. The robots remain in the exploration state for a time sampled from an exponential distribution with a rate equal to roughly 1/4.76 s⁻¹.
At the end of the exploration state, robots in state E_A transition to state D_A and return to the nest performing antiphototaxis; robots in state E_B transition to state D_B and return to the nest performing phototaxis. Once they reach the nest, the robots disseminate their currently chosen opinion for a time randomly sampled from an exponential distribution with characteristic time proportional to the opinion's dissemination factor. While in the dissemination state, the robots move pseudo-randomly and continuously broadcast their opinion together with their unique 16-bit identifier, so that each robot's vote is counted only once in each dissemination phase. Finally, after disseminating, the robots (with the exception of zealots) apply the voting system to confirm or to reconsider their currently chosen option, and then they transition to the exploration state for a new cycle.
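The timing and vote-counting logic described above can be sketched in a few lines. This is an illustration in Python rather than kilobot firmware. The exploration rate of 1/4.76 s⁻¹ is taken literally from the text; `base_time`, the class `VoteCollector`, and the choice that the first message from a given sender wins are hypothetical.

```python
import random

EXPLORATION_RATE = 1 / 4.76  # s^-1, as quoted in the text

def exploration_time(rng):
    """Sample the time spent in the exploration state."""
    return rng.expovariate(EXPLORATION_RATE)

def dissemination_time(rng, quality, base_time=10.0):
    """Sample a dissemination time whose mean is base_time * quality.

    base_time is a hypothetical scale factor; quality must be positive.
    """
    return rng.expovariate(1.0 / (base_time * quality))

class VoteCollector:
    """Keeps at most one opinion per sender within one dissemination phase."""

    def __init__(self):
        self.votes = {}

    def receive(self, sender_id, opinion):
        # Mask to 16 bits to mirror the 16-bit identifier; repeated
        # broadcasts from the same sender are ignored (first one wins).
        self.votes.setdefault(sender_id & 0xFFFF, opinion)

    def opinions(self):
        return list(self.votes.values())

    def reset(self):
        self.votes.clear()

if __name__ == "__main__":
    rng = random.Random(42)
    box = VoteCollector()
    for sender, opinion in [(7, "A"), (7, "A"), (9, "B")]:
        box.receive(sender, opinion)
    print(box.opinions(), round(dissemination_time(rng, quality=1.5), 2))
```

Because the mean dissemination time scales with quality, agents that can assess quality advertise a better option for longer, which is the only channel through which quality enters the vote in this sketch.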

Results
The experimental design with physical robots aims to reproduce the one used in simulation and in the ODE model. However, due to time constraints, we tested fewer experimental conditions with physical robots, and the values of some parameters have been adjusted to the smaller swarm size (see Table 2 for a detailed description of the parameters used). For each set of experiments, we perform 10 runs, with 5 runs in which all non-zealot agents are uninformed and only the zealots disseminate proportionally to the quality (i.e., the proportion of informed agents is 0) and 5 runs in which all non-zealot agents are informed (i.e., the proportion of informed agents is 1). The results of the runs for set I, where the robots use the voter model as voting system, are shown in Fig. 7a and b. Overall, these results match those obtained with simulation and ODE model, illustrated in Fig. 2. For example, we observe that maintaining the proportion of agents with opinion A, for a progressively better quality of option B, requires a progressively larger number of zealots disseminating option A when the non-zealots are all informed (see Figs. 7b and 2c and d for a comparison of the results with kilobots, simulated agents and ODE model, respectively). For set II, where the robots use the majority model as voting system, only the average results have been plotted: given the low number of runs, it is hard to define a robust criterion for assessing bi-stability. The graphs in Fig. 7c and d show results that match those obtained with simulation and ODE model, illustrated in Fig. 4. When only one solution is expected (right of the white area in Fig. 7c and d), we observe a very good match between the experiments with physical robots and those with simulated agents. When two solutions are expected (left of the white area in Fig. 7c and d), the outcome (consensus around 0.4) is simply the average of the two theoretical solutions.
From the perspective of robotic applications in real scenarios, the voter model seems more robust to the adverse action of zealots disseminating the worst quality option (i.e., option A). In fact, with a proportion of zealots for option A around 0.25, a high quality ratio can still prevail and drive the consensus toward the best option. For the majority model, given the bi-stability, for similar proportions of zealots for option A the swarm reaches consensus in some cases on the best option (B) and in some cases on the worst option (A). From an engineering perspective, a deep understanding of the possible hidden dynamics of the majority rule plays a crucial role in the proper design of a robot swarm. While most applied studies disregard the bi-stability and focus only on the average behavior of the system, our paper sheds light on the micro-macro link, describing the different microscopic dynamics leading to a macro effect at swarm level.
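The point about averages hiding bi-stability can be made concrete with a small sketch. The run outcomes below are made-up numbers chosen for illustration, not data from the experiments, and the 0.5 threshold used to separate the two attractors is an assumption.

```python
from statistics import mean

def split_attractors(final_fractions, threshold=0.5):
    """Group run outcomes by attractor and return the mean of each group.

    final_fractions holds, for each run, the final proportion of agents
    with opinion A. A group with no runs yields None.
    """
    low = [x for x in final_fractions if x < threshold]
    high = [x for x in final_fractions if x >= threshold]
    return (mean(low) if low else None, mean(high) if high else None)

if __name__ == "__main__":
    runs = [0.0, 0.1, 1.0, 0.9, 0.0, 0.1]  # hypothetical outcomes
    print("overall mean:", mean(runs))
    print("per attractor:", split_attractors(runs))
```

In this toy data set no single run ends near the overall mean, which is why reporting each attractor separately, as in Fig. 4, is more informative than a single average when enough runs are available.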

Discussion and conclusions
In this paper, a generalized version of the best-of-n problem has been investigated for n = 2. We consider informed agents, able to measure the quality of the two options and to modulate their strategy based on it; uninformed agents, unable to measure the quality of the two options; and zealots, able to measure the quality of the options but unable to change their initial opinion. Two interesting points emerge from this paper: the first is the interplay between the abundance of zealots for an option (e.g., A) and the quality ratio between the two options; the second is the interplay between the proportion of informed agents and the quality ratio.
The first point (interplay between zealot abundance and quality) is explored in two extreme scenarios: one in which all agents are informed and can thus measure the quality and disseminate proportionally to it, and another in which only zealots can measure quality and disseminate proportionally, while all the other (uninformed) agents disseminate for a time that is independent of the option. In the first scenario, we show that for a limited abundance of zealots for the worst option, the consensus dynamics converge to the best option. However, when the number of zealots for the worst option is too high (above 10%), the dynamics converge to the worst option. This behavior is observed for both voting models (voter and majority rule). In the second scenario, where only zealots can measure the quality, the option with the highest abundance of zealots almost always dominates the consensus dynamics, except for the case with very few zealots.
The results of this paper shed further light on the potential dual role zealots can have within a collective decision-making system. Previous results have highlighted a potentially beneficial role for zealots, whose presence was deemed necessary to achieve adaptability of the system to changing environments (i.e., to option qualities that change over time) (Prasetyo et al. 2018). That study also found that only a limited number of zealots was required to achieve adaptability, while an increasingly high abundance disrupted the consensus dynamics. The results of the current paper further confirm this finding, showing that an excessive abundance of zealots is not beneficial for the system because it prevents consensus to the best option.
The second point (interplay between the proportion of informed agents and quality) has been investigated by varying the proportion of informed agents able to measure the quality. For the other parameters, we considered values that kept the system close to the transition between consensus to A and consensus to B. The main result of this analysis shows that, whenever the two options can be discriminated sufficiently well thanks to a large difference in their quality, only a small proportion of informed agents is necessary to make the consensus dynamics converge to the best option. Conversely, when the two options are too similar, the system requires a higher discriminatory capability from the swarm, and therefore more informed agents able to measure the quality are required. These results have a strong analogy with the study of Couzin et al. (2011), in which the focus was on uninformed agents (agents that are not aware of the quality), which are able to "restore democracy" in favor of the option held by the majority even if it had a lower strength or weight. Conversely, a lower abundance of uninformed agents (which means a higher abundance of informed agents) promoted the "nondemocratic" choice held by a minority with stronger weight (the equivalent of quality in our paper). Despite the different focus on what is good or bad in the two papers, the results obtained are closely aligned, which is remarkable as the two collective decision models differ dramatically: we use the best-of-n problem, while Couzin et al. (2011) used a collective motion model like the one of their previous study (Couzin et al., 2005). Another analogy can be drawn with one of the oldest known results in collective decision-making: Condorcet's jury theorem (Condorcet, 1785).
Despite the numerous differences (such as the centralized nature and the absence of time dynamics in Condorcet's description), our paper and the original theorem reach a similar conclusion: a critical mass of knowledgeable or skilled individuals (in Condorcet: > 50%) is required for a system to reach the correct collective decision. Our work could, in this light, offer a mechanistic interpretation of what it takes to be knowledgeable: the ability to measure quality and to disseminate based on it.
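For reference, the jury theorem in its classical form is stated in terms of the probability p that each independent voter is correct, and it can be evaluated directly from the binomial distribution. The sketch below is a worked illustration of that classical statement only; it is not part of the model studied in this paper, and the function name is hypothetical.

```python
from math import comb

def majority_correct(n_voters, p):
    """Probability that a strict majority of n_voters independent voters,
    each correct with probability p, picks the correct option.

    Intended for odd n_voters, so that ties cannot occur.
    """
    need = n_voters // 2 + 1
    return sum(comb(n_voters, k) * p**k * (1 - p)**(n_voters - k)
               for k in range(need, n_voters + 1))

if __name__ == "__main__":
    for n in (1, 3, 11, 101):
        print(n, round(majority_correct(n, 0.6), 4),
              round(majority_correct(n, 0.4), 4))
```

For p above 0.5 the probability of a correct majority grows with the group size, while for p below 0.5 it shrinks, which is the threshold behavior that the analogy above refers to.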
The experiments with real kilobots confirm the evidence coming from the ODE model and the simulations. These results show that this mechanism can in principle be used in real-life applications as well, with minimal requirements in terms of hardware. For example, when a swarm is used to monitor a wide region, one can potentially split the region into smaller areas. A quality can be associated with each small area, representing the interest of the area (Albani et al., 2018). The current model can be applied to choose the best area to be further explored, in scenarios in which not all robots are equipped with specialized sensors (e.g., thermal or infrared cameras) to assess the potential interest of each area or the abundance of a certain feature in the environment. In the near future, we plan to demonstrate these approaches in applications in large unstructured environments. Furthermore, the results of this study offer swarm designers another tool to engineer and control the results of self-organising collective dynamics in best-of-n scenarios. In particular, the role of the proportion of different types of agents in the collective decision process could be exploited by the swarm designer to determine the way in which a swarm of artificial agents responds to environmental conditions requiring a collective decision to be made. For example, we can imagine a scenario in which the designer can communicate with some or all of the agents of a swarm in order to change their characteristics. Under these conditions, the designer could turn a "normal" agent into a zealot or vice versa. This action has the effect of varying the proportion of each type of agent within the swarm and can consequently induce the swarm to reach a consensus on one or the other option, as shown by the results of this study.
This idea is part of a larger experimental approach in which different forms of swarm heterogeneity are used to control the collective dynamics (Firat et al., 2020). We will test the effectiveness of this approach in the best-of-n scenario in our future empirical studies.