1 Introduction

When confronted with a choice or endeavoring to formulate an assessment pertaining to a matter, be it the caliber of a technological breakthrough or the success of a political faction, we commonly find ourselves swayed by the viewpoints of our friends, family, colleagues, and the figures whose opinions we value. Hence, our opinions are constantly influenced and shaped through interactions with our connections.

Moreover, the preceding decades have witnessed an exponential surge in the use of digital social platforms such as Facebook, Instagram, WeChat, TikTok, and Twitter, which allow individuals to connect with their acquaintances, gather knowledge, and articulate their viewpoints. As a result, these platforms enable information to travel extraordinarily fast and opinions to be exchanged and formed at a higher rate.

Enterprises, political parties, and even governing bodies endeavor to harness the potential of opinion formation and influence dissemination via online social platforms in order to reach their commercial and political objectives. Illustratively, marketing endeavors routinely employ online social networks as a means to sway individuals’ perspectives in their favor, deploying tactics such as selectively targeting specific segments of the population with complimentary product samples or deceptive information. Consequently, the diffusion of opinions and the propagation of (mis)information have the potential to impact diverse facets of our life, spanning the realms of economics, politics, fashion, and music.

There has been a fast-growing demand for a better and deeper understanding of opinion formation and information spreading processes in social networks. An integral aspect is to gain insights into how the social ties among a community’s members can exert influence on the process through which opinions are diffused and shaped. Acquiring a deeper understanding of the mechanisms underlying collective decision-making and the dissemination of opinions would empower us to exercise control and oversight over the impact wielded by marketing endeavors and political campaigns, thereby mitigating the propagation of misinformation.

The evolution of social dynamics has been a topic of intense study by researchers from a wide range of backgrounds such as economics, epidemiology, social psychology, and statistical physics. It particularly has gained significant popularity in theoretical computer science, especially in the rapidly growing literature focusing on the interface between social choice and social networks, cf. [27] (this survey is by now a bit outdated).

Numerous models have been proposed to simulate opinion formation processes. It is inherently difficult to develop models which reflect reality perfectly since these processes are far too complex to be expressed in purely mathematical terms. A suitable model therefore strives to capture the fundamental properties of opinion spreading processes while remaining simple enough to permit accurate and profound mathematical analysis. In other words, the objective is to establish models which justifiably approximate the real opinion diffusion processes by disregarding less essential, but distracting, parameters. The analysis of such approximate models allows researchers to shed some light on the fundamental principles and recurring patterns in the opinion diffusion processes, which are otherwise concealed by the intricacy of the full process.

Each opinion diffusion model has three essential components. Firstly, one needs to define how the interactions between the individuals take place. A well-received choice is to use a graph structure, where a node represents an individual and an edge between two nodes corresponds to a relation between the respective individuals, e.g. friendship or common interests. Secondly, there exist different options for modeling the opinion of the individuals. A popular choice is to assign a binary value, say blue or white, to each node, which indicates whether the node is positive or negative about a certain topic. Last but not least, a crucial component of any model is its updating rule, which defines how and in what order the nodes update their opinions. Among the plethora of updating rules, the majority rule, where a node chooses the most frequent opinion (i.e., color) in its neighborhood, has attracted a substantial amount of attention.

Different aspects of opinion diffusion models have been investigated, both theoretically (by exploiting the rich tool kit from graph and probability theory) and experimentally (by conducting a vast spectrum of experiments on graph data from real-world social networks). An enormous part of the research performed in this area falls under the umbrella of the following three fundamental questions:

  1. How long does the process need to reach a stable configuration, and what are such stable configurations?

  2. What is the minimum number of nodes which need to be blue to ensure that the whole graph eventually becomes blue?

  3. What is the expected final number of blue nodes starting from a random initial coloring?

In the present paper, we contribute to the study of the aforementioned questions for two of the most basic majority based models on general graphs and special classes of graphs, in particular cycles. The first question, regarding convergence properties, is essential for designing and controlling dynamical systems. The other two questions have direct relation to the viral marketing strategies, where a marketer aims to maximize the adoption of a product or opinion by the selection of a set of seed nodes, strategically or randomly.

We provide a collection of results which address the aforementioned questions. We believe that not only are these results interesting in their own right, but the provided proof techniques are also of interest for future work on other information spreading and opinion formation processes. (Please refer to Sect. 1.2 for more details.)

Roadmap. In the rest of this section, we first provide some basic definitions which lay the ground for describing our contributions in more depth; then, we give a brief overview of the relevant prior work. Our theoretical findings addressing questions 1, 2, and 3 are presented in Sects. 2, 3, and 4, respectively. Finally, our experimental results are provided in Sect. 5.

1.1 Preliminaries

Graph Definitions. Let \(G=\left( V,E\right)\) be a simple connected undirected graph and define \(n:= |V|\) and \(m:=|E|\). For a node \(v\in V\),

$$N\left( v\right) :=\{u\in V: \{u,v\} \in E\}$$

is the neighborhood of v. For a set \(S\subset V\), we define \(N_S\left( v\right) :=N\left( v\right) \cap S\). Moreover, \(d\left( v\right) :=|N\left( v\right) |\) is the degree of v and \(d_S\left( v\right) :=|N_S\left( v\right) |\). Note that whenever graph G is not clear from the context, we add a superscript, e.g. \(d^G(v)\).

Models. For a graph G, a coloring is a function \(\mathcal {C}:V\rightarrow \{b,w\}\), where b and w represent blue and white. For a node \(v\in V\), the set \(N_a^{\mathcal {C}}\left( v\right) :=\{u\in N\left( v\right) :\mathcal {C}\left( u\right) =a\}\) includes the neighbors of v which have color \(a\in \{b,w\}\) in the coloring \(\mathcal {C}\). Furthermore, we write \(\mathcal {C}|_S=a\) for a set \(S\subseteq V\) if \(\mathcal {C}(v)=a\) for every \(v\in S\).

Assume that we are given an initial coloring \(\mathcal {C}_0\) on a graph G. In a model M, \(\mathcal {C}_t\left( v\right)\), which is the color of node v in round \(t\in \mathbb {N}\), is determined based on a predefined updating rule. We are interested in the Majority Model (MM) where the updating rule is as follows:

$$\mathcal {C}_t(v) = {\left\{ \begin{array}{ll} \mathcal {C}_{t-1}(v) &amp; \text {if } |N_b^{\mathcal {C}_{t-1}}(v)| = |N_w^{\mathcal {C}_{t-1}}(v)|,\\ \operatorname*{arg\,max}_{a \in \{b, w\}}|N_a^{\mathcal {C}_{t-1}}(v)| &amp; \text {otherwise.} \end{array}\right. }$$

In other words, each node chooses the most frequent color in its neighborhood and keeps its color in case of a tie. The Random Majority Model (RMM) is the same as MM except that in case of a tie, a node chooses one of the two colors independently and uniformly at random. See Fig. 1 for an example. (It is worth mentioning that the impact of local ties has also been considered from a strategic voting perspective, cf. [43, 48].)
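The two updating rules can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the graph is given as a dictionary of neighbor lists, colors are the characters `'b'` and `'w'`, and the names `majority_round` and `cycle` are hypothetical helpers, not part of the paper.

```python
import random


def majority_round(adj, coloring, random_ties=False, rng=random):
    """Apply one synchronous round of MM (or RMM if random_ties is True).

    adj:      dict mapping each node to a list of its neighbors
    coloring: dict mapping each node to 'b' (blue) or 'w' (white)
    """
    new_coloring = {}
    for v, neighbors in adj.items():
        blue = sum(1 for u in neighbors if coloring[u] == 'b')
        white = len(neighbors) - blue
        if blue > white:
            new_coloring[v] = 'b'
        elif white > blue:
            new_coloring[v] = 'w'
        elif random_ties:
            # RMM: break the tie uniformly at random.
            new_coloring[v] = rng.choice('bw')
        else:
            # MM: keep the current color on a tie.
            new_coloring[v] = coloring[v]
    return new_coloring


def cycle(n):
    """Adjacency lists of the cycle C_n on nodes 0, ..., n-1."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
```

All nodes read the old coloring and write into a fresh dictionary, which is what makes the update synchronous. When no node faces a tie, the two models coincide for that round.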

Fig. 1. (Left) The coloring obtained deterministically after one application of MM. (Right) Two possible colorings after one application of RMM

In these models, we define \(b_t\) and \(w_t\) for \(t\in \mathbb {N}_0\) to be the number of blue and white nodes in \(\mathcal {C}_t\). These correspond to random variables in RMM and also in MM when the initial coloring is random.

We say the process reaches the blue (white) coloring if it reaches the coloring where all nodes are blue (white). For a cycle graph \(C_n\) with even n, there are two colorings where every two adjacent nodes have different colors. We call these two colorings the alternating colorings. If MM or RMM process reaches one of the two alternating colorings, it keeps switching between them. We say the process has reached the blinking configuration. See Fig. 2 for some examples.

Fig. 2. a Blue coloring, b white coloring, c blinking configuration, d stable non-monochromatic coloring

For a graph G, we say that a coloring \(\mathcal {C}\) is stable if one application of MM (similarly RMM) on \(\mathcal {C}\) deterministically outputs \(\mathcal {C}\). (For RMM, this implies that there are no ties.) Note that a stable coloring need not be monochromatic. (See Fig. 2 for an example.) Furthermore, a p-random coloring, for \(0\le p \le 1\), is a coloring where each node is colored blue independently with probability (w.p.) p and white otherwise.

Stabilization Time and Periodicity. Since the updating rule in MM is deterministic and there are \(2^n\) possible colorings, for any initial coloring the process reaches a cycle of colorings and remains there forever. The number of rounds the process needs to reach the cycle is the stabilization time and the length of the cycle is the periodicity of the process.
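To make these two quantities concrete, the following sketch runs MM until a coloring repeats and reads off both values. It assumes that nodes are numbered 0 to n-1 and that colorings are tuples of `'b'`/`'w'`; the function names are ours and purely illustrative.

```python
def mm_step(adj, coloring):
    """One synchronous MM round; adj[v] lists the neighbors of node v."""
    result = []
    for v, neighbors in enumerate(adj):
        blue = sum(1 for u in neighbors if coloring[u] == 'b')
        white = len(neighbors) - blue
        if blue > white:
            result.append('b')
        elif white > blue:
            result.append('w')
        else:
            result.append(coloring[v])  # tie: keep the current color
    return tuple(result)


def stabilization_and_period(adj, coloring):
    """Return (stabilization time, periodicity) of MM from the given coloring."""
    first_seen = {}          # coloring -> round in which it first appeared
    current = tuple(coloring)
    t = 0
    while current not in first_seen:
        first_seen[current] = t
        current = mm_step(adj, current)
        t += 1
    start_of_cycle = first_seen[current]
    return start_of_cycle, t - start_of_cycle


def cycle_adj(n):
    """Neighbor lists of the cycle C_n."""
    return [[(v - 1) % n, (v + 1) % n] for v in range(n)]
```

Since the process is deterministic, the first repeated coloring marks the entry point of the cycle of colorings: the round in which it first appeared is the stabilization time, and the number of rounds until it reappears is the periodicity.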

RMM on an n-node graph G corresponds to a Markov chain. This Markov chain has \(2^n\) states (i.e., \(2^n\) possible colorings) and there is an edge from state s to \(s'\) if there is a non-zero probability to go from s to \(s'\) in RMM. Since this is a directed graph, its state set can be partitioned into maximal strongly connected components. (A set of states is a strongly connected component if every state in the set is reachable from every other state in the set, and it is maximal if the property does not hold when we add any other state to the set.) Furthermore, we say a maximal strongly connected component is an absorbing component if it has no outgoing edge. If each maximal strongly connected component is contracted to a single state, the resulting graph is a directed acyclic graph. This implies that in RMM, regardless of the initial coloring, the process eventually reaches an absorbing component and remains there forever. The expected number of rounds the process needs to reach an absorbing component is the stabilization time and the size of the absorbing component is the periodicity of the process. In simple words, the process eventually reaches a subset of states (colorings) and keeps transitioning between them. The stabilization time is the expected number of rounds to get there, and the periodicity is their number.

Winning and Resilient Sets: For MM or RMM on a graph \(G=(V,E)\), we say a node set \(S \subseteq V\) is a winning set whenever the following holds: If all nodes in S are blue (white), then the process eventually reaches the blue (white) coloring regardless of the color of nodes in \(V\setminus S\) and all the random choices (in RMM). This is called consensus or full cascade. Furthermore, we say a node set \(S \subseteq V\) is a resilient set whenever the following holds: If S is fully blue (white) then all nodes in S remain blue (white) forever, regardless of the color of the other nodes and the random choices. We observe that a set S is resilient in MM (resp. RMM) if and only if for every node \(v\in S\), \(|N_{S}(v)|\ge d(v)/2\) (resp. \(|N_{S}(v)|> d(v)/2\)). (See Fig. 3 for some examples.)
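The degree condition for resilient sets is easy to check mechanically. The sketch below is our own illustration (the name `is_resilient` and the adjacency-dictionary representation are assumptions); it uses integer arithmetic to avoid comparing against the fraction d(v)/2.

```python
def is_resilient(adj, S, random_ties=False):
    """Check the degree condition for a resilient set.

    MM  (random_ties=False): every v in S needs |N_S(v)| >= d(v)/2.
    RMM (random_ties=True):  every v in S needs |N_S(v)| >  d(v)/2.
    adj maps each node to an iterable of its neighbors.
    """
    S = set(S)
    for v in S:
        inside = sum(1 for u in adj[v] if u in S)
        degree = len(adj[v])
        if random_ties:
            if 2 * inside <= degree:
                return False
        elif 2 * inside < degree:
            return False
    return True
```

For example, on a cycle two adjacent nodes satisfy the MM condition (each has one of its two neighbors in the set) but not the stricter RMM condition.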

Fig. 3. a In the leftmost coloring, the set of blue nodes forms a winning set and the two adjacent blue nodes form a resilient set in MM

Path Partition. Consider a cycle \(C_n\) and a coloring \(\mathcal {C}\). We say a path is blue (white) if all its nodes are blue (white). A path is monochromatic if it is blue or white. Furthermore, a path is alternating if every two adjacent nodes have opposite colors. The length of a path is its number of nodes and an even (odd) path is a path whose length is even (odd). Except when n is even and \(\mathcal {C}\) is one of the two alternating colorings, there must exist at least one monochromatic path of length two or larger. Let B (resp. W) be the set of nodes on the maximal blue (resp. white) paths of length at least two in \(\mathcal {C}\). Then, all the nodes which are not in \(B\cup W\) can be partitioned into maximal alternating paths, which are surrounded by the aforementioned monochromatic paths. We call the union of these maximal monochromatic and alternating paths, the path partition in \(\mathcal {C}\). Please see Fig. 4 for an example.
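The path partition can be computed with a single pass around the cycle. The following sketch is illustrative only: it assumes the coloring is a string over `'b'` and `'w'` indexed by the nodes 0 to n-1 in cyclic order, and the function name and output format are our own choices.

```python
def path_partition(coloring):
    """Return the path partition of a coloring of the cycle C_n.

    The result is a list of (kind, nodes) pairs in cyclic order, where kind
    is 'blue', 'white' or 'alternating'. For the two alternating colorings
    (even n) the partition is not defined and an empty list is returned.
    """
    n = len(coloring)

    def kind(v):
        c = coloring[v]
        # A node lies on a monochromatic path of length >= 2 exactly when
        # at least one of its two neighbors shares its color.
        if coloring[(v - 1) % n] == c or coloring[(v + 1) % n] == c:
            return 'blue' if c == 'b' else 'white'
        return 'alternating'

    kinds = [kind(v) for v in range(n)]
    if all(k == 'alternating' for k in kinds):
        return []
    # Start at a boundary between two paths so that no path is split
    # by the wrap-around (kinds[v - 1] wraps to the last node for v = 0).
    start = next((v for v in range(n) if kinds[v] != kinds[v - 1]), 0)
    paths = []
    for i in range(n):
        v = (start + i) % n
        if paths and paths[-1][0] == kinds[v]:
            paths[-1][1].append(v)
        else:
            paths.append((kinds[v], [v]))
    return paths
```

Grouping consecutive nodes of the same kind suffices because two distinct maximal blue (or white, or alternating) paths can never be adjacent; otherwise they would merge into a single path.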

Fig. 4. An example of path partition, where \(B=\{P_1,P_3\}\), \(W=\{P_4, P_6\}\) and the rest of paths are alternating

McDiarmid’s inequality: We use an extension of McDiarmid’s inequality which gives a bound on the input sensitivity of random variables when differences in the output satisfy some bound.

Definition 1

Let \(X:\Omega \rightarrow \mathbb {R}\) be a random variable over the probability space \(\Omega =\{0,1\}^n\). We say X is difference-bounded by \((\beta ,c,\delta )\) if the following holds: (i) there is a “bad” subset \(B\subset \Omega\), where \(|B|/|\Omega |=\delta\) (ii) if \(\omega ,\omega '\in \Omega\) differ only in the i-th coordinate, and \(\omega \notin B\), then \(|X(\omega )-X(\omega ')|\le c\) (iii) for any \(\omega\) and \(\omega '\) differing only in the i-th coordinate, \(|X(\omega )-X(\omega ')|\le \beta\).

Theorem 1.1

(An Extension of McDiarmid’s Inequality [34]) Let random variable \(X:\{0,1\}^n\rightarrow \mathbb {R}\) be difference-bounded by \((\beta ,c,\delta )\), then for any \(\epsilon >0\)

$$\begin{aligned} \mathbb {P}[(1-\epsilon )\mathbb {E}[X]\le X\le (1+\epsilon )\mathbb {E}[X]]\ge 1-2\exp \left( \frac{-\epsilon ^2\mathbb {E}[X]^2}{8nc^2}\right) -\frac{2\delta n\beta }{c}. \end{aligned}$$
(1)

With High Probability. We assume that n (i.e., |V|) tends to infinity. We say an event happens with high probability (w.h.p.) when it occurs w.p. \(1-o(1)\).

1.2 Our contribution

Contribution 1: Stabilization Time and Periodicity. It is known [41] that the stabilization time in MM on a graph G is in \(\mathcal {O}(m)\). However, it was left open whether a similar bound holds for RMM or not. We show that the answer is negative by providing an explicit graph construction and coloring for which the stabilization time of RMM is exponential in n. Furthermore, we investigate the stabilization time when the underlying graph is a cycle \(C_n\). We prove the upper bound of \(\lceil n/2 \rceil -1\) for MM and \(\mathcal {O}(n^2)\) for RMM. For the former we exploit some combinatorial arguments and for the latter we analyze the “convergence” time of a corresponding Markov chain. We show that both of these bounds are tight.

A trivial bound on the periodicity of MM is \(2^n\). However, Goles and Olivos [26] proved that its periodicity is one or two, i.e., the process always reaches a fixed coloring or switches between two colorings. While a similar behavior was observed for RMM on some special classes of graphs, cf. [1], we prove that this does not apply to the general case. More precisely, we give graph structures and initial colorings for which the periodicity of RMM is exponential.

We also initiate the study of the number of stable colorings. We prove that the number of stable colorings of a cycle \(C_n\) is in \(\Theta (1)\) for RMM and in \(\Theta (\Phi ^n)\) for MM, where \(\Phi =(1+\sqrt{5})/2\) is the golden ratio. This is another indication of how small alterations in the local behavior of a process, such as the tie-breaking rule, can have a substantial impact on the global behavior of the process.

Contribution 2: Minimum size of a winning set in cycles. We prove that in RMM on a cycle \(C_n\), the only winning set is the set of all nodes. In MM on \(C_n\), the minimum size of a winning set is equal to \(\lfloor n/2\rfloor +1\).
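For small cycles, the MM statement can be checked by exhaustive search. The sketch below is a brute-force illustration under our own assumptions (hypothetical function names, colorings as tuples of `'b'`/`'w'`); it only tests the blue case, which suffices for MM by the symmetry between the two colors, and its running time is exponential in n.

```python
from itertools import combinations, product


def mm_step_cycle(c):
    """One MM round on a cycle: a node adopts its neighbors' color if the two
    neighbors agree and keeps its own color otherwise (a tie)."""
    n = len(c)
    return tuple(c[v - 1] if c[v - 1] == c[(v + 1) % n] else c[v]
                 for v in range(n))


def reaches_all_blue(c):
    """Run MM until a coloring repeats and test whether it is the blue coloring."""
    seen = set()
    while c not in seen:
        seen.add(c)
        c = mm_step_cycle(c)
    return all(color == 'b' for color in c)


def is_winning(n, S):
    """S is winning if MM reaches all blue for every coloring of the other nodes."""
    rest = [v for v in range(n) if v not in S]
    for colors in product('bw', repeat=len(rest)):
        c = ['b'] * n
        for v, color in zip(rest, colors):
            c[v] = color
        if not reaches_all_blue(tuple(c)):
            return False
    return True


def min_winning_set_size(n):
    """Smallest size of a winning set of MM on C_n, by exhaustive search."""
    for size in range(n + 1):
        if any(is_winning(n, set(S)) for S in combinations(range(n), size)):
            return size
```

The search illustrates why roughly half of the nodes are needed: two adjacent white nodes form a resilient set in MM, so a winning set must leave no two adjacent nodes uncovered.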

Contribution 3: Random initial coloring. The problem of finding the expected “final” number of blue nodes starting from a p-random coloring has been attacked by previous work (see Sect. 1.3). However, only some loose bounds for special classes of graphs have been provided, which seems to be due to the inherent difficulty of the problem. We make some advancements on this front, by answering the question for cycle graphs. (As we explain later, we believe that our techniques can be used to prove similar results for a larger class of graphs, namely the d-dimensional torus or more broadly vertex-transitive graphs.) We show that in RMM on \(C_n\), the expected final number of blue nodes is equal to pn. On the other hand, this is equal to \((2p^2-p^3)n/(1-p+p^2)\) for MM (it was brought to our attention that a similar result was proven in [39]. However, we believe our proof is more intuitive and more importantly we prove a w.h.p. statement).
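The MM formula can be compared against simulation. The following sketch is only an illustration under our own assumptions: the function names are hypothetical, the random generator is passed in explicitly, and for a blinking configuration (possible only for even n) the blue nodes of one of the two alternating colorings are counted.

```python
import random


def mm_expected_blue_fraction(p):
    """The stated expected final fraction of blue nodes for MM on a cycle."""
    return (2 * p**2 - p**3) / (1 - p + p**2)


def simulate_final_blue_fraction(n, p, rng):
    """Run MM on C_n from a p-random coloring and return the final blue fraction."""
    c = ['b' if rng.random() < p else 'w' for _ in range(n)]
    prev = None
    while True:
        # On a cycle, a node adopts its neighbors' color if they agree
        # and keeps its own color otherwise (c[v - 1] wraps around for v = 0).
        nxt = [c[v - 1] if c[v - 1] == c[(v + 1) % n] else c[v]
               for v in range(n)]
        if nxt == c or nxt == prev:   # stable coloring or blinking
            break
        prev, c = c, nxt
    return c.count('b') / n
```

A possible usage is to average `simulate_final_blue_fraction(n, p, random.Random(seed))` over several seeds for a large n and compare the result with `mm_expected_blue_fraction(p)`. By symmetry between the two colors the formula evaluates to 1/2 at p = 1/2, which is a quick sanity check.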

Contribution 4: Proof techniques. One of the main contributions of the present paper is introducing several proof techniques built on Markov chain analysis, counting arguments, potential functions, greedy approaches, martingale processes, and recursive functions, which we believe can be very beneficial for future work on majority based (more generally, threshold based) opinion diffusion models. A fair amount of effort has been put into ensuring that the proofs are accessible by avoiding unnecessary complexities imposed by adding less essential components to the model or the underlying graph structure. This has been our main motive for focusing on two of the most basic models and presenting a large fraction of our results on cycle graphs. We should emphasize that cycles have several interesting graph properties such as regularity, vertex-transitivity, connectivity with an almost minimum number of edges, and large diameter, which have made them a popular foundation for the study of various dynamical systems, cf. [23, 35, 46]. While handling the case of cycle graphs is not very interesting in its own right from a practical point of view, it usually allows developing techniques and insights in this simplified setup, which then can be used for a more rigorous analysis of more complicated graph structures. We explain how some of our techniques can potentially be utilized to prove similar results in a more general framework.

Contribution 5: Experimental results. We present the outcomes of several experiments that we have conducted. A subset of these experiments has been designed to merely support and complement our theoretical findings. However, some of the executed experiments let us uncover other interesting characteristics of our models. In particular, we investigate the effect of adding some random edges to the underlying graph structure. This leads to some open problems and conjectures about the connection between graph parameters such as conductance and vertex-transitivity and the process properties such as the stabilization time, which could serve as potential future research directions.

1.3 Related work

Numerous opinion diffusion models have been introduced to study how the members of a community form their opinions through social interactions, cf. [20, 28, 30, 45]. Among all these models, a considerable amount of attention has been devoted to the study of the majority based models, cf. [4, 5, 7, 16, 51].

Stabilization time and periodicity: It was proven [26] that the periodicity of MM is always one or two. Chistikov et al. [19] showed that it is PSPACE-complete to decide whether the periodicity is one or not for a given coloring of a directed graph. Since we prove an exponential time bound for RMM, it would be interesting to see if these results apply to RMM. Furthermore, it was proven [41] that the stabilization time of MM is bounded by \(\mathcal {O}(m)\). Stronger bounds are known for special classes of graphs. For instance, for a d-regular graph with strong conductance the stabilization time is in \(\mathcal {O}(\log _d n)\), cf. [50]. The stabilization properties have also been studied for other majority based models, cf. [1, 13]. Lesfari and Perennes [36] studied a variant of MM where the nodes are biased towards one of the two opinions, called the superior opinion. In their model, at each round, a randomly selected node chooses the superior opinion with some probability \(\alpha\), and with probability \(1-\alpha\) it conforms to the opinion manifested by the majority of its neighbors. They exhibited classes of network topologies for which they proved that the expected time for consensus on the superior opinion can be exponential. This is similar to the exponential bound that we provide on the stabilization time of RMM.

Minimum size of a winning set: Motivated by viral marketing where a company aims to trigger a large cascade of further adoptions of its product by convincing a subset of individuals to adopt a positive opinion about its product (e.g., by giving them free samples), the problem of finding the minimum size of a winning set has been studied extensively, cf. [8, 31, 33]. Gärtner and Zehmakan [25] proved that the minimum size of a winning set in MM on a random d-regular graph is almost as large as n/2 w.h.p. if d is sufficiently large. Using the expander mixing lemma, it was proven [50] that this is actually true for all graphs with a certain level of conductance, including random regular graphs and Erdős-Rényi random graphs. For general graphs, it was proven in [6] that every graph has a winning set of size at most n/2 under the asynchronous variant of MM. In [9, 40], the minimum size of a winning set on graph data from real-world social networks was investigated for a variant of MM where the nodes with the highest degrees (called the elites) have a larger “influence factor” than others. Our work complements these results by exploring the impact of the tie-breaking rule on the minimum size of a winning set.

Furthermore, the problem of finding the minimum size of a winning set for a given graph G is known to be NP-hard for different majority based models, cf. [42], and approximation algorithms based on various techniques, such as integer programming formulations [49] and reinforcement learning [32], have been proposed. For MM and RMM, it was proven [38] that this problem cannot be approximated within a factor of \((\log \Delta \log \log \Delta )\), unless P=NP, but there is a polynomial-time \((\log \Delta )\)-approximation algorithm, where \(\Delta\) is the maximum degree. Chen [18] proved that the problem is tractable for special classes of graphs such as trees. The problem of finding the minimum size of a winning set is closely related to the well-studied target set selection problem, cf. [2].

Random initial coloring: The problem of finding the expected final number of blue nodes in MM and RMM with a p-random initial coloring has been studied for different graphs, e.g., random regular graphs [25], hypercubes [11] and preferential attachment graphs [3]. Motivated by applications in certain interacting particle systems such as fluid flow in rocks and dynamics of glasses, this has also been studied extensively when the underlying graph is a d-dimensional torus, cf. [10, 24]. Gray [29] studied the problem for cycle graphs where some noise is added to the process. Roughly speaking, the main finding of the aforementioned work is that there are thresholds \(p_1\) and \(p_2\) such that w.h.p. the process reaches the white coloring if p is sufficiently smaller than \(p_1\), the blue coloring if p is sufficiently larger than \(p_2\), and a non-monochromatic configuration if p lies in between. The main difficulty in this set-up is to determine the values of \(p_1\) and \(p_2\).

In the last few years, a lot of attention has been given to the study of MM on Erdős-Rényi random graphs starting from a p-random initial coloring. In the Erdős-Rényi random graph \(\mathcal {G}_{n,q}\), each edge is present independently with probability q on a set of n nodes. In [50], it was proven that when p is “slightly” larger than 1/2, then the process reaches the blue coloring w.h.p. What if we have \(p=1/2\)? Benjamini et al. [12] conjectured that if q is “sufficiently” larger than 1/n, then w.h.p. one of the two colors almost takes over (i.e., all nodes share the same color at the end, except a sub-linear number of them). Fountoulakis, Kang and Makai [22] proved that the conjecture is true when q is larger than \(1/\sqrt{n}\). A similar statement using different proof techniques and stabilization time was proven in [15, 44, 47]. Chakraborti, Kim, Lee, and Tran [17] extended the lower bound to \(\Omega (\log n/n^{3/5})\), but the conjecture has remained open for q smaller than this bound.

2 Stabilization time and periodicity

2.1 Stabilization time in general graphs

As mentioned, it was proven [41] that the stabilization time of MM is in \(\mathcal {O}(m)\). It is easy to argue that this bound holds even when the nodes are updated asynchronously or when we have a biased tie-breaking rule (i.e., blue is always chosen in case of a tie). However, it was left open whether a similar bound can be proven for random tie-breaking. We settle this, in Theorem 2.1, by providing an explicit graph construction and coloring for which RMM needs exponentially many rounds to stabilize in expectation. (Our proof actually works for any random tie-breaking rule, where a node chooses blue (white) independently w.p. \(0<q<1\) (resp. \(1-q\)) in case of a tie.)

Theorem 2.1

There is a graph \(G=(V,E)\) and a coloring \(\mathcal {C}_0\) for which the stabilization time of RMM is exponential in n.

Proof

To provide the construction of graph G, we first define three smaller graphs and then explain how to connect these graphs to create G. We define \(\kappa := \lfloor n/3 \rfloor -1\). Let \(S_b\) be a star graph with an internal node \(v_b\) and \(\kappa -1\) leaves and \(S_w\) be a star graph with an internal node \(v_w\) and \(n-2\kappa -1\) leaves. Furthermore, let I be the graph built of \(\kappa\) isolated nodes. Now to build graph G, for each node in I we add an edge to \(v_b\) and an edge to \(v_w\). (Note that the total number of nodes is equal to \(|V_{S_b}|+|V_{S_w}|+|V_I|=\kappa +(n-2\kappa )+\kappa =n\).) Please see Fig. 5 for an example.

Fig. 5. The construction given in Theorem 2.1 for exponential stabilization time in RMM

Claim 1: The nodes in \(S_w\) form a resilient set. Each node in \(S_w\) has more than half of its neighbors in \(S_w\). This is trivial for all the leaf nodes. The internal node \(v_w\) is adjacent to \(n-2\kappa -1\) leaves in \(S_w\) and \(\kappa\) nodes in I and we have \(n-2\kappa -1 > \kappa\).

Claim 2: Let \(\mathcal {U}\) be the set of colorings where \(S_w\) is white, \(S_b\) is blue, and at least one node in I is blue. For a coloring \(\mathcal {C}\in \mathcal {U}\), in the next round, all nodes in \(S_b\) and \(S_w\) keep their color and each node in I chooses a color uniformly at random.

All nodes in \(S_w\) remain white according to Claim 1. All leaves in \(S_b\) have exactly one neighbor which is blue; thus, they remain blue. Node \(v_b\) is of degree \(2\kappa -1\) and has at least \(\kappa\) blue neighbors, thus it remains blue too. Each node in I has exactly one blue neighbor (\(v_b\)) and one white neighbor (\(v_w\)), thus it chooses among blue and white uniformly at random.

Assume that in \(\mathcal {C}_0\), all nodes in \(S_w\) are white and the rest of the nodes are blue. \(\mathcal {C}_0\) is clearly in \(\mathcal {U}\). We show that the process eventually reaches the white coloring. Hence, the stabilization time is lower-bounded by the expected number of rounds we need to reach a coloring not in \(\mathcal {U}\) (because the white coloring obviously is not in \(\mathcal {U}\)). Note that from a coloring in \(\mathcal {U}\), if at least one node in I selects blue, we are still in \(\mathcal {U}\) in the next round, according to Claim 2. The only way to leave \(\mathcal {U}\) is that all nodes in I select white. Since this happens only w.p. \(1/2^{\kappa }\), it takes \(2^{\kappa }=2^{\lfloor n/3\rfloor -1}\) rounds in expectation for it to happen.

It remains to prove that the process eventually reaches the white coloring. Note that according to Claim 1, \(S_w\) remains white forever. Thus, it suffices to prove that from any coloring where \(S_w\) is fully white, there is a non-zero probability to reach the white coloring. Let \(\mathcal {C}\) be such a coloring. There is a non-zero probability that all nodes in I become white in the next round (since they all have at least one white neighbor, namely \(v_w\)). It is possible that in the round after all nodes in I remain white and \(v_b\) becomes white (recall \(d(v_b)=2\kappa -1\)). One round after that, all nodes will be white.
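The construction used in the proof can be built and sanity-checked programmatically. The sketch below is our own illustration: the node numbering, the function names, and the requirement that n be at least 6 (so that \(\kappa \ge 1\)) are assumptions made for the example.

```python
def build_construction(n):
    """Build the graph of Theorem 2.1 on nodes 0, ..., n-1.

    Returns (adj, s_b, s_w, iso), where adj maps nodes to neighbor sets,
    s_b and s_w list the nodes of the two stars (center first), and iso
    lists the nodes of I.
    """
    if n < 6:
        raise ValueError("need n >= 6 so that kappa >= 1")
    kappa = n // 3 - 1
    s_b = list(range(0, kappa))            # v_b followed by kappa - 1 leaves
    s_w = list(range(kappa, n - kappa))    # v_w followed by n - 2*kappa - 1 leaves
    iso = list(range(n - kappa, n))        # the kappa nodes of I
    v_b, v_w = s_b[0], s_w[0]

    adj = {v: set() for v in range(n)}

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for leaf in s_b[1:]:
        add_edge(v_b, leaf)
    for leaf in s_w[1:]:
        add_edge(v_w, leaf)
    for u in iso:                          # each node of I sees v_b and v_w
        add_edge(u, v_b)
        add_edge(u, v_w)
    return adj, s_b, s_w, iso


def is_resilient_rmm(adj, S):
    """RMM condition: every node of S has more than half its neighbors in S."""
    S = set(S)
    return all(2 * len(adj[v] & S) > len(adj[v]) for v in S)
```

With this representation, Claim 1 corresponds to `is_resilient_rmm(adj, s_w)` being true, while the same check fails for `s_b`, in line with the fact that the blue star is eventually overrun.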

2.2 Stabilization time in cycles

We prove that on a cycle \(C_n\) the stabilization time is at most \(\lceil n/2\rceil -1\) for MM (Theorem 2.3) and in \(\mathcal {O}(n^2)\) for RMM (Theorem 2.5). It is straightforward to infer Theorem 2.3 from Lemma 2.2, given below. Furthermore, to prove Theorem 2.5, we rely on the Markov chain analysis given in Lemma 2.4.

Lemma 2.2

In MM on a cycle \(C_n\) with a coloring \(\mathcal {C}\), if there exist two adjacent nodes with the same color, the process reaches a stable coloring after exactly \(\lceil l/2 \rceil\) rounds, where l is the length of the longest alternating path in the path partition of \(\mathcal {C}\).

Proof

Let B (resp. W) be the set of nodes on the (maximal) blue (resp. white) paths in the path partition in \(\mathcal {C}\). All nodes in B and W keep their color forever. Furthermore, all alternating paths in the path partition keep shrinking until they disappear. Consider an alternating path \(v_1,\cdots , v_k\). After one round, \(v_1\) and \(v_k\) “join” the adjacent monochromatic paths and thus it shrinks to the alternating path \(v_2,\cdots , v_{k-1}\), which is of length \(k-2\). If k is even, the path disappears after \(k/2=\lceil k/2 \rceil\) rounds. If k is odd, its length decreases by two in each round until it is of length 1. Then, it needs one more round to disappear. This is equal to \(\lceil k/2\rceil\) rounds overall. Therefore, after \(\lceil l/2 \rceil\) rounds all nodes are on a monochromatic path of length at least two and will never change their color.

Theorem 2.3

The stabilization time of MM on a cycle \(C_n\) is at most \(\lceil n/2\rceil -1\) and this bound is tight.

Proof

Let’s first consider the case where there are no two monochromatic adjacent nodes in the initial coloring. This is possible only for even n. In that case, the process keeps switching between the two alternating colorings, i.e., the process has reached the blinking configuration. In this set-up, the stabilization time is zero by definition.

Now, assume that there are two adjacent monochromatic nodes. Then, the longest alternating path in the path partition is of size at most \(n-2\). Thus, according to Lemma 2.2, the process ends after at most \(\lceil (n-2)/2\rceil = \lceil n/2 \rceil -1\) rounds.

Tightness. To prove the tightness, for odd (even) n consider a coloring where two (three) adjacent nodes are white and the remaining nodes form an alternating path of length \(n-2\) (resp. \(n-3\)). According to Lemma 2.2, the process needs \(\lceil (n-2)/2\rceil\) rounds, for odd n, and \(\lceil (n-3)/2\rceil\) rounds, for even n, to end. We observe that both these values are equal to \(\lceil n/2\rceil -1\). Thus, the bound is tight.
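The bound and its tightness can be checked by brute force on small cycles. The following is a minimal sketch, assuming the MM rule as it is used in the proofs above (a node adopts the color shared by both neighbors and keeps its own color in case of a tie) and measuring the stabilization time as the first round from which the process is periodic with period 1 or 2. The function names and the encoding 0 = white, 1 = blue are illustrative choices, not part of the original text.

```python
from itertools import product
from math import ceil

def mm_round(coloring):
    """One synchronous MM round on a cycle; colors are 0 (white) and 1 (blue)."""
    n = len(coloring)
    nxt = []
    for i in range(n):
        left, right = coloring[i - 1], coloring[(i + 1) % n]
        # Both neighbors agree: adopt their color. Tie: keep the current color.
        nxt.append(left if left == right else coloring[i])
    return tuple(nxt)

def stabilization_time(coloring):
    """Number of rounds until the process is periodic with period 1 or 2."""
    state, t = tuple(coloring), 0
    while mm_round(mm_round(state)) != state:
        state, t = mm_round(state), t + 1
    return t

def tight_coloring(n):
    """Two (odd n) or three (even n) white nodes, then an alternating path starting with blue."""
    whites = 2 if n % 2 == 1 else 3
    return tuple([0] * whites + [(j + 1) % 2 for j in range(n - whites)])

for n in range(3, 11):
    worst = max(stabilization_time(c) for c in product((0, 1), repeat=n))
    # Columns: n, worst case over all colorings, tight construction, claimed bound.
    print(n, worst, stabilization_time(tight_coloring(n)), ceil(n / 2) - 1)
```

The exhaustive search is exponential in n, so it is only meant as a sanity check for small cycles.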

Lemma 2.4

Consider the time-homogeneous Markov chain which is defined over the state set \(S:=\{s_0,\cdots ,s_k\}\) with the transition matrix \(P:=(p_{s_i,s_j})_{s_i,s_j\in S}\), where for \(1 \le i \le k-1\) we have \(p_{s_i,s_i}=\frac{1}{2}\) and \(p_{s_i,s_{i+1}}=p_{s_i,s_{i-1}}=\frac{1}{4}\) and for \(i=0,k\) we have \(p_{s_i,s_i}=1\). The expected number of rounds needed to reach \(s_0\) or \(s_k\) from a state \(s_i\) is equal to \(2i(k-i)\) (Fig. 6).

Fig. 6 The visualization of the Markov chain described in Lemma 2.4

Proof

Let \(T_i\) be the expected number of rounds the Markov chain needs to reach state \(s_0\) or \(s_k\) from state \(s_i\). Obviously, we have \(T_0=T_k=0\). Furthermore, from state \(s_i\), for \(1\le i\le k-1\), if we move to state \(s_{i+1}\) w.p. 1/4, then in addition to this step we need in expectation \(T_{i+1}\) steps to finish. A similar argument applies to the transition to \(s_{i-1}\) and remaining in state \(s_i\), which happen w.p. 1/4 and 1/2 respectively. Thus, conditioning on these three possibilities we conclude that \(T_i=\frac{1}{4}T_{i-1}+\frac{1}{4}T_{i+1}+\frac{1}{2}T_i+1\) for \(1\le i \le k-1\). By rearranging the terms we get the following non-homogeneous linear recursion of order 2:

$$\begin{aligned} \frac{1}{2}T_{i+1}-T_i+\frac{1}{2}T_{i-1}=-2, \quad T_0=T_k=0. \end{aligned}$$

Let us first look at the homogeneous equation \(\frac{1}{2}T_{i+1}-T_i+\frac{1}{2}T_{i-1}=0\) whose characteristic equation is equal to \(\frac{1}{2}\lambda ^2-\lambda +\frac{1}{2}=0\), for some value \(\lambda\) to be determined. The characteristic equation has the repeated root \(\lambda =1\). Thus, the general solution is of the form \(T_i=A+Bi\) for some constants A and B.

Now, we need to find a “particular solution” to the inhomogeneous equation. If we plug in \(Ci^2\) for a constant C, we get:

$$\begin{aligned} -2 = \frac{1}{2}C(i+1)^2-Ci^2+\frac{1}{2}C(i-1)^2 = C. \end{aligned}$$

So the general solution to the inhomogeneous equation is equal to \(T_i=-2i^2+A+Bi\). Since \(T_0=0\) and \(T_0=-2\cdot 0^2+A+B\cdot 0=A\), we have \(A=0\). Furthermore, \(T_k=0\) and \(T_k=-2k^2+A+Bk=-2k^2+Bk\) imply that \(B=2k\). Therefore, we can conclude that \(T_i=-2i^2+0+2ki=2i(k-i)\).
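The closed form can be cross-checked by solving the linear system for \(T_1,\cdots ,T_{k-1}\) directly. The sketch below is an illustration under our own naming: it rewrites the recursion as the tridiagonal system \(-\frac{1}{2}T_{i-1}+T_i-\frac{1}{2}T_{i+1}=2\) and solves it with exact rational arithmetic using the standard Thomas algorithm, which is a technique we substitute here for the characteristic-equation argument of the proof.

```python
from fractions import Fraction

def expected_absorption_times(k):
    """Solve T_i = 1 + T_{i-1}/4 + T_i/2 + T_{i+1}/4 with T_0 = T_k = 0 exactly."""
    m = k - 1  # number of unknowns T_1, ..., T_{k-1}
    if m <= 0:
        return [Fraction(0)] * (k + 1)
    # Rearranged system: -T_{i-1}/2 + T_i - T_{i+1}/2 = 2 (tridiagonal).
    a, b, c, d = Fraction(-1, 2), Fraction(1), Fraction(-1, 2), Fraction(2)
    c_prime, d_prime = [c / b], [d / b]
    for _ in range(1, m):  # forward elimination
        denom = b - a * c_prime[-1]
        c_prime.append(c / denom)
        d_prime.append((d - a * d_prime[-1]) / denom)
    x = [Fraction(0)] * m
    x[-1] = d_prime[-1]
    for i in range(m - 2, -1, -1):  # back substitution
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return [Fraction(0)] + x + [Fraction(0)]

for k in (2, 3, 10):
    solved = expected_absorption_times(k)
    closed_form = [2 * i * (k - i) for i in range(k + 1)]
    print(k, solved == closed_form)
```

Exact fractions are used instead of floating point so that the comparison with \(2i(k-i)\) is an equality test rather than a tolerance check.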

Theorem 2.5

The stabilization time of RMM on \(C_n\) is in \(\mathcal {O}(n^2)\).

Proof

Let us first introduce lazy RMM on \(C_n\), which is basically a slower version of RMM. For a coloring \(\mathcal {C}\), consider all the maximal monochromatic paths of length at least 2 on \(C_n\), and let \(\mathcal {A}\) denote the set of maximal alternating paths which sit between two such monochromatic paths. This includes alternating paths of length 0, when two monochromatic paths with opposite colors are adjacent. (This is essentially the set of alternating paths in the path partition in \(\mathcal {C}\) plus the mentioned paths of length 0.) Define \(\mathcal {A}^{+}\) to be the set of paths obtained by taking each path from \(\mathcal {A}\) and attaching its two adjacent nodes to it. (See Fig. 7 for an example.) In the lazy RMM, instead of updating all nodes at once, we pick the paths in \(\mathcal {A}^{+}\) one by one (in an arbitrary order) and then update the color of all nodes on the picked path at once following the RMM rule. Once we have exhausted \(\mathcal {A}^{+}\), we regenerate \(\mathcal {A}^{+}\) for the new coloring and continue. However, note that we do not actually apply the updated colors until we have gone through all paths in \(\mathcal {A}^+\). One can imagine that we keep the updated color for each node in a buffer, which takes effect once all paths in \(\mathcal {A}^+\) have been processed.

Fig. 7 The “extended” maximal alternating paths in \(\mathcal {A}^{+}\) are enclosed with green curves

Note that every two paths in \(\mathcal {A}^+\) are disjoint (because we considered the monochromatic paths of length at least two). Furthermore, each node not on any path in \(\mathcal {A}^+\) will not change its color in RMM since it has the same color as both its neighbors. Thus, the coloring which is generated after processing all elements of \(\mathcal {A}^{+}\) is the same as the coloring which would have been outputted had we applied RMM instead (of course, assuming the same source of randomness, i.e., a node makes the same random choice in both processes in case of a tie). Moreover, the lazy RMM stops when the process reaches a coloring where \(\mathcal {A}^+\) is empty. This means the process has reached a monochromatic/blinking configuration, which is equivalent to stabilization in RMM, as we prove formally in Theorem 2.7. In short, the lazy RMM is just a slower version of RMM, where we break a round into smaller sub-rounds. Thus, it suffices to prove our desired upper-bound of \(\mathcal {O}(n^2)\) for the lazy RMM.

Let \(P:= v_1, \cdots , v_k\) be a path in \(\mathcal {A}^+\). We claim that after updating the nodes on P, the number of blue nodes increases (decreases) by 1 w.p. 1/4 and remains the same w.p. 1/2. First consider the case of even k. Since the original alternating path \(v_2,\cdots , v_{k-1}\) is of even length, the adjacent monochromatic paths containing \(v_1\) and \(v_k\) must be of opposite colors. Without loss of generality, assume that \(v_1\) is blue and \(v_k\) is white. Thus for \(2\le i\le k-1\), \(v_i\) is white for even i and blue for odd i. Overall, there are k/2 blue nodes before the update. After the update: (i) each node \(v_i\), for \(2\le i\le k-1\), deterministically switches its color, which gives \((k-2)/2\) blue nodes (ii) \(v_1\) and \(v_k\) choose a color uniformly and independently at random. They both choose blue (white) w.p. 1/4, which gives \((k-2)/2+2=k/2+1\) (resp. \((k-2)/2=k/2-1\)) blue nodes, i.e., an increase (resp. decrease) by one in the number of blue nodes. Furthermore, one of them chooses blue and the other one chooses white w.p. 1/2 which gives \((k-2)/2+1=k/2\) blue nodes, i.e., no change. We can prove the same statement for the case of odd k by applying a very similar argument.

Consider the Markov chain described in Lemma 2.4 for \(k=n\), where state \(s_i\) represents having i blue nodes. We claim that the maximum number of rounds this Markov chain needs to reach \(s_0\) or \(s_n\), in expectation, is an upper bound on the stabilization time of the lazy RMM process. As we discussed in each round of the lazy RMM, the number of blue nodes decreases/increases by 1 w.p. 1/4 and remains the same w.p. 1/2. For odd n, if the process has not reached the white or blue coloring (corresponding to state \(s_0\) and \(s_n\) in the Markov chain), the set \(\mathcal {A}^+\) is non-empty. Thus the Markov chain actually models the lazy RMM precisely. When n is even, it is possible that we reach a coloring where \(\mathcal {A}^+\) is empty but we are not in the blue or white coloring (this happens if the process reaches the blinking configuration, where the corresponding Markov chain is in the state \(s_{n/2}\)). However, as we are looking for an upper bound, this is not an issue. Hence, starting from a coloring with i blue nodes, the stabilization time is bounded by \(2i(n-i)\) rounds. Since \(2i(n-i)\) is maximized for \(i=n/2\), this is at most \(n^2/2=\mathcal {O}(n^2)\).

Tightness. Now, we prove that the quadratic bound given in Theorem 2.5 is tight. Let \(l=n-5\) for even n and \(l=n-4\) for odd n. (Note that l is odd.) We define a k-alternating coloring to be a coloring with a blue path of length \(n-k\) plus an alternating path of length k for some odd k between 5 and l. (The alternating path starts and ends with a white node.) Consider a k-alternating coloring for \(7\le k\le l-2\). Using an argument similar to the one from the proof of Theorem 2.5, we can observe that from such a coloring, in the next round we have a \((k+2)\)-alternating coloring (similarly a \((k-2)\)-alternating coloring) w.p. 1/4 and a k-alternating coloring w.p. 1/2.

Let’s assume that the process starts from an \(l'\)-alternating coloring, where \(l'\) is the closest odd integer to l/2. Suppose that we say the process has stabilized if it reaches a 5-alternating coloring or an l-alternating coloring. Note that this is obviously a lower bound on the original stabilization time since for the process to stabilize (i.e., reach a white/blue/blinking configuration, according to Theorem 2.7) it must first reach one of these two colorings. Therefore, the defined process (running RMM starting from an \(l'\)-alternating coloring and stopping once it reaches a 5-alternating or an l-alternating coloring) is equivalent to the Markov chain defined in Lemma 2.4, where \(s_i\), for \(0\le i \le k=(l-5)/2\), corresponds to being in a \((2i+5)\)-alternating coloring; in particular, \(s_0\) and \(s_k\) correspond to being in a 5-alternating and an l-alternating coloring. According to Lemma 2.4, the expected number of rounds to reach a 5-alternating or an l-alternating coloring is \(2\cdot \frac{l'-5}{2}\left( \frac{l-5}{2}-\frac{l'-5}{2}\right)\). Using the fact that \(l'\) is equal to \(l/2\pm 1/2\) and \(l\ge n-5\), it is straightforward to show that this is in \(\Omega (n^2)\).

It is worth mentioning that the tight quadratic bound provided above heavily relies on the fact that the random tie-breaking rule is symmetric. More precisely, if a node picks blue (resp. white) independently w.p. \(q\ne 1/2\) (resp. \(1-q\)), then the exact bound might differ. We believe our proof techniques can be leveraged to provide tight bounds for this more generic setup. This is left for future work.
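The quadratic behavior of RMM on cycles can be explored empirically. The sketch below is an illustration only: it assumes the symmetric tie-breaking rule (a node with disagreeing neighbors picks a color uniformly at random), treats the white, blue, and blinking configurations as the stopping states in line with Theorem 2.7, and starts from uniformly random colorings, which is our choice of experiment rather than a worst case. All function names are hypothetical.

```python
import random

def rmm_round(coloring, rng):
    """One synchronous RMM round on a cycle; ties are broken by a fair coin."""
    n = len(coloring)
    nxt = []
    for i in range(n):
        left, right = coloring[i - 1], coloring[(i + 1) % n]
        nxt.append(left if left == right else rng.randint(0, 1))
    return nxt

def is_absorbed(coloring):
    """True for the white, the blue, or the blinking (fully alternating) coloring."""
    n = len(coloring)
    mono = all(c == coloring[0] for c in coloring)
    blinking = all(coloring[i] != coloring[(i + 1) % n] for i in range(n))
    return mono or blinking

def rmm_stabilization_time(coloring, rng):
    """Rounds until an absorbing configuration is reached, plus that configuration."""
    state, t = list(coloring), 0
    while not is_absorbed(state):
        state, t = rmm_round(state, rng), t + 1
    return t, state

rng = random.Random(0)
for n in (9, 17, 33):
    times = [
        rmm_stabilization_time([rng.randint(0, 1) for _ in range(n)], rng)[0]
        for _ in range(200)
    ]
    # Columns: n, empirical mean over 200 runs, the n^2/2 bound from the proof.
    print(n, sum(times) / len(times), n * n / 2)
```

Replacing `rng.randint(0, 1)` in `rmm_round` by a biased coin gives a quick way to experiment with the asymmetric tie-breaking rule mentioned above.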

2.3 Periodicity in general graphs and cycles

A trivial upper bound on the periodicity of MM and RMM is \(2^n\). It was proven [26] that the periodicity of MM is always 1 or 2. Theorem 2.6 states that for RMM the trivial bound of \(2^n\) is actually the best possible, up to some constant factor. On the other hand, if we limit ourselves to the cycle graphs, then the periodicity for both RMM and MM is always one or two, see Theorem 2.7.

Theorem 2.6

For any integer n, there is an n-node graph G for which the periodicity of RMM is in \(\Omega (2^n)\).

Proof

We define \(\kappa\) to be the largest integer smaller than \(n-6\) which is divisible by 4. Let us explain how to construct the graph G step by step. Consider a path \(P:= v_0,\cdots , v_{\kappa -1}\), a clique \(C_w\) of size 3, and a clique \(C_b\) of size \(n-3-\kappa\). To build the graph G, add an edge between \(v_0\) and a node in \(C_w\), called \(u_w\), and an edge between \(v_{\kappa -1}\) and a node in \(C_b\), called \(u_b\). (Note that the output graph has exactly n nodes.) See Fig. 8 for an example.

Fig. 8 The construction with exponential periodicity in RMM for \(n=15\)

Let us observe that \(C_w\) (analogously, \(C_b\)) is a resilient set. Each node u in \(C_w\) (analogously, \(C_b\)) has more than half of its neighbors in \(C_w\) (resp. \(C_b\)). This is true for \(u_w\) (resp. \(u_b\)) since it has 3 (resp. \(n-\kappa -3\ge 3\)) neighbors and only one of them is not in \(C_w\) (resp. \(C_b\)). This is trivial for other nodes since they have all their neighbors in \(C_w\) (resp. \(C_b\)).

Let \(\mathcal {U}\) be the set of all colorings where \(C_w\) is fully white and \(C_b\) is fully blue. Note that \(|\mathcal {U}|=2^{\kappa }=\Omega (2^n)\). We will prove that for every two colorings \(\mathcal {C},\mathcal {C}'\in \mathcal {U}\), there is a non-zero probability to reach \(\mathcal {C}'\) from \(\mathcal {C}\), i.e., there is a path from \(\mathcal {C}\) to \(\mathcal {C}'\) in the underlying directed graph of the corresponding Markov chain. Thus, the colorings in \(\mathcal {U}\) form a strongly connected component. Note that there is no edge from a coloring in \(\mathcal {U}\) to a coloring outside \(\mathcal {U}\) because this requires a node in \(C_w\) or \(C_b\) to change its color, which is not possible since they are both resilient sets. Therefore, this is actually an absorbing strongly connected component, which implies that the periodicity is in \(\Omega (2^n)\).

It remains to prove that for \(\mathcal {C},\mathcal {C}'\in \mathcal {U}\), we can reach \(\mathcal {C}'\) from \(\mathcal {C}\). Let us define coloring \(\mathcal {C}_M\in \mathcal {U}\) where \(v_i\) is blue if \((i\mod 4) \in \{0, 1\}\) and white otherwise. (See Fig. 8 for an example.) In \(\mathcal {C}_M\), each node on P has one blue neighbor and one white neighbor and thus chooses its color at random. This implies that we can reach any coloring in \(\mathcal {U}\) from \(\mathcal {C}_M\). Hence, it suffices to show that there is a path from each coloring \(\mathcal {C}\in \mathcal {U}\) to \(\mathcal {C}_M\). Firstly, from \(\mathcal {C}\) we can reach the coloring where P is fully blue. In the first round, we can color \(v_{\kappa -1}\) blue since it has at least one blue neighbor, namely \(u_b\). Then, we can color \(v_{\kappa -2}\) blue (while \(v_{\kappa -1}\) remains blue) since it has at least one blue neighbor, namely \(v_{\kappa -1}\), and so on. Thus, after \(\kappa\) rounds, P is fully blue. Now, we argue that there is a non-zero probability that in the next four rounds the following updates take place: (i) \(v_0\) becomes white (which is possible since the adjacent node \(u_w\) is white) (ii) \(v_0\) becomes blue (which is possible since \(v_1\) is blue) and \(v_1\) becomes white (which is possible since \(v_0\) is white) (iii) \(v_2\) becomes white and \(v_1\) becomes blue (iv) \(v_3\) becomes white. (Note that we assume any node which is not mentioned remains unchanged. This is possible since all other nodes have at least one adjacent node of the same color.) After these four rounds, \(v_0,v_1\) are blue and \(v_2,v_3\) are white, which is identical to their coloring in \(\mathcal {C}_M\). Now, we repeat the same process for \(v_4, v_5, v_6, v_7\) and so on. After \(\kappa /4\) repetitions (i.e., \(\kappa\) rounds) we reach \(\mathcal {C}_M\). Overall, we can conclude that there is a non-zero probability to reach \(\mathcal {C}_M\) from any coloring in \(\mathcal {U}\). This finishes the proof.
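Two structural ingredients of the construction are easy to check mechanically: that \(C_w\) and \(C_b\) are resilient, and that in \(\mathcal {C}_M\) every node on P sees exactly one blue and one white neighbor. The sketch below covers only these two facts and does not attempt to verify the reachability argument. The node labels such as `("v", i)`, the helper names, and the restriction to \(n\ge 11\) (so that the path is non-empty) are assumptions made for this illustration.

```python
def build_graph(n):
    """Adjacency sets of the path-plus-two-cliques graph; labels are illustrative."""
    assert n >= 11, "assumption of this sketch: guarantees kappa >= 4"
    kappa = 4 * ((n - 7) // 4)  # largest multiple of 4 that is smaller than n - 6
    path = [("v", i) for i in range(kappa)]
    c_w = [("w", j) for j in range(3)]
    c_b = [("b", j) for j in range(n - 3 - kappa)]
    adj = {u: set() for u in path + c_w + c_b}

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for u, v in zip(path, path[1:]):
        add_edge(u, v)
    for clique in (c_w, c_b):
        for i, u in enumerate(clique):
            for v in clique[i + 1:]:
                add_edge(u, v)
    add_edge(path[0], c_w[0])   # v_0 is attached to u_w
    add_edge(path[-1], c_b[0])  # v_{kappa-1} is attached to u_b
    return adj, path, c_w, c_b

def is_resilient(adj, nodes):
    """Every node has strictly more than half of its neighbors inside the set."""
    inside = set(nodes)
    return all(2 * len(adj[u] & inside) > len(adj[u]) for u in nodes)

def coloring_c_m(path, c_w, c_b):
    """C_M: white clique, blue clique, v_i blue iff i mod 4 is 0 or 1 (1 = blue)."""
    coloring = {u: 0 for u in c_w}
    coloring.update({u: 1 for u in c_b})
    coloring.update({("v", i): 1 if i % 4 in (0, 1) else 0 for i in range(len(path))})
    return coloring

adj, path, c_w, c_b = build_graph(15)
c_m = coloring_c_m(path, c_w, c_b)
print(is_resilient(adj, c_w), is_resilient(adj, c_b))
print(all(sum(c_m[v] for v in adj[u]) == 1 for u in path))  # every path node sees a tie
```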

Theorem 2.7

In MM on a cycle \(C_n\):

  • If n is odd, the process always reaches a stable coloring.

  • If n is even, the process reaches a stable coloring or the blinking configuration.

In RMM on a cycle \(C_n\):

  • If n is odd, the process always reaches the white or the blue coloring.

  • If n is even, the process reaches the white coloring, the blue coloring, or the blinking configuration.

Proof

First consider MM. If n is odd, then for any coloring there must exist at least two adjacent monochromatic nodes. Thus, according to Lemma 2.2, the process must reach a stable coloring. For even n, if there are two adjacent monochromatic nodes, then again we can apply the same argument. If not, then the process is in the blinking configuration.

Now, consider RMM. Let n be odd. It suffices to prove that it is possible (i.e., there is a non-zero probability) to reach from any coloring to the white or blue coloring. Consider an arbitrary coloring \(\mathcal {C}\). Since n is odd, there must be two adjacent monochromatic nodes in \(\mathcal {C}\). Thus, there exists a monochromatic, say blue, path P of length at least 2. There is a non-zero probability that all nodes on P remain blue in the next round and the node(s) adjacent to P become/stay blue (because all these nodes have at least one blue neighbor). Thus, it is possible that path P keeps growing until it takes over the whole cycle and we reach the blue coloring. For even n, if there is at least one monochromatic path of length two or larger, then the above argument applies again. Otherwise, the process is in the blinking configuration.

Number of stable colorings: According to Theorem 2.7, there are two stable colorings, namely the white and blue coloring, in RMM on cycle \(C_n\). What about the number of stable colorings in MM? We answer this question in Theorem 2.8.

Theorem 2.8

In MM on a cycle \(C_n = (v_0, \cdots , v_{n-1})\), there are \(\Theta (\Phi ^n)\) stable colorings, where \(\Phi =\frac{1+\sqrt{5}}{2}\) is the golden ratio.

Proof

We say a blue (resp. white) node is solitary if both of its neighbors are white (resp. blue). A coloring is stable in MM if and only if it has no solitary node. If there is no solitary node, then each node has a neighbor of the same color and keeps its color, i.e., the coloring is stable. If there is a solitary node in the coloring, it changes its color in the next round, i.e., the coloring is not stable. Thus, we want to determine \(|\mathcal {S}_n|\), where \(\mathcal {S}_n\) is the set of all colorings on a cycle \(C_n\) with no solitary nodes.

Let \(\mathcal {R}_n\) denote the red-green colorings of a cycle \(C_n\), where there is an even number of red nodes and there are no two adjacent red nodes. We define a mapping \(\mathcal {M}:\mathcal {S}_n\rightarrow \mathcal {R}_n\). \(\mathcal {M}\) maps a blue-white coloring \(\mathcal {C}\in \mathcal {S}_n\) to a red-green coloring \(\mathcal {C}'\in \mathcal {R}_n\) in the following manner: for \(0\le i \le n-1\), if \(\mathcal {C}(v_i)=\mathcal {C}(v_{i+1})\) (\(i+1\) is calculated modulo n), then \(\mathcal {C}'(v_i) = g\) and \(\mathcal {C}'(v_i) = r\) otherwise (where g and r stand for green and red). The generated red-green coloring \(\mathcal {C}'\) is in \(\mathcal {R}_n\) because if there are two adjacent red nodes in \(\mathcal {C}'\), then there is a solitary node in \(\mathcal {C}\) (but that is not possible since \(\mathcal {C}\) is in \(\mathcal {S}_n\)). Furthermore, since the number of changes from blue to white and white to blue must be even, there is an even number of red nodes. We claim that for each \(\mathcal {C}'\in \mathcal {R}_n\), there are exactly two colorings in \(\mathcal {S}_n\) which are mapped to \(\mathcal {C}'\). Let’s try to construct a prospective coloring \(\mathcal {C}\in \mathcal {S}_n\) which is mapped to \(\mathcal {C}'\). Assume that \(\mathcal {C}({v_0})=b\); then \(\mathcal {C}(v_i)\), for \(1 \le i \le n-1\), is enforced by \(\mathcal {C}(v_{i-1})\) and \(\mathcal {C}'(v_{i-1})\). For example, if \(\mathcal {C}'(v_0)=r\), then \(\mathcal {C}(v_1)=w\) (because \(v_0\) and \(v_1\) must have opposite colors when \(\mathcal {C}'(v_0)=r\)) and \(\mathcal {C}(v_1)=b\) otherwise. Therefore, if we apply the mapping \(\mathcal {M}\) on \(\mathcal {C}\) we get a coloring which matches \(\mathcal {C}'\) on all nodes \(v_0, \cdots , v_{n-2}\) by construction. Note that \(\mathcal {C}'(v_{n-1})\) must be the same since there is an even number of red nodes. We can construct another coloring which also gets mapped to \(\mathcal {C}'\) by starting to color \(v_0\) with white. Overall, we argued that each coloring in \(\mathcal {S}_n\) is mapped to exactly one coloring in \(\mathcal {R}_n\) and for each coloring \(\mathcal {C}'\in \mathcal {R}_n\), there are exactly two colorings in \(\mathcal {S}_n\) which are mapped to \(\mathcal {C}'\). This implies that \(|\mathcal {S}_n|=2|\mathcal {R}_n|\) (Fig. 9).

Fig. 9 An example of the mapping \(\mathcal {M}:\mathcal {S}_n\rightarrow \mathcal {R}_n\)

To calculate \(r(n):=|\mathcal {R}_n|\), let us first calculate p(n), which is the number of red-green colorings of an n-node path with no two adjacent red nodes and an even number of red nodes. It is straightforward to compute the starting values \(p(1)=1\), \(p(2)=1\), \(p(3)=2\), and \(p(4)=4\). Furthermore, we have \(p(n)=p(n-1)+p(n-4)+\cdots +p(1)+2\) for \(n\ge 5\). This is true because if for an n-node path \(v_0, \cdots , v_{n-1}\) we color \(v_0\) with green, then there are \(p(n-1)\) ways to color the remaining part. If we color \(v_0\) red, then the second node must be green and then there must be at least one red node from \(v_2\) to \(v_{n-1}\) (since there must be an even number of red nodes). Let j be the smallest index between 2 and \(n-1\) for which \(v_j\) is red. If \(j\le n-3\), then \(v_{j+1}\) must be green and the remaining part can be colored in \(p(n-j-2)\) ways. If \(j=n-2\), then \(v_{n-1}\) must be colored green, which gives 1 coloring. \(j=n-1\) also gives one coloring. This justifies the recursion \(p(n)=p(n-1)+p(n-4)+\cdots +p(1)+2\) for \(n\ge 5\). This is a Fibonacci-type sequence, which can be lower and upper bounded by \(\Phi ^n\), up to a constant factor. Thus, we conclude that \(p(n)=\Theta (\Phi ^n)\). We clearly have \(r(n)\le p(n)\). Furthermore, if we color \(v_0\) in a cycle \(C_n = (v_0,\cdots , v_{n-1})\) green, then the remaining nodes can be colored in \(p(n-1)\) ways. Thus, we have \(p(n-1)=\Theta (\Phi ^{n-1})\le r(n)\le p(n)=\Theta (\Phi ^n)\), i.e., \(r(n)=\Theta (\Phi ^n)\). This implies that \(s(n):=|\mathcal {S}_n|=\Theta (\Phi ^n)\) since \(s(n)=2r(n)\). (Actually if we solve the recursion accurately, we get s(n) is equal to \(\Phi ^n\), up to some additive terms of smaller orders.)
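The counting argument can be sanity-checked by exhaustive enumeration on small cycles. The following sketch, with illustrative function names and the encoding 1 = blue (resp. red), counts the colorings without solitary nodes, the red-green cycle colorings in \(\mathcal {R}_n\), and the red-green path colorings counted by p(n). It then prints the ratio \(s(n)/\Phi ^n\) so that the growth rate can be inspected.

```python
from itertools import product

PHI = (1 + 5 ** 0.5) / 2

def count_stable(n):
    """s(n): blue-white colorings of C_n in which no node is solitary."""
    return sum(
        1
        for c in product((0, 1), repeat=n)
        if all(c[i] == c[i - 1] or c[i] == c[(i + 1) % n] for i in range(n))
    )

def count_cycle_red_green(n):
    """r(n): colorings of C_n with an even number of reds and no two adjacent reds."""
    return sum(
        1
        for c in product((0, 1), repeat=n)
        if sum(c) % 2 == 0 and not any(c[i] and c[(i + 1) % n] for i in range(n))
    )

def count_path_red_green(n):
    """p(n): the same two constraints on an n-node path."""
    return sum(
        1
        for c in product((0, 1), repeat=n)
        if sum(c) % 2 == 0 and not any(c[i] and c[i + 1] for i in range(n - 1))
    )

for n in range(3, 15):
    s = count_stable(n)
    # Columns: n, s(n), 2 r(n), and the ratio s(n) / Phi^n.
    print(n, s, 2 * count_cycle_red_green(n), round(s / PHI ** n, 4))
```

The enumeration takes time exponential in n and is only intended to confirm the identity \(s(n)=2r(n)\) and the recursion for p(n) on small instances.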

3 Winning sets

How small could a winning set be? Berger [14], surprisingly, proved that there exist arbitrarily large graphs which have winning sets of constant-size in MM. Actually, a proof was sketched that this statement holds regardless of the tie-breaking rule. This is stated more formally in Theorem 3.1 and for the sake of completeness a full proof is given in the appendix, Sect. 1.

Majority Rule. We say a model follows the majority rule if in each round, every node updates its color to the most frequent color in its neighborhood, and a tie is broken in an arbitrary manner. This in particular includes MM and RMM.

Theorem 3.1

For every positive integer k and a model which follows the majority rule, there is an n-node graph with \(n\ge k\), which has a winning set of size 36.

Theorem 3.2

In RMM on a cycle \(C_n = (v_0, \cdots , v_{n-1})\), the only winning set is the set of all nodes. In MM on \(C_n\), the minimum size of a winning set is equal to \(\lfloor n/2\rfloor +1\).

Proof

For RMM, it suffices to prove that if there is even one white node in the initial coloring, there is a non-zero probability that the process does not reach the blue coloring. Let one white node form an alternating path of length one and the rest of the cycle be blue. Then, it is possible that the alternating path grows from both sides in each round. After \(\lceil n/2\rceil -1\) rounds, the process reaches the blinking configuration (if n is even) and a coloring with two adjacent white nodes (if n is odd). In the first case the process never reaches the blue coloring and in the second one it is possible that this white path grows in each round until the process reaches the white coloring. Thus, there is no winning set of size \(n-1\) or smaller.

Consider a winning set \(\mathcal {B}\) in MM. For every two adjacent nodes, at least one must be in \(\mathcal {B}\). This is true because otherwise if initially only nodes in \(\mathcal {B}\) are blue such two adjacent nodes are colored white and remain white forever, which is in contradiction with \(\mathcal {B}\) being a winning set. This implies that \(|\mathcal {B}|\ge \lceil n/2\rceil\). For odd n, this implies that \(|\mathcal {B}|\ge \lfloor n/2 \rfloor +1\). For even n, if there are no two adjacent nodes outside \(\mathcal {B}\) and \(|\mathcal {B}|= \lceil n/2\rceil =n/2\), then it means only nodes in odd (or even) position are in \(\mathcal {B}\). In that case, \(\mathcal {B}\) is not a winning set because starting from a coloring where only \(\mathcal {B}\) is blue, the process is in the blinking configuration. Therefore, in the even case, we have \(|\mathcal {B}|\ge n/2+1=\lfloor n/2\rfloor +1\). Furthermore, the bound of \(\lfloor n/2\rfloor +1\) is tight. For both odd and even n, the set \(\{v_i: (i \mod 2) =1\}\cup \{v_0\}\) is a winning set of size \(\lfloor n/2\rfloor +1\).
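The value \(\lfloor n/2\rfloor +1\) for MM can be confirmed by exhaustive search on small cycles. The sketch below assumes that a set is winning if, starting from the coloring where exactly its nodes are blue, MM reaches the fully blue coloring, and that MM keeps a node's color on a tie, as used in the proofs above. The helper names are illustrative.

```python
from itertools import combinations

def mm_round(coloring):
    """One synchronous MM round on a cycle (1 = blue, 0 = white); ties keep the color."""
    n = len(coloring)
    return tuple(
        coloring[i - 1] if coloring[i - 1] == coloring[(i + 1) % n] else coloring[i]
        for i in range(n)
    )

def is_winning(n, blue_nodes):
    """Run MM until a configuration repeats and test whether it is fully blue."""
    state = tuple(1 if i in blue_nodes else 0 for i in range(n))
    seen = set()
    while state not in seen:
        seen.add(state)
        state = mm_round(state)
    return all(state)

def min_winning_set_size(n):
    """Smallest size of a winning set on C_n, found by trying all subsets."""
    for size in range(n + 1):
        if any(is_winning(n, set(s)) for s in combinations(range(n), size)):
            return size

def explicit_winning_set(n):
    """The set of odd-indexed nodes together with v_0, as in the proof."""
    return {i for i in range(n) if i % 2 == 1} | {0}

for n in range(3, 11):
    # Columns: n, brute-force minimum, claimed value, whether the explicit set wins.
    print(n, min_winning_set_size(n), n // 2 + 1, is_winning(n, explicit_winning_set(n)))
```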

4 Random initial coloring

We determine the expected final number of blue nodes starting with a random coloring on a cycle graph for MM and RMM respectively in Theorems 4.1 and 4.2.

Theorem 4.1

In MM on a cycle \(C_n\) with a p-random initial coloring for some \(p\ge 1/2\), the process reaches a stable coloring with \((1\pm \epsilon )\frac{2p^2-p^3}{1-p+p^2}n\) blue nodes, for an arbitrarily small constant \(\epsilon >0\), in \(\mathcal {O}\left( \log n\right)\) rounds w.h.p.

Proof

Let \(\mathcal {E}\) be the event that there is no alternating path of size larger than \(n-4\) in the initial coloring. The probability of \(\mathcal {E}\) not happening can be upper-bounded by \(2n(p(1-p))^{\lfloor (n-4)/2\rfloor }\) which is exponentially small in n. Since our statement needs to hold w.h.p. (i.e., w.p. \(1-o(1)\)), in the rest of the proof, we assume that \(\mathcal {E}\) happens. (To be fully accurate, we need to condition on \(\mathcal {E}\) happening in our calculations, but we skip that for the sake of simplicity.) Thus, in the initial coloring the nodes can be partitioned into maximal blue and white paths of length at least two and maximal alternating paths of size at most \(n-4\). From such an initial coloring, the monochromatic paths keep growing and the alternating paths shrink until the process reaches a stable coloring with only monochromatic paths. (See proof of Lemma 2.2 for more details.)

Let \(p_f\) be the probability that an arbitrary node v is blue at the end. To compute \(p_f\), we consider the three cases of v being on a monochromatic path, on an odd alternating path, or an even alternating path in the path partition of the initial coloring, which results in Equation (2). (I) If v is on a white path, it never becomes blue. If it is on a blue path, it remains blue forever. The probability of v being on a blue path is equal to \(p(p^2+2p(1-p))\) since v and at least one of its neighbors must be blue. (See the first term in Equation (2).) (II) An odd alternating path is adjacent to two monochromatic paths of the same color (they potentially could be the same path) and all nodes on the alternating path eventually choose the color of the monochromatic path(s). The probability that v is on an odd alternating path of length k which is adjacent to blue path(s) is equal to \(p^4kp^{\lfloor k/2\rfloor }(1-p)^{\lceil k/2\rceil }\). (The term \(p^4\) is for two adjacent nodes at each side of the path to be blue. Note that since we assume that there is no alternating path of size larger than \(n-4\), these four nodes are distinct.) Summing over all choices of k, we get the second term in Equation (2). (III) An even alternating path P is adjacent to a blue path and a white path. The nodes on P which are closer to the blue (white) path become blue (resp. white) after at most |P|/2 rounds. The probability that v is on an alternating even path of length k and is closer to the blue path is equal to \(2p^2(1-p)^2\frac{k}{2}p^{k/2}(1-p)^{k/2}\). Summing over all choices of k, we get the third term in Equation (2).

$$\begin{aligned} \begin{aligned} p_f =&\left( 2p^2-p^3\right) + p^4\sum _{\text {odd } 1\le k\le n-4}k p^{\bigl \lfloor \frac{k}{2}\bigr \rfloor }(1-p)^{\bigl \lceil \frac{k}{2}\bigr \rceil }+\\ {}&p^2(1-p)^2\sum _{\text {even } 1\le k\le n-4}k p^{\frac{k}{2}}(1-p)^{\frac{k}{2}} \end{aligned} \end{aligned}$$
(2)

Let us define \(q:= p(1-p)\). Then we can write the last sum as \(2q^2\sum _{i=1}^{\lfloor \frac{n-4}{2}\rfloor }iq^i\). This is equal to \(2q^2\frac{q}{(1-q)^2}+\mathcal {O}(nq^{\frac{n}{2}})\), where we used the fact that this is the derivative of a geometric series. Similarly, we can show that the first sum in Equation (2) is equal to \(p^3(\frac{2q}{(1-q)^2}-\frac{q}{1-q})+\mathcal {O}(nq^{\frac{n}{2}})\). By plugging these into Equation (2), doing some basic calculations, and using the fact that n tends to infinity, we get \(p_f=\frac{2p^2-p^3}{1-p+p^2}\). (We are ignoring the additive term \(\mathcal {O}(nq^{\frac{n}{2}})\) because it is converging to 0 and can be hidden behind the estimate \((1\pm \epsilon )\) that we add later.) This implies that \(\mathbb {E}[b_f]=\frac{2p^2-p^3}{1-p+p^2}n\) where \(b_f\) is the final number of blue nodes.
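The "basic calculations" can be double-checked with exact rational arithmetic. The sketch below, using function names of our own choosing, evaluates the limiting expression built from the two closed-form sums, compares it with \(\frac{2p^2-p^3}{1-p+p^2}\), and also evaluates the finite sums of Equation (2) directly for a moderately large n to illustrate the convergence.

```python
from fractions import Fraction

def p_f_limit(p):
    """Equation (2) with both sums replaced by their limits as n tends to infinity."""
    q = p * (1 - p)
    monochromatic = 2 * p**2 - p**3
    odd_paths = p**3 * (2 * q / (1 - q) ** 2 - q / (1 - q))
    even_paths = 2 * q**2 * q / (1 - q) ** 2
    return monochromatic + odd_paths + even_paths

def p_f_closed_form(p):
    return (2 * p**2 - p**3) / (1 - p + p**2)

def p_f_equation_2(p, n):
    """Direct evaluation of the finite sums in Equation (2) for k = 1, ..., n - 4."""
    total = 2 * p**2 - p**3
    for k in range(1, n - 3):
        if k % 2 == 1:
            total += p**4 * k * p ** (k // 2) * (1 - p) ** ((k + 1) // 2)
        else:
            total += p**2 * (1 - p) ** 2 * k * p ** (k // 2) * (1 - p) ** (k // 2)
    return total

for p in (Fraction(1, 2), Fraction(2, 3), Fraction(9, 10)):
    print(p, p_f_limit(p) == p_f_closed_form(p), float(p_f_equation_2(p, 60)))
```

For \(p=1/2\) the closed form evaluates to 1/2, as expected from the symmetry between the two colors.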

Let \(l_p\) denote the length of the longest alternating path in a p-random coloring on \(C_n\). Then, for \(l^*:=8\log _2 n\) we have

$$\begin{aligned} \mathbb {P}[l_p\ge l^*]\le 2n (p(1-p))^{l^*/2}\le 2n \left( \frac{1}{2}\right) ^{l^*/2}=\frac{2}{n^3}. \end{aligned}$$
(3)

Therefore, w.p. at least \(1-2/n^3\), the process ends before \(l^*\) rounds.

We claim that the random variable \(b_f\) (defined over \(\Omega =\{w,b\}^n\)) is difference-bounded by \((\beta =n,c=4l^*+7,\delta =2/n^3)\). (I) Let B be the set of colorings where there is an alternating path of length at least \(l^*\). If we set \(p=1/2\), then we pick a coloring uniformly at random among the \(2^n\) colorings. According to Equation (3), the probability that such a randomly chosen coloring has an alternating path of size at least \(l^*\) is at most \(2/n^3\), i.e., \(|B|/|\Omega |\le 2/n^3\). (II) Consider a coloring \(\mathcal {C}\notin B\). Since the length of the longest alternating path is less than \(l^*\), the process ends before \(l^*\) rounds. Now, assume we flip the color of a node v to obtain the coloring \(\mathcal {C}'\). The longest alternating path in \(\mathcal {C}'\) cannot be longer than \(2l^*+3\). Thus, the process starting from \(\mathcal {C}'\) ends in at most \(t\le 2l^*+3\) rounds. Furthermore, the color of node v influences the final color of at most \(2t+1\) nodes, namely the nodes whose distance from v is at most t. Therefore, the difference between the final number of blue nodes when starting from \(\mathcal {C}\) and \(\mathcal {C}'\) is at most \(2(2l^*+3)+1=4l^*+7\), i.e., \(|b_f(\mathcal {C})-b_f(\mathcal {C}')|\le 4\,l^*+7\). (We are actually quite generous with our calculations here.) (III) For two arbitrary colorings \(\mathcal {C}\) and \(\mathcal {C}'\), we trivially have \(|b_f(\mathcal {C})-b_f(\mathcal {C}')|\le n\). Now, applying Theorem 1.1 implies that \(\mathbb {P}[(1-\epsilon )\mathbb {E}[b_f]\le b_f\le (1+\epsilon )\mathbb {E}[b_f]]\), for some \(\epsilon >0\), is at least \(1-2\exp (-(\epsilon ^2\mathbb {E}[b_f]^2)/(8n(4l^*+7)^2))-4/(n(4l^*+7))\) where we used \(\beta =n\), \(c=4l^*+7\), \(\delta =2/n^3\). Using \(\mathbb {E}[b_f]^2=(2p^2-p^3)^2n^2/(1-p+p^2)^2=\Theta (n^2)\) for \(p\ge 1/2\) and \((4l^*+7)^2=\Theta (\log ^2 n)\), the above probability is at least \(1-\exp (-\Theta (n/\log ^2 n))-1/\Theta (n\log n)=1-o(1)\). Furthermore, we already proved that the process ends w.h.p. before \(8\log _2 n\) rounds. Therefore, the process reaches a stable coloring with \((1\pm \epsilon )\frac{2p^2-p^3}{1-p+p^2}n\) blue nodes in \(\mathcal {O}(\log n)\) rounds w.h.p.

Theorem 4.2

Consider RMM on \(C_n\) and assume that \(b_0=pn\) for some \(0\le p\le 1\). Then, we have \(\mathbb {E}[b_t]=pn\) for any \(t\in \mathbb {N}\).

Proof

It suffices to prove that the sequence \(b_0, b_1, b_2,\cdots\) is a discrete-time martingale, i.e., \(\mathbb {E}[b_t|b_0, b_1, \cdots , b_{t-1}]=b_{t-1}\). Let us formulate RMM in a slightly different way. Assume that in each round, a white (blue) node sends a white (blue) pebble to each of its two neighbors. Then, each node uniformly and independently at random chooses one of the two pebbles it has received and picks its color. This is the same as the RMM rule because if the neighbors of a node agree on a color, it picks that color w.p. 1, and otherwise it picks a color independently and uniformly at random. Now, assume that there are b blue nodes in the round \(t-1\). Then, each of the b blue nodes sends out two blue pebbles and each blue pebble is selected and results in a blue node w.p. 1/2. Thus, by the linearity of expectation, the expected number of blue nodes in round t is equal to \(2b*(1/2)=b\). This concludes the proof that the sequence is a martingale. Therefore, we have \(\mathbb {E}[b_t]=pn\) for any \(t\in \mathbb {N}\).
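The martingale property can be verified exactly on small cycles without any simulation. In the sketch below (function name and 0/1 encoding are our own), the probability that a node is blue in the next round equals half the number of its blue neighbors, which is the pebble argument written per node. Summing these probabilities gives the conditional expectation of the next blue count.

```python
from fractions import Fraction
from itertools import product

def expected_blue_after_one_round(coloring):
    """Exact E[b_t | coloring in round t-1] for RMM on a cycle (1 = blue, 0 = white)."""
    n = len(coloring)
    total = Fraction(0)
    for i in range(n):
        blue_neighbors = coloring[i - 1] + coloring[(i + 1) % n]
        # Two blue neighbors: blue w.p. 1. One: blue w.p. 1/2. None: blue w.p. 0.
        total += Fraction(blue_neighbors, 2)
    return total

for n in range(3, 9):
    ok = all(
        expected_blue_after_one_round(c) == sum(c) for c in product((0, 1), repeat=n)
    )
    print(n, ok)
```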

Theorem 4.2 holds for any initial coloring with pn blue nodes, regardless of their position. We can apply this to the case of a p-random initial coloring because a simple application of the Chernoff bound [21] implies that there are pn blue nodes initially w.h.p. up to some “small” error factor.

Corollary 4.3

For RMM on \(C_n\) with \(b_0=pn\):

  • If n is odd, the process reaches the blue coloring w.p. p and the white coloring w.p. \(1-p\).

  • If n is even, the process reaches the blue coloring w.p. \(p^2\), the white coloring w.p. \((1-p)^2\) and the blinking configuration w.p. \(2p(1-p)\).

Proof

For odd n, according to Theorem 2.7, the process must reach the white or blue coloring. Let \(b_f\) denote the number of blue nodes in the final coloring. We have \(\mathbb {E}[b_f]=\sum _{i=1}^{n} i\cdot \mathbb {P}[b_f=i]=n\cdot \mathbb {P}[b_f=n]\), where for the last equality we used that \(b_f\in \{0,n\}\) by the above statement. Furthermore, by Theorem 4.2, we have \(\mathbb {E}[b_f]=pn\). Therefore, we get \(\mathbb {P}[b_f=n]=p\).

For even n, since the graph \(C_n\) is bipartite, its node set can be partitioned into two subsets L and R, each of which forms an independent set of size n/2. According to Theorem 2.7, after f rounds, for some even integer f, all nodes in L (similarly in R) share the same color. Using a similar argument to the one for the odd case and the fact that L and R are symmetric, we can show that the probability that all nodes in L (similarly R) are blue in round f is equal to p.

Furthermore, by a simple inductive argument, one can show that the color of nodes in L (similarly R) in round t for some even integer t only depends on the color of nodes in L (resp. R) in round 0 (i.e., their initial coloring). This implies that the color of nodes in L is independent of the color of nodes in R in round t.

Combining the statements from the last two paragraphs, we can conclude that in round f both L and R are blue (i.e., the fully blue coloring) w.p. \(p^2\), both L and R are white (i.e., the fully white coloring) w.p. \((1-p)^2\), and one of them is blue and the other one is white (i.e., the blinking configuration) w.p. \(2p(1-p)\).
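For odd n, the absorption probability can be computed numerically on a small cycle. The sketch below is illustrative only: it treats RMM on \(C_5\) as a Markov chain over all \(2^5\) colorings and iterates the exact one-round transition probabilities. The helper names `rmm_transition` and `blue_absorption_probability` and the number of iterated rounds are assumptions made for this example.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def rmm_transition(state):
    """Return {next_state: probability} for one synchronous RMM round on a cycle.

    state is a tuple with 1 for blue and 0 for white. Each node turns blue
    with probability (number of blue neighbors) / 2, independently.
    """
    n = len(state)
    p_blue = [(state[v - 1] + state[(v + 1) % n]) / 2 for v in range(n)]
    dist = {}
    for nxt in product((0, 1), repeat=n):
        prob = 1.0
        for v, color in enumerate(nxt):
            prob *= p_blue[v] if color else 1.0 - p_blue[v]
        if prob > 0.0:
            dist[nxt] = prob
    return dist

def blue_absorption_probability(start, rounds=400):
    """Probability mass on the all-blue coloring after the given number of rounds."""
    dist = {tuple(start): 1.0}
    for _ in range(rounds):
        new_dist = {}
        for state, mass in dist.items():
            for nxt, prob in rmm_transition(state).items():
                new_dist[nxt] = new_dist.get(nxt, 0.0) + mass * prob
        dist = new_dist
    return dist.get((1,) * len(start), 0.0)

if __name__ == "__main__":
    # Two blue nodes out of five, so p = 2/5.
    print(blue_absorption_probability((1, 1, 0, 0, 0)))
    print(blue_absorption_probability((1, 0, 1, 0, 0)))
```

Both starting colorings have \(b_0=2\), and the computed probability of ending in the blue coloring agrees with \(p=2/5\) up to numerical error, independent of the positions of the blue nodes.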

5 Experiments

We also study MM and RMM from an experimental perspective. The conducted experiments are two-fold. Firstly, some of the experiments are closely related to our theoretical findings from above, for example the experiments on cycles and 2-cycles. These experimental results support and complement our theoretical findings. Secondly, some other experiments lay the groundwork for future work. In particular, we experimentally explore the connection between graph characteristics, such as conductance and vertex-transitivity, and the behavior of MM and RMM. Then, building on some intuitive arguments, we state some propositions to be explored further.

5.1 Graph data

Some of our experiments are executed on a cycle, a 2-cycle (to build a 2-cycle, take a cycle \(C_n\) and add an edge between every two nodes which are at distance 2), and a random graph (which is the graph obtained by adding two randomly selected edges to each node in a cycle \(C_n\)). See Fig. 10 for some examples.

Fig. 10

Examples of (a) a cycle, (b) a 2-cycle, and (c) a random graph
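The three synthetic graph families can be sketched as adjacency sets. The text does not spell out how the random edges are sampled, so the `random_graph` function below is only one possible reading: it overlays a uniformly random Hamiltonian cycle that avoids the original cycle edges, which gives every node exactly two random extra neighbors and the same number of edges as the 2-cycle. All function names are hypothetical.

```python
import random

def cycle_graph(n):
    """Adjacency sets of the cycle C_n on nodes 0..n-1."""
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}

def two_cycle_graph(n):
    """C_n plus an edge between every two nodes at distance 2 (n >= 5)."""
    adj = cycle_graph(n)
    for v in range(n):
        adj[v].add((v + 2) % n)
        adj[v].add((v - 2) % n)
    return adj

def random_graph(n, rng):
    """C_n plus two random extra neighbors per node (assumed sampling scheme).

    A random cyclic ordering of the nodes is drawn until none of its
    consecutive pairs coincides with an edge of the original cycle.
    """
    while True:
        order = list(range(n))
        rng.shuffle(order)
        extra = [(order[i], order[(i + 1) % n]) for i in range(n)]
        if all((u - v) % n not in (1, n - 1) for u, v in extra):
            break
    adj = cycle_graph(n)
    for u, v in extra:
        adj[u].add(v)
        adj[v].add(u)
    return adj

if __name__ == "__main__":
    g = random_graph(20, random.Random(1))
    print(sorted(g[0]))
```

Other sampling schemes (for example, letting each node pick two random non-neighbors) are equally consistent with the description and would yield slightly different edge counts.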

We have also conducted experiments on real-world graph data. We rely on the publicly available network datasets from SNAP [37]. In the Facebook graph we have \(n=4039\) and \(m=88234\), in the LastFM Asia graph \(n=7624\) and \(m=27806\), and in the Wikipedia vote graph \(n=7115\) and \(m=103689\).

Remark. In the diagrams below, we use “expected” for what we expect according to our theoretical findings and “actual” for the output of the experiments.

5.2 Stabilization Time

Figure 11 depicts the stabilization time of MM on a cycle graph \(C_n\) for an extreme coloring, where there is a white path of length 2 (or 3) and an alternating path of length \(n-2\) (or \(n-3\)). The stabilization time for the cycle perfectly matches the bound \(\lceil n/2 \rceil -1\) proven in Theorem 2.3. Interestingly, once we add two random neighbors for each node on the cycle (to obtain the random graph), the process ends much faster (i.e., in less than 25 rounds even for \(n=10,000\)). To argue that this is not merely the effect of adding extra edges, but rather of how they are added, we ran the process on a 2-cycle graph (which has the same number of edges as the random graph). As one can observe, even though the process speeds up slightly, it is still substantially slower than in the random case. Is it true that the stabilization time on random graphs, such as the Erdős-Rényi random graph and random regular graphs, or more generally on graphs with strong conductance properties, is small, perhaps (sub)-logarithmic in n? This is left as an open problem.

Fig. 11

The stabilization time in MM as a function of n, with a white path of length 2 (or 3) and an alternating path of length \(n-2\) (or \(n-3\)). (Please refer to Sect. 5.1 for the definition of 2-cycle)
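The cycle curve in Fig. 11 can be reproduced with a few lines of code. The sketch below assumes that in MM on \(C_n\) a node adopts its neighbors' color when they agree and keeps its own color otherwise, and that the white path has length 2 for odd n and 3 for even n (so that the alternating path starts and ends with a blue node). The names `extreme_coloring` and `stabilization_time` are hypothetical.

```python
def mm_round(colors):
    """One synchronous MM round on a cycle (1 = blue, 0 = white), assumed tie rule: keep color."""
    n = len(colors)
    return [
        colors[v - 1] if colors[v - 1] == colors[(v + 1) % n] else colors[v]
        for v in range(n)
    ]

def extreme_coloring(n):
    """White path of length 2 (n odd) or 3 (n even), followed by an alternating path."""
    white_len = 2 if n % 2 == 1 else 3
    alternating = [1 - (i % 2) for i in range(n - white_len)]  # blue, white, ..., blue
    return [0] * white_len + alternating

def stabilization_time(colors):
    """Number of rounds until the coloring no longer changes."""
    for rounds in range(len(colors) + 1):
        nxt = mm_round(colors)
        if nxt == colors:
            return rounds
        colors = nxt
    raise RuntimeError("coloring did not stabilize (e.g. blinking configuration)")

if __name__ == "__main__":
    for n in (5, 6, 101, 1000):
        print(n, stabilization_time(extreme_coloring(n)), (n + 1) // 2 - 1)
```

In each round the alternating path loses one node at each end, which is why the measured time coincides with \(\lceil n/2 \rceil -1\).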

The stabilization time of MM on the studied real-world social graphs starting from a p-random initial coloring is depicted in Fig. 12. We observe that the process takes the longest when \(p=1/2\), that is, each node chooses one of the two colors uniformly and independently at random. But even in that case, the process stabilizes fairly quickly. This can be explained by the following two points: (i) we start from a random coloring (rather than a worst-case coloring as in Fig. 11), (ii) while the real-world networks are not very strong expanders (a graph is said to be an expander if it has a high degree of connectivity), they enjoy a certain level of expansion. The second point is aligned with our results from above about the stabilization time for the cycle and 2-cycle (which do not have strong expansion properties) and the random graph (which possesses strong expansion).

Fig. 12

The stabilization time in MM on the Facebook, LastFM, and Wikipedia vote graphs, starting from a p-random coloring

Remark. RMM has been excluded from these experiments. This is because, as we proved, the stabilization time and periodicity of RMM can be exponential, and thus similar experiments for RMM can be computationally expensive.

5.3 Stable colorings

Figure 13 visualizes the number of stable colorings in MM for a cycle \(C_n\) obtained from our experiments (using brute-force search) alongside the expected estimate \(\Phi ^{n}\) from Theorem 2.8. Again, adding random edges results in a considerably different behavior, i.e., the number of stable colorings decreases drastically. (A 2-cycle also has exponentially many stable colorings, but with a base smaller than the one for cycle, \(\Phi\). Adding that to the diagram will reduce the readability. Thus, it has been left out.)
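A brute-force count for small cycles can be sketched as follows. Two assumptions are made here that are not stated in this section: a node in MM keeps its color unless both neighbors disagree with it, and \(\Phi\) denotes the golden ratio \((1+\sqrt{5})/2\). The function names are hypothetical.

```python
from itertools import product

PHI = (1 + 5 ** 0.5) / 2  # assumed meaning of Phi: the golden ratio

def is_stable(colors):
    """A coloring of a cycle is stable if every node has a neighbor of its own color."""
    n = len(colors)
    return all(
        colors[v - 1] == colors[v] or colors[(v + 1) % n] == colors[v]
        for v in range(n)
    )

def count_stable_colorings(n):
    """Count stable colorings of C_n by enumerating all 2^n colorings."""
    return sum(1 for colors in product((0, 1), repeat=n) if is_stable(colors))

if __name__ == "__main__":
    for n in range(4, 15):
        count = count_stable_colorings(n)
        print(n, count, round(PHI ** n, 2))
```

Under these assumptions the counts stay within a small additive distance of \(\Phi ^{n}\), so the ratio of the two approaches 1 as n grows.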

Note that a stable coloring corresponds to a partition of the nodes into resilient sets. Thus, if there are many ways to partition a graph’s node set into resilient sets, there exist many stable colorings. In graphs with strong conductance properties, such as the above random graph, for sets which are not too large, the number of edges on the boundary is more than twice the number of edges inside the set. Thus, such sets do not form resilient sets. Another parameter which, we believe, plays a role is vertex-transitivity because it provides a certain level of “symmetry” which could result in the formation of resilient sets. Therefore, it would be interesting to characterize the number of stable colorings in terms of different graph parameters, in particular conductance and vertex-transitivity, in future work.

Fig. 13

The number of stable colorings as a function of n in MM

5.4 Expected final density

Figure 14 visualizes the final ratio of blue nodes when starting from a p-random coloring for different values of p and \(n=2000\) in both MM and RMM. For MM on \(C_n\), the output of our experiments acceptably matches what one would expect according to our result in Theorem 4.1. For RMM, it, unsurprisingly, does not match the expected final density p (see Theorem 4.2) because we know that the process always reaches a monochromatic coloring or the blinking configuration (see Corollary 4.3). (If we let n be odd, e.g. \(n=1999\), then it can only become monochromatic.) Once we switch to our random graph, the process exhibits a behavior called perfect classification, i.e., if p is smaller (larger) than 1/2, then the process reaches the white (resp. blue) coloring. This is aligned with the results from prior work, cf. [50], on the relation between conductance and perfect classification. On the other hand, both cycle and 2-cycle graphs, up to some degree, exhibit a property known as fair classification, i.e., the expected final ratio of blue nodes is “almost” equal to their initial ratio p.

Fig. 14

The final ratio of blue nodes for different values of p, starting from a p-random coloring

Figure 15 visualizes the final ratio of blue nodes when starting from a p-random coloring for different values of p in MM on the three studied real-world social graphs, as well as on the cycle and random graphs. The behavior of these social networks lies somewhere between that of the cycle and the random graph, but closer to the random graph. In particular, we observe that the behavior is closer to perfect classification than to fair classification, which again can be explained by the fact that real-world social graphs possess some level of expansion. Recall that the random graph, unlike the cycle, has strong expansion properties.

Fig. 15

The final ratio of blue nodes for different values of p, starting from a p-random coloring in MM on Facebook, LastFM, and Wikipedia Vote, cycle, and random graphs

6 Conclusion

We studied two fundamental majority-based opinion diffusion processes. Developing several novel proof techniques, we provided tight bounds on the stabilization time, periodicity, minimum size of a winning set, and the expected final density in these processes.

Stabilization Time and Periodicity of RMM. We proved that the stabilization time and periodicity of RMM can be exponential for some graphs. It would be interesting to characterize graphs for which a polynomial upper bound exists.

Number of Stable Colorings. We initiated the study of the number of stable colorings and provided tight bounds for the cycle graph in both MM and RMM. A potential future research direction is to determine the graph parameters which govern the number of stable colorings. Building on our experimental findings, we nominated conductance and vertex-transitivity as potential candidates. A suitable starting point would be to study the Erdős-Rényi random graph (which has strong expansion properties) and the d-dimensional torus (which is vertex-transitive).

Perfect and Fair Classification. It is known from prior work, cf. [50], that for perfect classification, it suffices that the graph enjoys strong conductance properties. For fair classification, our theoretical results on the cycle graph and the outcome of our experiments suggest that vertex-transitivity plays a crucial role. It is left to future work to investigate this further.