Formally verified asymptotic consensus in robust networks

Distributed architectures are used to improve the performance and reliability of various systems. Examples include drone swarms and load-balancing servers. An important capability of a distributed architecture is the ability to reach consensus among all its nodes. Several consensus algorithms have been proposed, and many of them come with intricate proofs of correctness that are not mechanically checked. In the controls community, algorithms often achieve consensus asymptotically, e.g., for problems such as the design of human-controlled systems or the analysis of natural systems like bird flocking. This is in contrast to exact consensus algorithms such as Paxos, which have received much more recent attention in the formal methods community. This paper presents the first formal proof of an asymptotic consensus algorithm, and addresses various challenges in its formalization. Using the Coq proof assistant, we verify the correctness of a widely used consensus algorithm in the distributed controls community, the Weighted-Mean Subsequence Reduced (W-MSR) algorithm. We formalize the necessary and sufficient conditions required to achieve resilient asymptotic consensus under the assumed attacker model. During the formalization, we clarify several imprecisions in the paper proof, including an imprecision on quantifiers in the main theorem.


Introduction
To enhance reliability, robustness and performance, many modern systems use a distributed architecture, composed of multiple nodes communicating with each other. Examples range from coordinated control of multi-robot systems such as swarms of mobile and aerial robots, to load-balancing among servers answering many queries per second. A fully decentralized system, where decisions are made collectively by the nodes rather than by one master node, greatly improves reliability by ensuring there is no single point of failure in the system. A distributed architecture also provides greater performance (depending on the context, in terms of load capacity, reduced latency, smaller communication overhead, etc.) than any single node could ever achieve. Distributed architectures are supported by distributed algorithms, which particularly focus on carefully handling situations where some nodes become faulty, stop responding, or become malicious.
One central aspect of distributed algorithms is the ability to achieve consensus. Consensus is said to be achieved in a network if all normal (correct) nodes agree on a certain value, where a node is normal if it is not faulty [34]. The value agreed upon by all nodes can be a reference point for the next position of a swarm, or the sequence of commands executed by a set of replicas in State Machine Replication [44]. Consensus has been studied extensively in different communities. In the distributed computer systems community, some prominent algorithms achieving consensus are Paxos [29], MultiPaxos [46], Raft [36], and Practical Byzantine Fault Tolerance (PBFT) [6]. However, these algorithms deal with the problem of exact consensus. There are many scenarios where exact consensus is not achievable, ranging from the design of human-controlled systems to the analysis of natural systems like bird flocking. These problems have to be solved under harsh environmental restrictions such as restricted communication abilities and the presence of communication uncertainty. They therefore warrant the study of asymptotic consensus, which, unlike exact consensus, does not require strong assumptions on the underlying network [16].
This paper presents the first formal proof of an asymptotic consensus algorithm, by formalizing the Weighted-Mean Subsequence Reduced (W-MSR) algorithm [30,49]. The problem of asymptotic consensus is of great importance to the distributed robotics and controls community, which has studied algorithms like the Mean Subsequence Reduced (MSR) algorithm [27] and its recent extension W-MSR. These algorithms are designed to achieve asymptotic consensus in partially connected groups of nodes, but have not been formally verified. Formal verification of consensus algorithms is important, as has been emphasized by the distributed computer systems community, which has long invested in producing mechanically checked proofs of its consensus protocols. The controls community, however, lags behind in this direction.
In recent years, the distributed systems community has embraced formal methods to provide mechanically checked proofs of its consensus protocols and their implementations, using a wide range of techniques from interactive and automated theorem proving [47,25,8,5,18,9,31] to automatic generation of inductive invariants [33,21,48,20]. In the distributed robotics and controls community, however, researchers usually prove their consensus protocols with paper proofs, using mathematical analysis based on Lyapunov theory and its extensions, without computer-checked formalizations. As we show in this paper, our formalization of asymptotic consensus for the W-MSR algorithm [30] reveals imprecisions in the placement of quantifiers in the main theorem and several missing pieces in the proof, thereby highlighting the importance of machine-checked proofs. Thus a significant contribution of our work is the first mechanically checked formalization of asymptotic consensus and its application to the W-MSR algorithm. We have chosen to formalize this algorithm since it is widely used for resilient consensus [42,41,45]. From the perspective of practical applications, enabling resilient consensus in the presence of misbehaving or faulty nodes is desirable for many applications in autonomous systems and robotics, e.g., for coordinated control of multi-robot systems.
The MSR and W-MSR algorithms are very different from exact consensus algorithms such as MultiPaxos, Raft or PBFT. As such, our formal verification of the correctness of W-MSR uses different techniques than previous proofs of exact consensus algorithms. The first major difference is that MSR and W-MSR guarantee asymptotic consensus rather than finite-time consensus. A second major difference is that MSR and W-MSR provide consensus in networks that are not fully connected: two normal nodes might not be able to communicate with each other directly, but might have to rely on another (possibly faulty) node to forward their messages to each other. This last property is crucial to model multi-robot systems where complete communication between any two robots may not be feasible at all times. Because of those differences, providing a mechanically checked proof of W-MSR requires the development and use of different techniques than the ones typically used to mechanically check MultiPaxos, Raft or PBFT. In particular, our formalization crucially relies on a formalization of limits and real analysis, because many of the techniques used in model checking or for generating invariants are not well suited to proving asymptotic properties.
Contributions: The original contribution of this work is the formalization in the Coq theorem prover of the convergence results of the W-MSR algorithm [30]. Specifically, we provide a machine-checked concrete counterexample for the proof of necessity, a clean proof of Lemma 1, and the Coq formalization of the main theorem (Theorem 1). We also fill in several missing details and clarify imprecisions in the proof of sufficiency, which can be viewed as an addition to the existing proof [30]. Additionally, this is, to our knowledge, the first mechanical formalization of a consensus algorithm where the consensus is obtained asymptotically, opening the door to more such proofs.
This paper is organized as follows. In Section 2, we discuss the problem setup and define terminology related to graph topology and the W-MSR algorithm [30]. In Section 3, we discuss the formalization in Coq of the necessary and sufficient conditions for achieving resilient asymptotic consensus. We also discuss some specific challenges we encountered during the formalization. After reviewing some related work in Section 4, we conclude in Section 5 by discussing key takeaways from our work and generic challenges we encountered during the formalization. We also lay out a few directions that could be addressed in future work.

Preliminaries
In this paper we consider the problem of formalizing consensus in a network, and adopt the problem formulation from [30]. While the original paper discusses consensus in a distributed control graph for both malicious and Byzantine threat models and for both time-varying and time-invariant graph structures, we limit our formalization to the case of a time-invariant graph, a malicious threat model, and a particular threat scope: F-total, where the total number of malicious nodes in the control graph is bounded. We next briefly discuss what each of these terms means in the context of the following problem.

Problem formulation
Consider a network that is modeled by a digraph (directed graph) D = (V, E), where V = {1, ..., n} is the node set and E ⊂ V × V is the directed edge set. The node set is partitioned into a set of normal nodes N and a set of adversary nodes A, which are unknown a priori to the normal nodes. Each directed edge (j, i) ∈ E models information flow and indicates that node i can be influenced by (or receive information from) node j at time-step t. The set of in-neighbors of node i is defined as V_i = {j ∈ V | (j, i) ∈ E}. Intuitively, the set of in-neighbors contains all neighboring nodes of i such that the direction of information flow is from those nodes to i. The cardinality of the set of in-neighbors is called the in-degree. Since each node has access to its own value at time-step t, we also consider the set of inclusive neighbors of node i, denoted by J_i = V_i ∪ {i}.
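The neighbor sets defined above are easy to compute from an edge list. The following is a minimal sketch; the function names and the four-node example graph are hypothetical and only illustrate the definitions of V_i, the in-degree, and J_i.

```python
def in_neighbors(edges, i):
    """V_i = {j : (j, i) in E}: nodes whose information flows into i."""
    return {j for (j, k) in edges if k == i}


def inclusive_neighbors(edges, i):
    """J_i = V_i united with {i}: node i also has access to its own value."""
    return in_neighbors(edges, i) | {i}


# Hypothetical digraph; an edge (j, i) means that j sends its value to i.
E = {(1, 2), (3, 2), (2, 4), (4, 1)}

V_2 = in_neighbors(E, 2)
in_degree_2 = len(V_2)
J_2 = inclusive_neighbors(E, 2)
print(V_2, in_degree_2, J_2)
```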

Threat Model
As discussed earlier, we formalize a threat model (the F-total malicious model [30]) in which every adversary node in the graph is malicious, and there exists an upper bound F on the number of malicious agents in the graph, i.e., the set of adversary nodes is F-totally bounded. In the context of the problem in Section 2.1, the relevant formal definitions pertaining to the threat model are as follows.
Definition 1 (Malicious node [30]). A node i ∈ A is called malicious if it sends the same value x_i(t) to all its neighbors at each time step t, but applies a different update function f'_i(·) at some time step.
Definition 2 (F-total set [30]). A set S ⊂ V is F-total if it contains at most F nodes in the network, i.e., |S| ≤ F, F ∈ Z≥0.
Definition 3 (F-totally bounded [30]). A set of adversary nodes is F-totally bounded if it is an F-total set.
Note that while Definitions 2 and 3 may appear similar, they define different terms. Definition 2 defines an F-total set as one with at most F nodes in a network. Definition 3 specializes this to the set of adversary nodes, saying that there are at most F adversarial nodes in a network.

Robust network topologies
The ability of a set of normal nodes in a control graph to achieve consensus depends on its ability to make local decisions effectively. LeBlanc et al. [30] defined a topological property called network robustness for reasoning about the ability of purely local algorithms to succeed, which we formalize in Coq.
In particular, they define a property called (r, s)-robustness, which is stated as follows.
Definition 4 ((r, s)-robustness [30]). A digraph D = (V, E) on n nodes (n ≥ 2) is (r, s)-robust, for a nonnegative integer r ∈ Z≥0 and 1 ≤ s ≤ n, if for every pair of nonempty, disjoint subsets S_1 and S_2 of V at least one of the three conditions (i)-(iii) of [30] holds.
Condition (iii) states that there are at least s nodes in the union of the sets S_1 and S_2, each of which has at least r neighbors outside of its respective set. The idea is that "enough" nodes in every pair of nonempty, disjoint sets S_1, S_2 ⊂ V have at least r neighbors outside of their respective sets. This ensures that the network is well connected, and that loss of information from a node due to a malicious attack does not affect the whole network. Figure 1 illustrates an example of a network with (2, 2)-robustness.
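On small graphs the robustness property can be checked by brute force, which helps build intuition for the definition. The sketch below assumes the usual statement of the three conditions: (i) |X^r_{S_1}| = |S_1|, (ii) |X^r_{S_2}| = |S_2|, and (iii) |X^r_{S_1}| + |X^r_{S_2}| ≥ s, where X^r_S is the set of nodes in S with at least r in-neighbors outside S. The function names are hypothetical, and the enumeration is exponential in the number of nodes, so this is only meant for toy examples.

```python
from itertools import combinations


def reach_set(in_nbrs, S, r):
    """X_S^r: nodes of S that have at least r in-neighbors outside S."""
    return {i for i in S if len(in_nbrs[i] - S) >= r}


def nonempty_subsets(nodes):
    """Yield every nonempty subset of `nodes` as a set."""
    nodes = sorted(nodes)
    for k in range(1, len(nodes) + 1):
        for combo in combinations(nodes, k):
            yield set(combo)


def is_rs_robust(in_nbrs, r, s):
    """Brute-force check of (r, s)-robustness (assumed conditions i-iii)."""
    V = set(in_nbrs)
    for S1 in nonempty_subsets(V):
        for S2 in nonempty_subsets(V - S1):  # S2 is disjoint from S1
            x1 = reach_set(in_nbrs, S1, r)
            x2 = reach_set(in_nbrs, S2, r)
            ok = (len(x1) == len(S1)
                  or len(x2) == len(S2)
                  or len(x1) + len(x2) >= s)
            if not ok:
                return False
    return True


# Complete digraph on 4 nodes: every node hears from every other node.
K4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(is_rs_robust(K4, 2, 2), is_rs_robust(K4, 3, 1))
```

In the complete graph on four nodes, any set of size at most two has all of its nodes hearing from two outside nodes, so the graph is (2, 2)-robust. Splitting the nodes into two pairs shows that it is not (3, 1)-robust.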

Update model for the normal nodes
In this paper, we formalize a consensus algorithm called the W-MSR algorithm [30]. This algorithm provides an update model for the normal nodes in the network. A schematic of the algorithm is illustrated in Figure 2. We denote the value emitted by node i at time t as x_i(t), and the weight of the directed edge from node j to node i at time t as w_ij(t). The value x_i(t) could represent a measurement like position or velocity, or it could be an optimization variable. The quantity x^i_j(t) is the information that the j-th node in the neighboring set of node i sends to node i. Each node also has a varying set of neighbors that it ignores, which we denote as R_i(t). The set R_i(t) changes because nodes are removed depending on their value relative to the value of node i at time t. In this algorithm, the updated value of a normal node i at time t + 1 is a convex combination of the values of its neighboring set including itself. Hence,
x_i(t + 1) = Σ_{j ∈ J_i \ R_i(t)} w_ij(t) x^i_j(t),
where we assume the existence of a constant α ∈ R such that 0 < α < 1, and the weights w_ij(t) satisfy the following conditions:
1. w_ij(t) = 0 whenever j ∉ J_i;
2. w_ij(t) ≥ α, ∀j ∈ J_i; and
3. Σ_{j ∈ J_i \ R_i(t)} w_ij(t) = 1 for all i ∈ N and t ∈ Z≥0.
It is important to note that the third condition depends on the set of removed nodes, which may change over time. In order to satisfy this condition, the values of the weights may need to change over time.
The choice of the neighboring sets J_i \ R_i(t) in the W-MSR algorithm is defined as in [30]. An important point to note here is that the above update model holds only for the normal nodes, i.e., i ∈ N. The update function for adversary nodes, i.e., i ∈ A, and their influence on the normal nodes depend on the threat model. We next discuss the formalization of the W-MSR algorithm in Coq.
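One update step of a normal node can be sketched as follows. This is an illustration under assumptions rather than the formalized definition: it assumes the usual W-MSR filtering rule (drop the F largest neighbor values strictly above the node's own value, or all of them if there are fewer than F, and symmetrically for values strictly below), and it uses uniform weights 1/|J_i \ R_i(t)|, which is one admissible choice of weights. The function name and the example values are hypothetical.

```python
def wmsr_step(x, in_nbrs, i, F):
    """Return x_i(t+1) for a normal node i (uniform weights; an assumption)."""
    own = x[i]
    vals = [x[j] for j in in_nbrs[i]]
    larger = sorted(v for v in vals if v > own)
    smaller = sorted(v for v in vals if v < own)
    equal = [v for v in vals if v == own]
    # Drop the F largest values above `own` (all of them if there are <= F).
    kept_larger = larger[:max(len(larger) - F, 0)]
    # Drop the F smallest values below `own` (all of them if there are <= F).
    kept_smaller = smaller[min(F, len(smaller)):]
    # The node always keeps its own value, so `kept` is never empty.
    kept = kept_smaller + equal + kept_larger + [own]
    return sum(kept) / len(kept)


# Node 3 hears from nodes 1, 2 and 4; node 4 reports an outlier value.
x = {1: 0.0, 2: 10.0, 3: 5.0, 4: 100.0}
in_nbrs = {3: {1, 2, 4}}
new_x3 = wmsr_step(x, in_nbrs, 3, F=1)
print(new_x3)
```

With F = 1, node 3 discards the largest value above its own (100.0) and the smallest value below its own (0.0), and averages the remaining values 10.0 and 5.0.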

A formal proof of consensus for the W-MSR algorithm
Theorem 1. [30] Consider a time-invariant network modeled by a digraph D = (V, E) where each normal node updates its value according to the W-MSR algorithm with parameter F. Under the F-total malicious model, resilient asymptotic consensus is achieved if and only if the network topology is (F + 1, F + 1)-robust.
The proof of this theorem requires us to prove both a sufficiency and a necessity condition. The original paper proof relies on a safety condition, which provides an invariant that must hold at all times in the state update. We next discuss the proof of the safety condition (Section 3.1), and then the sufficiency (Section 3.2) and necessity (Section 3.3) conditions individually.

Proof of the safety condition in W-MSR
Lemma 1 (Safety condition). [30] Suppose each node updates its value according to the W-MSR algorithm with parameter F under the F-total malicious model. Then for each node i ∈ N, x_i(t + 1) ∈ [m(t), M(t)], regardless of the network topology.
Here, m(t) = min_{i ∈ N} {x_i(t)} and M(t) = max_{i ∈ N} {x_i(t)}. Note that the original paper [30] does not provide a full proof of this lemma but merely states an outline, which makes a careful check of the proof difficult. Our proof, which we formalize in this paper, is an original contribution: we provide a detailed proof of the lemma by explicitly enumerating the cases from the definition of the W-MSR algorithm.
Proof. We prove Lemma 1 by showing inductively that at each time t, and for every normal node i, there exists a node j_1 ∈ J_i ∩ N such that ∀k ∈ J_i \ R_i(t), x_j1(t) ≤ x_k(t), and thus x_i(t + 1) ≥ m(t). Symmetrically, there exists a j_2 ∈ J_i ∩ N such that ∀k ∈ J_i \ R_i(t), x_j2(t) ≥ x_k(t). Thus, the symmetric inequality x_i(t + 1) ≤ M(t) holds for the same reason. Since the proofs of the existence of j_1 and j_2 are nearly identical, we only show the proof of the former in Appendix A.
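The step from the existence of j_1 to the lower bound can be spelled out as a short chain of inequalities. The following is a reconstruction from the update rule and the weight conditions of Section 2.4, not a quotation of the original derivation; it only uses that the weights are nonnegative and sum to one over J_i \ R_i(t), and that j_1 is a normal node.

```latex
\begin{align*}
x_i(t+1)
  &= \sum_{j \in J_i \setminus R_i(t)} w_{ij}(t)\, x^i_j(t) \\
  &\ge \sum_{j \in J_i \setminus R_i(t)} w_{ij}(t)\, x_{j_1}(t)
     && \text{since } w_{ij}(t) \ge 0 \text{ and } x^i_j(t) \ge x_{j_1}(t) \\
  &= x_{j_1}(t)
     && \text{since the weights sum to } 1 \\
  &\ge m(t)
     && \text{since } j_1 \in \mathcal{N}.
\end{align*}
```

The upper bound x_i(t + 1) ≤ M(t) follows in the same way with j_2 in place of j_1 and all inequalities reversed.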

Formalization in Coq:
We formalize Lemma 1 in Coq. The definition of F_total_malicious states that the model is F-total malicious if the set of adversary nodes is F-totally bounded (i.e., there are at most F adversary nodes in the network) and all the adversary nodes are malicious. Here A: D → bool is a tagging function. If A i == true, then i is classified as an adversary node; otherwise it is classified as a normal node. mal : nat → D → R is an arbitrary update function for a malicious node. Since we do not know beforehand what this function looks like, we take it as a parameter. The function init : D → R gives the initial value associated with a node. We define a malicious node in Coq as a node in the graph for which the normal update model does not hold, i.e., there exists a time t such that x_i(t + 1) ≠ Σ_{j ∈ J_i \ R_i(t)} w_ij(t) x^i_j(t).
(** Condition for a node to have malicious behavior at a given time **)
Definition malicious_at_i_t (mal:nat
The second hypothesis, wts_well_behaved, states that we respect the three conditions on weights that we discussed in Section 2.4. The assignment of weights depends on whether or not a node j is in J_i \ R_i(t). Here, J_i denotes the inclusive set of neighbors of node i, and R_i(t) denotes the set of nodes removed according to the W-MSR algorithm. In our Coq definition of R_i(t), we use the filter function from the MathComp sequence library. This is crucial as it gives us lemmas that allow us to assert that any node in the filtered list satisfies the conditions of the filter. Additionally, the filter function requires that its first argument has a pred type, D → bool in our case. Therefore, we need our inequality operations to be decidable. Hence, we use the decidable versions of the inequality operations, such as Rle_dec, provided by Coq's reals library instead of its built-in ≤ operation. We then define the set J_i \ R_i(t) in Coq as
Definition incl_neigh_minus_extremes
Since J_i \ R_i(t) is defined based on the value x_i(t) of node i, which in turn depends on A, mal, and init, wts_well_behaved also depends on A, mal, and init.
The trickiest parts of the proof of Lemma 1 rely on the fact that we want J_i \ R_i(t), when treated as a list, to be sorted. To fulfill this condition, we use the formalization of sorting found in the MathComp library. To do this, we first define a relation on D. This definition ensures that if x_i(t) < x_j(t), then i is ordered as less than j with respect to this relation. For nodes with equal values, we use an arbitrary mechanism to break ties. Doing so ensures that this relation is total and satisfies transitivity, antisymmetry, and reflexivity. This relation lets us use the sorting lemmas in MathComp's path library [13], and it also ensures a weaker condition that we occasionally use in the proof.
The biggest difficulty in formalizing this proof arises in the case |R^<_i(t)| < F, where R^<_i(t) := {j ∈ J_i : x_j(t) < x_i(t) and idx_{J_i}(x_j(t)) < F}, and idx_l(x_k(t)) is the index of the value x_k(t) in a given list l of values, or the size of l if x_k(t) is not present. In particular, the difficulty lies in showing that idx_{J_i \ R_i(t)}(j) = 0 ⟹ n_j(J_i) = |R^<_i(t)|. This requires proving an extra lemma on the J_i list. With this lemma, we can reason about the element at the zero-th index of J_i \ R_i(t), and thereby prove the existence of j_1 in the proof of lem_1. Symmetrically, we can show the existence of j_2 such that ∀k ∈ J_i \ R_i(t), x_j2(t) ≥ x_k(t). Tying it all together, we complete the proof of the lemma lem_1 in Coq.
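The ordering and the index function described above can be illustrated outside Coq. The sketch below is hypothetical: it breaks ties between equal values by node identifier, which is one possible instance of the arbitrary tie-breaking mechanism, and its idx helper returns the length of the list for absent elements, mirroring the convention stated in the text.

```python
def sort_nodes(nodes, x):
    """Sort nodes by value; ties are broken by node id (an assumption).

    Comparing the pairs (x[i], i) lexicographically gives a total order
    even when several nodes hold the same value.
    """
    return sorted(nodes, key=lambda i: (x[i], i))


def idx(lst, item):
    """Index of `item` in `lst`, or len(lst) if `item` is not present."""
    return lst.index(item) if item in lst else len(lst)


x = {1: 2.0, 2: 1.0, 3: 2.0}   # nodes 1 and 3 hold equal values
ordered = sort_nodes([3, 1, 2], x)
print(ordered, idx(ordered, 3), idx(ordered, 9))
```

Sorting on the pair (value, id) rather than on the value alone is what makes the relation antisymmetric, so that the sorted list is unique.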

Proof of Sufficiency
Lemma 2. [30] Consider a time-invariant network modeled by a digraph D = (V, E) where each normal node updates its value according to the W-MSR algorithm with parameter F. Under the F-total malicious model, if a network is (F + 1, F + 1)-robust, resilient asymptotic consensus is achieved.
This is an important lemma because we would like to design a network such that the normal nodes reach asymptotic consensus in the presence of malicious nodes. Next we discuss an informal proof of Lemma 2, followed by its formalization in the Coq proof assistant.
Proof. The proof of Lemma 2 is done by contradiction. We start by assuming that the limits A_M and A_m of the functions M(t) and m(t), respectively, are different, i.e., A_M ≠ A_m. These limits exist because, by Lemma 1, M(t) is monotonically non-increasing, m(t) is monotonically non-decreasing, and both are bounded. Therefore, by definition of the limits of M(t) and m(t), we know that ∀t, A_M ≤ M(t) ∧ m(t) ≤ A_m, as illustrated in Figure 3. We will show that by carefully constructing the sets S_1 and S_2 in the definition of (r, s)-robustness, and unrolling the definition of (r, s)-robustness at every time step inductively, we eventually arrive at the desired contradiction: ∃t, M(t) < A_M ∨ A_m < m(t). We discuss the details of the proof in Appendix B.
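The behavior of M(t) and m(t) that this proof relies on can be observed in a small simulation. The sketch below is illustrative only and rests on several assumptions: a complete graph on five nodes, F = 1, uniform weights, the usual W-MSR filtering rule, and a hypothetical adversary (node 4) that always broadcasts the value 100. Exact rational arithmetic is used so that the monotonicity checks are not disturbed by floating-point rounding.

```python
from fractions import Fraction


def wmsr_value(own, nbr_vals, F):
    """W-MSR update with uniform weights (an assumption) for one node."""
    larger = sorted(v for v in nbr_vals if v > own)
    smaller = sorted(v for v in nbr_vals if v < own)
    equal = [v for v in nbr_vals if v == own]
    kept = (smaller[min(F, len(smaller)):] + equal
            + larger[:max(len(larger) - F, 0)] + [own])
    return sum(kept) / len(kept)


def simulate(steps=30, F=1):
    """Return [(m(t), M(t))] over the normal nodes of a complete graph."""
    normal = [0, 1, 2, 3]
    x = {i: Fraction(i) for i in normal}
    x[4] = Fraction(100)            # adversary: constant broadcast
    history = []
    for _ in range(steps):
        vals = [x[i] for i in normal]
        history.append((min(vals), max(vals)))
        new_x = {i: wmsr_value(x[i], [x[j] for j in x if j != i], F)
                 for i in normal}
        new_x[4] = Fraction(100)    # the adversary ignores the update rule
        x = new_x
    return history


history = simulate()
gaps = [M - m for (m, M) in history]
print(float(gaps[0]), float(gaps[1]), float(gaps[-1]))
```

In this run the normal nodes never adopt the adversary's value, the interval [m(t), M(t)] only shrinks, and the gap M(t) − m(t) tends to zero without ever reaching it, which is the sense in which the consensus is asymptotic rather than exact.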

Formalization in Coq:
We introduce an axiom in Coq to support reasoning by contradiction. This is a propositional completeness axiom that allows us to reason classically and is consistent with the formalization of classical facts in Coq's standard library. We need it because we prove the sufficiency condition by contradiction. We choose to use classical reasoning because the original paper [30] does not provide a constructive proof: the reasoning used in the paper is classical. This requires us to state the following lemma in Coq
Lemma P_not_not_P: ∀ (P: Prop), P ↔ ¬(¬ P).
The proof of P_not_not_P uses the axiom proposition_degeneracy.
Figure 3: We observe the behavior of the functions M(t) and m(t) inside this tube of convergence ∀t ≥ t_ϵ. We prove that M(t) and m(t) are monotonic ∀t ≥ t_ϵ, and that they approach the limits A_M and A_m, respectively. We start by assuming that A_M ≠ A_m, but later prove that A_M = A_m by contradiction, thereby proving asymptotic consensus.
We state the sufficiency condition (Lemma 2) for the network to achieve resilient asymptotic consensus as the following in Coq.
Lemma strong_sufficiency:
The sufficiency condition requires that the graph is non-trivial, i.e., there are at least two nodes in the graph, and that the number of faulty nodes F in the graph is bounded by the total number of nodes in D. We define r_s_robustness in Coq as
Definition r_s_robustness (r s:nat
where Xi_S_r S1 r is the set of all nodes in the set S1 that have at least r neighboring nodes outside S1. In Coq, we define Xi_S_r as
Definition Xi_S_r (S: {set D}) (r:nat
We also define Resilient_asymptotic_consensus in Coq. In this definition, is_lim_seq is a predicate in Coquelicot that defines limits of sequences, and Rbar is the extended set of reals, which includes +∞ and −∞. To prove that the network achieves resilient asymptotic consensus under the (F + 1, F + 1)-robustness condition, we need to prove the two conditions in the definition of Resilient_asymptotic_consensus. We state the first subproof as the lemma interval_bound in Coq. The proof of interval_bound is a consequence of Lemma 1: we proceed by induction on time t and then apply Lemma 1 to complete the proof.
We prove the second subproof by contradiction in Coq. To start the proof by contradiction, we assume that the limits A_M and A_m of the maximum and minimum functions M(t) and m(t) are different. We then instantiate the sets S_1 and S_2 in the definition of (r, s)-robustness with X_M(t_ϵ, ϵ_0) and X_m(t_ϵ, ϵ_0), respectively. In Coq, we define these sets for any ϵ and t as follows
Definition X_m_t_e_i (e_i: R) (A_m :R) (t:nat) (mal :
where Rlt_dec is Coq's standard decidability lemma for the less-than operation.
We need to prove that the sets X_M and X_m are disjoint at all times until we reach a point when either X_M or X_m is empty. This requires us to prove a dedicated lemma in Coq. Since X_M(t_ϵ + l, ϵ_l) is the set of all nodes with values at least A_M − ϵ_l and X_m(t_ϵ + l, ϵ_l) is the set of all nodes with values at most A_m + ϵ_l, these two sets are disjoint if A_M − ϵ_l > A_m + ϵ_l. For l = 0, this holds by the definition of ϵ_0. For l > 0, it requires us to show that ϵ_l < ϵ_0, which follows from the recursive definition of ϵ_l.
A crucial aspect of the sufficiency proof is proving that (F + 1, F + 1)-robustness implies that there exists a node in the union of the sets X_M ∩ N and X_m ∩ N that has at least F + 1 neighbors outside its set. This was particularly challenging because in the original paper [30], the authors do not use all three conditions in the definition of (F + 1, F + 1)-robustness to informally prove the implication. They use only the third condition, |X^{F+1}_{X_M}| + |X^{F+1}_{X_m}| ≥ F + 1, to state the implication, leaving it to the reader to connect the dots with the first two conditions. For the implication to hold, each of the three conditions in the definition of (F + 1, F + 1)-robustness must imply the existence of such a node, since the three conditions are connected by a disjunction. To prove the implication from the first two conditions, we first need to prove the existence of a normal node in the sets X_M and X_m for all l ≤ N. This holds since the node i with value M(t_ϵ + l) will always be above the threshold A_M − ϵ_l because M(t) ≥ A_M, ∀t, due to the existence of the limit A_M. Hence, 0 < |X^{F+1}_{X_M(t_ϵ + l, ϵ_l)}|, and by definition of X^{F+1}_{X_M(t_ϵ + l, ϵ_l)} there exists a normal node in the set X_M(t_ϵ + l, ϵ_l) that has at least F + 1 neighbors outside X_M(t_ϵ + l, ϵ_l). We prove this formally in Coq using the following lemma statement
Lemma X_m_normal_exists_at_j (t_eps l N: nat) (a A_m: R)(eps_0 eps:posreal)
By symmetry, we prove that 0 < |X^{F+1}_{X_m(t_ϵ + l, ϵ_l)}|.
The other part that was not explicit in the paper proof [30] was that the largest value that node i uses at time step t_ϵ + l is M(t_ϵ + l), which is stated without proof. This was a challenge during our formalization. To prove it formally, we had to split the neighbor set of i into two parts depending on their position relative to i. It is easy to bound the values of the nodes positioned to the left of i by M(t_ϵ + l), since the neighboring list is assumed to be sorted at the time of update and we have established this upper bound for any normal node in Lemma 1. Bounding the values of the nodes positioned to the right of the normal node i, however, was not trivial. We proved this using a case analysis on the cardinality of the set R^>_i(t). We formally prove this in Coq using the lemma x_right_ineq_1, which we do not expand on here for brevity.
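For reference, the two sets used in this argument can be written out explicitly. The following is a reconstruction under assumptions, not a quotation of the original definition: it takes membership over all nodes in V and uses strict inequalities, which is suggested by the use of Rlt_dec in the Coq definition.

```latex
\begin{align*}
X_M(t, \epsilon) &= \{\, i \in \mathcal{V} : x_i(t) > A_M - \epsilon \,\}, \\
X_m(t, \epsilon) &= \{\, i \in \mathcal{V} : x_i(t) < A_m + \epsilon \,\}.
\end{align*}
% If some node i belonged to both sets, then
%   A_M - \epsilon < x_i(t) < A_m + \epsilon,
% which is impossible whenever A_M - \epsilon \ge A_m + \epsilon.
```

Under this reading, disjointness of X_M(t_ϵ + l, ϵ_l) and X_m(t_ϵ + l, ϵ_l) reduces to a comparison of the two thresholds, which is why the proof only needs to control the size of ϵ_l.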
Another challenge during the formalization was using the bound A_M − ϵ_l on the value of a neighboring node of i in the update of the value of i at the next time step. We know that the neighbors outside the set J_i(t_ϵ + l) \ X_M(t_ϵ + l, ϵ_l) have value at most A_M − ϵ_l. But to use these nodes in the update function, we need to show that these neighboring nodes are in the inclusive set of the normal node i minus the extremes, i.e., there exists a node in the intersection of the set J_i(t_ϵ + l) and the set s that contains the nodes outside the set J_i(t_ϵ + l) \ X_M(t_ϵ + l, ϵ_l). We prove the existence of such a node using the lemma exists_in_intersection in Coq. We instantiate the set A of this lemma with J_i \ R_i(t) and the set B with R^<_i(t). We know that, by definition of the W-MSR algorithm, |R^<_i(t)| ≤ F. To use the lemma exists_in_intersection, we first had to prove that s ⊂ (J_i \ R_i(t)) ∪ R^<_i(t). Applying the lemma exists_in_intersection then gives us a node k as a witness that lies in the intersection of the set s and J_i \ R_i(t). We use this node to apply the bound A_M − ϵ_l in the proof of inequality 1 for l ≤ N. All other nodes in the neighboring list of the normal node i minus extremes are shown to be bounded by M(t).
To show that the inequality ∃t, M(t) < A_M ∨ A_m < m(t) holds, we need to prove that for every l such that l ≤ N, the cardinality of the set X_M decreases or the cardinality of the set X_m decreases, or both, under the (F + 1, F + 1)-robustness condition. This requires us to prove the following lemma in Coq
Lemma sj_ind_var (s1 s2: nat → nat) (N:nat):
We instantiate s1 and s2 with X_M(t_ϵ + l, ϵ_l) and X_m(t_ϵ + l, ϵ_l), respectively. We use the lemma sj_ind_var to arrive at a contradiction and complete the proof of sufficiency.

Proof of necessity
Lemma 3. [30] Consider a time-invariant network modeled by a digraph D = (V, E) where each normal node updates its value according to the W-MSR algorithm with parameter F. Under the F-total malicious model, if resilient asymptotic consensus is achieved then the network is (F+1, F+1)-robust.
Necessity is a secondary, but still significant, lemma. It tells us that there is no weaker condition than (F + 1, F + 1)-robustness under which the normal nodes within the network reach asymptotic consensus. We now discuss an informal proof of Lemma 3. Note that the original paper [30] does not provide a clean proof of this lemma. For example, the original paper provides a sketch of the proof of Lemma 3 by contraposition, but does not provide a concrete counterexample to discharge the proof by contrapositive. The paper proof in [30] does not discuss the construction of weights or the proof that these weights are not well-behaved under non-(r, s)-robustness. These issues were non-trivial and posed challenges in Coq, as will be explained in this section. We also highlight challenges in the construction of this counterexample and the proof of necessity in Coq, including an issue of mutual recursion in Coq. The missing details in the original paper proof, which we had to develop explicitly, make the proof in this paper an original contribution.
Proof. We proceed by proving the contrapositive of necessity, that is: if the network is not (F + 1, F + 1)-robust, then it does not achieve resilient asymptotic consensus. Assuming that the network is not (F + 1, F + 1)-robust, we know that there are nonempty, disjoint sets S_1 and S_2 for which none of the three conditions in the definition of robustness holds. One way of interpreting this condition is that the number of nodes within S_1 and S_2 that can receive a lot of information from outside of their respective sets is less than F + 1 in total, and less than the number of nodes in each set respectively. We seek to construct a set of adversaries, initial values, malicious functions, and weights such that resilient asymptotic consensus is not achieved. In particular, we seek to prove that there exist two normal nodes i, j whose values do not converge to a common limit. We discuss the details of the proof in Appendix D.

Formalization in Coq: We formalize Lemma 3 in Coq.
The formalization of the necessity proof exposed some inconsistencies in the definitions in the original paper [30]. In particular, the paper defines the three conditions on weights that we discussed in Section 2.4 only for normal nodes. During our formalization, we found this to be restrictive: those conditions on weights should hold for any node. The reason for applying the conditions in the paper to the weights of adversary nodes is that, in order to ensure that a node i ∈ A is malicious as defined in the paper, there must exist a time t such that x_i(t + 1) ≠ Σ_{j ∈ J_i \ R_i(t)} w_ij(t) x^i_j(t). In other words, at some time the value emitted by a given node must not equal the value it would emit if it were normal, but the sum is clearly undefined if the weights of an adversary node are undefined. Therefore, we relax the condition that the set of weights described in the paper only exists for normal nodes. Fortunately, this does not create a problem, as adversary nodes can update their values according to any function they wish, meaning that they do not have to use the described set of weights, or any weights at all, leaving their values unconstrained by this condition.
Another thing that was not explicit in the original paper [30] was the right placement of quantifiers. Formalizing the proof of necessity helped us identify the right placement of quantifiers and provide an accurate formal specification for the W-MSR algorithm. At the start of our formalization, it was not clear to us which of two possible quantifier placements the paper meant to imply. In the first formula, the quantified values A, mal, init are not bound to the definition of resilient asymptotic consensus. Therefore, in the necessity proof, we cannot construct a counterexample by appropriate instantiation of A, mal and init to discharge the proof by contradiction. In the second formula, the quantified values are bound to the definition of resilient asymptotic consensus, which allows us to construct the counterexample by propagating the negation through the quantified values. Essentially, the difference is between the formulae (∀X, P(X) → Q(X)) and ((∀X. P(X)) → (∀X. Q(X))), where X represents the tuple (A, mal, init), and the first statement is stronger. Therefore, the former, stronger condition is not necessarily true in the necessity direction, while the weaker latter condition is.
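The relationship between the two quantifier placements can be checked exhaustively on a small finite domain. The sketch below is a toy illustration with hypothetical predicates P and Q over a two-element domain; it is unrelated to the actual Coq predicates, and only demonstrates that the first form implies the second while the converse fails.

```python
from itertools import product


def strong(P, Q, dom):
    """forall X, P(X) -> Q(X)"""
    return all((not P(x)) or Q(x) for x in dom)


def weak(P, Q, dom):
    """(forall X, P(X)) -> (forall X, Q(X))"""
    return (not all(P(x) for x in dom)) or all(Q(x) for x in dom)


dom = [0, 1]
# Every predicate on dom, encoded by its truth table.
tables = list(product([False, True], repeat=len(dom)))
preds = [lambda x, t=t: t[x] for t in tables]

# The strong form implies the weak form for all 16 pairs of predicates.
strong_implies_weak = all(
    weak(P, Q, dom) for P in preds for Q in preds if strong(P, Q, dom)
)

# Witness that the converse fails: P holds only at 0, Q holds nowhere.
P = lambda x: x == 0
Q = lambda x: False
converse_fails = weak(P, Q, dom) and not strong(P, Q, dom)
print(strong_implies_weak, converse_fails)
```

In the witness, the weak form holds vacuously because P is not true everywhere, while the strong form fails at X = 0. This is the same reason the stronger statement cannot be established in the necessity direction.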
Another difficulty we encountered was defining the weights in such a way that $w_{ij}(t) = \frac{1}{|J_i \setminus R_i|}$. This is a result of Coq's sensitivity to ill-defined recursion. The issue arises because defining $w_{ij}$ at time $t$ requires knowing the value of $x_i$ at time $t$; however, as we had defined $x_i$, it takes the set of weights it uses as a parameter. Mathematically there is no issue, since $x_i(t)$ only relies on the values of $x_j(t-1)$ and $w_{ij}(t-1)$. To solve this issue, we defined a function which returns a pair of functions $(x_i, w_{ij})$. To ensure that Coq could guess the parameter being recursed on, we also had to add another parameter two_t, which is initialized as $2 \cdot t$, and ensure that the pair $(x_i(t), w_{ij}(t))$ is returned when two_t $= 2 \cdot t$, and $(x_i(t+1), w_{ij}(t))$ is returned when two_t $= (2 \cdot t) + 1$.
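The single-counter trick can be illustrated outside Coq. The Python sketch below is our own reading of the description above, not the Coq definition: the names `kept_neighbors`, `uniform_weights` and `state` are hypothetical, every node is treated as normal, and the weights are uniform over the kept neighbors. The function `state` recurses only on `two_t`; odd steps advance the values with the current weights, and even steps recompute the weights from the current values.

```python
from fractions import Fraction

def kept_neighbors(i, x, nbrs, F):
    """J_i minus R_i: inclusive neighbors of i after dropping up to F strictly
    smaller and up to F strictly larger values (ties broken by node id)."""
    J = sorted(set(nbrs[i]) | {i}, key=lambda j: (x[j], j))
    lower = [j for j in J if x[j] < x[i]]
    higher = [j for j in J if x[j] > x[i]]
    removed = set(lower[:F]) | set(higher[len(higher) - min(F, len(higher)):])
    return [j for j in J if j not in removed]

def uniform_weights(x, nbrs, F):
    """w_ij = 1 / |J_i minus R_i| on kept neighbors; all other weights are 0."""
    weights = {}
    for i in x:
        kept = kept_neighbors(i, x, nbrs, F)
        weights[i] = {j: Fraction(1, len(kept)) for j in kept}
    return weights

def state(two_t, x0, nbrs, F):
    """Return a pair (x, w) by recursion on the single counter two_t.
    two_t = 2t   -> (x(t),   w(t))
    two_t = 2t+1 -> (x(t+1), w(t))"""
    if two_t == 0:
        x = dict(x0)
        return x, uniform_weights(x, nbrs, F)
    x_prev, w_prev = state(two_t - 1, x0, nbrs, F)
    if two_t % 2 == 1:
        # Odd step: advance the values using the weights already computed.
        x_next = {i: sum(w_prev[i][j] * x_prev[j] for j in w_prev[i]) for i in x_prev}
        return x_next, w_prev
    # Even step: keep the values, refresh the weights from them.
    return x_prev, uniform_weights(x_prev, nbrs, F)

# Complete graph on four nodes with F = 1 (illustrative data).
nbrs = {i: [j for j in range(4) if j != i] for i in range(4)}
x0 = {0: Fraction(0), 1: Fraction(1), 2: Fraction(2), 3: Fraction(3)}
x1, w0 = state(1, x0, nbrs, 1)        # (x(1), w(0))
x1_again, w1 = state(2, x0, nbrs, 1)  # (x(1), w(1))
```

Each recursive call decreases `two_t` by exactly one, which is the kind of structurally decreasing argument a termination checker can recognize, whereas two mutually dependent functions of `t` would not obviously decrease on a single argument.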

Formal proof of the main theorem
We state the main theorem (statement 1) in Coq as: We close the proof of F_total_consensus by splitting the theorem into sufficiency and necessity sub-proofs and applying the lemmas sufficiency_proof and necessity_proof. The only detail worth noting is that necessity_proof relies on the decidability of r_s_robustness, for which we need the axiom of the excluded middle.

Related Work
Recently there has been a growing interest in the formalization of distributed systems and control theory, using both automated and interactive verification approaches.
Some notable works in the area of automated verification use model checking, temporal logic, and reachability techniques. For instance, Cimatti et al. [11] have used model checking techniques to formally verify the implementation of a part of the safety logic of a railway interlocking system. Schrer et al. [43] extended the JavaPathFinder [24] model checker to support modeling of a real-time scheduler and of a physical system defined by differential equations. They verify safety and liveness properties of a control system, and also check for programming errors. Besides model checking, temporal-logic-based techniques have been applied to control synthesis [40], robust model predictive control [14] and automatic verification of sequential control systems [35]. Other approaches for verifying safety use reachability methods like flow pipe approximations [10], zonotope approximation algorithms [19,28,2], and ellipsoidal calculus [4].
There has also been significant work in the formalization of control theory using interactive theorem provers [39,1,38]. In the area of formalization of stability analysis for control theory, Cyril Cohen and Damien Rouhling formalized LaSalle's principle in Coq [12]. Stability is important for the control of dynamical systems since it guarantees that trajectories of dynamical systems, like cars and airplanes, are bounded. Chan et al. [7] formalize safety properties like Lyapunov stability and exponential stability of cyber-physical systems in Coq. In [39], Damien Rouhling formalized the soundness of a control function [32] for an inverted pendulum. Some works have also emerged in the area of signal processing for controls. Gallois-Wang et al. [17] formalized some error analysis theorems about digital filters in Coq. Araiza-Illan et al. [3] formally verified high-level properties of control systems such as stability, feedback gain, or robustness using the Why3 tool [15]. Rashid et al. [38] formalized transform methods in HOL Light [22]. Transform methods are used in signal processing and controls to switch between the time domain and the frequency domain for the design and analysis of control systems. A few works have emerged in the area of formalization of feedback control theory to guarantee robustness of control systems. Jasim and Veres et al. [26] formally proved one of the most fundamental and general results on nonlinear feedback systems, the small-gain theorem (SGT), using Isabelle/HOL [37]. Hasan et al. [23] formalized the theoretical foundations of feedback controls in HOL Light. Another notable work in the formalization of control systems is the formalization of safety properties of robot manipulators by Affeldt et al. [1].
Most of the above works deal with the problem of formalizing the theoretical foundations of control theory: stability analysis, transform methods, filtering algorithms for signal processing, and feedback control design. But, to our knowledge, none of these works tackles the problem of consensus in a formal setting. Given that consensus is a quantity of interest in distributed control applications, our work on the formalization of the W-MSR algorithm is a first step towards formally verified distributed control systems.

Conclusion
In this work, we formalize a consensus algorithm [30] for distributed controls in Coq. We formally prove the necessary and sufficient conditions for a set of normal nodes in the network to achieve asymptotic consensus in the presence of a fixed bound on the number of malicious nodes in the network. During the process of formalization we discovered several places where the proof in the original paper is imprecise, especially when defining the lemma statements of sufficiency and necessity. In particular, the order of quantifiers on some variables was unclear, and we had to spend time clarifying their order. We also prove a stronger version of the sufficiency condition than the original theorem requires. This is done to ensure that the conditions in both directions of the double implication hold. The definitions and lemmas we formalize in this paper can be used for verifying consensus for other threat models described in the original paper [30]. Overall, our work is the first of its kind to provide formal specifications of a consensus algorithm in distributed controls. The total length of the Coq proofs is about 11 thousand lines of code. The entire formalization took us 6 person-months.
A possible future direction of work is to verify the implementation of the algorithm. The proof of this algorithm in the original paper [30] and our formalization both assume that all computations are in the real field. However, an actual implementation would need to use finite precision arithmetic. It would therefore be interesting to study the effect of finite precision on the robustness of this algorithm. It would also be interesting to formalize the algorithm for time-variant networks in which the edge relation between the nodes can change with time. Possible use cases for such a network model are drone swarms for military and rescue operations, in which each drone in the network could be expected to dynamically change the flow of information from its neighbors.

A Proof of the Lemma 1
Proof for the existence of $j_1$: We define the following sets. Regard $J_i$ as the set of neighbors of $i$, interpreted as a list sorted according to the $x$-values of its nodes, with ties broken according to a total ordering placed on $V_i$, and define $\mathrm{idx}_l(x_k(t))$ to be the index of the value $x_k(t)$ in a given list $l$ of values, or the size of $l$ if $x_k(t)$ is not present. If the value $x_k(t)$ is repeated, then $\mathrm{idx}_l(x_k(t))$ is the index corresponding to where the node $k$ would be relative to the total ordering on $V_i$. We may write $\mathrm{idx}(x_k(t))$ if the list is clear from the context. Let $R^<_i(t) := \{j \in J_i : x_j(t) < x_i(t) \text{ and } \mathrm{idx}_{J_i}(x_j(t)) < F\}$, and define $R^>_i(t)$ in a similar fashion.
Note that in all cases $|R^<_i(t)| \le F$, and $j \in R^<_i(t) \implies \forall k \in J_i \setminus R_i(t),\ x_j(t) \le x_k(t)$. We proceed by case analysis on the size of $R^<_i(t)$.
1. $|R^<_i(t)| = F$. Either all nodes in $R^<_i(t)$ are adversary nodes, or there exists a $k \in R^<_i(t)$ such that $k \in \mathcal{N}$. In the first case all nodes in $J_i \setminus R_i(t)$ are also normal nodes, so we may take the largest such node as our $j_2$. In the second case, by the definition of $R^<_i(t)$, $k \in \mathcal{N}$, so we may pick $k$ as our $j_2$.
2. $|R^<_i(t)| < F$. Let $j$ be the node corresponding to the first value in the sorted list $J_i \setminus R_i(t)$. Thus, $\forall k \in J_i \setminus R_i(t),\ x_j(t) \le x_k(t)$. We do not know that $j$ is a normal node, but we can prove that $x_j(t) = x_i(t)$. By the above set of inequalities, $x_j(t) \le x_i(t)$. Now we assume WLOG that $j \neq i$. Since we know that $j \notin R^<_i(t)$, it follows that $x_i(t) \le x_j(t)$, or $F \le \mathrm{idx}_{J_i}(x_j(t))$. However, we know that $R^<_i(t)$ makes up the first $|R^<_i(t)| < F$ elements of $J_i$. Since $F \le \mathrm{idx}_{J_i}(x_j(t))$ is false, we know that $x_i(t) \le x_j(t)$, and we are done, since $\forall k \in J_i \setminus R_i(t),\ x_i(t) = x_j(t) \le x_k(t)$, and we know by assumption that $i \in \mathcal{N}$. Thus we may take $i$ as our $j_1$.

B Proof of the Lemma 2
We discuss the proof construction of the Lemma 2 in this section.
B.1 Construction of the sets $S_1$ and $S_2$ in the definition of $(r, s)$-robustness: To use the definition of $(r, s)$-robustness in the hypothesis of the lemma, we need to instantiate the sets $S_1$ and $S_2$ in its definition. Let us construct a set $X_M(t, \epsilon_l) = \{i \in V : x_i(t) > A_M - \epsilon_l\}$, which includes all normal and malicious nodes that have values larger than $A_M - \epsilon_l$. We can similarly construct a set $X_m(t, \epsilon_l) = \{i \in V : x_i(t) < A_m + \epsilon_l\}$, which includes all normal and malicious nodes that have values smaller than $A_m + \epsilon_l$. By the definition of convergence, there exists a time $t_\epsilon$ such that $M(t) < A_M + \epsilon$ and $m(t) > A_m - \epsilon$, $\forall t \ge t_\epsilon$. Figure 3 illustrates the behavior of $M(t)$ and $m(t)$ inside the tube of convergence bounded above by $A_M + \epsilon$ and bounded below by $A_m - \epsilon$. At time instance $t_\epsilon$, consider the nonempty sets $X_M(t_\epsilon, \epsilon_o)$ and $X_m(t_\epsilon, \epsilon_o)$. By density of the reals, there exists a constant $\epsilon_o > 0$ such that $A_M - \epsilon_o > A_m + \epsilon_o$. Therefore, $X_M(t_\epsilon, \epsilon_o)$ and $X_m(t_\epsilon, \epsilon_o)$ are disjoint. We obtain the constant $\epsilon$ by fixing it such that $\epsilon < \frac{\alpha^N}{1 - \alpha^N}\, \epsilon_o$, which satisfies $\epsilon_o > \epsilon > 0$. Here, $\alpha$ is a lower bound on the weights $w_{ij}(t)$, which comes from the conditions on weights we discussed in Section 2.4, and $N$ is the cardinality of the set of normal nodes $\mathcal{N}$. At time $t_\epsilon$, we instantiate $S_1$ and $S_2$ with $X_M(t_\epsilon, \epsilon_o)$ and $X_m(t_\epsilon, \epsilon_o)$, respectively. For all $t \ge t_\epsilon$, we instantiate the sets $S_1$ and $S_2$ with $X_M(t, \epsilon_l)$ and $X_m(t, \epsilon_l)$, respectively, as long as there is a normal node in these sets.
B.2 Unrolling the definition of $(r, s)$-robustness for one time step: Since $X_M(t_\epsilon, \epsilon_o)$ and $X_m(t_\epsilon, \epsilon_o)$ are nonempty and disjoint, $(F+1, F+1)$-robustness implies that there exists a normal node in the union of $X_M(t_\epsilon, \epsilon_o)$ and $X_m(t_\epsilon, \epsilon_o)$ such that it has at least $F+1$ neighbors outside its set. This follows from the definition of $(r, s)$-robustness. According to condition (iii), at least $F+1$ nodes must have at least $F+1$ neighbors outside the set. Since the network is allowed to have a maximum of $F$ faulty nodes, there is at least one normal node in the union that has at least $F+1$ neighbors outside the union. By definition, these neighbors have values at most equal to $A_M - \epsilon_o$ or at least $A_m + \epsilon_o$.
Since there exists a normal node in the union of the sets $X_M(t_\epsilon, \epsilon_o)$ and $X_m(t_\epsilon, \epsilon_o)$, let us assume for the purpose of illustration that such a node lies in the set $X_M(t_\epsilon, \epsilon_o)$, i.e., $i \in X_M(t_\epsilon, \epsilon_o) \cap \mathcal{N}$ with at least $F+1$ neighbors outside of $X_M(t_\epsilon, \epsilon_o)$. The arguments we lay out for $i \in X_M(t_\epsilon, \epsilon_o) \cap \mathcal{N}$ can be constructed similarly for $i \in X_m(t_\epsilon, \epsilon_o) \cap \mathcal{N}$ by symmetry.
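Let us now consider an update of the value of the node $i$ at the next time step, i.e., $x_i(t_\epsilon + 1)$. According to the W-MSR update, $x_i(t_\epsilon + 1) = \sum_{j \in J_i \setminus R_i(t_\epsilon)} w_{ij}(t_\epsilon)\, x^i_j(t_\epsilon)$. The problem is now to bound $x_i(t_\epsilon + 1)$. This bound can be obtained by the following set of inequalities:

$$x_i(t_\epsilon + 1) \le (1 - \alpha) M(t_\epsilon) + \alpha (A_M - \epsilon_o) \le (1 - \alpha)(A_M + \epsilon) + \alpha (A_M - \epsilon_o) \le A_M - \alpha \epsilon_o + (1 - \alpha)\epsilon. \quad (2)$$

To prove this inequality, we need to show that the upper bound on the value of nodes in the set $J_i \setminus R_i(t_\epsilon)$ is $M(t_\epsilon)$, and that at least one node in the set $J_i \setminus R_i(t_\epsilon)$ has an upper bound of $A_M - \epsilon_o$ on its value. We present an informal proof of inequality 2 in Appendix C. Note that this proof was missing in the original paper [30] and is thus an original contribution of this paper. Here, we use the fact that $\alpha$ is a lower bound on the weights and that the sum of all weights is 1. By following a similar line of arguments starting from the set $X_m(t_\epsilon, \epsilon_o) \cap \mathcal{N}$, we can prove that $x_i(t_\epsilon + 1) \ge A_m + \alpha \epsilon_o - (1 - \alpha)\epsilon$. Write $\epsilon_1 = \alpha \epsilon_o - (1 - \alpha)\epsilon$ for the margin appearing in these two bounds, and consider the set $X_M(t_\epsilon, \epsilon_1)$. Since at least one of the normal nodes of $X_M(t_\epsilon, \epsilon_o)$ decreases at least to $A_M - \epsilon_1$ (or below) or increases to at least $A_m + \epsilon_1$, it must be that that node is kicked out of the set $X_M(t_\epsilon + 1, \epsilon_1) \cap \mathcal{N}$ or $X_m(t_\epsilon + 1, \epsilon_1) \cap \mathcal{N}$, or both.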

B.3 Proving consensus inductively:
So far, we have shown that due to $(r, s)$-robustness, a normal node is kicked out of the sets $X_M(t_\epsilon + 1, \epsilon_1) \cap \mathcal{N}$ and/or $X_m(t_\epsilon + 1, \epsilon_1) \cap \mathcal{N}$.
We can carry this forward inductively as long as there are still normal nodes in $X_M(t_\epsilon + l, \epsilon_l)$ and $X_m(t_\epsilon + l, \epsilon_l)$ for time step $t_\epsilon + l$, and But we know that $\forall t$, We can observe that the inequalities 3 and 4 are contradictory. Hence, it must be the case that $A_M = A_m$, i.e., the limits of $M(t)$ and $m(t)$ coincide as $t$ approaches infinity. Thus, resilient asymptotic consensus is achieved. This ends the proof of the sufficiency condition.
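The choice $\epsilon < \frac{\alpha^N}{1 - \alpha^N}\epsilon_o$ made in B.1 can be sanity-checked numerically. The Python sketch below rests on an assumption of ours: that the margin shrinks at every inductive step in the same way as in the one-step bound $A_M - \alpha\epsilon_o + (1 - \alpha)\epsilon$, i.e., $\epsilon_l = \alpha\epsilon_{l-1} - (1 - \alpha)\epsilon$ with $\epsilon_0 = \epsilon_o$. The helper names and the sample values of $\alpha$, $N$ and $\epsilon_o$ are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def margins(alpha, eps_o, eps, steps):
    """Iterate eps_l = alpha * eps_{l-1} - (1 - alpha) * eps, starting at eps_0 = eps_o.
    The recursion itself is an assumption extrapolated from the one-step bound."""
    out = [eps_o]
    for _ in range(steps):
        out.append(alpha * out[-1] - (1 - alpha) * eps)
    return out

def epsilon_threshold(alpha, n_normal, eps_o):
    """Upper bound on eps from B.1: alpha^N / (1 - alpha^N) * eps_o."""
    return alpha ** n_normal / (1 - alpha ** n_normal) * eps_o

# Illustrative values: weight lower bound 1/4, three normal nodes, eps_o = 1.
alpha, n_normal, eps_o = Fraction(1, 4), 3, Fraction(1)
threshold = epsilon_threshold(alpha, n_normal, eps_o)

# eps strictly below the threshold: every margin up to step N stays positive.
inside = margins(alpha, eps_o, Fraction(1, 64), n_normal)

# eps exactly at the threshold: the margin reaches zero at step N.
boundary = margins(alpha, eps_o, threshold, n_normal)
```

Under this assumed recursion the closed form is $\epsilon_l = \alpha^l \epsilon_o - (1 - \alpha^l)\epsilon$, so $\epsilon_N > 0$ holds exactly when $\epsilon$ is below the threshold, which is consistent with the strict inequality required in B.1.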

C Proof of the inequality 2
Proof. Consider the set $R^>_i(t_\epsilon)$ as the set of all nodes with values strictly greater than $x_i(t_\epsilon)$. Similarly, $R^<_i(t_\epsilon)$ is the set of all nodes with values strictly less than $x_i(t_\epsilon)$. The nodes in $R^<_i(t_\epsilon)$ and $R^>_i(t_\epsilon)$ will be removed when we update the value of node $i$ at the next time step, $t_\epsilon + 1$. By the W-MSR algorithm, $|R^<_i(t_\epsilon)| \le F$ and $|R^>_i(t_\epsilon)| \le F$. The remaining nodes among the inclusive neighbors of $i$ form the set $J_i \setminus R_i(t_\epsilon)$. The sets $R^<_i(t_\epsilon)$, $R^>_i(t_\epsilon)$ and $J_i \setminus R_i(t_\epsilon)$ are mutually disjoint and their union forms the set of inclusive neighbors of $i$. Since the node $i$ takes a sorted list of neighboring nodes for its update according to the W-MSR algorithm, we assume that the inclusive neighbors and the inclusive neighbors minus extremes, i.e., $J_i \setminus R_i(t_\epsilon)$, are sorted.
We can divide the set $J_i \setminus R_i(t_\epsilon)$ into two sets, $(J_i \setminus R_i(t_\epsilon))_{low}$ and $(J_i \setminus R_i(t_\epsilon))_{high}$, depending on their relative position to the node $i$. The values of the nodes in the set $(J_i \setminus R_i(t_\epsilon))_{high}$ at time $t_\epsilon$ are bounded above by $M(t_\epsilon)$. This holds because if $|R^>_i(t_\epsilon)| < F$, then all nodes with values strictly greater than that of the node $i$ are removed, and all nodes in the set $(J_i \setminus R_i(t_\epsilon))_{high}$ have the same value as the node $i$. Since the node $i$ is normal, its value is bounded above by $M(t_\epsilon)$ at time step $t_\epsilon$, since $M(t_\epsilon) = \max_{i \in \mathcal{N}} x_i(t_\epsilon)$. Hence, all the nodes in the set $(J_i \setminus R_i(t_\epsilon))_{high}$ have value at most $M(t_\epsilon)$. If $|R^>_i(t_\epsilon)| = F$, we consider two cases:
1. All nodes in the removed set are adversary nodes. Since, by definition of the F-total malicious model, there can be at most $F$ malicious nodes in the network, all nodes in the set $(J_i \setminus R_i(t_\epsilon))_{high}$ are normal and are bounded above by $M(t_\epsilon)$ at time $t_\epsilon$.
2. At least one node in the removed set is normal. Therefore, the values of all the nodes in the set $(J_i \setminus R_i(t_\epsilon))_{high}$ are bounded above by the value of the removed normal node, which is itself bounded above by $M(t_\epsilon)$.
Therefore, $x_k(t_\epsilon) \le M(t_\epsilon)$, $\forall k \in (J_i \setminus R_i(t_\epsilon))_{high}$. Since there are at least $F+1$ neighbors outside the set $X_M(t_\epsilon, \epsilon_o)$, there exists a set $s$ with $F+1$ nodes such that $s \subset J_i \setminus X_M(t_\epsilon, \epsilon_o)$ and its values are at most $A_M - \epsilon_o$. Since $|R^<_i(t_\epsilon)| \le F$, there exists a node in the intersection of the sets $s$ and $J_i \setminus R_i(t_\epsilon)$. This node has a value of at most $A_M - \epsilon_o$. We can prove that, except for this node, the other nodes in the set $J_i \setminus R_i(t_\epsilon)$ are bounded above by $M(t_\epsilon)$. This holds because for the nodes in the set $(J_i \setminus R_i(t_\epsilon))_{high}$, the values are bounded above by $M(t_\epsilon)$ as discussed earlier. For the nodes in the set $(J_i \setminus R_i(t_\epsilon))_{low}$, the values are also bounded above by $M(t_\epsilon)$, since the set $J_i$ is sorted and the nodes in the set $(J_i \setminus R_i(t_\epsilon))_{low}$ lie to the left of the node $i$. Therefore,

$$x_i(t_\epsilon + 1) \le (1 - \alpha) M(t_\epsilon) + \alpha (A_M - \epsilon_o).$$

Hence,

$$x_i(t_\epsilon + 1) \le (1 - \alpha)(A_M + \epsilon) + \alpha (A_M - \epsilon_o) \quad [\text{since } M(t_\epsilon) < A_M + \epsilon,\ \forall t \ge t_\epsilon]$$
$$\le A_M - \alpha \epsilon_o + (1 - \alpha)\epsilon.$$

D Proof of Lemma 3
Proof. Define the adversary set to be $\chi^{F+1}_{S_1} \cup \chi^{F+1}_{S_2}$. Then we know there exists a normal node in $S_1 \setminus \chi^{F+1}_{S_1}$, and in $S_2 \setminus \chi^{F+1}_{S_2}$. This follows because $|\chi^{F+1}_{S_1}| \neq |S_1|$, which implies $|\chi^{F+1}_{S_1}| < |S_1|$, so since $\chi^{F+1}_{S_1} \subset S_1$, there exists a node in $S_1 \setminus \chi^{F+1}_{S_1}$, which by definition must be normal, and likewise for $S_2$. We initialize all nodes in $S_1$ to have value 0, all nodes in $S_2$ to have value 1, and all nodes not in $S_1$ or $S_2$ to have value $\frac{1}{2}$. Furthermore, we fix all values of nodes in $\chi^{F+1}_{S_1}$ to be 0, and all nodes in $\chi^{F+1}_{S_2}$ to be 1, for all time. We inductively prove that $\forall k_1 \in S_1,\ x_{k_1}(t) = 0$, and $\forall k_2 \in S_2,\ x_{k_2}(t) = 1$, for all $t$. If $k_1$ or $k_2$ are adversary nodes we are done, so assume they are normal. Note that the base case where $t = 0$ is clear from the definitions.
To prove the inductive case, note that with these choices, (by definition) all nodes in $S_1 \setminus \chi^{F+1}_{S_1}$ receive at most $F$ values from nodes outside $S_1$. Since all other inputs a node in $S_1 \setminus \chi^{F+1}_{S_1}$ receives are from $S_1$, which are all 0 by induction, the W-MSR procedure removes all nodes that are not zero from the set of neighbors it considers for its update procedure. Hence at time $t + 1$, the node under consideration still has value 0. For a similar reason, for any node $i \in S_2$ and for all of its neighbors $j \in J_i \setminus R_i(t)$, $x_j(t) = 1$. The only difference in proving the result for $S_2$ is that we must have a set of weights that are well behaved, so that when a given node in $S_2$ performs the update step of the W-MSR procedure, the weights, and hence the weighted average, sum to 1. One such set of weights is $w_{ij}(t) := \frac{1}{|J_i \setminus R_i(t)|}$ if $j \in J_i \setminus R_i(t)$, and $w_{ij}(t) := 0$ otherwise. Therefore, $\exists k_1 \in S_1 \cap \mathcal{N},\ k_2 \in S_2 \cap \mathcal{N}$ such that $\forall t \in \mathbb{N},\ x_{k_1}(t) = 0$ and $x_{k_2}(t) = 1$, which implies that $\lim_{t \to \infty} x_{k_1}(t) = 0 \neq 1 = \lim_{t \to \infty} x_{k_2}(t)$; hence resilient asymptotic consensus is not achieved.

Fig. 1. Illustration for $(2, 2)$-robustness. In illustration (a), every node of the set $S_2$ has 2 neighboring nodes outside $S_2$. Similarly, every node in the set $S_1$ has at least 2 neighboring nodes outside $S_1$. In illustration (b), there are 2 nodes in the union $S_1 \cup S_2$ that have 2 neighbors outside the set. Note that the sets $S_1$ and $S_2$ are disjoint.

Fig. 3. Illustration of the tube of convergence bounded above by $A_M + \epsilon$ and bounded below by $A_m - \epsilon$. We observe the behavior of the functions $M(t)$ and $m(t)$ inside this tube of convergence $\forall t \ge t_\epsilon$. We prove that $M(t)$ and $m(t)$ are monotonic $\forall t \ge t_\epsilon$, and that they approach the limits $A_M$ and $A_m$, respectively. We start by assuming that $A_M \neq A_m$, but later prove that $A_M = A_m$ by contradiction, thereby proving asymptotic consensus.

1. At each time-step $t$, each normal node $i$ obtains the values of its neighbors, and forms a sorted list.
2. If there are fewer than $F$ nodes with values strictly greater than the value of $i$, then the normal node removes all those nodes. Otherwise, it removes precisely the largest $F$ values in the sorted list. Likewise, if there are fewer than $F$ nodes with values strictly less than the value of the normal node $i$, the normal node removes all such nodes. Otherwise, it removes precisely the smallest $F$ nodes in the sorted list.

Schematic of the W-MSR update. At time $t$, the node $i$ obtains values from its neighbors and forms a sorted list. The algorithm then removes the largest and the smallest $F$ nodes in the sorted list, or, if there are fewer than $F$ nodes with values strictly greater than or less than the value of $i$, the algorithm removes all those nodes.
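The removal rule of the W-MSR update can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the Coq development: the function names are hypothetical, the weights are taken to be uniform, $w_{ij}(t) = 1/|J_i \setminus R_i(t)|$, ties are broken by node identifier, and a malicious node simply broadcasts a fixed value. The second run uses two clusters in which every node has a single neighbor outside its own cluster, so with $F = 1$ the pair of clusters violates $(2, 2)$-robustness, mirroring the construction in the proof of Lemma 3.

```python
from fractions import Fraction

def kept_neighbors(i, x, nbrs, F):
    """J_i minus R_i(t): inclusive neighbors of i after removing up to F strictly
    smaller and up to F strictly larger values (ties broken by node id)."""
    J = sorted(set(nbrs[i]) | {i}, key=lambda j: (x[j], j))
    lower = [j for j in J if x[j] < x[i]]
    higher = [j for j in J if x[j] > x[i]]
    removed = set(lower[:F]) | set(higher[len(higher) - min(F, len(higher)):])
    return [j for j in J if j not in removed]

def wmsr_step(x, nbrs, F, normal, adversary_value):
    """One synchronous round: normal nodes average the kept values with uniform
    weights (an assumption); every other node emits adversary_value."""
    new_x = {}
    for i in x:
        if i in normal:
            kept = kept_neighbors(i, x, nbrs, F)
            w = Fraction(1, len(kept))
            new_x[i] = sum(w * x[j] for j in kept)
        else:
            new_x[i] = adversary_value
    return new_x

def run(x0, nbrs, F, normal, adversary_value, steps):
    """Apply wmsr_step repeatedly, starting from the initial values x0."""
    x = dict(x0)
    for _ in range(steps):
        x = wmsr_step(x, nbrs, F, normal, adversary_value)
    return x

# Run 1: complete graph on 5 nodes, F = 1, node 4 malicious and stuck at 100.
complete = {i: [j for j in range(5) if j != i] for i in range(5)}
x0 = {0: Fraction(0), 1: Fraction(1), 2: Fraction(2), 3: Fraction(3), 4: Fraction(100)}
robust = run(x0, complete, 1, {0, 1, 2, 3}, Fraction(100), 10)
spread = max(robust[i] for i in range(4)) - min(robust[i] for i in range(4))

# Run 2: two triangles joined by a matching, all nodes normal, F = 1.
two_clusters = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 5],
                3: [4, 5, 0], 4: [3, 5, 1], 5: [3, 4, 2]}
y0 = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0),
      3: Fraction(1), 4: Fraction(1), 5: Fraction(1)}
stalled = run(y0, two_clusters, 1, set(range(6)), None, 25)
```

In the first run the normal values stay inside the initial interval [0, 3] and their spread shrinks by a factor of four per round despite the malicious node. In the second run each node discards the single value coming from the other cluster, so the two clusters remain at 0 and 1 forever and no consensus is reached.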