Automated Verification of Parallel Nested DFS

Model checking algorithms are typically complex graph algorithms, whose correctness is crucial for the usability of a model checker. However, establishing the correctness of such algorithms can be challenging and is often done manually. Mechanising the verification process is crucially important, because model checking algorithms are often parallelised for efficiency reasons, which makes them even more error-prone. This paper shows how the VerCors concurrency verifier is used to mechanically verify the parallel nested depth-first search (NDFS) graph algorithm of Laarman et al. [25]. We also demonstrate how having a mechanised proof supports the easy verification of various optimisations of parallel NDFS. As far as we are aware, this is the first automated deductive verification of a multi-core model checking algorithm.

and parallel programs. The presented verification is inspired by a previous mechanical verification of sequential NDFS [37] that was carried out in Dafny [30].
This paper demonstrates the feasibility of mechanical program verification of parallel graph algorithms, like multi-core NDFS. To the best of our knowledge we present the first mechanical verification of a parallel graph algorithm. Our formalisation provides reusable components that can be used to verify variations of parallel NDFS, as well as other algorithms for parallel model checking.
Before listing our contributions ( §1.3) we first provide more background on model checking algorithms ( §1.1) and related work on their verification ( §1.2).

Background on Model Checking
Pnueli introduced the Linear-time Temporal Logic (LTL) [36] to specify properties of reactive systems. The model checking problem [12] decides whether a transition system satisfies a given LTL property. The automata-based approach [45] reduces the model checking problem to the graph-theoretic problem of checking the reachability of accepting cycles. Reachability of accepting cycles in directed graphs can be checked in linear time, with the nested depth-first search (NDFS) algorithm [13,19,41], which forms the basis of the Spin model checker [17].
Several distributed and parallel model checking algorithms have been proposed, to allocate more memory and processors to the problem [2]. NDFS is based on depth-first search, which is considered hard (impossible) to parallelise efficiently [39]. For distributed approaches, the best strategy is to turn to BFS algorithms [3], which are straightforward to parallelise but at the cost of increasing the amount of work beyond linear time. For the shared-memory setting, swarm verification was proposed [18], where each worker runs its own instance of NDFS. Various DFS-based multi-core algorithms for full LTL model checking have been devised for this strategy [14,15,25]. This paper considers the version by Laarman et al. [25], which is a parallel version of improved sequential NDFS [41].
The correctness of parallel NDFS is quite subtle. In particular, parallel DFS does not fully respect a global depth-first ordering, since each worker maintains its own search stack, yet the correctness of NDFS depends on the search order. Also, to realise speedups, the implementation avoids locking shared data structures by using atomics. This raises the question whether the implementation of a parallel model checker, meant to verify the correctness of safety-critical systems, is itself correct. For this reason the original paper [25] contains a detailed pen-and-paper correctness proof, which is based on a number of invariants.

Related Work
To raise the level of confidence in model checkers, one approach is to certify each of their individual runs. Obviously, the counterexample returned by a model checker is itself a certificate that can easily be verified independently. However, double-checking the absence of errors is harder. Namjoshi [33] proposed to instrument a µ-calculus model checker, to generate a deductive proof that can be checked independently, also in case the property holds. Recently, an IC3style symbolic LTL model checker has been extended with deductive proofs as well [16]. However, these approaches do not prove correctness of the model checking algorithm, but only validate its outcome for each specific use.
Alternatively, one can formalise the model checking algorithm and its correctness proof in an interactive theorem prover. An early example of this approach was the verification of a model checker for the modal µ-calculus in Coq [43]. A framework for verifying sequential depth-first search algorithms was developed in Isabelle [27,28], and applied to the verification of NDFS with partial order reduction [9] as well as a model checker for timed automata [47]. The recent formalisations of Tarjan's SCC algorithm [10] fit in the same line of research. These approaches require to model and verify the algorithm in an interactive theorem prover, allowing one to use the full power of the theorem prover.
If one wishes to verify the code of the algorithm directly, yet another approach is to model the algorithm and its specification in a (semi-)automated program verifier, where the code is enriched with sufficient annotations to prove its correctness. This approach was followed for several standard sequential graph algorithms in Why3 [46] and for sequential NDFS in Dafny [37]. However, there is hardly any work on automated verification of parallel graph algorithms. Raad et al. [38] verified four concurrent graph algorithms in the context of CoLoSL, but the proofs have not been automated. Sergey et al. [42] verified a concurrent spanning tree algorithm, but interactively, through an embedding in Coq.
To support the verification of shared-memory parallel software, program verifiers typically use concurrent separation logic. VeriFast [20] aims at sequential and multi-threaded C and Java programs. VerCors [6] verifies concurrent programs in Java and OpenCL, by applying a correctness-preserving translation into a sequential imperative language, delegating the generation of the verification conditions to Viper [32] and their verification ultimately to Z3 [31].

Contributions and Outline
This paper discusses the mechanical verification of the parallel NDFS algorithm of Laarman et al. [25] using VerCors. To the best of our knowledge, this is the first mechanical verification of a parallel graph (and model checking) algorithm. Section 2 recalls both sequential and parallel NDFS ( §2.1-2.2), and gives preliminaries on concurrency verification with VerCors ( §2.3). It also explains that parallel NDFS uses various colour markings on the input graph to administer the status of the nested searches of workers. Some of these colours are local to a single worker, while other colours are globally shared among all workers. Section 3.1 presents our new (informal) correctness proof of parallel NDFS, that is based on a number of global invariants on the possible colour configurations. The main challenge lies in proving completeness, which is particularly difficult since workers can delegate the detection of accepting cycles to other workers. To be able to mechanise our completeness proof, we contribute a new invariant (Lemma 4) that guarantees the preservation of so-called special paths. This allows to circumvent using the complicated inductive argument used by [25]. Section 3.2 discusses how parallel NDFS is specified in VerCors. In particular, this requires the specification of permissions, to verify data race-free access to shared data structures. Moreover, we encode the colour maps and the transition relation of the input automaton as matrices, which greatly contribute to the feasibility of proof checking. We also explain how atomic updates are specified, which was left implicit in the high-level pseudo code. Similarly, we implement asymmetric termination detection: if one worker finds a counterexample, all workers can terminate immediately; if, on the other hand, all workers have completely finished their exploration, only then may one conclude that the model is correct. Section 3.3 explains the techniques to formalise the full functional correctness proof in VerCors. In particular, this requires the distribution of permissions and invariants over threads and locks, and the introduction of auxiliary ghost state to track the precise progress of the various nested search phases of all workers. Section 4 demonstrates how our verification is reused to verify optimisations to the algorithm. In particular, we check the optimisation "early cycle detection" that, for weak LTL properties, detects all cycles in the outer search instead of the nested inner search. We also propose and verify a repair to the "all-red" extension, by inserting an extra check that was missing in [25]. This extension improves the speedup of parallel NDFS by sharing more global information.
Finally, Section 5 concludes with a perspective on reusing our techniques for verifying other parallel graph algorithms.

Preliminaries
Section 2.1 recalls the standard sequential NDFS algorithm for finding reachable accepting cycles in automata. We verified a parallel version of NDFS, which is introduced in Section 2.2. The verification has been performed with VerCors; Section 2.3 gives prerequisites on concurrency verification and separation logic.
Before discussing the NDFS algorithms, let us first recall the basic definitions of automata and accepting cycles. An automaton G is a quadruple (S, s I , succ, A) consisting of a finite set S of states, an initial state s I ∈ S, a next-state relation succ : S → 2 S and a set A ⊆ S of accepting states. A path in G is a sequence P = s 0 , . . . , s n+1 of S-states so that s i+1 ∈ succ(s i ) for every 0 ≤ i ≤ n. The notation |P | n + 2 denotes the length of P , P [i] s i the ith state on P , and P [i..] the subpath s i , . . . , s n+1 . Any state s is defined to be reachable (in G) if there exists an (s I , s)-path. Any path P is a cycle whenever P [0] = P [|P | − 1] and 1 < |P |. Finally, any cycle P is accepting if P [i] ∈ A for some 0 ≤ i < |P |. state, dfsblue calls the red search on line 7, to report any accepting cycle. This colours a state red after processing its successors recursively on line 16. The pink colour denotes states that are only partially explored by dfsred 5 .

Nested Depth-First Search
It is straightforward to see that NDFS is sound, meaning that it only reports true accepting cycles. To see that NDFS is also complete, i.e., finds an accepting cycle if one exists, observe that dfsred will indeed be started from every accepting state. This in itself is not enough: the red search ignores states marked red in a previous call. It is essential that dfsred explores accepting states in the right order. The crucial insight is that dfsred only visits cyan and blue states and that accepting states coloured blue cannot be part of any accepting cycle.
The correctness of NDFS has been verified with Dafny [37]. We ported this correctness proof to VerCors as the basis for the verification of parallel NDFS.

Parallel Nested Depth-First Search
A naive strategy for parallelising NDFS is swarming [18]: running several instances of NDFS in parallel, each working on a private set of colours. Swarmed NDFS tends to find accepting cycles faster, since its workers are expected to explore different parts of the input graph. The correctness of swarmed NDFS with respect to sequential NDFS is almost immediate, except for termination handling: workers only share information about the exit condition. We also verified swarmed NDFS in VerCors, as a stepping stone for verifying parallel NDFS.
Laarman et al. improve on the swarming algorithm by sharing information of the red search in the backtrack phase. Figure 2 presents the improved algorithm. Here every line of code is supposed to be executed atomically. The entry point is pndfs(s I , n), which spawns n parallel instances of dfsblue(s I , tid ) in the fashion of swarming. However, the red colourings are shared now, by which workers can guarantee that certain states are, or will be, sufficiently explored. So the red states can now be skipped in both the red search (line 19) and the blue search (line 4). PNDFS thus improves performance, since workers prune each other's search space. At the same time this significantly complicates the correctness argument, since workers may now prevent each other from finding accepting 1 void dfsblue(s, tid ) cycles. Moreover, if multiple workers initiated dfsred from the same accepting state s, they must now finish their red search simultaneously for the algorithm to be correct. The await synchroniser on line 23 ensures this, by blocking thread execution until s.count-the number of workers in dfsred(s, ·)-reaches 0.
The original correctness argument of Laarman et al. relies on a complicated inductive invariant stating that not all accepting cycles can be missed due to pruning. However, this invariant is unsuitable for use in a (semi-)automated verifier. Section 3 discusses the verification of pndfs and provides a new invariant on the red colours that allows its correctness to be proven mechanically. It also discusses how our verification handles concurrency and thread synchronisation.

Concurrency Verification with VerCors
Before discussing the actual verification, let us first briefly introduce VerCors, an automated program verifier for parallel programs. VerCors uses concurrent separation logic with permissions as its logical foundation. Its annotation language contains fractional permission predicates of the form Perm(s, π), in the style of Boyland [7], that capture the notion of ownership enforced by separation logic, where s is a shared memory location (e.g., a class field) and π ∈ (0, 1] Q a fractional value. The fractional permissions denote access rights: if π = 1 it denotes write access to s, whereas π < 1 denotes a read access to s. Sometimes Perm(s) is written as shorthand for ∃π : Perm(s, π), to indicate some ownership of s. Soundness of the underlying logic ensures that the total sum of permissions for any shared memory location does not exceed 1, which implies data race freedom.
In addition to ownership predicates, the annotation language supports the * * connective, which is the separating conjunction of separation logic. The assertion P * * Q expresses that the ownerships captured by P and Q are disjoint, e.g., it is disallowed that both express write access to the same shared location. Ownership predicates can be split into disjoint parts and be combined as follows: Perm(s, π 1 + π 2 ) ⇐⇒ Perm(s, π 1 ) * * Perm(s, π 2 ) A standard pattern in concurrency verification is to split and distribute the ownership of all shared memory over threads and locks. Clarifying the latter; in case multiple threads need to write to a common footprint of shared memory, the ownerships to this footprint are typically protected by a resource invariant. Threads can then only use the resources protected by this invariant when they execute atomic instructions (i.e., when no other threads can interfere). For more details we refer to the standard papers on concurrent separation logic [34,8,44].

Automated Verification of Parallel NDFS
This section elaborates on the verification of pndfs with VerCors [35]. Section 3.1 presents and discusses our new correctness argument for pndfs, which includes the new invariant on the red colours and a proof of its correctness. Sections 3.2 and 3.3 discuss the mechanisation of this proof in VerCors.

Correctness of pndfs
The soundness proof of pndfs is not very different from the soundness argument of sequential NDFS: every time report cycle is executed, a witness cycle can be found. The main challenge lies in proving completeness, i.e., proving that if there exists any accepting cycle, pndfs will report it. This is difficult since workers can obstruct each other's red searches and thereby prevent the detection of accepting cycles. This section proposes a new key invariant and completeness proof that is suitable for deductive verification.
We start by introducing a number of low-level invariants on the local configurations of colours that can arise during a run of pndfs. Let Cyan tid be the set of cyan-coloured states {s ∈ S | s.color [tid ] = cyan} private to worker tid , and likewise for White tid , Blue tid and Pink tid . Moreover, let Red be the set of globally red states, and succ(X) ∪ s∈X succ(s) the successor set of a given set X ⊆ S. Lemma 1. pndfs maintains the following global invariants during execution: Proof. The proof basically checks their preservation by each line of the program. Invariants 1.1 -1.5 are reused from [25], whereas 1.6 is new and needed for the new completeness proof. Proving completeness amounts to proving that not all reachable accepting cycles can be missed due to search space pruning. To help proving this, we identify a new class of paths, which we call tid-special paths.
Definition 1 (Special path). Any path P = s 0 , . . . , s n+1 is defined to be tidspecial if s 0 ∈ Pink tid , s n+1 ∈ Cyan tid , and none of the states on P are red, i.e., s k ∈ Red for every k such that 0 ≤ k ≤ n + 1.
Any path P is special if P is tid -special for some worker tid . Intuitively, the existence of a tid -special path during execution of pndfs means that (i ) worker tid is doing a red search, since it has pink states, and (ii ) this worker will eventually find an accepting cycle, unless other workers obstruct this path. Thus the above definition allows to formally define obstruction: a worker tid is obstructed (will miss an accepting cycle) if any state on a tid -special path is coloured red.
Our main strategy for proving completeness involves showing that every time a worker gets obstructed, a new special path can be found. A direct consequence of this is that not all accepting cycles can be missed: upon termination of pndfs, there are no more cyan or pink states. To help prove this, we use the following property (taken from [25], but rephrased to handle our special paths), that allows to find special paths by using the colouring invariants.
Proof. The original handwritten proof from [25] shows that this lemma follows from invariants 1.1 -1.6 , by induction on P .
The original completeness proof of [25] performs induction on the number of obstructed accepting cycles, to show the absence of such cycles upon termination as a result of Lemma 2. However, such an argument is out of reach for Hoare-style reasoning, since it is not an inductive invariant. We propose a new invariant that is inductive, which builds on the insight that, under certain colouring conditions, new special paths can always be found when workers get obstructed, as is shown by Lemma 3. In particular, pndfs guarantees that if there exists a special path before executing line 24, then there also exists a special path after its execution.
Lemma 3. For any non-red state r ∈ S \ Red that is on a tid -special path, if: i. r ∈ A =⇒ succ(r) ⊆ Red , and ii. r ∈ A ∩ Pink tid =⇒ Pink tid = {r}, then there still exists a special path after adding r to Red .
Proof. Let P = s 0 . . . s n+1 be a tid-special path and assume that r is on P , so that r = s for some such that 0 ≤ ≤ n + 1. Since Pink tid = ∅, worker tid is performing dfsred that was started from some accepting state a ∈ A ∩ Pink tid . Then a = r, as otherwise s 0 = a due to ii., which by i. would contradict that P is special. Moreover, since s n+1 ∈ Cyan tid there exists a (s n+1 , a)-path Q (this is a standard property of dfsblue; the path Q must be on the recursive call stack). Then Lemma 2 applies on the path s , . . . , s n+1 , Q[1..] and gives a new special path when considering Red ∪ {r} as the new set of red states.
Lemma 3 implies that every time an accepting cycle is missed due to pruning, there is always another accepting cycle that will eventually be reported. This is enough to establish completeness of pndfs, via the following key invariant. Proof. The interesting case is showing that this invariant remains preserved after making a non-red state s ∈ Pink tid \ Red red (on line 24 of Fig. 2), by some worker tid that is doing a red search from some accepting state a ∈ A ∩ Pink tid .
-Suppose s ∈ A. If s is on a special path, then Invariant 4.2 is reestablished due to Lemma  The next theorem shows how Lemma 4 allows deriving completeness of parallel NDFS. In particular, it shows that no accepting cycles can exist when all threads have terminated, in which case all the theorem's premises are fulfilled. Theorem 1. If for every worker tid it holds that Pink tid = ∅, Cyan tid = ∅ and s I ∈ Blue tid , then there does not exist a reachable accepting cycle.
Proof. Towards a contradiction, suppose that there exists an accepting cycle P that is reachable via an (s I , P [0])-path Q. Due to the theorem's premises no special paths can exist, and therefore by Lemma 4 there is an accepting state on P that is not red. Without loss of generality, assume that ( †) P [0] ∈ A \ Red . Since Q[0] ∈ Blue 0 (since there is at least one worker), by induction on Q together with Lemma 1 we have that P [0] ∈ Red , which contradicts ( †).
All the above invariants and proof steps have been encoded in VerCors, which was highly non-trivial. While mechanising the proofs, many implicit proof steps had to be made explicit. Section 3.3 further details the proof mechanisation.

Encoding of pndfs in VerCors
Graph structures are notoriously difficult to handle in separation logics, as they usually rely on pointer aliasing, which complicates ownership handling and prevents easy use of the frame rule [38]. However, since automata have a fixed and finite set of states, we can overcome this limitation by representing the input automata as an |S| × |S| adjacency matrix. This does not impose serious restrictions: other automata encodings can be transformed at the specification level to an adjacency matrix, e.g., via model fields in the style of JML [11,29]. The suitability of adjacency matrices for deductive verification is confirmed by [24].    Atomic operations. The handwritten correctness argument of [25] for Figure 2 assumes that all program lines are executed atomically. This is reflected in the VerCors encoding: all updates to shared memory are made within atomic operations, which specification-wise all give access to the same shared resources. For example, the assignment s.pink [tid ] := true on line 15 (Fig. 2) is implemented as the atomic operation "atomic { pink [tid ][s] := true }". On the specification level, the atomic sub-program receives all the missing access rights required for the assignment, which are otherwise protected by the resource invariant declared on line 11 (Fig. 3). The exact definition of resource invariant is deferred to §3.3, and the type resource is the type of separation logic assertions. Moreover, the await instruction on line 23 (Fig. 2) is implemented as a busy while-loop that only stops when s.count = 0, which is checked atomically in every iteration.
Termination handling. The pseudocode in Figure 2 uses an "exit all" command to terminate all threads when an accepting cycle has been found. However, this mechanism was left implicit. Our formalisation in VerCors makes the termination system explicit: it consists primarily of a global abort flag that is declared on line 9 in Figure 3. All workers regularly poll this flag to determine whether they continue or not. The abort flag is set to true by the main thread-the thread that started pndfs and spawned all worker threads on line 11 of Fig. 2-as soon as one of the workers returns with an accepting cycle.

Verification of pndfs in VerCors
One major challenge of concurrency verification is finding a proper distribution of shared-memory ownership, that allows proving memory safety as well as any functional properties of interest. This section starts by discussing how we distribute the ownership of the input automaton over threads and the resource invariant, in such a way that Invariants 1.1 -1.6 and 4.1 -4.2 can be encoded.
To prove the preservation of these invariants after every computation step, auxiliary bookkeeping is needed on the specification level. For example, to mechanise the proof of Lemmas 3 and 4 we need to make explicit that all workers tid with Pink tid = ∅ are doing a red search that was started from some root state a ∈ A ∩ Pink tid . This auxiliary bookkeeping is maintained in the resource invariant, via auxiliary ghost state, which is explained later. Finally, we give the fully annotated version of pndfs and explain how completeness is proven from Lemma 4, by applying the VerCors encoding of Theorem 1.
Ownership distribution. We start by explaining how the ownership of the automaton encoding (lines 2-8 in Fig. 3) is distributed among workers and the resource invariant. First observe that all colouring invariants express global properties that span over (i ) the shared red colourings, as well as (ii ) the local configurations color [tid ] and pink [tid ] of every worker tid . To define the ownership distribution for (i ), observe that the only way to distribute the access rights to red to enable all threads to regain write access, is to let the resource invariant protect full ownership of red . The resource invariant therefore fully captures the properties about red states expressed in Lemmas 1 and 4. However, to be able to specify that, it also requires partial ownership of all thread-local colourings. Figure 4 presents the full resource invariant, that includes: access rights to both global and thread-local colourings on lines 2-4; the encoding of Lemma 1 on lines 10-17 and 22; and the encoding of Lemma 4 on lines 30-32. In addition, the resource invariant holds partial ownership of the abort flag on line 8, to ensure that global termination is only announced when an accepting cycle is found.    Observe that the resource invariant holds a lot of quantified information. As a result, we experienced that proving the reestablishment of resource invariant after finishing atomics is expensive performance-wise. To make verification more efficient, we extracted all atomic operations (e.g., colour updates) into separate methods and prove their contracts in a function-modular way. This improves performance, as it cuts the problem of verifying dfsred and dfsblue into smaller sub-problems that are individually more manageable for the SMT solver.
Finally, Figure 5 presents an excerpt of the contract of dfsblue, which shows the ownership pattern of all threads. Notably, every thread tid receives the remaining ownership of color [tid ] and pink [tid ] on line 4. Thus threads can always read from their thread-local colour fields, and may write to them while doing so 1 context Perm(N ) * * Perm(nthreads) * * Perm(G) * * Perm(acc); 2 context 0 ≤ s < N ; 3 context 0 ≤ tid < nthreads; 4   Auxiliary ghost state. As mentioned earlier, to prove that pndfs also preserves the (encodings of) Invariants 1.1 -1.6 and 4.1 -4.2 after every computation step, additional ghost state needs to be maintained. In particular, we need to make explicit that every worker tid with Pink tid = ∅ is doing a dfsred search that was started from some root state a ∈ A ∩ Pink tid . In addition, the proof of Lemma 3 needs that there exists an (s, a)-path for every s ∈ Cyan tid . To prove the preservation of Lemma 4 we also need that, if worker tid is not yet executing the await instruction, we have that a ∈ Red , and otherwise that Pink tid = {a}.
This extra information is encoded in the loop invariant on lines 20-27 (Figure 4), via three ghost arrays, named exploringred , redroot and waiting. Firstly, exploringred administrates which workers are doing a red search. For verification purposes we added ghost code to the program, to set exploringred [tid ] to true whenever dfsred(a, tid ) is invoked by worker tid from a blue search, and back to false whenever dfsred(a, tid ) returns. Secondly, redroot stores the root state on which dfsred was invoked. Finally, waiting administrates which workers are executing an await instruction. These three ghost arrays together are closely related to the s.count fields in the program of Figure 2, via the following invariant: Establishing that pndfs adheres to the invariants in Lemmas 1 and 4 was highly non-trivial and required various complex auxiliary lemmas to be encoded and proven. These are all encoded in VerCors as ghost methods: side-effect-free helper methods on which the lemma is encoded in the method's contract [21,22]. Induction proofs, for example, are encoded using either loop invariants or recursion. Application of a lemma then translates to a function call on the specification level. The proofs in Section 3.1 are all encoded and applied in this way. 1 context Perm(N ) * * Perm(nthreads) * * Perm(G) * * Perm(acc) * * Perm(abort, 1 2 ExPath(sI , a, 1) ∧ ExPath(a, a, 2)); 8 ensures (∃a : acc[a] ∧ ExPath(sI , a, 1) ∧ ExPath(a, a, 2)) ⇒ \result; 9 bool pndfs(sI ) 10 par tid = 0 to nthreads 11 context Perm(N ) * * Perm(nthreads) * * Perm(G) * * Perm ( Fig. 6: The annotated version of pndfs, extending the excerpt given in Figure 3. Correctness of pndfs. Figure 6 gives the annotated version of pndfs 6 that extends the excerpt given earlier, in lines 23-25 of Figure 3. The main thread requires partial ownership of all thread-local colour fields on line 2 and distributes these over the appropriate threads on line 12. The contract associated to the parallel block (lines 11-17) is called an iteration contract and assigns pre-and postconditions to every parallel instance. For more details on iteration contracts we refer to [5]. Most importantly, the iteration contract of each thread holds enough resources to satisfy all the preconditions of dfsblue, on line 19. Soundness of pndfs (line 7) is proven as follows. Suppose that all threads have terminated and abort has been set to true. In that case, the resource invariant states that an accepting cycle has been found. This information can be retrieved by briefly obtaining the resource invariant in ghost code on line 22, which directly allows to deduce soundness. Note that this information is not lost upon releasing the resource invariant, as it is a Boolean property and thus duplicable.
To prove completeness, suppose that abort is still false when all workers have terminated. This implies that Pink tid = ∅ and Cyan tid = ∅ for every worker tid search from s I . Combining this information with the information in the resource invariant allows one to prove all the premises of Theorem 1. Therefore its ghost method encoding can be applied on line 22, from which completeness is derived. The encoding of parallel NDFS in VerCors [35] comprises roughly 2500 lines of code (of which ∼85% is proof overhead), which includes the mechanisation of all proof steps described in §3.1. The verification time is about 140s, measured on a Macbook with an Intel Core i5 CPU with 2,9 GHz, and 8Gb memory.

Optimisations
One major benefit of mechanically verified code is that optimisations can be applied with full confidence. Without verification, changes to critical code are often avoided, to ensure that no errors are introduced. A verified algorithm allows to apply optimisations easily, as these often do not change the outer contract, at most requiring only minor adaptions to the invariants. We illustrate this with two optimisations, for which [25] experimentally demonstrated improved speedup.
"Early cycle detection" checks already in the blue search if an accepting cycle is closed, cf. lines 4-6 in Figure 7a. It is known that for weak LTL properties, all accepting cycles will be found in the blue search when applying early cycle detection. To show that this optimisation indeed preserves all invariants, we simply inserted these 3 lines in the VerCors specification. The proof introduces a case distinction on whether s or t is accepting and constructs a witness path. This adds another 10 lines: two for the case distinction and four in each branch to show that a witness accepting cycle exists. Collectively, these extra 13 lines constitute indeed very little effort to prove this particular optimisation correct.
The second optimisation, called "all-red", checks if all successors of s became red during the blue search (lines 2 and 7 in Figure 7b). If so, we can mark s.red early (lines [8][9][10]. This optimisation is important, since it allows the global red colour to spread even in portions of the graph that are not under an accepting state, thereby allowing more pruning. However, this optimisation only preserves the invariants if we wait until s.count = 0 (on line 9). This test was erroneously omitted in [25] 7 . Fortunately, the version in Figure 7b is correct, which has now been checked in VerCors in a straightforward manner.

Conclusion
This paper presents the first automated deductive verification of a parallel graph algorithm: we verified soundness and completeness of parallel nested depth-first search using VerCors. We also show that this mechanisation is helpful in quickly discovering whether optimisations of the algorithm preserve its correctness.
Many of the presented verification techniques, e.g., the use of separate contracts for single statements, the way we handle termination, and the construction of explicit witnesses through ghost variables, will be useful for the verification of other similar algorithms. Moreover, our encoding of parallel nested DFS closely resembles the implementation of such an algorithm in mainstream programming languages like C++ and Java. It would be interesting to investigate how our VerCors encoding can be automatically deployed on multi-core architectures, for example to enable comparing its performance and scalability with LTSmin.
There are many possibilities to extend the line of research on the verification of parallel model checking algorithms initiated in this paper. First, one may consider to extend the scope of this verification closer towards the actual efficient C-implementation in LTSmin. This would involve verifying the underlying concurrent hash table to store visited states (a simplified version of which has been verified before with VerCors [1]), the encoding of the colours as "bits" in the hash table buckets, and the use of CAS to manipulate these bits.
One might consider alternative parallel NDFS versions, notably [15], which shares the blue colour, invoking a repair procedure when the depth-first order is violated. Both algorithms have been reconciled in [14], sharing both blue and red. This work could be extended to a wealth of other optimisations like partial-order reduction, or other parallel model checking algorithms, for example [26,4,40].
Our work can be considered as a first step towards a library for the verification of graph-based (multi-core) model checking algorithms. It will be an interesting line of future work to continue this: developing a full-fledged verification library for common subtasks, like graph manipulations and termination detection.