Improving parity games in practice

Parity games are infinite-round two-player games played on directed graphs whose nodes are labeled with priorities. The winner of a play is determined by the highest priority (even or odd) that is encountered infinitely often along the play. In the last two decades, several algorithms for solving parity games have been proposed and implemented in PGSolver, a platform written in OCaml. PGSolver includes Zielonka's recursive algorithm (RE, for short), which is known to be the best performing one over random games. Several attempts have been made to improve the performance of RE in PGSolver, but with small advances in practice. In this work, we thoroughly revisit the implementation of RE, focusing on the choice of data structures and of programming languages such as Scala, Java, C++, and Go. Our empirical evaluation shows that these choices are successful, gaining up to three orders of magnitude in running time over the classic version of the algorithm implemented in PGSolver.


Introduction
Parity games [22,52] are abstract infinite-round games that represent a powerful mathematical framework to address fundamental questions in computer science. They are intimately related to other infinite-round games, such as mean and discounted payoff, stochastic, and multi-agent games [2,9,12,13,14,15,38].
In the basic setting, parity games are two-player, turn-based games played on directed graphs whose nodes are labeled with priorities (also called colors), and the players have perfect information about the adversary's moves. The two players, Player 0 and Player 1, take turns moving a token along the edges of the graph starting from a designated initial node. Thus, a play induces an infinite path, and Player 0 wins the play if the highest priority visited infinitely often is even; otherwise, Player 1 wins the play. The problem of deciding whether Player 1 has a winning strategy (i.e., can induce a winning play) in a given parity game is known to be in UPTime ∩ CoUPTime [29]; whether a polynomial time solution exists is a long-standing open question [51].
Several algorithms for solving parity games have been proposed in the last two decades, aiming to tighten the known complexity bounds for the problem, as well as to come up with solutions that work well in practice. Among the latter, we recall the recursive algorithm (RE) proposed by Zielonka [52], Jurdziński's small-progress measures algorithm [30] (SPM), the strategy-improvement algorithm by Jurdziński et al. [50], the (sub-exponential) algorithm by Jurdziński et al. [32], the big-step algorithm by Schewe [44], and the APT algorithm by Di Stasio et al. [20,21]. Table 1 summarizes the classic algorithms along with their complexity, where n, e, and c denote the number of nodes, edges, and priorities, respectively.
Recently, Calude et al. [11] have achieved a major breakthrough by providing a quasi-polynomial time algorithm for solving parity games that runs in time $O(n^{\log(c)+6})$. Previously, the best known algorithm for parity games was DD, which could solve parity games in $O(n^{\sqrt{n}})$, so this new result represents a significant advance in the understanding of parity games. The new approach is based on providing a compact witness that can be used to decide whether Player 0 wins a play. Traditionally, one must store the entire history of a play, so that when the players construct a cycle, one can easily find the largest priority on that cycle. The key observation in [11] is that a witness of poly-logarithmic size can be used instead. This makes it possible to simulate a parity game on an alternating Turing machine that uses poly-logarithmic space, which leads to a deterministic algorithm that uses quasi-polynomial time and space. This new result has already inspired follow-up works [10,18,23,31,37]. However, benchmarks in the literature have demonstrated that, both on random games and on real examples, the quasi-polynomial algorithm is not the best performing one.
In formal system design [16,17,35,43], parity games arise as a natural evaluation machinery for the automatic synthesis and verification of distributed and reactive systems [1,36,46], as they allow one to express liveness and safety properties in a very elegant and powerful way [40]. Specifically, the model-checking question, in case the specification is given as a μ-calculus formula [34], can be rephrased, in linear time, as a parity game [22]. So, a parity game solver can be used as a model checker for a μ-calculus specification (and vice versa), as well as for fragments such as CTL, CTL*, and the like.

Table 1 Parity algorithms along with their computational complexities
Algorithm | Computational complexity
Recursive (RE) [52] | $O(e \cdot n^c)$
Small Progress Measures (SPM) [30] | $O(c \cdot e \cdot (n/c)^{c/2})$
Strategy Improvement (SI) [50] | $O(2^e \cdot n \cdot e)$

All algorithms mentioned in Table 1 (and several others) have been implemented in PGSolver, a collection of tools to solve, benchmark and generate parity games, written in OCaml by Oliver Friedmann and Martin Lange [24,25], and have been extensively investigated experimentally. Notably, experiments with PGSolver have shown RE to be the best performing algorithm for solving parity games in practice, and have made it possible to explore optimizations such as decomposition into strongly connected components, removal of self-cycles on nodes, and priority compression [3,30].
Despite the enormous interest in finding efficient algorithms for solving parity games, less emphasis has been put on how to improve their running time by choosing efficient data structures or different programming languages for their implementation. Mainly, the scientific community has relied on OCaml as the best performing programming language to be used in this setting and on PGSolver as an optimal platform to solve parity games. However, starting from graphs with a few thousand nodes, even using RE, PGSolver requires minutes to solve the given game, especially on dense graphs. Therefore, a natural question is whether there exists a way to improve the running time of PGSolver. Focusing on RE, we identify two research directions to work on: the way the algorithm is implemented, and the chosen programming language. As a result, we introduce a slightly improved version of RE along with an optimized implementation in OCaml (and thus in PGSolver) and in Scala. Scala [41,42] is a high-level language, proven to perform well [28], with object-oriented and functional features, that has recently come to the fore with useful applications in several fields of computer science, including formal verification [5]. We chose to investigate Scala first because it combines several aspects of modern programming languages: it compiles to the JVM, is interoperable with Java, and has a concise and clear syntax. The results we obtain strongly support this choice and show Scala to be a first winner over OCaml in terms of performance.
In more detail, our investigation starts by looking at the main steps of RE and how they are implemented in PGSolver. We observe that RE decomposes the game into multiple smaller games, which is done by computing, in every recursive call, the difference between the game and a given set of nodes. This operation turns out to be quite expensive, as it requires generating a new game at each recursive call as well as building the transpose of the game graph. To reduce the running time caused by these graph operations, we introduce an immutability requirement on the game, ensuring that every recursive call uses the game without applying any modification to its nodes. To construct the subgames in the recursive calls, we efficiently keep track of each node that is going to be removed from the graph. The improved version thus guarantees that the original game remains unchanged, tracks the removed nodes in every subsequent call, and checks in constant time whether a node needs to be excluded. Finally, we also improve other parts of the algorithm, such as finding the maximal priority and obtaining all nodes with a given priority.
In our analysis, we first consider and compare four implementations of RE: the Classic (C) and Improved (I) Recursive (R) algorithms implemented in Scala (S) and OCaml (O). By means of benchmarks, we show that IRO gains an order of magnitude over CRO, as does CRS over CRO. Remarkably, these improvements are cumulative: IRS gains two orders of magnitude over CRO.
The proposed improvements turn out to be very successful in practice, as the benchmarks we have run show. To obtain a complete overview, we continue our investigation by also exploring these improvements in other programming languages, namely Java, C++, and Go, and by comparing their performance with each other and with Scala. The tool and more details can be found at https://github.com/vincepri/SPGSolver.
The combination of the improvements described above with the use of these modern programming languages allows us to gain considerably in running time. We evaluated our implementations over both randomly generated games and real-world benchmarks such as model checking problems. Our main finding is that C++ is the best performing one in both cases. Precisely, the experiments run over random games show that the C++ implementation is three orders of magnitude faster than the classic version of Zielonka's algorithm implemented in PGSolver and, with respect to the improved version of RE, C++ is faster than Scala and Go by a factor of 4 and 1.5, respectively. They also show that Java and Scala behave very similarly. This trend is followed in the real-world benchmarks, though they indicate a more nuanced relationship between the programming languages. Overall, the Go running time, besides being of the same order, tends to be close to the C++ one. We take this as an interesting aspect. Indeed, although Go is a young programming language, it offers high performance, efficient concurrency handling, and a lightweight syntax reminiscent of scripting languages such as Python or Perl. Therefore, our results suggest the need to continue evaluating implementations of algorithms for solving parity games in modern programming languages such as Go.
Related works Several efforts to speed up the solving process have previously been made.
In [27], a GPU (Graphics Processing Unit) is used to solve parity games. A GPU can excel at problems that can be easily split into a large number of parallel tasks. A modern GPU consists of several multi-processors which act independently of each other. Hoffmann and Luttenberger [27] propose GPU-enabled implementations of algorithms for solving parity games such as SPM, RE and a variant of SI. In particular, multiple arrays are used to store the graph and the node information, and the implementation performs multiple preprocessing steps before starting the actual solving process. Another approach was investigated in [4], which implements, on a multi-core architecture, a parallel version of the attractor algorithm, the main kernel of RE. The parallel algorithm is implemented in Java and compared with its sequential implementation in the same language and with the widely used one in PGSolver. Verver [49] provides an improved version of RE by recording the number of remaining "escaping" edges for each vertex during the attractor computation. This is accomplished by simply using an extra integer per node. The present work, based on the preliminary results in [19], has been the first to exploit efficient coding and new programming languages to improve the use of RE in practice. Notably, it has inspired the development of new and efficient tools such as Oink by Tom van Dijk [48]. Oink contains improved versions of RE, SPM, and SI, the quasi-polynomial algorithms, APT, priority promotion [7], and the tangle algorithm [47]. It also provides multi-core implementations of SPM and RE, inspired by the results in [4].
Outline The rest of the paper is structured as follows. In Section 2, we give some preliminary concepts about parity games. In Section 3, we describe RE and how it is implemented in PGSolver. In Section 4, we introduce our improved version of RE and provide its implementation in OCaml. In Section 5, we describe the Scala implementation of RE. In Section 6, we analyze the benchmarks of RE and the improved version of RE in OCaml and Scala. In Section 7, we explore the improved version of RE in different programming languages and analyze their performance by means of benchmarks.

Preliminaries
In this section, we briefly recall some basic concepts regarding parity games. As notation, the positive integers are denoted by $\mathbb{N}$, and $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. A Parity Game (PG, for short) is a tuple $G \triangleq \langle P_0, P_1, Mv, p \rangle$, where $P_0$ and $P_1$ are two finite disjoint sets of nodes for Player 0 and Player 1, respectively, with $P = P_0 \cup P_1$, $Mv \subseteq P \times P$ is the left-total binary relation of moves, and $p : P \to \mathbb{N}_0$ is the priority function. Each player moves a token along nodes by means of the relation $Mv$. By $Mv(q) \triangleq \{q' \in P : (q, q') \in Mv\}$ we denote the set of nodes to which the token can be moved, starting from node $q$.
Given a node $q$ and strategies $str_0$ and $str_1$ for Player 0 and Player 1, respectively, the play of these two strategies, denoted by $play(q, str_0, str_1)$, is the only play $\pi$ in the game that starts in $q$ and agrees with both strategies, i.e., for all $i \in \mathbb{N}$, if $\pi_i \in P_0$ then $\pi_{i+1} = str_0(\pi_i)$, and $\pi_{i+1} = str_1(\pi_i)$ otherwise.
A node $q$ is winning for Player 0 (resp., Player 1) if Player 0 (resp., Player 1) wins the game from $q$. By $Win_0(G)$ (resp., $Win_1(G)$) we denote the set of winning nodes in $G$ for Player 0 (resp., Player 1). Parity games enjoy determinacy, meaning that, for every node $q$, either $q \in Win_0(G)$ or $q \in Win_1(G)$ [22]. Moreover, it can be proved that, if Player 0 (resp., Player 1) has a winning strategy from node $q$, then it has a memoryless winning strategy from the same node [52].
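Attractor We now define the notion of attractor, the core of Zielonka's algorithm. Intuitively, given a set of nodes $F \subseteq P$, the $i$-attractor of $F$ for a Player $i \in \{0, 1\}$, denoted by $Attr_i(G, F)$, represents those nodes from which Player $i$ can force the play into $F$. That is, Player $i$ can force any play to behave in a certain way, even though this does not mean that Player $i$ wins the game. Formally, for all $k \in \mathbb{N}_0$:
$$Attr_i^0(G, F) \triangleq F,$$
$$Attr_i^{k+1}(G, F) \triangleq Attr_i^k(G, F) \cup \{q \in P_i : Mv(q) \cap Attr_i^k(G, F) \neq \emptyset\} \cup \{q \in P_{1-i} : Mv(q) \subseteq Attr_i^k(G, F)\}.$$
Then, we have that
$$Attr_i(G, F) \triangleq \bigcup_{k \in \mathbb{N}_0} Attr_i^k(G, F).$$
Subgames Let $A$ be an arbitrary attractor set. The subgame $G \setminus A$ is the game restricted to the nodes $P \setminus A$, i.e.,
$$G \setminus A \triangleq \langle P_0 \setminus A, P_1 \setminus A, Mv \cap ((P \setminus A) \times (P \setminus A)), p|_{P \setminus A} \rangle.$$
It is worth observing that the totality of $G \setminus A$ is ensured by $A$ being an attractor.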
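As an illustration, the attractor can be computed by repeating the inductive step until no node is added: a node of Player i joins the set as soon as one of its moves leads into it, while a node of the opponent joins only when all of its moves do. The following Python sketch assumes a hypothetical dictionary-based encoding of the game (`owner`, `moves`), which is not the representation used by PGSolver.

```python
def attractor(owner, moves, player, target):
    """Return Attr_player(G, target) by iterating the definition to a fixpoint.

    owner[q] is 0 or 1; moves[q] is the non-empty set of successors of q
    (the move relation is assumed to be left-total).
    """
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for q, successors in moves.items():
            if q in attr:
                continue
            if owner[q] == player:
                # The player picks the move: one successor in attr suffices.
                pulled = any(s in attr for s in successors)
            else:
                # The opponent picks the move: every successor must be in attr.
                pulled = all(s in attr for s in successors)
            if pulled:
                attr.add(q)
                changed = True
    return attr


# Illustrative four-node game (hypothetical example data).
owner = {0: 0, 1: 1, 2: 0, 3: 1}
moves = {0: {1}, 1: {0, 2}, 2: {3}, 3: {3}}
print(attractor(owner, moves, 0, {3}))
print(attractor(owner, moves, 1, {3}))
```

This version rescans all nodes in every round, so it is quadratic in the worst case; it is meant to mirror the definition, not to be efficient.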

Zielonka's recursive algorithm
In this section, we describe the recursive algorithm by Zielonka using the basic concepts introduced in the previous sections and some observations regarding its implementation in PGSolver.

Algorithm 1 Zielonka's recursive algorithm (RE).
procedure SOLVE(G)
    if (P == ∅) then return (∅, ∅)
    d := maximal priority in G
    U := {v ∈ P | priority(v) = d}
    i := d mod 2; j := 1 − i
    A := Attr_i(G, U)
    (W_0, W_1) := SOLVE(G \ A)
    if W_j == ∅ then
        W_i := W_i ∪ A
    else
        B := Attr_j(G, W_j)
        (W_0, W_1) := SOLVE(G \ B)
        W_j := W_j ∪ B
    return (W_0, W_1)

The recursive algorithm (RE, for short), reported in Algorithm 1, is one of the first exponential-time algorithms for solving parity games. It is based on the work of McNaughton [39] and was explicitly presented as a solver for parity games by Zielonka [52]. The algorithm uses a divide and conquer technique, and its core subroutine is the attractor described in Section 2.
At each step, the algorithm removes all nodes with the highest priority $d$, denoted by $U$, together with all nodes that Player $i = d \bmod 2$ can attract to them, denoted by $A$, and recursively computes the winning sets $(W_0, W_1)$ for Player 0 and Player 1, respectively, on the remaining subgame $G \setminus A$.
At this point, there are two cases to be considered. First, if Player $i$ wins $G \setminus A$, then he also wins the whole game $G$. Indeed, whenever Player $1 - i$ decides to visit $A$, Player $i$'s winning strategy would be to reach $U$. Then, every play that visits $A$ infinitely often has $d$ as the highest priority occurring infinitely often, or otherwise it stays eventually in $G \setminus A$, and hence is won by $i$.
Second, if Player $i$ does not win the whole subgame $G \setminus A$, i.e., $W_{1-i}$ is non-empty, then Player $1 - i$ wins on the subset $W_{1-i}$ of $G \setminus A$. Since Player $i$ cannot force Player $1 - i$ to leave $W_{1-i}$, Player $1 - i$ also wins on $W_{1-i}$ in the game $G$. Hence, the algorithm computes the attractor $B$ for Player $1 - i$ of $W_{1-i}$ and recursively solves the subgame $G \setminus B$.
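The two cases above can be put together in a short executable sketch. The Python code below is a plain rendering of the classic recursive scheme under an assumed dictionary-based game encoding (`owner`, `moves`, `priority` are hypothetical names); like the classic implementation discussed in this paper, it builds a fresh subgame for every recursive call.

```python
def attractor(owner, moves, player, target):
    """Fixpoint computation of the attractor of `target` for `player`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for q, successors in moves.items():
            if q in attr:
                continue
            test = any if owner[q] == player else all
            if test(s in attr for s in successors):
                attr.add(q)
                changed = True
    return attr


def subgame(owner, moves, priority, removed):
    """Build the game restricted to the nodes outside `removed` (a new copy)."""
    keep = set(moves) - removed
    return ({q: owner[q] for q in keep},
            {q: moves[q] & keep for q in keep},
            {q: priority[q] for q in keep})


def solve(owner, moves, priority):
    """Return the pair (W_0, W_1) of winning sets of the game."""
    if not moves:
        return set(), set()
    d = max(priority.values())                 # highest priority in the game
    u = {q for q in moves if priority[q] == d}
    i, j = d % 2, 1 - d % 2
    a = attractor(owner, moves, i, u)
    win = list(solve(*subgame(owner, moves, priority, a)))
    if not win[j]:
        win[i] |= a                            # Player i wins everywhere
        return win[0], win[1]
    b = attractor(owner, moves, j, win[j])
    win = list(solve(*subgame(owner, moves, priority, b)))
    win[j] |= b
    return win[0], win[1]


# Two nodes with self-loops: node 0 (Player 0, priority 2), node 1 (Player 1, priority 1).
owner = {0: 0, 1: 1}
moves = {0: {0, 1}, 1: {0, 1}}
priority = {0: 2, 1: 1}
print(solve(owner, moves, priority))
```

In the example, each player can stay forever on its own node, so node 0 is won by Player 0 and node 1 by Player 1.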

The implementation of RE in PGSolver
PGSolver turns out to have very limited applicability in several real scenarios. In more detail, even using RE (which has been shown to be the best performing in practice), PGSolver requires minutes to decide games with a few thousand nodes, especially on dense graphs. In this work we study in depth the main aspects that cause such poor performance.
Specifically, our investigation begins with the way in which RE has been implemented in PGSolver by means of the OCaml programming language. We start by observing that the graph data structure in this framework is represented as a fixed-length Array of tuples.
Every tuple has all the information that a node needs, such as the player, the assigned priority, and the adjacency list. Before every recursive call is performed, the program computes the difference between the graph and the attractor, and it also builds the transposed graph. In addition, the attractor function makes use of a TreeSet data structure that is not available in OCaml's standard library, but is imported from TCSlib, a multi-purpose library for OCaml written by Oliver Friedmann and Martin Lange. This library implements the data structure using AVL trees, which guarantee logarithmic search, insertion, and removal. Also, the same function computes the number of successors for the opponent player in every iteration when looping through every node in the attractor.

An improved implementation of RE
All the observations given above lead us to introduce an improved version of RE (IRE, for short), reported in Algorithm 2. Algorithm 3 reports an improved version of the attractor function used by the new algorithm. Let G be a graph. Removing a node from G and building the transposed graph takes time $\Theta(|V| + |E|)$. Thus, on dense graphs this operation takes $\Theta(|V|^2)$. In order to reduce the running time caused by these graph operations, we introduce an immutability requirement on the graph G, ensuring that every recursive call uses G without applying any modification to the state space of the graph. Therefore, to construct the subgames in the recursive calls, we keep track of each node that is going to be removed from the graph, adding all of them to a set called Removed.
The improved algorithm is capable of checking whether a given node is excluded in constant time, and it completely removes the need for a new graph in every recursive call. At first glance this may seem a small improvement with respect to RE. However, it turns out to be very successful in practice, as shown in the benchmark sections below. Further evidence of the importance of this improvement is that the difference operation has, in some sense, the same complexity as complementing automata [45]. Our approach avoids such a complementation by adding constant information to the states, i.e., a flag (removed, ¬removed). Last but not least, concerning the actual implementation, it is also worth mentioning that general-purpose memory allocators are very expensive, as the per-operation cost is around one hundred processor cycles [26]. Many efforts have been made over the years to improve memory allocation by implementing custom allocators from scratch, a process known to be difficult and error prone [8].
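The idea of keeping the game immutable and passing a set of removed nodes to each recursive call can be sketched as follows. This is a minimal Python illustration under assumed names (`owner`, `succ`, `priority`, `exclude`) and an assumed list-based encoding, not the authors' Algorithm 2; for brevity it uses a simple fixpoint attractor, and it copies a list of Boolean flags instead of rebuilding the graph.

```python
def attractor(owner, succ, removed, player, target):
    """Attractor of `target` in the subgame of nodes whose flag in `removed` is False."""
    n = len(owner)
    in_attr = [False] * n
    for q in target:
        in_attr[q] = True
    changed = True
    while changed:
        changed = False
        for q in range(n):
            if removed[q] or in_attr[q]:       # constant-time exclusion check
                continue
            live = [s for s in succ[q] if not removed[s]]
            test = any if owner[q] == player else all
            if test(in_attr[s] for s in live):
                in_attr[q] = True
                changed = True
    return [q for q in range(n) if in_attr[q]]


def exclude(removed, nodes):
    """Return a copy of the flags with `nodes` additionally marked as removed."""
    flags = list(removed)
    for q in nodes:
        flags[q] = True
    return flags


def solve(owner, succ, priority, removed=None):
    """Return (W_0, W_1) for the subgame of non-removed nodes; the game is never modified."""
    n = len(owner)
    if removed is None:
        removed = [False] * n
    alive = [q for q in range(n) if not removed[q]]
    if not alive:
        return set(), set()
    d = max(priority[q] for q in alive)
    u = [q for q in alive if priority[q] == d]
    i, j = d % 2, 1 - d % 2
    a = attractor(owner, succ, removed, i, u)
    win = list(solve(owner, succ, priority, exclude(removed, a)))
    if not win[j]:
        win[i].update(a)
        return win[0], win[1]
    b = attractor(owner, succ, removed, j, win[j])
    win = list(solve(owner, succ, priority, exclude(removed, b)))
    win[j].update(b)
    return win[0], win[1]


owner, succ, priority = [0, 1], [[0, 1], [1, 0]], [2, 1]
print(solve(owner, succ, priority))
```

Copying the flag list still costs time linear in the number of nodes per call, but it avoids the edge-proportional work of building a new graph and its transpose.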

Implementation in OCaml for PGSolver
Our implementation of IRE in OCaml, listed in Algorithm 4, does not directly modify the graph data structure (that is represented in PGSolver as an array of tuples), but rather it uses a set to keep track of removed nodes.
IRE takes three parameters: a parity game, the transpose of the game graph, and a set of excluded nodes. Our improved attractor uses a HashMap, called tmpMap, to keep track of the number of successors for the opponent player's nodes. In addition, we use a Queue, from OCaml's standard library, to loop over the nodes in the attractor. Aiming at performance optimizations, the attractor function implemented in PGSolver also returns the set of excluded nodes.
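A worklist attractor of this kind can be sketched in Python, with a `dict` standing in for the HashMap and `collections.deque` for the Queue. The function and parameter names below are hypothetical, and the sketch assumes that the graph has no duplicate edges and that no target node is excluded.

```python
from collections import deque


def transpose(succ):
    """Build predecessor lists from successor lists."""
    pred = [[] for _ in succ]
    for q, targets in enumerate(succ):
        for s in targets:
            pred[s].append(q)
    return pred


def attractor(owner, succ, pred, removed, player, target):
    """Worklist attractor on the subgame of nodes not in the set `removed`."""
    in_attr = set(target)
    queue = deque(target)
    tmp_map = {}  # opponent node -> number of live successors not yet attracted
    while queue:
        s = queue.popleft()
        for q in pred[s]:
            if q in removed or q in in_attr:
                continue
            if owner[q] == player:
                # One move into the attractor is enough for the player.
                in_attr.add(q)
                queue.append(q)
            else:
                if q not in tmp_map:
                    # Count the successors only once, on the first visit.
                    tmp_map[q] = sum(1 for t in succ[q] if t not in removed)
                tmp_map[q] -= 1
                if tmp_map[q] == 0:
                    # All remaining moves of the opponent lead into the attractor.
                    in_attr.add(q)
                    queue.append(q)
    return in_attr


owner = [0, 1, 0, 1]
succ = [[1], [0, 2], [3], [3]]
pred = transpose(succ)
print(attractor(owner, succ, pred, set(), 0, [3]))
print(attractor(owner, succ, pred, {0}, 0, [3]))
```

Each edge is inspected a constant number of times, so the running time is linear in the size of the graph, in contrast with the repeated scans of a naive fixpoint iteration.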

Scala implementation
In this section, we give an implementation of IRE in the Scala programming language, starting with a brief introduction to it. Scala [41,42] is the programming language designed by Martin Odersky, the co-designer of Java Generics and main author of the javac compiler. Scala defines itself as a scalable language, statically typed, a fusion of an object-oriented language and a functional one. It runs on the Java Virtual Machine (JVM) and supports every existing Java library. Scala is a purely object-oriented language in which, as in Smalltalk, every value is an object and every operation is a method call. In addition, Scala is a functional language where every function is a first-class object, and it is equipped with efficient immutable and mutable data structures, with a strong selling point given by Java interoperability. However, it is not a purely functional language, as objects may change their states and functions may have side effects. The functional aspects are well integrated with the object-oriented features, and the combination of both styles makes it possible to express new kinds of patterns and abstractions. All these features make Scala a sensible choice for these tasks when compared with other available programming languages such as C, C++ or Java.
In [28], researchers at Google show that Scala, despite being a high-level language, performs just 2.5x slower than machine-optimized C++ code. In particular, Scala was shown to be faster than Java. As the paper notes: "While the benchmark itself is simple and compact, it employs many language features, in particular high level data structures, a few algorithms, iterations over collection types, some object oriented features and interesting memory allocation patterns".

Implementation in Scala of IRE
In this section we introduce our implementation of IRE in Scala, reported in Algorithms 5 and 6.
Aiming at performance optimizations, we use a priority HashMap where every key is a priority and every value is the set of nodes having that priority, i.e., priority(v) = key for every v belonging to value. Moreover, we use the HashMap and ArrayList data structures contained in the open source library Trove. We also rely on Scala's internal features and standard library, making heavy use of the dynamic ArrayBuffer data structure. In order to store the arena we use an array of Node objects. The Node class contains a list of adjacent nodes, a list of incident nodes, its priority, and the player. The graph class also implements a factory method called "--(set: ArrayBuffer[Int])" that takes an ArrayBuffer of integers as input, flags all the nodes in the array as excluded, and returns the reference to the new graph. In addition, there is a method called max_priority() that returns the maximal priority in the graph and the set of nodes with that priority. Our attractor implementation in Scala makes use of an array of integers named tmpMap that is pre-allocated with one entry per node of the graph, with a negative integer as default value; we use tmpMap when looping through every node in the set A given as parameter to keep track of the number of successors for the opponent player. We add a node v ∈ V to the attractor set when its counter (stored in tmpMap[v]) reaches 0 (i.e., adj(v) ⊆ A and v ∈ V_opponent) or if v ∈ V_player. Using an array of integers, or a HashMap, guarantees a constant-time check of whether a node was already visited and ensures that the count for the opponent's node adjacency list takes place one time only.
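To make the role of the priority map concrete, the following Python sketch groups nodes by priority once and then answers maximal-priority queries while skipping excluded nodes. It is only an illustration of the idea in a different language: the names `build_priority_map` and `max_priority` and the list-of-flags encoding of exclusions are assumptions, not the Scala code of the tool.

```python
from collections import defaultdict


def build_priority_map(priority):
    """Map each priority to the list of nodes carrying it (built once per game)."""
    buckets = defaultdict(list)
    for node, p in enumerate(priority):
        buckets[p].append(node)
    return dict(buckets)


def max_priority(buckets, excluded):
    """Return the highest priority with a non-excluded node, and those nodes.

    `excluded[v]` is True when node v has been removed from the current subgame.
    Returns (None, []) when every node is excluded.
    """
    for p in sorted(buckets, reverse=True):
        live = [v for v in buckets[p] if not excluded[v]]
        if live:
            return p, live
    return None, []


priority = [3, 1, 3, 0]
buckets = build_priority_map(priority)
print(max_priority(buckets, [False, False, False, False]))
print(max_priority(buckets, [True, False, True, False]))
```

Since the buckets depend only on the immutable game, they can be shared by all recursive calls; only the exclusion flags change from call to call.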

Experimental evaluations: new implementations in OCaml and Scala
In this section we study, analyze and evaluate the running time of the following implementations: Classic Recursive in OCaml (CRO), Classic Recursive in Scala (CRS), Improved Recursive in OCaml (IRO) and Improved Recursive in Scala (IRS). We have run our experiments on multiple instances of random parity games. Note that IRS and CRS do not apply any optimization steps to the arena before solving, while the OCaml implementations run those optimizations. To show the effectiveness of the Scala implementations, we keep these optimizations enabled for the OCaml ones. All tests have been run on an Intel(R) Xeon(R) CPU E5620 @ 2.40 GHz, with 16 GB of RAM (with no swap available), running Ubuntu 14.04. Precisely, we have used 100 random arenas generated using PGSolver for each of the following types, given N = i × 1000 with i integer and 1 ≤ i ≤ 10, and a timeout set at 600 s.
In the following, we report six tables in which we show the running time of all experiments under fixed parameters. Throughout this section we write abort[T] when the program has been aborted due to excessive time and abort[M] when the program has been killed by the operating system due to memory consumption. In Fig. 2 we also report the trends of the four implementations using a logarithmic scale with respect to seconds. This figure is based on the averages of all results reported in the tables below.

Trends analysis for random games
The speedup obtained by our implementation of IRE is in most cases quite noticeable. Figure 3 shows the running time trend for RE and IRE in both OCaml and Scala based on the results of the previous benchmarks. The seconds, shown on the Y-axis, are limited to [0, 100], while the X-axis reports the number of nodes. As a result, we show that even with all preprocessing steps enabled in PGSolver, IRS is capable of gaining two orders of magnitude in running time.

Trends analysis for special games
Here, we compare the performance of CRO and IRS over non-random games generated by PGSolver such as clique games, ladder games, model checker ladder games, and Jurdzinski games. These experiments have been run disabling all optimizations in PGSolver since IRS does not apply such optimizations. Clique[n] games are fully connected games without self-loops, where n is the number of nodes. The set of nodes is partitioned into $V_0$ and $V_1$ having the same size. For all $v \in V_p$, priority(v) % 2 = p. For our experiments we set $n = 2^k$ where 8 ≤ k ≤ 14. Table 2 reports the running time for our experiments and these results are drawn in Fig. 4.
In a Ladder[n] game, every node in $V_0$ has priority 2 and every node in $V_1$ has priority 1. In addition, each node $v \in V$ has two successors: one in $V_0$ and one in $V_1$, which form a node pair. Every pair is connected to the next pair, forming a ladder of pairs. Finally, the last pair is connected to the top. The parameter n specifies the number of node pairs. For our tests, we set $n = 2^k$ where 8 ≤ k ≤ 19, and report our experiments in Table 3, whose trend is drawn in Fig. 5, where the seconds are limited to [0, 2]. As the figure shows, CRO performs better than IRS for small values of the input parameter (up to $2^{13}$). This behavior is not surprising, as the Java Virtual Machine requires a warm-up time.
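For illustration only, a game with the Clique[n] properties listed above can be generated as in the following Python sketch. The concrete choices owner(v) = v mod 2 and priority(v) = v are assumptions made here to satisfy the stated constraints for even n; they are not claimed to match the generator shipped with PGSolver.

```python
def clique_game(n):
    """Return (owner, succ, priority) for a fully connected game without self-loops.

    Assumed encoding: node v belongs to Player v % 2 and has priority v, so that
    priority(v) % 2 equals the owner of v and, for even n, both players own n/2 nodes.
    """
    owner = [v % 2 for v in range(n)]
    priority = list(range(n))
    succ = [[w for w in range(n) if w != v] for v in range(n)]
    return owner, succ, priority


owner, succ, priority = clique_game(4)
print(owner)
print(succ[0])
```

Such a game has n(n − 1) edges, which is why clique games stress implementations that rebuild or transpose the graph at every recursive call.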

Model Checker Ladder[n] consists of overlapping blocks of four nodes, where the parameter n specifies the number of desired blocks. Every node is owned by Player 1, i.e., $V_1 = V$ and $V_0 = \emptyset$, and the nodes are connected such that every cycle passes through a single point of colour 0. For our experiments we set $n = 2^k$ where 10 ≤ k ≤ 15, and report our experiments in Table 4 below and draw the trends in Fig. 6, where the seconds are limited to [0, 2].
Jurdzinski[n, m] games are designed to generate the worst-case behavior for SPM [30]. The parameter n is the number of layers, where each layer has m repeating blocks that are inter-connected as described in [30]. As this game takes two parameters, in our test we ran two experiments: one where n is fixed to 10 and $m = 10 \times 2^k$, for k = 1, ..., 5, and one where m is fixed to 10 and $n = 10 \times 2^k$, for k = 1, ..., 5. The results of our experiments are reported in Table 5. The trends are drawn in Fig. 7.

New implementations and experimental results
Our investigation has explored two possible directions to improve the performance of RE: i) using new data structures and more efficient coding, and ii) implementing it in a different programming language such as Scala. The experiments have highlighted how the combination of these two directions turns out to be very efficient in practice, showing an improvement of up to one order of magnitude both from improving the RE implementation in OCaml and from choosing a different programming language. Thus, we reach an improvement of up to two orders of magnitude by applying both. In this section, we continue our investigation on the programming language side, presenting multiple implementations of IRE in Java, C++ and Go. We introduce the languages used, giving an overall explanation of the key aspects that help achieve our primary goal. The full implementation can be found online via https://github.com/vincepri/SPGSolver.
The Java Programming Language was developed by James Gosling at Sun Microsystems and introduced for the first time in 1995. The latest release of Java is a huge step forward for the language, enriching the syntax and the standard library. It is a clear demonstration of a language evolving without compromising robustness and stability while still ensuring backward compatibility. Our Java solver implementation relies mainly on the standard library, the Google Guava [6] library, and Trove for high-performance data structures. The Trove library offers fast and memory-efficient regular and primitive collections for Java. Internally, Trove does not use any java.lang.Number subclasses, so there is no boxing/unboxing overhead. The TIntArrayList data structure is built on top of an array of the corresponding data type (int[] in this case). Each Trove array list offers several helper methods analogous to those in java.util.Collections.
Modern C++ can be seen as consisting of three parts: a low-level language inherited from C, advanced language features, and the standard library, which provides useful data structures and algorithms.
Our implementation in C++ makes intense use of the language's standard library and of the Boost C++ libraries for string-based algorithms when parsing files and for timer functions. The tool was compiled with clang, a compiler front end for C, C++ and Objective-C that uses LLVM as its backend. The compiled executable is around 62 KB on disk, and it must be noted that it was compiled with full optimizations enabled (-Ofast flag).
The C++ version showed a much smaller memory footprint than the garbage-collected programming languages. A Python module that makes use of the C++ implementation has also been implemented using Boost.Python.
The Go Programming Language is an efficient, statically typed, compiled language whose development started at Google in 2007. It has built-in support for concurrency and communication, a latency-free garbage collector, and a high-speed compilation process. The standard library provides all the core packages programmers need to build real-world programs, and the language offers two fundamental built-in collection types: slices (variable-length arrays) and maps, built to be efficient and to serve multiple purposes. The language does not hide pointers and there is no virtual machine getting in the way of performance; for this reason, it is possible to design complex custom types with ease. Our implementation of parity games in Go strictly follows the Go rules and conventions for code syntax and only requires the standard library, keeping external dependencies at a minimum.

Benchmarks and trends
In this section we study, analyze and evaluate the running time of our implementations.
We have run our experiments on multiple instances of random parity games. Precisely, we have used 100 random games generated by PGSolver with a number of nodes defined as N = i × 1000 with 1 ≤ i ≤ 60, and two priorities.
It is worth noting that these implementations do not apply any preprocessing steps to the arena before solving. The graph data structure is represented, consistently across the languages, as a fixed-length array of objects, where every node holds all of its own information, namely the player, the priority and the adjacency list.
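The representation described above can be sketched in C++ as follows. This is only an illustrative sketch under our own assumptions: the names Node, Arena, set_node, add_edge and node are hypothetical and are not taken from the actual solver sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node layout: one object per node holding owner, priority
// and adjacency list, as described in the text.
struct Node {
    int player = 0;                 // owner of the node: 0 or 1
    int priority = 0;               // priority (color) of the node
    std::vector<int> successors;    // adjacency list (targets of outgoing edges)
};

// The arena is sized once at construction and never grows afterwards.
class Arena {
public:
    explicit Arena(std::size_t n) : nodes_(n) {}

    std::size_t size() const { return nodes_.size(); }

    void set_node(std::size_t v, int player, int priority) {
        assert(v < nodes_.size() && (player == 0 || player == 1));
        nodes_[v].player = player;
        nodes_[v].priority = priority;
    }

    void add_edge(std::size_t from, std::size_t to) {
        assert(from < nodes_.size() && to < nodes_.size());
        nodes_[from].successors.push_back(static_cast<int>(to));
    }

    const Node& node(std::size_t v) const {
        assert(v < nodes_.size());
        return nodes_[v];
    }

private:
    std::vector<Node> nodes_;       // fixed-length array of node objects
};
```

Keeping every attribute of a node in a single object means that a lookup by node index reaches the player, the priority and the outgoing edges with one array access.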
The trends are shown in Fig. 8, which gives a clear idea of what can be achieved in under a second, while Fig. 9 shows the clear gap between OCaml and the other languages.
The chosen languages differ deeply from one another. Go and C++, for example, use a static compiler to produce a fully native binary, while Java and Scala rely on the JVM and its just-in-time compiler. In addition, the JVM languages and Go are garbage-collected, while C++11 uses RAII, a programming idiom in which holding a resource is tied to an object's lifetime and the language itself guarantees that the object is freed once control flow leaves its scope.
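As an illustration of the RAII idiom mentioned above, consider the following minimal sketch. The class WorkBuffer, the counter live_buffers and the function live_inside_scope are hypothetical names introduced only for this example; the counter exists solely to make the release point observable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counter: number of WorkBuffer objects currently alive.
int live_buffers = 0;

// The buffer is acquired in the constructor and released in the destructor,
// so its lifetime is tied to the lifetime of the owning object.
class WorkBuffer {
public:
    explicit WorkBuffer(std::size_t n) : data_(new int[n]()), size_(n) {
        ++live_buffers;
    }
    ~WorkBuffer() {
        delete[] data_;
        --live_buffers;
    }
    // Copying is disabled so the buffer has exactly one owner.
    WorkBuffer(const WorkBuffer&) = delete;
    WorkBuffer& operator=(const WorkBuffer&) = delete;

    int& operator[](std::size_t i) {
        assert(i < size_);
        return data_[i];
    }

private:
    int* data_;
    std::size_t size_;
};

// Returns how many buffers are alive while the local buffer is in scope.
int live_inside_scope(std::size_t n) {
    WorkBuffer buf(n);      // resource acquired here
    buf[0] = 1;
    return live_buffers;    // the return value is computed before buf is destroyed
}                           // buf's destructor runs here, with no collector involved
```

Calling live_inside_scope(8) reports one live buffer from inside the function, and the counter is back to zero as soon as the call returns. In practice std::vector or std::unique_ptr already provide this behavior; the hand-written class is used here only to make the mechanism explicit.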
We finally include a plot in Fig. 10 to further compare the different implementations. Tests are performed using 20 random arenas generated through PGSolver, setting the number of nodes as N = i × 1000 with 1 ≤ i ≤ 60, the number of priorities p ∈ {2, √n, n}, and the minimum and maximum number of edges (min, max) ∈ {(1, n), (n/2, n)}. Our benchmarks empirically show that C++ is the best performing language in every test we have run, reducing the Scala running time by a factor of 4 and the Go running time by a factor of ∼1.5. Scala and Java tend to behave very similarly above 40000 nodes; in some plots their results are missing because the JVM exceeded the 32GB of available memory, whereas Go was always able to complete the solving process, even if it took more time. Further investigation into why Go's performance degraded so quickly beyond the 40000-node threshold pointed to the garbage collector implementation: Go's collector is a mark-and-sweep collector with periodic pauses when it runs, whereas Java provides several collector implementations, allowing multiple performance optimizations.

Benchmarks on practical model checking problems
The experiments on random games have given an overview of the performance of our implementations, but to better understand their behavior in practical applications of parity games we need to continue our investigation. Therefore, we evaluate the performance of our implementation of IRE in OCaml (IRO), C++ (IRC), Go (IRG), Java (IRJ), and Scala (IRS) on some practical model checking problems as in [33]. Specifically, we use models coming from the Sliding Window Protocol (SWP) with window size (WS) of 2 and 4 (WS represents the boundary of the total number of packets to be acknowledged by the receiver) and the Onebit Protocol (OP). The properties we check on these models concern: absence of deadlock (ND), a message of a certain type (d1) is received infinitely often (IORD1), and if there are infinitely many read steps then there are infinitely many write steps (IORW).
Note that, in all benchmarks, data size (DS) denotes the number of messages, and every game has 3 priorities.
As we can see by comparing Tables 6 and 7, the experiments indicate a more nuanced relationship between the different implementations of IRE. Indeed, even though the experiments follow the trend shown previously for the random case, that is, C++ is the best performing one, they also show a different behavior for the other implementations. Overall, we can observe that IRJ gains one order of magnitude over IRS in all protocols and properties. Moreover, IRC and IRG reduce the IRJ running time by one order of magnitude. Thus, the gap between C++ and Scala reaches up to two orders of magnitude, unlike what we have seen for the random case. Another interesting aspect is highlighted by the comparison between the IRC and IRG running times, which are closer to each other, though IRC outperforms IRG in all cases. Finally, the experiments clearly show the significant gap between IRO and the other implementations.

Conclusions
PGSolver is a well-known framework that collects multiple algorithms to decide parity games. For several years this platform has been the only one available to solve and benchmark parity games in practice. However, given PGSolver's limitations in handling huge graphs, several attempts at improvement have been carried out. Some of them have been implemented as preprocessing steps in the tool itself, such as priority compression, SCC decomposition and the like [25], while others applied parallelism techniques to the algorithms, as done in [4,27].
In this work we start from scratch by revisiting the implementation of RE in PGSolver. We first provide an improved version by using new data structures and more efficient coding. The improved version guarantees that the original game remains immutable by tracking the removed nodes in every subsequent call and checking, in constant time, whether a node needs to be excluded or not. Our preliminary results show that our implementation gains up to one order of magnitude over the implementation in PGSolver.
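The idea of keeping the game immutable while tracking removed nodes can be sketched as follows. This is a simplified illustration and not the actual solver code: the names Node, Game, SubGame, contains, without and successors are hypothetical, and representing the removed set as a bit vector that is copied for each subgame is an assumption made here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int player;                     // owner of the node: 0 or 1
    int priority;                   // priority (color) of the node
    std::vector<int> successors;    // adjacency list
};

using Game = std::vector<Node>;     // built once, never modified afterwards

// A subgame is a view: the shared immutable game plus a removal mask.
// The referenced Game must outlive every SubGame built on it.
struct SubGame {
    const Game& game;
    std::vector<bool> removed;      // removed[v] == true means v is excluded

    explicit SubGame(const Game& g) : game(g), removed(g.size(), false) {}

    // Constant-time membership check.
    bool contains(int v) const {
        assert(v >= 0 && static_cast<std::size_t>(v) < removed.size());
        return !removed[static_cast<std::size_t>(v)];
    }

    // Subgame for a recursive call: same game, more nodes masked out.
    SubGame without(const std::vector<int>& nodes) const {
        SubGame sub(*this);         // copies the mask, not the game
        for (int v : nodes) {
            assert(v >= 0 && static_cast<std::size_t>(v) < sub.removed.size());
            sub.removed[static_cast<std::size_t>(v)] = true;
        }
        return sub;
    }

    // Successors of v that are still present in this subgame.
    std::vector<int> successors(int v) const {
        assert(v >= 0 && static_cast<std::size_t>(v) < game.size());
        std::vector<int> out;
        for (int w : game[static_cast<std::size_t>(v)].successors) {
            if (contains(w)) out.push_back(w);
        }
        return out;
    }
};
```

For example, given a three-node game in which node 0 has edges to nodes 1 and 2, calling without({1}) on the full view yields a subgame in which successors(0) contains only node 2, while the underlying game and the parent view are left untouched. No node or edge is ever copied or deleted; only the mask changes between recursive calls.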
Then, we implement it in different programming languages, namely Scala, Java, C++, and Go, and compare the resulting solvers. The experimental results give a clear picture of which implementation performs best in the solving process. C++ is the best performing one in every run, and it is capable of gaining up to three orders of magnitude in running time over the classical implementation in PGSolver. Specifically, on random games, C++ reduces the Scala running time by a factor of 4 and the Go running time by a factor of ∼1.5. Go's performance tends to degrade after 40000 nodes, where it is outperformed by Java in some cases. In contrast, the benchmarks executed over practical model checking problems show that Go's performance, besides being of the same order of magnitude, tends to be closer to that of C++. This highlights an interesting aspect of the use of Go. Indeed, although Go is a young programming language, it provides high performance, efficient concurrency handling like Java, and is fun to code in like Python/Perl. Hence, even though C++ has the better performance, a deeper investigation into the benefits of using Go might lead to appealing results.
The importance of this work, which extends the results in [19], lies in the fact that it is the first to exploit efficient coding and modern programming languages to improve the performance of RE, the algorithm commonly used for comparisons in the literature. Indeed, our results have already inspired the development of new efficient tools for solving parity games. Among others, Oink, developed by Tom van Dijk [48], makes use of our improvements, such as the immutability of the game in implementing RE; it is written in C++ and provides multi-core implementations of RE and SPM as done in [4]. Finally, this work raises interesting questions that we leave for future work, such as continuing the evaluation of Go's performance in real scenarios, as well as implementing a multi-core version of the algorithms in Go and Java.
Funding Open access funding provided by Università degli Studi di Roma La Sapienza within the CRUI-CARE Agreement.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.