Distributed Coalgebraic Partition Refinement

Partition refinement is a method for minimizing automata and transition systems of various types. Recently, a new partition refinement algorithm and associated tool CoPaR were developed that are generic in the transition type of the input system and match the theoretical run time of the best known algorithms for many concrete system types. Genericity is achieved by modelling transition types as functors on sets and systems as coalgebras. Experimentation has shown that memory consumption is a bottleneck for handling systems with a large state space, while running times are fast. We have therefore extended an algorithm due to Blom and Orzan, which is suitable for a distributed implementation, to the coalgebraic level of genericity, and implemented it in CoPaR. Experiments show that this makes it possible to handle much larger state spaces. Running times are low in most experiments, but there is a significant penalty for some.


Introduction
Minimization is an important and basic algorithmic task on state-based systems, concerned with reducing the state space as much as possible while retaining the system's behaviour. It is used for equivalence checking of systems and as a subtask in model checking tools in order to handle larger state spaces and thus mitigate the state-explosion problem. We focus on the task of identifying behaviourally equivalent states modulo bisimilarity. For classic labelled transition systems this notion obeys the principle 'states s and t are bisimilar if for every transition $s \xrightarrow{a} s'$, there exists a transition $t \xrightarrow{a} t'$ with $s'$ and $t'$ bisimilar', and symmetrically for transitions from t. Bisimilarity is a rather fine-grained branching-time notion of equivalence (cf. [ ]); it is widely used and preserves all properties expressible as µ-calculus formulas. Moreover, it has been generalized to yield equivalence notions for many other types of state-based systems and automata.
Due to the above principle, bisimilarity is defined by a fixed point, to be understood as a greatest fixed point, and is hence approximable from above. This is used by partition refinement algorithms: the initial partition considers all states tentatively equivalent and is then iteratively refined using observations about the states.

In previous work [ , ], an efficient partition refinement algorithm was provided which is generic in the system type, captures all the above system types, and matches or, in some cases, even improves on the run time complexity of the respective specialized algorithms. Subsequently, we have shown how to extend the generic complexity analysis to weighted tree automata and implemented the algorithm in the tool CoPaR [ , ], again matching the previous best run time complexity and improving it in the case of weighted tree automata with weights from a non-cancellative monoid. The algorithm is based on ideas of Paige and Tarjan, which leads to its efficiency. Genericity is achieved by modelling state-based systems as coalgebras, following the paradigm of universal coalgebra [ ], in which the transition structure of systems is encapsulated by a set functor. The algorithm and tool are modular in the sense that functors can be built from a preimplemented set of basic functors by standard set constructions such as cartesian product, disjoint union and functor composition. The tool then automatically derives a parser for input coalgebras of the composed type and provides a corresponding partition refinement implementation off the shelf. In addition, new basic functors F may easily be added to the set of basic functors by implementing a simple refinement interface for them plus a parser for encoded F-coalgebras. Our experiments with the tool have shown that run time scales well with the size of systems. However, memory usage becomes a bottleneck with growing system size, a problem that has previously also been observed by Valmari [ ] for partition refinement.
One strategy to address this is to distribute the algorithm across multiple computers, each of which stores and processes only a part of the state space and communicates with the others via message passing. For ordinary labelled transition systems and Markov systems this has been investigated in a series of papers by Blom and Orzan [ -], who were likewise motivated by the memory bottleneck of sequential partition refinement algorithms.
Our contribution in this paper is an extension of CoPaR by an efficient distributed partition refinement algorithm in coalgebraic generality. As in Blom and Orzan's work, our algorithm is a distributed version of a simple but effective algorithm called "the naive method" [ ], or "the final chain algorithm" in coalgebraic generality [ , ]. We first generalize signature refinement introduced by Blom and Orzan to the level of coalgebras. We also combine generalized signatures (Section ) with the previous encodings of set functors and their coalgebras [ , ] via the new notion of a signature interface (Definition . ). This is a key idea to make coalgebraic signature refinement and the final chain algorithm implementable in a tool like CoPaR. In addition, we demonstrate how signature interfaces of functors can be combined (Construction . and Proposition . ) along standard functor constructions. This yields a modularity principle similar to that of the previous sequential algorithm. However, this is a new feature for signature refinement and also, to our knowledge, for the final chain algorithm. Consequently, our distributed, modular and generic implementation of the final chain algorithm is new (already as a sequential algorithm).
We also provide experiments demonstrating its scalability and show that much larger state spaces can indeed be handled. Our benchmarks include weighted tree automata for non-cancellative monoids, a type of system for which our previous sequential implementation is heavily limited by its memory requirements. For those systems the running times of the distributed algorithm are even faster than those of the sequential algorithm. In a second set of benchmarks stemming from the PRISM benchmark suite [ ] we again show that larger systems can now be handled; however, for some of these there is a penalty in run time.

Preliminaries
Our algorithmic framework and the tool CoPaR [ , ] are based on modelling state-based systems abstractly as coalgebras for a (set) functor that encapsulates the transition type, following the paradigm of universal coalgebra [ ]. We now recall some standard notations for sets and maps and basic notions and examples in coalgebra. We fix a singleton set 1 = { * }; for every set X we have a unique map ! : X → 1 and the identity map id_X : X → X. We denote composition of maps by (−) · (−), in applicative order. Given maps f : X → A and g : X → B, we write ⟨f, g⟩ : X → A × B for the pairing map x ↦ (f(x), g(x)). The type of transitions of states in a system is modelled by a set functor F. Informally, F assigns to every set X a set F X of structured collections of elements of X, and an F-coalgebra is a map c : S → F S which assigns to every state s ∈ S in a system a structured collection c(s) ∈ F S of successor states of s. The functor F also determines a canonical notion of behavioural equivalence of states of a coalgebra; this arises by stipulating that morphisms of coalgebras are behaviour-preserving maps.
Definition . . A functor F : Set → Set assigns to each set X a set F X and to each map f : X → Y a map F f : F X → F Y, preserving identities and composition (F id_X = id_{F X} and F (g · f) = F g · F f). An F-coalgebra (S, c) consists of a set S of states and a transition structure c : S → F S. A morphism h : (S, c) → (S', c') of F-coalgebras is a map h : S → S' that preserves the transition structure: c' · h = F h · c. Two states s, t ∈ S of a coalgebra (S, c) are behaviourally equivalent if there exists a coalgebra morphism h with h(s) = h(t).

Example . . We mention several types of systems which are instances of the general notion of coalgebra and the ensuing notion of behavioural equivalence. All these are possible input systems for our tool CoPaR.

( ) Transition systems. The finite powerset functor P ω maps a set X to the set P ω X of all finite subsets of X, and a map f : X → Y to the map P ω f = f [−] : P ω X → P ω Y taking direct images. Coalgebras for P ω are finitely branching (unlabelled) transition systems. Two states are behaviourally equivalent iff they are (strongly) bisimilar in the sense of Milner [ , ] and Park [ ]. Similarly, finitely branching labelled transition systems with label alphabet A are coalgebras for the functor F X = P ω (A × X).

( ) Deterministic automata. For an input alphabet A, the functor given by F X = 2 × X^A, where 2 = {0, 1}, sends a set X to the set of pairs of boolean values and functions A → X. An F-coalgebra (S, c) is a deterministic automaton (without an initial state). For each state s ∈ S, the first component of c(s) determines whether s is a final state, and the second component is the successor function A → S mapping each input letter a ∈ A to the successor state of s under input letter a. States s, t ∈ S are behaviourally equivalent iff they accept the same language in the usual sense.

( ) Weighted tree automata simultaneously generalize tree automata and weighted (word) automata. Inputs of such automata stem from a finite signature Σ, i.e. a finite set of input symbols, each with a prescribed natural number, its arity. Weights are taken from a commutative monoid (M, +, 0).
A (bottom-up) weighted tree automaton (WTA) (over M with inputs from Σ) consists of a finite set S of states, an output map f : S → M, and for each k ≥ 0, a transition map $\mu_k : \Sigma_k \to M^{S^k \times S}$, where $\Sigma_k$ denotes the set of k-ary input symbols in Σ; the maximum arity of symbols in Σ is called the rank.
Every signature Σ gives rise to its associated polynomial functor, also denoted Σ, which assigns to a set X the set $\coprod_{n \in \mathbb{N}} \Sigma_n \times X^n$, where $\coprod$ denotes disjoint union (coproduct). Further, for a given monoid (M, +, 0) the monoid-valued functor $M^{(-)}$ sends a set X to the set of maps f : X → M that are finitely supported, i.e. f(x) = 0 for almost all x ∈ X. Given a map f : X → Y, the map $M^{(f)} : M^{(X)} \to M^{(Y)}$ sends $v \in M^{(X)}$ to the map $y \mapsto \sum_{x \in X,\, f(x) = y} v(x)$, corresponding to the standard image measure construction.
Weighted tree automata are coalgebras for the composite functor $FX = M \times M^{(\Sigma X)}$; indeed, given a coalgebra $c = \langle c_1, c_2 \rangle : S \to M \times M^{(\Sigma S)}$, its first component $c_1$ is the output map, and the second component $c_2$ is equivalent to the family of transition maps $\mu_k$ described above.
As proven by Wißmann et al. [ , Prop. . ], the coalgebraic behavioural equivalence is precisely backward bisimulation of weighted tree automata as introduced by Högberg et al. [ , Def. ].
( ) The bag functor B : Set → Set sends a set X to the set of all finite multisets (or bags) over X. This is the special case of the monoid-valued functor for the monoid (N, +, 0). Accordingly, B-coalgebras are weighted transition systems with positive integers as weights, or they may be regarded as finitely branching transition systems where multiple transitions between a pair of states are allowed. Behavioural equivalence coincides with weighted (or strong) bisimilarity.

( ) Markov chains. The finite distribution functor D ω is a subfunctor of the monoid-valued functor $\mathbb{R}^{(-)}$ for the usual monoid of addition on the real numbers. It maps a set X to the set of all finite probability distributions on X. That means that D ω X is the set of all finitely supported maps d : X → [0, 1] such that $\sum_{x \in X} d(x) = 1$. The action of D ω on maps is the same as that of $\mathbb{R}^{(-)}$. As shown by Rutten and de Vink [ ], coalgebras $c : S \to (D_\omega S + 1)^A$ are precisely Larsen and Skou's probabilistic transition systems [ ] (also known as labelled Markov chains [ ]) with the label alphabet A. In fact, for each state s ∈ S and action label a ∈ A, that state either cannot perform an a-action (when c(s)(a) ∈ 1) or the distribution c(s)(a) determines for every state t ∈ S the probability with which s transitions to t with an a-action.
Coalgebraic behavioural equivalence is precisely probabilistic bisimilarity in the sense of Larsen and Skou, see Rutten and de Vink [ , Cor. . ].

( ) Markov decision processes are systems which feature both non-deterministic and probabilistic branching. They are coalgebras for composite functors such as P ω (A × D ω (−)) or P ω (D ω (A × (−))) (simple/general Segala systems); Bartels et al. [ ] list further functors for various species of probabilistic systems.

Encodings.
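To supply coalgebras as inputs to CoPaR and in order to speak about the size of a coalgebra in terms of states and transitions, we need the following notion.

Definition . [ , Def. . ]. An encoding of a set functor F consists of a set A of labels and a family of maps $\flat_X : FX \to B(A \times X)$, one for every set X, such that the map $\langle F!, \flat_X \rangle : FX \to F1 \times B(A \times X)$ is injective. The encoding of a coalgebra $c : S \to FS$ is the map $\langle F!, \flat_S \rangle \cdot c : S \to F1 \times B(A \times S)$. The number of states and edges of a given encoded input coalgebra are $n = |S|$ and $m = \sum_{s \in S} |\flat_S(c(s))|$, respectively, where $|b| = \sum_{x \in X} b(x)$ for a bag $b : X \to \mathbb{N}$.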
An encoding of a set functor F specifies how F -coalgebras are represented as directed graphs, and the required injectivity ensures that different coalgebras have different encodings.
Example . . We recall a few key examples of encodings used by CoPaR [ ]; for the required injectivity, see [ , Prop. . ].

( ) For the finite powerset functor P ω one takes a singleton label set A = 1 and $\flat_X(U)(*, x) = 1$ if $x \in U$ and $0$ otherwise.

( ) For the monoid-valued functor $M^{(-)}$ we take labels A = M, and the map $\flat_X(t)(m, x) = 1$ if $t(x) = m \neq 0$ and $0$ otherwise.

( ) As a special case, the bag functor B has labels A = N, and the map $\flat_X(t)(n, x) = 1$ if $t(x) = n \neq 0$ and $0$ otherwise.

Remark . . ( ) Readers familiar with category theory may wonder about the naturality of encodings $\flat_X$. It turns out [ ] that in almost all instances, our encodings are not natural transformations, except for polynomial functors. As shown in op. cit., all our encodings satisfy a property called uniformity, which implies that they are subnatural transformations [ , Prop. . ].

( ) Having an encoding of a set functor F does not imply a reduction of the problem of minimizing F-coalgebras to that of coalgebras for B(A × −). In fact, the behavioural equivalence of F-coalgebras and coalgebras for B(A × −) may be very different unless $\flat_X$ is natural, which is not the case for most encodings.
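The encodings of the example above can be illustrated by a minimal Haskell sketch. It is not CoPaR's code: the names `encodeP`, `encodeM` and `edgeCount` are hypothetical, bags are represented as plain lists of (label, target) pairs, and the monoid is fixed to (Int, +, 0) purely for illustration.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | Finite powerset functor: label set A = 1, one edge labelled () per element.
-- (Hypothetical helper, not part of CoPaR.)
encodeP :: Set.Set x -> [((), x)]
encodeP u = [ ((), x) | x <- Set.toList u ]

-- | Monoid-valued functor for (Int, +, 0): label set A = Int, one edge per
-- element with non-zero weight, labelled by that weight.
encodeM :: Map.Map x Int -> [(Int, x)]
encodeM t = [ (m, x) | (x, m) <- Map.toList t, m /= 0 ]

-- | Number m of edges of an encoded coalgebra, given the encoded
-- successor structure of every state.
edgeCount :: (fs -> [(a, s)]) -> [fs] -> Int
edgeCount enc = sum . map (length . enc)
```

Under these assumptions, the edge count m of a transition system is simply the total number of successors over all states, which matches the size measure used in the definition of encodings.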
Functors in CoPaR can be combined by product, coproduct or composition, leading to modularity. But in order to automatically handle combined functors, our tool crucially depends on the ability to form products and coproducts of encodings [ , ]. We refrain from going into technical details, but note for further use that given a pair of functors $F_1$, $F_2$ with encodings $A_i$, $\flat_{X,i}$ one obtains encodings for the functors $F_1 \times F_2$ (cartesian product) and $F_1 + F_2$ (disjoint union) with the label set $A = A_1 + A_2$.
Input syntax and processing. We briefly recall the input format of CoPaR and how inputs are processed; for more details see [ , Sec. . ]. CoPaR accepts input files representing a finite F -coalgebra. The first line of an input file specifies the functor F which is written as a term according to the following grammar: where n ∈ N denotes the set {0, . . . , n − 1}, the s k are strings subject to the usual conventions for variable names (a letter or an underscore character followed by alphanumeric characters or underscore), exponents F A are written F^A, and M is one of the monoids (Z, +, 0), (R, +, 0), (C, +, 0), (P ω (64), ∪, ∅) (the monoid of 64-bit words with bitwise or), and (N, max, 0) (the additive monoid of the tropical semiring). Note that C effectively ranges over at most countable sets, and A over finite sets. A term T determines a functor F : Set → Set in the evident way, with X interpreted as the argument.
The remaining lines of an input file specify a finite coalgebra c : S → F S. Each line has the form s:␣t for a state s ∈ S, and t represents the element c(s) ∈ F S. The syntax for t depends on the specified functor F and follows the structure of the term T. After reading the functor term T, CoPaR builds a parser for the functor-specific input format and then parses the input coalgebra given in that format into an intermediate format which internally represents the encoding of the input coalgebra (Definition . ). For composite functors the parsed coalgebra then undergoes a substantial amount of preprocessing, which also affects how transitions are counted; see [ , Sec. . ] for more details.

Coalgebraic Partition Refinement
As mentioned in the introduction, the sequential partition refinement algorithm previously implemented in CoPaR is based on ideas used in the Paige-Tarjan algorithm [ ] for transition systems. However, as has been mentioned by Blom and Orzan [ ], the Paige-Tarjan algorithm carefully selects the block of states to split in each iteration, and the data structures used for this selection take a lot of memory and require modification to allow a distributed implementation. Hence, Blom and Orzan have built their distributed algorithm from a rather simple sequential partition refinement algorithm based on what Kanellakis and Smolka refer to as the naive method [ ]. We now recall this algorithm and subsequently show how it can be adapted to the coalgebraic level of generality.
Signature Refinement. Given a finite labelled transition system with the state set S, a partition on S may be presented by a function π : S → N, i.e. two states s, t ∈ S lie in the same block of the partition iff π(s) = π(t). The signature of a state s ∈ S is the set of outgoing transitions to blocks of π:

    $\mathrm{sig}_\pi(s) = \{ (a, \pi(s')) \mid s \xrightarrow{a} s' \}$.    ( )

A signature refinement step then refines π by putting s, t ∈ S into different blocks iff $\mathrm{sig}_\pi(s) \neq \mathrm{sig}_\pi(t)$. Concretely, we put $\pi_{new}(s) = \mathrm{hash}(\mathrm{sig}_\pi(s))$ using a perfect, deterministic hash function hash. The signature refinement algorithm (Fig. ) starts with a trivial initial partition on S and repeats the refinement step until the partition stabilizes, i.e. until two subsequent partitions have the same size.
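The refinement loop just described can be sketched in a few lines of Haskell. This is only an illustration under stated assumptions, not the implementation in CoPaR: states are the integers 0..n-1, labels are characters, and the perfect hash function is replaced by consecutive numbering of the distinct signatures via a map. All names (`signatures`, `refineStep`, `signatureRefinement`) are hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type State = Int
type Edge  = (State, Char, State)   -- (source, label, target)

-- | Signature of every state w.r.t. a partition p:
-- the set of pairs (label, block of the target state).
signatures :: Int -> [Edge] -> Map.Map State Int -> Map.Map State (Set.Set (Char, Int))
signatures n edges p =
  Map.fromListWith Set.union $
    [ (s, Set.empty) | s <- [0 .. n - 1] ] ++
    [ (s, Set.singleton (a, p Map.! t)) | (s, a, t) <- edges ]

-- | One refinement step. Numbering the distinct signatures consecutively
-- stands in for the perfect hash function of the text (an assumption).
refineStep :: Int -> [Edge] -> Map.Map State Int -> Map.Map State Int
refineStep n edges p = Map.map (ids Map.!) sigs
  where
    sigs = signatures n edges p
    ids  = Map.fromList (zip (Set.toList (Set.fromList (Map.elems sigs))) [0 ..])

-- | Number of blocks of a partition.
blockCount :: Map.Map State Int -> Int
blockCount = Set.size . Set.fromList . Map.elems

-- | Start with the trivial partition and refine until the number of
-- blocks no longer changes.
signatureRefinement :: Int -> [Edge] -> Map.Map State Int
signatureRefinement n edges = go (Map.fromList [ (s, 0) | s <- [0 .. n - 1] ])
  where
    go p | blockCount p' == blockCount p = p'
         | otherwise                     = go p'
      where p' = refineStep n edges p
```

Comparing block counts suffices as a stopping criterion here because every step refines the previous partition, so an unchanged size means an unchanged partition.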
Coalgebraic Signature Refinement. Regarding a labelled transition system as a coalgebra c : S → P ω (A × S) (Example . ( )), signatures are obtained by postcomposing the transition structure with the partition under the functor:

    $\mathrm{sig}_\pi = \big( S \xrightarrow{\;c\;} P_\omega(A \times S) \xrightarrow{\;P_\omega(A \times \pi)\;} P_\omega(A \times \mathbb{N}) \big)$.

[Fig. : The signature refinement algorithm. Variables: old and new partitions represented by π, π_new : S → N with sizes l, l_new, resp.; set H for counting block numbers.]

The generalisation to coalgebras for arbitrary F is immediate: the signature of a state of an F-coalgebra c : S → F S w.r.t. a partition π is given by the function $\mathrm{sig}_\pi = F\pi \cdot c$. In the refinement step of the above algorithm two states are identified by the next partition iff they currently have the same signature:

    $\pi_{new}(s) = \pi_{new}(t) \iff (F\pi \cdot c)(s) = (F\pi \cdot c)(t)$.

Hence, the algorithm in fact simply applies F(−) · c to the initial partition corresponding to the trivial quotient ! : S → 1 until stability is reached. Note that this is precisely the Final Chain Algorithm by König and Küpper [ , Alg. . ] computing behavioural equivalence of a given F-coalgebra. Its correctness thus proves correctness of the coalgebraic signature refinement, which is the algorithm in Fig. with $\mathrm{sig}_\pi = F\pi \cdot c$. Since we represent functors and their coalgebras by encodings, we use an interface to F to compute signatures based on encodings.
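To make the formula sig_π = Fπ · c concrete, here is a minimal sketch that uses Haskell's `Functor` class in place of a set functor. It is an illustration only and does not go through encodings as CoPaR does. The type `DFA` models deterministic automata over a two-letter alphabet, and both `DFA` and `finalChain` are hypothetical names. The sketch assumes that signatures in `f Int` can be compared with `Ord`; functors such as finite powerset would additionally need a normalised representation.

```haskell
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE FlexibleContexts #-}
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | Transition type of deterministic automata over the alphabet {a, b}:
-- a finality flag and one successor per input letter.
data DFA x = DFA Bool x x
  deriving (Eq, Ord, Show, Functor)

-- | Coalgebraic signature refinement: repeatedly apply F(-) . c to the
-- trivial partition until the number of blocks stabilizes.
finalChain :: (Functor f, Ord (f Int), Ord s) => [s] -> (s -> f s) -> Map.Map s Int
finalChain states c = go (Map.fromList [ (s, 0) | s <- states ]) 1
  where
    go p size
      | size' == size = p'
      | otherwise     = go p' size'
      where
        -- sig_pi = F pi . c, computed with fmap
        sigs  = Map.fromList [ (s, fmap (p Map.!) (c s)) | s <- states ]
        -- consecutive numbering replaces the perfect hash function
        ids   = Map.fromList (zip (Set.toList (Set.fromList (Map.elems sigs))) [0 ..])
        p'    = Map.map (ids Map.!) sigs
        size' = Map.size ids
```

Instantiated with `DFA`, the function identifies exactly the states accepting the same language, in line with the example on deterministic automata.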
Definition . . Given a functor F with encoding A, $\flat_X$, a signature interface consists of a function $\mathrm{sig} : F1 \times B(A \times \mathbb{N}) \to F\mathbb{N}$ such that for every finite set S and every partition π : S → N we have

    $F\pi = \mathrm{sig} \cdot \langle F!,\; B(A \times \pi) \cdot \flat_S \rangle : FS \to F\mathbb{N}$.    ( )

Given a coalgebra c : S → F S, a state s ∈ S and a partition π : S → N, the two arguments of sig should be understood as follows. The first argument is the value F!(c(s)) ∈ F1, which intuitively provides an observable output of the state s. The second argument is the bag $B(A \times \pi)(\flat_S(c(s)))$ formed by those pairs (a, n) of labels a and numbers n of blocks of the partition π to which s has an edge; that is, that bag contains one pair (a, n) for each edge $s \xrightarrow{a} s'$ where $\pi(s') = n$. Thus, when supplied with these inputs, sig correctly computes the signature of s; indeed, to see this, precompose equation ( ) with the coalgebra structure c.
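( ) The powerset functor P ω has the label set A = 1, and we define the function $\mathrm{sig} : P_\omega 1 \times B(1 \times \mathbb{N}) \to P_\omega \mathbb{N}$ by $\mathrm{sig}(z, b) = \{ n \mid b(*, n) \neq 0 \}$.

( ) The monoid-valued functor $\mathbb{R}^{(-)}$ has the label set A = R, and we define the function $\mathrm{sig} : \mathbb{R}^{(1)} \times B(\mathbb{R} \times \mathbb{N}) \to \mathbb{R}^{(\mathbb{N})}$ by $\mathrm{sig}(z, b)(n) = \sum_{r \in \mathbb{R}} b(r, n) \cdot r$.

Next we show how signature interfaces can be combined by products (×) and coproducts (+). This is the key to the modularity of the implementation (be it distributed or sequential) of the coalgebraic signature refinement in CoPaR.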
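Here, $\mathrm{pr}_i : F1 \to F_i 1$ is the projection map and $\mathrm{filter}_i : B((A_1 + A_2) \times \mathbb{N}) \to B(A_i \times \mathbb{N})$ restricts a bag to the pairs whose label lies in $A_i$.

Proposition . . The functions sig defined in Construction . yield signature interfaces for the functors $F_1 \times F_2$ and $F_1 + F_2$, respectively.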
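One plausible reading of how signature interfaces combine along products and coproducts is sketched below in Haskell. It is an assumption-laden illustration, not a transcription of the construction or of CoPaR's code: interfaces are modelled as plain functions, bags as lists, the label set A_1 + A_2 as `Either`, and every name (`Sig`, `filterL`, `filterR`, `sigProd`, `sigSum`, `sigP`, `sigB`) is hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | A signature interface as a plain function: it takes the observable
-- output (an element of F 1) and the bag of (label, block) pairs.
type Sig f1 a sig = f1 -> [(a, Int)] -> sig

-- | filter_1 and filter_2: keep only the edges whose label stems from
-- the first resp. second component of the label set A1 + A2.
filterL :: [(Either a b, Int)] -> [(a, Int)]
filterL es = [ (a, n) | (Left a, n) <- es ]

filterR :: [(Either a b, Int)] -> [(b, Int)]
filterR es = [ (b, n) | (Right b, n) <- es ]

-- | Product: project the output and filter the edges for each component.
sigProd :: Sig f1 a s1 -> Sig g1 b s2 -> Sig (f1, g1) (Either a b) (s1, s2)
sigProd sigF sigG (x, y) es = (sigF x (filterL es), sigG y (filterR es))

-- | Coproduct: the output determines which component interface is used.
sigSum :: Sig f1 a s1 -> Sig g1 b s2 -> Sig (Either f1 g1) (Either a b) (Either s1 s2)
sigSum sigF _    (Left x)  es = Left  (sigF x (filterL es))
sigSum _    sigG (Right y) es = Right (sigG y (filterR es))

-- | Two basic interfaces for experimentation: finite powerset and bags.
sigP :: Sig () () (Set.Set Int)
sigP _ = Set.fromList . map snd

sigB :: Sig () Int (Map.Map Int Int)
sigB _ es = Map.fromListWith (+) [ (n, k) | (k, n) <- es ]
```

For instance, `sigProd sigP sigB` computes signatures for a product of the powerset and bag functors by splitting the incoming edges according to their label component.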
As a consequence of this result, it suffices to implement signature interfaces only for basic functors according to the grammar in ( ), i.e. the trivial identity and constant functors as well as the functors P ω , B, D ω and the supported monoid-valued functors M (−) . Signature interfaces of products, coproducts and exponents, being a special form of product, are derived using Construction . . Functor composition can be reduced to these constructions by a technique called desorting [ , Sec. . ], which transforms a coalgebra of a composite functor into a coalgebra for a coproduct of basic functors whose signature interfaces can then be combined by + (see also [ , Sec. . ]). As for the previous Paige-Tarjan style algorithm, this leads to the modularity in the functor of the coalgebraic signature refinement algorithm: signature interfaces for composed functors are automatically derived in CoPaR. Moreover, a new basic functor F may be added by implementing a signature interface for F , effectively extending the grammar of supported functors in ( ) by a clause F T .

The Distributed Algorithm
Our distributed algorithm for coalgebraic signature refinement is a generalization of Blom and Orzan's original algorithm [ ] to coalgebras. We highlight differences to op. cit. at the end of this section.
We assume a distributed high-bandwidth cluster of W workers w 1 , . . . , w W that is failure-free, i.e. nodes do not crash, messages do not get lost and between two nodes the order of messages is preserved. The communication is based on non-blocking send operations and blocking receive operations. Messages are triples of the form (from, to, data), where the data field may be structured and will often contain a tag to simplify interpretation.
Description. The distributed algorithm is based on the sequential algorithm presented in Fig. , using a distributed hashtable to keep track of the partition. As for the sequential algorithm, the input consists of an F-coalgebra (S, c) with |S| = n states. We split the state space evenly among the workers as a preprocessing step. We write $S_i$ with $|S_i| = n/W$ for the set of states of worker $w_i$. The input for worker $w_i$ is the encoding of that part of the transition structure of the input coalgebra which is needed to compute the signatures of the states in $S_i$. This information is presented to $w_i$ as the list of all outgoing edges of states of $S_i$ in the encoding of the coalgebra (S, c), i.e. the list of all $s \xrightarrow{a} t$ with $s \in S_i$ (cf. Definition . ). We refer to the block number π(s) of a state s ∈ S as its ID.
After processing the input, the algorithm runs in two phases. In the Initialization Phase (Fig. ) the workers exchange update demands about the IDs stored in the distributed hashtable. If $w_i$ has an edge $s \xrightarrow{a} s'$ into some state $s'$ of $w_j$, then during refinement $w_i$ needs to be kept up to date about the ID of $s'$ and thus instructs $w_j$ to do so. Worker $w_j$ remembers this information by storing $w_i$ in the set $\mathrm{In}_{s'} = \{ w_i \mid \exists s \in S_i, a \in A.\ s \xrightarrow{a} s' \}$ of incoming edges of $s'$ (lines -). Hence, for each edge $s \xrightarrow{a} s'$ with $s \in S_i$ and $s' \in S_j$, worker $w_i$ sends a message to $w_j$, informing $w_j$ to add $w_i$ to $\mathrm{In}_{s'}$ (lines -).
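The outcome of the Initialization Phase can be sketched sequentially, without any message passing. The following Haskell fragment is a simplified illustration under assumptions: states are 0..n-1, workers are 0..W-1, and the hypothetical function `ownerOf` distributes states in contiguous chunks, which is only one possible way to split the state space evenly.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type State  = Int
type Worker = Int

-- | Hypothetical block distribution of n states over w workers:
-- worker i owns a contiguous chunk of about n/w states.
ownerOf :: Int -> Int -> State -> Worker
ownerOf n w s = min (w - 1) (s `div` chunk)
  where chunk = max 1 ((n + w - 1) `div` w)

-- | For every state t with at least one incoming edge, the set of workers
-- owning a predecessor of t. In the distributed setting, the owner of t
-- collects this set from the messages sent by those workers.
inSets :: Int -> Int -> [(State, a, State)] -> Map.Map State (Set.Set Worker)
inSets n w edges =
  Map.fromListWith Set.union
    [ (t, Set.singleton (ownerOf n w s)) | (s, _, t) <- edges ]
```

Because the result is a set per state, a worker appears at most once for a given target state, however many of its states have an edge into that target.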
[Fig. : Variables: set V of visited states; process count d; for each $s \in S_i$ a list $\mathrm{In}_s$ of workers with an edge into s.]

The main phase is the Refinement Phase (Fig. ), mimicking the refinement loop of the undistributed algorithm. In each iteration all workers compute their part of the new partition, i.e. the IDs $h_s = \mathrm{hash}(\mathrm{sig}_\pi(s))$ for each of their states $s \in S_i$ (line ). In addition, every worker $w_i$ is responsible for sending the computed ID of $s \in S_i$ to workers in $\mathrm{In}_s$ that need it for computation of their own signatures in the next iteration (lines -). The IDs are also sent to a designated worker $\mathrm{counterOf}(h_s)$ (lines -). This ensures that IDs are counted precisely once at the end of the round when the partition size is computed after all messages have been received (lines -). The actual counting (line ) is a Fig. ]. Finally, the workers synchronize before starting the next iteration (line ). The refinement phase stops if two consecutive partitions have the same size (line ).

Correctness. The Initialization Phase (Fig. ) terminates since every worker reaches line , sends DONE to all workers and thus also receives it (lines -) a total of W times, allowing it to progress past line . An analogous argument proves termination of every iteration of the Refinement Phase (Fig. ). The sequential algorithm is correct, hence we know the loop of the refinement phase terminates when all IDs are computed and counted correctly, since then the distributed and the sequential algorithm compute precisely the same partitions.
To show that the signatures are computed correctly, we note that if all DONE messages have been received in a round, then, by order-preservation of messages, all messages sent previously in this round have also been received. This ensures that no workers are missing from the lists In s computed in the Initialization Phase and that during the Refinement Phase new IDs are sent to all concerned workers (Fig. , lines -). This establishes correctness of the signature computation, and the signatures coincide on all workers since we assume that the hash function is deterministic. Finally, the use of the counterOf function (line ) ensures that each ID is included in the counting set of exactly one worker. Thus, the distributed sum of the sizes of all counting sets is equal to the size of the partition.
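The counting argument can be checked on a small sequential model. The sketch below is an illustration only: it assumes that `counterOf` maps an ID to a worker by taking the ID modulo the number of workers, which is just one possible choice, and the names `countingSet` and `partitionSize` are hypothetical.

```haskell
import qualified Data.Set as Set

type Worker = Int
type ID     = Int

-- | Assumed assignment of IDs to counting workers (any deterministic
-- function from IDs to workers would do).
counterOf :: Int -> ID -> Worker
counterOf w h = h `mod` w

-- | Counting set of worker j: the distinct IDs it is responsible for
-- among all IDs sent in one round.
countingSet :: Int -> Worker -> [ID] -> Set.Set ID
countingSet w j ids = Set.fromList [ h | h <- ids, counterOf w h == j ]

-- | Distributed sum of the sizes of all counting sets.
partitionSize :: Int -> [ID] -> Int
partitionSize w ids = sum [ Set.size (countingSet w j ids) | j <- [0 .. w - 1] ]
```

Since `counterOf` is a function, every ID lands in exactly one counting set, so the sum agrees with the number of distinct IDs regardless of how often an ID is sent.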
Complexity. Let us assume that not only states, but also outgoing transitions are distributed evenly among the workers, i.e. every worker has about m/W outgoing transitions. In the Initialization Phase, the loop sending messages runs in O(m/W) and receiving takes O(W · n/W) = O(n), since for worker $w_i$ every other worker $w_j$ might have an edge into every state in $S_i$. Both are executed in parallel, so in total the phase runs in O(max(m/W, n)) = O(m/W + n). In the Refinement Phase, we assume that the run time of computing signatures and their hashes is linear in the number of edges. Then the loop for computing and hashing (O(m/W)) and counting (O(n/W)) signatures runs in total in O((m + n)/W), since it is performed by all workers independently. Each worker receives at most m/W ID updates in each round and the partition size is computable in O(W), giving a complexity of O((m + n)/W) for one refinement step. As many as n iterations might be needed, for a total run time of O(n · (m + n)/W).

Remark . . The above analysis assumes that signature interfaces are implemented with a linear run time in their input bag. This could in fact be theoretically realized for all basic functors (whence also for their combinations) currently implemented in CoPaR, which would involve using bucket sort for the grouping of bag elements by the target block (second component), e.g. for monoid-valued functors. However, since the table used in bucket sort would be very large (the size of the last partition) and saving memory is our main motivation, we opted for an implementation using a standard n log n sorting algorithm instead.

Implementation details.
CoPaR is implemented in Haskell. We were able to reuse, with only minor adjustments, major parts of the code base of CoPaR dedicated to the representation and processing of coalgebras. This includes the implemented functors and their encodings together with the corresponding parser and preprocessing algorithms (see Section ). As explained in Section , the sequential Paige-Tarjan-style algorithm of CoPaR was not used; we implemented an additional "algorithmic frontend" to our "coalgebraic backend". To compute signatures during the Refinement Phase, each functor implements the signature interface (Definition . ), which is written in Haskell as follows:

    class Hashable (Signature f) => SignatureInterface f where
      type Signature f :: Type
      sig :: F1 f -> [(Label f, Int)] -> Signature f

We require in the second line a type Signature f that serves as an implementation-specific datatype representation of F N. In the type of sig, the types f, Label f and F1 f correspond to the name of F, its label type and the set F 1, respectively.

Example . . The Haskell implementation of the signature interface for the finite powerset functor P ω from Example . ( ) is as follows:

    data P x = P x               -- already defined in CoPaR
    type instance Label P = ()   -- also already defined

    instance SignatureInterface P where
      type Signature P = Set Int
      sig :: F1 P -> [((), Int)] -> Set Int
      sig _ = setFromList . map snd

Signature interfaces for the other basic functors according to the grammar in ( ) are implemented similarly. For combined functors CoPaR automatically derives their signature interface based on Construction . .

In the algorithm itself, each worker runs three threads in parallel: the first thread is for computing, the second one is for sending and the third one is for receiving signatures. This allows us to keep calls to the MPI interface separated from (pure) signature computation, simplifying the logic and allowing the workers to scatter the ID of one state while simultaneously computing the signature of the next one, so that neither signature computation nor network traffic becomes a bottleneck. For inter-thread communication and synchronization we rely on Haskell's software transactional memory [ ] to ease concurrent programming, e.g. to avoid race conditions.

Comparison to Blom and Orzan's algorithm. We now discuss a few differences of our algorithm to Blom and Orzan's original one [ ].
In Blom and Orzan's algorithm for LTSs the sets $\mathrm{In}_s$ of $s \in S_i$ are in fact lists and contain worker $w_k$ a total of r times if there exist r edges from states in $S_k$ to s. This induces a redundancy in messages of ID updates, since $w_i$ sends r (instead of one) messages with the ID of s to $w_k$. If the LTS has an average fanout of f then each worker has t = n/W · f outgoing transitions; this is the number of ID updates received every round. Since there are only n states, at most a fraction n/t = W/f of those messages is necessary. In our scenario, we have W ≪ f for large coalgebras, hence the overhead becomes massive; e.g. for W = 10 and f = 100, already 90% of all ID messages are redundant. We use sets instead of lists for $\mathrm{In}_s$ to avoid this redundancy.
Another difference of our implementation is that we decided to hash the signatures directly on the workers of the respective states, while Blom and Orzan decided to first send the signatures to some dedicated hashing worker who is then (uniquely) responsible for hashing, i.e. computing a new ID. This method allows new IDs to be computed in constant time. However, for more complex functors supported by CoPaR, sending signatures could result in very large messages, so we opted for minimizing network traffic at the cost of slower signature computation.

Evaluation
To illustrate the practical utility and scalability of the algorithm and its implementation in CoPaR, we report on a number of benchmarks performed on a selection of randomly generated and real world data. In previous evaluations of sequential CoPaR [ ], we were limited by the GB RAM of a standard workstation. Here we demonstrate that our distributed implementation fulfills its main objective of handling larger systems without lifting the memory restriction per process. All benchmarks were run on a high performance computing cluster consisting of nodes with two Xeon v "Ivy Bridge" chips ( cores per chip + SMT) with . GHz clock rate and GB RAM. The nodes are connected by a fat-tree InfiniBand interconnect fabric with GBit/s bandwidth. Unless stated otherwise, execution runs were performed using workers on nodes, resulting in worker processes per node. No process used more than GB RAM. Execution times of the sequential algorithm were taken using one node of the cluster. No times are given for executions that ran out of GB memory previously [ ]; those were not run on the cluster.
Weighted Tree Automata. In previous work [ ], we have determined the size of the largest weighted tree automata for different parameters that the sequential version of CoPaR could handle in GB of RAM. Here, we demonstrate that the distributed version can indeed overcome these memory constraints and process much larger inputs.
Recall from Example . that weighted tree automata are coalgebras for the functor $FX = M \times M^{(\Sigma X)}$. For these benchmarks, we use $\Sigma X = 4 \times X^r$ with rank $r \in \{1, \ldots, 5\}$ and the monoids $(2, \vee, 0)$ (available as the finite powerset functor in CoPaR), $(\mathbb{N}, \max, 0)$ and $(\mathcal{P}_\omega(64), \cup, \emptyset)$. To generate a random automaton with $n$ states, we uniformly chose $k = 50 \cdot n$ transitions from the set of all possible transitions (using an efficient sampling algorithm by Vitter [ ]), resulting in a coalgebra encoding with $n' = 51 \cdot n$ states and $m = (r + 1) \cdot k$ edges. We took care to restrict the state and transition weights to at most different monoid elements in each example, to avoid the situation where all states are already distinguished in the first iteration of the algorithm.

Table lists results for both the sequential and the distributed implementation when run on the same input. These are the largest WTAs for their respective rank and monoid that sequential CoPaR could handle using at most GB of RAM [ ]. In contrast, the distributed implementation uses less than GB per worker for those examples and is thus able to handle much larger inputs. Incidentally, the distributed implementation is also faster despite the overhead incurred by network communication. This can partly be attributed to the input-parsing stage, which does not need inter-worker synchronization and is thus perfectly parallelizable.
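The size arithmetic for the generated inputs can be checked with a short sketch. It assumes, as the counts of 51 · n states and (r + 1) · k edges suggest, that every sampled transition contributes one additional state to the encoding together with r + 1 edges; the function name and its parameters are hypothetical.

```python
def encoding_size(n: int, rank: int, transitions_per_state: int = 50):
    """Return (states, edges) of the encoded coalgebra for a random WTA.

    Assumption: each of the k sampled transitions adds one intermediate
    state and rank + 1 edges to the encoding.
    """
    k = transitions_per_state * n   # number of sampled transitions
    num_states = n + k              # original states plus one per transition
    num_edges = (rank + 1) * k      # rank + 1 edges per transition
    return num_states, num_edges

states, edges = encoding_size(1000, rank=3)
```

For an automaton with 1000 states and rank 3, this yields 51000 states and 200000 edges in the encoding.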
To test the scaling properties of the distributed algorithm, we ran CoPaR with the same input WTA but a varying number of worker processes. For this we chose the WTA for the monoid $(2, \vee, 0)$ with $\Sigma X = 4 \times X^5$ having states with transitions and file size MB. The figure on the right above depicts the maximum memory usage per worker and the overall running time. The results show that both measures scale well with up to workers. While the running time even increases when using up to workers, the memory usage per worker (the main motivation for this work) continues to decrease significantly.

Table : Maximally manageable WTAs for sequential CoPaR; "Mem." and "Time" are the memory and time required for the distributed algorithm and are the maximum over all workers. "Seq. Time" is the time needed by sequential CoPaR.
PRISM Models. Finally, we show how our distributed partition refinement implementation performs on models from the benchmark suite [ ] of the PRISM model checker [ ]. These model (aspects of) real-world protocols and are thus a good fit to evaluate how CoPaR performs on inputs that arise in practice. Specifically, we use the fms and wlan_time_bounded families of systems. These are continuous-time Markov chains, regarded as coalgebras for $FX = \mathbb{R}^{(X)}$, and Markov decision processes, regarded as coalgebras for $FX = \mathbb{N} \times \mathcal{P}_\omega(\mathbb{N} \times (\mathcal{D}_\omega X))$, respectively. Again, our translation to coalgebras took care to force a coarse initial partition in the algorithm. The results in Table show that the distributed implementation is again able to handle larger systems than sequential CoPaR in GB of RAM per process. For the fms benchmarks, the distributed implementation is again faster than the sequential one. However, this is not the case for the wlan examples. The larger run times might be explained by the much higher number of iterations of the refinement phase (i-column of the table). This means that only a few states are distinguished in each iteration, and thus signatures are recomputed more often and more network traffic is incurred.

Table : Benchmarks on PRISM models: $n$ and $m$ are the numbers of states and edges of the input coalgebra; $i$ is the number of refinement steps (iterations). The other columns are analogous to Table .

Conclusions and Future Work
We have presented a new and simple partition refinement algorithm in coalgebraic genericity which easily lends itself to a distributed implementation. Our algorithm is based on König and Küpper's final chain algorithm [ ] and Blom and Orzan's signature refinement algorithm for labelled transition systems [ ]. We have provided a distributed implementation in the tool CoPaR. Like the previous sequential Paige-Tarjan style partition refinement algorithm, our new algorithm is modular in the system type. This is made possible by combining signature interfaces by product and coproduct, which is used by CoPaR for handling combined type functors. Experimentation has shown that with the distributed algorithm CoPaR can handle larger state spaces in general. Run times stay low for weighted tree automata, whereas we observed severe penalties on some models from the PRISM benchmark suite. An additional optimization of the coalgebraic signature refinement algorithm should be possible using Blom and Orzan's idea [ ] to mark in each iteration those states whose signatures can change in the next iteration and only recompute signatures for those states in the next round. This might mitigate the run time penalties we have seen in some of the PRISM benchmarks.
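The marking idea can be sketched for plain labelled transition systems as follows. This is a minimal sequential illustration under assumptions, neither Blom and Orzan's distributed algorithm nor a coalgebraic version for CoPaR; all names are hypothetical. When a block splits, one part keeps the old ID, so only the predecessors of states whose ID actually changed are marked for recomputation.

```python
def refine_with_marking(states, edges):
    """Signature refinement on an LTS that recomputes only marked states.

    edges is a list of (source, label, target) triples. Returns the final
    block ID of every state and the number of signature computations.
    """
    succ = {s: [] for s in states}
    pred = {s: set() for s in states}
    for src, label, dst in edges:
        succ[src].append((label, dst))
        pred[dst].add(src)

    block = {s: 0 for s in states}   # initially all states share one block
    sig = {}
    next_id = 1
    dirty = set(states)              # states whose signature may have changed
    recomputed = 0

    while dirty:
        # Recompute signatures only for marked states.
        for s in dirty:
            sig[s] = frozenset((a, block[t]) for a, t in succ[s])
            recomputed += 1

        # Group the states of every block by their (possibly cached) signature.
        groups = {}
        for s in states:
            groups.setdefault(block[s], {}).setdefault(sig[s], []).append(s)

        # Split: the first part keeps the old ID, the others get fresh IDs.
        changed = set()
        for by_sig in groups.values():
            for part in list(by_sig.values())[1:]:
                for s in part:
                    block[s] = next_id
                    changed.add(s)
                next_id += 1

        # Mark exactly the predecessors of states whose ID changed.
        dirty = {p for s in changed for p in pred[s]}

    return block, recomputed

# Hypothetical example: a chain 0 -a-> 1 -a-> 2 -a-> 3.
chain_blocks, chain_work = refine_with_marking(
    [0, 1, 2, 3], [(0, "a", 1), (1, "a", 2), (2, "a", 3)]
)
```

On the chain, only one state is marked in each round after the first, which is the kind of input where many iterations each distinguish few states.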
Further work on CoPaR concerns symbolic techniques: we have a prototype sequential implementation of the coalgebraic signature refinement algorithm where state spaces are represented using BDDs. In a subsequent step it could be investigated whether this can be distributed. In another direction the distributed algorithm might be extended to compute distinguishing formulas, as recently achieved for the sequential algorithm [ ], for which there is also an implemented prototype. Finally, there is still work required to integrate all these new features, i.e. distribution, distinguishing formulas, reachability and computation of minimized systems, into one version of CoPaR.

Data Availability Statement
The software CoPaR and the input files that were used to produce the results in this paper are available for download [ ]. The latest version of CoPaR can be obtained at https://git .cs.fau.de/software/copar.

A Omitted Details
First, let us recall from previous work how encodings can be combined by product and coproduct:

Proposition A. [ , Prop. . ]. For a pair of functors $F_1$, $F_2$ with encodings $A_i$, $\flat_{X,i}$, $i = 1, 2$, we have the following encodings with label set $A = A_1 + A_2$: ( ) for the product functor $F = F_1 \times F_2$ we take where $\mathrm{in}_i : A_i \to A_1 + A_2$ and $\mathrm{pr}_i : F_1 X \times F_2 X \to F_i X$, $i = 1, 2$, denote the canonical coproduct injections and product projections, respectively. ( ) for the coproduct functor $F = F_1 + F_2$ we take

In the following proof we work with finite products and coproducts in lieu of binary ones. Given a finite index set $I$ and a family $X_i$, $i \in I$, of sets we denote their product and the canonical projection maps by $\prod_{j \in I} X_j$ For every family of maps $f_i : X_i \to Y_i$, $i \in I$, we have the product map The coproduct (disjoint union) of the $X_i$ and the canonical injection maps are denoted by

Remark A. . ( ) Note that for every family of sets $X_i$, $i \in I$, we clearly have a canonical isomorphism $B(\coprod_i X_i) \cong \prod_i B X_i$ mapping a bag $b : \coprod_i X_i \to \mathbb{N}$ to the family of bags $b \cdot \mathrm{in}_i : X_i \to \mathbb{N}$, $i \in I$. Observe that the $i$th component of this isomorphism is a filtering map (cf. Construction . ):

for the product and coproduct functors. First recall from Proposition A. the encodings of these functors. Let $I$ be some finite index set and let $F_i$, $i \in I$, be a family of functors with encodings $A_i$, $\flat_{X,i}$, and put $A = \coprod_{i \in I} A_i$.
( ) For the product functor $F = \prod_{i \in I} F_i$ note first that the encoding can be rewritten element-free as follows: where the isomorphism arises from the canonical one in Remark A. ( ). We now obtain the desired equation in Definition . by a simple diagram chase: Note that the horizontal isomorphisms labelled $\cong$ reorder factors of the product using the canonical isomorphisms $B(A \times X) \cong \prod_i B(A_i \times X)$ for $X = S$ and $X = \mathbb{N}$, respectively, making the right-hand square commute due to the naturality of the isomorphisms involved. The upper part commutes using ( ). Similarly, the lower part is the definition of sig in element-free form. The left-hand square commutes by the assumption on the $\mathrm{sig}_i$. Thus, the outside commutes, which yields the desired equation.

( ) For the coproduct functor $F = \coprod_i F_i$ we proceed by case distinction. More precisely, we verify that the desired equation holds when precomposed by every injection map $\mathrm{in}_i : F_i S \to \coprod_i F_i S = FS$. Note first that the $i$th coproduct component of the encoding $\flat_X$ is Again, we conclude by a simple diagram chase: The desired equation is the commutativity of the rectangle in the middle. The upper part commutes by considering the product components separately and using ( ) for the right-hand one. The left- and right-hand parts clearly commute. The lower part commutes due to the definition of sig, and the lower right-hand triangle commutes by ( ). Finally, the outside commutes by the assumption on the $\mathrm{sig}_i$. It follows that the desired inner rectangle commutes when precomposed by $\mathrm{in}_i$, which completes the proof.
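The canonical isomorphism between bags on a disjoint union and families of bags, which is used repeatedly in the proof above, can be illustrated with a small sketch. It models bags as `collections.Counter` objects and elements of the disjoint union as pairs (index, element); the function names are hypothetical and the code is only meant as an illustration of the bijection, not as part of CoPaR.

```python
from collections import Counter

def split_bag(bag, index_set):
    """Map a bag on the disjoint union to the family of its filtered bags.

    The component for index i keeps exactly the elements tagged with i,
    corresponding to precomposition with the i-th injection.
    """
    parts = {i: Counter() for i in index_set}
    for (i, x), multiplicity in bag.items():
        parts[i][x] += multiplicity
    return parts

def merge_bags(parts):
    """Inverse direction: reassemble a family of bags into one tagged bag."""
    bag = Counter()
    for i, part in parts.items():
        for x, multiplicity in part.items():
            bag[(i, x)] += multiplicity
    return bag

# Hypothetical bag over the disjoint union of two copies of {"a", "b"}.
bag = Counter({(1, "a"): 2, (2, "a"): 1, (2, "b"): 3})
family = split_bag(bag, index_set=[1, 2])
round_trip = merge_bags(family)
```

Splitting and then merging returns the original bag, which reflects that the two maps are mutually inverse.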