From Generic Partition Refinement to Weighted Tree Automata Minimization

Partition refinement is a method for minimizing automata and transition systems of various types. Recently, we have developed a partition refinement algorithm that is generic in the transition type of the given system and matches the run time of the best known algorithms for many concrete types of systems, e.g. deterministic automata as well as ordinary, weighted, and probabilistic (labelled) transition systems. Genericity is achieved by modelling transition types as functors on sets, and systems as coalgebras. In the present work, we refine the run time analysis of our algorithm to cover additional instances, notably weighted automata and, more generally, weighted tree automata. For weights in a cancellative monoid we match, and for non-cancellative monoids such as (the additive monoid of) the tropical semiring even substantially improve, the asymptotic run time of the best known algorithms. We have implemented our algorithm in a generic tool that is easily instantiated to concrete system types by implementing a simple refinement interface. Moreover, the algorithm and the tool are modular, and partition refiners for new types of systems are obtained easily by composing pre-implemented basic functors. Experiments show that even for complex system types, the tool is able to handle systems with millions of transitions.


Introduction
Minimization is a basic verification task on state-based systems, concerned with reducing the number of system states as far as possible while preserving the system behaviour. This can be done by identifying states that exhibit the same behaviour. Hence, it can be used for equivalence checking of systems, and constitutes a preprocessing step in further system analysis tasks, such as model checking.
Notions of equivalent behaviour typically vary quite widely even on fixed system types [vG01]. We work with various notions of bisimilarity, i.e. with branching-time equivalences. Classically, bisimilarity for labelled transition systems obeys the principle "states x and y are bisimilar if for every transition x → x′, there exists a transition y → y′ with x′ and y′ bisimilar". It is thus given via a fixpoint definition, to be understood as a greatest fixpoint, and can therefore be iteratively approximated from above. This is the principle behind partition refinement algorithms: initially all states are tentatively considered equivalent, and then this initial partition is iteratively refined according to observations made on the states until a fixpoint is reached. Unsurprisingly, such procedures run in polynomial time. The comparative tractability of bisimilarity (in contrast, e.g. trace equivalence and language equivalence of non-deterministic systems are PSPACE-complete [KS90]) makes minimization under bisimilarity interesting even in cases where the main equivalence of interest is linear-time, such as word automata. Kanellakis and Smolka [KS90] in fact provide a minimization algorithm with run time O(m · n) for ordinary transition systems with n states and m transitions. However, even faster partition refinement algorithms running in O((m + n) · log n) have been developed for various types of systems over the past 50 years. For example, Hopcroft's algorithm minimizes deterministic automata for a fixed input alphabet A in O(n · log n) [Hop71]; it was later generalized to variable input alphabets, with run time O(n · |A| · log n) [Gri73,Knu01]. The Paige-Tarjan algorithm minimizes transition systems in time O((m + n) · log n) [PT87], and generalizations to labelled transition systems have the same time complexity [HT92,DHS03,Val09].
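To make the fixpoint principle concrete, here is a minimal Python sketch of naive partition refinement for unlabelled transition systems. It is an illustration only: it is neither the Paige-Tarjan algorithm nor the generic algorithm discussed in this paper, and it makes no attempt at the O((m + n) · log n) bound. The function name `naive_bisimilarity` and the example system `ts` are assumptions made for this sketch.

```python
def naive_bisimilarity(succ):
    """Coarsest bisimulation partition of a finite transition system.

    `succ` maps every state to the set of its successors. The result maps
    every state to a block id; states share an id iff they are bisimilar.
    """
    block_of = {x: 0 for x in succ}  # initially all states are tentatively equivalent
    while True:
        # Observation made on x: its current block and the set of blocks it reaches.
        signature = {
            x: (block_of[x], frozenset(block_of[y] for y in succ[x]))
            for x in succ
        }
        ids = {}
        refined = {x: ids.setdefault(signature[x], len(ids)) for x in succ}
        # The signature contains the old block, so blocks are only ever split;
        # an unchanged block count therefore means that a fixpoint is reached.
        if len(ids) == len(set(block_of.values())):
            return refined
        block_of = refined


# Two copies of a three-state chain, plus a state with a self-loop.
ts = {
    "a": {"b"}, "b": {"c"}, "c": set(),
    "d": {"e"}, "e": {"f"}, "f": set(),
    "g": {"g"},
}
partition = naive_bisimilarity(ts)
```

Each round recomputes the observation of every state, which is what the faster algorithms avoid by propagating only the effect of a split to the predecessors of the states concerned.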
Minimization of weighted systems is typically called lumping in the literature; Valmari and Franceschinis [VF10] exhibit a simple O((m + n) · log n) lumping algorithm for systems with rational weights.
In earlier work [DMSW17,WDMS20] we have developed an efficient generic partition refinement algorithm that can be easily instantiated to a wide range of system types, most of the time either matching or improving the previous best run time. The genericity of the algorithm is based on modelling state-based systems as coalgebras following the paradigm of universal coalgebra [Rut00], in which the branching structure of systems is encapsulated in the choice of a functor, the type functor. This allows us to cover not only classical relational systems and various forms of weighted systems, but also to combine existing system types in various ways, e.g. nondeterministic and probabilistic branching. Our algorithm uses a functor-specific refinement interface that supports a graph-based representation of coalgebras. It allows for a generic complexity analysis, and indeed the generic algorithm has the same asymptotic complexity as the above-mentioned specific algorithms. For Segala systems [Seg95] (systems that combine probabilistic and non-deterministic branching, also known as Markov decision processes), it matches the run time of a recent algorithm [GVdV18] discovered independently and almost at the same time as ours, and improves on the run time of the previously best algorithm [BEM00].
The new contributions of the present paper are twofold. On the theoretical side, we show how to instantiate our generic algorithm to weighted systems with weights in a monoid (generalizing the group-weighted case considered previously [DMSW17,WDMS20]). We then refine the complexity analysis of the algorithm, making the complexity of the implementation of the type functor a parameter p(c), where c is the input coalgebra. In the new setup, the previous analysis becomes the special case where p(c) = 1. Under the same structural assumptions on the type functor and the refinement interface as previously, our algorithm runs in time O(m · log n · p(c)) for an input coalgebra c with n states and m transitions. Instantiated to the case of weighted systems over non-cancellative monoids (with p(c) = log(m) where m is the number of transitions in c), such as the additive monoid (N, max, 0) of the tropical semiring, the run time of the generic algorithm is O(m · log² m), thus markedly improving the run time O(m · n) of previous algorithms for weighted automata [Buc08] and, more generally, (bottom-up) weighted tree automata [HMM07] for this case. In addition, for cancellative monoids, we again essentially match the complexity of the previous algorithms [Buc08,HMM07].
Our second main contribution is a generic and modular implementation of our algorithm, the Coalgebraic Partition Refiner (CoPaR). Instantiating CoPaR to coalgebras for a given functor only requires implementing the refinement interface. We provide such implementations for a number of basic type functors, e.g. for non-deterministic, weighted, or probabilistic branching, as well as (ranked) input and output alphabets or output weights. In addition, CoPaR is modular: for any type functor obtained by composing basic type functors for which a refinement interface is available, CoPaR automatically derives an implementation of the refinement interface. We explain in detail how this modularity is realized in our implementation and, extending Valmari and Franceschinis's ideas [VF10], we explain how the necessary data structures need to be implemented so as to realize the low theoretical complexity. We thus provide a working efficient partition refiner for all the above-mentioned system types. In particular, our tool is, to the best of our knowledge, the only available implementation of partition refinement for many composite system types, notably for weighted (tree) automata over non-cancellative monoids. The tool, including source code and evaluation data, is available at https://git8.cs.fau.de/software/copar. The present paper is an extended and completely reworked version of a previous conference paper [DMSW19]. It includes full proofs, additional benchmarks, and more extensive examples and explanations. Moreover, we formally show how refinement interfaces can be combined along products of functors (Proposition 3.14 and Section 3.5). We have optimized the memory consumption of our implementation, which has led to better performance in the benchmarks on weighted tree automata (Table 2).
Organization. The material is structured as follows. In Section 2 we recall the necessary technical background and the modelling of state-based systems as coalgebras. In Section 3, we describe the tool and the underlying algorithm, discussing in particular tool usage and implementation, the generic interface, and the modularity principles that we employ. Some concrete instantiations are exhibited in Section 4. We then go on to elaborate the case of weighted systems in more detail, giving a refinement interface for the basic underlying functor of such systems in Section 5, and showing in Section 6 how to cover weighted tree automata (which arise by combination of weighted systems and ranked alphabets) by means of our modularity principles. Benchmarks are presented in Section 7.

Preliminaries: Universal Coalgebra
Our algorithmic framework [WDMS20] is based on modelling state-based systems abstractly as coalgebras for a (set) functor that encapsulates the transition type, following the paradigm of universal coalgebra [Rut00]. We proceed to recall standard notation for sets and maps, as well as basic notions and examples in coalgebra. Occasional comments assume familiarity with basic notions of category theory (e.g. [Awo10]), but the few concepts needed for the main development are explained in full. We fix a singleton set $1 = \{*\}$; for every set $X$ we have a unique map $! : X \to 1$. We denote composition of maps by $(-) \cdot (-)$, in applicative order. We denote the disjoint union (in categorical terms, the coproduct) of sets $A$, $B$ by $A + B$, where we write $\mathsf{inl} : A \to A + B$ and $\mathsf{inr} : B \to A + B$ for the canonical injections; the disjoint union, or coproduct, of a family $(X_j)_{j \in J}$ of sets is denoted by $\coprod_{j \in J} X_j$. Similarly, we write $\prod_{j \in J} X_j$ for the (cartesian) product of a family of sets. Injection maps of disjoint unions and projection maps of products, respectively, are denoted by
$$\mathsf{in}_i : X_i \to \coprod_{j \in J} X_j \qquad\text{and}\qquad \mathsf{pr}_i : \prod_{j \in J} X_j \to X_i.$$
Given two maps $f : A \to X$ and $g : A \to Y$, we write $\langle f, g \rangle : A \to X \times Y$ for the map $a \mapsto (f(a), g(a))$.
Similarly, for a family of maps $(f_i : A \to X_i)_{i \in I}$, we write $\langle f_i \rangle_{i \in I} : A \to \prod_{i \in I} X_i$ for the map $a \mapsto (f_i(a))_{i \in I}$. We model the transition type of state-based systems using functors. Informally, a functor $F$ assigns to a set $X$ a set $FX$, whose elements are thought of as structured collections over $X$, and an $F$-coalgebra is a map $c$ assigning to each state $x$ in a system a structured collection $c(x) \in FX$ of successors. The most basic example is that of transition systems, where $F$ is powerset, so a coalgebra assigns to each state a set of successors. Formal definitions are as follows.
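Definition 2.1. (1) A functor $F : \mathsf{Set} \to \mathsf{Set}$ assigns to each set $X$ a set $FX$, and to each map $f : X \to Y$ a map $Ff : FX \to FY$, preserving identities and composition ($F\,\mathsf{id}_X = \mathsf{id}_{FX}$ and $F(g \cdot f) = Fg \cdot Ff$).
(2) An $F$-coalgebra $(C, c)$ consists of a set $C$ of states and a transition structure $c : C \to FC$.
(3) A morphism $h : (C, c) \to (D, d)$ of $F$-coalgebras is a map $h : C \to D$ that preserves the transition structure, i.e. $Fh \cdot c = d \cdot h$.
(4) Two states $x, y \in C$ of a coalgebra $c : C \to FC$ are behaviourally equivalent (notation: $x \sim y$) if there exists a coalgebra morphism $h$ such that $h(x) = h(y)$.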
As above, we usually use the letters X and Y for sets (without structure) and C or D for state sets of coalgebras.
Example 2.2. (1) The finite powerset functor $P_\omega$ maps a set $X$ to the set $P_\omega X$ of all finite subsets of $X$, and a map $f : X \to Y$ to the map $P_\omega f = f[-] : P_\omega X \to P_\omega Y$ taking direct images. $P_\omega$-coalgebras are finitely branching (unlabelled) transition systems, and two states are behaviourally equivalent iff they are bisimilar in the sense of Milner [Mil80] and Park [Par81].
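(2) A signature $\Sigma$ is a set of operation symbols together with a map $\mathsf{ar} : \Sigma \to \mathbb{N}$, which assigns to each operation symbol $\sigma \in \Sigma$ its arity $\mathsf{ar}(\sigma)$. We write $\sigma/n \in \Sigma$ for $\sigma \in \Sigma$ with $\mathsf{ar}(\sigma) = n$. Every signature $\Sigma$ canonically defines a polynomial functor
$$F_\Sigma X = \coprod_{\sigma/n \in \Sigma} X^n.$$
We slightly abuse notation by denoting for each $\sigma/n \in \Sigma$ the corresponding injection into the coproduct by $\sigma : X^n \to F_\Sigma X$. Moreover, we simply write $\Sigma$ in lieu of $F_\Sigma$, so we have
$$\Sigma X = \{\sigma(x_1, \ldots, x_n) \mid \sigma/n \in \Sigma,\ x_1, \ldots, x_n \in X\}.$$
This polynomial functor acts component-wise on maps $f : X \to Y$:
$$\Sigma f(\sigma(x_1, \ldots, x_n)) = \sigma(f(x_1), \ldots, f(x_n)).$$
Every state in a $\Sigma$-coalgebra represents a (possibly infinite) $\Sigma$-tree, i.e. a rooted ordered tree where every node is labelled with some operation symbol $\sigma \in \Sigma$ and has precisely $\mathsf{ar}(\sigma)$-many children. In particular, a node is a leaf iff it is labelled with a 0-ary operation symbol. For example, for the signature $\Sigma = \{*/2, \ell/0\}$ with a binary operation symbol and a constant, we have the following example of a $\Sigma$-tree, written as a term:
$$*(\ell, *(\ell, *(\ell, \ldots))).$$
Given a state $x$ in a coalgebra $c : C \to \Sigma C$, we obtain a $\Sigma$-tree $t_x$ by unravelling the coalgebra structure at $x$. More precisely, $t_x$ is uniquely defined by
$$t_x = \sigma(t_{y_1}, \ldots, t_{y_n}) \qquad\text{if } c(x) = \sigma(y_1, \ldots, y_n)$$
(this equation constituting a coinductive definition [Rut00]). For example, the above $\Sigma$-tree is obtained by unravelling the coalgebra structure at the state $x$ of the $\Sigma$-coalgebra $c : \{x, y\} \to \Sigma\{x, y\}$ with $c(x) = *(y, x)$ and $c(y) = \ell$.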
Two states in a $\Sigma$-coalgebra are behaviourally equivalent iff they represent the same possibly infinite tree: $x \sim y \iff t_x = t_y$.
(3) For a fixed finite set $A$, the functor given by $FX = 2 \times X^A$, where $2 = \{0, 1\}$, sends a set $X$ to the set of pairs of boolean values and functions $A \to X$. An $F$-coalgebra $(C, c)$ is a deterministic automaton (without initial state). For each state $x \in C$, the first component of $c(x)$ determines whether $x$ is a final state, and the second component is the successor function $A \to C$ mapping each input letter $a \in A$ to the successor state of $x$ under input letter $a$. States $x, y \in C$ are behaviourally equivalent iff they accept the same language in the usual sense. This functor is (naturally isomorphic to) the polynomial functor for the signature $\Sigma$ consisting of two operation symbols of arity $|A|$: $2 \times X^A \cong X^{|A|} + X^{|A|}$.
(4) For a commutative monoid $(M, +, 0)$, the monoid-valued functor $M^{(-)}$ sends each set $X$ to the set $M^{(X)}$ of finitely supported maps $f : X \to M$, i.e. $f(x) = 0$ for all but finitely many $x \in X$. In case $M$ is even an abelian group, we also refer to $M^{(-)}$ as a group-valued functor.
An $M^{(-)}$-coalgebra $c : C \to M^{(C)}$ is, equivalently, a finitely branching $M$-weighted transition system: for a state $x \in C$, $c(x)$ maps each state $y \in C$ to the weight $c(x)(y)$ of the transition from $x$ to $y$. For a map $f : X \to Y$, the map $M^{(f)} : M^{(X)} \to M^{(Y)}$ is given by
$$M^{(f)}(v)(y) = \sum_{x \in X,\ f(x) = y} v(x),$$
corresponding to the standard image measure construction. As the notion of behavioural equivalence of states in $M^{(-)}$-coalgebras, we obtain weighted bisimilarity (cf. [Buc08,KS13]), given coinductively by postulating that states $x, y \in C$ are behaviourally equivalent ($x \sim y$) iff
$$\sum_{z' \sim z} c(x)(z') = \sum_{z' \sim z} c(y)(z') \qquad\text{for all } z \in C.$$
For the Boolean monoid $(2 = \{0, 1\}, \vee, 0)$, the monoid-valued functor $2^{(-)}$ is (naturally isomorphic to) the finite powerset functor $P_\omega$. For the monoid of real numbers $(\mathbb{R}, +, 0)$, the monoid-valued functor $\mathbb{R}^{(-)}$ has $\mathbb{R}$-weighted systems as coalgebras, e.g. Markov chains. In fact, finite Markov chains are precisely finite coalgebras of the finite distribution functor, i.e. the subfunctor $D_\omega$ of $\mathbb{R}_{\geq 0}^{(-)}$ (and hence of $\mathbb{R}^{(-)}$) given by
$$D_\omega X = \{\mu \in \mathbb{R}_{\geq 0}^{(X)} \mid \textstyle\sum_{x \in X} \mu(x) = 1\}.$$
For the monoid $(\mathbb{N}, +, 0)$ of natural numbers, the monoid-valued functor is the bag functor $B_\omega$, which maps a set $X$ to the set of finite multisets over $X$.
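The action of a monoid-valued functor on maps (the image measure construction in item (4)) can be sketched in Python as follows. This is an illustration under assumptions of our own: finitely supported maps are represented as dictionaries, and the names `pushforward`, `add` and `zero` are hypothetical, not part of any tool described here.

```python
import operator


def pushforward(f, mu, add, zero):
    """Apply the monoid-valued functor to f: sum the weights of each fibre of f.

    `mu` is a finitely supported map X -> M given as a dict, `f` a function
    X -> Y, and (add, zero) the commutative monoid structure on M.
    """
    result = {}
    for x, weight in mu.items():
        y = f(x)
        result[y] = add(result.get(y, zero), weight)
    # Keep only the support: entries that sum up to the neutral element vanish.
    return {y: w for y, w in result.items() if w != zero}


mu = {"x": 2, "y": 3, "z": -5}
merge = {"x": "p", "y": "p", "z": "q"}.get

# In the group (Z, +, 0), the weights of x and y are added up.
image = pushforward(merge, mu, operator.add, 0)
# Collapsing all states cancels the weights completely.
collapsed = pushforward(lambda _: "*", mu, operator.add, 0)
# In the non-cancellative monoid (N, max, 0), "summing" means taking the maximum.
tropical = pushforward(lambda _: "*", {"x": 2, "y": 3}, max, 0)
```

The second call shows why such a map is in general not determined by its support alone: weights in a group may cancel, whereas in (N, max, 0) they never do.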
Notation 2.3. Note that for every commutative monoid $(M, +, 0)$, we have the canonical summation map
$$\Sigma : B_\omega M \to M, \qquad \Sigma(f) = \sum_{m \in M} \sum_{i=1}^{f(m)} m.$$
It sums up all elements of a bag $f$ of monoid elements, where a single element of the monoid can occur multiple times.
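As a small illustration of the summation map, the following Python sketch represents a bag as a `collections.Counter` and folds the monoid operation over all occurrences. The function name `summation` and the representation of bags are assumptions of this sketch.

```python
import operator
from collections import Counter
from functools import reduce


def summation(bag, add, zero):
    """Sum up all elements of a bag of monoid elements, respecting multiplicities."""
    # Counter.elements() yields every element as often as it occurs in the bag.
    return reduce(add, bag.elements(), zero)


bag = Counter({2: 3, 5: 1})  # the bag containing 2 three times and 5 once

total_plus = summation(bag, operator.add, 0)  # in (N, +, 0): 2 + 2 + 2 + 5
total_max = summation(bag, max, 0)            # in (N, max, 0): max(2, 2, 2, 5)
total_empty = summation(Counter(), operator.add, 0)
```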
Remark 2.4. For categorically-minded readers, we note that B ω is a monad on the category of sets. Moreover, commutative monoids are precisely the Eilenberg-Moore algebras (e.g. [Awo10]) for B ω . In fact, for every commutative monoid (M, +, 0), the map Σ is the structure of its associated Eilenberg-Moore algebra.

Generic Partition Refinement
We recall some key aspects of our generic partition refinement algorithm [WDMS20], which minimizes a given coalgebra, i.e. computes its quotient modulo behavioural equivalence; we centre the presentation around the implementation and use of our tool.
The algorithm [WDMS20, Algorithm 4.5] is parametrized over a type functor F , represented by implementing a fixed refinement interface, which in particular allows for a representation of F -coalgebras in terms of nodes and edges (by no means implying a restriction to relational systems!). Our previous analysis has established that the algorithm minimizes an F -coalgebra c : C → F C with n nodes and m edges in time O(m · log n), assuming m ≥ n and that the operations of the refinement interface run in linear time. In the present paper, we generalize the analysis, establishing a run time in O(m · log n · p(c)), where p(c) is a factor in the time complexity of the operations implementing the refinement interface which depends on the input coalgebra c : C → F C. For many functors, p(c) = 1, reproducing the previous analysis. In some cases, p(c) is not constant, and our new analysis still applies in these cases, either matching or improving the best known run time in most instances, most notably weighted systems over non-cancellative monoids.
We proceed to discuss the design of the implementation, including input formats of our tool CoPaR for composite functors built from pre-implemented basic blocks and for systems to be minimized (Section 3.1). We then discuss the internal representation of coalgebras in the tool (Section 3.2). Subsequently, we recall refinement interfaces, describe their implementation (Section 3.4), and discuss how to combine them (Section 3.5). Finally, we note implementation details of our tool and, in particular, argue that it realizes the theoretical time complexity (Section 3.6).

Generic System Specification
CoPaR accepts as input a file that represents a finite F -coalgebra c : C → F C, and consists of two parts. The first part is a single line specifying the functor F . Each of the remaining lines describes one state x ∈ C and its one-step behaviour c(x). Examples of input files are shown in Figure 1.

Functor Specification
Functors are specified as composites of basic building blocks; that is, the functor given in the first line of an input file is an expression determined by the grammar
$$T ::= \mathtt{X} \mid F(T, \ldots, T) \qquad (F : \mathsf{Set}^k \to \mathsf{Set}) \in \mathcal{F},$$
where the character X is a terminal symbol and $\mathcal{F}$ is a set of predefined symbols called basic functors, representing a number of pre-implemented functors of type $F : \mathsf{Set}^k \to \mathsf{Set}$. A refinement interface needs to be implemented only for basic functors (Section 3.4); for composite functors, the tool derives an appropriate refinement interface automatically (Section 3.5). Basic functors currently implemented include the (finite) powerset functor $P_\omega$, the bag functor $B_\omega$, monoid-valued functors $M^{(-)}$, and polynomial functors for finite many-sorted signatures $\Sigma$, based on the description of the respective refinement interfaces given in our previous work [WDMS20] and, in the case of $M^{(-)}$ for unrestricted commutative monoids $M$ (rather than only abelian groups), the newly developed interface described in Section 5.2. Since behavioural equivalence is preserved and reflected under converting $G$-coalgebras into $F$-coalgebras for a subfunctor $G$ of $F$ [WDMS20, Proposition 2.13], we also cover subfunctors, such as the finite distribution functor $D_\omega$ as a subfunctor of $\mathbb{R}^{(-)}$. With the polynomial constructs $+$ and $\times$ written in infix notation as usual, the currently supported grammar is effectively
$$T ::= \mathtt{X} \mid P_\omega T \mid B_\omega T \mid D_\omega T \mid M^{(T)} \mid \Sigma \qquad \Sigma ::= C \mid T + T \mid T \times T \mid T^A \qquad C ::= \mathbb{N} \mid A \qquad A ::= \{s_1, \ldots, s_n\} \mid n.$$
Note that $C$ effectively ranges over at most countable sets, and $A$ over finite sets. A term $T$ determines a functor $F : \mathsf{Set} \to \mathsf{Set}$ in the evident way, with X interpreted as the argument, i.e. $F(X) = T$. It should be noted that the implementation treats composites of polynomial (sub-)terms as a single functor in order to minimize overhead incurred by excessive decomposition, e.g. $X \mapsto \{a, b\} + P_\omega(\mathbb{R}^{(X)}) + X \times X$ is composed from the basic functors $P_\omega$, $\mathbb{R}^{(-)}$ and the 3-sorted polynomial functor $\Sigma(X, Y, Z) = \{a, b\} + X + Y \times Z$.

Coalgebra Specification
The remaining lines of an input file define a finite $F$-coalgebra $c : C \to FC$. Each line of the form x:␣t defines a state $x \in C$, where x is a variable name, and t represents the element $t = c(x) \in FC$. The syntax for t depends on the specified functor $F$, and follows the structure of the term $T$ defining $F$; we write $t \in T$ for a term t describing an element of $FC$:
• $t \in \mathtt{X}$ is given by one of the named states specified in the file.
• $t \in M^{(T)}$ is given by t ::= {t_1:␣m_1, …, t_n:␣m_n} with $m_1, \ldots, m_n \in M$ and $t_1, \ldots, t_n \in T$, denoting $\mu \in M^{(TC)}$ with $\mu(t_i) = m_i$ for $i = 1, \ldots, n$, and $\mu(t) = 0$ for $t \notin \{t_1, \ldots, t_n\}$.
For example, an input file of this form can define an $F$-coalgebra for the functor $FX = P_\omega(\{a, b\} \times \mathbb{R}^{(X)})$ with a single state $x$, having two $a$-successors and one $b$-successor, where successors are elements of $\mathbb{R}^{(X)}$. One $a$-successor is constantly zero, and the other assigns weight 2.4 to $x$; the $b$-successor assigns weight −8 to $x$. Two more examples are shown in Fig. 1.

Generic Input File Processing
After reading the functor term T , the tool builds a parser for the functor-specific input format and parses an input coalgebra specified in the above syntax into an intermediate format described in the next section. In the case of a composite functor, the parsed coalgebra then undergoes a substantial amount of preprocessing that also affects how transitions are counted; we defer the discussion of this point to Section 3.5, and assume for the time being that F : Set → Set is a basic functor with only one argument.

Internal Representation of Coalgebras
New functors are added to the framework by implementing a refinement interface (Definition 3.5). The interface relates to an abstract encoding of the functor and its coalgebras in terms of nodes and edges:
Definition 3.1 [WDMS20]. An encoding of a functor $F$ consists of a set $A$ of labels and a family of maps
$$\flat : FX \to B_\omega(A \times X),$$
one for every set $X$. The encoding of an $F$-coalgebra $c : C \to FC$ is given by the map
$$\langle F!, \flat \rangle \cdot c : C \to F1 \times B_\omega(A \times C),$$
and we say that the coalgebra has $n = |C|$ states and $m = \sum_{x \in C} |\flat(c(x))|$ edges.
An encoding does by no means imply a reduction from F -coalgebras to B ω (A × (−))-coalgebras, i.e. the notions of behavioural equivalence for B ω (A × (−)) and F , respectively, can be radically different. The encoding just fixes a representation format.
Remark 3.2. Categorically-minded readers will notice that $\flat$ is not assumed to be a natural transformation. In fact, $\flat$ fails to be natural in all encodings we have implemented except the one for polynomial functors.
Encodings typically match how one intuitively draws coalgebras of various types as certain labelled graphs. We briefly recall three examples below; see [WDMS20] for more. We note that so far, we see no general method for deriving an encoding of a functor, which therefore requires invention.
Example 3.3. (1) We have mentioned in Example 2.2(1) that finitely branching transition systems are the coalgebras for $F = P_\omega$. For the encoding we choose the singleton set $A = 1$ of labels, and $\flat : P_\omega X \to B_\omega(1 \times X)$ is the obvious inclusion, i.e. $\flat(t)(*, x) = 1$ if $x \in t$ and $\flat(t)(*, x) = 0$ otherwise.
(2) For a monoid-valued functor $F = M^{(-)}$ (see Example 2.2(4)) we take $A = M_{\neq 0}$, the non-zero elements of $M$, and define $\flat : M^{(X)} \to B_\omega(M_{\neq 0} \times X)$ by $\flat(t) = \{(t(x), x) \mid x \in X,\ t(x) \neq 0\}$, interpreted as a bag. Special cases are the group-valued functors $G^{(-)}$ for an abelian group $G$, in particular $\mathbb{R}^{(-)}$ and its subfunctor $D_\omega$, whose coalgebras are Markov chains (cf. Fig. 1). In the last case, we formally inherit $A = \mathbb{R}_{\neq 0}$ from the encoding of $\mathbb{R}^{(-)}$ but can actually restrict to $A = (0, 1]$.
(3) For a polynomial functor $F = \Sigma$, the set of labels is $A = \mathbb{N}$, and the map $\flat : \Sigma X \to B_\omega(\mathbb{N} \times X)$ is given by $\flat(\sigma(x_1, \ldots, x_n)) = \{(1, x_1), \ldots, (n, x_n)\}$, interpreted as a bag.
The implementation of a basic functor then consists of two ingredients: (1) a parser that transforms the syntactic specification of an input coalgebra (Section 3.1) into the encoded coalgebra in the above sense, and (2) an implementation of the refinement interface, which is motivated next.
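The three encodings recalled above can be sketched in Python as follows. This is an illustrative sketch, not the tool's actual code: bags of labelled edges are represented as `collections.Counter` objects over (label, state) pairs, and all function names as well as the constant symbol `"leaf"` are assumptions of this sketch.

```python
from collections import Counter


def flat_powerset(t):
    """Encoding for the finite powerset functor: t is a set of successors.

    Every edge carries the sole label "*".
    """
    return Counter(("*", x) for x in t)


def flat_monoid(t, zero=0):
    """Encoding for a monoid-valued functor: t is a dict state -> weight.

    Edges are labelled with the non-zero weights.
    """
    return Counter((w, x) for x, w in t.items() if w != zero)


def flat_polynomial(t):
    """Encoding for a polynomial functor: t = (sigma, (x_1, ..., x_n)).

    The i-th argument becomes an edge labelled i; sigma itself is not recorded.
    """
    _sigma, args = t
    return Counter((i, x) for i, x in enumerate(args, start=1))


def edge_count(coalgebra, flat):
    """The number m of edges: the sum of the sizes of the bags flat(c(x))."""
    return sum(sum(flat(t).values()) for t in coalgebra.values())


# A Markov chain, viewed as a coalgebra with real weights.
chain = {"x": {"x": 0.5, "y": 0.5}, "y": {"y": 1.0}}
# A coalgebra for a signature with one binary symbol and one constant.
tree = {"x": ("*", ("y", "x")), "y": ("leaf", ())}

chain_edges = edge_count(chain, flat_monoid)
tree_edges = edge_count(tree, flat_polynomial)
```

Note that `flat_polynomial` forgets the operation symbol; in the encoding of a coalgebra, that information is retained separately in the first component F1.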
Fig. 2. Splitting of a block B into smaller blocks results in further refinement of the block {x, y, z}

Splitting Blocks by F 3
In order to understand the requirements on an interface encapsulating the functor-specific parts of partition refinement, let us look at one step of the algorithm which is crucial for the overall run time complexity. As indicated in the introduction, partition refinement algorithms in general maintain a partition of the state space, i.e. a disjoint decomposition of the state space into sets called blocks, adhering to the invariant that states in different blocks are behaviourally inequivalent, and ensuring upon termination that states in the same block are behaviourally equivalent. Initially, the algorithm tentatively identifies all states of a coalgebra $c : C \to FC$ in a partition consisting of only one block, $C$. Then, the algorithm splits this block into smaller blocks whenever states of the coalgebra turn out to be behaviourally inequivalent and successively applies this procedure to the new blocks until no further splitting is necessary. In the first iteration, the algorithm separates states according to their output behaviour, i.e. states $x, y \in C$ are separated if $F!(c(x)) \neq F!(c(y))$. For example, for deterministic automata, i.e. $FX = 2 \times X^A$, we have $F1 = 2 \times 1^A \cong 2$, so this first step separates final from non-final states. In the classical Paige-Tarjan algorithm [PT87], i.e. for $FX = P_\omega X$, deadlock states and states with at least one outgoing transition are separated from each other. In the subsequent steps, the representation of the coalgebra as labelled edges (i.e. $\flat \cdot c : C \to B_\omega(A \times C)$) is used to refine the partition further. Information about inequivalence of states is propagated from successor states to predecessor states; this is iterated until a (greatest) fixed point is reached, i.e. until no new behavioural inequivalences are discovered.
In this process of propagating inequivalences, suppose that the partition refinement has already computed a block of states B ⊆ C in its partition and that states in S ⊆ B have different behaviour from those in B \ S (as illustrated in Figure 2). From this information, the algorithm infers whether states x, y ∈ C that are in the same block and have successors in B exhibit different behaviour and thus have to be separated. Let us explain on two concrete instances how this inference is achieved.
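Example 3.4. (1) We mentioned in Example 2.2(1) that for $F = P_\omega$, a coalgebra is a finitely branching transition system. A partition on the state space represents a bisimulation if it has the following property for every pair of states $x, y$ in the same block:
• $x$ has a successor in $B$ iff $y$ has one, and
• $x$ has a successor in $C \setminus B$ iff $y$ has one.
This means that when we split the block $B$ into the two blocks $S$ and $B \setminus S$, then $x$ and $y$ can stay in the same block provided that (a) $x$ has a successor in $S$ iff $y$ has one and (b) $x$ has a successor in $B \setminus S$ iff $y$ has one. Equivalently, we can express these conditions by the equality
$$P_\omega\langle \chi_S, \chi_{B \setminus S} \rangle(c(x)) = P_\omega\langle \chi_S, \chi_{B \setminus S} \rangle(c(y)), \qquad (3.3)$$
where $\chi_S, \chi_{B \setminus S} : C \to 2$ are the usual characteristic functions of the subsets $S, B \setminus S \subseteq C$, respectively. Indeed, the function $P_\omega\langle \chi_S, \chi_{B \setminus S} \rangle \cdot c : C \to P_\omega(2 \times 2)$ maps a state $x \in C$ to the set $P \subseteq 2 \times 2$ encoding whether $x$ has successors in $S$ resp. in $B \setminus S$. In fact, we see that $x$ has a successor in $S$ iff $(1, 0) \in P$, and $x$ has a successor in $B \setminus S$ iff $(0, 1) \in P$. Moreover, $x$ has a successor in $C \setminus B$ iff $(0, 0) \in P$. Since $S$ and $B \setminus S$ are disjoint, we have $(1, 1) \notin P$. Similar observations apply to $y$. Thus, in order to maintain the desired property, we need to separate $x$ and $y$ iff (3.3) does not hold.
(2) In the example of Markov chains, i.e. $F = D_\omega$, we can make a similar observation. Here, $x, y \in C$ can stay in the same block if the weights of all transitions from $x$ to states in $S$ sum up to the same value as the weights of all transitions from $y$ to states in $S$, and similarly for $B \setminus S$; that is, if
$$\sum_{x' \in S} c(x)(x') = \sum_{x' \in S} c(y)(x') \qquad\text{and}\qquad \sum_{x' \in B \setminus S} c(x)(x') = \sum_{x' \in B \setminus S} c(y)(x').$$
Analogously to item (1), we can equivalently express this by stating that $x$ and $y$ can stay in the same block iff the following equation holds:
$$D_\omega\langle \chi_S, \chi_{B \setminus S} \rangle(c(x)) = D_\omega\langle \chi_S, \chi_{B \setminus S} \rangle(c(y)). \qquad (3.4)$$
The map $D_\omega\langle \chi_S, \chi_{B \setminus S} \rangle(c(x)) : 2 \times 2 \to \mathbb{R}_{\geq 0}$ assigns to $(1, 0)$ the accumulated weight of transitions from $x$ to $S$, to $(0, 1)$ that of transitions to $B \setminus S$, and to $(0, 0)$ that of transitions to $C \setminus B$. Two states $x, y$ that are in the same block before splitting $B$ into $S$ and $B \setminus S$ have the same accumulated weight of transitions to $B$ and also to $C \setminus B$, so $x$ and $y$ can stay in the same block iff (3.4) holds.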
It is now immediate how (3.3) and (3.4) are generalized to an arbitrary functor $F$: states $x$ and $y$ stay in the same block in a refinement step iff
$$F\langle \chi_S, \chi_{B \setminus S} \rangle(c(x)) = F\langle \chi_S, \chi_{B \setminus S} \rangle(c(y)). \qquad (3.5)$$
As we have seen in Example 3.4, $(1, 1)$ is never in the image of $\langle \chi_S, \chi_{B \setminus S} \rangle : C \to 2 \times 2$, because $S$ and $B \setminus S$ are disjoint. Hence we can restrict its codomain to $3 = \{0, 1, 2\}$ by defining the map
$$\chi_S^B : C \to 3, \qquad \chi_S^B(x) = \begin{cases} 2 & \text{if } x \in S, \\ 1 & \text{if } x \in B \setminus S, \\ 0 & \text{if } x \in C \setminus B. \end{cases} \qquad (3.6)$$
Hence, the criterion in (3.5) can be simplified: the states $x$ and $y$ stay in the same block in the refinement step iff
$$F\chi_S^B(c(x)) = F\chi_S^B(c(y)). \qquad (3.7)$$
We conclude that the generic partition refinement algorithm needs to compute the value $F\chi_S^B(c(x)) \in F3$ for every state $x$. Whenever states $x$ and $y$ are sent to different values by $F\chi_S^B \cdot c$, we know that they are behaviourally inequivalent and need to be moved to separate blocks.
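For the finite powerset functor, the splitting criterion can be sketched directly in Python. The sketch below computes the direct image of the successor set under the three-valued characteristic map for every state of a block and groups states by the result. It is a naive illustration: it inspects all successors of every state, whereas the efficient algorithm only looks at edges into S. The names `chi` and `split_block` and the example system are assumptions of this sketch.

```python
def chi(S, B):
    """Three-valued characteristic map: 2 on S, 1 on B minus S, 0 outside B.

    Assumes that S is a subset of B.
    """
    def classify(x):
        if x in S:
            return 2
        if x in B:
            return 1
        return 0
    return classify


def split_block(block, succ, S, B):
    """Group the states of `block` by the direct image of succ[x] under chi."""
    classify = chi(S, B)
    groups = {}
    for x in block:
        value = frozenset(classify(y) for y in succ[x])  # an element of P(3)
        groups.setdefault(value, set()).add(x)
    return list(groups.values())


# The block B = {s, b} is split into S = {s} and B \ S = {b}.
succ = {"x": {"s"}, "v": {"s"}, "y": {"s", "b"}, "z": {"b"}}
parts = split_block({"x", "v", "y", "z"}, succ, S={"s"}, B={"s", "b"})
normalized = sorted(sorted(part) for part in parts)
```

Here x and v stay together because both reach only S, while y and z are separated from them and from each other.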

Refinement Interfaces
Computing the values $F\chi_S^B(c(x))$ for states $x$ of interest is the task of the refinement interface. We start with its formal definition and then provide an informal explanation of its ingredients.
Definition 3.5 [WDMS20]. Given an encoding $(A, \flat)$ of the set functor $F$, a refinement interface for $F$ consists of a set $W$ of weights and functions
$$\mathsf{init} : F1 \times B_\omega A \to W \qquad\text{and}\qquad \mathsf{update} : B_\omega A \times W \to W \times F3 \times W$$
satisfying the coherence condition that there exists a family of weight maps $w : PX \to (FX \to W)$ (not themselves part of the interface to be implemented!), one for each set $X$, such that
$$\mathsf{init}\big(F!(t), \{[\, a \mid (a, x) \in \flat(t) \,]\}\big) = w(X)(t),$$
$$\mathsf{update}\big(\{[\, a \mid (a, x) \in \flat(t),\ x \in S \,]\}, w(B)(t)\big) = \big(w(S)(t),\ F\chi_S^B(t),\ w(B \setminus S)(t)\big)$$
for $t \in FX$ and $S \subseteq B \subseteq X$. Here, the notation $\{[\, a \mid \cdots \,]\}$ in the arguments of init and update indicates multiset comprehension, i.e. multiple occurrences of a label $a \in A$ in $\flat(t)$ result in multiple occurrences of $a$ in the bag.
Fig. 3. Computation of the value of type F3 for x
The intuition behind the refinement interface can be understood best if we consider an $F$-coalgebra $c : C \to FC$, put $X := C$, fix a state $x \in C$, and instantiate $t := c(x)$. As one can see from the types of init and update, the refinement interface is designed in such a way that it computes values of the functor-specific type $W$ of weights that the calling algorithm saves for subsequent calls to update. For every block $B \subseteq C$, the value $w(B)(c(x)) \in W$ is the accumulated weight of edges from $x$ to (states in) $B$ in the coalgebra $(C, c)$. In principle, values in $W$ can contain whatever information about the set of edges from $x$ to $B$ helps the implementation of update to compute the result value of type $F3$, which is what the caller is actually interested in. However, while more information contained in the second argument of update helps this function to achieve this task more efficiently, both init and update also have to compute values of $W$, which may require more effort if these values carry too much information. This trade-off is guided by the two equational axioms for init and update, which represent a contract that their implementation for a particular functor has to fulfil. The first axiom assumes that init receives in its first argument the output behaviour of $x$ (e.g. whether $x$ is final or non-final in the case of automata, or for $F = P_\omega$ whether $x$ has any successors or is a deadlock state) and in its second argument the bag of labels of all outgoing edges of $x$ in the graph representation of $(C, c)$. The axiom then requires init to return the accumulated weight $w(C)(c(x)) \in W$ of all edges from $x$ to the whole state set $C$, which is the only block in the initial partition of $C$. This corresponds to the use of init in the actual algorithm, namely to initialize the weight value in $W$ that is later passed to update.
The operation update is called whenever the algorithm derives that a block $B$ in the partition of $C$ contains behaviourally inequivalent states, i.e. when the block $B$ has been split into smaller blocks, including, say, $S \subseteq B$, as in Figure 3. This means that every state in $S$ is behaviourally inequivalent to every state in $B \setminus S$. The first parameter of update is then the bag of labels of all edges from $x$ to $S$, and the second parameter is the weight $w(B)(c(x)) \in W$ of all edges from $x$ to $B$, which the caller has saved from return values of previous calls to init and update, respectively. From only this information (in particular, update does not know $x$, $S$ or $B$ explicitly), update computes the triple consisting of the weight $w(S)(c(x))$ of edges from $x$ to $S$, the value $F\chi_S^B(c(x))$, and the weight $w(B \setminus S)(c(x))$ of edges from $x$ to $B \setminus S$. The two weights are stored by the caller in order to supply them to update in the next refinement step, and $F\chi_S^B(c(x))$ is used to split the block containing $x$ according to (3.7).
For a given functor F , it is usually easy to derive the operations init and update once appropriate choices of the set W of weights and the weight maps w are made. We now recall refinement interfaces for some functors of interest; see [WDMS20] for the verification of the axioms.
Example 3.6. (1) For F = P_ω, we put W = 2 × N. For further use in the definition of the weight maps and the refinement interface routines, we define an auxiliary function

\[ (\overset{?}{>} 0)\colon \mathbb{N} \to 2, \qquad (n \overset{?}{>} 0) = \begin{cases} 1 & \text{if } n > 0, \\ 0 & \text{otherwise.} \end{cases} \]

The weight maps are defined by

\[ w(B)(t) = \bigl(|t \setminus B| \overset{?}{>} 0,\ |t \cap B|\bigr). \]

This records whether there is an edge to X \ B and counts the number of edges to states in the block B. This number is crucial for implementing the update routine, which needs to return P_ω χ_S^B(c(x)) ∈ P_ω 3 for a coalgebra c : C → FC and a state x ∈ C. Hence, update needs to determine whether x has an edge to B \ S (i.e. whether 1 ∈ P_ω χ_S^B(c(x))), given only the number k of edges from x to S and the weight w(B)(c(x)) (cf. Figure 3). This task can only be accomplished if w(B)(c(x)) holds the number n of edges from x to B: with this information, there are edges to B \ S iff n − k > 0. Recall from Example 3.3(1) that the set of labels is A = 1. Hence, every bag of labels is just a natural number because B_ω A = B_ω 1 ≅ N. Consequently, the interface routines are implemented as follows:

\[ \mathsf{init}(z, n) = (0, n), \]
\[ \mathsf{update}\bigl(n_S, (r, n_C)\bigr) = \Bigl(\bigl(r \vee (n_{C\setminus S} \overset{?}{>} 0),\ n_S\bigr),\ \bigl(r,\ n_{C\setminus S} \overset{?}{>} 0,\ n_S \overset{?}{>} 0\bigr),\ \bigl(r \vee (n_S \overset{?}{>} 0),\ n_{C\setminus S}\bigr)\Bigr), \]

where n_{C\S} := max(n_C − n_S, 0), ∨ : 2 × 2 → 2 is disjunction, and the middle return value in P_ω 3 is written as a bit vector of length three. The axioms in Definition 3.5 ensure that n_S, n_C, n_{C\S} can be understood as the numbers of edges to S, C, and C \ S, respectively. The technique of remembering the number of edges from every state to every block is already crucial in the classical algorithm by Paige and Tarjan [PT87]. In Section 5.2, we will generalize this trick from P_ω to arbitrary monoid-valued functors.
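The interface for P_ω described above can be sketched in Haskell as follows. This is a minimal illustration under stated assumptions, not the CoPaR source: the names initP, updateP, Weight and F3 are hypothetical, the set 2 is modelled by Bool, and bags over the one-element label set are modelled by Int.

```haskell
-- Weights W = 2 x N:
-- (is there an edge leaving the block?, number of edges into the block).
type Weight = (Bool, Int)

-- Elements of P_ω 3 as a bit vector of length three:
-- (edge to X \ B?, edge to B \ S?, edge to S?).
type F3 = (Bool, Bool, Bool)

-- init receives the output behaviour in P_ω 1 (not needed here) and the
-- bag of labels; since A = 1, this bag is just the number of outgoing edges.
initP :: Bool -> Int -> Weight
initP _ n = (False, n)

-- update receives the number nS of edges into S and the weight (r, nB) of
-- the surrounding block B.
updateP :: Int -> Weight -> (Weight, F3, Weight)
updateP nS (r, nB) =
  ( (r || toRest, nS)      -- weight of S
  , (r, toRest, toS)       -- P_ω χ_S^B (c(x))
  , (r || toS, nRest) )    -- weight of B \ S
  where
    nRest  = max (nB - nS) 0   -- number of edges into B \ S
    toRest = nRest > 0
    toS    = nS > 0

main :: IO ()
main = print (updateP 2 (initP True 5))
```

For a state with five outgoing edges, two of which lead into S, the call in main yields the weights (True, 2) for S and (True, 3) for the rest of the block, with middle component (False, True, True).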
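(2) For the group-valued functor G^(−) for the abelian group G, we put W = G^(2) ≅ G × G, and the weight map is defined by

\[ w(B)(t) = G^{(\chi_B)}(t) = \Bigl(\sum_{x \in X \setminus B} t(x),\ \sum_{x \in B} t(x)\Bigr) \]

for every subset B ⊆ X.
The refinement interface routines are implemented as follows:

\[ \mathsf{init}(g, \ell) = (0, g), \]
\[ \mathsf{update}\bigl(\ell, (r, b)\bigr) = \Bigl(\underbrace{\bigl(r + b - \Sigma(\ell),\ \Sigma(\ell)\bigr)}_{w(S)(c(x))},\ \underbrace{\bigl(r,\ b - \Sigma(\ell),\ \Sigma(\ell)\bigr)}_{G^{(\chi_S^B)}(c(x))},\ \underbrace{\bigl(r + \Sigma(\ell),\ b - \Sigma(\ell)\bigr)}_{w(B \setminus S)(c(x))}\Bigr), \]

where Σ : B_ω G → G is the summation map of Notation 2.3. The terms under the braces only serve as the intuition when considering a coalgebra c : C → FC and a subblock S ⊆ B of a block B ⊆ C. The function init is called with the bag of labels of outgoing transitions of some element x ∈ C and the sum g of all these labels. Since w(C)(c(x)) = (0, G^(!)(c(x))) for the whole set C, the init function simply returns (0, g).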
In the above update routine we have used that G has inverses. In Section 5, we will define a refinement interface for the monoid-valued functor M (−) for monoids M that are not groups, using additional data structures to make up for the lack of inverses in M .
(3) As special instances of the previous item we obtain refinement interfaces for the functors R (−) and Z (−) .
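(4) For a polynomial functor F = Σ, we put W = Σ2 and

\[ w(B) = \Sigma\chi_B \colon \Sigma X \to \Sigma 2 \]

for every subset B ⊆ X.
For a coalgebra c : C → ΣC and B ⊆ C, this means that w(B)(c(x)) consists of an operation symbol σ ∈ Σ and a bit vector of length ar(σ). The bit vector specifies which successor states of x are in the set B. Recall from Example 3.3(3) that the encoding of Σ uses as labels A = N. With 1 = {∗}, the init routine is given by

\[ \mathsf{init}\bigl(\sigma(*, \ldots, *),\ \ell\bigr) = \sigma(1, \ldots, 1). \]

For update, we first define the map update′ : B_ω N × Σ2 → Σ3 that computes the middle component of the result:

\[ \mathsf{update}'\bigl(I,\ \sigma(b_1, \ldots, b_n)\bigr) = \sigma\bigl(b_1 + (1 \in I),\ \ldots,\ b_i + (i \in I),\ \ldots,\ b_n + (n \in I)\bigr), \]

where the summand (i ∈ I) is 1 if i is contained in I and 0 otherwise. For subsets S ⊆ B ⊆ X and a state y ∈ X, this sum computes χ_S^B(y) from χ_S(y) (given by i ∈ I) and χ_B(y) (given by the bit b_i). From the value in Σ3 thus computed, we can derive the other components of the result of update. For k ∈ 3, let (k =) : 3 → 2 be the map that compares its parameter with k, i.e.

\[ (k =)(k') = \begin{cases} 1 & \text{if } k = k', \\ 0 & \text{otherwise.} \end{cases} \]

The update routine now calls update′ and derives the values of type Σ2:

\[ \mathsf{update}(\ell, t) = \bigl(\Sigma(2 =)(\mathsf{update}'(\ell, t)),\ \mathsf{update}'(\ell, t),\ \Sigma(1 =)(\mathsf{update}'(\ell, t))\bigr). \]

In order to ensure that iteratively splitting blocks using Fχ_S^B in each iteration correctly computes the minimization of the given coalgebra, we need another property of the functor F:

Definition 3.7. A set functor F is zippable if the map

\[ \langle F(A + !),\ F(! + B) \rangle \colon F(A + B) \to F(A + 1) \times F(1 + B) \]

is injective for all sets A and B.

All functors mentioned in Example 2.2 are zippable. Moreover, zippable functors are closed under products, coproducts (both formed point-wise), and subfunctors [WDMS20, Lemma 5.4]. However, they are not closed under functor composition: for example, P_ω P_ω fails to be zippable [WDMS20, Example 5.9]. We deal with this problem by a reduction discussed in Section 3.5 below.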
For zippable set functors F with a refinement interface, we have presented a partition refinement algorithm [WDMS20, Algorithm 7.7]. The main correctness result states that for a zippable functor equipped with a refinement interface, our algorithm correctly minimizes the given coalgebra. The low time complexity of our algorithm hinges on the time complexity of the implementations of init and update. We have shown previously [WDMS20, Theorem 7.16] that if both init and update run in time linear in their input of type B_ω A alone (i.e. independently of the input coalgebra size), then our generic partition refinement algorithm runs in time O((m + n) · log n) on coalgebras with n states and m edges (cf. Definition 3.1). In order to cover instances where the run time of init and update depends also on the input coalgebra, we make this dependence formally explicit:

Definition 3.8. The refinement interface for a functor F has run time factor p(c) if for every map c : X → FY (in particular for every coalgebra c : C → FC),
(1) the following calls to init and update run in time O(|ℓ| · p(c)) for x ∈ X, t = c(x), and S ⊆ B ⊆ X:

\[ \mathsf{init}\bigl(F!(t),\ \ell\bigr) \ \text{with}\ \ell = \{[\, a \mid (a,y) \in \flat(t) \,]\}, \qquad \mathsf{update}\bigl(\ell,\ w(B)(t)\bigr) \ \text{with}\ \ell = \{[\, a \mid (a,y) \in \flat(t),\ y \in S \,]\}; \]

(2) equality of values in {Fχ_S^B(c(q)) | q ∈ C, S ⊆ B ⊆ X} ⊆ F3 can be checked in time O(p(c)).
If p(c) only depends on the number of states n and number of transitions m in c, then we write p(n, m) in lieu of p(c). Note that the above calls to init and update are precisely those from the axioms of the refinement interface in Definition 3.5.
Example 3.9. The powerset functor P ω and the group-valued functors G (−) have run time factor p(c) = 1 [WDMS20, Examples 6.11]. For a signature Σ where the arity of operation symbols is bounded by b ∈ N, the refinement interface has run time factor p(c) = 1. Otherwise we define the rank of a finite Σ-coalgebra c : C → ΣC to be the maximal arity that appears in c: rank(c) = max{ar(σ) | σ(x 1 , . . . , x n ) = c(x) for some x 1 , . . . , x n and x in C}.
It is easy to see that the refinement interface for Σ (see [WDMS20, Examples 6.11.3]) has run time factor p(c) = rank(c).
Since we can now describe the run time of the refinement-interface in a more fine-grained way, we can lift this to the run time analysis of the overall partition refinement algorithm.
Theorem 3.10. Let F be a zippable functor equipped with a refinement interface with run time factor p(c). Then the algorithm computes the behavioural equivalence relation on an input F -coalgebra c : C → F C with n states and m transitions in O((m + n) · log n · p(c)) steps.
Proof. The case where p(c) = 1 is proved in [WDMS20, Theorem 6.22]. We reduce the general case to this one as follows. Observe that the previous complexity analysis counts the number of basic operations performed by the algorithm (e.g. comparisons of values of type F3), including those performed by init and update. In that analysis init and update were assumed to have run time O(|ℓ|), and the total number of basic operations of the algorithm is then O((m + n) · log n).
For the reduction, we consider 'macro' operations that run in O(p(c)) time. In particular, every constant-time operation that is performed in the algorithm can be viewed as a macro performing precisely this single operation. Then we can view the generalized run time assumptions on the refinement interface as follows: (1) all calls to init and update on ℓ ∈ B_ω A perform O(|ℓ|) macro operations (each of which takes O(p(c)) time).
(2) all values of type F 3 that arise during the execution of the algorithm are in the set in Definition 3.8(2). Hence, every comparison of such values is done in one macro operation, which takes O(p(c)) steps. By the previous complexity analysis for p(c) = 1, the partition refinement for a coalgebra with n states and m edges performs O((m + n) · log n) macro calls. Thus, the overall run time lies in O((m + n) · log n · p(c)) as desired.
Obviously, for p(c) ∈ O(1), we obtain the previous complexity.
Example 3.11. For a (possibly infinite) signature Σ, the coalgebraic partition refinement runs, by Theorem 3.10 and Example 3.9, in time O(r · (m + n) · log n) for an input coalgebra c : C → ΣC with rank r, n states, and m edges. Note that every state x ∈ C has at most r outgoing edges in the graph representation. Hence, we have m ≤ r · n, so that the time complexity may be simplified to O(r² · n · log n).
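The simplification in Example 3.11 can be spelled out as a short calculation. It assumes r ≥ 1; for r = 0 there are no edges at all and the bound is trivial.

```latex
\[
  m \le r \cdot n
  \;\Longrightarrow\;
  m + n \;\le\; (r + 1) \cdot n \;\le\; 2r \cdot n
  \qquad (r \ge 1),
\]
\[
  r \cdot (m + n) \cdot \log n
  \;\le\; 2 r^2 \cdot n \cdot \log n
  \;\in\; O(r^2 \cdot n \cdot \log n).
\]
```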
In Section 5 we will discuss how Theorem 3.10 instantiates to the example that mainly motivated the above generalization of the complexity analysis: weighted systems with weights from an unrestricted commutative monoid.

Combining Refinement Interfaces
In addition to supporting genericity via direct implementation of the refinement interface for basic functors, our tool is modular in the sense that it automatically derives a refinement interface for functors built from the basic ones according to the grammar (3.1). In other words, for such a combined functor the user does not need to write a single line of new code. Moreover, when the user implements a refinement interface for a new basic functor, this automatically extends the effective grammar. For example, our tool can minimize systems of type

\[ FX = \mathcal{D}_\omega(\mathbb{N} \times \mathcal{P}_\omega X \times \mathcal{B}_\omega X). \]

However, while all basic functors from which F is formed are zippable (see Definition 3.7), there is no guarantee that F is zippable because zippable functors are not closed under functor composition in general. In order to circumvent this problem, a given F-coalgebra is transformed into one for the functor

\[ F'X = \mathcal{D}_\omega X + \Sigma X + \mathcal{P}_\omega X + \mathcal{B}_\omega X, \qquad\text{where } \Sigma X = \mathbb{N} \times X \times X. \]

This functor is obtained as the sum of all basic functors involved in F, i.e. of all the nodes in the visualization of the functor term F (Figure 4). Then the components of the refinement interfaces of the four functors involved, viz. D_ω, Σ, P_ω, and B_ω, are combined by disjoint union +. The transformation of a finite coalgebra c : C → FC into a finite F′-coalgebra introduces a set of intermediate states for each edge in the visualization of the term F; we have labelled the edges in Figure 4 by these sets. The construction starts with X := C and constructs a finite F′-coalgebra on the set C′ := X + Y + Z_1 + Z_2 as follows. The set Y contains an intermediate state for every D_ω-edge out of a state x ∈ X, i.e.

\[ Y = \{\, y \in \mathbb{N} \times \mathcal{P}_\omega X \times \mathcal{B}_\omega X \mid c(x)(y) \neq 0 \text{ for some } x \in X \,\}. \]

This also yields a map c_X : X → D_ω Y, and by a similar definition as for Y, we obtain finite sets Z_1 ⊆ P_ω X and Z_2 ⊆ B_ω X together with a map c_Y : Y → N × Z_1 × Z_2. Finally, intermediate states in Z_1 and Z_2 have successors in P_ω X and B_ω X, respectively, which yields (inclusion) maps c_{Z_1} : Z_1 → P_ω X and c_{Z_2} : Z_2 → B_ω X. Putting these maps together we obtain a finite F′-coalgebra

\[ C' = X + Y + Z_1 + Z_2 \xrightarrow{\ c_X + c_Y + c_{Z_1} + c_{Z_2}\ } \mathcal{D}_\omega Y + \mathbb{N} \times Z_1 \times Z_2 + \mathcal{P}_\omega X + \mathcal{B}_\omega X \xrightarrow{\ \mathsf{can}\ } F'C', \]

where can is the canonical inclusion map. The minimization of this F′-coalgebra yields the minimization of the given F-coalgebra (C, c). The details of the construction in full generality and its correctness are established in [WDMS20, Section 8].
Remark 3.12. In op. cit. we show that we can derive a refinement interface for F′ from the refinement interfaces of the basic functors used. It is easy to see that the run time factor of the refinement interface of the above F′ is

\[ p_{\mathcal{D}_\omega}(c_X) + p_{\Sigma}(c_Y) + p_{\mathcal{P}_\omega}(c_{Z_1}) + p_{\mathcal{B}_\omega}(c_{Z_2}), \]

where the summands are the run time factors of the respective refinement interfaces of the building blocks. Note that here we use the full generality of Definition 3.8, i.e. that c_X, c_Y, c_{Z_1} and c_{Z_2} are not required to be coalgebras but only maps of the shape X → HY for the relevant functor H.

Combination by product
CoPaR moreover implements a further optimization of this procedure that leads to fewer intermediate states in the case of polynomial functors Σ: instead of putting the refinement interface of Σ side by side with those of its arguments, CoPaR includes a systematic procedure to combine the refinement interfaces of the arguments of Σ into a single refinement interface. For instance, starting from FX = D_ω(N × P_ω X × B_ω X) as above, a given F-coalgebra is transformed into a coalgebra for the functor F″X = D_ω X + N × P_ω X × B_ω X, effectively inducing intermediate states in Y as above but avoiding Z_1 and Z_2. In order to run the generic partition refinement algorithm for F″-coalgebras, we need a refinement interface for F″. CoPaR derives a refinement interface for F″ by first combining the refinement interfaces of P_ω, B_ω, and that of the constant functor X ↦ N, yielding a refinement interface for X ↦ N × P_ω X × B_ω X. Then, this refinement interface is combined with that of D_ω, finally yielding one for F″. The combination of refinement interfaces along coproducts of functors is already described in [WDMS20, Sec. 8.3]; in the following, we describe how refinement interfaces are combined along cartesian product ×.
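Construction 3.13. Suppose we are given a finite family of functors F_i, i ∈ I, such that each F_i has the encoding ♭_i : F_i X → B_ω(A_i × X) with label set A_i and is equipped with the refinement interface

\[ \mathsf{init}_i\colon F_i 1 \times \mathcal{B}_\omega A_i \to W_i, \qquad \mathsf{update}_i\colon \mathcal{B}_\omega A_i \times W_i \to W_i \times F_i 3 \times W_i, \qquad w_i\colon \mathcal{P}X \to (F_i X \to W_i). \]

We construct an encoding and a refinement interface for FX = ∏_{i∈I} F_i X as follows. The encoding of F is given by taking as the label set A = ∐_{i∈I} A_i the disjoint union of the label sets A_i and the obvious component-wise definition of ♭:

\[ \flat(t) = \{[\, (\mathsf{in}_i(a), x) \mid i \in I,\ (a, x) \in \flat_i(\mathsf{pr}_i(t)) \,]\}. \]

The set W = ∏_{i∈I} W_i of weights consists of tuples of weights in W_i, and the weight function simply applies the map w_i in the i-th component:

\[ w(B)(t) = \bigl(w_i(B)(\mathsf{pr}_i(t))\bigr)_{i \in I}. \]

The refinement interface routines of F now have the following types:

\[ \mathsf{init}\colon \Bigl(\prod_{i \in I} F_i 1\Bigr) \times \mathcal{B}_\omega\Bigl(\coprod_{i \in I} A_i\Bigr) \to \prod_{i \in I} W_i, \qquad \mathsf{update}\colon \mathcal{B}_\omega\Bigl(\coprod_{i \in I} A_i\Bigr) \times \prod_{i \in I} W_i \to \prod_{i \in I} W_i \times \prod_{i \in I} F_i 3 \times \prod_{i \in I} W_i. \]

For their definition, we introduce the following auxiliary function π_i, which restricts bags of labels to only those labels that come from A_i:

\[ \pi_i\colon \mathcal{B}_\omega\Bigl(\coprod_{j \in I} A_j\Bigr) \to \mathcal{B}_\omega A_i, \qquad \pi_i(\ell)(a) = \ell(\mathsf{in}_i(a)) \]

for every i ∈ I.
Then we define init by init((t_j)_{j∈I}, ℓ)_i = init_i(t_i, π_i(ℓ)) for every i ∈ I, and we define update as the composite

\[ \mathcal{B}_\omega A \times \prod_{i \in I} W_i \xrightarrow{\ \langle \pi_i \times \mathsf{pr}_i \rangle_{i \in I}\ } \prod_{i \in I} (\mathcal{B}_\omega A_i \times W_i) \xrightarrow{\ \prod_{i \in I} \mathsf{update}_i\ } \prod_{i \in I} (W_i \times F_i 3 \times W_i) \xrightarrow{\ \varphi\ } \prod_{i \in I} W_i \times \prod_{i \in I} F_i 3 \times \prod_{i \in I} W_i, \]

where φ is the obvious bijection reordering tuples in the evident way.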
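The binary case of this product construction can be sketched in Haskell as follows. This is an illustration under assumptions rather than CoPaR's actual combinator: the record type Interface and the names productInterface and groupInt are hypothetical, bags of labels are represented as lists, and the disjoint union of label sets is modelled by Either.

```haskell
import Data.Either (lefts, rights)

-- A refinement interface as a record of its two routines, with
-- f1 = F1, a = label set, w = weights, f3 = F3.
data Interface f1 a w f3 = Interface
  { initI   :: f1 -> [a] -> w
  , updateI :: [a] -> w -> (w, f3, w)
  }

-- Product of two interfaces: labels are tagged with Left/Right, and the
-- projections lefts/rights play the role of the restriction maps π_i.
productInterface
  :: Interface f1 a w f3
  -> Interface g1 b v g3
  -> Interface (f1, g1) (Either a b) (w, v) (f3, g3)
productInterface i j = Interface
  { initI = \(t1, t2) ls ->
      (initI i t1 (lefts ls), initI j t2 (rights ls))
  , updateI = \ls (w1, w2) ->
      let (s1, x1, r1) = updateI i (lefts ls) w1
          (s2, x2, r2) = updateI j (rights ls) w2
      in ((s1, s2), (x1, x2), (r1, r2))   -- reordering of tuples (the map φ)
  }

-- Example component: the group-valued interface for the group (Int, +, 0).
groupInt :: Interface Int Int (Int, Int) (Int, Int, Int)
groupInt = Interface
  { initI = \g _ -> (0, g)
  , updateI = \l (r, b) ->
      let s = sum l
      in ((r + b - s, s), (r, b - s, s), (r + s, b - s))
  }

main :: IO ()
main = do
  let p = productInterface groupInt groupInt
  print (updateI p [Left 1, Right 2, Left 3] ((0, 10), (0, 5)))
```

Each component only ever sees its own labels and its own weight, which is why the run time of the combined routines is governed by the slowest component.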
Proposition 3.14. Let F_i : Set → Set, i ∈ I, be equipped with refinement interfaces with run time factors p_i(c). Then Construction 3.13 defines a refinement interface for F = ∏_{i∈I} F_i with run time factor

\[ p(c) = \max_{i \in I} p_i(c). \]

In particular, if the refinement interface of every F_i has run time factor p(c) = 1, then so does the refinement interface for F = ∏_{i∈I} F_i.
Proof. To simplify notation in the composition of maps, we define the following filter map for every S ⊆ X and i ∈ I: Using the filter maps f_S, we can rephrase the axioms of the refinement interface in Definition 3.5 as the commutativity of the following diagrams In the proof that these diagrams commute, we will use the equalities for i ∈ I and S ⊆ X. (3.10) This diagram commutes because for t ∈ ∏_{j∈I} F_j X and a ∈ A_i, we have Moreover, we will use that for every family (Y_j)_{j∈I} of sets, every map b : B → B′, and every i ∈ I, the diagram commutes, as verified by straightforward evaluation of the maps involved. The categorically-minded reader will notice that commutation of the right-hand inner quadrangle is simply naturality of the projection maps pr_i, as indicated in the diagram; we will use this property of pr_i again later, referring to it as "naturality of pr_i" (without requiring further understanding of the concept of natural transformation). Furthermore, we clearly have commutative squares pr_i for every i ∈ I. (3.12)

We are ready to verify the axioms of the refinement interface. To this end, we use that the product projections pr_i, i ∈ I, form a jointly injective family. This means that for every pair of maps f, g : Z → ∏_{j∈I} Y_j we have that pr_i · f = pr_i · g for all i ∈ I implies f = g. (3.13)

Axiom for init. For every i ∈ I, the outside of the following diagram commutes because all its inner parts commute, for the respective indicated reasons: By (3.13), we obtain commutation of the left-hand triangle in (3.9) as desired.
Axiom for update. For all S ⊆ B ⊆ X, the outside of the diagram below commutes because all its inner parts commute for the respective indicated reasons: , which is the desired right-hand triangle in (3.9).
Run time factor. Both init and update preprocess their parameters in linear time (via π_i and by accessing elements of tuples), before calling the init_i and update_i routines of all F_i. Since |I| is constant, this results in a run time of O(|ℓ| · max_{i∈I} p_i(c)) for ℓ ∈ B_ω(∐_{i∈I} A_i).
We order F3 lexicographically by assuming a total order on the index set I: x < y in ∏_{i∈I} F_i 3 iff there is i ∈ I with pr_i(x) < pr_i(y) and pr_j(x) = pr_j(y) for all j < i.
This comparison takes time O(max i∈I p i (c)), again because |I| is constant.
Remark 3.15. In summary we obtain that the run time factor p(c) for a composite functor F is the maximum of the respective run time factors of the refinement interfaces of the basic functors from which F is built. Specifically, suppose that F is built from basic functors G_1, ..., G_n using composition, product, and coproduct, and that G_1, ..., G_n have refinement interfaces with respective run time factors p_1(c), ..., p_n(c). Then the modularity mechanism decomposes an F-coalgebra c : C → FC into maps f_i : X_i → G_i Y_i, for 1 ≤ i ≤ n (i.e. one map per block in the illustration in Figure 4). The run time factor for the refinement interface arising by modular construction is given by

\[ p(c) = \max\bigl(p_1(f_1), \ldots, p_n(f_n)\bigr). \tag{3.14} \]

This is because the refinement interface for a particular functor G_i only sees the labels and weights for the map f_i and never those from other sorts.

Implementation Details
Our implementation is geared towards realizing both the level of genericity and the efficiency afforded by the abstract algorithm. Regarding genericity, each basic functor is defined (in its own source file) as a single Haskell data type that implements two type classes: (1) the class RefinementInterface with functions init and update, which directly corresponds to the mathematical notion (Definition 3.5), and (2) the class ParseMorphism, which provides a parser that defines the coalgebra syntax for the functor. This means that new basic functors can be implemented without modifying any of the existing code, except for registering the new type in a list of functors (the existing functor implementations are in src/Copar/Functors). The type class modelling refinement interfaces is defined as follows in CoPaR:

    class (Ord (F1 f), Ord (F3 f)) => RefinementInterface f where
      init   :: F1 f -> [Label f] -> Weight f
      update :: [Label f] -> Weight f -> (Weight f, F3 f, Weight f)

Here, the type f serves as the name of the functor F of interest, and Label f is the type representing the label set A from the encoding of the functor. Similarly, the type Weight f represents W, and the types F1 f and F3 f represent the sets F1 and F3. For example, if we want to implement the refinement interface for FX = R^(X) explicitly, we can write the following:

    data R x = R x
    type instance Label R = Double
    type instance Weight R = (Double, Double)
    type instance F1 R = Double
    type instance F3 R = (Double, Double, Double)

    instance RefinementInterface R where
      init g e = (0, g)
      update l (r, b) = ((r + b - sum l, sum l),
                         (r, b - sum l, sum l),
                         (r + sum l, b - sum l))

The first line defines a parametrized type R (with one constructor of the same name) representing the functor (R^(−) in this case), with the parameter x representing the functor argument. The next lines define the types representing the sets that appear in the refinement interface, and the instance of RefinementInterface for R implements the init and update routines for R^(−) as we have defined them before (Example 3.6(2)).
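To experiment with the instance for R^(−) outside of CoPaR, one can use a self-contained variant such as the following sketch. The type family declarations, the language extensions, and the omission of the class constraints are assumptions made so that the file compiles on its own; they are not claimed to match the CoPaR sources. Since the class parameter f occurs only under type families, this sketch needs AllowAmbiguousTypes and selects the instance by type application.

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeFamilies #-}

import Prelude hiding (init)   -- the interface reuses the name init
import Data.Kind (Type)

-- Open type families naming the sets attached to a functor f.
type family Label  (f :: Type -> Type) :: Type
type family Weight (f :: Type -> Type) :: Type
type family F1     (f :: Type -> Type) :: Type
type family F3     (f :: Type -> Type) :: Type

class RefinementInterface (f :: Type -> Type) where
  init   :: F1 f -> [Label f] -> Weight f
  update :: [Label f] -> Weight f -> (Weight f, F3 f, Weight f)

-- The functor R^(-), with weights approximated by Double.
data R x = R x

type instance Label  R = Double
type instance Weight R = (Double, Double)
type instance F1     R = Double
type instance F3     R = (Double, Double, Double)

instance RefinementInterface R where
  init g _ = (0, g)
  update l (r, b) =
    ( (r + b - s, s)    -- weight of S
    , (r, b - s, s)     -- value in F3: weights into X \ B, B \ S, and S
    , (r + s, b - s) )  -- weight of B \ S
    where s = sum l

main :: IO ()
main = print (update @R [1.5, 0.5] (init @R 5 [1.5, 0.5, 3]))
```

Here a state has three outgoing edges with total weight 5, and the edges with weights 1.5 and 0.5 lead into S, so update splits the weight 5 of the block into 2 (for S) and 3 (for the remainder).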
Concerning efficiency, CoPaR faithfully implements our imperative algorithm [WDMS20] in the functional language Haskell. We have made sure that this implementation actually realizes the good theoretical complexity of the algorithm. This is achieved by ample use of the ST monad [LP94] and by disabling lazy evaluation for the core parts of the algorithm using GHC's Strict pragma. The ST monad also enables the use of efficient data structures like mutable vectors where possible.
One such data structure, which is central to the efficient implementation of the generic algorithm, is a refinable partition, which stores the blocks of the current partition of the state set C of the input coalgebra during the execution of the algorithm. This data structure has to provide constant time operations for finding the size of a block, marking a state and counting the marked states in a block. Splitting a block in marked and unmarked states must only take linear time in the number of marked states of this block. Valmari and Franceschinis [VF10] have described a data structure (for use in Markov chain lumping) fulfilling all these requirements, and this is what we use in CoPaR.
Our abstract algorithm maintains two partitions P, Q of C, where P is one transition step finer than Q; i.e. P is the partition of C induced by the map Fq · c : C → FQ, where q : C ↠ Q is the canonical quotient map assigning to every state the block which contains it. The key to the low time complexity is to choose in each iteration a subblock S in P whose surrounding compound block B in Q (with S ⊆ B) satisfies 2 · |S| ≤ |B|, and then refine Q (and P) as explained in Section 3.4 (see Figure 3). This idea goes back to Hopcroft [Hop71], and is also used in all other partition refinement algorithms mentioned in the introduction. Our implementation maintains a queue of subblocks S satisfying the above property, and the termination condition P = Q of the main loop then translates to this queue being empty.
One optimization that is new in CoPaR in relation to previous work [VF10,WDMS20] is that weights for blocks of exactly one state are not computed, because such blocks cannot be split any further. This has drastic performance benefits for inputs where the algorithm produces many single-element blocks early on, e.g. for nearly minimal systems or fine grained initial partitions, see [Dei19] for details and measurements.

Instances
Many systems are coalgebras for functors formed according to the grammar (3.2). In Table 1, we list various system types that can be handled by our algorithm, taken from [WDMS20] except for Markov chains with weights in a monoid and weighted tree automata, which are new in the present paper. In all cases, m is the number of edges and n is the number of states of the input coalgebra, and we compare the run time of our generic algorithm with that of specifically designed algorithms from the literature. In most instances, we match the complexity of the best known algorithm. In the one case where our generic algorithm is asymptotically slower (LTS with unbounded alphabet), this is due to assuming a potentially very large number of alphabet letters -as soon as the number of alphabet letters is assumed to be polynomially bounded in the number n of states, the number m of transitions is also polynomially bounded in n, so log m ∈ O(log n). This argument also explains why '<' and '=', respectively, hold in the last two rows of Table 1, as we assume Σ to be (fixed and) finite; the case where Σ is infinite and unranked is more complicated. Details on the instantiation to weighted tree automata are discussed in Section 6.
Our algorithm and tool can handle further system types that arise by combining functors in various ways. For instance, so-called simple Segala systems are coalgebras for the functor P ω (A × D ω (−)), and are minimized by our algorithm in time O((m + n) · log n), improving on the best previous algorithm [BEM00] and matching the complexity of the algorithm by Groote et al. [GVdV18]. Other type functors for various species of probabilistic systems are listed in [BSdV03], including the ones for general Segala systems, reactive systems, generative systems, stratified systems, alternating systems, bundle systems, and Pnueli-Zuck systems. Hence, CoPaR provides an off-the-shelf minimization tool for all these types of systems.
Remark 4.1 (Initial partitions). Note that in the classical Paige-Tarjan algorithm [PT87], the input includes an initial partition. Initial partitions as input parameters are covered via the genericity of our algorithm. In fact, initial partitions on F-coalgebras are accommodated by moving to the functor F′X = N × FX, where the first component of a coalgebra assigns to each state the number of its block in the initial partition. Under the optimized treatment of the polynomial functor N × (−) (Section 3.5), this transformation does not enlarge the state space and also leaves the run time factor p(c) unchanged [WDMS20]; that is, the asymptotic run time of the algorithm remains unchanged under adding initial partitions.

Weighted Transition Systems
We have seen in Example 3.6(2) that weighted systems with weights in a group easily fit into our framework of generic partition refinement, since we can use inverses to implement the refinement interface efficiently. In the following we generalize this by allowing the weighted systems to be weighted in an arbitrary commutative monoid M that does not necessarily have inverses. Systems with weights in a monoid are studied by Klin and Sassone [KS13], and they show that behavioural equivalence of M^(−)-coalgebras is precisely weighted bisimilarity (cf. Example 2.2(4)). Weighted transition systems with weights in a monoid also serve as a base for (and are in fact a special case of) weighted tree automata as studied by Högberg et al. [HMM09], which we will discuss in the next section.

Table 1. Asymptotic complexity of the generic algorithm (2017) compared to specific algorithms, for systems with n states and m transitions, respectively m_{Pω} nondeterministic and m_{Dω} probabilistic transitions for Segala systems. For simplicity, we assume that m ≥ n and that A and Σ are finite and fixed. M is a possibly infinite and possibly non-cancellative commutative monoid.

System | Functor | Run Time | Specific algorithm | Year | Reference
Labelled Transition Systems
Markov Chains
Weighted Systems | M^(−) | m · log n · log(min(m, |M|))
In the following we distinguish between cancellative and non-cancellative monoids because the respective refinement interfaces for M (−) are implemented differently, with the interface for a cancellative monoid allowing for a lower time complexity.

Cancellative Monoids
Recall that a commutative monoid (M, +, 0) is cancellative if a + b = a + c implies b = c. Clearly, every submonoid of a group is cancellative, for example (N, +, 0) and (Z \ {0}, ·, 1). It is well-known that every cancellative commutative monoid M embeds into an abelian group G via the standard Grothendieck construction. Explicitly,

\[ G = (M \times M)/{\equiv}, \qquad (a_+, a_-) \equiv (b_+, b_-) \iff a_+ + b_- = b_+ + a_-, \]

and the group structure is given by the usual component-wise addition on the product:

\[ [(a_+, a_-)] + [(b_+, b_-)] = [(a_+ + b_+,\ a_- + b_-)], \qquad -[(a_+, a_-)] = [(a_-, a_+)]. \]

The embedding of M into G is given by the monoid homomorphism m ↦ [(m, 0)]. Hence, we have in total:

Corollary 5.1. A commutative monoid is cancellative iff it is a submonoid of a group.

Informally speaking, the inverses of elements of a cancellative monoid M exist, albeit not within M itself. The embedding M → G extends to a component-wise injective natural transformation α : M^(−) → G^(−), and therefore, computing behavioural equivalence for M^(−) reduces to that of G^(−) [WDMS20, Proposition 2.13]. Hence, we can convert every M^(−)-coalgebra c : C → M^(C) into the G^(−)-coalgebra α_C · c : C → G^(C) and use the refinement interface for G^(−) from Example 3.6(2), obtaining:

Corollary 5.2. Let M be a cancellative monoid. Then partition refinement on a weighted transition system c : C → M^(C) with n states and m transitions runs in time O((m + n) · log n).
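The Grothendieck construction can be sketched in Haskell as a type of formal differences. This is only an illustration: the names Diff, embed and inverse are hypothetical, the monoid operation is written <> as usual for Haskell's Monoid class, and commutativity and cancellativity of the underlying monoid are assumed rather than checked. Instead of forming a quotient, the equivalence relation is built into the Eq instance.

```haskell
import Data.Monoid (Sum (..))
import Numeric.Natural (Natural)

-- A formal difference: Diff p n stands for "p - n".
data Diff m = Diff m m deriving Show

-- (a+, a-) and (b+, b-) denote the same group element
-- iff a+ + b- = b+ + a-.
instance (Eq m, Semigroup m) => Eq (Diff m) where
  Diff pa na == Diff pb nb = pa <> nb == pb <> na

-- Component-wise addition.
instance Semigroup m => Semigroup (Diff m) where
  Diff pa na <> Diff pb nb = Diff (pa <> pb) (na <> nb)

instance Monoid m => Monoid (Diff m) where
  mempty = Diff mempty mempty

-- Inverses swap the two components.
inverse :: Diff m -> Diff m
inverse (Diff p n) = Diff n p

-- The embedding of the monoid into its group of differences.
embed :: Monoid m => m -> Diff m
embed x = Diff x mempty

main :: IO ()
main = do
  let three = embed (Sum 3) :: Diff (Sum Natural)
      five  = embed (Sum 5)
  print (three <> inverse three == mempty)         -- 3 - 3 equals 0
  print (five <> inverse three == embed (Sum 2))   -- 5 - 3 equals 2
```

For the natural numbers under addition this yields (a representation of) the integers; with weights of this form, the update routine for group-valued functors applies unchanged.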
Indeed, this is immediate from Theorem 3.10 for p(c) = 1.

Non-cancellative Monoids
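Assume given a non-cancellative commutative monoid (M, +, 0). Then M does not embed into a group, so we need a new refinement interface for the type functor M^(−) of M-weighted transition systems in our algorithm, rather than being able to reuse the one for group-valued functors as in the case of cancellative commutative monoids (Section 5.1). The basic idea in the construction of a refinement interface for M^(−) is to incorporate bags of monoid elements into the weights, and consider subtraction of bags. We implement this idea as follows. We use the same encoding of M^(−) as for group-valued functors: A = M_{≠0} and ♭(t) = {(t(x), x) | x ∈ X, t(x) ≠ 0}. The refinement interface for M^(−) has weights W = M × B_ω(M_{≠0}) and uses weight functions

\[ w(B)(t) = \Bigl(\sum_{x \in X \setminus B} t(x),\ \{[\, t(x) \mid x \in B,\ t(x) \neq 0 \,]\}\Bigr) \in M \times \mathcal{B}_\omega(M_{\neq 0}). \tag{5.1} \]

The interface routines are

\[ \mathsf{init}(f, \ell) = (0, \ell), \qquad \mathsf{update}\bigl(\ell, (r, c)\bigr) = \bigl((r + \Sigma(c - \ell),\ \ell),\ (r,\ \Sigma(c - \ell),\ \Sigma(\ell)),\ (r + \Sigma(\ell),\ c - \ell)\bigr), \]

where for a, b ∈ B_ω Y, the bag a − b is defined by (a − b)(y) = max(0, a(y) − b(y)).
As for groups, we denote elements of M^(3) as triples of elements from M.

Proposition 5.3. The above functions init and update define a refinement interface for M^(−).

Proof. We define the weight functions w : PX → (M^(X) → M × B_ω(M_{≠0})) as in (5.1), and show that init and update then satisfy the two axioms in Definition 3.5. Let t ∈ FX. For the first axiom we compute as follows:

\[ w(X)(t) = \Bigl(\sum_{x \in X \setminus X} t(x),\ \{[\, t(x) \mid x \in X,\ t(x) \neq 0 \,]\}\Bigr) = \bigl(0,\ \{[\, a \mid (a, x) \in \flat(t) \,]\}\bigr) = \mathsf{init}\bigl(F!(t),\ \{[\, a \mid (a, x) \in \flat(t) \,]\}\bigr). \]

In the first of the above multiset comprehensions, different x ∈ X with t(x) = m, m ≠ 0 lead to multiple occurrences of m in the multiset, and similarly in the second multiset comprehension. Let us now verify the second axiom concerning update. In order to simplify the notation, we define

\[ (t \downarrow B) := \{[\, t(x) \mid x \in B,\ t(x) \neq 0 \,]\} \in \mathcal{B}_\omega(M_{\neq 0}). \]

The accumulated weight of edges into B ⊆ X in t ∈ M^(X) is then denoted by Σ_B t := Σ(t ↓ B) = Σ_{x∈B} t(x).
In this notation, we have w(B)(t) = (Σ_{X\B} t, (t ↓ B)).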
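The verification of the second axiom can then be sketched as follows. The calculation assumes that update has the form update(ℓ, (r, c)) = ((r + Σ(c − ℓ), ℓ), (r, Σ(c − ℓ), Σ(ℓ)), (r + Σ(ℓ), c − ℓ)), which is reconstructed here from the description of the weights and of bag subtraction, and it uses that (t ↓ B) − (t ↓ S) = (t ↓ (B \ S)) for S ⊆ B, together with the disjoint decompositions X \ S = (X \ B) + (B \ S) and X \ (B \ S) = (X \ B) + S.

```latex
\[
\begin{aligned}
\mathsf{update}\bigl((t \downarrow S),\ w(B)(t)\bigr)
 &= \mathsf{update}\bigl((t \downarrow S),\ (\Sigma_{X \setminus B}\, t,\ (t \downarrow B))\bigr) \\
 &= \Bigl(\bigl(\Sigma_{X \setminus B}\, t + \Sigma(t \downarrow (B \setminus S)),\ (t \downarrow S)\bigr),\
          \bigl(\Sigma_{X \setminus B}\, t,\ \Sigma(t \downarrow (B \setminus S)),\ \Sigma(t \downarrow S)\bigr), \\
 &\qquad \bigl(\Sigma_{X \setminus B}\, t + \Sigma(t \downarrow S),\ (t \downarrow (B \setminus S))\bigr)\Bigr) \\
 &= \Bigl(\bigl(\Sigma_{X \setminus S}\, t,\ (t \downarrow S)\bigr),\
          \bigl(\Sigma_{X \setminus B}\, t,\ \Sigma_{B \setminus S}\, t,\ \Sigma_{S}\, t\bigr),\
          \bigl(\Sigma_{X \setminus (B \setminus S)}\, t,\ (t \downarrow (B \setminus S))\bigr)\Bigr) \\
 &= \bigl(w(S)(t),\ M^{(\chi_S^B)}(t),\ w(B \setminus S)(t)\bigr).
\end{aligned}
\]
```

Note that the first argument (t ↓ S) is precisely the bag {[ a | (a, x) ∈ ♭(t), x ∈ S ]} of labels of edges into S.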
In order to determine the run time factor of the above refinement interface, we need to describe how we handle elements of W = M × B_ω(M_{≠0}) in the routines of the refinement interface. We implement a bag in B_ω(M_{≠0}) as a balanced search tree with keys M_{≠0} and values N. In addition to the standard structure of a balanced search tree, we store in every node the value Σ(b), where b is the bag encoded by the subtree rooted at that node. Hence, for every bag b ∈ B_ω(M_{≠0}), the value Σ(b) is immediately available at the root node of the search tree for b. For CoPaR, we have implemented the basic operations on balanced search trees following Adams [Ada93]. For the complexity analysis, we prove that maintaining the values Σ(b) in the nodes only adds constant overhead to the operations on search trees, so that we obtain:

Proposition 5.4. The above refinement interface for M^(−) has run time factor p(n, m) = log min(|M|, m).

Proof. For a node x in a binary search tree encoding a bag in B_ω(M_{≠0}) as described above, we write Σ(x) for the value in M stored at that node. Note that our search trees cannot have more nodes than the size |M| of their index set, and the number of nodes is also not greater than the number m of all edges. Hence, their size is bounded by min(|M|, m).
Recall (e.g. [CLR90,Section 14]) that the key operations insert, delete and search have logarithmic time complexity in the size of a given balanced binary search tree.
We need to argue that maintaining the values Σ(x) in the nodes does not increase this complexity. This is obvious for the search operation as it does not change its argument search tree. For insert and delete, recall from op. cit. that these operations essentially trace down one path starting at the root to a node (at worst, a leaf) of the given search tree. Additionally, we need to rebalance the search tree after inserting or deleting a node. This is done by tracing back the same path to the root and (possibly) performing rotations on the nodes occurring on that path. Rotations are local operations changing the structure of a search tree but preserving the inorder key ordering of subtrees (see Figure 5).
In addition, when inserting or deleting a node x we must recompute the values Σ(y) of all nodes y along the path from the root to x when we trace that path back to the root. This can clearly be performed in constant time for each node y since Σ(y) is the sum (in M) of Σ(y_1) and Σ(y_2), which are stored at the child nodes y_1 and y_2 of y, respectively. In summary, we see that maintaining the desired summation values only requires an additional constant overhead in the backtracing step. Consequently, the operations of our balanced binary search trees run in time O(log min(|M|, m)). Subtraction c − ℓ of bags c, ℓ performs |ℓ|-many calls to delete on c, and computing the sum Σ(ℓ) takes time O(|ℓ|), since ℓ, the bag of labels passed to update, is not represented as a search tree.
Hence, we obtain the desired overall time complexity O(|ℓ| · log min(|M|, m)) of update.
Similarly, for init(f, ℓ), we need |ℓ|-many calls to insert in order to initialize the search tree representing B_ω(M_{≠0}).
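The bag-based interface can be sketched in Haskell as follows, using the additive monoid of the tropical semiring over Int (minimum as addition, with maxBound in the role of the neutral element) as a non-cancellative example. This is a simplified illustration, not the CoPaR implementation: the names Bag, sumBag, minus, initM and updateM are hypothetical, the forms of init and update are reconstructed from the description above, bags are stored in Data.Map, and sums are recomputed by a fold instead of being cached in the tree nodes, so the sketch does not attain the run time factor of Proposition 5.4.

```haskell
import qualified Data.Map.Strict as Map
import Data.Semigroup (Min (..))

-- A finite bag as a map from elements to positive multiplicities.
type Bag m = Map.Map m Int

bagFromList :: Ord m => [m] -> Bag m
bagFromList xs = Map.fromListWith (+) [(x, 1) | x <- xs]

-- Sum of a bag in the monoid, respecting multiplicities
-- (recomputed on every call; a cached sum would avoid this).
sumBag :: Monoid m => Bag m -> m
sumBag = Map.foldMapWithKey (\x k -> mconcat (replicate k x))

-- Bag subtraction: (a - b)(y) = max(0, a(y) - b(y)).
minus :: Ord m => Bag m -> Bag m -> Bag m
minus = Map.differenceWith (\a b -> if a > b then Just (a - b) else Nothing)

-- Weights W = M x B_ω(M): (weight of edges leaving the block,
-- bag of weights of edges into the block).
type Weight m = (m, Bag m)

-- Labels are assumed to be different from the neutral element,
-- and the monoid is assumed to be commutative.
initM :: (Ord m, Monoid m) => m -> [m] -> Weight m
initM _ ls = (mempty, bagFromList ls)

updateM :: (Ord m, Monoid m) => [m] -> Weight m -> (Weight m, (m, m, m), Weight m)
updateM ls (r, c) =
  ( (r <> sumBag rest, s)         -- weight of S
  , (r, sumBag rest, sumBag s)    -- weights into X \ B, B \ S, and S
  , (r <> sumBag s, rest) )       -- weight of B \ S
  where
    s    = bagFromList ls
    rest = c `minus` s

main :: IO ()
main = do
  let w0 = initM (Min 3) [Min 3, Min 5 :: Min Int]
  print (updateM [Min 3] w0)
```

In this example the accumulated weight min(3, 5) = 3 of the block does not determine the weight 5 of the edge into B \ S once the edge of weight 3 is removed; keeping the bag of individual weights is what makes this recoverable.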
Remark 5.5. It is no coincidence that we use B_ω(M_{≠0}) in the refinement interface. In fact, for every set X, the set B_ω X, with union of bags as addition, is the free commutative monoid on X. Moreover, B_ω X is cancellative, so that we may use a form of subtraction on bags. Thus, we see that B_ω(M_{≠0}) is a canonical cancellative monoid containing M_{≠0} (via the identification of elements of M_{≠0} with singleton bags). Moreover, the summation map Σ : B_ω M → M is the canonical unique monoid homomorphism freely extending the identity map on M. Thus, this map allows us to go back from bags to monoid elements. This is essentially the point of Eilenberg-Moore algebras in general (cf. Remark 2.4).
Corollary 5.6. Let $M$ be any commutative monoid. Then partition refinement on weighted transition systems $c : C \to M^{(C)}$ with $n$ states and $m$ transitions runs in time $O((m + n) \cdot \log n \cdot \log \min(|M|, m))$.
Indeed, this is immediate by Proposition 5.4 and Theorem 3.10.

Weighted Tree Automata
We proceed to take a closer look at weighted tree automata as a worked example. It is this example that mainly motivates the discussion of non-cancellative monoids in the last section, since in this case the generic algorithm improves on the run time of the best known specific algorithms in the literature.
Weighted tree automata simultaneously generalize tree automata and weighted (word) automata. A partition refinement construction for weighted automata (w.r.t. weighted bisimilarity) was first considered by Buchholz [Buc08, Theorem 3.7]. Högberg et al. first provided an efficient partition refinement algorithm for tree automata [HMM09], and subsequently for weighted tree automata [HMM07]. Generally, tree automata differ from word automata in replacing the input alphabet, which may be seen as a set of unary operations, with an algebraic signature $\Sigma$:

Definition 6.1. Let $(M, +, 0)$ be a commutative monoid. A (bottom-up) weighted tree automaton (WTA) (over $M$) consists of a finite set $X$ of states, a finite signature $\Sigma$, an output map $f : X \to M$, and for each $k \ge 0$, a transition map $\mu_k : \Sigma_k \to M^{X^k \times X}$, where $\Sigma_k$ denotes the set of $k$-ary input symbols in $\Sigma$; the maximum arity of symbols in $\Sigma$ is called the rank.
Given a weighted tree automaton $(X, f, (\mu_k)_{k \in \mathbb{N}})$ as in Definition 6.1, we see that it is, equivalently, a finite coalgebra for the functor $FX = M \times M^{(\Sigma X)}$, where we identify the signature $\Sigma$ with its corresponding polynomial functor $\Sigma X = \coprod_{\sigma/k \in \Sigma} X^k$. Indeed, $(\mu_k)_{k \ge 0}$ is equivalently expressed by a map $X \to M^{(\Sigma X)}$ that sends $x \in X$ to the function $\sigma(x_1, \ldots, x_k) \mapsto \mu_k(\sigma)((x_1, \ldots, x_k), x)$.

For the minimization of weighted tree automata, two bisimulation notions are considered in the literature [Buc08,HMM07]: forward and backward bisimulation. Here, we treat backward bisimulation, as it corresponds to coalgebraic behavioural equivalence. A backward bisimulation on a weighted tree automaton $(X, f, (\mu_k)_{k \in \mathbb{N}})$ is an equivalence relation $R \subseteq X \times X$ such that for every $(p, q) \in R$, $\sigma/k \in \Sigma$, and every $L \in \{D_1 \times \cdots \times D_k \mid D_1, \ldots, D_k \in X/R\}$ the following equation holds:

$$\sum_{w \in L} \mu_k(\sigma)(w, p) = \sum_{w \in L} \mu_k(\sigma)(w, q).$$

Remark 6.4. Note that $L$ consists of the $w \in X^k$ such that $e^k(w) = L$, where $e^k : X^k \twoheadrightarrow (X/R)^k$ is the $k$-fold power of the canonical quotient map $e : X \twoheadrightarrow X/R$.
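As an illustration of the definition of backward bisimulation, the following Python sketch checks whether a given partition of the states satisfies the condition that, for two states in the same block, the accumulated weight of σ-transitions entering them from each tuple of blocks agrees. This is a naive check written for readability, not the partition refinement algorithm of this paper. The representation (a dictionary from `(symbol, sources, target)` to a weight, and a dictionary `block_of` from states to block identifiers) and the names `signature` and `is_backward_bisimulation` are assumptions made for this example.

```python
from collections import defaultdict


def signature(state, transitions, block_of, op, unit):
    """Accumulated weight into `state`, grouped by symbol and source blocks."""
    acc = defaultdict(lambda: unit)
    for (symbol, sources, target), weight in transitions.items():
        if target == state:
            key = (symbol, tuple(block_of[x] for x in sources))
            acc[key] = op(acc[key], weight)
    # Entries equal to the unit are indistinguishable from absent transitions.
    return {key: w for key, w in acc.items() if w != unit}


def is_backward_bisimulation(states, transitions, block_of, op, unit):
    """True if all states of every block have the same signature."""
    representative = {}
    for s in states:
        sig = signature(s, transitions, block_of, op, unit)
        if representative.setdefault(block_of[s], sig) != sig:
            return False
    return True


# A small automaton over (N, +, 0): 'a' is a constant, 'f' is binary.
transitions = {
    ("a", (), 0): 1,
    ("a", (), 1): 1,
    ("f", (0, 1), 2): 2,
    ("f", (1, 0), 2): 3,
}
add = lambda x, y: x + y
fine = {0: "A", 1: "A", 2: "B"}
coarse = {0: "A", 1: "A", 2: "A"}
ok_fine = is_backward_bisimulation([0, 1, 2], transitions, fine, add, 0)
ok_coarse = is_backward_bisimulation([0, 1, 2], transitions, coarse, add, 0)
```

In this example the partition {{0, 1}, {2}} passes the check, whereas identifying all three states does not, since state 2 receives `f`-transitions and states 0 and 1 only receive the constant `a`.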
We can regard the output map $f$ as a transition map for a constant symbol, so it suffices to consider the functor $FX = M^{(\Sigma X)}$ (and in fact the output map is ignored in the definition of backward bisimulation given above and in [HMM07]). Then, we obtain the following result:

$p(n, m) = 1$ and $p(n, m) = \log \min(|M|, m)$, respectively. In the following, we additionally distinguish between finite and infinite monoids.
Theorem 6.8. Let $M$ be a commutative monoid. On weighted tree automata with $n$ states, $k$ transitions, and rank $r$, our algorithm runs in time (1) $O((r^2 k + r n) \cdot \log(k + n))$ if $M$ is cancellative or finite, and (2) $O((r k + n) \cdot \log(k + n) \cdot (\log k + r))$ otherwise.
Proof. The functor $FX = M^{(\Sigma X)}$ is first transformed into $F'X = M^{(X)} + \Sigma X$ according to Section 3.5. Given a coalgebra $c : C \to M^{(\Sigma C)}$ with $n = |C|$ states, this transformation introduces a set $K$ of intermediate states, one for every outgoing transition from every $x \in C$; hence $|K| = k$. The given coalgebra structure yields the two evident maps $c_1 : C \to M^{(K)}$ and $c_2 : K \to \Sigma C$. It takes $k$ edges to encode $c_1$ and at most $k \cdot r$ edges to encode $c_2$. Partition refinement is now performed on the following $F'$-coalgebra:

$$C + K \xrightarrow{\;c_1 + c_2\;} M^{(K)} + \Sigma C \xrightarrow{\;M^{(\mathsf{inr})} + \Sigma \mathsf{inl}\;} M^{(C + K)} + \Sigma(C + K),$$

where $\mathsf{inl} : C \to C + K$ and $\mathsf{inr} : K \to C + K$ are the canonical injections. This coalgebra on $C + K$ has $n' := |C + K| = n + k$ states and at most $m' = (r + 1) \cdot k$ edges. Since the refinement interface for $F'$ is a combination of those of $M^{(-)}$ and $\Sigma$, its run time factor is given by the maximum of the run time factors $p_M(c_1)$ and $p_\Sigma(c_2)$, respectively, of those two refinement interfaces (cf. Remark 3.15). We can further simplify this maximum to the asymptotically equivalent sum $p_M(c_1) + p_\Sigma(c_2)$. Since $p_\Sigma(c_2) = r$ and the number of edges in $c_1$ is bounded by $k$, we obtain, by Theorem 3.10, an overall time complexity of

$$O((m' + n') \cdot \log n' \cdot (p_M(c_1) + p_\Sigma(c_2))) = O((r \cdot k + n) \cdot \log(n + k) \cdot (p_M(c_1) + r)).$$
We proceed by distinguishing the following cases:
(a) If $M$ is cancellative, then we can use the refinement interface for groups (see Example 3.6(2)) with $p_M(c_1) = 1$ as explained in Section 5.1. Thus, the overall time complexity simplifies to $O((r \cdot k + n) \cdot \log(n + k) \cdot r) = O((r^2 \cdot k + r \cdot n) \cdot \log(n + k))$.
(b) Otherwise, we have $p_M(c_1) = \log \min(|M|, k)$ by Proposition 5.4, since $c_1$ has at most $k$ edges.
(b1) If $M$ is finite, then we have $p_M(c_1) \le \log |M| \in O(1)$, and thus obtain the same overall run time complexity as in the previous case. We have thus proved item (1) in the statement of the theorem.
(b2) If $M$ is infinite, then $\log \min(|M|, k) = \log k$. Thus, the overall run time is in $O((r \cdot k + n) \cdot \log(n + k) \cdot (\log k + r))$.
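The transformation used at the start of this proof can be sketched in Python as follows. The sketch only mirrors the counting argument; it is not the encoding used by CoPaR. It assumes that a coalgebra is given as a dictionary mapping each state to a dictionary `{(symbol, args): weight}`, and it represents the intermediate states in K by fresh integers, which is a choice made for this example only. The names `split_coalgebra` and `encoded_sizes` are hypothetical.

```python
def split_coalgebra(c):
    """Split c : C -> M^(Sigma C) into c1 : C -> M^(K) and c2 : K -> Sigma C."""
    K, c1, c2 = [], {}, {}
    for x, outgoing in c.items():
        c1[x] = {}
        for (symbol, args), weight in outgoing.items():
            t = len(K)               # fresh intermediate state per transition
            K.append(t)
            c1[x][t] = weight        # one weighted edge from x to t
            c2[t] = (symbol, args)   # len(args) edges from t back into C
    return K, c1, c2


def encoded_sizes(c):
    """Number of states and edges of the transformed coalgebra on C + K."""
    K, c1, c2 = split_coalgebra(c)
    states = len(c) + len(K)
    edges = sum(len(out) for out in c1.values())
    edges += sum(len(args) for _, args in c2.values())
    return states, edges


# Two states, three transitions, rank 2.
example = {
    0: {("f", (0, 1)): 2, ("a", ()): 1},
    1: {("g", (1,)): 3},
}
n, k, r = 2, 3, 2
states, edges = encoded_sizes(example)
```

For this example the transformed coalgebra has n + k = 5 states and 6 edges, which is within the bound (r + 1) · k = 9 used in the proof.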
Note that the number $m$ of edges of the input coalgebra satisfies $m \le rk$. Thus, for a fixed input signature $\Sigma$, we see that $m$ and $k$ are asymptotically equivalent. If we further assume that $m \ge n$, which means that there are no isolated states, then we obtain the bound in Table 1: using in the first step that $O(\log(m)^d) \subseteq O(m^c)$ for every $d \ge 1$ and $0 < c < 1$.
(2) For cancellative monoids, the time bound given in op. cit. is $O(r^2 \cdot k \cdot \log n)$ [HMM07, Theorem 29]. Assuming again that $m \ge n$, and recalling that $rk \ge m$, the complexity of our algorithm according to Theorem 6.8 is $O(r^2 \cdot k \cdot \log(k + n))$, i.e. only slightly worse.
In addition to guaranteeing a good theoretical complexity, our tool immediately yields an efficient implementation. For the case of non-cancellative monoids, this is, to the best of our knowledge, the only available implementation of partition refinement for weighted tree automata.

Evaluation and Benchmarking
We report on a number of benchmarks that illustrate the practical scalability of our tool CoPaR and hence our generic algorithm. These benchmarks cover a selection of different system types and include randomly generated inputs as well as real world examples. We also compare CoPaR with two other minimization tools, where applicable. Details and results of further benchmarks, in particular for the optimizations described in Section 3.5.1 and at the end of Section 3.6, are reported in [Dei19]. All benchmarks were run and measured on the same Intel Core i5-6500 processor with 3.20 GHz clock rate running a Linux system. We report the timing results of our tool CoPaR (compiled with GHC 8.4.4) separately for the three phases parsing, initialization, and the actual refinement loop. Recall from Remark 4.1 that the input coalgebra implicitly defines an initial partition according to the output behaviour of states. We have taken care to ensure that in all the following benchmarks, this initial partition is still coarse, i.e. the algorithm has to perform some actual refinement steps after initialization.

Weighted Tree Automata
We first focus on the instantiation of our algorithm for weighted tree automata as described in Section 6. Previous studies on the practical performance of partition refinement on large labelled transition systems [Val10,Val09] show that memory rather than run time seems to be the limiting factor. Since labelled transition systems are a special case of weighted tree automata, we expect to see similar phenomena. Hence, we evaluate the maximal automata sizes that can be processed on a typical current computer setup: We randomly generate weighted tree automata for various signatures and monoids, looking for the maximal size of WTAs that can be handled with 16 GB of RAM, and measure the respective run times of our tool. To this end, we minimize randomly generated coalgebras for the functor $FX = M \times M^{(\Sigma X)}$ with $\Sigma X = 4 \times X^r$ for all combinations of rank $r \in \{1, \ldots, 5\}$ and weight monoid $M$ ranging over
• $(2, \vee, 0)$ (functor available as powerset P(X) in CoPaR)
• $(\mathbb{N}, \max, 0)$ (syntactically: (Z,max)^(X))
• $(2, \vee, 0)^{64} \cong (\mathcal{P}_\omega(64), \cup, \emptyset)$ (syntactically: (Word,or)^(X))
We write $n$ for the number of states, $k$ for the number of transitions, and $m$ for the number of edges in the coalgebra encoding; in fact, we generate only transitions of the respective rank $r$.
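All three weight monoids above are non-cancellative, which is why the refinement interface for groups does not apply to them. The following Python sketch spells this out; the dictionary `MONOIDS`, its keys, and the helper `violates_cancellation` are hypothetical names introduced for this example, and 64-bit words are modelled by plain Python integers used as bit masks.

```python
from operator import or_

# Each monoid is given as a pair (operation, unit).
MONOIDS = {
    "bool_or": (lambda a, b: a or b, False),  # (2, or, 0)
    "nat_max": (max, 0),                      # (N, max, 0)
    "word_or": (or_, 0),                      # (2, or, 0)^64 as bit masks
}


def violates_cancellation(op, a, b, c):
    """True if a + c == b + c although a != b."""
    return a != b and op(a, c) == op(b, c)


witnesses = {
    "bool_or": (False, True, True),
    "nat_max": (3, 4, 5),
    "word_or": (0b01, 0b10, 0b11),
}
results = {
    name: violates_cancellation(MONOIDS[name][0], *abc)
    for name, abc in witnesses.items()
}
```

Each listed triple is a witness that cancellation fails, for instance max(3, 5) = max(4, 5) although 3 and 4 differ, so a weight that has been added to an accumulated sum cannot be removed again by subtraction.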
When generating a coalgebra with $n$ states, we randomly create 50 outgoing transitions per state, leading to $k = 50 \cdot n$ transitions in total. The transformation described in Section 3.5 additionally introduces one intermediate state per transition, leading to an actual number of states $n' = 51 \cdot n$. Every transition of rank $r$ has one incoming edge and $r$ outgoing edges, hence $m = (r + 1) \cdot k = 50 \cdot (r + 1) \cdot n$. Table 2 lists the maximal sizes of weighted tree automata that CoPaR is able to process in the mentioned 16 GB of RAM, along with the associated run times. Since our implementation of the refinement interface for $\mathcal{P}_\omega \cong (2, \vee, 0)^{(-)}$ is optimized for its specific functor, the tool needs less memory in this case, allowing for higher values of $n$, an effect that decreases with increasing rank $r$.

Table 2. Processing times for partition refinement on maximal weighted tree automata (i.e. coalgebras for $M \times M^{(\Sigma(-))}$) in 16 GB of memory with $n$ states and 50 transitions per state, leading to $n'$ states and $m$ edges in total. The column 'Init' provides the time needed to compute the initial partition $P_1$, and 'Refine' the time to compute the final partition $P_f$.
We restrict to generating at most 50 different elements of $M$ in each automaton, to avoid situations where all states are immediately distinguished in the first refinement step, as promised in the introduction. In addition, the parameters are chosen so that with high likelihood, the final partition distinguishes all states, so the examples illustrate the worst case. The first refinement step produces in the order of $|\Sigma| \cdot \min(50, |M|)^r$ subblocks (cf. Section 3.6), implying earlier termination for high values of $|M|$ and $r$ and explaining the slightly longer run time for $M = (2, \vee, 0)$ on small $r$. We note in summary that WTAs with well over 10 million edges are processed in less than five minutes, and in fact the run time of minimization is of the same order of magnitude as that of input parsing.

Since the publication of the conference paper [DMSW19], we have optimized the memory consumption in CoPaR, especially in the refinement interface for the functor $\Sigma$. With the optimizations, CoPaR can now handle coalgebras with 20% more states within the same memory limit of 16 GB of RAM for signatures with rank $r = 1$ and even 75% more states for signatures with rank $r = 5$.
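The shape of the generated inputs and the resulting sizes can be reproduced with a short Python sketch. It is not the generator used for the benchmarks: the uniform choice of symbols, argument states, and weights, the concrete weight set 1, ..., 50, and the names `random_wta` and `benchmark_sizes` are assumptions made for this example. Only the parameters (4 symbols of rank r, 50 transitions per state, at most 50 distinct weights) are taken from the text.

```python
import random


def random_wta(n, r, num_symbols=4, per_state=50, weights=range(1, 51), seed=0):
    """Random coalgebra: each state gets `per_state` distinct rank-r transitions."""
    if num_symbols * n ** r < per_state:
        raise ValueError("not enough distinct transitions per state")
    rng = random.Random(seed)
    weights = list(weights)
    coalgebra = {}
    for x in range(n):
        outgoing = {}
        while len(outgoing) < per_state:
            symbol = rng.randrange(num_symbols)
            args = tuple(rng.randrange(n) for _ in range(r))
            outgoing[(symbol, args)] = rng.choice(weights)
        coalgebra[x] = outgoing
    return coalgebra


def benchmark_sizes(coalgebra, r):
    """States and edges after adding one intermediate state per transition."""
    n = len(coalgebra)
    k = sum(len(outgoing) for outgoing in coalgebra.values())
    return n + k, (r + 1) * k


wta = random_wta(n=20, r=2)
states, edges = benchmark_sizes(wta, r=2)
```

For n = 20 and r = 2 this yields k = 1000 transitions, 51 · 20 = 1020 states, and 50 · 3 · 20 = 3000 edges, matching the formulas above.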

Benchmarks for PRISM Models
In order to see how CoPaR performs on models that arise in practice, we have taken two kinds of models from the benchmark suite [KNP12] of the probabilistic model checker PRISM [KNP11]. We derived coalgebras for the functors
• $FX = \mathbb{R}^{(X)}$ from continuous time Markov chains (CTMC), and
• $FX = \mathbb{N} \times \mathcal{P}(\mathbb{N} \times (\mathcal{D}_\omega X))$ from Markov decision processes (MDP).
This translation deliberately ignores the variable valuations present in the original benchmark models to avoid situations where all states are already distinguished after the first refinement step. For MDPs, the translation instead generates a coarse initial partition for each model (the outer $\mathbb{N} \times (-)$). For the CTMCs considered, the functor $\mathbb{R}^{(-)}$ is already sufficient since the initial partition distinguishes states by the accumulated weight of their outgoing transitions. As in the case of WTAs, the functor for MDPs is a composite of several basic functors and thus requires use of the construction described in Section 3.5.

Two of the benchmarks are shown in Table 3 with different parameters, resulting in three differently sized coalgebras each. The fms family of systems models a flexible manufacturing system [CT93] as CTMCs (without initial partition), and we minimize them under the usual weighted bisimilarity, i.e. as $\mathbb{R}^{(-)}$-coalgebras. The wlan benchmarks [KNS02] model various aspects of the IEEE 802.11 Wireless LAN protocol as MDPs. Table 3 also includes the total run time of two additional partition refinement tools: a C++ implementation of the algorithm described by Valmari [VF10], which can minimize MDPs as well as CTMCs, and the tool ltspbisim from the mCRL2 toolset [BGK+19] version 201808.0, which implements a recently discovered refinement algorithm for MDPs [GVdV18] (but does not support CTMCs directly, hence there is no data in the first three lines).
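The remark that, for CTMCs, the initial partition distinguishes states by the accumulated weight of their outgoing transitions can be made concrete with a small Python sketch. The representation of a CTMC as a dictionary from states to `{successor: rate}` dictionaries and the name `initial_partition` are assumptions for this example; integer rates are used to avoid floating point rounding when comparing sums.

```python
from collections import defaultdict


def initial_partition(rates):
    """Group states by the sum of the rates of their outgoing transitions."""
    blocks = defaultdict(list)
    for state, outgoing in rates.items():
        blocks[sum(outgoing.values())].append(state)
    return sorted(sorted(block) for block in blocks.values())


ctmc = {
    0: {1: 2, 2: 1},   # total exit rate 3
    1: {0: 3},         # total exit rate 3
    2: {},             # absorbing state, total exit rate 0
}
partition = initial_partition(ctmc)
```

Here states 0 and 1 start in the same block because both have total exit rate 3, while the absorbing state 2 is separated from them before any refinement step takes place.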
The results in Table 3 show that refinement for the fms benchmarks is faster than for the respective wlan ones, even though the first group has more edges. This is due to (a) the fact that the functor for MDPs is more complex and thus introduces more indirection into our algorithms, as explained in Section 3.5, and (b) the fact that our optimization for one-element blocks fires much more often for fms.
It is also apparent that CoPaR is slower than both of the other tools in our comparison, by a factor of up to 15 for the presented examples. To some extent, this performance difference can be attributed to the fact that our implementation is written in Haskell and the other tools in C++. In addition, CoPaR incurs a certain amount of overhead for genericity and modularity.

Conclusion and Future Work
We have instantiated a generic and efficient partition refinement algorithm that we introduced in previous work [WDMS20] to weighted (tree) automata, and we have refined the generic complexity analysis of the algorithm to cover this case. Moreover, we have described an implementation of the generic algorithm in the form of the tool CoPaR, which supports the modular combination of basic system types without requiring any additional implementation effort, and allows for easy incorporation of new basic system types by implementing a generic refinement interface.
In future work, we will further broaden the range of system types that our algorithm and tool can accommodate, and provide support for base categories beyond sets, e.g. nominal sets, which underlie nominal automata [BKL14,SKMW17].
Concerning genericity, there is an orthogonal approach by Ranzato and Tapparo [RT08], which is generic over notions of process equivalence but fixes the system type to standard labelled transition systems; see also [GJKW17]. Similarly, Blom and Orzan [BO03,BO05] present signature refinement, which covers, e.g. strong and branching bisimulation as well as Markov chain lumping, but requires adapting the algorithm for each instance. These algorithms have also been improved using symbolic techniques (e.g. [vDvdP18]). Moreover, many of the mentioned approaches and others [BDJM05, BO03, BO05, GH02, vDvdP18] focus on parallelization. We will explore in future work whether symbolic and distributed methods can be lifted to coalgebraic generality. A further important aim is genericity also along the axis of process equivalences.