Seminator 2 Can Complement Generalized Büchi Automata via Improved Semi-determinization

We present the second generation of the tool Seminator that transforms transition-based generalized Büchi automata (TGBAs) into equivalent semi-deterministic automata. The tool has been extended with numerous optimizations and produces considerably smaller automata than its first version. In connection with the state-of-the-art LTL to TGBAs translator Spot, Seminator 2 produces smaller (on average) semi-deterministic automata than the direct LTL to semi-deterministic automata translator ltl2ldgba of the Owl library. Further, Seminator 2 has been extended with an improved NCSB complementation procedure for semi-deterministic automata, providing a new way to complement automata that is competitive with state-of-the-art complementation tools.


Introduction
Semi-deterministic automata [24] are automata where each accepting run makes only finitely many nondeterministic choices. The merit of this intermediate stage between deterministic and nondeterministic automata comes from two facts known since the late 1980s. First, every nondeterministic Büchi automaton with n states can be transformed into an equivalent semi-deterministic Büchi automaton with at most 4^n states [7,24]. Note that asymptotically optimal determinization procedures transform nondeterministic Büchi automata into deterministic automata with 2^{O(n log n)} states [24] and with a more complex (typically Rabin) acceptance condition, as deterministic Büchi automata are strictly less expressive. Second, some algorithms cannot handle nondeterministic automata, but they can handle semi-deterministic ones; for example, algorithms for qualitative model checking of Markov decision processes (MDPs) [7,29].
For theoreticians, the difference between the complexity of determinization and semi-determinization is not dramatic: both constructions are exponential. However, the difference is important for authors and users of practical automata-based tools, where automata size and the complexity of their acceptance condition often have a significant impact on tool performance. This latter perspective has recently initiated another wave of research on semi-deterministic automata. Since 2015, many new results have been published: several direct translations of LTL to semi-deterministic automata [11,15,16,26], specialized complementation constructions for semi-deterministic automata [4,6], algorithms for quantitative model checking of MDPs based on semi-deterministic automata [13,25], a transformation of semi-deterministic automata to deterministic parity automata [10], and reinforcement learning of control policies using semi-deterministic automata [21].
In 2017, we introduced Seminator 1.1 [5], a tool that transforms nondeterministic automata into semi-deterministic ones. The original semi-determinization procedure of Courcoubetis and Yannakakis [7] works with standard Büchi automata (BAs). Seminator 1.1 extends this construction to handle more compact automata, namely transition-based Büchi automata (TBAs) and transition-based generalized Büchi automata (TGBAs). TBAs use accepting transitions instead of accepting states, and TGBAs have several sets of accepting transitions, each of which must be visited infinitely often by accepting runs. The main novelty of Seminator 1.1 was that it performed degeneralization and semi-determinization of a TGBA simultaneously. As a result, it could translate TGBAs to smaller semi-deterministic automata than (to the best of our knowledge) the only other tool for automata semi-determinization, called nba2ldba [26]. This tool accepts only BAs as input, and thus TGBAs must be degeneralized before nba2ldba is called.
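To illustrate the degeneralization step that nba2ldba's BA-only input forces, here is a minimal sketch producing a transition-based result. The encoding (transitions as (src, letter, dst) triples, acceptance as sets of such triples) is our own toy representation for illustration, not the data structure of any of the tools mentioned above.

```python
def degeneralize(delta, init, acc_sets):
    """Degeneralize a TGBA into a TBA whose states are (state, level) pairs.

    delta    : set of transitions (src, letter, dst)
    acc_sets : list of n sets of accepting transitions
    At level i we wait for a transition of acc_sets[i]; taking one
    advances the level, and completing a full round through all n
    sets yields an accepting transition of the resulting TBA.
    """
    n = len(acc_sets)
    new_init = (init, 0)
    todo, seen = [new_init], {new_init}
    new_delta, new_acc = set(), set()
    while todo:
        q, i = todo.pop()
        for (src, a, dst) in delta:
            if src != q:
                continue
            j, wrapped = i, False
            while (src, a, dst) in acc_sets[j]:
                j += 1
                if j == n:
                    j, wrapped = 0, True
                if j == i:        # transition belongs to every set
                    break
            t = ((q, i), a, (dst, j))
            new_delta.add(t)
            if wrapped:
                new_acc.add(t)
            if (dst, j) not in seen:
                seen.add((dst, j))
                todo.append((dst, j))
    return seen, new_delta, new_init, new_acc
```

For instance, a one-state TGBA with two acceptance sets degeneralizes into a two-state TBA whose single accepting transition closes a full round through both sets.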
Moreover, in connection with the LTL to TGBAs translator ltl2tgba of Spot [8], Seminator 1.1 provided a translation of LTL to semi-deterministic automata that could compete with the direct LTL to semi-deterministic TGBAs translator ltl2ldba [26]. More precisely, our experiments [5] showed that the combination of ltl2tgba and Seminator 1.1 outperforms ltl2ldba on LTL formulas that ltl2tgba translates directly to a deterministic or semi-deterministic TGBA (i.e., when Seminator has no work to do), while ltl2ldba produced (on average) smaller semi-deterministic TGBAs for the remaining LTL formulas (i.e., when the TGBA produced by ltl2tgba has to be semi-determinized by Seminator). This paper presents Seminator 2, which changes the situation. With many improvements in semi-determinization, the combination of ltl2tgba and Seminator 2 now translates LTL to smaller (on average) semi-deterministic TGBAs than ltl2ldba even in the cases where ltl2tgba produces a TGBA that is not semi-deterministic. Moreover, this holds even when we compare against ltl2ldgba, the current successor of ltl2ldba distributed with Owl [19].

Improvements in Semi-determinization
First of all, we recall the definition of semi-deterministic automata and the principles of the semi-determinization procedure implemented in Seminator 1.1 [5]. Let A = (Q, Σ, δ, q_0, {F_1, ..., F_n}) be a TGBA over alphabet Σ, with a finite set of states Q, a transition relation δ ⊆ Q × Σ × Q, an initial state q_0 ∈ Q, and sets of accepting transitions F_1, ..., F_n ⊆ δ. Then A is semi-deterministic if there exists a subset Q_D ⊆ Q such that (i) each transition from Q_D goes back to Q_D (i.e., δ ∩ (Q_D × Σ × (Q ∖ Q_D)) = ∅), (ii) all states of Q_D are deterministic (i.e., for each q ∈ Q_D and a ∈ Σ there is at most one q′ such that (q, a, q′) ∈ δ), and (iii) each accepting transition starts in a state of Q_D (i.e., F_i ⊆ Q_D × Σ × Q_D for each 1 ≤ i ≤ n).

The part of A delimited by the states of Q_D is called deterministic, while the part formed by the remaining states Q ∖ Q_D is called nondeterministic, although it may contain deterministic states too. The transitions leading from the nondeterministic part to the deterministic one are called cut-transitions. The structure of a semi-deterministic automaton is depicted in Fig. 1.

Intuitively, a TGBA A with a set of states Q and a single set of accepting transitions F can be transformed into a semi-deterministic TBA B as follows. First, we use a copy of A as the nondeterministic part of B. The deterministic part of B has states of the form (M, N) such that Q ⊇ M ⊇ N and M ≠ ∅. Every accepting transition (q, a, q′) ∈ F induces a cut-transition (q, a, ({q′}, ∅)) of B. The deterministic part is then constructed to track all runs of A from each such state q′ using the powerset construction. More precisely, the first element of (M, N) tracks all runs, while the second element tracks only the runs that have passed some accepting transition of F.
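Conditions (i)-(iii) of the definition can be checked mechanically for a candidate deterministic part Q_D. A minimal sketch, using a toy encoding where transitions are (src, letter, dst) triples and each acceptance set is a set of such triples (our own illustration, not Seminator's internal representation):

```python
def is_semideterministic(delta, QD, acc_sets):
    """Check conditions (i)-(iii) for a candidate deterministic part QD."""
    # (i) QD is closed under transitions: no edge leaves QD
    closed = all(q2 in QD for (q1, _, q2) in delta if q1 in QD)
    # (ii) every state of QD is deterministic: at most one successor
    #      per (state, letter) pair
    outgoing = [(q1, a) for (q1, a, _) in delta if q1 in QD]
    deterministic = len(outgoing) == len(set(outgoing))
    # (iii) every accepting transition starts (and hence, by (i), stays) in QD
    accepting_inside = all(q1 in QD for Fi in acc_sets for (q1, _, _) in Fi)
    return closed and deterministic and accepting_inside
```

For a two-state automaton with a nondeterministic state n and a deterministic accepting sink d, the set QD = {d} satisfies all three conditions, while QD = {n, d} violates condition (ii).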
Each transition of the deterministic part that would reach a state with M = N (a so-called breakpoint) is replaced by an accepting transition of B leading to a state (M, N′), where N′ tracks only the runs of A passing an accepting transition of F in the current step. Seminator 1.1 extended this procedure to construct a semi-deterministic TBA even for a TGBA with multiple acceptance sets F_1, ..., F_n. States of the deterministic part are now triples (M, N, i), where i ∈ {0, ..., n − 1} is called the level; it has a similar semantics as in degeneralization. Cut-transitions are induced by transitions of F_n and lead to states of the form ({q′}, ∅, 0). The level i says that N tracks the runs that have passed a transition of F_{i+1} since the last level change. When the deterministic part reaches a state (M, N, i) with M = N, we change the level to i′ = (i + 1) mod n and modify N to track only the runs passing F_{i′+1} in the current step. Transitions changing the level are accepting.
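The construction for a single acceptance set F fits in a few lines of code. The sketch below uses the same kind of toy encoding (transitions as (src, letter, dst) triples) and returns only the cut-transitions and the deterministic part, since the nondeterministic part is simply a copy of the input; it illustrates the idea and is not Seminator's optimized implementation.

```python
from collections import deque

def semi_determinize(delta, F):
    """Build the deterministic part of the semi-deterministic TBA B.

    delta : set of transitions (src, letter, dst);  F : subset of delta.
    Returns (cut, det_delta, det_acc): the cut-transitions, the
    transitions of the deterministic part, and its accepting
    (breakpoint) transitions.  Deterministic states are pairs
    (frozenset M, frozenset N).
    """
    letters = {a for (_, a, _) in delta}

    def dsucc(S, a):  # powerset successor under all transitions
        return frozenset(q2 for (q1, b, q2) in delta if b == a and q1 in S)

    def fsucc(S, a):  # successors reached via accepting transitions only
        return frozenset(q2 for (q1, b, q2) in F if b == a and q1 in S)

    # each accepting transition (q, a, q') induces a cut-transition
    # jumping to the deterministic state ({q'}, {})
    cut = {(q, a, (frozenset({q2}), frozenset())) for (q, a, q2) in F}
    todo = deque(tgt for (_, _, tgt) in cut)
    seen = set(todo)
    det_delta, det_acc = set(), set()
    while todo:
        M, N = todo.popleft()
        for a in letters:
            M2 = dsucc(M, a)
            if not M2:
                continue
            N2 = dsucc(N, a) | fsucc(M, a)
            if N2 == M2:                 # breakpoint: all runs passed F
                tgt = (M2, fsucc(M, a))  # restart N with this step's F-successors
                det_acc.add(((M, N), a, tgt))
            else:
                tgt = (M2, N2)
            det_delta.add(((M, N), a, tgt))
            if tgt not in seen:
                seen.add(tgt)
                todo.append(tgt)
    return cut, det_delta, det_acc
```

On a small automaton whose accepting self-loop is reachable nondeterministically, the deterministic part collapses to two (M, N) states, and every transition of that part is accepting because each step passes F.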
A precise description of these semi-determinization procedures and proofs of their correctness can be found in Blahoudek's dissertation [3]. Now we briefly explain the most important optimizations added in Seminator 2 (a journal paper with their formal description is in preparation). Each optimization can be enabled or disabled by a corresponding command-line option; all of them are enabled by default.
--scc-aware approach identifies, for each cut-transition, the strongly connected component (SCC) of the input automaton that the deterministic part has to track after following this cut-transition, and restricts the powerset construction to the states of this SCC. Note that Seminator 1.1 can produce a semi-deterministic TGBA with multiple acceptance sets only when it gets a semi-deterministic TGBA as input. Seminator 2 produces such automata more often due to --reuse-deterministic.

Implementation and Usage
Seminator 2 is an almost complete rewrite of Seminator [5], and is still distributed under the GNU GPL 3.0 license. Its distribution tarball and source code history are hosted on GitHub (https://github.com/mklokocka/seminator). The package contains the sources of the tool with two user interfaces (a command-line tool and Python bindings), a test-suite, and some documentation.

[Fig. 2 caption fragments: simplifications can be disabled with --postprocess=0; the acceptance is adjusted according to --tgba (default), --tba, or --ba; complementation is activated when --complement={best,spot,pldi} is supplied, in which case the semi-deterministic automaton is built as a TBA; both complementation routes are used unless --complement={spot,pldi} is given.]

Seminator is implemented in C++ on top of the data-structures provided by the Spot library [8], and reuses its input/output functions, simplification algorithms, and the NCSB complementation. The main implementation effort lies in the optimized semi-determinization and an alternative version of NCSB.
The first user interface is a command-line tool called seminator. Its high-level workflow is pictured in Fig. 2. By default (top part of Fig. 2), it takes a TGBA (or TBA or BA) as input and produces a semi-deterministic TGBA (or TBA or BA if requested). Figure 2 details various switches that control the optional simplifications and acceptance transformations that occur before the semi-determinization itself. The pre- and post-processing are provided by the Spot library. The semi-determinization algorithm can be adjusted by additional command-line options (not shown in Fig. 2) that enable or disable the optimizations of Sect. 2. As Spot's simplification routines are stronger on automata with simpler acceptance conditions, it sometimes pays off to convert the automaton to a TBA or BA first. If the input is a TGBA, seminator attempts three semi-determinizations: one on the input TGBA, one on its TBA equivalent, and one on its BA equivalent; only the smallest result is retained. If the input is already a TBA (resp. a BA), only the last two (resp. the last) routes are attempted.
The --complement option activates the bottom part of Fig. 2, which offers two variants of the NCSB complementation [4]: "spot" stands for a transition-based adaptation of the original algorithm (implemented in Spot); "pldi" refers to its modification based on an optimization by Chen et al. [6, Section 5.3] (implemented in Seminator 2). Both variants take a TBA as input and produce a TBA. The options --tba and --ba apply only to the final complement automaton.
The seminator tool can now process automata in batch, making it possible to build pipelines with other commands. For instance, the pipeline

ltl2tgba <input.ltl | seminator | autfilt --states=3.. >output.hoa

uses Spot's ltl2tgba command to read a list of LTL formulas from input.ltl and translate them into a stream of TGBAs that is passed to seminator, which transforms them into semi-deterministic TGBAs; finally, Spot's autfilt saves into output.hoa only the automata with 3 or more states.
Python bindings form the second user interface and are installed by the Seminator package as an extension of Spot's own Python bindings. They offer several functions, all working with Spot's automata (twa_graph objects): semi_determinize() implements the semi-determinization procedure; complement_semidet() implements the "pldi" variant of the NCSB complementation for semi-deterministic automata (the other variant is available under the same function name in the bindings of Spot); highlight_components() and highlight_cut() provide ways to highlight the nondeterministic and deterministic parts of a semi-deterministic automaton, and its cut-transitions; seminator() provides an interface similar to the command-line seminator tool, with options that selectively enable or disable optimizations or trigger complementation.
The Python bindings integrate well with the interactive notebooks of Jupyter [17]. Figure 3 shows an example of such a notebook, using the seminator() and highlight_components() functions. Additional Jupyter notebooks, distributed with the tool, document the effect of the various optimization options.

Experimental Evaluation
We evaluate the performance of Seminator 2 for both semi-determinization and complementation of TGBAs. We compare our tool against the tools listed in Table 1. As ltl2ldgba needs LTL on input, we used a set of 221 LTL formulas already considered for benchmarking in the literature [9,12,14,22,27]. To provide TGBAs as input for Seminator 2, we use Spot's ltl2tgba to convert the LTL formulas. Based on the automata produced by ltl2tgba, we distinguish three categories of formulas: deterministic (152 formulas), semi-deterministic but not deterministic (49 formulas), and not semi-deterministic (20 formulas). This division is motivated by the fact that Seminator 2 applies its semi-determinization only to automata that are not semi-deterministic, and that some complementation tools use different approaches for deterministic automata. We have also generated 500 random LTL formulas of each category. The scripts and formulas used in these experiments can be found online, as well as a Docker image with these scripts and all the tools installed. All experiments were run inside the supplied Docker image on a Dell XPS13 laptop with an Intel i7-1065G7 CPU and 16 GB RAM, running Linux.

Table 1. Tools used in the evaluation:
  Fribourg plugin for GOAL   n/a       [1,30]
  GOAL (gc)                  20200506  [28]
  Owl (ltl2ldgba)            19.06.03  [11]
  ROLL (replaces Buechic)    1.0       [20]
  Seminator (seminator)      1.1       [5]
  Spot (autfilt, ltl2tgba)   2.9       [8]

Table 2. Comparison of semi-determinization tools. A benchmark set marked with x + y consists of x formulas for which all tools produced some automaton, and y formulas leading to some timeouts. A cell of the form s (m) shows the cumulative number s of states of the automata produced for the x formulas, and the number m of formulas for which the tool produced the smallest automaton out of the obtained automata. The best results in each column are highlighted.

Semi-determinization
We compare Seminator 2 to its older version 1.1 and to ltl2ldgba of Owl. We do not include Buchifier [16] as it is available only as a binary for Windows. Also, we did not include nba2ldba [26] due to the lack of space and the fact that even Seminator 1.1 performs significantly better than nba2ldba [5].
Recall that Seminator 2 calls Spot's automata simplification routines on the constructed automata. To get a fair comparison, we apply these routines also to the results of the other tools, indicated by +Spot in the results. Further, ltl2ldgba of Owl can operate in two modes: --symmetric and --asymmetric. For each formula, we run both settings and pick the better result, indicated by +best.

Table 2 presents the cumulative results for each semi-determinization tool and each benchmark set (we merged the deterministic and semi-deterministic benchmark sets). The timeout of 30 s was reached by Owl for one formula.

Table 3. Comparison of tools complementing Büchi automata, using the same conventions as Table 2.

In the (semi-)deterministic category, the automaton produced by ltl2tgba and passed to both versions of Seminator is already semi-deterministic. Hence, both versions of Seminator have nothing to do. This category, in fact, compares ltl2tgba of Spot against ltl2ldgba of Owl.

Figure 4 shows the distribution of differences between the semi-deterministic automata produced by Owl+best+Spot and Seminator 2 for the not semi-deterministic random set. A dot at coordinates (x, y) represents a formula for which Owl and Seminator 2 produced automata with x and y states, respectively.
We can observe a huge improvement brought by Seminator 2 on the not semi-deterministic benchmarks: while in 2017 Seminator 1.1 produced a smaller automaton than Owl in only a few cases in this category [5], Seminator 2 is now more than competitive, despite the fact that Owl has also improved over time.

Complementation
We compare Seminator 2 with the complementation of ROLL based on automata learning (formerly presented as Buechic), the determinization-based algorithm [23] implemented in GOAL, the asymptotically optimal Fribourg complementation implemented as a plugin for GOAL, and with Spot (autfilt --complement). We apply the simplifications from Spot to all results, and we use Spot's ltl2tgba to create the input Büchi automata for all tools, using transition-based generalized acceptance or state-based acceptance as appropriate (only Seminator 2 and Spot can complement transition-based generalized Büchi automata). The timeout of 120 s was reached once by both Seminator 2 and Spot, 6 times by Fribourg, and 13 times by GOAL and ROLL.

Table 3 shows the results for complementation in the same way as Table 2 does for semi-determinization. For the deterministic benchmark, we can see quite similar results from all tools but ROLL. This is caused by the fact that complementation of deterministic automata is easy; some tools (including Spot) even apply a dedicated complementation procedure. It comes as no surprise that the specialized algorithm of Seminator 2 performs better than most other complementations in the semi-deterministic category. Interestingly, this carries over to the not semi-deterministic category. The results demonstrate that the 2-step approach of Seminator 2 to complementation performs well in practice. Figure 5 offers a more detailed insight into the distribution of sizes of the automata created by Seminator 2, Spot, and Fribourg+Spot for the random benchmarks in this category.
Finally, Fig. 6 compares the running times of these tools over the 83 hard cases of the not semi-deterministic random benchmark (a case is hard if at least one tool did not finish within 10 s). We can see that Seminator 2 and Spot run significantly faster than the other tools.

Conclusion
We have presented Seminator 2, a substantially improved version of Seminator 1.1. The tool now offers a competitive complementation of TGBAs. Furthermore, the semi-determinization code was rewritten and offers new optimizations that significantly reduce the size of the produced automata. Finally, new user interfaces enable convenient processing of large sets of automata thanks to the support of pipelines and batch processing, and versatile applicability in education and research thanks to the integration with Spot's Python bindings.