1 Introduction

Efficient representation of regular properties of finite words has been the subject of research for a long time, with applications and results spanning much of the field of formal reasoning, including regular expression matching, verification, testing, modelling, and general decision procedures of logics. When regular properties are combined using Boolean and similar operations, the interesting decision problems are PSPACE-complete. This includes the most essential problem of language emptiness (further just emptiness). The textbook approaches that use deterministic automata are plagued by state space explosion: determinization and complementation are done by the exponential subset construction, and conjunction is quadratic. This motivated research on efficient algorithms for non-deterministic and alternating finite automata (NFA and AFA, respectively).

Using nondeterminism and alternation, one can gain one or two levels of exponential savings in the size of automata, respectively. Alternation in the context of automata was first studied in [24] and [18, 38, 53], and extensively in the context of automata over infinite words and temporal logics (e.g., [57, 58, 66, 76]). It adds conjunctive branching to the disjunctive non-deterministic branching and allows the blow-up in automata size to be avoided completely. However, from the perspective of worst-case complexity, the gained succinctness is paid for by the PSPACE-completeness of language emptiness. Still, a more succinct representation gives more opportunities for clever heuristics that combat the worst case and work in practical cases, essentially by avoiding re-creation of the entire (non)deterministic representation.

Several very promising techniques and their implementations have been proposed in recent years. The latest advances in testing AFA emptiness appeared in the context of analysing combinations of regular expressions and in string solving. A group of these techniques is based on reducing AFA emptiness to reachability in a Boolean transition system and using existing implementations of model-checking algorithms, most notably of IC3/PDR [15, 46], such as ABC [17], nuXmv [22], or IC3Ref [16], to solve it [27, 28, 47, 80]. The most recent contribution, [73], extends the SMT solver Z3 with symbolic derivatives, a generalisation of Antimirov derivatives of regular expressions. Z3 uses them to convert a combination of regular expressions into an alternating/Boolean automaton and tests its language emptiness on the fly through the classical de-alternation and a search for an accepting configuration.

A slightly older algorithm for testing equivalence of AFA (convertible to an emptiness test) is based on computing bisimulation up to congruence [30]. It generalizes the original NFA-equivalence test of [11]. The congruence closure algorithms were preceded by the antichain algorithms, which optimize the subset construction by subsumption pruning [41, 82], and by the first attempt to use model checking algorithms, namely the algorithm Impact of [63], for emptiness of combinations of regular properties [40]. Lastly, the area of string constraint solving has given rise to a large variety of string constraint solvers. They approach combinations of regular properties through a spectrum of clever techniques based, e.g., on automata, transformations to other types of constraints, reasoning on lengths of strings, Parikh images, etc. (e.g., Z3 [65, 73], CVC4/5 [7, 68], Z3Str4 [9], OSTRICH [25, 26], and Trau [4, 5], to name a few).

These works demonstrate significant promise, but they are presented in specific, often narrow contexts and under varying views on the state of the art. Consequently, they have never been sufficiently compared against each other. Even comparisons against the most efficient implementations of the more standard techniques based on (non)deterministic automata are rare. String solvers have been compared only against string solvers, and advanced AFA-emptiness tests only against basic de-alternation. More interesting comparisons were done only between the antichain and up-to-congruence-based language inclusion and equivalence tests for NFA in [11] and in [39], and between the basic antichain-based AFA emptiness and a version that uses abstract interpretation [41]. A number of works also take as their baseline implementations of automata libraries or string solvers which, even though being respectable tools in their own right, are currently not the fastest solvers of combinations of regular properties in either category. On top of that, all the mentioned works on solving combinations of regular properties use only narrow benchmarks, often mutually exclusive.

Systematic comparison of tools and algorithms on meaningful benchmarks is obviously needed to answer the questions ‘What to use?’ and ‘What to compare with?’, and generally for the field of reasoning about regular properties and automata to progress. We thus present a comparison of implementations of the major algorithms. We compare the tools on a large benchmark of problems that we have collected from other works: from string constraint solving problems, analysis of regular expressions, regular model checking, and analysis of LTL properties of systems. We believe that it is currently the most comprehensive benchmark in existence. Our main focus is on examples around string solving and analysis of regular expressions, which is also where most of the recent development has happened. These benchmarks mostly allow for a relatively simple representation of automata transition functions. Even though the alphabets in examples coming from this area are large (e.g., UNICODE with up to \(2^{32}\) symbols), the alphabet size can, in most cases, be reduced to a few symbols by working with alphabet minterms (classes of indistinguishable symbols) instead of individual symbols. The issue of effective symbolic representation of transition relations over large alphabets then does not dominate the evaluation, although it would be critical in other application areas, such as deciding WS1S (monadic second-order logic of one successor) or linear integer arithmetic [20, 44, 81].

We have obtained results that paint the basic landscape of the available techniques and tools. They identify tools and approaches which are likely to work well and should be used as baselines in comparisons. We also provide a relatively diverse and large benchmark to be used in comparisons. The results broadly confirm that the new algorithms represent a leap in efficiency compared to the technology of DFAs and also make reducing a problem to language emptiness of an alternating automaton an attractive option. On the other hand, they challenge some folklore knowledge and conclusions implied elsewhere. For instance, reductions to IC3/PDR, although yielding one of the fastest algorithms, are not as vastly superior as sometimes presented. Some practically relevant benchmark categories are best solved by a combination of an antichain algorithm with a SAT solver; others, surprisingly many in fact, by a simple efficiency-oriented implementation of basic algorithms for nondeterministic automata. Our results also underscore that there is no universal silver bullet. The particular kind of the problem, determined to a large degree by its source, is a decisive factor that should be taken into account when choosing and tuning a solver.

We will maintain and further grow the benchmark set at GitHub [1], as well as the framework for the entire comparison at [2], in order for them to be easily usable and extensible by others.

2 Preliminaries

A (nondeterministic) finite automaton (NFA) over \(\Sigma \) is a tuple \(\mathcal {A}= (Q,\Delta ,I,F)\) where Q is a finite set of states, \(\Delta \) is a set of transitions of the form \(q \xrightarrow { a } r\) with \(q,r\in Q\) and \( a \in \Sigma \), \(I\subseteq Q\) is the set of initial states, and \(F\subseteq Q\) is the set of final states. A run of \(\mathcal {A}\) over a word \(w \in \Sigma ^*\) is a sequence \(p_0 \xrightarrow { a _1} p_1 \xrightarrow { a _2} \cdots \xrightarrow { a _n} p_n\) where for all \(1\le i \le n\), it holds that \( a _i \in \Sigma \cup \{\epsilon \}\), \(w = a _1 \cdot a _2 \cdots a _n\), and either \(p_{i-1} \xrightarrow { a _i} p_i \in \Delta \) or \(p_{i-1} = p_i\) and \(a_i=\epsilon \). The run is accepting if \(p_0 \in I\) and \(p_n\in F\), and the language \(L(\mathcal {A})\) of \(\mathcal {A}\) is the set of all words for which \(\mathcal {A}\) has an accepting run.

The automaton is deterministic (DFA) if for every state q and symbol a, \(\Delta \) has at most one transition \(q \xrightarrow { a } r\). Any NFA can be determinized by the subset construction, which creates the DFA \(A' = (2^Q,\Delta ',\{I\},\{S\mid S\cap F\ne \emptyset \})\) where \(S \xrightarrow { a } S' \in \Delta '\) iff \(S' = \{r \mid q \xrightarrow { a } r \in \Delta , q \in S\}\). The basic automata constructions implementing Boolean operations on languages are intersection, \(\mathcal {A}\cap \mathcal {A}' = (Q\times Q',\Delta ^\times ,I\times I',F\times F')\) where \((q,q') \xrightarrow { a } (r,r') \in \Delta ^\times \) iff \(q \xrightarrow { a } r \in \Delta \) and \(q' \xrightarrow { a } r' \in \Delta '\); non-deterministic union, \(\mathcal {A}\cup \mathcal {A}' = (Q\cup Q',\Delta \cup \Delta ',I\cup I',F\cup F')\); deterministic union by product, which is the same as \(\cap \) except that the final states are \(F\times Q' \cup Q\times F'\); and complementation, which consists of determinization and complementing the final states.
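The subset construction is easy to sketch; the following illustrative code (our own encoding) builds only the reachable macrostates, as practical implementations do:

```python
# Sketch of the subset construction: DFA states are frozensets of NFA
# states, built on the fly from the initial macrostate {I}.

def determinize(delta, initial, final, alphabet):
    start = frozenset(initial)
    dfa_delta, worklist, seen = {}, [start], {start}
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            T = frozenset(r for (q, s, r) in delta if q in S and s == a)
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                worklist.append(T)
    dfa_final = {S for S in seen if S & set(final)}
    return dfa_delta, start, dfa_final

# NFA over {a, b} guessing that the current 'a' is the last symbol
delta = {(0, 'a', 0), (0, 'b', 0), (0, 'a', 1)}
d, q0, F = determinize(delta, {0}, {1}, "ab")

S = q0                      # the resulting DFA is run deterministically
for a in "ba":
    S = d[(S, a)]
assert S in F               # "ba" ends with 'a'
```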

Alternating Automata. An alternating finite automaton (AFA) in the most general form would be a tuple \(\mathcal {M}= (\Sigma ,\mathbb {P}, Q,\Delta ,I,F)\) where, denoting by \(\mathbb B(X)\) the Boolean predicate formulae over variables X: 1) \(\Sigma \) is a finite alphabet; 2) \(\mathbb {P}\) is a set of unary symbol predicates with a free variable \(\alpha \); 3) Q is a finite set of states; 4) \(\Delta : Q \rightarrow \mathbb B(Q \cup \mathbb {P})\) is a transition function in which states of Q have only positive occurrences; 5) \(I \in \mathbb B(Q)\) is a positive initial condition; and 6) \(F\in \mathbb B(Q)\) is a negative final/accepting condition.Footnote 1

It can be interpreted as the forward NFA \(A^{\textsf{f}} = (\Sigma , \mathcal {P}(Q),\Delta ^{\textsf{f}}, I', F')\) with states \(c\subseteq Q\) called configurations of A. Assume a many-sorted interpretation of formulae over the variables Q of Boolean type (values 0 and 1) and the variable \(\alpha \) of type \(\Sigma \). A set of states \(c\subseteq Q\) is understood as an assignment \(Q\rightarrow \{0,1\}\) in which \(c(q)=1\) corresponds to \(q\in c\). A pair \((c,a)\), \(a\in \Sigma \), is understood as the same assignment extended with \(\alpha \mapsto a\). The satisfaction relation \(\models \) between a formula and a configuration \(c\) or a pair \((c,a)\) is defined as usual. The transition relation \(\Delta ^{\textsf{f}}\) then contains a transition \(c \xrightarrow { a } c'\) iff \((c',a) \models \bigwedge _{q\in c} \Delta (q)\), and \(I'\) and \(F'\) are the sets of configurations that satisfy I and F, respectively. It is common to define \(\Delta ^{\textsf{f}}\) to contain only the smallest transitions, that is, for a given \(c\) and a, only the transitions with the \(\subseteq \)-minimal target \(c'\) are in \(\Delta ^{\textsf{f}}\).Footnote 2 The language of A, L(A), is the language of \(A^{\textsf{f}}\).
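A toy sketch of this forward interpretation (our own illustrative encoding, with \(\Delta (q)\) modelled as a Python predicate over the target configuration and the symbol) computes the \(\subseteq \)-minimal successors of a configuration by brute-force enumeration:

```python
from itertools import chain, combinations

def subsets(states):
    s = sorted(states)
    return map(frozenset,
               chain.from_iterable(combinations(s, k)
                                   for k in range(len(s) + 1)))

def minimal_successors(delta, states, c, a):
    """Targets t with (t, a) |= AND_{q in c} Delta(q), keeping only
    the subset-minimal ones, as in the definition of Delta^f."""
    valid = [t for t in subsets(states)
             if all(delta[q](t, a) for q in c)]
    return [t for t in valid if not any(u < t for u in valid)]

# AFA with Delta(p) = q AND (alpha = 'a'), Delta(q) = q OR r
delta = {
    'p': lambda t, a: 'q' in t and a == 'a',
    'q': lambda t, a: 'q' in t or 'r' in t,
}
assert minimal_successors(delta, {'p', 'q', 'r'},
                          frozenset({'p', 'q'}), 'a') == [frozenset({'q'})]
```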

The AFA can equivalently be interpreted as the backward NFA, the automaton \(A^{\textsf{b}} = (\Sigma , \mathcal {P}(Q),\Delta ^{\textsf{b}}, I', F')\) where \(c \xrightarrow { a } c' \in \Delta ^{\textsf{b}}\) if \((c',a)\models \Delta (q)\) for each \(q\in c\). Here it is enough to take, for a given \(c'\) and a, only the transition with the \(\subseteq \)-largest source \(c\)Footnote 3 (this makes the transition relation backward deterministic).

Boolean Automata. Alternating automata may be extended to Boolean finite automata (BFA) by allowing any Boolean combination in the initial, final, and transition formulae (states in the initial and transition formulae may occur negatively, states in the final formula may occur positively). Note that the extension of AFA to BFA is not dramatic, as a BFA is easily encoded as an AFA of only double the size, by the following steps: 1) for each \(q\in Q\), add a state \(\bar{q}\) with \(\Delta (\bar{q}) = \lnot \Delta (q)\), 2) transform all formulae in \(I,F,\Delta \) to DNF, 3) replace all literals \(\lnot q\) by \(\bar{q}\) in \(\Delta \) and I and replace literals q by \(\lnot \bar{q}\) in F.

Restricted Forms of AFA Transition Relation. The general form of AFA, as defined above, is the most succinct. It provides the most space for optimizations, such as those in [77]. Automata in this form are generated by the LTL conversion of [34] used in [30, 77]. On the other hand, only a small subset of algorithms and tools support AFA in this most liberal form. A common restriction (used e.g. in [30]) is to separate symbols from states in the transition formulae, that is, to have \(\Delta (q)\) in the form \(\varphi \wedge \psi \) with \(\varphi \in \mathbb B(\mathbb {P}),\psi \in \mathbb B(Q)\). We call such AFA separated. The transition relation can then be seen as a function \(Q\rightarrow \mathbb B(\mathbb {P})\times \mathbb B(Q)\). Separated AFA are often considered with the state formula \(\psi \) in disjunctive normal form (e.g. in [36, 41]), which we call the DNF form, and \(\Delta \) may then be seen as a set of transitions of the form \(q \xrightarrow {\varphi } c\) where \(\bigwedge c\) is a (positive) clause of \(\psi \).

The Decision Problems. We will concentrate on two decision problems:

  1. AFA emptiness asks whether the language of a given AFA is empty.

  2. Emptiness of Boolean combinations of regular properties (BRE) asks whether a Boolean combination of regular languages, given as automata or regular expressions, is empty (languages can be combined with \(\cap \), \(\cup \), and complement wrt. \(\Sigma ^*\), which also covers testing inclusion and equivalenceFootnote 4).
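For instance, an inclusion \(L(\mathcal {A}) \subseteq L(\mathcal {B})\) is the emptiness of \(\mathcal {A} \cap \overline{\mathcal {B}}\). For complete DFAs this combination can be explored on the fly as a product reachability search, sketched below (an illustrative toy encoding, not one of the compared tools):

```python
from collections import deque

# DFAs as (transition dict, initial state, final set); the transition
# dicts are assumed complete over the shared alphabet.

def included(a, b, alphabet):
    """L(A) subseteq L(B) iff the product A x co-B reaches no pair
    (accepting in A, rejecting in B), i.e., A /\ co-B is empty."""
    (ta, ia, fa), (tb, ib, fb) = a, b
    seen, queue = {(ia, ib)}, deque([(ia, ib)])
    while queue:
        p, q = queue.popleft()
        if p in fa and q not in fb:   # word accepted by A, rejected by B
            return False
        for s in alphabet:
            t = (ta[(p, s)], tb[(q, s)])
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return True                        # the combination is empty

# A: even number of a's; B: all words
A = ({(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1}, 0, {0})
B = ({(0, 'a'): 0, (0, 'b'): 0}, 0, {0})
assert included(A, B, "ab")
assert not included(B, A, "ab")
```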

3 Existing Algorithms and Tools

In this section, we will overview the existing approaches and tools implementing AFA and BRE emptiness.

3.1 Representation of Automata Transition Relations

In the simplest form, a predicate on an automaton transition represents a single letter from the alphabet. This is called an explicit transition. Explicit automata are simple, allow for low-level optimizations, and the implementation of complex algorithms for them is manageable (such as advanced algorithms for computing simulations [23, 50, 70]). The technique of a-priori mintermization, which replaces the alphabet by the alphabet of minterms, classes of indistinguishable symbols, makes explicit automata usable also when alphabets are large. However, when the number of minterms tends to explode, explicit automata do not scale.
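The minterm computation itself amounts to grouping symbols by which predicates they satisfy; a small sketch (character-set predicates, our own illustrative encoding):

```python
# A-priori mintermization: partition the alphabet into classes of symbols
# that satisfy exactly the same predicates (minterms).

def minterms(alphabet, predicates):
    classes = {}
    for ch in alphabet:
        signature = tuple(ch in p for p in predicates)
        classes.setdefault(signature, set()).add(ch)
    return list(classes.values())

preds = [set("abc"), set("bcd")]
parts = minterms("abcde", preds)
# four classes: only-first {'a'}, both {'b','c'}, only-second {'d'},
# neither {'e'}; automata can then use one symbol per class
assert len(parts) == 4 and {'b', 'c'} in parts
```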

Various implementations of automata have been using transition predicates implemented as BDDs, Boolean formulae, formulae over the SMT theory of bit-vectors, intervals of numbers, etc. This has been systematized in the works on symbolic automata [31, 33, 79], where the symbol predicates may be taken from any effective Boolean algebra (and the automata are in the separated form). Even more compact than symbolic automata are the representations of the transition relation used in the WS1S solver Mona or in some of the implementations of AFA, which in a way drop the restriction to the separated form. We will discuss the concrete implementations below.

3.2 (Non)deterministic Finite Automata

The baseline approach to solving BRE is to use DFAs or NFAs. Boolean operations are implemented as the classical constructions listed in Sect. 2. Automata may be kept deterministic, or they are kept non-deterministic whenever possible and determinized only before complementing. An important ingredient for achieving efficiency is usually to minimize automata at least once every few operations (important e.g. in applications such as regular model checking [12] or some approaches to string solving [4, 10, 25]). The deterministic approaches construct the minimal DFA by the Hopcroft, Moore, Brzozowski, or Huffman algorithm [19, 52, 54, 64]; the non-deterministic approaches may use simulation [23, 45, 50, 55, 70] or bisimulation [48, 69, 75] based reduction methods. Simulation reduces significantly more but is much costlier. DFA/NFA are implemented in many libraries. Here we select a representative sample.

First, eNfa is the simplest tool, our own implementation of NFA, which was originally meant to play the role of a baseline. It uses explicit automata with mintermization. It is implemented in C++, with efficiency in mind, but with no extensive optimizations (roughly, transitions from a state are stored in a two-layered data structure, the first layer divided and ordered by symbols and the second layer ordered by the target state). It uses an off-the-shelf implementation of one of the newest-generation algorithms for computing simulation [23, 50, 70] (which achieve good efficiency through the usage of the partition-relation data structure), taken from the VATA tree automata library [59] (implementing namely [50]).Footnote 5

The Brics automata library [67] is often considered a baseline in comparisons. It primarily uses deterministic automata, with the transition relation represented symbolically using character ranges. It is written in Java and relatively optimized.

The Automata library [78], made in C#, implements symbolic NFA/DFA parametrized by an effective Boolean algebra. We use it with the default algebra of BDDs. Automata has been long developed and has accumulated many optimizations and novel techniques for handling symbolic automata (e.g., optimized minimization [32]).

Mona [44], written in C, is the most influential and optimized implementation of deterministic automata. It specialises in deciding WS1S formulae, which besides Boolean combinations includes also quantification. The decision procedure generates DFA with complex transition relations over large alphabets of bit-vectors. For this purpose, Mona uses a compact representation of the transition relation: a single MTBDD for all transitions originating in a state, with the target states in its leaves. Mona can represent only a DFA, hence it always implicitly determinizes.

VATA [59], written in C++, is a library implementing non-deterministic tree automata. As NFA are a special case of tree automata, we can use it as an implementation of the basic constructions for explicit NFA. It is relatively optimized. We include it into the comparison for its fast implementation of the antichain inclusion checking [12, 49], which for NFA boils down to the inclusion check of [36].

3.3 Alternating Automata

De-alternation. The basic approach to AFA emptiness is de-alternation, a transformation to an NFA, either the forward \(A^{\textsf{f}}\) or the backward \(A^{\textsf{b}}\), followed by testing the emptiness of the resulting NFA. Both NFAs are constructed by a variation on the NFA subset construction. We are not aware of any tool using pure de-alternation, and we believe that it would not be competitive. The forward algorithm is, however, the basis of [73] used in Z3, where it is run on the fly with a novel symbolic derivative construction (discussed also in the paragraph on string constraint solvers).

Interpolation-Based Abstraction Refinement. Attempts to harness model checking algorithms for AFA emptiness appeared in the context of string solving and processing of regular expressions. To the best of our knowledge, the earliest attempt was [40], where conjunctions of regular constraints were solved using the interpolation-based algorithm of [62]. Interpolation-based abstraction refinement, namely the algorithm Impact of [63], was also used in [56]. This work concentrated on a more general problem, solving emptiness of AFA over data words with an infinite data domain (which can relate past and current values of data variables). Their tool JaltImpact [3] (in Java), which we include in our comparison, can be run on our benchmark too.

Reduction to Reachability and IC3/PDR. The work of [80] presented the first translation of string constraints (mostly BRE) into reachability in a Boolean transition system (circuit) that was then solved by the model checker nuXmv [22]. This was de facto the first reduction of AFA emptiness to reachability in a Boolean transition system (BTS).

Let us briefly overview the basic principle of the reduction. The forward BTS for an AFA A has configurations that are Boolean assignments to Q, initial and final configurations satisfy I and F, respectively, and transitions are given by the formula \(\Phi ^{\textsf{f}}_\Delta : \bigwedge _{q\in Q}q\rightarrow [\Delta (q)]'\). Here we use \([\varphi ]'\) to denote the formula obtained from \(\varphi \) by substituting every state q by its primed version \(q'\), and we will also denote by \([c]'\) the primed version \(\{q' \mid q \in c\}\) of a configuration \(c\). A successor of a configuration \(c\) is any configuration \(\bar{c}\) such that \([\bar{c}]'\) satisfies \(\exists Q\exists \alpha \, \Phi ^{\textsf{f}}_\Delta \wedge \bigwedge _{q\in c} q\) (the symbol variable \(\alpha \) is of the bit-vector sort). Reachability is then the transitive and reflexive closure of the successor relation, and the reachability problem asks whether a final configuration is reachable from an initial one. It is the case if and only if A is not empty. The forward reduction has been used in [80]. Alternatively, the backward BTS for A has the initial configurations satisfying F, final configurations satisfying I, and the successor relation given by the formula \(\Phi ^{\textsf{b}}_\Delta : \bigwedge _{q\in Q}q'\rightarrow \Delta (q)\).
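To illustrate the successor relation defined by \(\Phi ^{\textsf{f}}_\Delta \), the following toy sketch (our own encoding; the actual reductions hand the formula to a model checker such as ABC or nuXmv rather than enumerating assignments) computes successors by brute force and searches for a reachable final configuration:

```python
from itertools import chain, combinations

# c steps to t iff some symbol a makes (t, a) satisfy Delta(q) for every
# q in c - the relation defined by Phi^f, enumerated by brute force.

def successors(delta, states, alphabet, c):
    subs = map(frozenset, chain.from_iterable(
        combinations(sorted(states), k) for k in range(len(states) + 1)))
    return {t for t in subs
            if any(all(delta[q](t, a) for q in c) for a in alphabet)}

def reachable_final(delta, states, alphabet, init, is_final):
    seen, stack = set(), [frozenset(init)]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        if is_final(c):
            return True
        stack.extend(successors(delta, states, alphabet, c))
    return False

# AFA: Delta(p) = q AND (alpha='a'), Delta(q) = q OR (alpha='b');
# I = p AND q, and F = NOT p (a configuration without p is final)
delta = {'p': lambda t, a: 'q' in t and a == 'a',
         'q': lambda t, a: 'q' in t or a == 'b'}
assert reachable_final(delta, {'p', 'q'}, "ab", {'p', 'q'},
                       lambda c: 'p' not in c)
```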

The work [28] applied IC3/PDR [15, 46], implemented in IC3Ref [16], together with the backward BTS reduction to solve emptiness of BRE and obtained very encouraging results. The implementation used in [28], called Qzy, is, however, proprietary and not publicly available. A similar approach was taken by [47], where a string constraint was translated to a multi-tape AFA and then to a BTS by the forward translation, and given to IC3/PDR to solve through the tools nuXmv [22] or ABC [17]. Results of [77] seem to indicate that the backward translation is better, and the same is suggested by the comparison in [27, 28], in which the string solver Sloth [47], based on the forward reduction, was much slower than Qzy, based on the backward reduction. In this comparison, we include our own C++ implementation bwIC3 of the backward reduction based on the model checker ABC.

Antichains. The antichain algorithms presented in [82] were the first breakthrough in solving BRE. They use subsumption relations between the states of the automata constructed by variations of the subset construction to prune the construction. They were used to test language universality and inclusion of NFAs and AFA emptiness. The AFA emptiness check, namely, is based on an on-the-fly search for an accepting state of \(A^{\textsf{f}}\) or for an initial state of \(A^{\textsf{b}}\). Subsumption prunes discovered states that are larger (smaller for the backward algorithm) than other discovered states.
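The antichain principle is easiest to see on NFA universality, sketched below (our own toy encoding): the on-the-fly subset construction searches for a rejecting macrostate, and any macrostate subsuming (here: being a superset of) an already kept one is pruned.

```python
def universal(delta, initial, final, alphabet):
    """Antichain universality test: a macrostate S that is a superset of
    a kept macrostate T is pruned, since every rejecting macrostate
    reachable from S is witnessed from T as well."""
    antichain = []                      # kept subset-minimal macrostates
    worklist = [frozenset(initial)]
    while worklist:
        S = worklist.pop()
        if not (S & set(final)):
            return False                # rejecting macrostate found
        if any(T <= S for T in antichain):
            continue                    # subsumed, prune
        antichain[:] = [T for T in antichain if not S <= T]
        antichain.append(S)
        for a in alphabet:
            worklist.append(frozenset(
                r for (q, s, r) in delta if q in S and s == a))
    return True

delta = {(0, 'a', 0), (0, 'b', 0)}      # accepts all words over {a, b}
assert universal(delta, {0}, {0}, "ab")
assert not universal({(0, 'a', 0)}, {0}, {0}, "ab")  # rejects "b"
```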

The antichain algorithms were enhanced and generalized in a number of works, e.g. with more aggressive pruning by the simulation-based subsumption [6, 36], or by counterexample-guided abstraction refinement in [41]. In this comparison, we include the NFA inclusion check implemented in the VATA tree automata library [59]. We also experimented with a student-made implementation of the antichain AFA emptiness check of [41] that uses abstraction refinement (the original implementation is no longer maintained and we were not able to run it). However, not being able to achieve a competitive performance, we excluded it from the comparison. One reason for the poor performance may be that the simplest form of AFA, the explicit DNF form (used in the original version [41]), might be too inefficient and costly to construct on our examples, partly due to the large number of minterms induced by the AFA emptiness benchmark.

We implemented (in C++) the antichain AFA emptiness test of [36] that integrates tightly with a SAT solver to handle the general form of AFA with large alphabets. We will refer to it as Antisat. We will briefly explain its principle. It essentially implements the reachability test for the backward BTS discussed in the previous paragraph. A configuration \(c\) is represented by the conjunction \(\phi _c= \bigwedge _{q \in Q \setminus c} \lnot q\). Note that \(\phi _c\) is satisfied by the downward closure of \(c\), that is, by all configurations included in (subsumed by) \(c\). To compute predecessors of configurations represented by \(\phi _c\), the SAT solver (namely MiniSAT [37]) is called on the formula \(\Phi : \Phi ^{\textsf{b}}_\Delta \wedge \phi _c\wedge \psi _{\textsf{Ach}}\). Here, \(\psi _{\textsf{Ach}}\) excludes all already discovered configurations from the solution. It is a conjunction of clauses \(\overline{\phi _c} : \bigvee _{q\in Q\setminus c} q\) for every previously discovered configuration \(c\). The SAT solver discovers a satisfying assignment e, which is turned into a new configuration \(c' = Q \cap e\) (that is, the values of the symbol bits constituting the bit-vector \(\alpha \) are omitted from e). Unless \(c'\) is initial, it is queued for further predecessor computation and is immediately added to \(\psi _{\textsf{Ach}}\) through the interface of incremental SAT solving as the clause \(\overline{\phi _{c'}}\). Finally, only maximal predecessors of c are of interest, as the non-maximal ones are subsumed by them. We enforce the maximality of \(c'\) by working directly with the internal SAT solver structures: at decision points, the SAT solver is forced to give priority to decisions that assign 1 to state variables.
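The backward search can be sketched as follows (an illustrative stand-in for Antisat, not the tool itself: the SAT call is replaced by directly computing the unique \(\subseteq \)-maximal predecessor, and the blocking clauses \(\psi _{\textsf{Ach}}\) become an explicit subsumption check):

```python
# Backward emptiness sketch: start from the maximal configuration
# satisfying F, repeatedly compute the maximal predecessor
# {q | (c, a) |= Delta(q)}, and prune predecessors subsumed by
# (included in) an already discovered configuration.

def backward_empty(delta, alphabet, max_final, is_initial):
    discovered = [frozenset(max_final)]
    worklist = [frozenset(max_final)]
    while worklist:
        c = worklist.pop()
        if is_initial(c):
            return False                 # an initial configuration reached
        for a in alphabet:
            p = frozenset(q for q in delta if delta[q](c, a))
            if any(p <= d for d in discovered):
                continue                 # blocked, as by psi_Ach
            discovered.append(p)
            worklist.append(p)
    return True

# AFA: Delta(p) = q AND (alpha='a'), Delta(q) = q OR (alpha='b');
# I = p AND q; F = NOT p AND NOT q, whose maximal model is {}
delta = {'p': lambda c, a: 'q' in c and a == 'a',
         'q': lambda c, a: 'q' in c or a == 'b'}
assert not backward_empty(delta, "ab", set(), lambda c: {'p', 'q'} <= c)
```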

Bisimulation up-to Congruence. A later class of algorithms, here referred to as up-to algorithms, checks equivalence as a bisimulation between configurations of AFA and utilises the up-to-congruence technique to prune the search space. The first algorithm, on NFA equivalence [11], was extended to an alternating automata emptiness check in [30]. These algorithms are close to antichains. As shown in [11], the pruning potential of the up-to techniques is in theory the same as or larger than that of antichains. A disadvantage of the up-to-congruence technique is the need for an expensive evaluation of congruence closures. The more extensive experiments of [39] show antichain algorithms as faster, with the exception of randomly generated automata with small alphabets and very dense transition relations. We include in the comparison the Java implementation of the AFA emptiness check of [30] (emptiness reduces to equivalence with a trivial empty AFA), which we refer to as Bisim. The other implementations of up-to algorithms we are aware of, from [39] and [11], are single-purpose programs that decide equivalence of two NFAs, hence we would be able to run them only on a very small fraction of our benchmark.

3.4 String Constraints Solvers

There are dozens of string constraint solvers that implement, to various degrees, support for deciding combinations of regular properties. String languages are rich and BRE are not the absolute priority of the solvers, hence they generally perform worse on them than specialised tools. However, string solvers implement a wide scale of unique techniques and pragmatic heuristics that may work in specific instances. Representatives of the solvers with the most mature implementations (also used in most comparisons in the literature) are Z3 [65, 73] and CVC5 [7, 68]. CVC5 solves BRE mostly through rewriting rules. Recently, [73] extended Z3 with an approach based on the Antimirov derivative automata construction generalised to symbolic automata and extended regular expressions. Essentially, the construction produces a symbolic AFA/BFA and checks its emptiness on the fly while running the forward de-alternation. As shown in [73], it is significantly more efficient in solving BRE than other SMT solvers (including CVC5).
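The flavour of the derivative-based construction can be conveyed with classical Brzozowski derivatives over a tiny regex AST (our own sketch; the symbolic derivatives of [73] generalize the idea to symbol predicates and extended operators):

```python
# Regexes as tuples: ('empty',), ('eps',), ('ch', a), ('alt', r, s),
# ('cat', r, s), ('star', r).  deriv(r, a) is the Brzozowski derivative:
# the language of words w such that aw is in L(r).

EMPTY, EPS = ('empty',), ('eps',)

def nullable(r):
    op = r[0]
    if op in ('eps', 'star'): return True
    if op in ('empty', 'ch'): return False
    if op == 'alt': return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])       # 'cat'

def deriv(r, a):
    op = r[0]
    if op in ('empty', 'eps'): return EMPTY
    if op == 'ch': return EPS if r[1] == a else EMPTY
    if op == 'alt': return ('alt', deriv(r[1], a), deriv(r[2], a))
    if op == 'star': return ('cat', deriv(r[1], a), r)
    d = ('cat', deriv(r[1], a), r[2])              # 'cat'
    return ('alt', d, deriv(r[2], a)) if nullable(r[1]) else d

def matches(r, word):
    """w in L(r) iff the derivative of r by w accepts the empty word."""
    for a in word:
        r = deriv(r, a)
    return nullable(r)

# (a|b)*abb
R = ('cat', ('star', ('alt', ('ch', 'a'), ('ch', 'b'))),
     ('cat', ('ch', 'a'), ('cat', ('ch', 'b'), ('ch', 'b'))))
assert matches(R, "babb")
assert not matches(R, "ba")
```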

3.5 Other Approaches and Tools

Although we believe that we have collected a representative subset of existing algorithms and tools, we have not collected all interesting specimens. Some were not available, some were difficult to run or prepare the inputs for, and some seemed covered by experimentation in other works. Including these tools and algorithms in the comparison could still be interesting, and we leave it for future work (we plan to keep extending the tool base as well as the benchmark set). Namely, the tool DPRLE [51], used in the comparison in [28], seemed to be mostly, though not absolutely consistently, outperformed by the IC3/PDR approach implemented in Qzy. The implementation of NFA antichain and up-to-congruence techniques used in [39] seems efficient, with its NFA antichain inclusion twice as fast as that of VATA. The up-to-congruence NFA equivalence check of [11] could be fast too ([11] and [39] report somewhat conflicting results). There are numerous NFA/DFA libraries, e.g. the C alternative of Brics [61] or the Java implementation of symbolic NFA of [29]. ALASKA [35] might contain interesting implementations of antichain algorithms but is no longer maintained and available. Our comparison is missing a basic implementation of antichain-powered de-alternation for explicit AFA in the DNF form, which, if not overwhelmed by a large number of minterms, could reach a good performance through simple fast data structures, similarly to our eNfa.

4 Benchmarks

We have collected as comprehensive a benchmark as possible, harvesting examples used in previous works as well as generating some of our own. It is available together with the whole experiment from [2] and at GitHub [1] (we plan to maintain and grow the benchmark and welcome contributors).

The main focus of the current benchmark set is on the areas where most of the recent development in solving AFA and BRE emptiness has happened, which is string constraint solving and analysis of regular expressions used in analysing and filtering texts. Atomic regular properties are here mostly given in the form of regular expressions over UNICODE character classes. The alphabet is large but the number of minterms is mostly small or moderate. This is true also for our examples from regular model checking. Symbolic handling of complex transition relations over large alphabets is thus not absolutely crucial, and the experiment can stay focused on the main algorithms for the emptiness check. For that reason, we do not include benchmarks from solving WS1S [21], the primary target of Mona, or Presburger arithmetic with automata [13, 81], where the techniques of handling symbolic alphabets are indispensable. Techniques specialising in this kind of problems would deserve their own study. Our benchmarks where the symbolic alphabet representation is still rather important are AFA coming from (combinations of) LTL properties, with alphabets of sets of atomic propositions, and from translations of string constraint problems to AFA with complex multi-track alphabets.Footnote 6

Boolean Combinations of Regular Expressions. This group of BRE contains benchmarks on which we can run all tools, including those based on NFAs and DFAs. They have small to moderate numbers of minterms (about 30 on average, at most over a hundred).

  • b-smt contains 330 string constraints from the Norn and SyGuS-qgen benchmarks, collected in SMT-LIB [8], that fall in BRE. These were also used to compare SMT solvers in [73].

  • b-hand-made has 56 difficult handwritten problems from [73] containing membership in regular expressions extended with intersection and complement. They encode (1) date and password problems, (2) problems where Boolean operations interact with concatenation and iteration, and (3) problems with exponential determinization.

  • b-armc-incl contains 171 language inclusion problems from runs of abstract regular model checking tools (verification of the bakery algorithm, bubble sort, and a producer-consumer system) of [12]. These examples were used also in [11, 39].

  • b-regex contains 500 problems, obtained analogously as in [30, 77], of the form \(r_1 \wedge r_2 \wedge r_3 \wedge r_4 = r_1 \wedge r_2 \wedge r_3 \wedge r_4 \wedge r_5\), where each \(r_i\) is one of the 75 regexesFootnote 7 from RegExLib [71] selected so that \(r_1 \wedge r_2 \wedge r_3 \wedge r_4 \wedge r_5\) is not empty. This benchmark is inspired by spam filtering, where we want to test whether a new filter \(r_5\) adds anything to existing filters. We transformed this problem into the inclusion \(r_1 \wedge r_2 \wedge r_3 \wedge r_4 \subseteq r_5\), and kept the original form for Bisim, which expects an equivalence.

  • b-param has 8 parametric problems. Four are from [40]:

    1. \(\texttt {[a-c]a[a-c]}\{n+1\} \cap \texttt {[a-c]a[a-c]}\{n\}\) (long strings),

    2. \(\bigcap _{i=1}^n\texttt {([0-1]}\{i-1\}\texttt {0[0-1]}\{n-1\}\texttt {0[0-1]}\{n-i\}\alpha _i\texttt {)|([0-1]}\{i-1\}\texttt {1[0-1]}\{n-1\}\texttt {1[0-1]}\{n-i\}\alpha _i\texttt {)}\) (exponential branching),

    3. \(\bigcap _{i=1}^n\texttt {.*(.}\{p_{10+i}\}\texttt {)+}\alpha _i\) (exponential paths 1), and

    4. \(\bigcap _{i=1}^n\texttt {.+}\alpha _i\texttt {0(.}\{p_{10+i}\}\texttt {)+}\) (exponential paths 2), where \(\alpha _1,\ldots ,\alpha _n\) are disjoint character classes and \(p_j\) is the j-th prime number. Another four are from [28]:

    5. (sat. difference),

    6. (unsat. difference),

    7. (sat. intersection) and

    8. (unsat. intersection). For (1) we chose \(n \in \{50,100,\dots ,500\}\), for (2)–(4) we chose \(n \in \{2,3,\dots ,60\}\) and for (5)–(8) we chose \(n \in \{50,100,\dots ,1000\}\).
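To illustrate how such parametric families are instantiated, the following sketch generates family (1); the generator and its naming are ours, not the benchmark's original script. The intersection is empty for every n, since the two languages contain only words of lengths n + 3 and n + 2, respectively:

```python
import re

def family_long_strings(n):
    """Regexes of family (1): '[a-c]a[a-c]{n+1}' versus '[a-c]a[a-c]{n}'."""
    return f"[a-c]a[a-c]{{{n + 1}}}", f"[a-c]a[a-c]{{{n}}}"

r1, r2 = family_long_strings(3)
w = "ba" + "c" * 4            # a word of length 6, in the first language
assert re.fullmatch(r1, w) and not re.fullmatch(r2, w)
```

Deciding the emptiness of the intersection automatically, without such a length argument, is exactly what the compared tools must do.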

AFA Benchmark. The second group of examples contains AFA not easily convertible to BRE. Here we can run only the tools that handle general AFA emptiness. Some of these benchmarks also have large sets of minterms (easily reaching into the thousands) and complex formulae in the AFA transition function, hence converting them to restricted forms such as separated DNF or an explicit representation may be very costly. This also seems to be the main reason why our implementation of [41] could not compete.

  • a-ltlf-patterns comes from the transformation of linear temporal logic formulae over finite traces (\(\text {LTL}_f\)) to AFA [34]. The 1699 formulae are from [60] and represent common \(\text {LTL}_f\) patterns, which can be divided into two groups: (1) 7 parametric patterns (100 each) and (2) randomly generated conjunctions of simpler \(\text {LTL}_f\) patterns (999 formulae).

  • a-ltl-rand contains 300 \(\text {LTL}_f\) formulae obtained with the random generator of [77]. The generator traverses the syntactic tree of the LTL grammar, and is controlled by the number of variables, probabilities of connectives, maximum depth, and average depth. We have set the parameters empirically in a way likely to generate examples difficult for the compared solvers (the formulae have 6 atomic propositions and maximum depth 16).

  • a-ltl-param has a pair of hand-made parametric \(\text {LTL}_f\) formulae (160 formulae each) used in [30, 77]: Lift [43] describes a simple lift operating on a parametric number of floors and Counter [72] describes a counter incremented modulo the parameter.

  • a-ltlf-spec [60] contains 62 \(\text {LTL}_f\) formulae that specify realistic systems, used by Boeing [14] and NASA [42]. The formulae represent specifications used for designing Boeing AIR 6110 wheel-braking system and for designing NASA NextGen air traffic control (ATC) system.

  • a-sloth contains 4062 AFA emptiness problems to which the string solver Sloth reduced string constraints [47]. The AFA have complex multi-track transitions encoding Boolean operations and transductions, and a special kind of synchronization of traces requiring complex initial and final conditions.

  • a-noodler contains 13840 AFA emptiness problems that correspond to certain sub-problems solved within the string solver Noodler in [10]. The AFA were created similarly to those of a-sloth, but encode a different particular set of operations over different input automata.
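As background for the a-ltlf-* and a-ltl-* benchmarks above, the semantics of \(\text {LTL}_f\) can be sketched by a direct recursive evaluator over finite traces. This is only our illustration of the logic being compiled, not the \(\text {LTL}_f\)-to-AFA construction of [34]; formulas are nested tuples, a trace is a list of sets of true atomic propositions:

```python
def holds(phi, trace, i=0):
    """Does formula phi hold at position i of the finite trace?"""
    op = phi[0]
    if op == "ap":                      # atomic proposition
        return i < len(trace) and phi[1] in trace[i]
    if op == "not":
        return not holds(phi[1], trace, i)
    if op == "and":
        return holds(phi[1], trace, i) and holds(phi[2], trace, i)
    if op == "X":                       # strong next: the position must exist
        return i + 1 < len(trace) and holds(phi[1], trace, i + 1)
    if op == "F":                       # eventually, within the trace
        return any(holds(phi[1], trace, j) for j in range(i, len(trace)))
    if op == "G":                       # always, until the trace ends
        return all(holds(phi[1], trace, j) for j in range(i, len(trace)))
    if op == "U":                       # until
        return any(holds(phi[2], trace, j)
                   and all(holds(phi[1], trace, k) for k in range(i, j))
                   for j in range(i, len(trace)))
    raise ValueError(f"unknown operator {op!r}")

trace = [{"p"}, {"p"}, {"q"}]
assert holds(("U", ("ap", "p"), ("ap", "q")), trace)   # p U q holds
assert not holds(("G", ("ap", "p")), trace)            # G p fails at step 3
```

The AFA translation replaces this trace-by-trace evaluation with an automaton whose states are (roughly) the subformulae, so that emptiness of the AFA decides satisfiability of the formula.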

5 The Comparison

We ran our experiments on Debian GNU/Linux 11, with an Intel Core 3.4 GHz processor, 8 CPU cores, and 20 GB RAM. All experiments were run with a timeout of 60 s (increasing the timeout did not have a significant impact). Additional details as well as a virtual machine with the entire benchmark are available at [2].

Benchmarking Infrastructure. The initial difficulty is that the tools expect different input formats and forms of automata, and the benchmarks come in different formats as well. We converted all benchmarks to our internal AFA format, from which we generated the formats supported by the AFA handling tools JaltImpact, bwIC3, Antisat, and Bisim, or we extended the tools with a parser. The BRE benchmarks come from various sources. We first convert them into a master file which specifies the Boolean combination of atomic NFA, with each atomic NFA stored in a separate file. The SMT-lib format is generated for \(\textsc {Z3} \) and \(\textsc {CVC5} \). In the case of b-hand-made, b-param, and b-smt, the atomic automata are translated from regular expressions using the parser of Brics, while in the case of b-regex, where the regexes contain features not supported by Brics, we use the parser from Bisim. b-smt and b-hand-made require first translating from SMT-lib to a regular expression. In the case of b-armc-incl, the atomic automata come directly as NFAs, and are converted into the formats of the individual BRE solvers (we again wrote parsers for some of the solvers), and to our AFA format for the AFA solvers. Every BRE solver was extended with an interpreter of the master file that reads the NFA/DFA from the generated solver-specific files (except the SMT solvers, which read SMT-lib). We note that due to some difficulties with internal structures, we currently cannot run Brics on b-armc-incl, and due to the lack of a converter from complex regular expressions and from pure NFA to the SMT format, we do not run Z3 and CVC5 on b-regex and on b-armc-incl.

Measured Data. We will present the results obtained with BRE (where we run all the tools) and with AFA emptiness (where we run bwIC3, Antisat, Bisim, and JaltImpact) separately. We also separate the results on examples coming from applications from the results on hand-made parametric examples.

Table 1 summarizes the statistics from evaluating the benchmarks. The table lists: (i) the average time, (ii) the median time, and (iii) the number of timeouts and the number of errors (mostly, a tool ran out of memory, performed a bad alloc, or ran into a segmentation fault). A few errors, e.g. in CVC5 or Bisim, were due to unsupported features in the inputs. The tools’ performance is then visualised in cactus plots in Fig. 1. For each tool, the plot shows the progress of the tool on each benchmark: the y axis is the cumulative time taken on the benchmark, with the individual examples on the x axis ordered by the runtime taken by the tool. Timeouts are omitted. In the appendix, we also show a set of scatter plots that compare, on every benchmark, the three best performing tools.
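The cactus plots are computed in the standard way, which can be sketched as follows (the function and its names are ours): for each tool, discard timeouts and errors, sort the remaining runtimes, and accumulate them, so that the x-th point gives the total time the tool spent on its x easiest instances.

```python
from itertools import accumulate

def cactus_points(runtimes, timeout=60.0):
    """runtimes: per-instance times in seconds, with None marking an
    error; returns one cumulative-time point per solved instance."""
    solved = sorted(t for t in runtimes if t is not None and t < timeout)
    return list(accumulate(solved))

pts = cactus_points([0.5, None, 2.0, 60.0, 1.0])
assert pts == [0.5, 1.5, 3.5]    # three solved instances, cumulative time
```

A curve that stays low and extends far to the right thus indicates a tool that solves many instances quickly.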

Table 1. Summary of AFA and BRE benchmarks. The table lists (i) the average, (ii) the median, and (iii) the number of timeouts and errors (in brackets). Winners are highlighted in bold.
Fig. 1. Cactus plots of AFA and BRE benchmarks. The y axis is the cumulative time taken on the benchmark in logarithmic scale; the benchmarks on the x axis are ordered by the runtime of each tool.

Finally, we compared the tools on the parametric benchmarks a-ltl-param and b-param. We illustrate the results in Fig. 2. Each graph shows the times for the increasing value of the specific parameter on the x axis.

Fig. 2. Models of runtime on parametric benchmarks based on the specific parameter k, with timeout 60 s. The sawtooths represent cases where a tool failed on the benchmark for some k while solving the benchmarks for \(k-1\) and \(k+1\). For brevity, we draw the models only until they start continually failing.

5.1 Discussion

Based on the measurements, we make several observations.

Firstly, the tool which combines universality (it can be run on AFA as well as on BRE emptiness) with the most consistently good performance is bwIC3. It dominates most of the AFA emptiness benchmark, shows very good performance on the BRE benchmark, and often stands out on the parametric examples. Moreover, the measurements reported in [28] suggest that the backward BTS reduction has even more potential. This is visible namely in the comparison of our results on the parametric benchmarks diff-sat, diff-unsat, inter-sat, and inter-unsat. Our implementation matched the result of [28] on diff-sat and partially on inter-sat, but saw a worse trend on diff-unsat and a much worse trend on inter-unsat. A likely culprit is a different underlying model checker, ABC [17] in our implementation versus IC3Ref [16] in [28]. However, IC3Ref was not used out of the box in [28], and harnessing it efficiently for problems of our kind is not entirely trivial.
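For intuition, the decision problem behind the backward BTS reduction is reachability in a Boolean transition system; the real tools solve it symbolically with IC3/PDR, so the following explicit-state fixpoint is only a toy illustration of the question being decided (all names are ours):

```python
def backward_reachable(inits, bad, pre):
    """Can some state in `inits` reach a state in `bad`?  Explored
    backwards: `pre(s)` returns the predecessors of state s."""
    seen, frontier = set(bad), list(bad)
    while frontier:
        s = frontier.pop()
        if s in inits:
            return True
        for p in pre(s):
            if p not in seen:
                seen.add(p)
                frontier.append(p)
    return False

# A 4-state toy system with transitions 0 -> 1 -> 2 -> 3.
edges = {(0, 1), (1, 2), (2, 3)}
pre = lambda s: {u for (u, v) in edges if v == s}
assert backward_reachable({0}, {3}, pre)       # 3 is reachable from 0
assert not backward_reachable({1}, {0}, pre)   # 0 is not reachable from 1
```

IC3/PDR answers the same question over states that are valuations of Boolean variables, without ever enumerating them explicitly, which is what makes the reduction scale.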

Secondly, the results on application-related BRE (all BRE except the parametric examples in b-param) quite surprisingly favour the tools based mostly on relatively basic NFA algorithms. The overall best is the simplest tool of all, our implementation eNfa of basic NFA constructions. Close to the performance of eNfa is VATA, which uses antichain inclusion checking on b-armc-incl and b-regex (the fact that the explicit complementation of eNfa is faster than the antichain of VATA suggests that the inclusion benchmarks are not particularly hard). VATA specialises in the more general tree automata, which probably causes unnecessary overhead. Automata also performs well. It uses slightly more advanced algorithms than eNfa (such as lazy evaluation of difference, though without antichain pruning). Its symbolic representation of transition functions with BDDs probably does not provide much advantage here. This result challenges the view that translating complex problems, arising for instance in string constraint solving, into AFA in order to use the sophisticated machinery of AFA solvers is an obvious silver bullet. Organizing the computation into smaller NFA operations, where, moreover, partial results can be minimized and re-used, and where a simpler and hence more flexible NFA technology is used, might be a better strategy (this seems to work very well, for instance, in our recent prototype string constraint solver [10]).
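The kind of basic NFA technology that eNfa builds on can be illustrated by an on-the-fly product construction for intersection emptiness; this sketch is our own simplification (explicit states, no epsilon transitions, no minimization), not eNfa's actual code:

```python
from collections import deque

def product_nonempty(n1, n2):
    """Each NFA is (initial_states, delta, final_states) with delta a
    dict state -> symbol -> set of states; returns True iff
    L(n1) intersected with L(n2) is non-empty."""
    (i1, d1, f1), (i2, d2, f2) = n1, n2
    seen = {(p, q) for p in i1 for q in i2}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        if p in f1 and q in f2:
            return True               # accepting product state reached
        for a in set(d1.get(p, {})) & set(d2.get(q, {})):
            for p2 in d1[p][a]:
                for q2 in d2[q][a]:
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        queue.append((p2, q2))
    return False

n1 = ({0}, {0: {"a": {1}}, 1: {"b": {2}}}, {2})   # accepts only "ab"
n2 = ({0}, {0: {"a": {0}, "b": {1}}}, {1})        # accepts a*b
assert product_nonempty(n1, n2)
```

Because only reachable product states are built, partial results stay small on easy instances, which is consistent with the good raw performance of the simple NFA tools observed here.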

Our AFA emptiness test Antisat, based on the antichain algorithm and a SAT solver, has an interesting performance profile. As can be seen in the cactus plots, besides its absolute domination on a-ltlf-spec, it is significantly faster than the other tools on a large portion of the other AFA emptiness benchmarks, but struggles on the rest. The examples where it dominates are often automata with a structure resembling a lasso (or several lassos) with a long handle. The other implementation of an antichain algorithm, the NFA/NTA inclusion in VATA, also shows good performance. Together, this points to the overall strength of antichain algorithms.
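The antichain principle can be illustrated on NFA universality (our own simplified sketch; Antisat and VATA implement more general and more refined variants): explore the macrostates of the subset construction, but keep only the subset-minimal ones, since whenever a larger macrostate can reach a rejecting macrostate, so can any of its subsets.

```python
def is_universal(alphabet, inits, delta, finals):
    """Antichain universality test for an NFA: explore macrostates of
    the subset construction, keeping only the subset-minimal ones."""
    def post(S, a):
        return frozenset(q for p in S for q in delta.get(p, {}).get(a, ()))
    start = frozenset(inits)
    if not (start & finals):
        return False                  # the empty word is rejected
    antichain, work = [start], [start]
    while work:
        S = work.pop()
        for a in alphabet:
            T = post(S, a)
            if not (T & finals):
                return False          # a rejecting macrostate is reachable
            if any(U <= T for U in antichain):
                continue              # subsumed by a smaller macrostate
            antichain = [U for U in antichain if not T <= U] + [T]
            work.append(T)
    return True

# All words over {a, b}: universal.  Only words over {a}: not universal.
assert is_universal({"a", "b"}, {0}, {0: {"a": {0}, "b": {0}}}, {0})
assert not is_universal({"a", "b"}, {0}, {0: {"a": {0}}}, {0})
```

The pruning keeps the explored state space an antichain with respect to set inclusion, which is where the family of algorithms gets its name.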

The SMT string constraint solvers are not among the best on the benchmarks related to practical applications, but they are competitive (especially Z3), and win on some parametric cases. This may be because various heuristics unique to SMT solvers kick in, especially rewriting that reduces one type of constraint to another. For instance, Z3 seems to solve exppaths1 with the help of rewriting to a sub-string constraint in the theory of sequences. In general, the measurements on parametric examples underscore the fact that no algorithm is universally the best and that their relative performance may vary drastically depending on the kind of input.

Although the mediocre performance of the other tools can be partially explained by their focus on a different kind of problem or a dated underlying technology, and each of them is respectable in its own right, a point can be made against relying on them as a baseline in comparisons of tools for solving our kind of problem. Mona, optimized for a different setting (complex alphabets of bit-vectors with many minterms), is held back by the implicit determinization and, in our case, probably by the overhead of the symbolic representation. It also frequently runs out of the 32-bit address space for BDD nodes. Similarly for Brics, which also always determinizes. The low performance of Bisim is surprising relative to the good results of the up-to algorithms reported in [11, 30]. It is more consistent with [39], where up-to algorithms were not winning against antichains on the more practical examples. Our results, however, do not directly contradict the results of [30] itself, since it does not compare with the fast tools identified here and stands to a large degree on parametric and random benchmarks. There is also always the possibility that we have prepared the input in a way not ideal for the tool. For instance, the transformation to the separated AFA required by Bisim is not entirely trivial. Further investigation of this and a comparison with some other implementation of the up-to techniques seem to be needed. The lack of raw speed of JaltImpact on BRE and AFA emptiness is expected considering that it is meant for a different kind of system, AFA over data words. The stable trends shown in the graphs suggest that an implementation of interpolation-based abstraction refinement optimized for BRE and AFA emptiness might have potential.

Main Takeaways. The backward reduction of AFA emptiness to BTS reachability in combination with IC3 is very fast and extremely versatile, showing very good performance on almost all benchmarks. However, on BRE related to real-world applications, simple NFA algorithms actually tend to have the best raw performance, with the simplest NFA implementation being the best. Antichain algorithms also work well, even significantly better than the other algorithms on specific kinds of AFA. These seem to be the tools to use. Reasonable implementations of the backward BTS reduction with IC3, of antichains, and of basic NFA algorithms should also be the baseline of comparisons.

Mona and Brics, based on DFA, as well as JaltImpact, focused on data words rather than on pure regular properties, do not reach the performance of the best tools. Also, Bisim did not confirm the power of up-to algorithms. SMT solvers, Z3 especially, are competitive, but cannot be considered the top of the state of the art.

Generally, the particular kind and source of a benchmark is a decisive factor influencing the performance of the tools, as is especially visible on the parametric benchmarks.

Threats to Validity. Our results must be taken with a grain of salt, as the experiment contains an inherent room for error. Although we tried to be as fair as possible, not knowing every tool intimately, the conversions between formats and kinds of automata, discussed at the start of Sect. 5, might have introduced biases into the experiment. The tools are written in different languages, and some have parameters which we might have used in a sub-optimal way (we use the tools in their default settings), or, in the case of libraries, we could have used a sub-optimal combination of functions. We also did not measure memory peaks, which could be especially interesting, e.g., when the tools are deployed in a cloud. We are, however, confident that our main conclusions are well justified and that the experiment gives a good overall picture. The entire experiment is available for anyone to challenge or improve upon [2].