1 Introduction

In her seminal work [3], Angluin introduced the well-known algorithm \(\textsf {L}^*\) to learn regular languages by means of deterministic finite automata (DFAs). In the learning setting presented in [3], there is a teacher, who knows the target language L, and a learner, whose task is to learn the target language, represented by an automaton. The learner interacts with the teacher by means of two kinds of queries: membership queries and equivalence queries. A membership query \(\mathrm {MQ}(w)\) asks whether a string w belongs to L, while an equivalence query \(\mathrm {EQ}(A)\) asks whether the conjectured DFA \(A\) recognizes L. If the conjecture is incorrect, the teacher replies with a counterexample witnessing the difference; otherwise, the learner has completed its job. This learning setting is now widely known as active automata learning. In recent years, active automata learning algorithms have attracted increasing attention in the computer aided verification community: they have been applied in black-box model checking [24], compositional verification [12], program verification [10], error localization [8], and model learning [26].

Due to the increasing importance of automata learning algorithms, many efforts have been put into the development of automata learning libraries such as libalf [6] and LearnLib [18]. However, their focus is only on automata accepting finite words, which correspond to safety properties. The \(\omega \)-regular languages are the standard formalism to describe liveness properties. The problem of learning the complete class of \(\omega \)-regular languages was considered open until recently, when it was solved by Farzan et al. [15] and later improved by Angluin et al. [4].

However, the research on applying \(\omega \)-regular language learning algorithms to verification problems is still in its infancy. Learning algorithms for \(\omega \)-regular languages are admittedly much more complicated than their counterparts for regular languages of finite words. This complexity is a barrier for researchers wishing to investigate and experiment further with such topics. We present ROLL 1.0, an open-source library implementing all learning algorithms for the complete class of \(\omega \)-regular languages known in the literature, which we believe can be an enabling tool for this direction of research. To the best of our knowledge, ROLL 1.0 is the only publicly available tool focusing on \(\omega \)-regular language learning.

ROLL, a preliminary version of ROLL 1.0, was developed in [22] to compare the performance of different learning algorithms for Büchi automata (BAs). The main improvements of ROLL 1.0 over its previous version are as follows. The core algorithms of ROLL have been rewritten in a highly modular way, so that learning algorithms for more types of \(\omega \)-automata than just BAs can be supported in the future. In addition to the BA format [1, 2, 11], ROLL 1.0 now also supports the Hanoi Omega Automata (HOA) format [5]. Besides the learning algorithms, ROLL 1.0 also contains complementation [23] and a new language inclusion algorithm, both built on top of the BA learning algorithms. Experiments [23] have shown that the automata produced by the learning-based complementation can be much smaller than those built by structure-based algorithms [7, 9, 19, 21, 25]; the learning-based complementation is therefore well suited to serve as a baseline for Büchi automata complementation research. The language inclusion checking algorithm implemented in ROLL 1.0 is based on learning and a Monte Carlo word sampling algorithm [17]. ROLL 1.0 also features an interactive mode, which is used in the ROLL Jupyter notebook environment; this is particularly helpful for teaching and learning how \(\omega \)-regular language learning algorithms work.

2 ROLL 1.0 Architecture and Usage

ROLL 1.0 is written entirely in Java. Its architecture, shown in Fig. 1, comprises two main components: the Learning Library, which provides all existing learning algorithms for Büchi automata, and the Control Center, which uses the learning library to complete the input tasks required by the user.

Fig. 1. Architecture of ROLL 1.0

Learning Library. The learning library implements all known BA learning algorithms for the full class of \(\omega \)-regular languages: the \(L^{\$}\) learner [15], based on DFA learning [3], and the \(L^{\omega }\) learner [22], based on the three canonical learning algorithms for families of DFAs (FDFAs) [4, 22]. ROLL 1.0 supports both observation tables [3] and classification trees [20] to store the answers to membership queries. All learning algorithms provided in ROLL 1.0 implement the Learner interface; their corresponding teachers implement the Teacher interface. Any Java object that implements Teacher and can decide the equivalence of two Büchi automata is a valid teacher for the BA learning algorithms. Similarly, any Java object implementing Learner can be used as a learner, making ROLL 1.0 easy to extend with new learning algorithms and functionalities. The BA teacher implemented in ROLL 1.0 uses RABIT [1, 2, 11] to answer the equivalence queries posed by the learners, since the counterexamples RABIT provides tend to be short and hence easier to analyze; membership queries are instead answered by an implementation of the ASCC algorithm from [16].
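To illustrate these two extension points, the following is a minimal sketch. The interface names Learner and Teacher, as well as the methods getHypothesis and refineHypothesis (which appear in the notebook session of Fig. 2), come from the text; all other names and signatures are illustrative assumptions, not ROLL 1.0's actual API.

  // Hypothetical sketch: H is the hypothesis automaton type,
  // Q the type of queried (ultimately periodic) words.
  interface Teacher<H, Q> {
      boolean answerMembershipQuery(Q word);   // MQ: does the word belong to the target language?
      Q answerEquivalenceQuery(H hypothesis);  // EQ: null if equivalent, otherwise a counterexample
  }

  interface Learner<H, Q> {
      void startLearning();                    // pose the initial membership queries
      H getHypothesis();                       // return the current conjecture
      void refineHypothesis(Q counterexample); // analyze the counterexample and refine the conjecture
  }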

Control Center. The control center is responsible for calling the appropriate learning algorithm according to the command and options given by the user at the command line, which are used to set the Options of the library. The file formats supported by ROLL 1.0 for the input automata are the RABIT BA format [1, 2, 11] and the standard Hanoi Omega Automata (HOA) format [5], identified by the file extensions .ba and .hoa, respectively. Besides managing the different execution modes, which are presented below, the control center allows for saving the learned automaton into a given file (option -out), for further processing, and for saving execution details in a log file (option -log). The output automaton is generated in the same format as the input. The standard way to call ROLL 1.0 from the command line is shown below.
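The general invocation plausibly looks as follows; the commands and options are those documented in this section, while the jar name ROLL.jar is an assumption.

  java -jar ROLL.jar <command> <input-automaton> [options]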

  • Learning mode (command learn) makes ROLL 1.0 learn a Büchi automaton equivalent to the given Büchi automaton; this can be used, for instance, to get a possibly smaller BA. The default option for storing the answers to membership queries is -table, which selects observation tables; classification trees can be chosen instead by means of the -tree option.

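    For example (using the assumed jar name from above):

      java -jar ROLL.jar learn aut.hoa -table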
    This command runs ROLL 1.0 in learning mode against the input BA aut.hoa: it learns aut.hoa by means of the \(L^{\omega }\) learner using observation tables. The three canonical FDFA learning algorithms given in [4] can be chosen by means of the options -syntactic (default), -recurrent, and -periodic. The options -under (default) and -over control which approximation is used by the \(L^{\omega }\) learner [22] to transform an FDFA into a BA. By giving the option -ldollar, ROLL 1.0 uses the \(L^{\$}\) learner instead of the default \(L^{\omega }\) learner.

  • Interactive mode (command play) allows users to play as the teacher, guiding ROLL 1.0 in learning the language they have in mind. To show how the learning procedure works, ROLL 1.0 outputs each intermediate result in the Graphviz dot layout format (Footnote 1); users can use Graphviz's tools to get a graphical view of the output BA, so as to decide whether it is the right conjecture.
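    For instance, assuming an intermediate conjecture has been saved as conjecture.dot (the file name is illustrative), Graphviz's dot tool renders it as an image:

      dot -Tpng conjecture.dot -o conjecture.png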

  • Complementation (command complement) of the BA \(\mathcal {B}\) in ROLL 1.0 is based on the algorithm from [23], which learns the complement automaton \(\mathcal {B}^{\mathsf {c}}\) from a teacher who knows the language \(\varSigma ^{\omega }\setminus \mathcal {L}(\mathcal {B})\). This allows ROLL 1.0 to disentangle \(\mathcal {B}^{\mathsf {c}}\) from the structure of \(\mathcal {B}\), avoiding the \(\varOmega ((0.76n)^{n})\) blowup [27] of the structure-based complementation algorithms (see, e.g., [7, 19, 21, 25]).
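    A complementation run can plausibly be invoked as follows; complement and -out are the documented command and option, while the jar and file names are assumptions:

      java -jar ROLL.jar complement aut.hoa -out comp.hoa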

  • Inclusion testing (command include) between two BAs \(\mathcal {A}\) and \(\mathcal {B}\) is implemented in ROLL 1.0 as follows: (1) first, sample several \(\omega \)-words \(w \in \mathcal {L}(\mathcal {A})\) and check whether \(w \notin \mathcal {L}(\mathcal {B})\), which would prove \(\mathcal {L}(\mathcal {A}) \not \subseteq \mathcal {L}(\mathcal {B})\); (2) then, try simulation techniques [11, 13, 14] to prove inclusion; (3) finally, use the learning-based complementation algorithm to decide inclusion. ROLL 1.0's \(\omega \)-word sampling algorithm is an extension of the one proposed in [17]: the latter only samples paths visiting any state at most twice, while ROLL 1.0's variant allows for sampling paths visiting any state at most K times, where K is usually set to the number of states of \(\mathcal {A}\). In this way, ROLL 1.0 can get a larger set of \(\omega \)-words accepted by \(\mathcal {A}\) than the set obtained with the original algorithm. A sketch of this three-stage pipeline is given below.
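The following minimal Java sketch illustrates the control flow of the three-stage inclusion check; the types BA and UPWord as well as all helper methods are illustrative assumptions, not ROLL 1.0's actual code.

  interface BA { int numStates(); }  // minimal Büchi automaton stub (assumed)
  final class UPWord { }             // an ultimately periodic word, i.e., a (stem, loop) pair (assumed)

  abstract class InclusionCheck {
      abstract UPWord sampleAcceptedWord(BA a, int k); // sample an accepted word whose path visits each state at most k times
      abstract boolean accepts(BA b, UPWord w);        // membership check for an ultimately periodic word
      abstract boolean provedBySimulation(BA a, BA b); // simulation-based sufficient check [11, 13, 14]
      abstract BA learnComplement(BA b);               // learning-based complementation [23]
      abstract boolean intersectionEmpty(BA a, BA bc); // emptiness of the product automaton

      boolean isIncluded(BA a, BA b, int numSamples) {
          int k = a.numStates();                       // K is usually the number of states of A
          for (int i = 0; i < numSamples; i++) {       // stage (1): Monte Carlo sampling
              UPWord w = sampleAcceptedWord(a, k);
              if (!accepts(b, w)) return false;        // w witnesses non-inclusion
          }
          if (provedBySimulation(a, b)) return true;   // stage (2): simulation techniques
          BA bc = learnComplement(b);                  // stage (3): learning-based complementation
          return intersectionEmpty(a, bc);             // L(A) ∩ L(B^c) = ∅ iff inclusion holds
      }
  }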

Fig. 2. ROLL 1.0 running in the Jupyter notebook for interactively learning \(\varSigma ^{*} \cdot b^{\omega }\)

Online availability of ROLL 1.0. ROLL 1.0 is an open-source library freely available online at https://iscasmc.ios.ac.cn/roll/, where more details are provided about its commands and options, its use as a Java library, and its GitHub repository (Footnote 2). Moreover, from the ROLL page, it is possible to access an online Jupyter notebook (Footnote 3) that allows users to interact with ROLL 1.0 without having to download and compile it. Each client gets a new instance of the notebook, provided by JupyterHub (Footnote 4), so as to avoid unexpected interactions between different users. Figure 2 shows a few screenshots of the notebook for learning in interactive mode the language \(\varSigma ^{*} \cdot b^{\omega }\) over the alphabet \(\varSigma = \{a, b\}\). As we can see, the membership query \(\mathrm {MQ}(w)\) is answered by means of the mqOracle function: it gets as input two finite words, the stem and the loop of the ultimately periodic word w, and it checks whether the loop contains only b. One can then create a BA learner with the oracle mqOracle, say nbaLearner, based on observation tables and recurrent FDFAs, as shown in the top-left screenshot, and inspect its internal table structures by printing out the learner, as in the top-right screenshot. The answer to an equivalence query is split into two parts: first, the call to getHypothesis() shows the currently conjectured BA; then, the call to refineHypothesis("ba", "ba") simulates a negative answer with the counterexample \(ba \cdot (ba)^{\omega }\). After the refinement by nbaLearner, the new conjectured BA is already the right conjecture.
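This notebook session can be sketched in Java roughly as follows. The names mqOracle, nbaLearner, getHypothesis, and refineHypothesis come from the screenshots described above; the MembershipOracle functional interface, the NBALearner constructor, and the option constants are illustrative assumptions, not ROLL 1.0's actual API.

  // Hypothetical reconstruction of the Fig. 2 session.
  interface MembershipOracle {
      boolean answer(String stem, String loop);       // MQ for the ultimately periodic word stem·loop^ω
  }

  // Target language Σ*·b^ω over Σ = {a, b}: accept iff the loop contains only b's.
  MembershipOracle mqOracle = (stem, loop) -> loop.chars().allMatch(c -> c == 'b');

  // Create a BA learner using observation tables and recurrent FDFAs (constructor assumed).
  NBALearner nbaLearner = new NBALearner("ab", mqOracle, Option.TABLE, Option.RECURRENT);

  System.out.println(nbaLearner);                     // inspect the internal table structures
  System.out.println(nbaLearner.getHypothesis());     // first conjectured BA
  nbaLearner.refineHypothesis("ba", "ba");            // negative EQ answer: counterexample ba·(ba)^ω
  System.out.println(nbaLearner.getHypothesis());     // the refined, now correct conjecture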