1 Introduction

Active automata learning, from its early beginnings almost thirty years ago [6], has inspired applications in a wide range of fields (see [19] for a survey). However, it took almost a decade for the software verification and testing community to recognize its value: the ability to provide models of black-box systems for the plethora of model-based tools and techniques. More precisely, it was not until the seminal works of Peled et al. [36], who employed automata learning to model check black-box systems, and Steffen et al. [18], who used it to automatically generate test cases for legacy computer-telephony integrated systems, that this use case of automata learning was discovered.

Since then, however, active automata learning has enjoyed quite a success story, having been used as a valuable tool in areas as diverse as automated GUI testing [13], fighting bot-nets [12], or typestate analysis [5, 41]. Most of these works, however, used their own custom, one-off implementation of the well-known \(L^*\) learning algorithm [6], and hence invested relatively little effort in optimizations, or in using a more sophisticated (but harder to implement and lesser-known) algorithm altogether (Footnote 1).

We started developing the LearnLib library (Footnote 2) to provide researchers and practitioners with a reusable set of components to facilitate and promote the use of active automata learning, and to enable access to cutting-edge automata learning technology. More than a decade has passed since the development of LearnLib started in 2003. In these years, many lessons were learned on what makes for a usable, efficient and practically feasible product that fulfills this goal (cf. [25, 35, 37]).

These lessons form the basis of the new LearnLib presented in this paper. The new LearnLib is not just an overhaul of the prior version, but a complete rewrite from scratch. It provides a higher level of abstraction and increased flexibility, while simultaneously being the fastest version of LearnLib to date (cf. Sect. 4). As a service to the community and to encourage contributions by and collaborations with other research groups, we decided to make LearnLib available under an open-source license (the Apache License, version 2.0; Footnote 3). In the remainder of this paper we highlight two aspects that we address with LearnLib.

Advanced Features. This is what we consider the strongest case for preferring a comprehensive automata learning framework such as LearnLib over a custom implementation. While implementing the original version of \(L^*\) is not a challenging task, the situation is different for more refined active learning algorithms, such as Rivest & Schapire’s [38], Kearns & Vazirani’s [30] or even the very recent TTT algorithm [28]. While we found these algorithms to consistently outperform \(L^*\), the latter remains the most widely used. Also, several other advanced optimizations such as query parallelization or efficient query caches are typically neglected. Through LearnLib’s modular design, changing filters, algorithm parameters or even the whole algorithm is a matter of a few lines of code, yielding valuable insights on how different algorithms perform on certain input data. Many of these features rely on AutomataLib, the standalone finite-state machine library that was developed for LearnLib, which provides a rich toolbox of data structures and algorithms for finite-state machines. The design of AutomataLib is presented in Sect. 2, while Sect. 3 provides a more comprehensive overview of LearnLib’s feature set.

Performance. The implementation of a learning algorithm comes with many performance pitfalls. Even though in most cases the time taken by the actual learning algorithm is an uncritical aspect (compared to the time spent in executing queries, which may involve, e.g., network communication), it should be kept as low as reasonably possible. Besides, an efficient management of data structures is necessary to enable learning of large-scale systems without running into out-of-memory conditions or experiencing huge performance slumps. In LearnLib, considerable effort was spent on efficient implementations while providing a conveniently high level of abstraction. This will be detailed in Sect. 4.

Finally, we conclude the paper by briefly discussing envisioned future work in Sect. 5.

Fig. 1. Architecture of AutomataLib

2 AutomataLib

One of the main architectural changes of the open-source LearnLib is that it uses a dedicated, stand-alone library for representing and manipulating automata, called AutomataLib (Footnote 4). While AutomataLib is formally independent of LearnLib, its development process is closely intertwined with that of LearnLib. For this reason, AutomataLib mainly focuses on deterministic automata, even though selected classes of non-deterministic automata are supported as well (e.g., NFAs).

AutomataLib is divided into an abstraction layer, automata implementations, and algorithms (cf. Fig. 1). The abstraction layer comprises a set of Java interfaces to represent various types of automata and graphs, organized in a complex, fine-grained type hierarchy. Furthermore, these interfaces were designed in a generic fashion, to integrate existing, third-party automata implementations into AutomataLib’s interface landscape with as little effort and run-time overhead as possible. For instance, a proof-of-concept adapter for the BRICS automaton library (Footnote 5) could be realized in as little as 20 lines of Java code.

Adapters such as the one for the BRICS library form one part of the implementation layer. The other part consists of generic automaton implementations, e.g., for DFAs or Mealy machines, that provide good defaults for general setups, and are also used by most algorithms in LearnLib to store hypotheses.

Sample algorithms shipped with AutomataLib include minimization, equivalence testing, and visualization (via Graphviz’s dot tool; Footnote 6). The set of functionalities will be continuously extended, with a strong focus on functionality either directly required in LearnLib, or desirable in a typical automata learning application context.

An important aspect is that the algorithms operate solely on the abstraction layer, meaning that they are implementation agnostic: they can be used with a (wrapped) BRICS automaton as well as with other automaton implementations. Furthermore, the generic design enables a high degree of code reuse: the minimization (or equivalence checking) algorithm can be used for both DFA and Mealy machines, as it is designed to only require a deterministic automaton, instead of a concrete machine type (or even implementation).
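To illustrate what operating solely on the abstraction layer can look like, the following is a minimal sketch and not AutomataLib's actual code. The interface `Deterministic` and the helpers `dfa` and `mealy` are hypothetical names introduced here; the sketch assumes that a deterministic automaton exposes an optional state property (acceptance for a DFA) and an optional transition property (output for a Mealy machine). A single breadth-first search over the product then serves both machine types, although it is a simpler technique than the near-linear Hopcroft and Karp algorithm cited in Sect. 3.

```java
import java.util.*;

final class GenericEquivalence {

    /** Hypothetical minimal abstraction layer (not AutomataLib's real interfaces). */
    interface Deterministic<S, I> {
        S initial();
        S successor(S state, I input);
        Object stateProperty(S state);               // acceptance for a DFA, null for a Mealy machine
        Object transitionProperty(S state, I input); // output for a Mealy machine, null for a DFA
    }

    /** A pair of states in the product automaton, plus the word that reaches it. */
    private static final class Node<S1, S2, I> {
        final S1 left; final S2 right; final List<I> word;
        Node(S1 left, S2 right, List<I> word) { this.left = left; this.right = right; this.word = word; }
    }

    /** Returns a shortest word on which a and b differ, or null if no such word exists. */
    static <S1, S2, I> List<I> findSeparatingWord(
            Deterministic<S1, I> a, Deterministic<S2, I> b, Collection<I> alphabet) {
        S1 a0 = a.initial();
        S2 b0 = b.initial();
        if (!Objects.equals(a.stateProperty(a0), b.stateProperty(b0))) {
            return new ArrayList<>(); // the empty word already separates the two machines
        }
        Set<List<Object>> seen = new HashSet<>();
        seen.add(Arrays.<Object>asList(a0, b0));
        Deque<Node<S1, S2, I>> queue = new ArrayDeque<>();
        queue.add(new Node<>(a0, b0, new ArrayList<I>()));
        while (!queue.isEmpty()) {
            Node<S1, S2, I> n = queue.poll();
            for (I in : alphabet) {
                List<I> word = new ArrayList<>(n.word);
                word.add(in);
                if (!Objects.equals(a.transitionProperty(n.left, in), b.transitionProperty(n.right, in))) {
                    return word;
                }
                S1 t1 = a.successor(n.left, in);
                S2 t2 = b.successor(n.right, in);
                if (!Objects.equals(a.stateProperty(t1), b.stateProperty(t2))) {
                    return word;
                }
                if (seen.add(Arrays.<Object>asList(t1, t2))) {
                    queue.add(new Node<>(t1, t2, word));
                }
            }
        }
        return null;
    }

    /** Table-based DFA view: state 0 is initial, inputs are column indices. */
    static Deterministic<Integer, Integer> dfa(final int[][] delta, final boolean[] accepting) {
        return new Deterministic<Integer, Integer>() {
            public Integer initial() { return 0; }
            public Integer successor(Integer s, Integer in) { return delta[s][in]; }
            public Object stateProperty(Integer s) { return accepting[s]; }
            public Object transitionProperty(Integer s, Integer in) { return null; }
        };
    }

    /** Table-based Mealy view: outputs are attached to transitions. */
    static Deterministic<Integer, Integer> mealy(final int[][] delta, final String[][] out) {
        return new Deterministic<Integer, Integer>() {
            public Integer initial() { return 0; }
            public Integer successor(Integer s, Integer in) { return delta[s][in]; }
            public Object stateProperty(Integer s) { return null; }
            public Object transitionProperty(Integer s, Integer in) { return out[s][in]; }
        };
    }

    public static void main(String[] args) {
        List<Integer> sigma = Arrays.asList(0, 1);
        Deterministic<Integer, Integer> evenOnes = dfa(new int[][] {{0, 1}, {1, 0}}, new boolean[] {true, false});
        Deterministic<Integer, Integer> all = dfa(new int[][] {{0, 0}}, new boolean[] {true});
        System.out.println(findSeparatingWord(evenOnes, all, sigma)); // [1]
        Deterministic<Integer, Integer> m1 = mealy(new int[][] {{0, 0}}, new String[][] {{"a", "b"}});
        Deterministic<Integer, Integer> m2 = mealy(new int[][] {{1, 0}, {0, 1}}, new String[][] {{"a", "b"}, {"x", "b"}});
        System.out.println(findSeparatingWord(m1, m2, sigma)); // [0, 0]
    }
}
```

The same `findSeparatingWord` method is reused unchanged for both machine types; only the thin views differ, which is the kind of code reuse the generic design aims for.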

3 LearnLib

LearnLib provides a set of components to apply automata learning in practical settings, or to develop or analyze automata learning algorithms. These can be grouped into three main classes: learning algorithms, methods for finding counterexamples (so-called Equivalence Queries), and infrastructure components.

Learning Algorithms. LearnLib features a rich set of learning algorithms, covering the majority of algorithms which have been published (and many beyond that). Care was taken to develop the algorithms in a modular and parameterizable fashion, which allows us to use a single “base” algorithm to realize several algorithms described in the literature, e.g., by merely exchanging the involved counterexample analysis strategy. Perhaps the best example for this is the \(L^*\) algorithm [6], which can be configured to pose as Maler & Pnueli’s [31], Rivest & Schapire’s [38], or Shahbaz’s [26] algorithm, Suffix1by1 [26], or variants thereof. Other base algorithms available in LearnLib are the Observation Pack [21] algorithm, Kearns & Vazirani’s [30] algorithm, the DHC [34] algorithm, and the TTT [28] algorithm. These, too, can be adapted in the way they handle counterexamples, e.g., by linear search, binary search (à la Rivest & Schapire), or exponential search [29]. With the exception of DHC, all these algorithms are available in both DFA and Mealy versions. Furthermore, LearnLib features the \(NL^*\) algorithm for learning NFAs [8].
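As an illustration of counterexample analysis by binary search in the style of Rivest & Schapire, consider the following minimal sketch. It is not LearnLib's implementation; the class name, the table-based hypothesis representation (`delta`, `access`) and the `Predicate`-based membership oracle are assumptions made for this example. For a counterexample \(w\) of length \(m\), let \(\alpha(i)\) be the oracle's answer for the word formed by the access sequence of the hypothesis state reached by the first \(i\) symbols of \(w\), followed by the remaining symbols. Assuming the hypothesis agrees with the oracle on its access sequences, \(\alpha(0) \ne \alpha(m)\), so an index \(i\) with \(\alpha(i) \ne \alpha(i+1)\) exists and can be found with a logarithmic number of queries.

```java
import java.util.*;
import java.util.function.Predicate;

final class BinarySearchCounterexample {

    /**
     * Finds an index i with alpha(i) != alpha(i + 1).
     *
     * @param delta  hypothesis transitions, delta[state][symbol]; state 0 is initial
     * @param access access.get(q) is the access sequence of hypothesis state q
     * @param mq     membership oracle for the target language
     * @param ce     a counterexample, so that alpha(0) != alpha(ce.size())
     */
    static int findBreakpoint(int[][] delta, List<List<Integer>> access,
                              Predicate<List<Integer>> mq, List<Integer> ce) {
        boolean atStart = alpha(delta, access, mq, ce, 0);
        int low = 0;          // invariant: alpha(low) == atStart
        int high = ce.size(); // invariant: alpha(high) != atStart
        while (high - low > 1) {
            int mid = (low + high) >>> 1;
            if (alpha(delta, access, mq, ce, mid) == atStart) {
                low = mid;
            } else {
                high = mid;
            }
        }
        return low;
    }

    /** Replaces the first i symbols of ce by the access sequence of the state they reach. */
    static boolean alpha(int[][] delta, List<List<Integer>> access,
                         Predicate<List<Integer>> mq, List<Integer> ce, int i) {
        int state = 0;
        for (int k = 0; k < i; k++) {
            state = delta[state][ce.get(k)];
        }
        List<Integer> word = new ArrayList<>(access.get(state));
        word.addAll(ce.subList(i, ce.size()));
        return mq.test(word);
    }

    public static void main(String[] args) {
        // Target: words whose number of 1s is divisible by 3.
        Predicate<List<Integer>> mq = w -> w.stream().filter(x -> x == 1).count() % 3 == 0;
        // Hypothesis: a single accepting state with the empty access sequence.
        int[][] delta = {{0, 0}};
        List<List<Integer>> access = Collections.singletonList(Collections.<Integer>emptyList());
        List<Integer> ce = Arrays.asList(0, 1, 0, 0); // rejected by the target, accepted by the hypothesis
        int i = findBreakpoint(delta, access, mq, ce);
        System.out.println(i);                               // 1
        System.out.println(ce.subList(i + 1, ce.size()));    // [0, 0], the new distinguishing suffix
    }
}
```

Linear and exponential search differ only in the order in which the indices are probed; the breakpoint condition stays the same, which is why such strategies can be exchanged without touching the base algorithm.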

Equivalence Tests and Finding Counterexamples. Once a learning algorithm converges to a stable hypothesis, a counterexample is needed to ensure further progress. In the context of active learning, the process of searching for a counterexample is also referred to as an equivalence query. “Perfect” equivalence queries are possible only when a model of the target system is available. In this case, LearnLib uses Hopcroft and Karp’s near-linear equivalence checking algorithm [4, 20] available through AutomataLib. In black-box scenarios, equivalence queries can be approximated using conformance tests. AutomataLib provides implementations of the W-method [14] and the Wp-method [16], two of the few conformance tests that can find missing states. Often, the cheapest and fastest way of approximating equivalence queries is searching for counterexamples directly: LearnLib implements a random walk (only for Mealy machines), randomized generation of tests, and exhaustive generation of test inputs (up to a certain depth).
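The randomized generation of tests mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not LearnLib's API: the class name, the method `findCounterexample`, and the modeling of both the black-box system and the hypothesis as plain `Function` objects from input words to outputs are hypothetical.

```java
import java.util.*;
import java.util.function.Function;

final class RandomWordsTest {

    /**
     * Approximates an equivalence query by sampling random words.
     * Returns a word on which system and hypothesis disagree, or null
     * if none of the sampled words exposes a difference.
     */
    static <I, O> List<I> findCounterexample(Function<List<I>, O> system,
                                             Function<List<I>, O> hypothesis,
                                             List<I> alphabet,
                                             int maxLength, int numTests, Random rnd) {
        for (int t = 0; t < numTests; t++) {
            int length = rnd.nextInt(maxLength + 1); // 0..maxLength
            List<I> word = new ArrayList<>(length);
            for (int k = 0; k < length; k++) {
                word.add(alphabet.get(rnd.nextInt(alphabet.size())));
            }
            if (!Objects.equals(system.apply(word), hypothesis.apply(word))) {
                return word;
            }
        }
        return null; // no difference found; this does NOT prove equivalence
    }

    public static void main(String[] args) {
        // Black box: accepts words with an even number of "a".
        Function<List<String>, Boolean> system =
                w -> w.stream().filter("a"::equals).count() % 2 == 0;
        // Overly coarse hypothesis: accepts everything.
        Function<List<String>, Boolean> hypothesis = w -> true;
        List<String> ce = findCounterexample(system, hypothesis,
                Arrays.asList("a", "b"), 10, 1000, new Random(42));
        System.out.println("counterexample: " + ce);
    }
}
```

In contrast to the W-method, such a search gives no guarantee relative to a bound on the number of states; its appeal is that it is cheap and often finds counterexamples quickly.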

Infrastructure. The third class of components that come with LearnLib provides useful infrastructure functionality such as a logging facility, an import/export mechanism to store and load hypotheses, or utilities for gathering statistics. An important class of components for many practical applications are (optimizing) filters, which pre-process the queries posed by the learning algorithm. A universally useful example of such a filter is a cache filter [32], eliminating duplicate queries that most algorithms pose. Other examples include a parallelization component that distributes queries across multiple workers [22], a mechanism for reusing system states to reduce the number of resets [7], and a filter for prefix-closed systems [32].
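The idea of a cache filter can be sketched in a few lines. The following is a simplified illustration, assuming that an oracle is modeled as a `Function` from input words to outputs; the class `CachingOracle` is a hypothetical name and not LearnLib's cache implementation, which may use more compact data structures than a hash map.

```java
import java.util.*;
import java.util.function.Function;

/** A filter that answers repeated queries from memory instead of asking the system again. */
final class CachingOracle<I, O> implements Function<List<I>, O> {

    private final Function<List<I>, O> delegate;
    private final Map<List<I>, O> cache = new HashMap<>();

    CachingOracle(Function<List<I>, O> delegate) {
        this.delegate = delegate;
    }

    @Override
    public O apply(List<I> word) {
        if (cache.containsKey(word)) {
            return cache.get(word);
        }
        O answer = delegate.apply(word);
        cache.put(new ArrayList<>(word), answer); // defensive copy of the key
        return answer;
    }

    /** Number of distinct queries answered so far. */
    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        int[] calls = {0}; // counts how often the (expensive) system is really queried
        Function<List<String>, Boolean> slow = w -> { calls[0]++; return w.size() % 2 == 0; };
        CachingOracle<String, Boolean> cached = new CachingOracle<>(slow);
        cached.apply(Arrays.asList("a", "b"));
        cached.apply(Arrays.asList("a", "b")); // duplicate, served from the cache
        cached.apply(Arrays.asList("a"));
        System.out.println(calls[0]); // 2
    }
}
```

Because the filter exposes the same interface as the oracle it wraps, it can be placed between a learning algorithm and the system without changes to either side; the other filters listed above compose in the same way.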

Fig. 2. Performance comparison between the new LearnLib and libalf. Left: run-time of the classic \(L^*\) algorithm on a series of randomly generated automata with state counts between 10 and 1000. Right: run-time of five comparable algorithms from LearnLib and libalf on a DFA with 500 states.

For a learning algorithm to work in practice, some interface to the system under learning (SUL) needs to be available. While this is generally specific to the SUL itself, LearnLib provides SUL adapters for typical cases, e.g., Java classes, web-services, or processes that communicate via standard I/O.
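To make the role of a SUL adapter concrete, the following sketch wraps an ordinary Java object (a bounded stack built on `java.util.ArrayDeque`) behind a small step-wise interface. The interface `Sul`, the class `BoundedStackSul`, the input symbols `push` and `pop`, and the output symbols are all hypothetical choices for this example and are not taken from LearnLib.

```java
import java.util.*;

final class StackSulDemo {

    /** Hypothetical step-wise view of a system under learning. */
    interface Sul<I, O> {
        void reset();      // bring the system back to its initial state
        O step(I input);   // apply one input symbol and observe the output
    }

    /** Adapter exposing a bounded stack as a system under learning. */
    static final class BoundedStackSul implements Sul<String, String> {
        private final int capacity;
        private final Deque<Integer> stack = new ArrayDeque<>();

        BoundedStackSul(int capacity) {
            this.capacity = capacity;
        }

        @Override
        public void reset() {
            stack.clear();
        }

        @Override
        public String step(String input) {
            switch (input) {
                case "push":
                    if (stack.size() >= capacity) {
                        return "full";
                    }
                    stack.push(0);
                    return "ok";
                case "pop":
                    if (stack.isEmpty()) {
                        return "empty";
                    }
                    stack.pop();
                    return "ok";
                default:
                    throw new IllegalArgumentException("unknown input: " + input);
            }
        }
    }

    /** Answers one output query: reset, then feed the word symbol by symbol. */
    static <I, O> List<O> query(Sul<I, O> sul, List<I> word) {
        sul.reset();
        List<O> outputs = new ArrayList<>(word.size());
        for (I input : word) {
            outputs.add(sul.step(input));
        }
        return outputs;
    }

    public static void main(String[] args) {
        Sul<String, String> sul = new BoundedStackSul(2);
        System.out.println(query(sul, Arrays.asList("pop", "push", "push", "push", "pop")));
        // [empty, ok, ok, full, ok]
    }
}
```

The `query` helper turns the step-wise view into the word-level queries that a Mealy machine learner poses; the reset before each word is what makes separate queries independent of each other.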

4 Evaluation

We are aware of two other open-source automata learning libraries that provide implementations of textbook algorithms, complemented by their own developments:

  • libalf (Footnote 7). The Automata Learning Framework [9] was developed primarily at the RWTH Aachen. It is available under LGPLv3 and written in C++. Its active development seems to have ceased; the last version was released in April 2011.

  • AIDE (Footnote 8). The Automata-Identification Engine, under active development, is available under the open-source license LGPLv2.1 and written in C#.

The ambitions behind LearnLib go further: It is specifically designed to make it easy to compose new custom learning algorithms on the basis of components for counterexample analysis, approximations of equivalence queries, as well as connectors to real-life systems. Moreover, LearnLib provides a variety of underlying data structures, and various means for visualizing the algorithm and its statistics. This not only facilitates the construction of highly performant custom solutions, but also provides a deeper understanding of the algorithms’ characteristics. The latter has been essential, e.g., for designing the TTT algorithm [28], which almost uniformly outperforms all the previous algorithms.

Performance. As we have mentioned earlier, the open-source LearnLib is the fastest version of LearnLib to date, and moreover the fastest automata learning implementation that we are aware of. We have conducted a preliminary performance evaluation, comparing the new LearnLib to libalf and the old, closed-source version of LearnLib (which we will refer to as JLearn in order to avoid confusion). A visualization of some of the results comparing LearnLib and libalf is shown in Fig. 2. It can be clearly seen that in the considered setting, LearnLib is more than an order of magnitude faster than libalf (even though the former is implemented in Java while the latter is implemented in C++). More importantly, the gap grows with the size of the system to be learned. In our experiments, the open-source LearnLib also outperformed JLearn on a similar scale. More detailed performance data can be found on the LearnLib website (Footnote 9).

Applications. The performance data demonstrates that LearnLib provides a robust basis for fast and scalable active automata learning solutions. Consequently, in its ten years of continued development, LearnLib has been used in a number of research and industry projects, of which we briefly present some of the more recent ones. A more complete list can be found on the LearnLib homepage. LearnLib has been used to infer models of smart card readers [11] and of bank cards [3]. The models were used to verify security properties of these systems. In [2, 15], models of communication protocols are inferred using LearnLib. The models are used to verify the conformance of protocol implementations to the corresponding specifications. At TU Dortmund, LearnLib has been used in an industry project [40] to generate models of a web application. The models were used to test regressions in the user interface and in the business processes of this application. The authors of [33] propose a method for generating checking circuits for functions implemented in FPGAs. The method uses models of the functions that are inferred with LearnLib. LearnLib is also used in other tools: PSYCO [17, 23] is a tool for generating precise interfaces of software components, developed at CMU and NASA Ames. The tool combines concolic execution and active automata learning (i.e., LearnLib). Tomte, developed at the Radboud University of Nijmegen [1], leverages regular inference algorithms provided by LearnLib to infer richer classes of models by simultaneously inferring sophisticated abstractions (or “mappers”).

5 Conclusion

In this paper we have presented LearnLib, a versatile open-source library of active automata learning algorithms. LearnLib is unique in its modular design, which has furthered the development of new learning algorithms (e.g., the TTT algorithm [28]) and tools (e.g., Tomte [1] and PSYCO [17, 23]).

While in many aspects the open-source LearnLib by far surpasses the capabilities of the previous version, there are two major features which have yet to be ported. The first is LearnLib Studio (cf. [35]), a graphical user interface for LearnLib, and the second is an extension for learning Register Automata. An extension for learning Register Automata with the theory of equality only was available upon request for the old LearnLib in binary form [24, 27]. We are currently working on a generalized approach [10], which will be included in the open-source release.