
1 Introduction

Automata learning aims to extract state machines from observed input-output sequences of some system-under-learning (SUL). Active automata learning (AAL) assumes black-box access to this SUL, allowing the learner to incrementally choose inputs and observe the outputs. The models learned by AAL can serve as documentation, but are more typically used as a basis for testing, verification, conformance checking, and fingerprinting—see [9, 23] for an overview of applications. The classical algorithm for AAL is \(L^*\), introduced by Angluin [2]; state-of-the-art algorithms are, e.g., \(L^{\#}\) [24] and TTT [11], which are available in toolboxes such as LearnLib [12] and AALpy [16].

The primary challenge in AAL is to reduce the number of inputs sent to the SUL, referred to as the sample complexity. To learn a 31-state machine with 22 inputs, state-of-the-art learners may send several million inputs to the SUL [24]. This is not necessarily unexpected: the underlying space of 31-state machines is huge, and maximising information gain is nontrivial. The literature has investigated several approaches to accelerate learners, see the overview in [23]. Nevertheless, scalability remains a core challenge for AAL.

We study adaptive AAL [8], which aims to improve the sample efficiency by utilizing expert knowledge already given to the learner. In (regular) AAL, a learner commonly starts learning from scratch. In adaptive AAL, however, the learner is given a reference model, which ought to be similar to the SUL. Reference models occur naturally in many applications of AAL. For instance: (1) Systems evolve over time due to, e.g., bug fixes or new functionalities—and we may have learned the previous system; (2) Standard protocols may be implemented by a variety of tools; (3) The SUL may be a variant of other systems, e.g., being the same system executing in another environment, or a system configured differently.

Several algorithms for adaptive AAL have been proposed [5,6,7,8, 25]. Intuitively, the idea is that these methods try to rebuild the part of the SUL which is similar to the reference model. This is achieved by deriving suitable queries from the reference model, using so-called access sequences to reach states, and so-called separating sequences to distinguish these from other states. These algorithms rely on a rather strict notion of similarity that depends on the way we reach these states. In particular, existing rebuilding algorithms cannot effectively learn an SUL from a reference model that has a different initial state, see Sect. 2.

We propose an approach to adaptive AAL based on state matching, which allows flexibly identifying parts of the unknown SUL where the reference model may be an informative guide. More specifically, in this approach, we match states in the model that we have learned so far (captured as a tree-shaped automaton) with states in the reference model such that the outputs agree on all enabled input sequences. This matching allows for targeted re-use of separating sequences from the reference model and is independent of the access sequences. We refine the approach by using approximate state matching, where we match a current state with one from the reference model that agrees on most inputs.

Approximate state matching is the essential ingredient for the novel \(AL^{\#}\) algorithm. This algorithm is a conservative extension of the recent \(L^{\#}\) [24]. Along with approximate state matching, \(AL^{\#}\) includes rebuilding steps, which are similar to existing methods, but tightly integrated in \(L^{\#}\). Finally, \(AL^{\#}\) is the first approach with dedicated support to use more than one reference model.

Contributions. We make the following contributions to the state-of-the-art in adaptive AAL. First, we present state matching and its generalization to approximate state matching, which allows flexible re-use of separating sequences from the reference model. Second, we include state matching and rebuilding in a unifying approach, called \(AL^{\#}\), which generalizes the \(L^{\#}\) algorithm for non-adaptive automata learning. We analyse the resulting framework in terms of termination and complexity. This framework naturally supports using multiple reference models as well as removing and adding inputs to the alphabet. Our empirical results show the efficacy of \(AL^{\#}\). In particular, \(AL^{\#}\) may reduce the number of inputs to the SUL by two orders of magnitude.

Related work. Adaptive AAL goes back to [8]. That paper, and many of the follow-up approaches [4,5,6,7], re-use access sequences and separating sequences from the reference model (or from the data structures constructed when learning that model). The recent approach in [6] removes redundant access sequences during rebuilding and continues learning with informative separating sequences. In [25], an \(L^*\)-based adaptive AAL approach is proposed where the algorithm starts by including all separating sequences that arise when learning the reference model with \(L^*\), ignoring access sequences. This algorithm is used in [10] for a general study of the usefulness of adaptive AAL: among other things, the authors suggest using more advanced data structures than the observation tables in \(L^*\). Indeed, in [4] the internal data structure of the TTT algorithm [11] is used in the context of lifelong learning; the precise rebuilding approach is not described. The recent [7] proposes an adaptive AAL method based on discrimination trees as used in the Kearns-Vazirani algorithm [13]. We consider the algorithms proposed in [6, 7] to be the state-of-the-art and compare them experimentally with \(AL^{\#}\) in Sect. 8.

2 Overview

We illustrate (1) how adaptive AAL uses a reference model to help learn a system and (2) how this may reduce the sample complexity of the learner.

Fig. 1. An SUL \(\mathcal {S}\) and three reference models \(\mathcal {R}_1\), \(\mathcal {R}_2\) and \(\mathcal {R}_3\).

MAT Framework. We recall the standard setting for AAL: Angluin’s MAT framework, cf. [9, 23]. Here, the learner has no direct access to the SUL, but may ask output queries (OQs): these return, for a given input sequence, the sequence of outputs from the SUL; and equivalence queries (EQs): these take a Mealy machine \(\mathcal {H}\) as input, and return whether or not \(\mathcal {H}\) is equivalent to the SUL. In case it is not, a counterexample is provided in the form of a sequence of inputs for which \(\mathcal {H}\) and the SUL return different outputs. EQs are expensive [3, 19, 22, 26], therefore, we aim to learn the SUL using primarily OQs.

Apartness. Learning algorithms in the MAT framework typically assume that two states are equivalent as long as their known residual languages are equivalent. To discover a new state, we must therefore (1) access it by an input sequence and (2) prove this state distinct (apart) from the other states that we already know. Consider the SUL \(\mathcal {S}\) in Fig. 1a. The access sequences c, ca access \(q_4\) and \(q_5\), respectively, from the initial state. These states are different because the response to executing c from \(q_4\) and \(q_5\) is distinct: We say c is a separating sequence for \(q_4\) and \(q_5\). This difference can be observed by posing OQs for cc and cac, consisting of the access sequences for \(q_4\) and \(q_5\) followed by their separating sequence c.

Aim. The aim of adaptive AAL is to learn SULs with fewer inputs, using knowledge in the form of a reference model, known to the learner and preferably similar to the SUL. The discovery of states is accelerated by extracting candidates for both (1) access sequences and (2) separating sequences from the reference model.

Rebuilding. The state-of-the-art in adaptive AAL uses access sequences and separating sequences from the reference model [6, 7] in an initial phase. Consider the Mealy machine \(\mathcal {R}_1\) in Fig. 1b as a reference model for the SUL \(\mathcal {S}\) in Fig. 1a. The sequences \(\varepsilon \), c, ca can be used to access all orange states in both \(\mathcal {S}\) and \(\mathcal {R}_1\). The separating sequences c and ac for these states in \(\mathcal {R}_1\) also separate the orange states in \(\mathcal {S}\). By asking OQs combining the access sequences and separating sequences, we discover all orange states for \(\mathcal {S}\).

Limits of Rebuilding. However, these rebuilding approaches have limitations. Consider \(\mathcal {R}_2\) in Fig. 1c. The sequences \(\varepsilon \), b, bb and bbb can be used to access all states in \(\mathcal {R}_2\). Concatenating these with any separating sequences from \(\mathcal {R}_2\) will not be helpful to learn SUL \(\mathcal {S}\), because in \(\mathcal {S}\) these sequences all access \(q_0\). However, the separating sequences from \(\mathcal {R}_2\) are useful if executed in the right state of \(\mathcal {S}\). For instance, the sequence bb separates all states in \(\mathcal {R}_2\), and the blue states in \(\mathcal {S}\). Thus, rebuilding does not realise the potential of reusing the separating sequences from \(\mathcal {R}_2\), since the access sequences for the relevant states are different.

State Matching. We extend adaptive AAL with state matching. State matching overcomes the strong dependency on the access sequences and allows the efficient usage of reference models where the residual languages of the individual states are similar. Suppose that while learning, we have not yet separated \(q_0\) and \(q_1\) in \(\mathcal {S}\), but we do know the output of the b-transition from \(q_0\). We may use that output to match \(q_0\) with \(p_3\) in \(\mathcal {R}_2\): these two states agree on input sequences where both are defined. Subsequently, we can use the separating sequence bb between \(p_3\) and \(p_0\) to separate \(q_0\) and \(q_1\), through OQs bb and abb.

Approximate State Matching. It rarely happens that states in the SUL exactly match states in the reference model: Consider the scenario where we want to learn \(\mathcal {S}\) with reference model \(\mathcal {R}_3\) from Fig. 1d. States \(q_0\) and \(s_3\) do not match because they have different outputs for input b but are still similar. This motivates an approximate version of matching, where a state is matched to the reference state which maximises the number of inputs with the same output.

Outline. After the preliminaries (Sect. 3), we recall the \(L^{\#}\) algorithm and extend it with rebuilding (Sect. 4). We then introduce adaptive AAL with state matching and its approximate variant (Sect. 5). Together with rebuilding, this results in the \(AL^{\#}\) algorithm (Sect. 6). We proceed to define a variant that allows the use of multiple reference models (Sect. 7). This is helpful already in the example discussed in this section: given both \(\mathcal {R}_1\) and \(\mathcal {R}_2\), \(AL^{\#}\) with multiple reference models allows us to discover all states in \(\mathcal {S}\) without any EQs, see App. F of [14].

3 Preliminaries

For a partial map \(f :X \rightharpoonup Y\), we write \(f(x)\mathord {\downarrow }\) if f(x) is defined and \(f(x)\mathord {\uparrow }\) otherwise.

Definition 3.1

A partial Mealy machine is a tuple \(\mathcal {M}= (Q, I, O, q_0, \delta , \lambda )\), where Q, I and O are finite sets of states, inputs and outputs respectively; \(q_0 \in Q\) an initial state, \(\delta :Q \times I \rightharpoonup Q\) a transition function, and \(\lambda :Q \times I \rightharpoonup O\) an output function such that \(\delta \) and \(\lambda \) have the same domain. A (complete) Mealy machine is a partial Mealy machine where \(\delta \) and \(\lambda \) are total. If not specified otherwise, a Mealy machine is assumed to be complete.

We write \(\mathcal {M}|_{I}\) to denote \(\mathcal {M}\) restricted to alphabet I. We use the superscript \(\mathcal {M}\) to indicate to which Mealy machine we refer, e.g. \(Q^{\mathcal {M}}\) and \(\delta ^{\mathcal {M}}\). The transition and output functions are naturally extended to input sequences of length \(n \in \mathbb {N}\) as functions \(\delta :Q \times I^n \rightharpoonup Q\) and \(\lambda :Q \times I^n \rightharpoonup O^n\). We abbreviate \(\delta (q_0, w)\) by \(\delta (w)\).
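Concretely, a partial Mealy machine and the sequence extensions above can be sketched in Python (an illustrative encoding, not the paper's implementation: \(\delta \) and \(\lambda \) become dicts keyed by (state, input), and `step`/`out` are hypothetical names for the extended functions):

```python
class Mealy:
    """Partial Mealy machine (cf. Def. 3.1): delta and lam are partial maps
    encoded as dicts keyed by (state, input); they must share one domain."""

    def __init__(self, q0, delta, lam):
        assert delta.keys() == lam.keys(), "delta and lam need the same domain"
        self.q0, self.delta, self.lam = q0, delta, lam

    def step(self, q, w):
        """Extended transition function delta(q, w); None when undefined."""
        for i in w:
            if (q, i) not in self.delta:
                return None
            q = self.delta[(q, i)]
        return q

    def out(self, q, w):
        """Extended output function lam(q, w): the output sequence, or None."""
        outs = []
        for i in w:
            if (q, i) not in self.lam:
                return None
            outs.append(self.lam[(q, i)])
            q = self.delta[(q, i)]
        return tuple(outs)
```

A complete machine is simply one whose maps are defined for every (state, input) pair.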

Definition 3.2

Let \(\mathcal {M}_1\), \(\mathcal {M}_2\) be partial Mealy machines. States \(p \in Q^{\mathcal {M}_1}\) and \(q \in Q^{\mathcal {M}_2}\) match, written \(p \simeq q\), if \(\lambda (p,\sigma )=\lambda (q,\sigma )\) for all \(\sigma \in (I^{\mathcal {M}_1} \cap I^{\mathcal {M}_2})^*\) with \(\delta (p,\sigma ){\mathord {\downarrow }}\) and \(\delta (q,\sigma ){\mathord {\downarrow }}\). If p and q do not match, they are apart, written \(p \mathrel {\#}q\).

If \(p \mathrel {\#}q\), then there is a separating sequence, i.e., a sequence \(\sigma \) such that \(\lambda (p,\sigma ) \ne \lambda (q,\sigma )\); this situation is denoted by \(\sigma \vdash p \mathrel {\#}q\). The definition of matching allows the input (and output) alphabets of the underlying Mealy machines to differ; it requires that they agree on all commonly defined input sequences. If \(\mathcal {M}_1\) and \(\mathcal {M}_2\) are complete and have the same alphabet, then the matching of states is referred to as language equivalence. Two complete Mealy machines are equivalent if their initial states are language equivalent.
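A separating sequence witnessing \(p \mathrel {\#}q\), when one exists, can be found by a breadth-first search over pairs of states. The following sketch assumes transition and output maps encoded as dicts keyed by (state, input); it is illustrative code, not the paper's implementation:

```python
from collections import deque


def separating_sequence(d1, l1, p, d2, l2, q, inputs):
    """Shortest sigma with lambda(p, sigma) != lambda(q, sigma), i.e.
    sigma |- p # q; returns None if the states match (cf. Def. 3.2).
    d*/l* are (state, input) -> state/output dicts; a pair on which either
    side is undefined is simply not explored, matching the definition."""
    seen = {(p, q)}
    todo = deque([((p, q), ())])
    while todo:
        (s, t), w = todo.popleft()
        for i in inputs:
            if (s, i) in d1 and (t, i) in d2:
                if l1[(s, i)] != l2[(t, i)]:
                    return w + (i,)          # outputs differ: separator found
                nxt = (d1[(s, i)], d2[(t, i)])
                if nxt not in seen:          # each state pair explored once
                    seen.add(nxt)
                    todo.append((nxt, w + (i,)))
    return None
```

Because future behaviour only depends on the current pair of states, visiting each pair once suffices, and BFS yields a shortest separator.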

Let \(\mathcal {M}\) be a partial Mealy machine. A state \(q \in Q^{\mathcal {M}}\) is reachable if there exists \(\sigma \in I^*\) such that \(\delta ^{\mathcal {M}}(q_0, \sigma )=q\). The reachable part of \(\mathcal {M}\) contains all reachable states in \(Q^{\mathcal {M}}\). A sequence \(\sigma \) is an access sequence for \(q \in Q^{\mathcal {M}}\) if \(\delta ^{\mathcal {M}}(\sigma )=q\). A set \(P \subseteq I^*\) is a state cover for \(\mathcal {M}\) if P contains an access sequence for every reachable state in \(\mathcal {M}\). In this paper, a tree \(\mathcal {T}\) is a partial Mealy machine where every state q has a unique access sequence, denoted by \(\textsf{access}(q)\).
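Since every tree state has a unique predecessor, its access sequence can be read off by walking parent pointers to the root (a small illustrative sketch; `parent` is an assumed encoding mapping each non-root state to its predecessor and incoming input):

```python
def access(parent, q):
    """Unique access sequence of tree state q (cf. access(q) above).
    parent: state -> (predecessor, input); the root has no entry."""
    w = []
    while q in parent:
        q, i = parent[q]
        w.append(i)            # collect inputs root-wards...
    return tuple(reversed(w))  # ...then reverse into root-to-q order
```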

Definition 3.3

Let \(\mathcal {M}\) be a complete Mealy machine. A set \(W_q \subseteq (I^{\mathcal {M}})^*\) is a state identifier for \(q \in Q^\mathcal {M}\) if for all \(p \in Q^\mathcal {M}\) with \(p \mathrel {\#}q\) there exists \(\sigma \in W_q\) such that \(\sigma \vdash p \mathrel {\#}q\). A separating family is a collection of state identifiers \(\{W_p\}_{p \in Q^\mathcal {M}}\) such that for all \(p,q \in Q^\mathcal {M}\) with \(p \mathrel {\#}q\) there exists \(\sigma \in W_p \cap W_q\) with \(\sigma \vdash p \mathrel {\#}q\).

We use \(P^{\mathcal {M}}\) and \(\{W_q\}^{\mathcal {M}}\) to refer to a minimal state cover and a separating family for \(\mathcal {M}\) respectively. State covers and separating families can be constructed for every Mealy machine, but are not necessarily unique.

4 \(L^{\#}\) with Rebuilding

We first recall the \(L^{\#}\) algorithm for (standard) AAL [24]. Then, we consider adaptive learning by presenting an \(L^{\#}\)-compatible variant of rebuilding.

4.1 Observation Trees

\(L^{\#}\) uses an observation tree as data structure to store the observed traces of \(\mathcal {M}\).

Definition 4.1

A tree \(\mathcal {T}\) is an observation tree if there exists a mapping \(f :Q^\mathcal {T}\rightarrow Q^\mathcal {M}\) such that \(f(q_0^{\mathcal {T}})=q_0^{\mathcal {M}}\) and \(q \xrightarrow []{i/o} q'\) implies \(f(q) \xrightarrow []{i/o} f(q')\).

In an observation tree, a basis is a subtree that describes unique behaviour present in the SUL. Initially, a basis \(B\subseteq Q^{\mathcal {T}}\) contains the root state. All states in the basis are pairwise apart, i.e., for all \(q\ne q'\in B\) it holds that \(q\mathrel {\#}q'\). For a fixed basis, its frontier is the set of states \(F\subseteq Q^{\mathcal {T}}\) which are immediate successors of basis states but which are not in the basis themselves.

Fig. 2. Observation trees and hypotheses generated while learning \(\mathcal {R}_1\) with \(L^{\#}\). Basis states are displayed in pink and frontier states in yellow. (Color figure online)

Example 4.2

Figure 2c shows an observation tree \(\mathcal {T}'\) for the Mealy machine \(\mathcal {H}'\) from Fig. 2d. The separating sequences c and ac show that the states in basis \(B= \{t_0, t_2, t_3 \}\) are all pairwise apart. The frontier \(F\) is \(\{ t_1, t_4, t_5, t_6 \}\).

We say that a frontier state is isolated if it is apart from all basis states. A frontier state is identified with a basis state q if it is apart from all basis states except q. We say the observation tree is adequate if all frontier states are identified, no frontier state is isolated, and each basis state has a transition for every input. If every frontier state is identified and each basis state has a transition for every input, the observation tree can be folded to create a complete Mealy machine. This Mealy machine has the same states as the basis. The transitions between basis states are the same as in the observation tree. Transitions from basis states to frontier states are folded back to the basis state the frontier state is identified with. We call the resulting complete Mealy machine a hypothesis whenever this canonical transformation is used.
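The folding construction can be sketched as follows (an illustrative encoding, assuming dict-based tree maps, a set of basis states, and an identification map from frontier states to basis states; adequacy as described above is taken for granted):

```python
def fold_hypothesis(t_delta, t_lambda, basis, ident):
    """Fold an observation tree into a hypothesis (sketch).
    t_delta/t_lambda: (state, input) -> state/output dicts of the tree.
    basis: set of basis states; ident: frontier state -> the basis state it
    is identified with. Assumes every basis state has all inputs defined and
    every frontier state is identified (the tree is adequate)."""
    h_delta, h_lambda = {}, {}
    for (q, i), q2 in t_delta.items():
        if q in basis:
            # keep basis-to-basis transitions; redirect frontier targets
            # to the basis state they are identified with
            h_delta[(q, i)] = q2 if q2 in basis else ident[q2]
            h_lambda[(q, i)] = t_lambda[(q, i)]
    return h_delta, h_lambda
```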

Example 4.3

In \(\mathcal {T}'\) (Fig. 2c) the frontier states are identified as follows: \(t_1\mapsto t_2, t_4 \mapsto t_3, t_5 \mapsto t_0\) and \(t_6 \mapsto t_2\). Hypothesis \(\mathcal {H}'\) (Fig. 2d) can be folded back from \(\mathcal {T}'\). The dashed transitions in Fig. 2d represent the folded transitions.

4.2 The \(L^{\#}\) Algorithm

The \(L^{\#}\) algorithm maintains an observation tree \(\mathcal {T}\) and a basis \(B\). Initially, \(\mathcal {T}\) consists of just a root node \(q_0\) and \(B= \{ q_0 \}\). We denote the frontier of \(B\) by \(F\). The \(L^{\#}\) algorithm then repeatedly applies the following four rules.

  • The promotion rule (P) extends B by \(r\in F\) when \(r\) is isolated.

  • The extension rule (Ex) poses OQ \(\textsf{access}(q)i\) for \(q\in B, i \in I\) with \(\delta (q, i)\mathord {\uparrow }\).

  • The separation rule (S) takes a state \(r\in F\) that is not apart from \(q, q'\in B\) and poses OQ \(\textsf{access}(r)\sigma \) with \(\sigma \vdash q\mathrel {\#}q'\) that shows \(r\) is apart from \(q\) or \(q'\).

  • The equivalence rule (Eq) folds \(\mathcal {T}\) into hypothesis \(\mathcal {H}\), checks whether \(\mathcal {H}\) and \(\mathcal {T}\) agree on all sequences in \(\mathcal {T}\) and poses an EQ. If \(\mathcal {H}\) and the SUL are not equivalent, counterexample processing isolates a frontier state.

The pre- and postconditions of the rules are summarized in (the top rows of) Table 1. A detailed account is given in the paper introducing \(L^{\#}\) [24].

Table 1. Extended \(L^{\#}\) rules with parameters, preconditions and postconditions.

Example 4.4

Suppose we learn \(\mathcal {R}_1\) from Fig. 1. \(L^{\#}\) applies the extension rule twice, resulting in \(\mathcal {T}\) as in Fig. 2a. States \(t_1\) and \(t_2\) are identified with \(t_0\) because there is only one basis state. Next, \(L^{\#}\) applies the equivalence rule using hypothesis \(\mathcal {H}\) (Fig. 2b). Counterexample aac distinguishes \(\mathcal {H}\) from \(\mathcal {R}_1\). This sequence is added to \(\mathcal {T}\) and processed further by posing OQ ac in the equivalence rule. Observations ac and aac show that the states accessed with \(\varepsilon \), a and aa are pairwise apart. States \(t_2\) and \(t_3\) are added to the basis using the promotion rule. Next, \(L^{\#}\) poses OQ aaa during the extension rule. To identify all frontier states, \(L^{\#}\) may use \(ac \vdash t_2 \mathrel {\#}t_3\), \(ac \vdash t_0 \mathrel {\#}t_2\) and \(c \vdash t_0 \mathrel {\#}t_3\). Figure 2c shows one possible observation tree \(\mathcal {T}'\) after applying the separation rule multiple times. Next, the equivalence rule constructs hypothesis \(\mathcal {H}'\) (Fig. 2d) from \(\mathcal {T}'\) and \(L^{\#}\) terminates because \(\mathcal {H}'\) and \(\mathcal {R}_1\) are equivalent.

4.3 Rebuilding in \(L^{\#}\)

In this subsection, we combine rebuilding from [6, 7] with \(L^{\#}\) and implement this using two rules: rebuilding and prioritized promotion, see also Table 1. Both rules depend on a reference model \(\mathcal {R}\), which is a complete Mealy machine, with a possibly different alphabet than the SUL \(\mathcal {S}\). More precisely, these rules depend on a prefix-closed and minimal state cover \(P^{\mathcal {R}}\) and a separating family \(\{W_q\}^{\mathcal {R}}\) computed on \(\mathcal {R}|_{I^{\mathcal {S}}}\) for maximal overlap with \(\mathcal {S}\). The separating family can be computed with partition refinement [21]. We fix \(\textsf{sep}(p,p')\) with \(p, p' \in Q^{\mathcal {R}}\) to be a unique sequence from \(W_{p} \cap W_{p'}\) such that \(\textsf{sep}(p,p') \vdash p \mathrel {\#}p'\). Below, we use \(q\) for states in \(B\), \(r\) for states in \(F\) and \(p\) for states in \(Q^{\mathcal {R}}\). In App. A of [14], we depict the scenarios in the observation tree and reference model required for the new rules to be applicable.
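As a sketch of the partition-refinement step, the following Moore-style refinement computes the language-equivalence classes of a complete Mealy machine, from which state identifiers can then be extracted (illustrative code; extracting the actual separating sequences per class pair is omitted):

```python
def moore_partition(states, inputs, delta, lam):
    """Moore-style partition refinement on a complete Mealy machine:
    start from output-row classes, refine by successor classes until the
    number of classes is stable. Returns the language-equivalence classes."""
    # initial partition: states with the same one-step output row
    cls = {q: tuple(lam[(q, i)] for i in inputs) for q in states}
    while True:
        # refine by (own class, classes of all successors)
        sig = {q: (cls[q], tuple(cls[delta[(q, i)]] for i in inputs))
               for q in states}
        if len(set(sig.values())) == len(set(cls.values())):
            break  # no further split: partition is stable
        cls = sig
    blocks = {}
    for q in states:
        blocks.setdefault(cls[q], []).append(q)
    return sorted(blocks.values())
```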

Rule (R): Rebuilding. Let \(q\in B\), \(i \in I\) and suppose \(\delta ^{\mathcal {T}}(q,i) \notin B\). The aim of the rebuilding rule is to show apartness between \(\delta ^{\mathcal {T}}(q,i)\) and a basis state \(q'\), using the state cover and separating family from \(\mathcal {R}\). The rebuilding rule is applicable when \(\textsf{access}^{\mathcal {T}}(q)\) and \(\textsf{access}^{\mathcal {T}}(q)i\) are in \(P^{\mathcal {R}}\). If \(\textsf{access}^{\mathcal {T}}(q') \in P^{\mathcal {R}}\), we take \(\sigma = \textsf{sep}\boldsymbol{(}\delta ^{\mathcal {R}}(\textsf{access}^{\mathcal {T}}(q)i),\delta ^{\mathcal {R}}(\textsf{access}^{\mathcal {T}}(q'))\boldsymbol{)}\) and pose OQs \(\textsf{access}^{\mathcal {T}}(q)i\sigma \) and \(\textsf{access}^{\mathcal {T}}(q')\sigma \).

Lemma 4.5

Suppose \(\textsf{access}^{\mathcal {T}}(q') \in P^{\mathcal {R}}\) for all \(q'\in B\). Consider \(q\in B\), \(i \in I\) such that \(\delta ^{\mathcal {T}}(q,i) \notin B\) and \(\textsf{access}^{\mathcal {T}}(q)i \in P^{\mathcal {R}}\). If for all \(q' \in B\) it holds that \(\textsf{sep}\boldsymbol{(}\delta ^{\mathcal {R}}(\textsf{access}^{\mathcal {T}}(q)i),\delta ^{\mathcal {R}}(\textsf{access}^{\mathcal {T}}(q'))\boldsymbol{)} \vdash \delta ^{\mathcal {S}}(\textsf{access}^{\mathcal {T}}(q)i) \mathrel {\#}\delta ^{\mathcal {S}}(\textsf{access}^{\mathcal {T}}(q'))\), then after applying the rebuilding rule for \(q\), i and all \(q'\in B\) with \(\lnot (q'\mathrel {\#}\delta ^{\mathcal {T}}(q,i))\), state \(\delta ^{\mathcal {T}}(q,i)\) is isolated.

If a state is isolated, it can be added to the basis using the promotion rule.

Rule (PP): Prioritized Promotion. Like (regular) promotion, prioritized promotion extends the basis. However, prioritized promotion only applies to states r with \(\textsf{access}^{\mathcal {T}}(r) \in P^{\mathcal {R}}\). This enforces that the access sequences for basis states are in \(P^{\mathcal {R}}\) as often as possible, enabling the use of the rebuilding rule.

Example 4.6

Consider reference \(\mathcal {R}_1\) and SUL \(\mathcal {S}\) from Fig. 1. We learn the orange states similarly as described in Sect. 2: We apply the rebuilding rule with \(\textsf{access}^{\mathcal {T}}(q) = \varepsilon , \textsf{access}^{\mathcal {T}}(q') = \varepsilon , i = c\) which results in OQs cac and ac. Next, we promote \(\delta ^{\mathcal {T}}(c)\) with the prioritized promotion rule. We apply the rebuilding rule with \(\textsf{access}^{\mathcal {T}}(q) = c, \textsf{access}^{\mathcal {T}}(q') = c\) and \(i = a\) which results in OQs cac (already present in \(\mathcal {T}\)) and cc. Lastly, we promote \(\delta ^{\mathcal {T}}(ca)\) with prioritized promotion.
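The queries posed during rebuilding can be sketched as follows (a simplified illustration: it enumerates the OQ pairs for all cover sequences, omitting the bookkeeping of which tree states are already in the basis; `sep` is a caller-supplied choice of separating sequences between reference states):

```python
def rebuild_queries(cover, delta, q0, sep):
    """Sketch of the OQs of rule (R): for cover sequences u (= access(q)i)
    and v (= access(q')), pose u + sigma and v + sigma with
    sigma = sep(p_u, p_v) for the reference states p_u, p_v they reach.
    cover: iterable of input tuples; delta: (state, input) -> state of the
    (complete) reference model; q0: its initial state."""
    def reach(w):
        q = q0
        for i in w:
            q = delta[(q, i)]
        return q

    queries = []
    for u in cover:
        for v in cover:
            pu, pv = reach(u), reach(v)
            if pu != pv:                  # only distinct reference states
                sigma = sep(pu, pv)
                queries += [u + sigma, v + sigma]
    return queries
```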

The overlap of \(\mathcal {S}\) with \(P^\mathcal {R}\) and \(\{W_q\}^\mathcal {R}\) determines how many states of \(\mathcal {S}\) can be discovered via rebuilding. The following statement is a consequence of Lemma 4.5 above.

Theorem 4.7

If \(q_0^{\mathcal {R}}\) matches \(q_0^{\mathcal {S}}\) and \(\mathcal {T}\) only contains a root \(q_0^{\mathcal {T}}\), then after applying only the rebuilding and prioritized promotion rules until they are no longer applicable, the basis consists of n states where n is the number of equivalence classes (w.r.t. language equivalence) in the reachable part of \(\mathcal {S}|_{I^{\mathcal {R}}}\).

Corollary 4.8

Suppose we learn SUL \(\mathcal {S}\) with reference \(\mathcal {S}\). Using the rebuilding and prioritized promotion rules, we can add all reachable states in \(\mathcal {S}\) to the basis.

5 \(L^{\#}\) Using State Matching

In this section, we describe another way to reuse information from references, called state matching, which is independent of the state cover. First, we present a version of state matching using the matching relation (\(\simeq \)) from Def. 3.2, and then we weaken this notion to approximate state matching.

Fig. 3. Observation trees generated while learning \(\mathcal {S}\) with \(\mathcal {R}_2\).

5.1 State Matching

With state matching, the learner maintains the matching relation between basis states and reference model states during learning. In the implementation, before applying a matching rule, the matching is updated based on the OQs asked since the previous match computation. We present two key rules here and an optimisation in the next subsection.

Rule (MS): Match separation. This rule aims to show apartness between a frontier state and a basis state using separating sequences from the reference separating family. Let \(q\), \(q'\in B\), \(r\in F\) with \(\delta ^{\mathcal {T}}(q,i) = r\) for some \(i \in I\), and \(p,p'\in Q^\mathcal {R}\). Suppose that \(q \simeq p\), \(\delta ^{\mathcal {R}}(p,i) = p'\), \(\lnot (r\mathrel {\#}q')\), and \(p'\) does not match any basis state. In particular, there exists some separating sequence \(\sigma \) with \(\sigma \vdash p'\mathrel {\#}q'\). The match separation rule poses OQ \(\textsf{access}(q)i\sigma \) to show either \(r\mathrel {\#}q'\) or \(r\mathrel {\#}p'\).

Example 5.1

Suppose we learn \(\mathcal {S}\) using \(\mathcal {R}_2\) from Fig. 1. After applying the extension rule three times, we get \(\mathcal {T}_0\) (Fig. 3a). State \(t_0\) matches \(p_3\) as their outputs coincide on sequences from alphabet \(I^{\mathcal {S}} \cap I^{\mathcal {R}_2} = \{a,b\}\). State \(p_3\) transitions to the unmatched state \(p_0\) with input a. The match separation rule conjectures \(t_1\) may match \(p_0\) which implies \(t_1 \mathrel {\#}t_0\). We use OQ \(\textsf{access}(t_1)a\) to test this conjecture and indeed find that \(t_1\) can be added to the basis using promotion.

Lemma 5.2

We fix \(p\in Q^{\mathcal {R}}\), \(q\in B\), \(i \in I\) and \(\delta ^{\mathcal {T}}(q,i)=r\in F\). Suppose \(f(r) \simeq \delta ^{\mathcal {R}}(p,i)\). If \(\delta ^{\mathcal {R}}(p,i) \mathrel {\#}q'\) for all \(q'\in B\), then after applying the match separation rule with \(q, p, i\) for all \(q'\in B\) with \(\lnot (q'\mathrel {\#}r)\), state \(r\) is isolated.

Rule (MR): Match Refinement. Let \(q\in B\) and \(p, p'\in Q^{\mathcal {R}}\). Suppose \(q\) matches both \(p\) and \(p'\) and let \(\sigma =\textsf{sep}(p,p')\). The match refinement rule poses OQ \(\textsf{access}(q)\sigma \) resulting in \(q\) no longer being matched to \(p\) or \(p'\).

Example 5.3

Suppose we continue learning \(\mathcal {S}\) using \(\mathcal {R}_2\) from observation tree \(\mathcal {T}_1\) (Fig. 3b). State \(t_1\) matches both \(p_0\) and \(p_1\). After posing OQ \(\textsf{access}(t_1)bb\) where \(bb \vdash p_0 \mathrel {\#}p_1\), \(t_1\) no longer matches \(p_1\).

If the initial state of SUL \(\mathcal {S}\) is language equivalent to some state in the reference model, then we can discover all reachable states in \(\mathcal {S}\) via state matching and \(L^{\#}\) rules. The statement uses Lemma 5.2 above.

Theorem 5.4

Suppose we have reference \(\mathcal {R}\) and SUL \(\mathcal {S}\) equivalent to \(\mathcal {R}\) but with a possibly different initial state. Using only the match refinement, match separation, promotion and extension rules, we can add n states to the basis where n is the number of equivalence classes (w.r.t. language equivalence) in the reachable part of \(\mathcal {S}\).

5.2 Optimised Separation Using State Matching

In this subsection, we add an optimisation rule, prioritized separation, that uses the matching to guide the identification of frontier states. First, we highlight the differences between prioritized separation and the previous separation rules. Both match separation and prioritized separation require that \(r \simeq p\) for \(r\in F\) and \(p\in Q^{\mathcal {R}}\). The aim of match separation is to isolate \(r\), and it requires that \(p\) does not match any basis state. Instead, the aim of prioritized separation is to guide the identification of \(r\) using the state identifier of a \(p\) matched with a basis state. The prioritized separation rule also differs from the separation rule (Sect. 4.2), which randomly selects \(q, q'\in B\) to separate \(r\) from \(q\) or \(q'\).

Rule (PS): Prioritized Separation. The prioritized separation rule uses the matching to find a separating sequence from the reference model that is expected to separate a frontier state from a basis state. Let \(q', q''\in B\) and \(r\in F\). Suppose \(r\) is not apart from \(q'\) and \(q''\), and \(\sigma \vdash q'\mathrel {\#}q''\). If \(\sigma \) is in \(\{W_p\}^{\mathcal {R}}\) for a reference model state \(p\) that matches \(r\), the prioritized separation rule poses OQ \(\textsf{access}(r)\sigma \), resulting in \(r\) being apart from \(q'\) or \(q''\).

Example 5.5

Suppose we learn \(\mathcal {S}\) using \(\mathcal {R}_1\) from Fig. 1. Assume we have discovered all states in \(\mathcal {S}\) and want to identify \(\delta ^{\mathcal {T}}(ca,c) \in F\), which is currently not apart from any basis state. The prioritized separation rule can only be applied with basis states \(q', q''\in B\) such that \(c \vdash q'\mathrel {\#}q''\), as c is the only sequence in the state identifier of \(r_2\) which is the state that matches \(\delta ^{\mathcal {T}}(ca,c)\). From the sequences \(\{bb, ac, c\}\) possibly used by \(L^{\#}\), only c immediately identifies \(\delta ^{\mathcal {T}}(ca,c)\).

5.3 Approximate State Matching

In this subsection, we introduce an approximate version of matching, by quantifying matching via a matching degree. Let \(\mathcal {T}\) be a tree and \(\mathcal {R}\) be a (partial) Mealy machine. Let \(I = I^{\mathcal {T}} \cap I^{\mathcal {R}}\). We define \(\textsf{WI}(q) = \{ (w,i) \in I^* \times I \mid \delta ^{\mathcal {T}}(q,wi)\mathord {\downarrow }\} \) as prefix-suffix pairs that are defined from \(q\in Q^{\mathcal {T}}\) onwards. Then, we define the matching degree \(\textsf{mdeg}: Q^{\mathcal {T}} \times Q^{\mathcal {R}} \rightarrow \mathbb {R}\) as

$$\begin{aligned}\textsf{mdeg}(q,p) = \frac{ \left| \{ (w,i) \in \textsf{WI}(q) \mid \lambda ^{\mathcal {T}}\bigg (\delta ^{\mathcal {T}}(q,w),i\bigg ) = \lambda ^{\mathcal {R}} \bigg (\delta ^{\mathcal {R}}(p,w),i\bigg ) \} \right| }{ \left| \textsf{WI}(q) \right| }. \end{aligned}$$

Example 5.6

Consider \(t_1\) from \(\mathcal {T}_2\) (Fig. 3c) and \(p_0\), \(p_1\) from \(\mathcal {R}_2\) (Fig. 1). We derive \(\textsf{WI}(t_1) = \{ (\varepsilon ,a), (\varepsilon ,b), (b,a), (b,b), (bb,b) \}\) from \(\mathcal {T}_2\), where \(I = I^{\mathcal {T}_2} \cap I^{\mathcal {R}_2} = \{ a,b \}\). On these pairs, all the suffix outputs for \(p_0\) and \(t_1\) agree, so \(\textsf{mdeg}(t_1,p_0) = \nicefrac {5}{5} = 1\). The matching degree between \(t_1\) and \(p_1\) is only \(\nicefrac {3}{5}\) because \(\lambda ^{\mathcal {R}_2}(p_1,bbb) = 120 \ne 112 = \lambda ^{\mathcal {T}}(t_1,bbb)\), which impacts the pairs (b, b) and (bb, b).

A state \(q\) in an observation tree \(\mathcal {T}\) approximately matches a state \(p\in Q^{\mathcal {R}}\) if there does not exist a \(p'\in Q^{\mathcal {R}}\) such that \(\textsf{mdeg}(q,p') > \textsf{mdeg}(q,p)\).

Lemma 5.7

For any \(q\in Q^{\mathcal {T}}, p\in Q^{\mathcal {R}}\): \(\textsf{mdeg}(q,p) = 1\) implies that \(q\) approximately matches \(p\).

We define the rules approximate match separation (AMS), approximate match refinement (AMR) and approximate prioritized separation (APS), the approximate matching variants of match separation, match refinement and prioritized separation, respectively. These rules have weaker pre- and postconditions, see Table 3 in App. A of [14].

6 Adaptive \(L^{\#}\)

The rebuilding, state matching and \(L^{\#}\) rules described in Table 1 are ordered and combined into one adaptive learning algorithm called adaptive \(L^{\#}\) (written \(AL^{\#}\)). A non-ordered listing of the rules can be found in Algorithm 1 in App. A of [14]. We use the abbreviations for the rules defined in previous sections.

Definition 6.1

The \(AL^{\#}\) algorithm repeatedly applies the rules from Table 1 (see Algorithm 1), with the following ordering: Ex, APS, (S if APS was not applicable), P; if the observation tree is adequate, we try AMR, AMS, Eq. The algorithm starts by applying R and PP until they are no longer applicable; these rules are not applied again afterwards.
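The ordering can be read as a priority dispatch loop. The following Python sketch shows only the intended control flow, with hypothetical toy rules; the real \(AL^{\#}\) rules operate on the observation tree and interact with the teacher.

```python
def run_rules(rules, state):
    """rules: priority-ordered list of (name, applicable, apply) triples.
    Fire the first applicable rule, restart from the top, and stop when
    no rule applies (in AL#: when the equivalence query succeeds)."""
    trace = []
    while True:
        for name, applicable, apply in rules:
            if applicable(state):
                apply(state)
                trace.append(name)
                break
        else:                  # no rule fired: we are done
            return trace

# Toy rules on a counter: "big" has priority over "small".
state = {'n': 4}
rules = [
    ('big',   lambda s: s['n'] >= 3, lambda s: s.update(n=s['n'] - 3)),
    ('small', lambda s: s['n'] >= 1, lambda s: s.update(n=s['n'] - 1)),
]
print(run_rules(rules, state))  # ['big', 'small'], leaving n == 0
```

Note that the loop restarts from the highest-priority rule after every application, which is what makes the stated ordering matter.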

Similar to \(L^{\#}\), the correctness of \(AL^{\#}\) amounts to showing termination, because the algorithm can only terminate when the teacher indicates that the SUL and hypothesis are equivalent. We prove termination of \(AL^{\#}\) by showing that each rule application decreases a ranking function. The necessary ingredients for the ranking function are derived from the postconditions in Table 1.

Theorem 6.2

\(AL^{\#}\) learns the correct Mealy machine within \(\mathcal {O}(kn^2 + kno + no^2 + n \log m)\) output queries and at most \(n-1\) equivalence queries where n is the number of equivalence classes for \(\mathcal {S}\), o is the number of equivalence classes for \(\mathcal {R}\), k is the number of input symbols and m the length of the longest counterexample.

7 Adaptive Learning with Multiple References

Let \(\mathcal {X}\) be a finite set of complete reference models with possibly different alphabets. Assume each reference model \(\mathcal {R}\in \mathcal {X}\) has a state cover \(P^{\mathcal {R}}\) and separating family \(\{W_q\}^{\mathcal {R}}\). We adapt the arguments for the \(AL^{\#}\) algorithm to represent the state cover and separating family for the set of reference models.

State Cover. We initialize the \(AL^{\#}\) algorithm with the union of the state covers of the reference models, \(P^{\mathcal {X}} = \cup _{\mathcal {R}\in \mathcal {X}} P^{\mathcal {R}}\). To reduce the size of \(P^{\mathcal {X}}\), the state cover for each reference model is computed using a fixed ordering on inputs, so that similar models yield overlapping access sequences.
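A state cover with a fixed input ordering can be computed by breadth-first search. The sketch below uses the same hypothetical dict encoding as before and is not the paper's code; it illustrates why a shared input order keeps the union small: two similar references reuse the same access sequences.

```python
from collections import deque

def state_cover(trans, initial, inputs):
    """Access sequences for all reachable states, BFS over a fixed input order."""
    cover = {initial: ()}
    queue = deque([initial])
    while queue:
        q = queue.popleft()
        for i in inputs:                 # fixed ordering on inputs
            if (q, i) in trans:
                t = trans[(q, i)][1]
                if t not in cover:
                    cover[t] = cover[q] + (i,)
                    queue.append(t)
    return cover

# Two similar references: three of the four access sequences coincide.
r1 = {('s0', 'a'): (0, 's1'), ('s0', 'b'): (0, 's2'), ('s1', 'a'): (1, 's0')}
r2 = {('s0', 'a'): (0, 's1'), ('s0', 'b'): (1, 's2'), ('s2', 'a'): (0, 's3')}
p1 = set(state_cover(r1, 's0', ['a', 'b']).values())  # {(), (a,), (b,)}
p2 = set(state_cover(r2, 's0', ['a', 'b']).values())  # {(), (a,), (b,), (b,a)}
print(len(p1 | p2))  # 4 sequences in the union, rather than 3 + 4
```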

Separating Family. We combine the separating families for multiple reference models using a stronger notion of apartness, called total apartness, which also separates states based on whether inputs are defined. When changing the alphabet of a reference model to the alphabet of the SUL, as is done when computing the separating family, the reference model may become partial. If states from different reference models behave the same on their common alphabet but their alphabets contain different inputs from the SUL, we still want to distinguish the reference models based on which inputs they enable.

Definition 7.1

Let \(\mathcal {M}_1, \mathcal {M}_2\) be partial Mealy machines and \(p \in Q^{\mathcal {M}_1}, q \in Q^{\mathcal {M}_2}\). We say p and q are total apart, written \(p \mathrel {\#}_{\uparrow }q\), if \(p \mathrel {\#}q\) or there exists \(w \in (I^{\mathcal {M}_1} \cap I^{\mathcal {M}_2})^*\) such that either \(\delta ^{\mathcal {M}_1}(p,w){\uparrow }\) or \(\delta ^{\mathcal {M}_2}(q,w){\uparrow }\) but not both.
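Definition 7.1 can be checked by a joint breadth-first search over the common alphabet. The paper gives no algorithm for this, so the following Python sketch is a hypothetical rendering in the same dict encoding as before; for simplicity it derives the common alphabet from the transition keys rather than taking the alphabets as inputs.

```python
from collections import deque

def total_apart(m1, p, m2, q):
    """p #_up q: some word over the common alphabet yields different
    outputs, or is defined in exactly one of the two partial machines."""
    common = {i for _, i in m1} & {i for _, i in m2}
    seen, queue = {(p, q)}, deque([(p, q)])
    while queue:
        s1, s2 = queue.popleft()
        for i in common:
            d1, d2 = (s1, i) in m1, (s2, i) in m2
            if d1 != d2:
                return True              # defined in exactly one machine
            if not d1:
                continue                 # undefined in both: nothing to compare
            o1, t1 = m1[(s1, i)]
            o2, t2 = m2[(s2, i)]
            if o1 != o2:
                return True              # ordinary apartness: outputs differ
            if (t1, t2) not in seen:
                seen.add((t1, t2))
                queue.append((t1, t2))
    return False

m1 = {('p', 'a'): (0, 'p2')}            # a becomes undefined after one step
m2 = {('q', 'a'): (0, 'q')}             # a is always defined
print(total_apart(m1, 'p', m2, 'q'))    # True: apart by definedness alone
print(total_apart(m2, 'q', m2, 'q'))    # False: a state is not apart from itself
```

Note that the two states above produce identical outputs on every common word where both are defined, so ordinary apartness would not separate them.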

We use total apartness to define a total state identifier and a total separating family; the definition is the same as Definition 3.3, with \(\mathrel {\#}\) replaced by \(\mathrel {\#}_{\uparrow }\). We combine the multiple reference models into a single one with an arbitrary initial state, compute the total separating family, and use this to initialize \(AL^{\#}\).

Example 7.2

A total separating family for \(\mathcal {X}= \{ \mathcal {R}_1, \mathcal {R}_2 \}\) and alphabet \(I^{\mathcal {S}}\) is \(W_{p_0}=W_{p_1}=\{c,b,bb\}, W_{p_2}=W_{p_3}=\{c,b\}, W_{r_0}=W_{r_1}=\{c,ac\}, W_{r_2}=\{c\}\).

We add an optimisation to \(AL^{\#}\) that only chooses \(p\) and \(p'\) from the same reference model during rebuilding. Theorem 6.2 can be generalized to this setting where o represents the number of equivalence classes across the reference models.

Table 2. Summed inputs in millions for learning the mutated models with the original models.

8 Experimental Evaluation

In this section, we empirically investigate the performance of our implementation of \(AL^{\#}\). The source code and all benchmarks are available online [15]. We present four experiments to answer the following research questions:

  • R1 What is the performance of adaptive AAL algorithms, when ...

    • Exp 1...learning models from a similar reference model?

    • Exp 2...applied to benchmarks from the literature?

  • R2 Can multiple references help \(AL^{\#}\), when learning ...

    • Exp 3...a model from similar reference models?

    • Exp 4...a protocol implementation from reference implementations?

Setup. We implement \(AL^{\#}\) on top of the \(L^{\#}\) implementation in LearnLib. We invoke conformance testing for the EQs, using the random Wp method from LearnLib with minimal size \({=}3\) and random length \({=}3\). We run all experiments with 30 seeds. We measure the performance of the algorithms by the number of inputs sent to the SUL during both OQs and EQs: fewer is better.

Experiment 1. We evaluate the performance of \(AL^{\#}\) against non-adaptive and adaptive algorithms from the literature, in particular \(L^*\) [2], KV [13], and \(L^{\#}\) [24], as well as \(\partial L^*_M\) [6] and (a Mealy machine adaptation of) IKV [7]. As part of an ablation study, we also compare \(AL^{\#}\) with simpler variations of the algorithm, whose subscripts indicate which rules are added.

We learn six models from the AutomataWiki benchmarks [17] also used in [24]. We limit ourselves to six models because we mutate every model in 14 different ways (and for 30 seeds). The chosen models represent different types of protocols with varying numbers of states. We learn the mutated models using the original models, referred to as \(\mathcal {S}\), as a reference. The mutations may add states, divert transitions, remove inputs, perform multiple mutations, or compose the model with a mutated version of the model. We provide details on the used models and mutations in App. E of [14].

Results. Table 2 shows for an algorithm (rows) and a mutation (columns) the total number of inputs (\(\cdot 10^6\)) necessary to learn all models, summed over all seeds. Highlighted values indicate the best performing algorithm. We provide detailed pairwise comparisons between algorithms in App. E of [14].

Discussion. First, we observe that \(AL^{\#}\) always outperforms the non-adaptive learning algorithms, as expected. By combining state matching and rebuilding, \(AL^{\#}\) mostly outperforms algorithms from the literature, with IKV being competitive on some types of mutations. In \(\textit{mut}_{9}(\mathcal {S})\), where we append \(\mathcal {S}\) to \(\textit{mut}_{13}(\mathcal {S})\), the variants without approximate matching outperform full \(AL^{\#}\): approximate matching incorrectly matches \(\textit{mut}_{13}(\mathcal {S})\) states with states in \(\mathcal {S}\), making it harder to learn the \(\mathcal {S}\) fragment.

Experiment 2. We evaluate \(L^{\#}\), \(\partial L^*_M\), IKV and \(AL^{\#}\) on benchmarks that contain reference models. Adaptive-OpenSSL [18], used in [6], contains models learned from different git development branches of the OpenSSL server side. Adaptive-Philips [20] contains models representing legacy code which evolved over time due to bug fixes and the addition of new inputs.

Fig. 4.
figure 4

Results of Experiments 2 and 3.

Results. Figure 4a shows the mean total number of inputs required for learning a model from the associated reference model, depicting the \(5^{\text {th}}-95^{\text {th}}\) percentile (line) and average (mark) over the seeds.

Discussion. We observe that \(L^{\#}\) and \(\partial L^*_M\) perform worse than \(AL^{\#}\). \(AL^{\#}\) often outperforms IKV by a factor of 2–4, even though these models are relatively small and thus easy to learn.

Experiment 3. We evaluate \(AL^{\#}\) with one or multiple references on the models used in Experiment 1. We either (1) learn \(\mathcal {S}\) using several mutations of \(\mathcal {S}\), or (2) learn a mutation that represents a combination of \(\mathcal {S}\) and \(\textit{mut}_{13}(\mathcal {S})\).

Results. Figures 4b and 4c show for every type of SUL (rows) and every set of references (columns) the total number of inputs (\(\cdot 10^6\)) necessary to learn all models, summed over all seeds. Highlighted values indicate the best performing set of references. Column \(\{\mathcal {S}\}\) in Fig. 4c corresponds to the values in row \(AL^{\#}\) of Table 2; they are repeated in Fig. 4c for clarity.

Fig. 5.
figure 5

Averaged inputs for learning \(\mathcal {S}\) with multiple references.

Discussion. We observe that using multiple references outperforms using one reference, as is expected. We hypothesize that learning with reference \(\textit{mut}_{13}(\mathcal {S})\) instead of \(\mathcal {S}\) often leads to an increase in total inputs because \(\textit{mut}_{13}(\mathcal {S})\) is less complex due to the random transitions. Therefore, discovering states belonging to the \(\mathcal {S}\) fragment in \(\textit{mut}_{8}(\mathcal {S})\), \(\textit{mut}_{9}(\mathcal {S})\) and \(\textit{mut}_{14}(\mathcal {S})\) becomes more difficult.

Experiment 4. We evaluate the performance of \(AL^{\#}\) with one or multiple references on learning DTLS and TCP models from AutomataWiki. We consider seven DTLS implementations selected to have the same key exchange algorithm and certification requirement. We consider three TCP client implementations.

Results. Figure 5 shows the required inputs for learning \(\mathcal {S}\) (x-axis) using only the reference model indicated by the colored data point, averaged over the seeds. For each DTLS model, we also include learning \(\mathcal {S}\) with \(\mathcal {S}\) itself as a reference model. The \(*\) mark indicates using all models except \(\mathcal {S}\) as references; the \(\times \) mark indicates using no references, i.e., non-adaptive \(L^{\#}\).

Discussion. We observe that using all references except \(\mathcal {S}\) usually performs as well as the best performing single reference model distinct from \(\mathcal {S}\). In scand-lat, using a set of references outperforms all single reference models, almost matching the performance of learning \(\mathcal {S}\) with \(\mathcal {S}\) itself as a reference.

9 Conclusion

We introduced the adaptive \(L^{\#}\) algorithm (\(AL^{\#}\)), a new algorithm for adaptive active automata learning that flexibly uses domain knowledge in the form of (preferably similar) reference models, thereby aiming to reduce the sample complexity of learning new models. Experiments show that the algorithm can lead to significant improvements over the state-of-the-art (Sect. 8).

Future Work. Approximate state matching is sometimes too eager and may mislead the learner, as happens for \(\textit{mut}_9\) in Experiment 1 (Sect. 8). This may be addressed by only applying matching rules when the matching degree is above some threshold. It is currently unclear how to determine an appropriate threshold.

Further, adaptive methods typically perform well when the reference model and SUL are similar [10]. We would like to dynamically determine which (parts of) reference models are similar, and incorporate this in the rebuilding rule.

Adaptive AAL allows the re-use of information in the form of a Mealy machine. Other sources of information that can be re-used in AAL are, for instance, system logs, realised by combining active and passive learning [1, 26]. An interesting direction of research is the development of a more general methodology that allows the re-use of various forms of previous knowledge.