Enhancing active model learning with equivalence checking using simulation relations

We present a new active model-learning approach to generating abstractions of a system from its execution traces. Given a system and a set of observables to collect execution traces, the abstraction produced by the algorithm is guaranteed to admit all system traces over the set of observables. To achieve this, the approach uses a pluggable model-learning component that can generate a model from a given set of traces. Conditions that encode a certain completeness hypothesis, formulated based on simulation relations, are then extracted from the abstraction under construction and used to evaluate its degree of completeness. The extracted conditions are sufficient to prove model completeness but not necessary. If all conditions are true, the algorithm terminates, returning a system overapproximation. A condition falsification may not necessarily correspond to missing system behaviour in the abstraction. This is resolved by applying model checking to determine whether it corresponds to any concrete system trace. If so, the new concrete trace is used to iteratively learn new abstractions, until all extracted completeness conditions are true. To evaluate the approach, we reverse-engineer a set of publicly available Simulink Stateflow models from their C implementations. Our algorithm generates an equivalent model for 98% of the Stateflow models.


Introduction
Modern hardware and software system implementations often have complex behaviour, and it is difficult to specify integrated system behaviour, particularly emergent behaviour, ahead of time. Execution traces provide an exact representation of the system behaviours that are exercised when an implementation runs. They can therefore be leveraged to reverse-engineer system abstractions, such as the model in Fig. 1, that are easy to understand and can be used in place of the actual implementation for simulation and debugging.
Model-learning algorithms are classified into passive and active algorithms. In passive model learning [8,27,36,40], the behaviours admitted by the generated models are limited to those manifest in the given traces. Whether the generated models capture all system behaviour therefore depends on devising a software load that exercises all relevant system behaviours. This can be difficult to achieve in practice, especially when a system comprises multiple components and it is not obvious how the components will behave collectively. Random input sampling is a pragmatic choice in this scenario, but it does not guarantee that the generated models admit all system behaviour.
By contrast, active learning algorithms can, in principle, generate exact models [4,5,11,62]. They iteratively refine a hypothesis model by extracting information from the system, or from an oracle that has sufficient knowledge of the system, using the hypothesis model as a guide. The most popular form of active learning is query-based learning [4], where the learning framework poses membership and equivalence queries to an oracle and uses the responses to guide model refinement. However, when these algorithms are used in practice, especially to learn symbolic abstractions such as the models in Figs. 1 and 2, they suffer from high query complexity [23,28,31]. Consequently, many active model-learning implementations are limited to learning partial models for large systems.
In this article we present a new active learning approach to derive abstractions of a system implementation from its execution traces. On termination, the approach is guaranteed to generate an overapproximation of the system, i.e., an abstraction that admits all system execution traces over the chosen observables. As illustrated in Fig. 3, the approach is a grey-box algorithm. It combines a black-box analysis, in the form of model learning from traces, with a white-box analysis that is used to evaluate the degree of completeness of a candidate system model returned by model learning.
The model-learning component can be any algorithm that generates a model that accepts a given set of system execution traces. The novelty of the approach is the procedure used to evaluate the degree of completeness of a learned candidate abstraction: the structure of the candidate abstraction is used to extract a set of conditions that collectively encode a completeness hypothesis. The hypothesis is formulated such that its satisfaction is sufficient to guarantee that a simulation relation can be constructed between the system and the given abstraction. Further, the existence of a simulation relation is sufficient to guarantee that the given abstraction is overapproximating, i.e., it admits (at least) all system execution traces.

Fig. 1 Abstraction modelling operation mode switches for a Home Climate-Control Cooler System generated by our algorithm
To verify the hypothesis, the truth value of each extracted condition is checked using Boolean satisfiability (SAT) solving. The procedure returns the fraction of extracted conditions that hold as a quantitative measure of the degree of completeness of the given model. If all conditions are true, the algorithm terminates, returning the learned system overapproximation. If a condition is falsified, the SAT procedure returns a counterexample.
The satisfaction of the hypothesis is sufficient, but not necessary, to guarantee that a given abstraction is overapproximating. Counterexamples to the hypothesis may therefore be spurious, i.e., a condition falsification may not actually correspond to missing system behaviour in the abstraction. This is resolved by using model checking [16] to determine whether the counterexample for a condition check is spurious. If it is, the respective extracted completeness condition is strengthened to guide the SAT solver towards a non-spurious counterexample, if any exists.
Non-spurious counterexamples are used to construct a set of new traces that exemplify system behaviours identified to be missing from the model.New traces are used to augment the input trace set for model learning, and iteratively generate new abstractions until all conditions are true.
Unlike query-based learning, our procedure to evaluate the degree of completeness of a given abstraction operates at the level of the abstraction and not of concrete system traces:
• The scope of each extracted condition is the set of incoming and outgoing transitions of a state, rather than a finite path in the system or its model.
• The completeness hypothesis can be represented symbolically, incorporating characteristic functions for sets of observations in a system trace, which eliminates the need for explicit enumeration of concrete transitions.
This enables the procedure to be applied to symbolic abstractions over large alphabets. Further, the procedure is agnostic to the algorithm used to learn the abstraction. This enables the approach to be easily integrated with model-learning algorithms that generate symbolic abstractions from traces, to iteratively learn expressive system overapproximations with provable completeness guarantees.

Fig. 2 Abstraction modelling gear-shift logic for an Automatic Transmission Gear System generated by our algorithm
2 Active learning of abstract system models

Formal model
The system for which we wish to generate an abstraction is represented as a Labelled Transition System (LTS).
Definition 1 (Labelled Transition System) An LTS $M$ is a quadruple $(S, \Omega, \Delta, s_0)$ where $S$ is a set of states, $\Omega$ is a set of labels, $\Delta \subseteq S \times \Omega \times S$ is the transition relation and $s_0 \in S$ is the initial state.
The set of labels $\Omega$ is a set of system observations that can be used to collect execution traces. An observation $o \in \Omega$ could be any event that depends on a transition from state $s$ to state $s'$. A path in $M$ is a finite sequence $\pi = s_0, o_0, s_1, \ldots, o_{n-1}, s_n$ of alternating states and observations such that $(s_i, o_i, s_{i+1}) \in \Delta$ for $0 \le i < n$. The trace of $\pi$, denoted $\sigma(\pi)$, is the corresponding sequence of observations $o_0, \ldots, o_{n-1}$ along $\pi$. The set of all traces of paths in $M$ is called the language of $M$, denoted $L(M)$, defined over the alphabet of observations $\Omega$. A trace $\sigma$ is accepted by $M$ if $\sigma \in L(M)$, and is termed an execution trace or positive trace. A trace $\sigma \notin L(M)$ is termed a negative trace.
A system abstraction is represented as an LTS $\hat M = (\hat S, \hat\Omega, \hat\Delta, \hat s_0)$. The language of the learned abstraction $L(\hat M)$ is defined over the alphabet of system observations, i.e., $\hat\Omega = \Omega$. The abstraction accepts a trace $\sigma = o_0, \ldots, o_{n-1}$ if $\sigma \in L(\hat M)$, i.e., if there exists a sequence $\hat s_0, \ldots, \hat s_n$ of states in $\hat S$ such that $(\hat s_i, o_i, \hat s_{i+1}) \in \hat\Delta$ for $0 \le i < n$. We will show that our active learning algorithm returns an abstraction $\hat M$ that admits all execution traces of the system $M$, i.e., $L(M) \subseteq L(\hat M)$.
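For illustration only, the acceptance condition above can be sketched in Python for finite, explicit-state systems. The class name `LTS` and its fields are hypothetical and not part of the authors' tooling. Because an abstraction may be nondeterministic, the sketch tracks the set of all states reachable on each prefix rather than a single current state.

```python
from dataclasses import dataclass


@dataclass
class LTS:
    """Hypothetical explicit-state LTS (S, Omega, Delta, s0) with hashable states/labels."""
    states: set
    labels: set
    delta: set  # set of (state, observation, state) triples
    init: object

    def successors(self, s, o):
        """States reachable from s by a single transition labelled o."""
        return {t for (u, a, t) in self.delta if u == s and a == o}

    def accepts(self, trace):
        """True iff some path from init is labelled with `trace`, i.e. trace is in L(M)."""
        current = {self.init}
        for o in trace:
            # Follow every matching transition from every possible current state.
            current = {t for s in current for t in self.successors(s, o)}
            if not current:
                return False
        return True
```

For example, an LTS with transitions `(0, "a", 1)` and `(1, "b", 0)` accepts `["a", "b", "a"]` and the empty trace, but not `["b"]`.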

Overview of the algorithm
Given a system $M$, our goal is to learn a system abstraction $\hat M$ such that $L(M) \subseteq L(\hat M)$. An overview of our approach is provided in Fig. 3. It consists of the following steps:

1. Generate candidate abstraction from available traces. The algorithm learns a candidate abstraction $\hat M$ from an initial set of system traces $T$ using a pluggable model-learning algorithm. This is discussed in Sect. 2.3.
2. Extract completeness conditions. To evaluate the degree of completeness of the candidate abstraction returned by model learning, the structure of $\hat M$ is used to extract a set of conditions $C$ that collectively encode a completeness hypothesis. If all conditions are true, then $L(M) \subseteq L(\hat M)$. The formulation of the completeness hypothesis and a formal proof of this claim are provided in Sect. 2.4.
3. Check truth value of extracted conditions. The algorithm uses a SAT procedure to check the truth value of each condition, and thereby checks the hypothesis. If all conditions are true, the algorithm returns $\hat M$ as the learned system overapproximation. The extracted conditions are sufficient to prove model completeness, i.e., $L(M) \subseteq L(\hat M)$, but not necessary. If a condition is falsified, the procedure returns a counterexample. However, a condition falsification does not necessarily indicate missing system behaviour in $\hat M$. This is discussed in Sect. 2.5.
4. Counterexample analysis. To check whether a condition falsification actually indicates missing system behaviour in $\hat M$, i.e., $L(M) \setminus L(\hat M) \neq \emptyset$, the algorithm uses model checking to determine whether the counterexample returned by the SAT procedure corresponds to a concrete system trace. If the counterexample is found to be spurious, the condition is strengthened to guide the SAT procedure towards non-spurious counterexamples, if any. The algorithm then repeats step 3. Details are provided in Sect. 2.6.
5. Generate new abstraction. A set of new traces $T_{CE}$ that exemplify missing system behaviour in $\hat M$ is constructed from non-spurious counterexamples. These are used as additional trace inputs to the model-learning component in step 1, which learns a new abstraction admitting $T \cup T_{CE}$. The construction of new traces exemplifying missing system behaviour is discussed in Sect. 2.6.
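Steps 1 to 5 can be summarised as a loop. The Python sketch below shows the control flow only. Every component (`learn`, `extract_conditions`, `check`, `is_spurious`, `strengthen`, `new_traces_for`) is a hypothetical callable standing in for the pluggable learner, the SAT-based condition check and the model checker, and `max_iter` is an assumed safety bound that the text does not mention.

```python
def active_learn(initial_traces, learn, extract_conditions, check,
                 is_spurious, strengthen, new_traces_for, max_iter=100):
    """Learn / check / analyse / relearn loop mirroring steps 1 to 5.

    Hypothetical callables supplied by the caller:
      learn(traces) -> model                         step 1: pluggable learner
      extract_conditions(model) -> iterable of conds step 2
      check(cond) -> counterexample or None          step 3: e.g. a SAT query
      is_spurious(cex) -> bool                       step 4: e.g. model checking
      strengthen(cond, cex) -> cond                  step 4: exclude a spurious cex
      new_traces_for(model, cex, traces) -> set      step 5
    """
    traces = set(initial_traces)
    for _ in range(max_iter):
        model = learn(traces)                      # step 1
        missing = set()
        for cond in extract_conditions(model):     # step 2
            cex = check(cond)                      # step 3
            while cex is not None and is_spurious(cex):
                cond = strengthen(cond, cex)       # step 4
                cex = check(cond)
            if cex is not None:
                missing |= new_traces_for(model, cex, traces)  # step 5
        if not missing:
            return model   # every extracted condition holds
        traces |= missing  # T grows by the new counterexample traces
    return None            # assumed iteration budget exhausted
```

The inner `while` loop mirrors step 4; it terminates only if repeated strengthening eventually makes the condition unsatisfiable or yields a non-spurious counterexample.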
In each iteration $i$ of the algorithm, $T_{CE}^{i} \cap L(\hat M_{i-1}) = \emptyset$, where $T_{CE}^{i}$ is the set of new traces constructed by the algorithm in iteration $i$ after evaluating the degree of completeness of the abstraction $\hat M_{i-1}$ generated in the previous iteration. The new abstraction $\hat M_i$ is generated using the set of all new traces $T_{CE}^{1} \cup T_{CE}^{2} \cup \ldots \cup T_{CE}^{i}$ and the initial trace set $T$ as input to model learning. The methodology described above is similar to Counterexample-Guided Abstraction Refinement (CEGAR) [13], illustrated in Fig. 4. The key difference is that CEGAR is a top-down approach that begins by generating an overapproximation, which is progressively pruned to obtain a finer overapproximation. Our algorithm, on the other hand, is a bottom-up approach that progressively extends a candidate abstraction until an overapproximation is obtained.
The following sections describe each step of our algorithm in detail.

Model learning from execution traces
The approach uses a pluggable model-learning component to generate a candidate abstraction from a set of system execution traces. We impose two requirements on this component:
• Given a set of execution traces $T$, the model-learning component must return a model $\hat M$ that accepts (at least) all traces in $T$, i.e., $L(\hat M) \supseteq T$.
• The language accepted by the model must be prefix-closed, i.e., if the model accepts a trace $\sigma$, then it must also accept all finite prefixes of $\sigma$.
Many model-learning algorithms satisfy the first requirement [8,36,41,58,63,65]. In general, these algorithms employ automaton inference techniques, such as state merging [8,40] or SAT [27,36], to generate a finite state automaton that conforms to a given set of traces. Among these, the algorithm in [36] satisfies both requirements. To use the other algorithms, the input trace set can be pre-processed to include all prefixes $\mathit{pref}(\sigma)$ of each trace $\sigma \in T$, i.e., $T \leftarrow \bigcup_{\sigma \in T} \{\sigma' \mid \sigma' \in \mathit{pref}(\sigma)\}$. Although this technique enables the generation of prefix-closed automata for conventional state-merging algorithms, it does not always guarantee prefix closure for models returned by other learning algorithms. A more reliable technique is to convert every non-accepting state that lies on a path to an accepting state in the generated finite state automaton into an accepting state.
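The prefix pre-processing step can be sketched in a few lines of Python. This is a hypothetical helper, assuming traces are finite sequences of hashable observations; it counts the empty trace and the trace itself among the prefixes.

```python
def prefix_closure(traces):
    """Return the union of pref(sigma) over all sigma in traces, as tuples."""
    closed = set()
    for trace in traces:
        t = tuple(trace)
        for k in range(len(t) + 1):  # k = 0 gives the empty prefix
            closed.add(t[:k])
    return closed
```

For instance, the single trace `("a", "b")` expands to `()`, `("a",)` and `("a", "b")`.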
It is straightforward to transform a finite state automaton that accepts a prefix-closed language into an LTS abstraction, as defined in Sect. 2.1, by removing the non-accepting states and all transitions that lead into them.
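Both post-processing steps, marking every state on a path to an accepting state as accepting and then dropping the remaining non-accepting states, can be sketched as follows. The function names are hypothetical, and the automaton is assumed to be given explicitly as a set of `(source, observation, target)` triples.

```python
def can_reach(delta, targets):
    """States from which some state in `targets` is reachable (targets included)."""
    result = set(targets)
    changed = True
    while changed:
        changed = False
        for (s, _, t) in delta:
            if t in result and s not in result:
                result.add(s)
                changed = True
    return result


def automaton_to_lts(states, delta, init, accepting):
    """Return (states, delta, init) of an LTS obtained from a finite automaton."""
    # Any state on a path to an accepting state becomes accepting,
    # which makes the accepted language prefix-closed.
    accepting = can_reach(delta, accepting) & set(states)
    # Drop the remaining non-accepting states and every transition touching them.
    kept_delta = {(s, o, t) for (s, o, t) in delta
                  if s in accepting and t in accepting}
    return accepting, kept_delta, init
```

If `init` is not in the returned state set, the automaton accepted no non-empty language to begin with.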

Completeness conditions for a candidate abstraction
We first give an explicit-state, set-based definition of our completeness criterion, for the sake of clarity.We subsequently describe a symbolic representation of the completeness conditions using characteristic functions, which can be applied to symbolic abstractions such as the model in Fig. 2.

Set-based definition
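To determine whether a given abstraction $\hat M$ for the system $M$ is complete, we use the structure of the abstraction to extract the following conditions. For the initial state $\hat s_0 \in \hat S$, $\forall o \in \Omega$:
$$\big(\exists s \in S.\ (s_0, o, s) \in \Delta\big) \implies \big(\exists \hat s \in \hat S.\ (\hat s_0, o, \hat s) \in \hat\Delta\big) \tag{1}$$
And for all states $\hat s \in \hat S$, $\forall o', o \in \Omega$:
$$\big(\exists s'', s, s' \in S.\ (s'', o', s) \in \Delta \wedge (s, o, s') \in \Delta\big) \wedge \big(\exists \hat s'' \in \hat S.\ (\hat s'', o', \hat s) \in \hat\Delta\big) \implies \big(\exists \hat s' \in \hat S.\ (\hat s, o, \hat s') \in \hat\Delta\big) \tag{2}$$
These conditions collectively encode the following completeness hypothesis: for any transition available in the system $M$, defined by the transition relation $\Delta$, there is a corresponding transition in $\hat M$ defined by $\hat\Delta$.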
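For finite, explicit systems the two conditions can be checked by enumeration. The sketch below is illustrative rather than the authors' implementation: the function names are hypothetical, and it assumes that each instance of (1) (one per observation) and of (2) (one per abstract state and pair $(o', o)$) counts as one condition when computing the fraction of conditions that hold, whereas the symbolic conditions introduced later are naturally grouped per state and incoming predicate. Like the premise of (2), the set `consecutive` ignores whether a system state is reachable from $s_0$.

```python
from itertools import product


def check_conditions(sys_delta, sys_init, abs_delta, abs_init, alphabet):
    """Evaluate conditions (1) and (2) for an explicit system and abstraction.

    Returns a list of (condition_id, holds) pairs.
    """
    def out_labels(delta, s):
        return {o for (u, o, _) in delta if u == s}

    alphabet = list(alphabet)
    results = []
    # Condition (1): observations possible from s0 are possible from s0_hat.
    sys_init_out = out_labels(sys_delta, sys_init)
    abs_init_out = out_labels(abs_delta, abs_init)
    for o in alphabet:
        results.append((("init", o), o not in sys_init_out or o in abs_init_out))
    # System pairs (o', o): some state has an incoming o' and an outgoing o.
    consecutive = {(a, b) for (_, a, s) in sys_delta
                   for (u, b, _) in sys_delta if u == s}
    abs_states = ({abs_init} | {s for (s, _, _) in abs_delta}
                  | {t for (_, _, t) in abs_delta})
    # Condition (2): one instance per abstract state and pair (o', o).
    for s_hat, o_prev, o in product(abs_states, alphabet, alphabet):
        incoming = any(t == s_hat and a == o_prev for (_, a, t) in abs_delta)
        premise = (o_prev, o) in consecutive and incoming
        results.append(((s_hat, o_prev, o),
                        not premise or o in out_labels(abs_delta, s_hat)))
    return results


def degree_of_completeness(results):
    """Fraction of extracted conditions that hold."""
    return sum(holds for _, holds in results) / len(results)
```

With a system `0 -a-> 1 -b-> 1` and an abstraction that lacks the `b` self-loop, exactly one instance of (2) fails (state 1, $o' = a$, $o = b$), so nine of the ten conditions hold.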
In the following section we prove that if the above hypothesis holds, i.e., if the completeness conditions (1) and (2) evaluate to true, then a simulation relation can be constructed between $M$ and $\hat M$. We then use the fact that the existence of a simulation relation between $M$ and $\hat M$ implies $L(M) \subseteq L(\hat M)$.
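A relation $R \subseteq S \times \hat S$ is a simulation between $M$ and $\hat M$ if $(s_0, \hat s_0) \in R$ and, for every $(s, \hat s) \in R$ and every transition $(s, o, s') \in \Delta$, there exists $\hat s' \in \hat S$ such that $(\hat s, o, \hat s') \in \hat\Delta$ and $(s', \hat s') \in R$. If such a relation $R$ exists, we write $M \preceq_R \hat M$.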
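Whether a simulation exists can be decided for finite LTSs with the standard greatest-fixpoint computation sketched below. This is a swapped-in textbook technique for checking $M \preceq \hat M$ directly; it is not the constructive relation $R'$ used in the proof that follows, and the function names are hypothetical. It starts from all state pairs and repeatedly removes a pair $(s, \hat s)$ when some system move from $s$ cannot be matched from $\hat s$ by a move with the same observation into a remaining pair.

```python
def greatest_simulation(sys_states, sys_delta, abs_states, abs_delta):
    """Largest R subset of S x S_hat in which every system move is matched."""
    rel = {(s, t) for s in sys_states for t in abs_states}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(rel):
            for (u, o, s2) in sys_delta:
                if u != s:
                    continue
                matched = any(v == t and a == o and (s2, t2) in rel
                              for (v, a, t2) in abs_delta)
                if not matched:
                    rel.discard((s, t))  # (s, t) cannot be in any simulation
                    changed = True
                    break
    return rel


def is_simulated(sys_states, sys_delta, sys_init, abs_states, abs_delta, abs_init):
    """True iff some simulation relates the two initial states."""
    rel = greatest_simulation(sys_states, sys_delta, abs_states, abs_delta)
    return (sys_init, abs_init) in rel
```

If the initial pair survives, the fact recalled above gives $L(M) \subseteq L(\hat M)$.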
To support our claim that the satisfaction of the completeness hypothesis is sufficient to guarantee that a simulation relation can be constructed between the system $M$ and the given abstraction $\hat M$, we first describe a method to construct a binary relation $R' \subseteq S \times \hat S$ when all extracted completeness conditions hold, and later formally prove that $R'$ is indeed a simulation.
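3. If the condition (2) extracted for a state $\hat s \in \hat S$ holds non-trivially for some observations $o', o \in \Omega$, i.e., $\exists s'', s, s' \in S.\ (s'', o', s) \in \Delta \wedge (s, o, s') \in \Delta$ and $\exists \hat s'' \in \hat S.\ (\hat s'', o', \hat s) \in \hat\Delta$, then $(s', \hat s') \in R'$ for every such $s'$ and every $\hat s' \in \hat S$ with $(\hat s, o, \hat s') \in \hat\Delta$.

Note that in the above construction, for every state pair $(s, \hat s) \in R' \setminus \{(s_0, \hat s_0)\}$, there exist incoming transitions to the states $s$ and $\hat s$ with some observation $o' \in \Omega$. That is,
$$\exists o' \in \Omega, s'' \in S, \hat s'' \in \hat S.\ (s'', o', s) \in \Delta \wedge (\hat s'', o', \hat s) \in \hat\Delta \tag{3}$$

Theorem 1 If the completeness conditions (1) and (2) hold, then the relation $R'$ constructed above is a simulation, i.e., $M \preceq_{R'} \hat M$.

Proof We use contradiction to prove that when the completeness conditions (1) and (2) hold, the constructed relation $R'$ forms a simulation. Let us assume $R'$ is not a simulation. Then at least one of the following clauses holds: (a) $(s_0, \hat s_0) \notin R'$; (b) there exist $(s, \hat s) \in R'$ and $(s, o, s') \in \Delta$ such that $(\hat s, o, \hat s') \notin \hat\Delta$ for all $\hat s' \in \hat S$; (c) there exist $(s, \hat s) \in R'$ and $(s, o, s') \in \Delta$ such that $(\hat s, o, \hat s') \in \hat\Delta$ for some $\hat s'$, but $(s', \hat s') \notin R'$ for every such $\hat s'$.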
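Assume clause (b) holds. Then,
$$\exists (s, \hat s) \in R', o \in \Omega, s' \in S.\ (s, o, s') \in \Delta \wedge \neg\exists \hat s' \in \hat S.\ (\hat s, o, \hat s') \in \hat\Delta$$
There are two possibilities here:
• If $s = s_0$ and $\hat s = \hat s_0$, then $\exists s' \in S.\ (s_0, o, s') \in \Delta \wedge \neg\exists \hat s' \in \hat S.\ (\hat s_0, o, \hat s') \in \hat\Delta$. This violates completeness condition (1), which is a contradiction.
• If $(s, \hat s) \in R' \setminus \{(s_0, \hat s_0)\}$, then from (3) there exist incoming transitions to $s$ and $\hat s$ on some observation $o' \in \Omega$. This implies
$$\exists s'', s, s' \in S.\ (s'', o', s) \in \Delta \wedge (s, o, s') \in \Delta \wedge \exists \hat s'' \in \hat S.\ (\hat s'', o', \hat s) \in \hat\Delta \wedge \neg\exists \hat s' \in \hat S.\ (\hat s, o, \hat s') \in \hat\Delta$$
This violates completeness condition (2), which is a contradiction.
Therefore, clause (b) does not hold.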
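Assume clause (c) holds. Then,
$$\exists (s, \hat s) \in R', o \in \Omega, s' \in S, \hat s' \in \hat S.\ (s, o, s') \in \Delta \wedge (\hat s, o, \hat s') \in \hat\Delta \wedge (s', \hat s') \notin R'$$
There are two possibilities here:
• If $s = s_0$ and $\hat s = \hat s_0$, then $(s_0, o, s') \in \Delta$ and $(\hat s_0, o, \hat s') \in \hat\Delta$. This is a case where condition (1) holds non-trivially, and therefore $(s', \hat s') \in R'$ by construction. This contradicts our assumption that clause (c) holds.
• If $(s, \hat s) \in R' \setminus \{(s_0, \hat s_0)\}$, then from (3) there exist incoming transitions to $s$ and $\hat s$ on some observation $o' \in \Omega$. This implies
$$\exists s'', s, s' \in S.\ (s'', o', s) \in \Delta \wedge (s, o, s') \in \Delta \wedge \exists \hat s'' \in \hat S.\ (\hat s'', o', \hat s) \in \hat\Delta \wedge (\hat s, o, \hat s') \in \hat\Delta$$
This is a case where condition (2) holds non-trivially, and therefore $(s', \hat s') \in R'$ by construction. This contradicts our assumption that clause (c) holds.
Therefore, clause (c) does not hold.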
As none of the clauses (a), (b) or (c) hold, the constructed relation $R'$ is a simulation by contradiction, i.e., $M \preceq_{R'} \hat M$. ◻

Note that the satisfaction of the completeness hypothesis is sufficient, but not necessary, to guarantee the existence of a simulation relation between $M$ and $\hat M$. An example is provided in Fig. 5. Here, the completeness conditions extracted for state $\hat s_2$ do not hold.

Theorem 2 If $M \preceq_R \hat M$ for some simulation relation $R$, then $L(M) \subseteq L(\hat M)$.

Variants of this theorem are commonplace; Park offers a proof in [47]. By Theorems 1 and 2, it is guaranteed that if all completeness conditions extracted for a given system abstraction are true, then the abstraction is an overapproximation accepting (at least) all system execution traces. We compute the fraction of the completeness conditions that are true, denoted by $\eta$, as a quantitative measure of the degree of completeness of the given system abstraction. The procedure used to check the truth value of the extracted completeness conditions is described in Sect. 2.5.

Symbolic definition
Symbolic representations of abstractions, such as the models in Figs. 1 and 2, have transitions labelled with characteristic functions or predicates for sets of observations. A single edge in these graphs therefore corresponds to a set of concrete transitions. This representation has three benefits:
1. It reduces the computational cost of the method when compared to an explicit representation that enumerates concrete transitions.
2. We hypothesise that human engineers prefer the more succinct symbolic presentation over an explicit list. In lieu of experimental evidence, we remark that popular design tools such as Simulink [54] strongly encourage the use of symbolic transition predicates.
3. The symbolic representation also enables an extension of our method to infinite alphabets, provided the model-learning component can infer such models from execution traces.

Fig. 5 Example system and its abstraction
The standard means to represent sets or relations symbolically is to use characteristic functions. We expect that a single observation $o \in \Omega$ can be described as a valuation of a set of system variables $X$ that range over some domain $D$. We can therefore describe a subset $O \subseteq \Omega$ by a characteristic function $f_O : D^{|X|} \to \mathbb{B}$, where $\mathbb{B}$ is the set of Boolean truth values. The subset $O$ then corresponds to:
$$O = \{\, o \in \Omega \mid f_O(o) \,\}$$
As is standard, the function is given by means of a Boolean-valued expression. We refrain from defining a full syntax and semantics for the expressions we use. For the sake of exposition, we use a C-like syntax, and semantics that roughly correspond to the bit-vector theory in SMT-LIB 2.
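A transition $(\hat s_i, o, \hat s_{i+1})$ has three components; of these, we represent only the observation symbolically. Both automaton states are represented explicitly. Hence, a symbolic transition is a triple $(\hat s_i, p, \hat s_{i+1})$, where $p$ is a predicate over observations, which corresponds to the following set of concrete transitions:
$$\{\, (\hat s_i, o, \hat s_{i+1}) \mid o \in \Omega \wedge p(o) \,\}$$
To derive the completeness conditions (1) and (2) for a symbolic abstraction, we represent the transition relation $\Delta$ and the initial state $s_0$ symbolically as characteristic functions $f_\Delta : \Omega \times \Omega \to \mathbb{B}$ and $\mathit{Init} : \Omega \to \mathbb{B}$ respectively, defined as follows:
$$f_\Delta(o', o) \iff \exists s'', s, s' \in S.\ (s'', o', s) \in \Delta \wedge (s, o, s') \in \Delta$$
$$\mathit{Init}(o) \iff \exists s \in S.\ (s_0, o, s) \in \Delta$$
For a given symbolic abstraction, we extract the following conditions encoding a symbolic representation of the completeness hypothesis.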
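To show what a single symbolic edge stands for, the sketch below enumerates its concrete transitions. It assumes that each variable in $X$ ranges over an explicitly listed finite domain (real C variables would range over bit-vectors) and encodes predicates as Python callables instead of C-like expressions; the function names are hypothetical.

```python
from itertools import product


def observations(domains):
    """Enumerate observations as valuations of the variables X over finite domains."""
    names = sorted(domains)
    for values in product(*(domains[n] for n in names)):
        yield dict(zip(names, values))


def concretise(transition, domains):
    """Expand a symbolic transition (s_i, p, s_j) into its concrete transitions."""
    src, pred, dst = transition
    # Each observation is frozen as a sorted tuple of (variable, value) pairs.
    return [(src, tuple(sorted(o.items())), dst)
            for o in observations(domains) if pred(o)]
```

For example, with hypothetical variables `temp` ranging over 0 to 4 and `on` over {0, 1}, the predicate `temp > 2 && on == 0` yields two concrete transitions.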
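For the initial state $\hat s_0 \in \hat S$, $\forall o \in \Omega$:
$$\mathit{Init}(o) \implies \bigvee_{p_{out} \in P_{(0,out)}} p_{out}(o) \tag{14}$$
where $P_{(0,out)}$ is the set of predicates on all outgoing transitions from $\hat s_0$, as illustrated in Fig. 6a.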
And for all states $\hat s_i \in \hat S$, $\forall p_{in} \in P_{(i,in)}$, $\forall o', o \in \Omega$:
$$p_{in}(o') \wedge f_\Delta(o', o) \implies \bigvee_{p_{out} \in P_{(i,out)}} p_{out}(o) \tag{15}$$
where $P_{(i,in)}$ is the set of predicates on the incoming transitions to state $\hat s_i$ and $P_{(i,out)}$ is the set of predicates on the outgoing transitions from $\hat s_i$, as illustrated in Fig. 6b. We illustrate the formulation of the completeness hypothesis for a symbolic abstraction with an example in Fig. 7. Note that conditions (14) and (15) are symbolic representations of the completeness conditions (1) and (2), respectively. For the remainder of the article we use the symbolic representation of the completeness hypothesis as encoded by conditions (14) and (15).
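The two symbolic conditions can be written directly as higher-order functions over predicates. In this hypothetical Python sketch, predicates and $f_\Delta$ are callables over integer observations, and an exhaustive loop over a small domain stands in for the SAT check of Sect. 2.5.

```python
from itertools import product


def condition_14(init, out_preds):
    """(14): Init(o) implies that o satisfies some outgoing predicate of s0_hat."""
    return lambda o: not init(o) or any(p(o) for p in out_preds)


def condition_15(p_in, f_delta, out_preds):
    """(15) for one state s_i_hat and one incoming predicate p_in."""
    return lambda o_prev, o: (not (p_in(o_prev) and f_delta(o_prev, o))
                              or any(p(o) for p in out_preds))


def holds_everywhere(condition, domain, arity):
    """Exhaustively evaluate a condition over all tuples of observations."""
    return all(condition(*args) for args in product(domain, repeat=arity))
```

As a usage example, a hypothetical $f_\Delta$ such as `lambda a, b: abs(a - b) <= 1` models a temperature that changes by at most one unit per step; with incoming predicate `o > 5`, the outgoing predicate `o > 4` satisfies (15) while `o > 5` does not.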

Checking the truth of extracted conditions
To verify whether the completeness conditions evaluate to true for all observations in $\Omega$, we use symbolic variables $\omega'$ and $\omega$ to represent the observations $o'$ and $o$ in (14) and (15), and use a SAT solver to check whether there exists an assignment of values in $\Omega$ to $\omega'$, $\omega$ that satisfies the negation of the completeness conditions.
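To this end, the negations of conditions (14) and (15), represented as
$$\mathit{Init}(\omega) \wedge \bigwedge_{p_{out} \in P_{(0,out)}} \neg p_{out}(\omega) \tag{16}$$
and
$$p_{in}(\omega') \wedge f_\Delta(\omega', \omega) \wedge \bigwedge_{p_{out} \in P_{(i,out)}} \neg p_{out}(\omega) \tag{17}$$
respectively, are fed to a SAT solver. A satisfying assignment indicates a falsification of the corresponding completeness condition, and serves as a counterexample for the condition.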
In the event that a satisfying assignment cannot be found, we conclude that the corresponding completeness condition evaluates to true for all observations in Ω.
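A SAT query for (16) or (17) returns a satisfying assignment if one exists. Over a small finite domain the same question can be answered by enumeration, as in the hypothetical sketch below; it is a stand-in for the SAT solver, not a replacement for it. The optional `blocked` set anticipates the strengthening described in Sect. 2.6: skipping a value $o'$ for the first variable has the same effect as conjoining $\omega' \neq o'$.

```python
from itertools import product


def find_counterexample(negated, domain, arity, blocked=()):
    """Search for an assignment satisfying a negated condition such as (17).

    `blocked` holds first-variable values already shown to be spurious.
    Returns a tuple of observations, or None if the negation is unsatisfiable.
    """
    for args in product(domain, repeat=arity):
        if args[0] in blocked:
            continue  # equivalent to conjoining omega' != o'
        if negated(*args):
            return args
    return None
```

With the hypothetical negation `lambda op, o: op > 5 and abs(op - o) <= 1 and not (o > 5)` over the domain 0 to 9, the first counterexample found is `(6, 5)`; once 6 is blocked, no counterexample remains.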
As discussed in Sect. 2.4, the satisfaction of the completeness hypothesis is sufficient, but not necessary, to guarantee completeness. In the event of a falsification of condition (14), the SAT solver returns a counterexample $\omega = o$ such that $o \models \mathit{Init}$ and $o \not\models p_{out}$ for all $p_{out} \in P_{(0,out)}$. Since $o \in L(M)$, this is a non-spurious counterexample indicating missing system behaviour in the learned abstraction, i.e., $L(M) \not\subseteq L(\hat M)$. But in the event of a falsification of condition (15), the SAT solver returns a counterexample $\omega' = o'$, $\omega = o$ such that $o' \models p_{in}$, $(o', o) \models f_\Delta$ and $o \not\models p_{out}$ for all $p_{out} \in P_{(i,out)}$. Here, it is not guaranteed that the observation $o'$ lies on a system path from the initial system state $s_0 \in S$. The counterexample may therefore be spurious and may not actually correspond to any missing system behaviour in the abstraction.

Counterexample analysis
To check whether a counterexample $\omega' = o'$, $\omega = o$ for condition (15) is spurious, i.e., whether it does not correspond to any concrete system trace, we use model checking to verify whether the observation $o'$ is reachable from $s_0$. That is, the algorithm checks whether there exists a path $\pi = s_0, o_0, s_1, \ldots, o_{n-1}, s_n$ in $M$ with $o_{n-1} = o'$. If such a path does not exist, the counterexample is spurious.
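For a finite, explicit system, the reachability question above reduces to a graph search. The hypothetical sketch below uses breadth-first search as a stand-in for the model checker; the implementation described later uses k-induction with CBMC instead.

```python
from collections import deque


def observation_reachable(delta, init, target_obs):
    """True iff some path from `init` ends with a transition labelled target_obs."""
    seen, queue = {init}, deque([init])
    while queue:
        s = queue.popleft()
        for (u, o, t) in delta:
            if u != s:
                continue
            if o == target_obs:
                return True  # s is reachable and has an outgoing o' transition
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return False


def is_spurious(delta, init, o_prev):
    """A counterexample (o', o) to (15) is spurious if o' is unreachable."""
    return not observation_reachable(delta, init, o_prev)
```

For example, in a system with transitions `(0, "a", 1)`, `(1, "b", 2)` and `(3, "c", 0)` and initial state 0, observation `"b"` is reachable but `"c"` is not, so a counterexample with $o' =$ `"c"` would be classified as spurious.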
In the event that the counterexample for condition (15) is spurious, the corresponding input to the SAT solver is strengthened by adding the clause $\omega' \neq o'$ to (17) as follows:
$$p_{in}(\omega') \wedge f_\Delta(\omega', \omega) \wedge \bigwedge_{p_{out} \in P_{(i,out)}} \neg p_{out}(\omega) \wedge \omega' \neq o'$$
The conjunct $\omega' \neq o'$ guides the SAT solver away from the spurious counterexample $\omega' = o'$ and towards a non-spurious counterexample, if any. All non-spurious counterexamples are used to construct a set of new traces $T_{CE}$ that exemplify system behaviours found to be missing from the candidate abstraction. For each counterexample $\omega = o$ for condition (14), we add the trace $\sigma_{CE} = o$ to the set $T_{CE}$. For each counterexample $\omega' = o'$, $\omega = o$ for condition (15), we find, for every $\sigma \in T$, the smallest prefix $\sigma' = o_1, o_2, \ldots, o_m$ of $\sigma$ such that $o_m \models p_{in}$. We then construct a new trace $\sigma_{CE} = o_1, \ldots, o_{m-1}, o', o$ for each prefix $\sigma'$. Note that since $o' \models p_{in}$, the new trace $\sigma_{CE}$ does not change the system behaviour exemplified by $\sigma'$ but merely augments it to include the missing behaviour. The set of new traces $T_{CE}$ thus generated is used as an additional input to the model-learning component, which in turn generates an abstraction that admits the missing behaviour.
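The trace construction for a counterexample to (15) can be sketched as follows. This hypothetical Python helper assumes the reading in which the last observation $o_m$ of the shortest matching prefix is replaced by $o'$ (which also satisfies $p_{in}$) and $o$ is appended; predicates are Python callables.

```python
def traces_for_counterexample(traces, p_in, o_prev, o):
    """Build new traces from a non-spurious counterexample (o', o) to (15)."""
    new = set()
    for trace in traces:
        # m is 0-based here; trace[m] plays the role of o_m in the text.
        for m, obs in enumerate(trace):
            if p_in(obs):
                new.add(tuple(trace[:m]) + (o_prev, o))
                break  # only the shortest matching prefix is used
    return new
```

Traces that never satisfy $p_{in}$ contribute nothing; for a counterexample $\omega = o$ to (14), the single-observation trace `(o,)` would be added instead.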
Example run of the approach

An example run demonstrating the active model-learning algorithm for a Home Climate Control Cooling system is illustrated in Fig. 8 and described below. First, a candidate abstraction is learned from an initial set of system execution traces T. The generated abstraction is provided in Fig. 8a. The abstraction models the following sequential system behaviour: the system stays in the Off mode (ŝ1 → ŝ1), or switches from the Off mode to the On mode when inp.temp > T_thresh (ŝ1 → ŝ2). The system then switches back to the Off mode and stays in the Off mode indefinitely (ŝ2 → ŝ2).
The structure of the generated abstraction is used to extract completeness conditions for state ŝ1 and for state ŝ2, ∀o′, o ∈ Ω. The subsequent completeness hypothesis check yields falsifications for conditions (21) and (22). The SAT procedure returns the counterexamples provided in Fig. 8a for the two conditions, respectively. Neither counterexample is found to be spurious.
The counterexamples exemplify the following system behaviours that are missing from the abstraction in Fig. 8a:
1. The first counterexample, corresponding to a falsification of condition (21), indicates that after the system switches from the Off mode to the On mode (ŝ1 → ŝ2), the system may remain in the On mode.
2. The second counterexample, corresponding to a falsification of condition (22), indicates that when the system is in the Off mode after switching from the On mode (ŝ2 → ŝ2), the system may switch back to the On mode.
The counterexamples are used to construct new traces that serve as additional inputs to the model-learning component, which in turn generates the model in Fig. 8b. Note that the new model now captures the system behaviours identified to be missing from the old model: ŝ1 → ŝ2 → ŝ4 captures missing behaviour 1 above, and ŝ2 → ŝ3 → ŝ2 captures missing behaviour 2.
The abstractions generated in subsequent iterations of active learning are provided in Fig. 8c and d. All conditions extracted from the abstraction in Fig. 8d evaluate to true, i.e., $\eta = 1$. Thus, the algorithm terminates, returning the model in Fig. 8d as the final system overapproximation.

Implementation
We implement the active learning approach using the Trace2Model (T2M) [33,36] tool as the model-learning component.T2M generates symbolic finite state automata from traces using a combination of SAT and program synthesis [36].
We use the C Bounded Model Checker (CBMC v5.35) [15] to implement the procedure that evaluates the degree of completeness of a learned model. The SAT solver in CBMC is used to check the truth value of each extracted condition. CBMC is also used to perform k-induction [52] to verify whether the counterexample for a condition check is spurious. This is done by asserting that there does not exist a concrete system path corresponding to the counterexample. If both the base case and the step case of k-induction hold, the counterexample is guaranteed to be spurious, while a violation in the base case indicates that it is not. However, if only the step case is violated, there is no conclusive evidence about the validity of the counterexample. Since we are interested in generating a system overapproximation, we treat such a counterexample as non-spurious.
We use a constant value of k = 10 for our experiments. Note that we only discard those counterexamples that k-induction guarantees to be spurious. This ensures that, irrespective of the value used for k, all non-spurious counterexamples are used for subsequent model-learning iterations.
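The decision rule above can be captured in a few lines. In this hypothetical sketch, the two Booleans report whether the base case and the step case of k-induction succeeded for the assertion that no concrete path matches the counterexample; only a full proof leads to the counterexample being discarded.

```python
from enum import Enum


class Verdict(Enum):
    SPURIOUS = "spurious"          # discard the counterexample
    NON_SPURIOUS = "non-spurious"  # use it to build new traces


def classify(base_case_holds: bool, step_case_holds: bool) -> Verdict:
    """Map a k-induction outcome to how the counterexample is treated."""
    if base_case_holds and step_case_holds:
        return Verdict.SPURIOUS      # proved: no concrete path matches it
    if not base_case_holds:
        return Verdict.NON_SPURIOUS  # a matching concrete path of length <= k exists
    # Only the step case failed: inconclusive, so err towards overapproximation.
    return Verdict.NON_SPURIOUS
```

Treating the inconclusive case as non-spurious may add unnecessary traces, but it never discards real missing behaviour.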

Benchmarks
To evaluate the algorithm, we attempt to reverse-engineer a set of LTSs from their respective C implementations.For this purpose, we use the dataset of Simulink Stateflow example models [55], available as part of the Simulink documentation.We select this dataset since these example models are state machines that can serve as ground-truth for our evaluation.
For each Stateflow example, we use Embedded Coder (MATLAB 2018b) [53] to automatically generate a corresponding C code implementation.The generated C code is used as the system M in our experiments.To collect traces, we instrument the implementation to observe a set of program variables X.The set of observations Ω is the set of valuations for all variables x ∈ X.  [3]. 1 We use the remaining 28 examples for our evaluation.
The majority of the Stateflow example models feature predicates on the transition edges.Some of the Stateflow example models are implemented as multiple parallel and hierarchical state machines.Our goal is to reproduce each of these state machines from traces, and we therefore obtain a total of 45 target state machines from the 28 Stateflow examples.These serve as our benchmarks for evaluation.A mapping of each Stateflow example model to its set of target state machine benchmarks is provided in Table 3 in the Appendix.The algorithm implementation and benchmarks are available online [34]. 2

Experiments and results
For each benchmark, we generate an initial set of 50 traces, each of length 50, by executing the C implementation with randomly sampled inputs.This set of traces and the C implementation are fed as input to the algorithm, which in turn attempts to learn an abstraction overapproximating system behaviours.The results are summarised in Table 1.
Each entry in the table from B1 to B45 corresponds to a target state machine that we wish to reverse-engineer.These are grouped by the Stateflow example that they belong to.We record the number of model learning iterations #iter , the number of states | Ŝ| and degree of completeness for the final abstraction, the total runtime in seconds T(s) and the percentage of the total runtime attributed to model learning, denoted by %T m .We also record the cardinality of the set X (the number of variables) for every Stateflow model.We set a timeout of 24 h for our experiments.For benchmarks that time out, we present the results for the candidate abstraction generated right before timeout.
Since the algorithm is designed to generate overapproximating system abstractions, the inferred model for a target state machine could admit traces that are outside the language of target machine, and therefore may not be accurate.We assess the accuracy of the final generated system overapproximation by assigning a score d computed as the fraction of state transitions in the target state machine that match corresponding transitions in the abstraction we generate.This is done by semantically comparing the corresponding transition predicates in the target state machine and the abstraction using CBMC.For hierarchical Stateflow models, we flatten the hierarchy and compare the abstraction with the flattened state machine.
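The score $d$ can be sketched as follows, under two assumptions not stated in the text: the states of the target machine and the abstraction have already been matched up, and predicates are compared by exhaustive evaluation over a small finite domain rather than with CBMC. The function name is hypothetical.

```python
def accuracy_score(target, learned, domain):
    """Fraction of target transitions matched by a semantically equal learned one.

    target, learned: lists of (src, predicate, dst) with states already aligned.
    """
    def equivalent(p, q):
        return all(p(o) == q(o) for o in domain)

    matched = sum(
        any(s == s2 and d == d2 and equivalent(p, q) for (s2, q, d2) in learned)
        for (s, p, d) in target
    )
    return matched / len(target)
```

Because the comparison is semantic, syntactically different predicates such as `o > 5` and `o >= 6` over integers count as a match.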

Runtime
The active learning algorithm generates overapproximations in under 12 min for the majority of the benchmarks. For the benchmarks that take more than 1 h to terminate, namely B11 and B12, the model checker tends to go through a large number of spurious counterexamples before arriving at a non-spurious counterexample for a condition falsification. This is because, depending on the size of the domain of the variables $x \in X$, there can be a large number of valuations that falsify an extracted condition, of which very few may actually correspond to a concrete system trace. In such cases, runtime can be improved by strengthening the conditions with domain knowledge to guide the SAT solver towards non-spurious counterexamples. For the B5 benchmark, the SAT solver takes a long time to check each condition. This is because the implementation features several operations, such as memory access and array operations, that considerably increase the complexity of the SAT problem and hence the solving time.

Accuracy of the generated models
The algorithm terminates when $\eta = 1$ and therefore, by Theorems 1 and 2, is guaranteed to return an overapproximation on termination. For the benchmarks that terminate, the generated abstraction is found to accurately capture the behaviour of the corresponding state machine ($d = 1$).

Impact of the initial sample
As described in Sect. 2.2, in each learning iteration $i$, $T_{CE}^{i} \cap L(\hat M_{i-1}) = \emptyset$. The number of learning iterations therefore depends on $|L(M) \setminus L(\hat M_0)|$, where $\hat M_0$ is the abstraction generated from the initial trace set $T$.
To evaluate the impact of the seed traces on the number of iterations that the algorithm requires, we have run our experiments without any seed traces, i.e., using $L(\hat M_0) = \emptyset$. We observe that, on average, the number of iterations increases by a factor of ≈ 5 compared to the number of iterations reported in Table 1.

Comparison with random sampling
We performed a set of experiments to check whether random sampling suffices to learn abstractions that admit all system behaviours. Each benchmark is executed with one million randomly sampled inputs, and the generated traces are fed to T2M to passively learn a model. For ≈ 29% of the benchmarks, random sampling fails to produce a model that admits all system behaviours (degree of completeness < 1).
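A back-of-the-envelope calculation shows why random sampling can miss behaviour. This is a hypothetical illustration only and is not taken from the benchmarks. Suppose a transition is guarded by an equality test $x = c$ on a 32-bit input $x$ sampled uniformly at random. The probability that none of $N = 10^6$ independent samples enables the guard is

$$
\left(1 - 2^{-32}\right)^{10^{6}} \approx e^{-10^{6}/2^{32}} \approx e^{-2.33 \times 10^{-4}} \approx 0.9998,
$$

so a model learned passively from such traces would almost certainly omit that transition.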

Threats to validity
The key threat to the validity of our experimental claim is benchmark bias. We have attempted to limit this bias by using a set of benchmarks curated by others. Further, we use C implementations of Simulink Stateflow models that are auto-generated using a specific code generator. Although there is diversity among these benchmarks, our algorithm may not generalise to software that is not generated from Simulink models, or to software generated by a different code generator. The active-learning implementation used for our experiments produces an equivalent model (d = 1) for the benchmarks that terminate, but there is no formal guarantee that the algorithm delivers this in all cases. The accuracy of generated models may vary depending on the algorithm used as the model-learning component and its ability to consolidate trace information into symbolic abstractions. The procedure that evaluates the degree of completeness of a learned abstraction formally guarantees only that a system overapproximation is generated on algorithm termination.

Notes to Table 1: bold highlights the benchmarks for which the algorithm was able to generate complete models. For one benchmark, the dataset includes another implementation of the same system with similar results; we present the results for only one of them.
For complex systems, model checking for counterexample analysis as described in Sect. 2.6 can be computationally expensive in practice. Here, simulation-based techniques [18,50] could be a pragmatic alternative for exploring system paths to check whether the observation in the counterexample is reachable.

Overview
Active model-learning implementations largely consist of two components: a model-learning algorithm that generates a model from a set of traces, and an oracle that evaluates the learned model to identify missing and/or wrong behaviours.
State-merge algorithms [8,27,40] and query-based learning [4,32,49] are popular choices for the model-learning component. State-merge algorithms reverse-engineer abstractions by constructing a Prefix Tree Acceptor (PTA) from the traces and identifying equivalent states to be merged in the PTA. The L* algorithm forms the basis of query-based active learning, where the learning algorithm poses equivalence and membership queries to an oracle. The responses to the queries are recorded in an observation table, which is eventually used to construct an automaton.
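The PTA construction and a simple merge criterion can be sketched as follows. This is an illustrative sketch and not the procedure of [8,27,40]. It uses the classic k-tails heuristic, which merges PTA states whose sets of outgoing event sequences of length at most k coincide. The names `build_pta`, `k_tails` and `merge_k_tails` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

PTA = Dict[int, Dict[str, int]]  # state -> {event: successor state}


def build_pta(traces: List[List[str]]) -> PTA:
    """Build a prefix tree acceptor: state 0 is the empty prefix and
    every distinct trace prefix gets its own state."""
    pta: PTA = {0: {}}
    for trace in traces:
        state = 0
        for event in trace:
            if event not in pta[state]:
                new_state = len(pta)
                pta[new_state] = {}
                pta[state][event] = new_state
            state = pta[state][event]
    return pta


def k_tails(pta: PTA, state: int, k: int) -> FrozenSet[Tuple[str, ...]]:
    """Event sequences of length <= k that can be read from `state`."""
    tails: Set[Tuple[str, ...]] = {()}
    if k > 0:
        for event, succ in pta[state].items():
            tails |= {(event,) + t for t in k_tails(pta, succ, k - 1)}
    return frozenset(tails)


def merge_k_tails(pta: PTA, k: int) -> Set[Tuple[int, str, int]]:
    """Merge PTA states that share the same k-tails and return the
    transitions of the quotient automaton (block 0 contains the root)."""
    block_ids: Dict[FrozenSet[Tuple[str, ...]], int] = {}
    block_of = {s: block_ids.setdefault(k_tails(pta, s, k), len(block_ids))
                for s in pta}
    return {(block_of[s], e, block_of[t])
            for s, succs in pta.items() for e, t in succs.items()}


traces = [["on", "off"], ["on", "off", "on", "off"]]
pta = build_pta(traces)
print(len(pta))  # 5 states: one per distinct prefix
print(sorted(merge_k_tails(pta, 1)))
# [(0, 'on', 1), (1, 'off', 0), (1, 'off', 2)]
```

The loop between blocks 0 and 1 generalises the two traces to arbitrarily many on/off cycles. The extra transition to block 2 shows that the merged model can be nondeterministic and depends on the choice of k.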
The oracle for active learning can be implemented as a black-box or a white-box procedure. One black-box oracle implementation uses model checking, where predefined Linear Temporal Logic (LTL) system properties are checked against the generated model to identify wrong behaviours [26,57,59,62]. For query-based learning in a black-box setting, membership queries are implemented as tests on the system. Equivalence queries are often approximated through a finite number of membership queries, using techniques such as conformance testing [18]. White-box oracle implementations use techniques such as fuzzing [66] and symbolic execution [38].
In the broader equivalence-checking literature, particularly in Electronic Design Automation (EDA), several techniques are used to prove whether two representations or implementations of a system exhibit the same behaviour [6,14,25,39,44–46,61]. Among these, the most closely related to our work are techniques based on SAT and Bounded Model Checking (BMC). These techniques primarily check for input/output equivalence, i.e., that whenever the inputs to the two implementations are equal, the corresponding outputs are equal.
SAT-based techniques [25,44] generally represent the output of each implementation as a Boolean expression over the inputs. The formula obtained by XOR-ing these expressions is fed to a SAT solver. A satisfying assignment is an input on which the outputs differ, so the two implementations are not equivalent; if the formula is unsatisfiable, they are equivalent. In BMC-based equivalence checking [14,39], the two implementations are unwound a finite number of times and translated into a formula representing behavioural equivalence, which is then fed to a SAT solver. In [39], input/output equivalence is verified on abstract overapproximations of the implementations. Equivalence is modelled as a safety property that is checked using CEGAR on the product of the abstract models. Counterexample analysis for CEGAR is performed by simulating the abstract counterexample on the concrete model using BMC.
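The miter idea and its bounded sequential counterpart can be illustrated with a small sketch. It is a toy built on stated assumptions: exhaustive enumeration stands in for the SAT solver, and the sequential check explores input sequences directly instead of building an unrolled formula. The function names and the example circuits and machines are hypothetical. Both checks are exponential and only show what is being checked, not how EDA tools scale.

```python
from itertools import product
from typing import Any, Callable, Optional, Sequence, Tuple

Step = Callable[[Any, Any], Tuple[Any, Any]]  # (state, input) -> (next, output)


def miter_counterexample(f: Callable[..., bool], g: Callable[..., bool],
                         n_inputs: int) -> Optional[Tuple[bool, ...]]:
    """Look for an input that satisfies the miter f(x) XOR g(x).

    A SAT solver would search for a satisfying assignment of the XOR
    formula; here all 2**n_inputs assignments are enumerated instead.
    Returns a distinguishing input, or None if f and g are equivalent."""
    for bits in product((False, True), repeat=n_inputs):
        if f(*bits) != g(*bits):  # XOR of the two outputs is true
            return bits
    return None


def bounded_counterexample(step_a: Step, init_a: Any, step_b: Step,
                           init_b: Any, inputs: Sequence[Any],
                           k: int) -> Optional[Tuple[Any, ...]]:
    """Unwind two sequential machines for up to k steps and return a
    shortest input sequence on which their outputs differ. None means
    equivalence only up to depth k, as in bounded model checking."""
    for depth in range(1, k + 1):
        for seq in product(inputs, repeat=depth):
            sa, sb = init_a, init_b
            for inp in seq:
                sa, out_a = step_a(sa, inp)
                sb, out_b = step_b(sb, inp)
                if out_a != out_b:
                    return seq
    return None


# Combinational example: distributivity makes f and g equivalent; h is not.
f = lambda x, y, z: x and (y or z)
g = lambda x, y, z: (x and y) or (x and z)
h = lambda x, y, z: x or (y and z)
print(miter_counterexample(f, g, 3))  # None
print(miter_counterexample(f, h, 3))  # (False, True, True)


# Sequential example: parity of the 1s seen, and a buggy version resetting on 0.
def parity_step(s: int, inp: int) -> Tuple[int, int]:
    return s ^ inp, s ^ inp


def buggy_step(s: int, inp: int) -> Tuple[int, int]:
    nxt = s ^ inp if inp == 1 else 0
    return nxt, nxt


print(bounded_counterexample(parity_step, 0, buggy_step, 0, [0, 1], 3))  # (1, 0)
```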
In this section, we primarily focus on equivalence checking in the context of active model learning. Many active learning techniques use various combinations of the model-learning algorithms and oracle implementations discussed above. In this work, we have described an algorithm that uses a white-box oracle, implemented using SAT solving and model checking. When combined with a symbolic-model learning algorithm, this oracle enables the algorithm to learn expressive overapproximations of a system. A summary of related active learning implementations is provided in Table 2. In the following sections, we describe these techniques in detail and compare them in terms of the completeness and expressivity of the generated models.

Learning system overapproximations
State-merge algorithms are predominantly passive, and the generated abstractions admit only those system behaviours exemplified by the traces [27,40,41,63,65]. One of the earliest active algorithms using state merging is Query-Driven State Merging (QSM) [19], where model refinement is guided by responses to membership queries posed to an end user. Other active versions of state merging use model checking [59,62] and model-based testing [64] to identify spurious behaviours in the generated model. In [59,62], a priori known LTL system properties are checked against the generated model, and counterexamples to property violations serve as negative traces for automaton refinement. In [64], tests generated from the learned model are used to simulate the system and identify any discrepancies. However, abstractions generated by these algorithms are not guaranteed to accept all system traces.
Query-based learning algorithms, such as Angluin's L* algorithm and its variants [5,32,37,51], can in principle generate exact system models. In practice, however, the absence of an equivalence oracle often restricts their ability to generate exact models or even system overapproximations. In a black-box setting, membership queries are posed as tests on the system, and the elicited response to a test is used to classify the corresponding query as accepting or rejecting. Equivalence queries are often approximated through a finite number of membership queries [2,11,51], generated using techniques such as conformance testing or random walks on the hypothesis model.
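An equivalence query approximated by random testing might look like the following sketch. It rests on several assumptions. The system under learning is assumed to be callable as a function from an input word to an output word, as for a Mealy machine. The hypothesis is assumed to be complete, so a random walk over it amounts to a random input word. The light-switch system and all names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothesis Mealy machine: (state, input) -> (next state, output)
Mealy = Dict[Tuple[str, str], Tuple[str, str]]


def run_mealy(m: Mealy, init: str, word: Sequence[str]) -> List[str]:
    state, outputs = init, []
    for a in word:
        state, out = m[(state, a)]
        outputs.append(out)
    return outputs


def approx_equivalence_query(hyp: Mealy, init: str, alphabet: Sequence[str],
                             sul: Callable[[Sequence[str]], List[str]],
                             n_words: int = 1000, max_len: int = 10,
                             seed: int = 0) -> Optional[List[str]]:
    """Compare hypothesis and SUL on random input words. Returns a
    counterexample word, or None if no sampled word exposes a difference;
    None is evidence of equivalence, not a proof of it."""
    rng = random.Random(seed)
    for _ in range(n_words):
        word = [rng.choice(alphabet) for _ in range(rng.randint(1, max_len))]
        if run_mealy(hyp, init, word) != sul(word):  # one membership query
            return word
    return None


def light_switch(word: Sequence[str]) -> List[str]:
    """Hypothetical SUL: 'press' toggles a light, 'wait' leaves it as is."""
    lit, outputs = False, []
    for a in word:
        if a == "press":
            lit = not lit
        outputs.append("on" if lit else "off")
    return outputs


correct = {("off", "press"): ("on", "on"), ("off", "wait"): ("off", "off"),
           ("on", "press"): ("off", "off"), ("on", "wait"): ("on", "on")}
too_small = {("s", "press"): ("s", "off"), ("s", "wait"): ("s", "off")}
alphabet = ["press", "wait"]
print(approx_equivalence_query(correct, "off", alphabet, light_switch))  # None
print(approx_equivalence_query(too_small, "s", alphabet, light_switch))
```

The second call should return the first sampled word that contains `press`, because only such words make the one-state hypothesis and the system disagree. A behaviour that is enabled only by rare inputs could easily escape such sampling.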
An essential prerequisite for black-box model learning is that the system can be simulated with an input sequence to elicit a response or output, as for systems modelled as Mealy machines or register automata. Moreover, obtaining an adequate approximation of an equivalence oracle may require a number of membership queries that is exponential in the number of states of the system. The resulting high query complexity constrains these algorithms to learning only partial models of large systems [28,31].
One way to address these challenges is to combine model learning with white-box techniques, such as fuzzing [56], symbolic execution [9,24,30] and model checking [17,20], to extract system information at a lower cost. However, these combinations are not always guaranteed to generate system overapproximations.
In [56], model learning is combined with mutation-based testing guided by code coverage. This proves more effective than conformance testing, but the approach does not always produce complete models. In [24,30], symbolic execution is used to answer membership queries and to generate component interface abstractions that model safe orderings of component method calls. Sequences of method calls in a query are symbolically executed to check whether they reach an a priori known unsafe state. However, the learned models may be partial, because the approach misses unsafe method-call orderings that the end user is unaware of due to insufficient domain knowledge. The Sigma* [9] algorithm combines L* with symbolic execution to iteratively learn an overapproximation in parallel with the models learned using L*. The algorithm terminates when the hypothesis model equals the overapproximation, and therefore generates exact system models. In [17,20], model checking is combined with model learning for assume-guarantee reasoning. The primary goal of these approaches is not to generate an abstract model of a component, so they may terminate before generating a complete model.
Very closely related to our work are the algorithms that combine L* with black-box testing [48] and model checking [26,57]. The latter model-check predefined LTL properties against the generated abstraction, similar to [59,62]. Any counterexamples are then checked against the system. This results either in the conclusion that the system does not satisfy the property, or in a refinement of the abstraction to remove incorrect behaviours. Black-box testing [48] may be a pragmatic approach to identifying missing behaviours in an abstraction by simulating the learned model with a set of system execution traces. However, it is not guaranteed that the model admits all system traces, as this would require a complete set of execution traces.
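Simulating a learned model against recorded traces can be sketched as follows. The model encoding, the function names and the example traces are hypothetical. The sketch can only report traces it is given, so an empty result says nothing about behaviours that no trace exercised.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Possibly nondeterministic model: (state, event) -> set of successor states
Model = Dict[Tuple[str, str], Set[str]]


def admits(model: Model, init: Set[str], trace: Sequence[str]) -> bool:
    """Simulate the model on a trace, tracking all reachable states."""
    current = set(init)
    for event in trace:
        current = {t for s in current for t in model.get((s, event), set())}
        if not current:  # no run of the model can produce this event here
            return False
    return True


def missing_behaviours(model: Model, init: Set[str],
                       traces: List[List[str]]) -> List[List[str]]:
    """System traces that the learned model does not admit."""
    return [t for t in traces if not admits(model, init, t)]


learned = {("idle", "start"): {"run"}, ("run", "stop"): {"idle"}}
system_traces = [["start", "stop", "start"], ["start", "pause"]]
print(missing_behaviours(learned, {"idle"}, system_traces))
# [['start', 'pause']]
```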

Learning symbolic models
An open challenge with query-based active model learning is learning symbolic abstractions.Many practical applications of L* [12,17] and its variants are limited to learning system models defined over an a priori known finite alphabet consisting of Boolean events, such as function calls.Maler and Mens developed a symbolic version of the L* algorithm [42,43] to extend model inference to large alphabets by learning symbolic models where transitions are labelled with partitions of the alphabet.
In [1], manually constructed mappers abstract concrete values into a finite symbolic alphabet. However, different applications would require different mappers to be specified manually, which can be a laborious and error-prone process. The authors of [2] propose a CEGAR-based method to automatically construct mappers for a restricted class of Mealy machines that test for equality of data parameters but do not allow any data operations. In [29], CEGAR is used for automated alphabet abstraction refinement to preserve determinism in the generated abstraction. Given a model, the refinement procedure is triggered by counterexamples exposing non-determinism in the current abstraction.
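To make the role of a mapper concrete, the following hypothetical sketch abstracts a concrete input `check(temp)` into a finite symbol by comparing the parameter with a fixed threshold. The threshold value of 30, the symbol names and the functions are assumptions for illustration and are not taken from [1] or [2].

```python
from typing import List, Tuple

THRESHOLD = 30  # hypothetical abstraction boundary chosen by the engineer


def map_input(action: str, value: int) -> str:
    """Abstract a concrete parameterised input into a finite symbol."""
    return f"{action}_high" if value > THRESHOLD else f"{action}_low"


def abstract_trace(trace: List[Tuple[str, int]]) -> List[str]:
    """Translate a concrete trace into the learner's finite alphabet."""
    return [map_input(action, value) for action, value in trace]


print(abstract_trace([("check", 12), ("check", 31), ("check", 30)]))
# ['check_low', 'check_high', 'check_low']
```

The learner sees only the symbols. If the system's behaviour actually changed at some other boundary, this mapper would hide that change, which is why mappers must be specified with care for each application.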
The MAT* algorithm [5] generates symbolic finite automata (SFAs), in which transitions carry predicates over a Boolean algebra that can be efficiently learned using membership and equivalence queries. The inputs to the algorithm are a membership oracle, an equivalence oracle and a learning algorithm for the Boolean algebra of the target SFA. The algorithm has been used to learn SFAs over Boolean algebras with finite domains, the equality algebra, a Binary Decision Diagram (BDD) algebra, and SFAs over SFAs that accept finite sequences of strings. However, designing and implementing oracles for richer models, such as SFAs over the theory of linear integer arithmetic, is not straightforward. Such oracles would have to answer queries comprising valuations of multiple variables, some of which could have large and possibly infinite domains.
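An SFA can be represented with transitions labelled by predicates rather than individual letters. In the hypothetical sketch below, predicates over the integers are Python callables, so a transition with guard `x > 30` stands for infinitely many concrete letters. This is a toy representation, not the MAT* implementation. It assumes the automaton is deterministic, i.e., the guards leaving each state are mutually exclusive.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Guard = Callable[[int], bool]


class SFA:
    """Toy deterministic symbolic finite automaton over integer letters."""

    def __init__(self, init: str, accepting: Set[str],
                 transitions: Dict[str, List[Tuple[Guard, str]]]) -> None:
        self.init = init
        self.accepting = accepting
        self.transitions = transitions  # state -> [(guard, target)]

    def accepts(self, word: Sequence[int]) -> bool:
        state = self.init
        for letter in word:
            targets = [t for g, t in self.transitions.get(state, []) if g(letter)]
            if not targets:
                return False  # no guard is enabled: reject
            state = targets[0]  # guards assumed mutually exclusive
        return state in self.accepting


# Two states and four symbolic transitions accept exactly the words
# whose last letter exceeds 30, over an infinite alphabet.
sfa = SFA("low", {"high"}, {
    "low": [(lambda x: x > 30, "high"), (lambda x: x <= 30, "low")],
    "high": [(lambda x: x > 30, "high"), (lambda x: x <= 30, "low")],
})
print(sfa.accepts([5, 100, 42]))  # True
print(sfa.accepts([99, 7]))       # False
```

Learning such a model means learning both the state structure and the guard predicates. The predicates are where oracles for richer theories become hard to implement.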
In [7], an inferred Mealy machine is converted to a symbolic abstraction in a post-processing step. The algorithm, however, is restricted to learning models with simple predicates such as equality/inequality relations. The algorithm in [60] is restricted to generating Mealy machines with a single timer. Sigma* [9] extends the L* algorithm to learn symbolic models of software. Dynamic symbolic execution is used to find constraints on inputs and expressions generating outputs to build a symbolic alphabet. However, the behaviours modelled by the generated abstraction are limited to the input-output steps of the software. Although the algorithm generates complete symbolic abstractions, as illustrated in Table 2, an implementation of the algorithm is not publicly available for an experimental comparison. The SL* algorithm [11] extends query-based learning to infer register automata that model both control flow and data flow. Register automata have registers that can store input characters, and they allow comparisons with values already stored in registers, making them inherently more expressive than SFAs. RALib [10] implements the SL* algorithm and supports the inference of Input-Output Register Automata (IORAs). An IORA is a register automaton transducer that generates an output action after each input action.
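The extra expressiveness of register automata comes from storing data values and comparing later inputs with them. The following hypothetical sketch encodes a one-register input/output machine. On `set(v)` it stores v and outputs `Ok()`. On `check(v)` it outputs `Eq()` or `Neq()` depending on whether v equals the stored value. The action names and the encoding are assumptions for illustration and do not follow RALib's input format.

```python
from typing import List, Optional, Tuple


class OneRegisterIORA:
    """Toy input/output register automaton with a single register r."""

    def __init__(self) -> None:
        self.location = "empty"
        self.r: Optional[int] = None

    def step(self, action: str, value: int) -> str:
        """Consume one input action and emit one output action."""
        if action == "set":
            self.r = value  # register assignment: r := value
            self.location = "stored"
            return "Ok()"
        if action == "check" and self.location == "stored":
            # the guard compares the input parameter with the register content
            return "Eq()" if value == self.r else "Neq()"
        return "Error()"  # input not enabled in this location


def run(inputs: List[Tuple[str, int]]) -> List[str]:
    machine = OneRegisterIORA()
    return [machine.step(action, value) for action, value in inputs]


print(run([("check", 3), ("set", 7), ("check", 7), ("check", 8)]))
# ['Error()', 'Ok()', 'Eq()', 'Neq()']
```

Over an unbounded data domain, a finite-state SFA cannot express "equal to whatever value was stored earlier", since it would need a separate state for every possible stored value.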
We attempted to reverse-engineer the Simulink state-machine benchmarks, modelled as IORAs, using RALib. We present the results obtained for benchmarks B1 and B6. We modelled state machine B6 of a Home Climate Control Cooling system as an IORA with input action check(inp.temp), which takes a parameter inp.temp, and output actions On() and Off(), representing the operation modes of the system, as illustrated in Fig. 9. This IORA is fed to RALib as the system under learning (SUL). The models generated by our active learning approach and by RALib are provided in Fig. 10. Like our algorithm, RALib was able to accurately capture the system behaviours and generate an exact representation of the SUL, as is evident from Fig. 10b.
As illustrated in Fig. 11, we modelled state machine B1 of an Automatic Transmission Gear system as an IORA with input action check($\mathit{time}_{abs}$, $c_1$, $c_2$), which takes parameters $\mathit{time}_{abs}$, $c_1$ and $c_2$, and output actions One(), Two(), Three() and Four(), representing the four gears in the system. This IORA is fed to RALib as the SUL. The abstractions generated by our algorithm and RALib are provided in Fig. 12. RALib was only able to generate the partial model illustrated in Fig. 12b before timing out at 10 h. Our algorithm, on the other hand, generated a complete model (Fig. 12a) in less than 12 min, as reported in Table 1.
The basic tool implementation of RALib currently supports predicates featuring equality over integers and inequality over real numbers. In addition to equality/inequality relations, automaton transitions may also feature simple arithmetic expressions, such as increment by 1 and sum. However, these are still in the development stage and only partially supported, often tailored to specific domains such as TCP protocols [21]. Owing to the high query complexity, it is not obvious how the approach can be generalised to efficiently learn symbolic models over richer theories.
An extension of the SL* algorithm [23] uses taint analysis to improve performance by extracting constraints on input and output parameters.However, it currently does not allow the analysis of multiple or more involved operations on data values.

Use-cases and future work
In this article, we have presented a new active model-learning algorithm for learning abstractions of a system from its execution traces. The generated models are guaranteed to admit all system traces defined over a set of observables. This is particularly useful when system specifications are incomplete, so that implementation errors outside the scope of the defined requirements cannot be flagged. This is a common risk when essential domain knowledge is progressively lost as it passes from one team to another during the development life cycle. In such scenarios, manual inspection of the learned models can help identify errors in the implementation. The approach can also be used to evaluate test coverage for a given test suite and to generate new tests that address coverage holes.
In the future, we intend to explore these potential use cases further. This will drive improvements to reduce runtime, such as ways to guide the condition-check procedure towards non-spurious counterexamples. We also intend to investigate extensions of the approach to model recursive state machines.

Appendix 1: List of benchmarks
See Table 3.

Fig. 3 Overview of the active model-learning algorithm

Fig. 4 Counterexample-Guided Abstraction Refinement (CEGAR) loop

… in Definition 2 for a simulation relation; clause (a) is obtained by negating condition (1), and clauses (b) and (c) are obtained by negating condition (2) as follows. Condition (2) in Definition 2 can be written as … On negating the above expression, we get … Here, expression (9) corresponds to (clause (b) ∨ clause (c)).

Fig. 6 Symbolic representation of abstract model states and transitions

Fig. 7 Completeness hypothesis for a symbolic abstraction

Fig. 8 Example run of the active learning algorithm for a Home Climate Control Cooling system with observable system variables X = {mode_next, inp.temp, inp.humid, T_thresh, H_thresh}

Fig. 9 IORA modelling a Home Climate Control Cooling system (B6)

Fig. 10

Fig. 12

The dataset of Stateflow example models comprises 51 examples that are available in MATLAB 2018b. Out of the 51 Stateflow examples, Embedded Coder fails to generate code for 7; a total of 13 have no sequential behaviour, and 3 implement Recursive State Machines (RSMs).

Table 1
Results of experimental evaluation of the active learning algorithm

Table 2
Summary of related active model-learning implementations

Table 3
Mapping of Simulink Stateflow example models to their benchmark number B # used in this article