Compositional learning of mutually recursive procedural systems

This paper presents a compositional approach to active automata learning of Systems of Procedural Automata (SPAs), an extension of Deterministic Finite Automata (DFAs) to systems of DFAs that can mutually call each other. SPAs are of high practical relevance, as they allow one to efficiently learn intuitive recursive models of recursive programs after an easy instrumentation that makes calls and returns observable. Key to our approach is the simultaneous inference of individual DFAs for each of the involved procedures via expansion and projection: membership queries for the individual DFAs are expanded to membership queries of the entire SPA, and global counterexample traces are transformed into counterexamples for the DFAs of concerned procedures. This reduces the inference of SPAs to a simultaneous inference of the DFAs for the involved procedures for which we can utilize various existing regular learning algorithms. The inferred models are easy to understand and allow for an intuitive display of the procedural system under learning that reveals its recursive structure. We implemented the algorithm within the LearnLib framework in order to provide a ready-to-use tool for practical application which is publicly available on GitHub for experimentation.


Introduction
Formal validation and verification methods such as model-based testing [10] and model-checking [6,14] are an integral part of today's software development process. As software grows in size and complexity, (automated) validation and verification of system properties not only helps to find errors during the development process but often is a requirement for final acceptance tests.
Crucial for these techniques to be applied properly is a formal model of the system (components) to verify. However, one often faces situations where formal representations are not available, either because creating and maintaining a correct model is tedious and error-prone, or because it is not possible at all when dealing with legacy systems or third-party components. Active Automata Learning (AAL) has proven to be a powerful means to attack these problems in many applications by allowing one to infer behavioral models fully automatically on the basis of testing [24,31,35,41].
Bernhard Steffen (steffen@cs.tu-dortmund.de), Markus Frohme (markus.frohme@cs.tu-dortmund.de): Chair of Programming Systems, Faculty of Computer Science, TU Dortmund, Dortmund, Germany
The fact that AAL as a testing-based technology is neither correct nor complete can nicely be compensated for via monitoring-based lifelong learning which became practical with the development of the TTT algorithm [29]. Essential for the success of AAL is the availability of powerful tools like [9,13,30,46] and the continuous development of more and more expressive frameworks capturing increasingly many system properties like input/output behavior [25], data [1,8,15,22,23,27,28,34], probability [16], additional computational structures like hierarchy/recursion [26,33] and parallelism.
This paper presents a compositional approach to active automata learning of Systems of Procedural Automata (SPAs), an extension of Deterministic Finite Automata (DFAs) to systems of DFAs that can mutually call each other according to the classical copy-rule semantics (cf. Fig. 2) used already in early programming languages like Algol 60 [38]. SPAs are of high practical relevance as they allow one to efficiently learn intuitive recursive models of recursive programs after an easy instrumentation that makes calls and returns observable.
Key to our approach is the simultaneous inference of individual DFAs for each of the involved procedures in a modular fashion using well-known learning algorithms for deterministic finite automata [5,21,29,32,44]. Technically, our approach is based on a translation layer that bridges between the view of the entire system and the local view concerning the individual procedures: Local queries of procedural automata are expanded to global queries of the instrumented system under learning, and global counterexample traces for the global system are projected onto local counterexample traces for the concerned procedural automata. This translation layer introduces a negligible query overhead, as queries can be directly mapped between the two scopes and we show that counterexample projection can be implemented in a binary-search fashion.
Figure 1 illustrates three essential characteristics of SPAs:
- the intuitive structure: the operational semantics of SPAs follow the copy-rule semantics (cf. Fig. 2), i.e. upon encountering a procedural call the control is transferred to the respective procedural automaton from which it can only return at specific states. This is a universal concept that is independent of the automaton type of the procedures as it can be realized on a purely syntactical level, e.g. via graph transformation/rewriting [43]. In this paper, we focus on a context-free language/acceptor-based perspective where successful runs of a system correspond to accepted words (wrt. the underlying language). Extending the SPA principle to other automaton types, such as transducers (e.g. Mealy machines), is on our future research agenda. The SPA in Fig. 1 is composed of non-deterministic finite automata (NFAs) to emphasize the trivial translation between SPAs and the production rules of context-free grammars (CFGs); see Listing 1 for the corresponding representation in BNF. In this paper, however, we focus on (equivalent) deterministic procedures (DFAs) which are more common in the active automata learning community. See Fig. 6 for the deterministic version of the SPA of Fig. 1. In order to describe context-free systems via DFAs, we assume the procedural calls as observable, which we justify by the fact that in practice the required observability can be achieved via easy instrumentation. When ignoring this control overhead, the set of accepting runs coincides with the context-free language corresponding to the procedural system.
- the expressive power: SPAs cover the full spectrum of context-free languages. The SPA shown in Fig. 1 "implements" the language of all palindromes over the alphabet {a, b, c}.
- the role of procedural names (non-terminals): they can be considered as "architectural knowledge" about the system to be learned. In this example it imposes a (here intended, but from the mere language perspective unnecessary) separate treatment of symbol c, something which could not be observed simply on the basis of terminal words. This allows one to represent the compositional architecture of the system in terms of intuitive models.
F -> a | a F a | b | b F b | G | ε
G -> c | c G c | F
Listing 1 Production rules in BNF for the language of palindromes over the three characters a, b and c.
Similar to the classical learning of DFAs, our learning algorithm for SPAs terminates with a canonical SPA for the considered language and simply requires a so-called membership oracle, i.e. a "teacher" that is able to answer the required word problems, as long as one accepts that the so-called equivalence queries are typically approximated using membership queries in practice. Even better, its (query) complexity also remains unchanged, as the effort is dominated by the learning of the individual procedural automata. As shown in Sect. 8, our approach yields significant performance improvements compared to existing learning algorithms for similar systems such as visibly pushdown automata (VPAs).

Fig. 2
The copy-rule semantics: for each procedural invocation the automaton of the called procedure is copied into the automaton of the calling procedure. This concrete example shows the first expansion step for the recursive invocation of G in procedure G. When labeling the dotted transitions with observable call and return symbols, the language of this (potentially infinite state) automaton coincides with the language of our instrumented system (cf. Listing 2). When interpreting them as direct (i.e. ε) transitions, the language coincides with the original context-free language.
An implementation of the presented algorithm is publicly available at https://github.com/LearnLib/learnlib-spa and open to everyone for experimentation. Our implementation utilizes the LearnLib [30] library and therefore comes with direct support for practical application to real systems.

Outline
We continue in Sect. 2 with introducing the results of related fields of research and sketching preliminary terminology. Section 3 formalizes our concept of systems of procedural automata. In Sect. 4 we describe the essential concepts for the inference process of SPAs by formalizing the translation layer and the different phases of the learning process. Section 5 presents our approach to efficiently analyze and project global counterexamples for SPA hypotheses and Sect. 6 aggregates the previous concepts in a sketch of a learning algorithm. Sections 7 and 8 present a theoretical and empirical evaluation of the algorithm and Sect. 9 concludes the paper and provides some directions for future work.

Related work and preliminaries
The idea of SPAs was originally introduced under the name Context-Free Process Systems (CFPSs) in [11] for model checking and has since then been adapted to several similar formalisms such as Recursive State Machines (RSMs) in [2]. Calling them Systems of Procedural Automata here and focusing on deterministic automata is meant to better address the automata learning community. The formal foundation of our learning approach, similar to many other active automata learning algorithms, is the minimal adequate teacher (MAT) framework proposed by Angluin [5]. Key to this framework is the existence of a "teacher" that is able to answer membership queries, i.e. questions whether a word is a member of the target language, and equivalence queries, i.e. questions whether a tentative hypothesis exactly recognizes the target language.
The process of inferring a (regular) blackbox language is then given by discovering the equivalence classes of the Myhill-Nerode congruence [39] of the target language by means of partition refinement. This is usually done in an iterative loop consisting of the following two phases:
- exploration: the learning algorithm poses membership queries for exploring language characteristics and constructing a tentative hypothesis.
- verification: upon hypothesis stabilization, an equivalence query is posed that either indicates equivalence (thus terminating the learning process) or yields a counterexample (a word over the given input alphabet) which exposes a difference between the language of the tentative hypothesis and the unknown target language. This counterexample can then be used to refine the tentative hypothesis (by splitting a too coarse partition class) and start a new exploration phase.
We expect the reader to be familiar with the general process and formalities of active automata learning. For a more thorough introduction (to the regular case) see e.g. [45] or [32,Chapter 8]. Regular languages are not powerful enough to capture the key characteristics of procedural systems which inherently support (potentially infinite) recursive calls between their sub-procedures. These semantics are, however, expressible with context-free languages. Angluin herself already reasoned about the inference of context-free languages [5], but her extensions required for answering, e.g., membership queries have (at least to the knowledge of the authors) prevented any practical application.

Listing 2
The palindrome system after our proposed instrumentation. Each procedure (F, G) is now treated as an observable call symbol indicating the start of a procedure and a separate return symbol (R) has been added to denote a procedure's termination. To maintain the hierarchy of the original language, new non-terminals (X's) have been used.
For inferring context-free/procedural systems, we propose an instrumentation similar to the idea of parenthesis grammars [36]: Each invocation of a procedure can be observed by means of a call symbol which denotes the start of a specific procedure, as well as a return symbol which denotes its termination. An example of this instrumentation is given in Listing 2 which shows the instrumented system of palindromes of Listing 1.
This instrumentation can easily be integrated into software programs with imperative structure, but concepts such as aspect-oriented programming or proxying (in object-oriented programming) also allow one to intercept method invocations and terminations and therefore grant the fine-grained control we require. For certain application domains, especially tag languages such as XML, these observable entry and exit points require no instrumentation at all. We exploit this property in [17] for inferring blackbox DTDs, a CFG-like language for describing the structure of XML documents.
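To make the idea of intercepting invocations and terminations concrete, the following Python sketch wraps two toy procedures with a decorator that records a call symbol on entry and the return symbol R on exit. The names `observable`, `emit` and `TRACE` are hypothetical, and the two procedures only follow the rules F -> a F a | G and G -> c of Listing 1; this is an illustration under those assumptions, not the instrumentation used in our implementation.

```python
import functools

TRACE = []  # the instrumented trace that an observer of the system would see


def observable(procedure):
    """Wrap a procedure so that its invocation and termination become visible."""
    @functools.wraps(procedure)
    def wrapper(*args, **kwargs):
        TRACE.append(procedure.__name__)  # call symbol, e.g. "F"
        try:
            return procedure(*args, **kwargs)
        finally:
            TRACE.append("R")             # the single return symbol
    return wrapper


def emit(symbol):
    """Record an internal (terminal) action."""
    TRACE.append(symbol)


@observable
def F(depth):
    # Follows F -> a F a | G from Listing 1.
    if depth == 0:
        G()
    else:
        emit("a")
        F(depth - 1)
        emit("a")


@observable
def G():
    # Follows G -> c from Listing 1.
    emit("c")


F(1)
print(" ".join(TRACE))  # F a F G c R R a R
```

Removing the symbols F, G and R from the recorded trace yields the palindrome a c a of the original language.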
The idea of assigning specific semantics to certain input symbols is conceptually related to visibly pushdown languages (VPLs) [3,4] proposed by Alur et al. Intrinsic to these languages is that the stack operations of the corresponding visibly pushdown automaton (VPA) are bound to the observation of certain symbols. The characterizations given by Alur et al. have been used by Kumar et al. [33] and Isberner [26,Chapter 6] to formulate learning algorithms for visibly pushdown languages, requiring only classic membership queries as well.
Alur et al. have shown that in general there exists no unique (up to isomorphism) minimal VPA for a visibly pushdown language without further restricting the automaton to a fixed set of modules. Therefore, they propose n-single entry visibly pushdown automata (n-SEVPAs) where the set of call symbols $\Sigma_{call}$ is partitioned into n classes which are then "individually" represented in terms of n inter-connected structures. Such partitions support canonical representations, which in particular means that there exist canonical 1-SEVPA and $|\Sigma_{call}|$-SEVPA representations.
SPAs are conceptually close to $|\Sigma_{call}|$-SEVPAs and indeed, our proposed instrumentation transforms any context-free language into a visibly pushdown language, which allows us to compare the two approaches in Sect. 8. However, SPAs exhibit the following key advantages:
- The main difference between SPAs and n-SEVPAs concerns the treatment of the interrelation between sub-structures. Both in SPAs and n-SEVPAs, observing a call symbol is guaranteed to transition the system into a (for each call symbol) unique configuration. However, only for SPAs does the same hold for observing the return symbol. While in general this allows n-SEVPAs to describe more complex languages, our instrumented systems do not require this complexity. Instead, a VPA learning algorithm has to compensate for this lack of certainty by posing more queries during the learning process. As Sect. 8 shows, this directly impacts the performance of the learning process, allowing our SPA learning algorithm to outperform the VPA approach by more than one order of magnitude in (symbol) query performance for small examples already.
- The generality of the SPA representation allows one to "implement" the semantics of an SPA via a variety of formalisms: SPAs can be realized via pushdown semantics (similar to VPAs), via in-place graph expansion (cf. Fig. 2), and directly via context-free grammars (e.g. using the CYK algorithm [20,Chapter 7]). They allow one to choose the best implementation for a specific situation, making them an implementation-agnostic meta model for context-free systems.
- The SPA structure is intuitive for everybody with programming knowledge and can directly be used also for μ-calculus-based model checking of context-free [11] and even pushdown processes [12], quite in contrast to the VPA representation and its "hard-coded" stack interpretation which is indeed quite cumbersome (cf. Sect. 8).
- As we will show, the compositional nature of SPAs allows us to learn the individual procedures with any learning algorithm for regular languages. Consequently, improvements in the field of regular language inference (e.g. the handling of redundancy in counterexamples in TTT [29]) seamlessly transfer to our procedural learning scenario.
The rest of this section summarizes formal definitions and notations we use throughout the paper.

Definition 1 (SPA alphabet)
An SPA alphabet $\Sigma = \Sigma_{call} \uplus \Sigma_{int} \uplus \{r\}$ is the disjoint union of three finite sets, where $\Sigma_{call}$ denotes the call alphabet, $\Sigma_{int}$ denotes the internal alphabet and $r$ denotes the return symbol.
An SPA alphabet can be seen as a special case of a visibly pushdown alphabet [3,4]. However, we choose a distinct name here in order to address the specifics of using only a single return symbol and to emphasize that calling a procedure does not necessarily involve any kind of stack operations. For our palindrome examples in Listings 1 and 2, the alphabet definition is given by $\Sigma = \{F, G\} \uplus \{a, b, c\} \uplus \{R\}$. We write $\Sigma^*$ to denote the set of all words over an alphabet $\Sigma$ and we use $\cdot$ to denote the concatenation of symbols and words.
Furthermore, we distinguish between global and procedural interpretations of words and symbols, where we use $\hat{\ }$ to denote the procedural context. We write $\hat{\Sigma} = \hat{\Sigma}_{call} \uplus \hat{\Sigma}_{int} \uplus \{\hat{r}\}$ and $\hat{w} \in \hat{\Sigma}^*$, accordingly. We add (remove) $\hat{\ }$ to (from) individual words or symbols in order to change the context of a word or symbol to a procedural (global) one. We continue to use $\Sigma$ as a shorthand notation for $\Sigma_{call} \uplus \Sigma_{int} \uplus \{r\}$ and $\hat{\Sigma}$ as a shorthand notation for $\hat{\Sigma}_{call} \uplus \hat{\Sigma}_{int} \uplus \{\hat{r}\}$.
In the following, we are especially interested in subwords: For $1 \le i \le j \le |w|$, where $|w|$ denotes the length of a word $w$, we write $w[i, j]$ to denote the sub-sequence of $w$ starting at the symbol at position $i$ and ending at position $j$ (inclusive). We write $w[i, ]$ ($w[, j]$) to denote the suffix starting at position $i$ (prefix up to and including position $j$). For any $i > j$, $w[i, j]$ denotes the empty word $\varepsilon$.
Due to our proposed instrumentation, we focus especially on well-matched instrumented words. Intuitively, a well-matched word is a word where every call symbol is succeeded (at some point) by a matching return symbol and no unmatched call or return symbols exist, such that a well-matched nesting structure is obtained. Formally, we define the set of well-matched words by induction.
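Definition 2 (Well-matched words)
Let $\Sigma$ be an SPA alphabet. We define the set of well-matched words $WM(\Sigma) \subset \Sigma^*$ as the smallest set satisfying the following properties:
- Every word of only internal symbols is well-matched, i.e. $\Sigma_{int}^* \subseteq WM(\Sigma)$.
- If $w \in WM(\Sigma)$, then $c \cdot w \cdot r \in WM(\Sigma)$ for all $c \in \Sigma_{call}$.
- If $w_1, w_2 \in WM(\Sigma)$, then $w_1 \cdot w_2 \in WM(\Sigma)$.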
We call well-matched words rooted if they start with a call symbol and end with a return symbol. In order to specify the scope of a procedural subsequence and to talk about the depth of nested procedural invocations, we further introduce the concept of a call-return balance.
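Definition 3 (Call-return balance)
Let $\Sigma$ be an SPA alphabet. The call-return balance is a function $\beta : \Sigma^* \to \mathbb{Z}$, defined as
$$\beta(\varepsilon) = 0, \qquad \beta(u \cdot v) = \beta(v) + \begin{cases} 1 & \text{if } u \in \Sigma_{call} \\ 0 & \text{if } u \in \Sigma_{int} \\ -1 & \text{if } u = r \end{cases} \qquad \text{for all } u \in \Sigma,\ v \in \Sigma^*.$$
For a well-matched word $w \in WM(\Sigma)$ we have that every prefix $u$ satisfies $\beta(u) \ge 0$ and every suffix $v$ satisfies $\beta(v) \le 0$.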
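The call-return balance can be computed in a single pass over a word. The following Python sketch assumes that words are lists of single-character symbols over the palindrome alphabet of Listing 2; the function names `balance` and `is_well_matched` are hypothetical. It also uses the balance to test well-matchedness (no prefix with negative balance and an overall balance of zero), which is one common way to phrase the intuition given above.

```python
CALLS = {"F", "G"}   # call alphabet of the palindrome example
RETURN = "R"         # the single return symbol


def balance(word):
    """Number of call symbols minus number of return symbols in `word`."""
    total = 0
    for symbol in word:
        if symbol in CALLS:
            total += 1
        elif symbol == RETURN:
            total -= 1
        # internal symbols do not change the balance
    return total


def is_well_matched(word):
    """True if no prefix has a negative balance and the whole word has balance 0."""
    prefixes_ok = all(balance(word[:i]) >= 0 for i in range(len(word) + 1))
    return prefixes_ok and balance(word) == 0


word = list("FaFGcRRaR")
print(balance(word), is_well_matched(word))  # 0 True
```

Recomputing the balance for every prefix keeps the sketch close to the definition; a production implementation would track a running counter instead.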
We further introduce a find-return function that allows us to extract the earliest unmatched return symbol from a (sub-) word, and an instances set that describes all call symbols and their respective indices in a word.
Definition 4 (Find-return function)
Let $\Sigma$ be an SPA alphabet and $w \in \Sigma^*$. We define the find-return function $\rho_w : \mathbb{N} \to \mathbb{N}$ as
Definition 5 (Instances set)
Let $\Sigma$ be an SPA alphabet and $w \in \Sigma^*$. We define the instances set $Inst_w \subseteq \Sigma_{call} \times \mathbb{N}$ as
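One possible reading of these two notions, based only on the informal description above, is sketched below in Python. It assumes 1-based positions (as in the subword notation $w[i, j]$), takes the find-return function to yield the first position at or after $x$ whose return symbol is not matched by a call at or after $x$, and takes the instances set to pair each call symbol with its position. The names `find_return` and `instances` are hypothetical.

```python
CALLS = {"F", "G"}   # call alphabet of the palindrome example
RETURN = "R"         # the single return symbol


def find_return(word, x):
    """1-based index of the earliest unmatched return symbol in word[x..], or None."""
    depth = 0
    for i in range(x, len(word) + 1):
        symbol = word[i - 1]
        if symbol in CALLS:
            depth += 1
        elif symbol == RETURN:
            depth -= 1
            if depth < 0:        # this return closes a call opened before position x
                return i
    return None


def instances(word):
    """All (call symbol, 1-based position) pairs occurring in `word`."""
    return {(symbol, i) for i, symbol in enumerate(word, start=1) if symbol in CALLS}


word = list("FaFGcRRaR")
print(sorted(instances(word), key=lambda pair: pair[1]))
print(find_return(word, 5))  # 6: the return matching the call of G at position 4
```

For a call at position $i$, the body of the invoked procedure then spans positions $i + 1$ to `find_return(word, i + 1) - 1`.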

Systems of procedural automata
In this section we present the base formalism of our approach: orchestrating regular systems to a system of procedural automata.

Orchestrating regular DFAs to procedural systems
We start with introducing procedural automata which form the core components of our systems of automata. Intuitively, they describe the possible actions of a single procedure and therefore an essential part of the global system behavior.
Definition 6 (Procedural automaton)
Let $\Sigma$ be an SPA alphabet and $c \in \Sigma_{call}$ denote a procedure. A procedural automaton for procedure $c$ over $\Sigma$ is a deterministic finite automaton $P^c = (Q^c, q_0^c, \delta^c, Q_F^c)$, where
- $Q^c$ denotes the finite, non-empty set of states,
- $q_0^c \in Q^c$ denotes the initial state,
- $\delta^c : Q^c \times (\hat{\Sigma}_{call} \cup \hat{\Sigma}_{int}) \to Q^c$ denotes the transition function, and
- $Q_F^c \subseteq Q^c$ denotes the set of accepting states.
We define $L(P^c)$ as the language of $P^c$, i.e. the set of all accepted words of $P^c$.
In essence, procedural automata resemble regular DFAs over the joined alphabet of call symbols and internal symbols and accept the language of right-hand sides of the production rules of a non-terminal in a (non-instrumented) context-free grammar. Internal symbols correspond to "terminal" actions, i.e. direct system actions, whereas call symbols correspond to (recursive) "calls" to other procedures. Please note that accepting states are used here to express that a procedure can terminate after a sequence of actions instead of using the (artificial) return symbol.
Definition 7 (System of procedural automata)
Let $\Sigma$ be an SPA alphabet with $\Sigma_{call} = \{c_1, \ldots, c_q\}$. A system of procedural automata $S$ over $\Sigma$ is given by the tuple of procedural automata $(P^{c_1}, \ldots, P^{c_q})$ such that for each call symbol there exists a corresponding procedural automaton. The initial procedure of $S$ is denoted as $c_0 \in \Sigma_{call}$.
An example of such a system of procedural automata is given by the two DFAs in Fig. 6. We will continue to use $S$ as a shorthand notation for $(P^{c_1}, \ldots, P^{c_q})$.
Intuitively, the parallels between SPAs and context-free grammars should be clear to everyone with a basic understanding of context-free languages. To formally define the language of an SPA, we use structural operational semantics (SOS, [42]), incorporating stack semantics. Using an SOS-based semantic definition allows us to abstract from implementation details (e.g. graph-transformations, grammar-based interpretation, etc.) and simplifies the proofs. We write for some states $s_1, s_2$ and some control components $\sigma_1, \sigma_2$ to denote that this transformation (if applicable) emits an output symbol $o$. We generalize this notation to output sequences by writing to denote that there exists a sequence of individual (applicable) transformations starting in configuration $(s_1, \sigma_1)$ and ending in configuration $(s_2, \sigma_2)$, whose concatenation of output symbols yields $w$.
To define the semantics of SPAs by means of SOS rules, we first define a stack to model the control components of the SOS rules and then define the language of an SPA. This allows us to define the semantics of an SPA in terms of its associated language.
Definition 9 (Language of an SPA) Let Σ be an SPA alphabet and S be an SPA over Σ. Using tuples from Σ * × ST (Γ ) to denote a system configuration, we define three kinds of SOS transformation rules: 1. call-rules: The language of an SPA is then defined as Please note that by choosing ( c 0 , ⊥) as the initial configuration, we ensure that all words of a (non-empty) SPA language are rooted because of the mandatory initial application of a call-rule to consume c 0 . However, if for example L(P c 0 ) = ∅, we have L(S) = ∅ as well.
To give an intuition for the operational semantics let us give an exemplary run for the SPA $S = (P^F, P^G)$ of Fig. 6, using F as the initial procedure: We start with the configuration $(\hat{F}, \bot)$.
Since $a \cdot F \cdot a \in L(P^F)$, we can apply a call-rule to perform the transition
Parsing the internal symbol a via the corresponding int-rule, we perform
Since $G \in L(P^F)$, we can apply a call-rule to perform the transition
Since $c \in L(P^G)$, we can apply a call-rule to perform the transition
Parsing the internal symbol c via the corresponding int-rule, we perform
Now we use two ret-rules to parse two consecutive return symbols
Parsing the internal symbol a via the corresponding int-rule, we perform
Applying a ret-rule again, we get
Here, no more transformations are applicable and the process stops. Collapsing these individual steps, we have
In the following, we will call a word $w \in \Sigma^*$ admissible in an SPA $S$, if there exist configurations $(s_1, \sigma_1)$,
Please note, while the language of an SPA consists only of instrumented words, the non-instrumented language of the original context-free language (cf. Listing 2) can be easily obtained by post-processing each word of the SPA language and removing each of the instrumentation symbols.
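The run above can be replayed with an explicit call stack. The following Python sketch is a deterministic, stack-based reading of the call-, int- and ret-rules and is not a transcription of the SOS rules themselves. It assumes trie-shaped DFAs built from the right-hand sides of Listing 1 (which need not coincide with the automata of Fig. 6); `build_dfa`, `accepts` and `strip_instrumentation` are hypothetical names.

```python
CALLS = {"F", "G"}
INTERNALS = {"a", "b", "c"}
RETURN = "R"

# Right-hand sides of Listing 1, read as finite procedural languages ("" is ε).
RULES = {
    "F": ["", "a", "b", "G", "aFa", "bFb"],
    "G": ["c", "cGc", "F"],
}


def build_dfa(words):
    """Trie-shaped partial DFA: (transition dict, accepting states), initial state 0."""
    delta, accepting, fresh = {}, set(), 1
    for word in words:
        state = 0
        for symbol in word:
            if (state, symbol) not in delta:
                delta[(state, symbol)] = fresh
                fresh += 1
            state = delta[(state, symbol)]
        accepting.add(state)
    return delta, accepting


PROCEDURES = {name: build_dfa(rhs) for name, rhs in RULES.items()}


def accepts(word, initial="F"):
    """Check an instrumented word against the SPA, starting in procedure `initial`."""
    if not word or word[0] != initial:
        return False
    stack = []                    # suspended (procedure, successor state) pairs
    proc, state = initial, 0      # currently executing procedure and its state
    for pos, symbol in enumerate(word[1:], start=1):
        delta, accepting = PROCEDURES[proc]
        if symbol in INTERNALS or symbol in CALLS:
            if (state, symbol) not in delta:
                return False      # missing transition acts as a rejecting sink
            if symbol in CALLS:
                stack.append((proc, delta[(state, symbol)]))
                proc, state = symbol, 0
            else:
                state = delta[(state, symbol)]
        elif symbol == RETURN:
            if state not in accepting:
                return False      # a procedure may only return in an accepting state
            if not stack:         # the initial procedure returns
                return pos == len(word) - 1
            proc, state = stack.pop()
        else:
            return False
    return False                  # input ended before the initial procedure returned


def strip_instrumentation(word):
    """Remove call and return symbols to recover the non-instrumented word."""
    return [s for s in word if s in INTERNALS]


run = list("FaFGcRRaR")
print(accepts(run), "".join(strip_instrumentation(run)))  # True aca
```

Pushing the successor state of the caller at call time mirrors the copy-rule intuition: the called automaton runs in place and control resumes in the caller once an accepting state is left via the return symbol.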

Essentials of SPA inference
As Theorem 1 will show, an SPA is fully characterized by its procedural automata. Therefore, the key idea of our learning algorithm for inferring an SPA is to infer each of the procedures/procedural automata simultaneously by using an active learning algorithm for the individual (regular) procedural languages. Within the MAT framework, this requires that the local learning algorithms can explore the local procedures of a global system and that (global) counterexample information can be returned to the local learners.
Key to our approach is a translation layer that bridges between the global system view and the local view concerning the individual procedural automata: Local queries of procedural automata are expanded to global queries of the instrumented SUL, and global counterexample traces for the global system are projected onto local counterexample traces for the concerned procedural automata.
To be able to perform these translations, we maintain so-called access, terminating and return sequences. Intuitively, these sequences store information about how a procedural automaton can be accessed, what a successfully terminating run of a procedure looks like, and how global termination can be achieved after executing a procedure (accessed by the matching access sequence).
For a procedure $c \in \Sigma_{call}$, we formalize the pairs of access and return sequences by means of a context $Cont_c$ and the set of successfully terminating runs by means of a set $TS_c$.
Definition 10 (Access, terminating, and return sequences)
Let $\Sigma$ be an SPA alphabet and $S$ be an SPA over $\Sigma$. The context of a procedure $c \in \Sigma_{call}$, $Cont_c \subseteq \Sigma^* \times \Sigma^*$, and the set of terminating sequences $TS_c \subseteq \Sigma^*$ are defined as
From the exemplary word $F \cdot a \cdot F \cdot G \cdot c \cdot R \cdot R \cdot a \cdot R$ of Sect. 3 one can extract the following access, terminating and return sequences for procedure G:
Since an SPA can accept multiple words and a procedure can be called multiple times, there can also exist multiple terminating sequences and (matching) access and return sequence pairs. In the following, we do not depend on a specific instance of the three sequences as long as they satisfy the above properties. To refer to an arbitrary instance of an access, terminating and (matching) return sequence for a procedure p, we write as[p], ts[p] and rs[p], respectively. We detail in Sect. 6 how we obtain these sequences throughout the learning process; in the following, assume them as available.
We continue to explain in the next two subsections how our two translations are realized, where we begin with the simpler query expansion.
Fig. 3 The expansion of a local query of a procedural automaton p to a global query of the instrumented SUL
Membership query expansion
Membership query expansion proceeds by symbol-wise processing of the proposed (local) query. Internal symbols are left unchanged and each (procedural) call symbol $\hat{c} \in \hat{\Sigma}_{call}$ is replaced with the concatenation of its global equivalent $c \in \Sigma_{call}$, the corresponding terminating sequence of $c$, and the return symbol. This expansion step is formalized in Definition 11.
In order to embed an expanded query into the correct global context, we further prepend the corresponding access sequence and append the corresponding return sequence to the translated query of the procedure in question. The complete expansion step is illustrated in Fig. 3.
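The two steps just described (symbol-wise expansion, then embedding into a global context) can be sketched in Python as follows. The sketch assumes that an access sequence ends with the call symbol of the procedure in question and that the matching return sequence starts with its return symbol; the concrete sequences are read off the exemplary word F · a · F · G · c · R · R · a · R, and the names `expand` and `global_query` are hypothetical.

```python
CALLS = {"F", "G"}
RETURN = "R"

# Hypothetical sequences extracted from the word F a F G c R R a R.
ACCESS = {"F": ["F"], "G": ["F", "a", "F", "G"]}
TERMINATING = {"F": ["G", "c", "R"], "G": ["c"]}
RETURNS = {"F": ["R"], "G": ["R", "R", "a", "R"]}


def expand(local_word):
    """Replace each call symbol c by c, a terminating sequence of c, and the return symbol."""
    expanded = []
    for symbol in local_word:
        expanded.append(symbol)
        if symbol in CALLS:
            expanded.extend(TERMINATING[symbol])
            expanded.append(RETURN)
    return expanded


def global_query(procedure, local_word):
    """Embed the expanded local query between an access and a return sequence."""
    return ACCESS[procedure] + expand(local_word) + RETURNS[procedure]


# Local query "c G c" for procedure G becomes a global query for the SUL.
print(" ".join(global_query("G", ["c", "G", "c"])))
# F a F G c G c R c R R a R
```

Each local query thus results in exactly one global query, whose length grows with the lengths of the stored access, terminating and return sequences.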
Counterexample projection
During counterexample analysis (cf. Sect. 4.3), one extracts from a global, instrumented trace of the SUL a sub-sequence that exposes wrong behavior in one of the procedural hypotheses. This sub-sequence, however, contains symbols of our instrumentation, which we need to transform to a local, procedural counterexample in order to be processable by the local learner. The corresponding projection step which essentially reverses query expansion is a bit more involved.
Again, we process the global trace symbol-wise and leave internal symbols unchanged. However, when we encounter an instrumented call symbol $c \in \Sigma_{call}$, we replace it with the corresponding procedural call symbol $\hat{c} \in \hat{\Sigma}_{call}$ and skip forward until we have reached the matching return symbol. This procedure is formalized in Definition 12.
Definition 12 (Alpha projection)
Let $\Sigma$ be an SPA alphabet. The alpha projection α :
Note that since α only accepts well-matched words, the call to ρ will always be able to find a valid return index in $v$ when $u$ is a call symbol. Furthermore, $u$ will never be a return symbol, because we always skip over any nested return symbols in case of $u \in \Sigma_{call}$. See Fig. 4 for an illustration of this projection step.
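The projection step described above (keep internal symbols, keep each top-level call symbol, skip everything up to and including its matching return) can be sketched in Python as follows. The sketch tracks the nesting depth instead of calling a find-return function, which has the same effect on well-matched input; since hats cannot be expressed in plain strings, procedural and global symbols share the same spelling here. The name `project` is hypothetical.

```python
CALLS = {"F", "G"}
RETURN = "R"


def project(word):
    """Project a well-matched global word onto a local, procedural word."""
    local, depth = [], 0
    for symbol in word:
        if depth == 0 and symbol != RETURN:
            local.append(symbol)      # top-level internal or call symbol
        if symbol in CALLS:
            depth += 1                # everything until the matching return is skipped
        elif symbol == RETURN:
            depth -= 1
            if depth < 0:
                raise ValueError("word is not well-matched")
    if depth != 0:
        raise ValueError("word is not well-matched")
    return local


# Body of the outermost invocation of F in the word F a F G c R R a R.
print(" ".join(project(list("aFGcRRa"))))  # a F a
```

Applied to an expanded query, the projection returns the original local query, which is the sense in which projection reverses expansion.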

Localization theorem
With the concept of projection (cf. Definition 12) we establish in Theorem 1 a characteristic connection between the global language of an SPA and the local languages of its individual procedures, which will play an integral role in our inference process.
Theorem 1 (Localization theorem)
Let $\Sigma$ be an SPA alphabet, $S$ be an SPA over $\Sigma$ and $w \in WM(\Sigma)$ be rooted in $c_0$.
Proof This equivalence is based on the fact that for every emitted call symbol $c$ of the SPA, there needs to exist a corresponding word $v \in L(P^c)$. One can verify this property for each call symbol by checking the membership of the projected, procedural trace in the language of the respective procedural automaton. For the full proof, see "Appendix".
Theorem 1 guarantees that a word $w \in WM(\Sigma)$ which is rooted in the initial procedure belongs to the language of an SPA if and only if each (projected) procedural subsequence within $w$ belongs to the language of its respective procedural automaton. It is important to note that this equivalence establishes a notion of modularity between an SPA and its procedural automata: Procedural automata contribute to the semantics of an SPA only by their respective procedural membership properties; no further requirements (e.g. about their internal structure) are needed. In particular, this property enables us to use arbitrary (MAT-based) regular learners for inferring the procedural automata and, consequently, for inferring an SPA.
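The characterization stated in the paragraph above can be turned into a purely local membership check. The following Python sketch assumes that the procedural languages are given as finite sets of local words (the right-hand sides of Listing 1) and checks, for every call instance of a rooted word, whether the projected body belongs to the language of the called procedure. All function names are hypothetical.

```python
CALLS = {"F", "G"}
RETURN = "R"

# Procedural languages of the palindrome example as finite sets of local words.
LOCAL = {
    "F": {(), ("a",), ("b",), ("G",), ("a", "F", "a"), ("b", "F", "b")},
    "G": {("c",), ("c", "G", "c"), ("F",)},
}


def rooted_in(word, initial):
    """True if `word` is well-matched and consists of a single invocation of `initial`."""
    if not word or word[0] != initial:
        return False
    depth = 0
    for i, symbol in enumerate(word):
        if symbol in CALLS:
            depth += 1
        elif symbol == RETURN:
            depth -= 1
        if depth <= 0 and i < len(word) - 1:
            return False              # the initial procedure returned too early
    return depth == 0


def call_bodies(word):
    """(procedure, body) for every call instance of a well-matched word."""
    stack, bodies = [], []
    for i, symbol in enumerate(word):
        if symbol in CALLS:
            stack.append(i)
        elif symbol == RETURN:
            start = stack.pop()
            bodies.append((word[start], word[start + 1:i]))
    return bodies


def project(body):
    """Keep top-level symbols only, skipping the bodies of nested calls."""
    local, depth = [], 0
    for symbol in body:
        if depth == 0 and symbol != RETURN:
            local.append(symbol)
        depth += (symbol in CALLS) - (symbol == RETURN)
    return tuple(local)


def in_language(word, initial="F"):
    """Global membership decided by local membership of all projected bodies."""
    if not rooted_in(word, initial):
        return False
    return all(project(body) in LOCAL[proc] for proc, body in call_bodies(word))


print(in_language(list("FaFGcRRaR")), in_language(list("FaFGcRRbR")))  # True False
```

Note that the check never inspects the internal structure of a procedure, only its membership answers, which is exactly the modularity exploited by the learner.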
Furthermore, we can show that our SPA approach integrates into the MAT framework as well, by demonstrating how to realize the exploration and verification phases of the MAT framework.

Exploration phase
In Corollary 1 we formalize that each local membership query $\hat{w} \in (\hat{\Sigma}_{call} \cup \hat{\Sigma}_{int})^*$ can be answered by querying the global SPA with its expanded version.

Corollary 1 (Membership query expansion)
Let Σ be an SPA alphabet and S be an SPA over Σ.
Proof This equivalence is based on the fact that pairs of access sequences and matching return sequences for a procedure $c$ provide an admissible context for arbitrary $w \in L(P^c)$. One can then show by induction that for all . For the full proof, see "Appendix".
By providing the local learners with individual membership oracles that perform these translations automatically, the exploration phase of the global SPA hypothesis is directly driven by the exploration phases of the individual local learners. In order to construct and explore an SPA hypothesis $S_H$, we use the procedural learners to construct and explore hypotheses of the individual procedures. What remains to be shown is how the information of global counterexamples can be used to refine local procedures.

Verification phase
Counterexamples are input sequences that expose differences between the conjectured SPA hypothesis and the SUL as they reveal diverging behavior with regard to the membership question. For acceptor-based systems there exist two kinds of counterexamples: positive and negative counterexamples. Positive counterexamples are words which are accepted by the SUL but (incorrectly) rejected by the current hypothesis, whereas negative counterexamples are rejected by the SUL but (incorrectly) accepted by the hypothesis.
In the following, we will abstract from the concrete kind of counterexample and only consider an accepting system S A = (P c 1 A , · · · , P c q A ) and a rejecting system S R = (P c 1 R , · · · , P c q R ). For positive counterexamples we map the SUL to S A and the current hypothesis to S R and for negative counterexamples we do the converse. This shows that, conceptually, the counterexample analysis process for SPAs is symmetrical for both kinds of counterexamples and that the kind only determines the mapping of S R and S A . We will, however, see that this symmetry does not hold when considering the query complexity.
Given a counterexample ce (and therefore S_A and S_R), Theorem 1 states that every projected procedural sub-sequence of ce is accepted by the respective procedural automaton of S_A, whereas at least one projected procedural sub-sequence of ce is rejected by the respective procedural automaton of S_R. This allows us to split the counterexample analysis process into two phases:
1. In the global phase, we first analyze the global counterexample to pinpoint an individual procedure of S_R that behaves differently from its respective counterpart in S_A.
2. In the local phase, we use the corresponding projected sub-sequence of the global counterexample to refine the previously identified procedure.
Due to our concept of projection and expansion, the refinement during the local phase is essentially identical to the refinement phase of regular automata learning. Consequently, we can delegate this process completely to the local learning algorithms and focus in the following only on the first phase. The goal of our global counterexample analysis is then given by identifying a (not necessarily unique) procedural automaton which does not accept its corresponding projected trace and causes a mismatching behavior. Once we have identified the misbehaving procedural automaton, we can use the projected trace to construct a local counterexample: In the positive case (i.e. the hypothesis is mapped to S R and hence should accept the local trace) we construct a positive local counterexample and in the negative case (i.e. the SUL is mapped to S R and hence the hypothesis should also reject the local trace) we construct a negative local counterexample.
As mentioned, the projected counterexample is completely agnostic to the internal structures of the procedural hypotheses and only relies on their respective membership properties. Thus, following Theorem 1, we can use arbitrary (MAT-based) regular learning algorithms for refining the procedural hypotheses.
Please note, however, that Theorem 1 only guarantees the existence of a procedure that needs refinement. Efficiently identifying such a procedure is a different matter that we address in the next section.

Efficient counterexample analysis
In order to efficiently analyze global counterexamples and identify a procedure of S R that rejects its projected trace, we propose a binary search-based approach similar to the one of Rivest & Schapire [44] for the regular case. This process consists of transforming prefixes of the counterexample to sequences that are guaranteed to be admissible in S R and analyzing when this transformation changes the acceptance of the transformed counterexample trace.
Our approach to making binary search applicable in our scenario depends on S_R accepting the current representatives of terminating sequences ts[p] that are used by the gamma expansion (cf. Definition 11). This is naturally given for the SUL, as we will extract terminating sequences only from accepted runs of the SUL (cf. Sect. 6), but has to be explicitly enforced for the hypothesis. We therefore introduce the following notion:

Definition 13 (ts-conformance) Let Σ be an SPA alphabet and S be an SPA over Σ. We call S ts-conform with respect to the current terminating sequences if and only if ∀c ∈ Σ_call : ∃ w ∈ L(P^c) : the run of w in S emits ts[c].

For testing and validating the ts-conformance of an SPA, we re-use the result of Theorem 1.

Lemma 1
Let Σ be an SPA alphabet, S be an SPA over Σ and ts_c = c · ts[c] · r denote the embedded terminating sequence for each c ∈ Σ_call. S is ts-conform ⇔ ∀ p ∈ Σ_call : ∀(c, i) ∈ Inst_{ts_p} : α(ts_p[i + 1, ρ_{ts_p}(i + 1) − 1]) ∈ L(P^c).

Proof This is a direct consequence of Theorem 1 if we consider for each p ∈ Σ_call an SPA S_p (based on S) which has p as its initial procedure.
Enforcing ts-conformance of an SPA hypothesis S_H can be done straightforwardly as follows:
- Check for each embedded terminating sequence ts_p whether its nested, projected invocations are accepted by the respective procedures of S_H. This does not require any membership queries since it can be checked on the procedural hypothesis automata.
- If there exists a (nested) invocation that is not accepted, use the corresponding sequence as a local counterexample for refining the corresponding procedural hypothesis. This can be regarded as a "refinement for free", as it does not require the detection and treatment of a global counterexample.
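The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: procedural hypotheses are modelled as plain membership predicates over local words, the alpha projection collapses every nested invocation to its call symbol, and all names (`CALLS`, `RET`, `match_table`, `alpha`, `ts_violations`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

CALLS = {"F", "G"}  # hypothetical call symbols
RET = "R"           # hypothetical return symbol


def match_table(word: List[str]) -> Dict[int, int]:
    """Map the index of every call symbol to the index of its matching return."""
    table, stack = {}, []
    for i, sym in enumerate(word):
        if sym in CALLS:
            stack.append(i)
        elif sym == RET:
            table[stack.pop()] = i
    return table


def alpha(word: List[str]) -> Tuple[str, ...]:
    """Alpha projection of a well-matched word: keep the top-level symbols
    and replace each nested invocation (call, body, return) by its call symbol."""
    table, out, i = match_table(word), [], 0
    while i < len(word):
        out.append(word[i])
        i = table[i] + 1 if word[i] in CALLS else i + 1
    return tuple(out)


def ts_violations(hyps: Dict[str, Callable[[Tuple[str, ...]], bool]],
                  ts: Dict[str, List[str]]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return (procedure, local word) pairs that a hypothesis rejects although
    they occur in an embedded terminating sequence. No queries to the SUL are
    needed: only the hypotheses are consulted."""
    violations = []
    for p in sorted(ts):
        embedded = [p] + ts[p] + [RET]
        for i, r in sorted(match_table(embedded).items()):
            local = alpha(embedded[i + 1:r])
            entry = (embedded[i], local)
            if not hyps[embedded[i]](local) and entry not in violations:
                violations.append(entry)
    return violations


ts = {"F": list("aGcRa"), "G": ["c"]}
hyps = {"F": lambda w: w in {("a", "G", "a")}, "G": lambda w: False}
print(ts_violations(hyps, ts))
```

Each reported pair can be passed to the corresponding local learner as a positive local counterexample, after which the check is repeated until no violations remain.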
Please note that this conformance check has to be re-initiated after each refinement, as refining the procedural hypotheses may introduce changes that affect the acceptance of a terminating sequence.

Given a ts-conform system S_R, the alpha-gamma transformation · (cf. Definition 14) allows one to perform the intended binary search. It can be generalized to prefixes of rooted words to obtain a transformation · * : (Σ_call · WM(Σ))* → (Σ_call · WM(Σ))*, defined via piecewise application of · : the unmatched call symbols of the prefix are kept, and each well-matched segment between them is transformed individually.

The following monotonicity property of · * is key to our binary search-based counterexample analysis. For technical reasons, we will, without loss of generality, only consider counterexamples with more than one procedural invocation. Please note that if a counterexample only contains the single invocation of the main procedure (i.e. the error occurs in the main procedure) there is no need for a global analysis process since the violating procedure is clear.
Theorem 2 (Acceptance monotonicity of · * ) Let Σ be an SPA alphabet, S be a ts-conform SPA over Σ, w ∈ WM(Σ) be rooted and r_h, r_k be indices of return symbols of w with r_h < r_k. Then we have w[, r_h]* · w[r_h + 1, ] ∈ L(S) ⇒ w[, r_k]* · w[r_k + 1, ] ∈ L(S).

Proof This implication is based on the fact that for all admissible words v ∈ WM(Σ) or v ∈ (Σ_call · WM(Σ))*, v* is also admissible in a ts-conform SPA. Furthermore, the admissibility of a word is decided on call-rules, since they are the only rules which are guarded by the procedural membership questions. Hence, when the call symbols in w[r_h + 1, ] do not cause a word to be rejected, then the call symbols of its suffix w[r_k + 1, ] will not do so either. For the full proof, see the "Appendix".

The acceptance monotonicity of · * allows us to adapt the Rivest & Schapire-style counterexample analysis of the regular case [44] to the procedural level: There exist two extreme points, ε* · ce ∉ L(S_R) (the unprocessed counterexample) and ce* · ε ∈ L(S_R) (the terminating sequence of the main procedure), so the acceptance has to flip for some decomposition in between. For a return index r_i of ce, we check whether or not ce[, r_i]* · ce[r_i + 1, ] ∈ L(S_R). If the answer is "yes", it suffices to search for lower return indices than r_i, because by Theorem 2, we already know the answers for all higher return indices. Dually, if the answer is "no", we continue our search for higher return indices than r_i, because by contraposition of Theorem 2, we already know the answers for all lower return indices. This observation allows us to formulate a binary search-style analysis (cf. Algorithm 1) which determines the lowest return index r* such that ce[, r*]* · ce[r* + 1, ] ∈ L(S_R).
Its matching call symbol c* (whose index i* can be determined by ρ_ce(i* + 1) = r*) now identifies a procedure that does not accept its corresponding projected input sequence α(ce[i* + 1, r* − 1]) and must therefore be refined. Please note that investigating a specific return index r_i requires only a single query to S_R because ce[, r_i]* · ce[r_i + 1, ] can be constructed by in-memory transformations on ce.
To give a better intuition of this decomposition process, the first steps of the two cases (left-continuation and right-continuation of the binary search) are visualized in Fig. 5.
While the process of analyzing a global counterexample is symmetrical for the negative and positive case, it is worth noting that the two cases differ regarding their impact on the query complexity: For positive counterexamples we map the current SPA hypothesis to S_R. Here, determining c* by querying S_R does not induce any membership queries at all, since the queries can be answered via the current SPA hypothesis. This means positive counterexamples can be analyzed at zero (query) cost. Only the analysis of negative counterexamples requires posing membership queries to the SUL, which introduces a corresponding logarithmic factor (cf. Theorem 3).
Summarizing, an inequivalent procedure and its corresponding local counterexample can be determined in the following steps:
1. Depending on whether we receive a positive or a negative (global) counterexample, we select either the current hypothesis SPA or the SUL as the rejecting system S_R.
2. Using S_R and our alpha-gamma transformation (cf. Definition 14), we determine a single procedural hypothesis that behaves differently from its counterpart of S_A.
3. Using the alpha projection (cf. Definition 12), we construct from the global counterexample a procedural counterexample that exposes the previously detected discrepancy on a procedural level.
We sketch these steps in a function called AnalyzeCounterexample shown in Algorithm 1. The function takes a global counterexample ce ∈ WM(Σ) and returns the rejecting procedure c* including the respective local counterexample α(ce[i* + 1, ρ_ce(i* + 1) − 1]). The main property of this function for our learning algorithm is stated in Theorem 3.

Algorithm 1 Analysis of a global counterexample
Input: A counterexample ce ∈ WM(Σ) rejected by S_R
Output: A tuple containing a procedure c* of S_R and its rejected, procedural trace

function AnalyzeCounterexample(ce)
    ensureTSconformance
    low ← 1, high ← |Inst_ce|, res ← |Inst_ce|
    while low ≤ high do
        mid ← ⌊(low + high)/2⌋
        if ce[, r_mid]* · ce[r_mid + 1, ] ∈ L(S_R) then
            high ← mid − 1, res ← mid
        else
            low ← mid + 1
        end if
    end while
    i* ← min{i ∈ N | ρ_ce(i + 1) = r_res}
    return (ce[i*], α(ce[i* + 1, r_res − 1]))
end function

Fig. 5 The first steps of the two cases of the binary search (left-continuation and right-continuation), where dashed lines indicate that an illegal procedural invocation has occurred that irrecoverably causes S_R to reject the trace. On the left-hand side the error occurs in u. Here our (extended) alpha-gamma transformation replaces the violating procedural invocation with an admissible prefix, which causes the transformed trace to be accepted. This indicates that further analysis (i.e. binary search) should continue with splitting u. On the right-hand side the error occurs in v. Here the error prevails even after our (extended) alpha-gamma transformation, which indicates that further analysis (i.e. binary search) should continue with splitting v. Note that replacing nested calls via the alpha-gamma transformation may result in a more complex nesting structure than before, depending on the terminating sequence. In the above figure, for simplicity, terminating sequences consist only of internal symbols.

Proof (of Theorem 3) This is a direct consequence of the binary search strategy and the fact that for a given return index r_mid the query ce[, r_mid]* · ce[r_mid + 1, ] can be constructed without any further membership queries.
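To make the binary search concrete, the following Python sketch runs the analysis on a toy instance. It is an illustration under several assumptions rather than the LearnLib implementation: the rejecting system S_R is represented by membership predicates for its procedures and queried via the characterization of Theorem 1, the alpha-gamma transformation is taken to be γ applied after α, indices are 0-based, and all names and the example data are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

CALLS = {"F", "G"}  # hypothetical call symbols
RET = "R"           # hypothetical return symbol
Local = Tuple[str, ...]


def match_table(word: List[str]) -> Dict[int, Optional[int]]:
    """Map each call index to its matching return index (None if unmatched)."""
    table: Dict[int, Optional[int]] = {}
    stack: List[int] = []
    for i, sym in enumerate(word):
        if sym in CALLS:
            stack.append(i)
            table[i] = None
        elif sym == RET:
            table[stack.pop()] = i
    return table


def alpha(word: List[str]) -> List[str]:
    """Project a well-matched word: collapse nested invocations to call symbols."""
    table, out, i = match_table(word), [], 0
    while i < len(word):
        out.append(word[i])
        i = table[i] + 1 if word[i] in CALLS else i + 1
    return out


def gamma(local: List[str], ts: Dict[str, List[str]]) -> List[str]:
    """Expand each call symbol c to c . ts[c] . r."""
    out: List[str] = []
    for sym in local:
        out.append(sym)
        if sym in CALLS:
            out += ts[sym] + [RET]
    return out


def accepts(procs: Dict[str, Callable[[Local], bool]], word: List[str]) -> bool:
    """Membership of a well-matched word via Theorem 1: the word must be rooted
    and every projected invocation must be accepted by its procedure."""
    table = match_table(word)
    if not word or table.get(0) != len(word) - 1:
        return False
    return all(procs[word[i]](tuple(alpha(word[i + 1:r]))) for i, r in table.items())


def transform_prefix(prefix: List[str], ts: Dict[str, List[str]]) -> List[str]:
    """Piecewise alpha-gamma transformation: keep unmatched calls and replace
    each well-matched segment w between them by gamma(alpha(w))."""
    table, out, segment = match_table(prefix), [], []
    for i, sym in enumerate(prefix):
        if sym in CALLS and table[i] is None:
            out += gamma(alpha(segment), ts) + [sym]
            segment = []
        else:
            segment.append(sym)
    return out + gamma(alpha(segment), ts)


def analyze_counterexample(ce, procs, ts) -> Tuple[str, List[str]]:
    """Binary search for the lowest return index whose transformed
    decomposition is accepted by the rejecting system."""
    table = match_table(ce)
    returns = sorted(table.values())
    call_of = {r: c for c, r in table.items()}
    low, high, res = 0, len(returns) - 1, len(returns) - 1
    while low <= high:
        mid = (low + high) // 2
        r = returns[mid]
        if accepts(procs, transform_prefix(ce[:r + 1], ts) + ce[r + 1:]):
            high, res = mid - 1, mid
        else:
            low = mid + 1
    i = call_of[returns[res]]
    return ce[i], alpha(ce[i + 1:returns[res]])


# Toy hypothesis (ts-conform): F is already correct, G only accepts "c".
procs = {
    "F": lambda w: w in {(), ("a",), ("b",), ("a", "F", "a"), ("b", "F", "b"), ("G",)},
    "G": lambda w: w in {("c",)},
}
ts = {"F": ["a"], "G": ["c"]}
ce = list("FaFGcGcRcRRaR")  # positive counterexample rejected by the hypothesis
print(analyze_counterexample(ce, procs, ts))
```

In this run the search inspects two of the four return indices and reports procedure G together with the local word c G c, which the hypothesis of G wrongly rejects.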

A sketch of the algorithm
In this section we aggregate the concepts from the previous sections and sketch an active learning algorithm for SPAs. As stated previously, we exploit that SPAs are characterized by their procedural automata which we can infer in a modular fashion by
- answering local membership queries via global membership queries to the SPA (cf. Sect. 4) and
- constructing local counterexamples for the procedures from global SPA counterexamples (cf. Sect. 5).
Query expansion hinges on the availability of corresponding access sequences, terminating sequences, and return sequences. Thus, one of the main tasks of the learning algorithm of an SPA is obtaining and managing these sequences throughout the learning process.
Positive counterexamples (i.e. words that are rejected by the current hypothesis but are accepted by the SUL) play a special role throughout the learning process because they are witnesses for successful runs of the SUL. In particular, since we are observing well-matched words, for every procedural invocation (i.e. call symbol) in a positive counterexample, we can directly extract:
- a corresponding access sequence (everything up until and including the call symbol),
- a terminating sequence for the procedure (everything in between the call symbol and the matching return symbol), and
- a return sequence for the procedure (the matching return symbol and everything after).
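The extraction of the three kinds of sequences can be sketched in a few lines of Python. The sketch assumes that for a call symbol occurring several times only its first occurrence is used and that already known sequences are kept unchanged; both are simplifications for illustration, and the names `extract_sequences`, `access`, `ts` and `returns` are hypothetical.

```python
from typing import Dict, List, Set


def extract_sequences(ce: List[str], calls: Set[str], ret: str,
                      access: Dict[str, List[str]],
                      ts: Dict[str, List[str]],
                      returns: Dict[str, List[str]]) -> List[str]:
    """Record access, terminating and return sequences for every call symbol
    of the accepted word `ce` that has none yet. Returns the new call symbols."""
    # Match every call index with the index of its return symbol.
    match, stack = {}, []
    for i, sym in enumerate(ce):
        if sym in calls:
            stack.append(i)
        elif sym == ret:
            match[stack.pop()] = i

    new_symbols = []
    for i in sorted(match):                # scan invocations from left to right
        proc = ce[i]
        if proc in ts:                     # sequences already known: keep them
            continue
        access[proc] = ce[:i + 1]          # up to and including the call symbol
        ts[proc] = ce[i + 1:match[i]]      # between call and matching return
        returns[proc] = ce[match[i]:]      # matching return and everything after
        new_symbols.append(proc)
    return new_symbols


access, ts, returns = {}, {}, {}
ce = list("FaFGcRRaR")  # hypothetical accepted run of the SUL
new = extract_sequences(ce, {"F", "G"}, "R", access, ts, returns)
active_alphabet = {"a", "b", "c"} | set(ts)  # internal symbols plus usable calls
print(new, sorted(active_alphabet))
```

By construction, concatenating the access sequence, the terminating sequence and the return sequence of any extracted procedure yields the original accepted word again.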
The following subsections display how we initialize our procedural learning algorithms and how we use positive counterexamples during hypothesis refinements to manage access sequences, terminating sequences and return sequences.

Initialization
We initialize our SPA learner by setting up regular learning algorithms (e.g. TTT [29]) for the procedures and configuring them accordingly (individual membership oracles for automated query translation, etc.). However, during the initialization, the local learners cannot explore any procedures due to the lack of the required sequences mentioned above. As a consequence, it is also not possible to construct an initial SPA hypothesis (e.g. by means of applying a call-rule to the initial configuration of Definition 9). Instead, an initial (empty) hypothesis is constructed that simply rejects all input words. This guarantees that the first counterexample our SPA learning algorithm receives will always be positive, which provides us with access sequences, terminating sequences and return sequences of at least the main procedure and ensures progress. If no such counterexample exists, i.e. the SUL describes the empty language (L(SUL) = ∅ ⊆ WM(Σ)), the initial hypothesis already coincides with the SUL and the learning process is finished at this point.

Refinement
The core of the learning algorithm is the refinement step which for a given counterexample triggers a refinement of the hypothesis, or specifically in our case, a refinement of (at least) one procedure. Given the above initialization, it may however not be possible to address certain procedures because their local learners have not been activated yet. As we pointed out earlier, positive counterexamples hold a special role as they grant access to the required sequences for activating local learners and therefore constructing hypotheses of procedures. Generally, we cannot expect a single counterexample to contain all procedural call symbols at once and thereby give us access to the information required for activating all local learners and reasoning about procedural invocations. Even after the first (global) counterexample there may be procedures for which no access, terminating or return sequences have been observed yet.
We tackle this issue by introducing the concept of incremental alphabet extension. In our context this means that we successively add call symbols to our learning alphabet only after we have witnessed them in a positive counterexample. This does not cause any problem because the learning process is monotonic with respect to alphabet extension.
At the start of the learning process, we initialize the currently active learning alphabet Σ_act with Σ_int so that it contains no call symbols. No local learning algorithms (not even for the start procedure) have been activated and thus the corresponding initial hypothesis specifies the empty language. As mentioned above, this guarantees that the first counterexample is positive (corresponds to a successful run) and therefore provides us with the three kinds of sequences for at least the start procedure. In general, gaining access to the three sequences of a procedure allows us
- in case of access and return sequences: to activate the corresponding local learning algorithm because its queries can now be embedded in a global context using access sequences and return sequences, and
- in case of terminating sequences: to invoke the corresponding procedure in other contexts because local queries can be properly expanded to admissible input sequences in the global system.
Essentially, we delay the exploration of a procedure until we have obtained its access sequence and return sequence, and we restrict the exploration of a procedure to the currently active alphabet, which consists only of internal alphabet symbols and procedural invocations (i.e. call symbols) for which a terminating sequence has already been found. This guarantees that invocations of a procedure are reflected in the tentative SPA hypothesis only after they can be correctly embedded in the global context. By rejecting words that contain uninitialized procedures, counterexamples that introduce previously unobserved call symbols will always be positive, allowing us to extract the three kinds of sequences and progress the SPA inference process. Thus the active alphabet successively grows towards the complete input alphabet Σ_call ∪ Σ_int.

Algorithm 2 describes the corresponding refinement process in more detail: Lines 2 to 15 cover the aforementioned special handling of positive counterexamples. If we receive our first counterexample (cf. line 3), we extract c_0 from the counterexample. Recall that due to the nature of our instrumentation, all accepted words of the SUL are rooted in c_0 and thus the initial procedure can be directly determined from the first symbol of any accepted word of the SUL. We continue to scan the counterexample for previously unobserved call symbols (cf. line 6), extract the access sequences, terminating sequences and return sequences from the counterexample trace (cf. lines 7-9), and update the set of currently active alphabet symbols (cf. line 10). The variables as, ts, rs and Σ_act are stored in a global scope and thus are shared across multiple invocations of the refinement procedure.
Discovering a (new) terminating sequence for a procedure p allows us to invoke p in the context of other procedures because local queries containing p can now be correctly expanded to global queries using our gamma expansion (cf. Definition 11). Therefore, in line 13 we extend the alphabets of the procedural learners to match the set of currently active symbols Σ act . For already activated procedural learners this includes posing new (local) queries in order to determine the successors of transitions for the just-added symbols.
From lines 16 to 19 we analyze the given counterexample with regard to the current hypothesis SPA S_H. From every global counterexample trace ce we can extract a procedure c and a (projected) sub-sequence localCe such that localCe exposes an error in P^c (see Sect. 5). We then use the projected, local counterexample to delegate the actual refinement step of the identified local hypothesis to the respective local learner (cf. line 18). Note that if the procedural learner of the determined procedure is not yet initialized, we initialize it first (to construct an initial hypothesis) and then pass the projected counterexample to the learner.

Algorithm 2 Main refinement loop of the SPA learner
Input: A counterexample ce ∈ WM(Σ) and a boolean value answer indicating whether ce is a positive or negative counterexample
1: function RefineHypothesis(ce, answer)
2: if answer = true then
3: if Σ_act = Σ_int then
4: c_0 ← ce[1] ▷ used for determining the initial procedure
5: …

Correctness and complexity
A canonical SPA is given by the tuple S = (P^{c_1}, . . . , P^{c_q}) such that each P^{c_i} is a canonical automaton for the corresponding procedure c_i ∈ Σ_call. The size of an SPA is the sum of the individual sizes of the procedures, i.e. the number of their states. We have n = n_1 + · · · + n_q, where n_i denotes the size of P^{c_i}.

Similar to the original work by Angluin [5], the following discussion assumes that so-called equivalence queries are available to indicate discrepancies between inferred hypothesis models and the considered SUL. In practice, equivalence queries are typically approximated using membership queries which themselves are realized via testing. The discussion of this issue, which is typically based on application-specific heuristics (e.g. context-free model checking [11]), is beyond the scope of this paper.
Our following correctness and complexity considerations are based on the assumption that the individual procedural automata are learned using one of the well-known algorithms for regular inference which incrementally construct canonical (i.e. minimal and unique up to isomorphism) hypotheses requiring at most n_i (for procedure c_i ∈ Σ_call) equivalence queries. Under these assumptions one can show:

Theorem 4 (Correctness and termination) Having access to a MAT teacher for an instrumented context-free language L, our learning algorithm determines a canonical SPA S for L requiring at most n + 1 equivalence queries.
Proof The proof follows a three-step pattern:
- Invariance: The number of states of the hypothesis of procedure P^{c_i} never exceeds n_i, a central invariant of all learners for regular languages. Hence, the size of the SPA will never exceed n.
- Progress: Each global counterexample either activates a local learner, which then adds the first "true" state to its procedural hypothesis, or identifies an error in one of the existing tentative hypotheses. Using the presented counterexample analysis, one extracts from the global counterexample a concerned procedure c_i and the projected local counterexample that allows the local learner to refine the corresponding local hypothesis automaton for P^{c_i}. This adds at least one state to the procedural hypothesis of P^{c_i} and thereby properly increases the number of states of the hypothesis SPA. This refinement works by expanding the required local membership queries to SPA queries (cf. Fig. 3), and by interpreting the SPA responses to the expanded query as answers to the local query.
- Termination: After at most n_i local counterexamples the tentative hypothesis of c_i is equivalent to the target procedure. Hence, at most n_i global counterexamples can identify an error in the hypothesis of c_i, which overall only allows for at most n global counterexamples. This is a direct consequence of the above notion of invariance and progress. The final (additional) equivalence query is required to confirm the equivalence of the SPA hypothesis and the SUL, and terminates the inference process.
We can further show that the query complexity of our algorithm, i.e. the number of posed membership queries, is mainly determined by the choice of the regular learning algorithms. Our concepts of orchestration (to systems of procedural automata) and query translation do not impact the asymptotic query complexity. For inferring an isolated procedure P^{c_i}, state-of-the-art learning algorithms (such as TTT [29]) require O(k · n_i^2 + n_i · log_2 m) queries, where k denotes the size of the input alphabet (in our case |Σ_call| + |Σ_int|) and m denotes the length of the longest (local) counterexample. The query complexity is usually split into two parts: hypothesis construction C_{c_i} = k · n_i^2 and counterexample analysis n_i · log_2 m. This results in the following query complexity.
- After all states of all procedures have been identified, the algorithm terminates with the correct hypothesis (cf. Theorem 4).
A direct comparison to existing learning algorithms for, e.g. visibly pushdown languages is hard, due to the different structure of the inferred models. However, first experiments (cf. Sect. 8) show that already for small systems our approach performs significantly better.

A comparison to visibly pushdown automata
To our knowledge, related work only considers VPAs. In order to elaborate on the qualitative and quantitative differences between our SPA approach and existing learning setups for visibly pushdown languages, we showcase the system of palindromes (cf. Fig. 1) as a visibly pushdown automaton. As stated before, the instrumented SUL yields a visibly pushdown language where every observable invocation can be interpreted as a call symbol and every observable termination can be interpreted as a return symbol. Thus, the instrumented system can also be inferred in form of a visibly pushdown automaton. For this showcase, we inferred the system of (instrumented) palindromes using learning algorithms currently present in LearnLib [30] which infer 1-SEVPAs.
The inferred models are shown in Figs. 6 and 7. Table 1 shows performance measurements of our SPA approach using different regular learning algorithms and the two VPA learners present in LearnLib. We approximated equivalence queries by generating 10000 random, well-matched, rooted test-words to allow for some variance and ultimately sampled from a manually constructed set of characteristic words to ensure that each algorithm terminated with the correct hypothesis model. As mentioned before, the realization of equivalence queries is an issue on its own, which is beyond the scope of this paper.
Although both models capture in essence the same information, there is a big difference in their comprehensibility: Whereas the SPA model ( Fig. 6) very intuitively reflects the underlying grammar and thus the structure of the system, the VPA model (Fig. 7) is quite hard to understand. For successfully following an accepting run of the automaton, one has to manually keep track of the current stack contents which have been pushed (popped) by previous call (return) symbols. In particular, it is hardly possible to reveal typical structural properties.
Regarding performance there is another interesting observation. While the 1-SEVPA representation is more compact in the sense that it can represent the system with fewer states/locations, it requires significantly more queries and even an order of magnitude more symbols to infer the model. We reckon this is due to the global execution semantics of SEVPAs: Essentially, every state can be the successor of a return transition and for every return tran- This results in a lot of overhead for determining transition successors. While in general this allows one to capture more complex behavior (e.g. returns to different procedures/modules), it shows no benefit in our context of procedural systems revolving around the copy-rule semantics.
We have observed similar results for other applications, such as inferring the structure of XML documents [17] or exponential systems which try to approximate recursive systems with pure regular automata. See https://github.com/LearnLib/learnlib-spa for further benchmark data. We plan to further analyze and exploit these characteristics for complex, large-scale systems in the future.

Conclusion and future work
In this paper we have presented a compositional approach for active automata learning of Systems of Procedural Automata (SPAs), an extension of Deterministic Finite Automata (DFAs) to systems of DFAs that can mutually call each other. SPAs are of high practical relevance, as they allow one to efficiently learn intuitive, recursive models of recursive programs after an easy instrumentation that makes calls and returns observable. Instrumentations like this are a very fruitful example of how to exploit additional (architectural) knowledge during the learning process in order to boost performance. In this case, they even expand the reach of regular active automata learning to cover all context-free languages and this without increasing the required query complexity. This is possible because the learning process for SPAs can be organized as a simultaneous inference of individual DFAs for each of the involved procedures via projection and expansion that bridge the gap between the global view concerning the SPA and the local views for the individual procedural automata.
There are numerous directions for future work: The treatment of equivalence queries, the drivers of the learning process, is a research topic of its own. They are typically approximated using membership queries and often depend on application-specific heuristics to be effective. A common heuristic involves model checking to generate membership queries for detecting potential counterexamples. This works particularly well if some assumed behavioral properties are known at learning time. That this approach can also be applied for procedural systems has been shown in [11]. Especially in the context of procedural systems, where errors occur locally within procedures, counterexamples generated using fuzzing [18,19,37] look promising to have a positive impact on the performance of the learning process. An alternative method to realize equivalence queries is a change of perspective in the direction of never-stop or lifelong learning [7]. The underlying main idea is to instrument the (potentially in-production) system with a monitoring mechanism that observes and controls its runs on the basis of previously learned hypothesis models. Whenever the monitor recognizes a discrepancy between the current hypothesis model and the system, the corresponding trace is fed to the learner in order to refine the hypothesis model and the corresponding monitor. Subsequently, the lifelong learning process continues with the next monitoring phase. This approach, which is characterized by its never-stopping, user-driven counterexample search, has shown promising results in a number of software projects in the past [7,31,40,47]. Lifelong learning comes with a challenge: counterexamples may be excessively long, as they typically arise as unexpected continuations of days-long normal operation. "Classical" AAL algorithms are not able to deal with this characteristic as their complexity depends (at least) linearly on the length of counterexamples.
In our current research we are observing that the procedural structure of SPAs, with its potential to dynamically optimize access, terminating and return sequences, together with the use of redundancy-free learners (e.g. TTT [29]) as procedural learners, has a lot of potential to tackle these issues and to allow for practical context-free lifelong learning.
Finally, SPAs provide a very powerful basis for further conceptual extension. A particularly interesting challenge is how far further system properties like inputs/outputs (e.g. Mealy machines) or other data-flow properties (e.g. register automata [22]) can be married with our procedural approach. Currently, the context-free nature of SPAs does not allow individual procedures to account for different execution contexts (compared to e.g. VPAs which, however, pay for this expressiveness with increased complexity). This may be tackled by the concept of call abstraction where different procedures (representing context-dependent behavior) may share the same call symbol. Here, concepts such as alphabet abstraction refinement [23,27] may prove helpful to create a dynamically adjusting learning approach that is as simple as possible but as specific as necessary, combining performance with expressive power. The concept of instrumentation may turn out to be a powerful enabler in this context.

Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix: Proofs
Proof This follows by induction over the length of w.
holds. We distinguish whether u is a call symbol or an internal symbol. If u ∈ Σ call , the SPA will emit x ∈ WM(Σ) consisting of u, possibly a well-matched sub-word, and a matching (to u) return symbol r before reaching the ( v · r , σ ) configuration. We have Since x is well-matched, α will map x to u and continue to process y. Hence If u ∈ Σ int , the SPA will emit u (int-rule) and α will map this symbol to u. We have Hence, α(z) = α(u · y) = u · α(y) = u · v = w.
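
To make the projection α used in the proof above concrete, the following minimal Python sketch replaces every top-level call, together with its nested well-matched sub-word and its matching return, by the call symbol alone, and copies top-level internal symbols unchanged. The function name `alpha`, the encoding of words as lists of strings, and the example symbols are illustrative assumptions; they are not taken from the paper or from LearnLib.

```python
def alpha(word, calls, ret):
    """Project a well-matched word onto its top-level call and internal symbols."""
    projected = []
    depth = 0  # number of currently open (unmatched) calls
    for sym in word:
        if sym in calls:
            if depth == 0:
                # A top-level call stands for the whole nested invocation.
                projected.append(sym)
            depth += 1
        elif sym == ret:
            depth -= 1
            if depth < 0:
                raise ValueError("word is not well-matched: unmatched return")
        elif depth == 0:
            # Top-level internal symbols are mapped to themselves.
            projected.append(sym)
    if depth != 0:
        raise ValueError("word is not well-matched: unmatched call")
    return projected


# Hypothetical alphabet: calls F and G, return R, internals a, b, c, d.
CALLS, RET = {"F", "G"}, "R"
example = alpha(list("aFbGcRRd"), CALLS, RET)
```

In this example the nested run `F b G c R R` collapses to the single symbol `F`, mirroring the call case of the proof, while `a` and `d` are kept as in the internal case.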

Lemma 3
Let Σ be an SPA alphabet, S be an SPA over Σ and w ∈ WM(Σ).
∀c ∈ Σ call : ∀ts ∈ TS c : α(ts) ∈ L(P c )
Proof Let c ∈ Σ call and ts ∈ TS c be arbitrary. Since ts is part of an accepted word, we know that there exists a path in the SOS transition system such that According to the definition of call-rules we have w ∈ L(P c ) and by Lemma 2 α(ts) = w. Hence, α(ts) ∈ L(P c ).
Corollary 1 (Membership query expansion) Let Σ be an SPA alphabet and S be an SPA over Σ.
Proof By Definition 10 it is guaranteed that for all w ∈ L(P c ) and a fixed (depending on as) σ ∈ ST (Γ ). What remains to be shown is This follows by induction over the length of w.
-If w = ε we have γ ( w) = ε and the statement follows.
We distinguish whether u is a call symbol or an internal symbol. If u ∈ Σ call we have γ ( u) = u · ts[u] · r . Since ts[u] is a terminating sequence (i.e. part of an accepted word), we know that there exists an x ∈ L(P u ) such that and therefore which is equivalent to since γ ( u) · γ ( v) = γ ( u · v). If u ∈ Σ int we have γ ( u) = u and by application of an int-rule, we have −−→ * ( r , σ ) and the statement directly follows since γ ( u) · γ ( v) = u · γ ( v) = γ ( u · v).

Lemma 4
Let Σ be an SPA alphabet and w ∈ WM(Σ) be a rooted word. Then any decomposition of w = u · v can be written as
$$u = c_{i_1} \cdot w_1 \cdot \ldots \cdot c_{i_j} \cdot w_j, \qquad v = w_{j+1} \cdot r \cdot w_{j+2} \cdot \ldots \cdot r$$
with $w_i \in WM(\Sigma)$.
Proof This directly follows from the definition of rooted (i.e. well-matched) words, where for every prefix u, β(u) ≥ 0 and for every suffix v, β(v) ≤ 0 holds. β(u) gives the "nesting depth" of w after parsing u and corresponds to the number of unmatched call symbols $c_{i_1}, c_{i_2}, \ldots$ at that position. Analogously, one can isolate from the suffix v the unmatched return symbols.
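
The expansion γ from the proof of Corollary 1 and the call-return balance β from the proof of Lemma 4 can be illustrated with a short Python sketch. Everything named here is an assumption made for illustration: the functions `gamma` and `beta`, the dictionary `TS` of terminating sequences, and the example alphabet are hypothetical, and β is taken to be the number of call symbols minus the number of return symbols.

```python
def gamma(local_word, calls, ret, ts):
    """Expand a procedural word: each call symbol u becomes u . ts[u] . ret."""
    expanded = []
    for sym in local_word:
        expanded.append(sym)
        if sym in calls:
            expanded.extend(ts[sym])  # terminating sequence of the called procedure
            expanded.append(ret)
    return expanded


def beta(word, calls, ret):
    """Call-return balance: number of call symbols minus number of return symbols."""
    return sum((sym in calls) - (sym == ret) for sym in word)


# Hypothetical alphabet: calls F and G, return R, internals a and b.
CALLS, RET = {"F", "G"}, "R"
# Hypothetical terminating sequences, one well-matched run per procedure.
TS = {"F": ["a"], "G": []}

expanded = gamma(["b", "F", "G"], CALLS, RET, TS)
# Wrapping the expansion in a call and a return yields a rooted word.
rooted = ["F"] + expanded + [RET]

# For every split rooted = u . v, check beta(u) >= 0 and beta(v) <= 0.
splits_ok = all(
    beta(rooted[:i], CALLS, RET) >= 0 and beta(rooted[i:], CALLS, RET) <= 0
    for i in range(len(rooted) + 1)
)
```

For any split point, the prefix balance counts the unmatched call symbols of u and the negated suffix balance counts the unmatched return symbols of v, which is the quantity the decomposition in Lemma 4 isolates.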

Lemma 5
Let Σ be an SPA alphabet and w, w 1 , w 2 ∈ WM(Σ) be well-matched words.
Thus, in both cases the statement holds.