The Learnability of Symbolic Automata

Symbolic automata (s-FAs) allow transitions to carry predicates over rich alphabet theories, such as linear arithmetic, and therefore extend classic automata to operate over infinite alphabets, such as the set of rational numbers. In this paper, we study the learnability of symbolic automata. First, we present MAT*, a novel L*-style algorithm for learning symbolic automata using membership and equivalence queries, which treats the predicates appearing on transitions as their own learnable entities. The main novelty of MAT* is that it can take as input an algorithm Λ for learning predicates in the underlying alphabet theory and uses Λ to infer the predicates appearing on the transitions of the target automaton. Using this idea, MAT* is able to learn automata operating over alphabet theories in which predicates are efficiently learnable using membership and equivalence queries. Furthermore, we prove that a necessary condition for the efficient learnability of an s-FA is that predicates in the underlying algebra are also efficiently learnable using queries, thus settling the learnability of a large class of s-FA instances. We implement MAT* in an open-source library and show that it can efficiently learn automata that cannot be learned using existing algorithms and that it significantly outperforms existing automata learning algorithms over large alphabets.


Introduction
In 1987, Dana Angluin showed that finite automata can be learned in polynomial time using membership and equivalence queries [3]. In this learning model, often referred to as a minimally adequate teacher (MAT), the teacher can answer (i) whether a given string belongs to the target language being learned and (ii) whether a certain automaton is correct and accepts the target language, providing a counterexample if the automaton is incorrect. Following this result, her L* algorithm has been studied extensively [16,15], has been extended to several variants of finite automata [11,4,19], and has found many applications in program analysis [2,5,6] and program synthesis [23].
Recent work [5,10] developed algorithms that can efficiently learn s-FAs over certain alphabet theories. These algorithms rely on an underlying predicate learning algorithm that learns partitions of the domain using predicates from counterexamples. While such results give sufficient conditions under which s-FAs can be efficiently learned, they do not provide any necessary conditions. More precisely, the following question remains open: for which alphabet theories can s-FAs be efficiently learned?
In this paper, we make significant progress towards answering this question by providing new sufficient and necessary conditions for efficiently learning symbolic automata. More specifically, we present MAT*, a new algorithm for learning s-FAs using membership and equivalence queries. The main novelty of MAT* is that it can accept as input a MAT learning algorithm Λ for predicates in the underlying alphabet theory. MAT* then spawns instances of Λ to infer each transition in the target s-FA and efficiently answers the membership and equivalence queries performed by Λ using the s-FA membership and equivalence oracles. The predicate learning algorithms need not learn entire partitions, only individual predicates; therefore, MAT* greatly simplifies the design of learning algorithms for s-FAs by allowing one to reuse existing learning algorithms for the underlying alphabet theory. Moreover, MAT* allows the underlying predicate learning algorithms to perform both membership and equivalence queries, thus extending the class of efficiently learnable s-FAs to MAT-learnable alphabet theories, e.g., bit-vector predicates expressed as BDDs.
Furthermore, we show that a necessary condition for efficiently learning a symbolic automaton over a Boolean algebra is that the individual predicates in the algebra are also efficiently learnable. Moreover, we characterize the instances that are not efficiently learnable by our algorithm and conjecture that such instances are not learnable by any efficient algorithm.
We implement MAT* in the open-source symbolicautomata library [1] and evaluate it on 15 regular-expression benchmarks, 1,500 s-FA benchmarks over bit-vector alphabets, and 18 synthetic benchmarks over infinite alphabets. Our results show that MAT* can efficiently learn automata over different alphabet theories, some of which cannot be learned using existing algorithms. Moreover, for large finite alphabets, MAT* significantly outperforms existing automata learning algorithms.
Contributions. In summary, our contributions are:
- MAT*, the first algorithm for learning symbolic automata that operate over MAT-learnable alphabet theories, i.e., theories in which predicates can be learned using only membership and equivalence queries (Section 3).
- A soundness result for MAT* and new necessary and sufficient conditions for the learnability of symbolic automata, together with a characterization of the remaining class for which learnability is not settled (Section 4).
- A modular implementation of MAT* in an existing open-source library, together with a comprehensive evaluation on existing and new automata-learning benchmarks (Section 6).

Boolean Algebras and Symbolic Automata
In symbolic automata, transitions carry predicates over a decidable Boolean algebra. An effective Boolean algebra A is a tuple (D, Ψ, ⟦_⟧, ⊥, ⊤, ∨, ∧, ¬) where D is a set of domain elements; Ψ is a set of predicates closed under the Boolean connectives, with ⊥, ⊤ ∈ Ψ; and ⟦_⟧ : Ψ → 2^D is a denotation function such that ⟦⊥⟧ = ∅, ⟦⊤⟧ = D, and, for all φ, ψ ∈ Ψ, ⟦φ ∨ ψ⟧ = ⟦φ⟧ ∪ ⟦ψ⟧, ⟦φ ∧ ψ⟧ = ⟦φ⟧ ∩ ⟦ψ⟧, and ⟦¬φ⟧ = D \ ⟦φ⟧.

Example 1 (Equality Algebra). The equality algebra for an arbitrary set D has predicates formed from Boolean combinations of formulas of the form λc. c = a, where a ∈ D. Formally, Ψ is generated from the Boolean closure of Ψ_0 = {λc. c = a | a ∈ D} ∪ {⊥, ⊤}. Examples of predicates in this algebra include λc. c = 5 ∨ c = 10 and λc. ¬(c = 0).
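To make Example 1 concrete, here is a minimal Python sketch of the equality algebra. It assumes, as a representation choice of ours rather than the paper's, that every predicate is normalized to either a finite set or the complement of a finite set (which is exactly what Boolean combinations of λc. c = a denote). The names EqPred, eq, BOT and TOP are hypothetical, and is_sat assumes D is infinite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EqPred:
    """Equality-algebra predicate in normal form: the finite set `elems`
    (cofinite=False) or its complement D minus `elems` (cofinite=True)."""
    elems: frozenset
    cofinite: bool = False

    def denote(self, c) -> bool:
        # True iff c is in [[psi]]
        return (c in self.elems) != self.cofinite

    def neg(self) -> "EqPred":
        return EqPred(self.elems, not self.cofinite)

    def conj(self, other: "EqPred") -> "EqPred":
        a, b = self, other
        if not a.cofinite and not b.cofinite:
            return EqPred(a.elems & b.elems)        # finite and finite
        if a.cofinite and b.cofinite:
            return EqPred(a.elems | b.elems, True)  # cofinite and cofinite
        if a.cofinite:
            a, b = b, a                             # make `a` the finite one
        return EqPred(a.elems - b.elems)            # finite and cofinite

    def disj(self, other: "EqPred") -> "EqPred":
        # De Morgan: a or b == not (not a and not b)
        return self.neg().conj(other.neg()).neg()

    def is_sat(self) -> bool:
        # Assumes D is infinite, so a cofinite set is never empty.
        return self.cofinite or bool(self.elems)


BOT = EqPred(frozenset())        # bottom: the empty set
TOP = EqPred(frozenset(), True)  # top: all of D


def eq(a) -> EqPred:
    """Atomic predicate  lambda c. c = a."""
    return EqPred(frozenset([a]))


# The two example predicates from Example 1.
five_or_ten = eq(5).disj(eq(10))
not_zero = eq(0).neg()
```

Keeping predicates in this normal form makes ∧, ∨ and ¬ plain set operations, and satisfiability becomes a constant-time check.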

Definition 1 (Symbolic Finite Automata). A symbolic finite automaton (s-FA) M is a tuple (A, Q, q_init, F, Δ) where A is an effective Boolean algebra, called the alphabet; Q is a finite set of states; q_init ∈ Q is the initial state; F ⊆ Q is the set of final states; and Δ ⊆ Q × Ψ_A × Q is the transition relation consisting of a finite set of moves or transitions.
Characters are elements of D_A, and words or strings are finite sequences of characters, i.e., elements of D_A^*. The empty word of length 0 is denoted by ε. A move ρ = (q_1, φ, q_2) ∈ Δ, also denoted by q_1 --φ--> q_2, is a transition from the source state q_1 to the target state q_2, where φ is the guard or predicate of the move. For a state q ∈ Q, we denote by guard(q) the set of guards of all moves from q. For a character a ∈ D_A, an a-move of M, denoted q_1 --a--> q_2, is a move q_1 --φ--> q_2 such that a ∈ ⟦φ⟧. An s-FA M is deterministic if, for all transitions (q, φ_1, q_1), (q, φ_2, q_2) ∈ Δ, q_1 ≠ q_2 → ⟦φ_1 ∧ φ_2⟧ = ∅, i.e., for each state q and character a there is at most one a-move out of q. An s-FA M is complete if, for all q ∈ Q, ⟦⋁_{(q, φ_i, q_i) ∈ Δ} φ_i⟧ = D, i.e., for each state q and character a there exists an a-move out of q. Throughout the paper we assume all s-FAs are deterministic and complete, since determinization and completion are always possible [9]. Given an s-FA M = (A, Q, q_init, F, Δ) and a state q ∈ Q, we say a word w = a_1 a_2 ··· a_k is accepted at state q if, for 1 ≤ i ≤ k, there exist moves q_{i-1} --a_i--> q_i such that q_0 = q and q_k ∈ F.
For a deterministic s-FA M and a word w, we denote by M_q[w] the state reached in M by w when starting at state q. When q is omitted, we assume that execution starts at q_init. For a word w = a_1 ··· a_k, we use w[i..] = a_i ··· a_k, w[..i] = a_1 ··· a_i, and w[i] = a_i to denote the suffix starting from the i-th position, the prefix up to the i-th position, and the character at the i-th position, respectively. We use B = {T, F} to denote the Boolean domain. A word w is called an access string for state q ∈ Q if M[w] = q. For two states q, p ∈ Q, a word w is called a distinguishing string if exactly one of M_q[w] and M_p[w] is final.
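Runs, acceptance, access strings and distinguishing strings can be illustrated on a small deterministic, complete s-FA. The sketch below is a hypothetical encoding in which guards are plain Python predicates instead of elements of an effective Boolean algebra; the SFA class, its method names, and the parity example are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Sequence, Tuple

Guard = Callable[[object], bool]


@dataclass
class SFA:
    """Deterministic, complete s-FA (A, Q, q_init, F, Delta). Guards are plain
    Python predicates standing in for elements of Psi."""
    init: Hashable
    final: set
    moves: List[Tuple[Hashable, Guard, Hashable]]

    def step(self, q, a):
        # Target of the unique a-move out of q.
        targets = {t for (s, g, t) in self.moves if s == q and g(a)}
        if len(targets) != 1:
            raise ValueError(f"not deterministic/complete at {q!r} on {a!r}")
        return next(iter(targets))

    def run(self, w: Sequence, q=None):
        """M_q[w]; starts at q_init when q is omitted."""
        q = self.init if q is None else q
        for a in w:
            q = self.step(q, a)
        return q

    def accepts(self, w: Sequence, q=None) -> bool:
        return self.run(w, q) in self.final

    def is_distinguishing(self, w: Sequence, q, p) -> bool:
        # Exactly one of M_q[w] and M_p[w] is final.
        return self.accepts(w, q) != self.accepts(w, p)


# Over numeric characters: accept words with an even number of negative characters.
parity = SFA(
    init="even",
    final={"even"},
    moves=[
        ("even", lambda c: c < 0, "odd"), ("even", lambda c: c >= 0, "even"),
        ("odd", lambda c: c < 0, "even"), ("odd", lambda c: c >= 0, "odd"),
    ],
)
```

Here the one-character word [-1] is an access string for the state odd, and the empty word is a distinguishing string for even and odd.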

Learning Model
In this paper, we follow the notation from [16]. A concept is a Boolean function c : D → B. A concept class C is a set of concepts represented using a representation class R. By representation class we denote a fixed function from strings to concepts in C. For example, regular expressions, DFAs, and NFAs are different representation classes for the concept class of regular languages.
The learning model under which all learning algorithms in this paper operate is called exact learning from membership and equivalence queries, or learning using a minimally adequate teacher (MAT), and was originally introduced by Angluin [3]. In this model, to learn an unknown concept c ∈ C, a learning algorithm has access to two types of queries:

Membership Query: In a membership query O(x), the input is x ∈ D and the query returns the value c(x) of the concept on the given input x, i.e., T if x belongs to the concept and F otherwise.

Equivalence Query: In an equivalence query E(H), the input is a hypothesis (or model) H. The query returns T if H(w) = c(w) for every w ∈ D. Otherwise, an input w ∈ D is returned such that H(w) ≠ c(w).

An algorithm is a learning algorithm for a concept class C if, for any c ∈ C, the algorithm terminates with a correct model for c after making a finite number of membership and equivalence queries. In this paper, we say that a learning algorithm is efficient for a concept class C if it learns any concept c ∈ C using a number of queries polynomial in the size of the representation of the target concept in R and the length of the longest counterexample provided to the algorithm.
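The two query types can be pictured as a small teacher object. This is a sketch under a strong assumption not made in the text: the domain D is finite, so an equivalence query can be decided by exhaustive comparison. The class FiniteTeacher and its counters are hypothetical; None plays the role of the answer T.

```python
from typing import Callable, Hashable, Iterable, Optional


class FiniteTeacher:
    """Minimally adequate teacher for a concept c : D -> B. Equivalence is
    decided by exhaustive comparison, which is only possible because we
    assume a finite domain D here."""

    def __init__(self, concept: Callable[[Hashable], bool], domain: Iterable[Hashable]):
        self.concept = concept
        self.domain = list(domain)
        self.membership_queries = 0
        self.equivalence_queries = 0

    def membership(self, x: Hashable) -> bool:
        """O(x): returns c(x)."""
        self.membership_queries += 1
        return self.concept(x)

    def equivalence(self, hyp: Callable[[Hashable], bool]) -> Optional[Hashable]:
        """E(H): None plays the role of the answer T; otherwise returns some
        w with H(w) != c(w)."""
        self.equivalence_queries += 1
        for w in self.domain:
            if hyp(w) != self.concept(w):
                return w
        return None
```

Counting queries on the teacher side is one simple way to measure the query complexity that the efficiency definition above refers to.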
An effective Boolean algebra A = (D, Ψ, ⟦_⟧, ⊥, ⊤, ∨, ∧, ¬) naturally defines the concept class 2^D of predicates over the domain D, with representations in Ψ. We say that an algorithm is a learning algorithm for the algebra A to denote a learning algorithm that can efficiently learn predicates from the representation class Ψ.

The MAT* Algorithm
Our learning algorithm, MAT*, can be viewed as a symbolic version of the TTT algorithm for learning DFAs [15], but without discriminator finalization. The learning algorithm accepts as input a membership oracle O, an equivalence oracle E, and a learning algorithm Λ for the underlying Boolean algebra used in the target s-FA M. The algorithm uses a classification tree T [16] to generate a partition of D* into equivalence classes, which represent the states in the target s-FA. Once a tree is obtained, we can use it to determine, for any word w ∈ D*, the state accessed by w in M, i.e., the state the automaton reaches when reading w. Then, we build an s-FA model H, using the algebra learning algorithm Λ to create models for each transition guard and using the classification tree T to implement a membership oracle for Λ. Once a model is generated, we check it for equivalence and, given a counterexample, we either update the classification tree with a new state and a corresponding distinguishing string, or propagate the counterexample to one of the instances of the algebra learning algorithm Λ. The structure of MAT* is shown in Algorithm 1. In the rest of the section, we use the s-FA in Figure 1 as a running example.

The Classification Tree
The main data structure used by our learning algorithm is the classification tree (CT) [16]. The classification tree stores the access and distinguishing strings of the target s-FA: all internal nodes of the tree are labelled with distinguishing strings, while all leaves are labelled with access strings.

Definition 2. A classification tree T = (V, L, E) is a binary tree whose internal nodes V are labelled with distinguishing strings, whose leaves L are labelled with access strings, and whose edges E connect each internal node to at most one T-child and at most one F-child.

Intuitively, given any internal node v ∈ V, any leaf l_T reached by following the T-child of v can be distinguished from any leaf l_F reached by following the F-child of v using the distinguishing string v. In other words, the membership queries for l_T v and l_F v return different results: O(l_T v) = T and O(l_F v) = F.

Tree initialization. To initialize the CT data structure, we perform a membership query on the empty word ε. Then, we create a CT with two nodes: a root node labelled with ε and one child also labelled with ε. The child of the root is either a T-child or an F-child, according to the result of the query O(ε).

The sift operation. The main operation performed using the classification tree is called sift, which determines, for any input word s, the state reached by s in the target s-FA. The sift(s) operation performs the following steps:
1. Set the current node to be the root node of the tree and let w be the label of the root. Perform a membership query on the word sw.
2. Let b = O(sw). Select the b-child of the current node. If it is an internal node, make it the current node, let w be its label, perform a membership query on sw, and repeat step 2.
3. Once a leaf is reached, return the access string with which the leaf is labelled.
Note that, until both children of the root node have been added, some inputs may not end up in any leaf node. In these cases, sift returns ⊥ and MAT* adds the queried input as a new leaf in the tree.
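The initialization and sift steps can be sketched in Python as follows. This is a minimal illustration, not the symbolicautomata implementation: the Node and Leaf classes, the representation of words as tuples, and the use of None for ⊥ are all assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple, Union

Word = Tuple  # words are tuples of characters; () is the empty word


@dataclass
class Leaf:
    access: Word  # access string of a state


@dataclass
class Node:
    dist: Word  # distinguishing string
    t_child: Optional[Union["Node", Leaf]] = None
    f_child: Optional[Union["Node", Leaf]] = None


def init_tree(member: Callable[[Word], bool]) -> Node:
    """Root labelled with the empty word and one leaf labelled with the empty
    word, attached as the T- or F-child according to O(empty word)."""
    root = Node(dist=())
    if member(()):
        root.t_child = Leaf(())
    else:
        root.f_child = Leaf(())
    return root


def sift(root: Node, s: Word, member: Callable[[Word], bool]) -> Optional[Word]:
    """Access string of the leaf reached by s, or None (bottom) when s falls
    into a child that has not been created yet."""
    node = root
    while isinstance(node, Node):
        b = member(s + node.dist)  # membership query O(s w)
        node = node.t_child if b else node.f_child
        if node is None:
            return None
    return node.access
```

For the language of words with an even number of negative characters, sifting (-1,) in a freshly initialized tree returns None; once a leaf labelled (-1,) is attached as the F-child of the root, words such as (-1, 4) sift to it.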
Once a classification tree is obtained, we use it to simulate a membership oracle for the underlying algebra learning algorithm Λ. This oracle is then used to infer models for the transitions and eventually construct an s-FA model. In Figure 2 we show the classification tree and the corresponding states learned by MAT* during its execution on our running example from Figure 1.

Building an s-FA Model
Assume we are given a classification tree T = (V, L, E). Our next task is to use the tree along with the underlying algebra learning algorithm Λ to produce an s-FA model. The main idea is to spawn an instance of Λ for each potential transition and then use the classification tree to answer the membership queries posed by each Λ instance. Initially, we define an s-FA H = (A, Q_H, q_ε, F_H, Δ_H), where Q_H = {q_u | u ∈ L}, i.e., we create one state for each leaf of the classification tree T. Moreover, for any q_u ∈ Q_H, we have that q_u ∈ F_H if and only if O(u) = T. Next, we show how to build the transition relation Δ_H. As mentioned above, our construction spawns instances of Λ for each potential transition of the s-FA and then uses the classification tree to decide, for each character, whether the character satisfies the guard of the potential transition, thus answering the membership queries performed by the underlying algebra learner.
Guard inference. To infer the set of guards in the transition relation Δ_H, we spawn, for each pair of states (q_u, q_v) ∈ Q_H × Q_H, an instance Λ_(q_u,q_v) of the algebra learning algorithm. We answer membership queries to Λ_(q_u,q_v) as follows. Let α ∈ D be a symbol queried by Λ_(q_u,q_v). Then, we return T as the answer to O(α) if sift(uα) = v, and F otherwise. Once Λ_(q_u,q_v) submits an equivalence query E(ψ) with a model ψ, we suspend the execution of the algorithm and add the transition (q_u, ψ, q_v) to Δ_H.
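A sketch of guard inference follows. To keep it self-contained it makes two hypothetical simplifications: the stand-in algebra learner is the finite-domain learner (it never poses an equivalence query, so it simply returns its model instead of being suspended), and a fixed function parity_sift plays the role of a two-leaf classification tree. The function names are ours, not the paper's.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Optional, Tuple

Word = Tuple[Hashable, ...]


def guard_oracle(u: Word, v: Word,
                 sift: Callable[[Word], Optional[Word]]) -> Callable[[Hashable], bool]:
    """Membership oracle for the instance Lambda_(q_u, q_v): a symbol alpha
    is answered T iff sift(u alpha) = v."""
    def member(alpha: Hashable) -> bool:
        return sift(u + (alpha,)) == v
    return member


def learn_finite(domain: Iterable[Hashable],
                 member: Callable[[Hashable], bool]) -> FrozenSet[Hashable]:
    """Stand-in algebra learner (finite domain, membership queries only)."""
    return frozenset(a for a in domain if member(a))


def build_transitions(leaves, domain, sift) -> Dict[Tuple[Word, Word], FrozenSet]:
    """One learner per pair of leaves (q_u, q_v); each learned model
    becomes the guard of the move q_u -> q_v in Delta_H."""
    delta = {}
    for u in leaves:
        for v in leaves:
            psi = learn_finite(domain, guard_oracle(u, v, sift))
            if psi:  # skip unsatisfiable guards
                delta[(u, v)] = psi
    return delta


def parity_sift(w: Word) -> Word:
    # Hypothetical stand-in for a two-leaf classification tree of the
    # "even number of negative characters" language.
    return () if sum(c < 0 for c in w) % 2 == 0 else (-1,)


delta_h = build_transitions([(), (-1,)], range(-3, 4), parity_sift)
```

On the sample domain {-3, ..., 3}, the move from q_ε to q_(-1) receives the guard {-3, -2, -1}, and the self-loop on q_ε receives {0, 1, 2, 3}.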
Partition verification. Once all algebra learners have submitted a model through an equivalence query, we have a complete transition relation Δ_H. However, at this point there is no guarantee that, for each state q, the outgoing transitions from q form a partition of the domain D. Therefore, our s-FA model H may in fact be non-deterministic and, moreover, certain symbols may not satisfy any guard. Using such a model in an equivalence query would result in an improper learning algorithm and potential problems in the counterexample processing algorithm of Section 3.3. To mitigate this issue, we perform the following checks:

Determinism check: For each state q_s ∈ Q_H and each pair of moves (q_s, ψ_1, q_u), (q_s, ψ_2, q_v) ∈ Δ_H, we verify that ⟦ψ_1 ∧ ψ_2⟧ = ∅. Assume that a character α is found such that α ∈ ⟦ψ_1 ∧ ψ_2⟧ and let m = sift(sα). Then, α must satisfy the guard of the transition q_s → q_m. Therefore, we check whether m = u and whether m = v, and provide α as a counterexample to Λ_(q_s,q_u) and Λ_(q_s,q_v), respectively, if the corresponding check fails.

Completeness check: For each state q_u ∈ Q_H, we verify that ⟦⋁_{(q_u, ψ_i, q_i) ∈ Δ_H} ψ_i⟧ = D. Assume that a character h ∉ ⟦⋁_{(q_u, ψ_i, q_i) ∈ Δ_H} ψ_i⟧ is found and let v = sift(uh). Following the same reasoning as above, we provide h as a counterexample to Λ_(q_u,q_v).
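Both checks can be sketched for the special case of a finite domain, where the guards out of q_s are explicitly enumerated sets, so ⟦ψ_1 ∧ ψ_2⟧ and the uncovered characters are obtained by set operations; a symbolic algebra would use its satisfiability procedure instead. The function name and the sift_from callback (returning m = sift(sα)) are hypothetical, and for simplicity the sketch collects every witness rather than one per failed check.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Tuple

Counterexample = Tuple[Tuple[Hashable, Hashable], Hashable]


def partition_counterexamples(
    q_s: Hashable,
    guards: Dict[Hashable, FrozenSet[Hashable]],
    domain: Iterable[Hashable],
    sift_from: Callable[[Hashable], Hashable],
) -> List[Counterexample]:
    """Determinism and completeness checks for the moves out of q_s.

    guards[t] is the (explicitly enumerated) guard of q_s -> t, and
    sift_from(alpha) returns m = sift(s alpha). Each result pairs the learner
    key (q_s, t) with the character to give it as a counterexample."""
    cexs: List[Counterexample] = []
    targets = list(guards)
    # Determinism check: a character in two guards belongs only to q_s -> q_m.
    for i, u in enumerate(targets):
        for v in targets[i + 1:]:
            for alpha in guards[u] & guards[v]:
                m = sift_from(alpha)
                if m != u:
                    cexs.append(((q_s, u), alpha))
                if m != v:
                    cexs.append(((q_s, v), alpha))
    # Completeness check: an uncovered character belongs to q_s -> q_m.
    covered = frozenset().union(*guards.values())
    for h in domain:
        if h not in covered:
            cexs.append(((q_s, sift_from(h)), h))
    return cexs
```

For instance, with guards {1, 2, 3} towards A and {3, 4} towards B over the domain {1, ..., 6}, where characters up to 3 sift to A, the overlap 3 becomes a counterexample for the learner of q_s → B, and the uncovered 5 and 6 become counterexamples for the same learner.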
These checks are iterated for each state until no more counterexamples are found. In Figure 2 we demonstrate instances of failed determinism and completeness checks while learning our running example from Figure 1, along with the corresponding updates to the predicates. For details regarding the equality algebra learner, see Section 5.
Optimizing the number of algebra learning instances. In the description above, MAT* spawns one instance of Λ for each possible transition between states in H. To reduce the number of spawned algebra learning instances, we perform the following optimization. For each state q_s, we initially spawn a single algebra learning instance Λ_(q_s,?). Let α be the first symbol queried by Λ_(q_s,?) and let u = sift(sα). We return T as the answer for α to Λ_(q_s,?) and set the target state of the instance to q_u, i.e., we convert the algebra learning instance into Λ_(q_s,q_u). Afterwards, we keep a set R = {q_v | v = sift(sβ)} over all β ∈ D queried by the different algebra learning instances, and generate new instances only for states q_v ∈ R whose guards have not yet been inferred. Using this optimization, the total number of generated algebra learning instances never exceeds the number of transitions in the target s-FA.

Processing Counterexamples
For counterexample processing, we adapt the algorithm used in [5] to the setting of MAT*. In a nutshell, our algorithm works similarly to the classic Rivest-Schapire algorithm [22] and the TTT algorithm [15] for learning DFAs, where a binary search is performed to locate the index in the counterexample where the executions of the model automaton and the target one diverge. However, once this breakpoint index is found, our algorithm performs further analysis to determine whether the divergence is caused by an undiscovered state in our model automaton or by an incorrect guard predicate on the transition that consumes the character at the breakpoint index.

Let w be a counterexample of length m and, for 0 ≤ j ≤ m, let γ_j = u_j w[j+1..], where u_j is the access string of the state H[w[..j]] (so γ_0 = w, since the initial state is accessed by ε, and γ_m = u_m). Since the finality of each state of H is determined by the membership query on its access string, O(γ_m) = H(w) ≠ O(w) = O(γ_0). It follows that there exists an index j, which we refer to as the breakpoint, for which O(γ_j) ≠ O(γ_{j+1}). The counterexample processing algorithm uses a binary search on the index j to find such a breakpoint. For more information on the correctness of this method, we refer the reader to [5,22].

Breakpoint analysis. Once we find an index j such that O(γ_j) ≠ O(γ_{j+1}), we can conclude that the transition taken in H from H[w[..j]] on the symbol w[j+1] is incorrect. In traditional algorithms for learning DFAs, the sole reason for an incorrect transition is that the transition actually leads to a yet undiscovered state in the target automaton. In the symbolic setting, however, we have to explore two different possibilities. Let q_u = H[w[..j]] be the state accessed in H by w[..j], let q_v = sift(uw[j+1]) be the result of sifting uw[j+1] in the classification tree, and consider the transition (q_u, ψ, q_v) ∈ Δ_H. We use the guard ψ to determine whether the counterexample was caused by an incorrect guard predicate or by an undiscovered state in the target s-FA.

Case 1. Incorrect guard. Assume that w[j+1] ∉ ⟦ψ⟧. Note that ψ was generated as a model by Λ_(q_u,q_v); therefore, a membership query from Λ_(q_u,q_v) for a character α returns T if and only if sift(uα) = v. Moreover, we have that sift(uw[j+1]) = v.
Therefore, if w[j+1] ∉ ⟦ψ⟧, then w[j+1] is a counterexample for the learning instance Λ_(q_u,q_v) that produced ψ. We supply Λ_(q_u,q_v) with the counterexample w[j+1], update the corresponding guard, and continue by generating a new s-FA model.

Case 2. Undiscovered state. Assume that w[j+1] ∈ ⟦ψ⟧. It follows that ψ behaves as expected on the symbol w[j+1] based on the current classification tree, so H reaches q_v after reading w[..j+1]. We conclude that the state accessed by w[..j+1] is in fact an undiscovered state in the target s-FA, which we have to distinguish from the previously discovered states. Therefore, we add a new leaf to the tree to access this state. More specifically, we replace the leaf labelled with v by a sub-tree consisting of three nodes: the root is the word w[j+2..], which by the choice of the breakpoint satisfies O(uw[j+1]w[j+2..]) ≠ O(vw[j+2..]) and is therefore a distinguishing string for v and uw[j+1], and its two children are leaves labelled with v and uw[j+1].

Finally, we have to take care of one last point. Once we add another state to the classification tree, certain queries that were previously directed to v may be directed to uw[j+1] once we sift them down the tree. This change implies that certain previous queries performed by the algebra learning instances Λ_(q_s,q_v) may have been given invalid answers and, therefore, we can no longer guarantee the correctness of the generated predicates. To solve this problem, we terminate all instances Λ_(q_s,q_v) for all q_s ∈ Q_H and replace them with fresh instances of the algebra learning algorithm.
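The breakpoint search and the two-case analysis can be sketched as follows, in the Rivest-Schapire style described above. The callbacks are assumptions of the sketch: access(prefix) returns the access string of the hypothesis state H[prefix], sift is the classification-tree lookup, and in_guard(u, v, a) tests whether a satisfies the hypothesis guard of q_u → q_v. Python slices are 0-based, so the paper's w[j+1] is w[j] here.

```python
from typing import Callable, Hashable, Optional, Tuple

Word = Tuple[Hashable, ...]


def find_breakpoint(w: Word, member: Callable[[Word], bool],
                    access: Callable[[Word], Word]) -> int:
    """Binary search for j with O(gamma_j) != O(gamma_{j+1}), where
    gamma_j = access(w[:j]) + w[j:]. Requires O(gamma_0) != O(gamma_len(w))."""
    def gamma(j: int) -> Word:
        return access(w[:j]) + w[j:]

    lo, hi = 0, len(w)
    b_lo = member(gamma(lo))
    # Invariant: member(gamma(lo)) == b_lo != member(gamma(hi)).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if member(gamma(mid)) == b_lo:
            lo = mid
        else:
            hi = mid
    return lo


def analyse_breakpoint(w: Word, j: int, access: Callable[[Word], Word],
                       sift: Callable[[Word], Optional[Word]],
                       in_guard: Callable[[Word, Word, Hashable], bool]):
    """Case analysis at breakpoint j. Python's w[j] is the paper's w[j+1]."""
    u = access(w[:j])
    a = w[j]
    v = sift(u + (a,))
    if not in_guard(u, v, a):
        # Case 1: `a` is a counterexample for the learner Lambda_(q_u, q_v).
        return ("guard", (u, v), a)
    # Case 2: new leaf u.a, split from v by the distinguishing suffix.
    return ("state", u + (a,), w[j + 1:])
```

As a usage example, take the target language of words with an even number of negative characters and a one-state hypothesis that accepts everything. The counterexample (4, -1, 2) yields the breakpoint j = 1, and since the self-loop guard ⊤ accepts -1, the analysis reports a new state with access string (-1,) and distinguishing suffix (2,).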

Correctness and Completeness of MAT*
Given a learning algorithm Λ, we use C_m^Λ(n) to denote the number of membership queries and C_e^Λ(n) to denote the number of equivalence queries performed by Λ for a target concept with representation size n. In our analysis we will also use the following definitions:

Example 2. Consider the s-FA on the left side of Figure 3. When we execute MAT* on this s-FA, after an access string for q_2 is added to the classification tree, the tree corresponds to the s-FA shown on the right, in which the transition from q_init is taken over the union of the individual transitions in the target. Certain sequences of answers to equivalence queries can force MAT* to first learn a correct model of ψ_1 ∨ ψ_2 ∨ ψ_3 before revealing a new state in the target s-FA.
We now state the correctness and query complexity of our algorithm.

Proof. First, we note that our counterexample processing algorithm only splits a leaf if there exists a valid distinguishing string separating the two newly generated leaves. Therefore, the number of leaves in the classification tree is always at most |Q|. Next, note that each counterexample is processed using a binary search with O(log m) membership queries to detect the breakpoint and, afterwards, either a new state is added or a counterexample is dispatched to the corresponding algebra learner.
Each classification tree T = (V, L, E) defines a partition over D* and, therefore, an s-FA H_T. In the worst case, MAT* will learn H_T exactly before a new state in the target s-FA is revealed through an equivalence query. Since H_T is the result of merging states in the target s-FA, we conclude that the size of each predicate in H_T is at most k. It follows that, for each classification tree T, we can get at most |Δ_{H_T}| C_e^Λ(k) counterexamples until a new state is uncovered in the target s-FA. Note that our counterexample processing algorithm ensures that each counterexample is either a valid counterexample for a predicate guard in H_T or uncovers a new state. For each membership query performed by an underlying algebra learner, we have to sift a string in the classification tree, which requires at most |Q| membership queries. Therefore, the total number of membership queries performed for each candidate model H is bounded by O(|Q||Δ| C_m^Λ(k) + |Δ| C_e^Λ(k) log m), where m is the length of the longest counterexample so far. The number of equivalence queries is bounded by O(|Δ| C_e^Λ(k)). When a new state is uncovered, we assume that, in the worst case, all the algebra learners are restarted (this is an overestimation); therefore, the same process is repeated at most |Q| times, giving us the stated bounds.
Note that the bounds on the number of queries stated in Theorem 1 are based on the worst-case assumption that we may have to restart all guard learning instances each time we discover a new state. In practice, we expect these bounds to be closer to O(|Δ| C_m^Λ(k) + (|Δ| C_e^Λ(k) + |Q|) log m) membership queries and O(|Δ| C_e^Λ(k) + |Q|) equivalence queries.
Minimality of the learned s-FA. Since MAT* only adds a new state to the s-FA when a distinguishing string is found, the total number of states in the learned s-FA is minimal. Moreover, MAT* does not modify the predicates returned by the underlying algebra learning instances in any way. Therefore, if the predicates returned by the Λ instances are of minimal size, MAT* preserves their minimality.
The following theorem shows that it is indeed not possible to efficiently learn s-FAs over a Boolean algebra that is not itself efficiently learnable.
Theorem 2. Let Λ_s-FA be an efficient learning algorithm for the algebra of s-FAs over a Boolean algebra A. Then, the Boolean algebra A is efficiently learnable.
Which s-FAs are efficiently learnable? Theorem 2 shows that efficient learnability of an s-FA requires efficient learnability of the underlying algebra. Moreover, from Theorem 1 it follows that efficient learnability using MAT* depends on the following property of the underlying algebra:

At this point, we would like to point out that the above condition arises because MAT* is a congruence-based algorithm, which successively computes hypothesis automata by refining a set of access and distinguishing strings, a characteristic common to all L*-based algorithms. Therefore, this limitation of MAT* is expected to be shared by any other algorithm in the same family. Given that, after three decades of research, L*-based algorithms are the only known provably efficient algorithms for learning DFAs (and, consequently, s-FAs), we expect that expanding the class of learnable s-FAs is a very challenging task.

Learnable Boolean Algebras
We now describe a number of interesting effective Boolean algebras that are efficiently learnable using membership and equivalence queries.

Boolean Algebras over finite domains. Let A be any Boolean algebra over a finite domain D. Then, any predicate ψ ∈ Ψ can be learned using |D| membership queries. More specifically, the learning algorithm constructs a predicate accepting all elements in D for which the membership queries return true, i.e., ψ = {c | c ∈ D ∧ O(c) = T}. Plugging this algebra learning algorithm into our algorithm, we obtain the TTT learning algorithm for DFAs without discriminator finalization [15]. This simple example demonstrates that algorithms for DFAs can be viewed as special cases of our s-FA learning algorithm for finite domains.

Equality Algebra. Consider the equality algebra defined in Example 1. Predicates in this algebra of size |ψ| = k can be learned using 2k equivalence queries and no membership queries. Initially, the algorithm outputs the empty set ⊥ as a hypothesis. In each subsequent step, the algorithm keeps the counterexamples obtained so far in two sets P, N ⊆ D, such that P holds all positive examples received so far and N holds all negative examples. The algorithm then finds the smallest hypothesis consistent with the counterexamples given. This hypothesis can be found efficiently: every consistent predicate must contain P and exclude N, so the smallest one is the smaller of λc. ⋁_{p ∈ P} c = p and λc. ¬⋁_{n ∈ N} c = n. It can easily be shown that the algorithm finds a correct hypothesis after at most 2k equivalence queries.

Other Algebras. The following Boolean algebras can be efficiently learned using membership and equivalence queries. All these algebras also have approximate fingerprints [3], which means that they are not learnable by equivalence queries alone. Thus, s-FAs over these algebras are not efficiently learnable by previous s-FA learning algorithms [10,5].
BDD algebra. The algebra of ordered binary decision diagrams (OBDDs) is efficiently learnable using a variant of L* [21].
Tree automata algebra. Deterministic finite tree automata form an algebra which is also learnable using membership and equivalence queries [12].
s-FA algebra. s-FAs themselves form an e↵ective Boolean algebra and therefore, s-FAs over s-FAs over learnable algebras are also learnable.
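The finite-domain learner and the equivalence-only equality-algebra learner described earlier in this section can be sketched as below. This is an illustrative sketch, not the paper's implementation: the hypothesis encoding ("fin", S) for S and ("cofin", S) for D \ S is our assumption, ties between the two candidate hypotheses are broken towards the finite one, and the test teacher compares hypotheses only on a finite sample of D.

```python
from typing import Callable, Hashable, Iterable, Optional, Tuple

Hyp = Tuple[str, frozenset]  # ("fin", S) denotes S; ("cofin", S) denotes D \ S


def learn_finite_domain(domain: Iterable[Hashable],
                        member: Callable[[Hashable], bool]) -> frozenset:
    """|D| membership queries: psi = {c in D | O(c) = T}."""
    return frozenset(c for c in domain if member(c))


def denotes(hyp: Hyp, c: Hashable) -> bool:
    kind, s = hyp
    return (c in s) if kind == "fin" else (c not in s)


def learn_equality(equiv: Callable[[Hyp], Optional[Hashable]]) -> Hyp:
    """Equivalence-query-only learner for the equality algebra.
    equiv(hyp) returns None when hyp is correct, else a counterexample."""
    pos, neg = set(), set()
    hyp: Hyp = ("fin", frozenset())  # start with bottom
    while True:
        c = equiv(hyp)
        if c is None:
            return hyp
        if denotes(hyp, c):
            neg.add(c)  # wrongly accepted: negative example
        else:
            pos.add(c)  # wrongly rejected: positive example
        # Smallest consistent hypothesis: OR of P, or NOT (OR of N).
        if len(pos) <= len(neg):
            hyp = ("fin", frozenset(pos))
        else:
            hyp = ("cofin", frozenset(neg))
```

With a teacher for λc. c = 5 ∨ c = 10, the learner alternates between finite and cofinite hypotheses and ends with ("fin", {5, 10}); with a teacher for λc. ¬(c = 0), it ends with ("cofin", {0}).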

Equality Algebra Learning
In this experiment, we use MAT* to learn s-FAs obtained from 15 regular expressions drawn from 3 domains: (1) regular expressions used in web application sanitization frameworks, such as the CodeIgniter framework; (2) regular expressions drawn from the popular web application firewall ModSecurity; and (3) regular expressions from [17]. For this set of experiments, we use the entire UTF-16 alphabet (2^16 characters) and the equality algebra to represent predicates. Since the alphabet is finite, we also tried learning the same automata using TTT [15], the most efficient algorithm for learning finite automata over finite alphabets.
Results. Table 1 presents the results of MAT*. The Memb and Equiv columns present the number of distinct membership and equivalence queries, respectively.
The R-CE column shows how many times a counterexample was reused, while the GU column shows the number of counterexamples that were used to update an underlying predicate (as opposed to adding a new state to the s-FA). Finally, D-CE shows the number of counterexamples provided to an underlying algebra learner due to failed determinism checks, while C-CE shows the number of counterexamples due to failed completeness checks. Note that these counterexamples did not require invoking the equivalence oracle. Given the large alphabet size, TTT runs out of memory on all our benchmarks. This is not surprising, since the number of queries required by TTT just to construct the correct model for a DFA with 128 = 2^7 states is at least |Σ||Q| log|Q| = 2^16 · 2^7 · 7 ≈ 2^26. We point out that a corresponding lower bound of Ω(|Q| log|Q| |Σ|) exists for the number of queries any DFA learning algorithm may perform; therefore, the size of the alphabet is a fundamental limitation for any such algorithm.
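The estimate for TTT can be checked directly; the snippet below only evaluates |Σ||Q| log|Q| for |Σ| = 2^16 and |Q| = 2^7, using base-2 logarithms as in the text.

```python
import math

sigma = 2 ** 16  # |Sigma|: the 2^16 UTF-16 characters
states = 2 ** 7  # |Q| = 128
queries = sigma * states * math.log2(states)
print(queries, math.log2(queries))  # about 5.9e7 queries, i.e. roughly 2^25.8
```

The exact value is 7 · 2^23 = 58,720,256, which rounds to 2^26 as stated.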
Analysis. First, we observe that the performance of the algorithm is not always monotone in the number of states or transitions of the s-FA. For example, RE.10 requires more than 10x as many membership and equivalence queries as RE.7, despite having fewer states and fewer transitions. In this case, the transitions of RE.10 carry predicates that are harder to learn, e.g., large character classes. Second, the completeness check and the corresponding counterexamples are useful not only to ensure that the generated guards form a partition, but also to restore predicates after new states are discovered. Recall that, once we discover (split) a new state, a number of learning instances are discarded. Usually, the newly created learning instances simply output ⊥ as the initial hypothesis. At this point, completeness counterexamples are used to update the newly created hypotheses accordingly, thus saving MAT* from having to rerun a large number of equivalence queries. Finally, we point out that the equality algebra learner makes no special assumptions about the structure of the predicates, such as recognizing the character classes used in regular expressions. We expect that providing such heuristics can greatly improve the performance of MAT* on these benchmarks.

BDD Algebra Learning
In this experiment, we use MAT* to learn s-FAs over a BDD algebra. We run MAT* on 1,500 automata obtained by transforming Linear Temporal Logic formulas over finite traces into s-FAs [8]. The formulas have 4 atomic propositions, and each BDD used by the s-FAs has height four. To learn the underlying BDDs, we use MAT* with the learning algorithm for algebras over finite domains (see Section 5), since ordered BDDs can be seen as s-FAs over D = {0, 1}. Figure 4 shows the number of membership (top left) and equivalence (top right) queries performed by MAT* for s-FAs with different numbers of states. For these s-FAs, MAT* is highly efficient with respect to both membership and equivalence queries, scaling linearly with the number of states. Moreover, we note that the size of the set of transitions |Δ| does not drastically affect the overall performance of the algorithm. This agrees with the results presented in the previous section, where we argued that the difficulty of the underlying predicates, not their number, is the primary factor affecting performance.

s-FA Algebra Learning
In this experiment, we use MAT* to learn 18 s-FAs over s-FAs, which accept strings of strings. We evaluate how our algorithms scale as the difficulty of learning the underlying predicates increases. The internal s-FAs, which we use as predicates, operate over the equality algebra and are denoted $I_k$ (where $2 \le k \le 17$). Each s-FA $I_k$ accepts exactly one word $a \cdots a$ of length $k$ and has $k + 1$ states and $2k + 1$ transitions. The external s-FAs are denoted $M_{m,n}$ (where $m \in \{5, 10, 15\}$ and $2 \le n \le 17$). Each s-FA $M_{m,n}$ accepts exactly one word $s \cdots s$ of length $m$, where each $s$ is accepted by $I_n$.

Analysis. For simplicity, assume that we are learning the s-FA $M_{n,n}$. Consider a membership query performed by one of the underlying algebra learning instances. Answering it requires sifting a sequence through the classification tree, whose height is at most $n$, which takes $O(n)$ membership queries. Therefore, the number of membership queries required to learn each individual predicate increases by a factor of $O(n)$. Moreover, for each equivalence query performed by an algebra learning instance, the s-FA learning algorithm has to pinpoint the counterexample to the specific algebra learning instance, a process that requires $\log m$ membership queries, where $m$ is the length of the counterexample. We conclude that each underlying guard with $n$ states requires on the order of $O(n^3)$ membership queries in the worst case and $O(n^2 \log n)$ in the best case (since the classification tree has height $\Omega(\log n)$), ignoring the queries required for counterexample processing. Figure 4 shows the number of membership (bottom left) and equivalence (bottom right) queries, which confirm this analysis. Indeed, the number of membership queries increases very sharply, roughly quadratically in the number of states of the underlying guards. On the other hand, equivalence queries are not affected as drastically and increase only linearly.
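The nesting used in this experiment can be sketched with a small deterministic s-FA whose guards are arbitrary Python predicates, so that the `accepts` method of an inner automaton can serve directly as the guard of an outer one. The class `SFA` and the helpers `exact_length`, `make_I`, and `make_M` are hypothetical names; missing transitions are treated as rejection, so these automata are partial and their transition counts differ from those of $I_k$ and $M_{m,n}$ reported above.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, Sequence, TypeVar

A = TypeVar("A")  # type of the alphabet elements


@dataclass
class SFA(Generic[A]):
    """A deterministic, possibly partial s-FA with Python predicates as guards."""
    initial: int
    final: set[int]
    # transitions[q] lists (guard, target) pairs; guards leaving q are assumed disjoint.
    transitions: dict[int, list[tuple[Callable[[A], bool], int]]] = field(default_factory=dict)

    def accepts(self, word: Sequence[A]) -> bool:
        q = self.initial
        for c in word:
            for guard, target in self.transitions.get(q, []):
                if guard(c):
                    q = target
                    break
            else:
                return False  # no guard accepts c: the run is stuck, so reject
        return q in self.final


def exact_length(k: int, guard: Callable[[A], bool]) -> SFA[A]:
    """Accept exactly the words of length k whose symbols all satisfy guard."""
    return SFA(initial=0, final={k},
               transitions={i: [(guard, i + 1)] for i in range(k)})


def make_I(k: int, a: str = "a") -> SFA[str]:
    # Inner s-FA over the equality algebra: accepts only a...a (k times).
    return exact_length(k, lambda c: c == a)


def make_M(m: int, n: int) -> SFA[str]:
    # Outer s-FA over strings: each guard is the membership test of I_n.
    return exact_length(m, make_I(n).accepts)
```

With this encoding, `make_M(2, 3).accepts(["aaa", "aaa"])` holds, while `["aaa", "aa"]` is rejected because its second symbol fails the inner guard $I_3$. Every membership test on an outer transition thus runs a whole inner automaton, which is where the extra query cost in the analysis comes from.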
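Both query costs in this analysis can be made concrete. The sketch below counts membership queries for (a) sifting a word through a classification tree, one query per internal node on the path and hence at most the tree's height, and (b) locating a breakpoint in a counterexample of length $m$ by binary search. For (b) we assume a decomposition in the style of Rivest and Schapire, where `alpha(i)` stands for the membership query on the access string reached after the first $i$ characters followed by the remaining suffix; the analysis above only states its logarithmic cost. The names `Leaf`, `Node`, `sift`, and `find_breakpoint` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    access: str  # access string of a hypothesis state


@dataclass
class Node:
    suffix: str       # distinguishing suffix v
    reject: "Tree"    # subtree for words w such that w + v is rejected
    accept: "Tree"    # subtree for words w such that w + v is accepted


Tree = Union[Leaf, Node]


def sift(tree: Tree, word: str, mq: Callable[[str], bool]) -> tuple[str, int]:
    """Sift word down to a leaf; return its access string and the MQs used.

    One membership query is issued per internal node on the path, so the
    cost is bounded by the height of the tree.
    """
    queries = 0
    while isinstance(tree, Node):
        queries += 1
        tree = tree.accept if mq(word + tree.suffix) else tree.reject
    return tree.access, queries


def find_breakpoint(alpha: Callable[[int], bool], m: int, alpha0: bool) -> tuple[int, int]:
    """Find i with alpha(i) != alpha(i + 1) by binary search.

    Assumes alpha(0) == alpha0 != alpha(m); both end values are known
    without new queries, so only midpoints are counted (about log2(m)).
    """
    lo, hi, queries = 0, m, 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if alpha(mid) == alpha0:
            lo = mid   # invariant: alpha(lo) == alpha0
        else:
            hi = mid   # invariant: alpha(hi) != alpha0
    return lo, queries
```

For a counterexample of length 1024, `find_breakpoint` uses 10 queries, whereas a linear scan could need up to 1023; sifting, in contrast, pays for every level of the tree, which is why a deep (height $n$) tree leads to the $O(n)$ blow-up per predicate query.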
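The worst-case and best-case bounds in the analysis can be written out as a product. Here $Q(n)$, $Q_0(n)$, and $h$ are our own notation: $Q_0(n)$ is the number of membership queries needed to learn the guard $I_n$ on its own, which we assume to be $O(n^2)$ because that is what the two stated bounds imply, and $h$ is the height of the classification tree of $M_{n,n}$. As in the analysis, counterexample processing is ignored.

```latex
\begin{align*}
  Q(n) &= Q_0(n)\cdot h, && h = \Omega(\log n),\ h \le n,\\
  Q(n) &= O(n^2)\cdot O(n) = O(n^3) && \text{(worst case, } h = n\text{)},\\
  Q(n) &= O(n^2)\cdot O(\log n) = O(n^2 \log n) && \text{(best case, } h = \Theta(\log n)\text{)}.
\end{align*}
```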

Related Work
Learning finite automata. The L* algorithm proposed by Dana Angluin [3] was the first to introduce the notion of a minimally adequate teacher, i.e., learning using membership and equivalence queries, and was also the first algorithm to learn finite automata in polynomial time. Following Angluin's result, L* has been studied extensively [16,15], it has been extended to many other models, e.g., nondeterministic automata [11] and alternating automata [4], and it has found many applications in program analysis [2,5,6] and program synthesis [23]. Since finite automata only operate over finite alphabets, all the automata that can be learned using variants of L* can also be learned using MAT*.
Learning symbolic automata. The problem of scaling L* to large alphabets was initially studied outside the setting of s-FAs using alphabet abstractions [14,13]. The first algorithm for symbolic automata over ordered alphabets was proposed in [19], but the algorithm assumes that the counterexamples provided to the learning algorithm are of minimal length. Argyros et al. [5] proposed the first algorithm for learning symbolic automata in the standard MAT model and also described the algorithm for distinguishing counterexamples that lead to new states from counterexamples due to invalid predicates, which we adapt in MAT*. Drews and D'Antoni [10] proposed a symbolic extension of the L* algorithm, gave a general definition of learnability, and demonstrated further learnable algebras, such as union and product algebras. The algorithms in [5,10,18] are all extensions of L* and assume the existence of an underlying learning algorithm capable of learning partitions of the domain from counterexamples. MAT* does not require the predicate learning algorithms to be able to learn partitions, which makes it easy to plug in existing learning algorithms for Boolean algebras. Moreover, MAT* allows the underlying algebra learning algorithms to perform both equivalence and membership queries, a capability not present in any previous work, thus expanding the class of s-FAs that can be learned efficiently.
Learning other models. Argyros et al. [5] and Botincan et al. [6] presented algorithms for learning restricted families of symbolic transducers, i.e., symbolic automata with outputs. Other algorithms can learn nominal [20] and register automata [7]. In these models, the alphabet is infinite but not structured (i.e., it does not form a Boolean algebra), and characters at different positions can be compared using binary relations.