Polynomial Identification of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\omega $$\end{document}-Automata

We study identification in the limit using polynomial time and data for models of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\omega $$\end{document}-automata. On the negative side we show that non-deterministic \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\omega $$\end{document}-automata (of types Büchi, coBüchi, Parity or Muller) can not be polynomially learned in the limit. On the positive side we show that the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\omega $$\end{document}-language classes \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {IB}$$\end{document}, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {IC}$$\end{document}, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {IP}$$\end{document}, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {IM}$$\end{document} that are defined by deterministic Büchi, coBüchi, parity, and Muller acceptors that are isomorphic to their right-congruence automata (that is, the right congruences of languages in these classes are fully informative) are identifiable in the limit using polynomial time and data. We further show that for these classes a characteristic sample can be constructed in polynomial time.


Introduction
With the growing success of machine learning in efficiently solving a wide spectrum of problems, we are witnessing an increased use of machine learning techniques in formal methods for system design. One thread in recent literature uses general purpose machine learning techniques for obtaining more efficient verification/synthesis algorithms. Another thread, following the automata theoretic approach to verification [33,21] works on developing grammatical inference algorithms for verification and synthesis purposes. Grammatical inference (aka automata learning) refers to the problem of automatically inferring from examples a finite representation (e.g. an automaton, a grammar, or a formula) for an unknown language. The term model learning [31] was coined for the task of learning an automaton model for an unknown system. A large body of works has developed learning techniques for different automata types (e.g. I/O automata [1], register automata [20], symbolic automata [14], ω-automata [7], and program automata [25]) and has shown its usability in a diverse range of tasks. 3 In grammatical inference, the learning algorithm does not learn a language, but rather a finite representation of it. Complexity of learning algorithms may vary greatly by switching representations. For instance, if one wishes to learn regular languages, she may consider representations using deterministic finite automata (DFAs), non-deterministic finite automata (NFAs), regular expressions, linear grammars etc. Since the translation results between two such formalisms are not necessarily polynomial, a polynomial learnability result for one representation does not necessarily imply a polynomial learnability result for another representation. Let C be a class of representations C with a size measure size(C) (e.g. for DFAs the size measure can be the number of states in the minimal automaton). We extend size(·) to the languages recognized by representations in C by defining size(L) to be the minimum of size(C) over all C representing L. In this paper we restrict attention to automata representations, namely, acceptors.
There are various learning paradigms considered in the grammatical inference literature, roughly classified into passive and active. We mention here the two central ones. In passive learning the model of learning from finite data refers to the following problem: given a finite sample T ⊆ Σ * × {0, 1} of labeled words, a learning algorithm A should return an acceptor C that agrees with the sample T . That is, for every (w, l) ∈ T the following holds: w ∈ C iff l = 1 (where C is the language accepted by C). The class C is identifiable in the limit using polynomial time and data if and only if there exists a polynomial time algorithm A that takes as input a labeled sample T and outputs an acceptor C ∈ C that is consistent with T , and A also satisfies the following condition. If L is any language recognized by an automaton from class C, then there exists a labeled sample T L consistent with L of length bounded by a polynomial in size(L), and for any labeled sample T consistent with L such that T L ⊆ T , on input T the algorithm A produces an acceptor C that recognizes L. In this case, T L is termed a characteristic sample for the algorithm A. In some cases (e.g., DFAs) there is also a polynomial time algorithm to compute a characteristic sample for A, given an acceptor C ∈ C.
In active learning the model of query learning [5] assumes the learner communicates with an oracle (sometimes called teacher ) that can answer certain types of queries about the language. The most common type of queries are membership queries (is w ∈ L where L is the unknown language) and equivalence queries (is A = L where A is the current hypothesis for an acceptor recognizing L). Equivalence queries are typically assumed to return a counterexample, i.e. a word in A \ L or in L \ A .
One of the main obstacles in obtaining a polynomial learning algorithm for ω-regular languages is that they do not in general have a Myhill-Nerode characterization; that is, there is no theorem correlating the states of a minimal automaton of some of the common automata types (Büchi, Parity, Muller, etc.) to the equivalence classes of the right congruence of the language. The right congruence relation for an ω-language L relates two finite words x and y iff there is no infinite suffix z differentiating them, that is x ∼ L y (for x, y ∈ Σ * ) iff ∀z ∈ Σ ω . xz ∈ L ⇐⇒ yz ∈ L. In our quest for finding a polynomial query learning algorithm for a subclass of the ω-regular languages, we have studied subclasses of languages for which such a relation holds [4], and termed them fully informative. We use IB, IC, IP, IM to denote the classes of languages that are fully informative of type Büchi, coBüchi, Parity and Muller, respectively. A language L is said to be fully informative of type X for X ∈ {B, C, P, M} if there exists a deterministic automaton of type X which is isomorphic to the automaton derived from ∼ L . While a lot of properties about these classes are now known, in particular that they span the entire hierarchy of ω-regular properties [34], a polynomial learning algorithm for them has not been found yet.
In this paper we show that the classes IB, IC, IP, IM can be identified in the limit using polynomial time and data. We further show that there is a polynomial time algorithm to compute a characteristic sample given an acceptor C ∈ IX. A corollary of this result is that the class of languages accepted by DWPAs (which as mentioned above is polynomially learnable in the query learning setting) also has a polynomial size characteristic sample. On the negative side, we show that the classes NBA, NCA, NPA, NMA of non-deterministic Büchi, coBüchi, Parity and Muller automata, resp., cannot be identified in the limit using polynomial data.

Preliminaries
Automata and Acceptors An automaton is a tuple A = Σ, Q, q ι , δ consisting of a finite totally ordered alphabet Σ of symbols, a finite set Q of states, an initial state q ι ∈ Q, and a transition function δ : Q × Σ → 2 Q . A run of an automaton on a finite word v = a 1 a 2 . . . a n is a sequence of states q 0 , q 1 , . . . , q n such that q 0 = q ι , and for each i ≥ 0, q i+1 ∈ δ(q i , a i+1 ). A run on an infinite word is defined similarly and results in an infinite sequence of states. We say that A is deterministic if |δ(q, a)| ≤ 1 and complete if |δ(q, a)| ≥ 1, for every q ∈ Q and a ∈ Σ. We extend δ to domain Q × Σ * in the usual manner, and abbreviate δ(q, σ) = {q } as δ(q, σ) = q .
By augmenting an automaton with an acceptance condition α, obtaining a tuple Σ, Q, q ι , δ, α , we get an acceptor, a machine that accepts some words and rejects others. An acceptor accepts a word if at least one of the runs on that word is accepting. For finite words the acceptance condition is a set F ⊆ Q and a run on a word v is accepting if it ends in an accepting state, i.e., if δ(q ι , v) contains an element of F . For infinite words, there are various acceptance conditions in the literature; we consider four: Büchi, coBüchi, parity, and Muller. The Büchi and coBüchi acceptance conditions are also a set F ⊆ Q. A run of a Büchi automaton is accepting if it visits F infinitely often. A run of a coBüchi is accepting if it visits F only finitely many times. A parity acceptance condition is a map κ : Q → N assigning each state a natural number termed a color (or priority). A run is accepting if the minimum color visited infinitely often is odd.
A Muller acceptance condition is a set of sets of states α = {F 1 , F 2 , . . . , F k } for some k ∈ N and F i ⊆ Q for i ∈ [1..k]. A run of a Muller automaton is accepting if the set S of states visited infinitely often in the run is a member of α. We use A to denote the set of words accepted by a given acceptor A. We use NBA, NPA, NMA, NCA for non-determinstic Büchi, parity, Muller and coBüchi, automata. We use NBA, NPA, NMA and NCA for the classes of languages they recognize. The first three recognize the full class of ω-regular languages while the forth only a subset of it.
Right congruences An equivalence relation ∼ on Σ * is a right congruence if x ∼ y implies xv ∼ yv for every x, y, v ∈ Σ * . The index of ∼, denoted | ∼ | is the number of equivalence classes of ∼. Given a language L ⊆ Σ * its canonical right congruence ∼ L is defined as follows: x ∼ L y iff ∀z ∈ Σ * we have xz ∈ L ⇐⇒ yz ∈ L. For a word v ∈ Σ * the notation [v] is used for the equivalence class of ∼ in which v resides.
With a right congruence ∼ of finite index one can naturally associate an automaton M ∼ = Σ, Q, q ι , δ as follows: the set of states Q consists of the equivalence classes of ∼. The initial state q ι is the equivalence class [ε]. The transition function δ is defined by δ([u], a) = [ua]. Similarly, given a complete deterministic automaton M = Σ, Q, q ι , δ we can naturally associate with it a right congruence as follows: x ∼ M y iff M reaches the same state when reading x or y. The Myhill-Nerode Theorem states that a language L is regular iff ∼ L is of finite index. Moreover, if L is accepted by a DFA A then ∼ A refines ∼ L . Finally, the index of ∼ L gives the size of the minimal complete DFA for L.
For an ω-language L ⊆ Σ ω , the right congruence ∼ L is defined similarly, by quantifying over ω-words. That is, x ∼ L y iff ∀z ∈ Σ ω we have xz ∈ L ⇐⇒ yz ∈ L. Given a deterministic automaton M we can define ∼ M exactly as for finite words. However, for ω-regular languages, the relation ∼ L does not suffice to obtain a "Myhill-Nerode" characterization. As an example consider the language L = (a + b) * (bba) ω . We have that ∼ L consists of just one equivalence class, since for any x ∈ Σ * and w ∈ Σ ω we have that xw ∈ L iff w has (bba) ω as a suffix. But an ω-acceptor recognizing L obviously needs more than a single state.
The classes IB, IC, IP and IM A language L is in IB (resp., IC, IP, IM) if there exists a deterministic Büchi (resp., coBüchi, parity, Muller) acceptor A such that L = A and there is a 1-to-1 relationship between the states of A and the equivalence classes of ∼ L : if x ∼ L y then x and y reach the same state q in A, and an ω-word z is accepted from q iff xz ∈ L (which holds iff yz ∈ L). These classes are more expressive than one might conjecture, it was shown in [4] that in every class of the infinite Wagner hierarchy [34] there are languages in IM and IP. Moreover, in a small experiment reported in [4], among randomly generated Muller automata, the vast majority turned out to be in IM.
Examples and samples Since we need finite representations of examples, ω-words in our case, we work with ultimately periodic words, that is, words of the form u(v) ω where u ∈ Σ * and v ∈ Σ + . It is known that two regular ω-languages are equivalent iff they agree on the set of ultimately periodic words, so this choice is not limiting. The example u(v) ω is concretely represented by the pair (u, v) of finite strings, and its length is |u| + |v|. A labeled example is a pair (u(v) ω , l), where the label l is either 0 or 1. A sample is a finite set of labeled examples such that no example is assigned two different labels. The length of a sample is the sum of the lengths of the examples that appear in it. A sample T and a language L are consistent with each other if and only if for every labeled example (u(v) ω , l) ∈ T , l = 1 iff u(v) ω ∈ L. A sample and an acceptor are consistent with each other if and only if the sample and the language recognized by the acceptor are consistent with each other. The following results give two useful procedures on examples that are computable in polynomial time.
Let suffixes(u(v) ω ) denote the set of all ω-words that are suffixes of u(v) ω .
Identification in the limit using polynomial time and data We consider the notion of identification in the limit using polynomial time and data. This criterion of learning was introduced by [16], who showed that regular languages of finite strings represented by DFAs are learnable in this sense. We follow a more general definition given by [19]. The definition has two requirements: (1) a learning algorithm A that runs in polynomial time on a set of labeled examples and produces a hypothesis consistent with the examples, and (2) that for every language L in the class, there exists a set T L of labeled examples of size polynomial in a measure of size of L such that on any set of labeled examples containing T L , the algorithm A outputs a hypothesis correct for L. Condition (1) ensures polynomial time, while condition (2) ensures polynomial data. The latter is not a worst-case measure; there could be arbitrarily large finite samples for which A outputs an incorrect hypothesis. However, de la Higuera shows that identifiability in the limit with polynomial time and data is closely related to a model of a learner and a helpful teacher introduced by [17].

Negative Results
We start with negative results. We show that when the representation at hand is non-deterministic, polynomial identification is not feasible.
Theorem 3. The class NBA cannot be identified in the limit using polynomial data.
Proof. The proof follows the idea given in the negative result for learning in the limit NFAs from polynomial data [19]. For any integer M ≥ 2, let p 1 , . . . , p m be The NBA B M accepts the set of all words of the form a k b ω such that k is not a positive multiple of = p 1 · p 2 · · · p m . Note that the size of the shortest ultimately periodic word in a * b ω \ B M is + 1, and thus, to distinguish the language B M from the language a * b ω , a word of at least this size must be provided. Since the number of primes not greater than M is Θ(M/ log M ) and since each prime is of size at least 2 the data must be of size at least 2 Θ(M/ log M ) while the number of states of B M is O(M 2 ).
Since NBAs are a special case of non-deterministic parity automata (NPA) and non-deterministic Muller automata (NMA) it follows that these models too cannot be identified in the limit using polynomial data. Note that indeed the NBA in the proof of Theorem 3 can be regarded as an NPA by setting the color of state b to 1 and the color of all other states to 0. Likewise it can be regarded as an NMA by defining the accepting set as {{b}}.
Corollary 1. The classes NPA and NMA cannot be identified in the limit using polynomial data.
While NBAs are not a special case of non-deterministic coBüchi automata (NCA) it can be shown that NCA as well cannot be identified in the limit from polynomial data, which is in some sense surprising, since NCAs are not more expressive than DCAs, their deterministic counterpart, and accept a very small subclass of the regular ω-languages.
Theorem 4. The class NCA cannot be identified in the limit using polynomial data.
Proof. The proof is almost identical to that of Theorem 3. The only difference is that it considers the automaton C M that takes exactly the same form as B M from that proof but switching accepting and non-accepting states. Since C M clearly accepts the same language as that of B M , with the same number of states, the proof continues exactly the same.

Outline for the positive results
The rest of the paper is devoted to the positive results. To show that a class is identified in the limit using polynomial time and data there are two steps: (i) constructing a sample of words T L of size polynomial in the given acceptor M for the language L at hand, the so called, characteristic sample, and (ii) providing a learning algorithm that for every given sample T returns an acceptor consistent with that sample, and in addition for any sample T that subsumes T L returns an acceptor that exactly recognizes L.
Since the construction of the characteristic sample is simpler we start with that. We show that the classes IB, IC, IP and IM have characteristic samples of size polynomial in the number of states of the acceptor, and that the characteristic sample can be constructed in polynomial time. The definition of an acceptor is composed of two steps: (a) the definition of the automaton and (b) the definition of the acceptance condition. Some words are put in the sample to help retrieving the automaton and some to help retrieving the acceptance condition. We view the characteristic sample as a union of two parts T Aut (for retrieving the automaton) and T Acc (for retrieving the acceptance condition). The learning algorithm first constructs the automaton, then retrieves the acceptance condition.
In Section 5 we discuss the construction of T Aut which is common to all the classes we consider, as they all are isomorphic to the automaton of the right congruence. In Section 6 we show how an algorithm can retrieve the automaton using the labeled words in T Aut . In Section 7 we discuss the construction of T Acc that regards the acceptance condition of the DPA. This part is the most involved one. We first associate with a DPA a canonical forest of its strongly connected components. From this canonical forest we build the T Acc part of the characteristic sample. In Section 8 we show a learning algorithm that can retrieve in polynomial time the acceptance condition of the DPA, from labeled examples in T Acc . This implies that IP (as well as its special cases IB and IC) can be learned in the limit from polynomial time and data. In Section 9 we show that the class IM can also be learned in the limit from polynomial time and data.

The characteristic sample for the automaton
In this section we show how to construct the T Aut part of the sample. We first show that any two states that are distinguishable in the automaton, are distinguishable by words of length polynomial in the number of states.

Polynomial construction of short distinguishing words
Let M be an acceptor in one of the classes IB, IC, IP or IM with states Q over alphabet Σ. If M is in one of the first three classes we use max{|Σ|, |Q|} for its size measure. If M ∈ IM we use max{|Σ|, |Q|, m} for its size measure where m is the number of sets in the acceptance condition α. We say that states q 1 and q 2 of M are distinguishable if there exists a word z ∈ Σ ω that is accepted from one but not the other (and that z is a distinguishing word ). First we show that any two distinguishable states of M are distinguishable by an ultimately periodic word of size polynomial in M. Then we show how to use these words to construct the T Aut part of the characteristic sample.
Proposition 5. If two states of a DMA, DPA, DBA or DCA of n states are distinguishable, then they are distinguishable by an ultimately periodic ω-word of length bounded by n 2 + n 4 .
Proof. We prove that for a DMA M of n states, if two distinct states q 1 and q 2 are distinguishable, then they are distinguishable by an ultimately periodic ω-word of length bounded by n 2 +n 4 . Since any DPA, DBA or DCA is equivalent to an isomorphic DMA, the above result holds also for DPAs, DBAs and DCAs.
Because q 1 and q 2 are distinguishable, there exists an ultimately periodic ω-word x(y) ω that is accepted from exactly one of the two states. For each nonnegative integer k and i = 1, 2, let q i (k) be the state visited after k symbols of x(y) ω have been read, starting with state q i . Also, let C i be the set of states visited infinitely often by the sequence q i (k), which determines the acceptance or rejection of x(y) ω from q i . The sequence of pairs (q 1 (k), q 2 (k)) for k = 0, 1, . . . takes on at most n 2 different values. Let C be the set of pairs visited infinitely often by this sequence. The two projections π 1 (C) and π 2 (C) are C 1 and C 2 .
Let be the minimum value for which (q 1 (k), q 2 (k)) visits only pairs in C for all k ≥ . Let x be the prefix of x(y) ω consisting of symbols. By removing symbols between repeated pairs (q 1 (k), q 2 (k)) from x we obtain a string u of length at most n 2 that reaches the pair (q 1 ( ), q 2 ( )) from (q 1 (0), q 2 (0)). Let m be the minimum value for which (q 1 (k), q 2 (k)) for ≤ k ≤ m visits all the pairs of C and returns to (q 1 ( ), q 2 ( )), and let y be the string from symbol to m − 1 of x(y) ω . Distinguishing a subsequence of pairs that visits each element of C once, we can remove from y sequences of symbols between repeated pairs that do not include a distinguished pair between them. Thus we obtain a string v of length at most |C|n 2 , that starts at (q 1 ( ), q 2 ( )), visits all the distinguished pairs and returns to the starting pair. Since |C| ≤ n 2 , the length of u(v) ω is at most n 2 + n 4 . Also, since the set of states visited infinitely often on input u(v) ω from q i is C i we have that u(v) ω is accepted from exactly one of q 1 and q 2 .
For DPAs as well as DMAs there is a polynomial time algorithm to determine whether two states are distinguishable and to find a distinguishing ω-word u(v) ω if they are. This result relies on a polynomial time algorithm to test the equivalence of two DPAs or two DMAs and return an example u(v) ω on which they differ if not [9]. Since DBA and DCA are special cases of a DPA, a polynomial construction of a distinguishing word applies to them as well.

Constructing the characteristic sample for the automaton
We now show how to construct the T Aut part of the characteristic sample, given an acceptor M in one of the classes IM, IP, IB or IC. Let n be the number of states of M. We may assume that every state of M is reachable from the initial state q ι . The algorithm constructs a set S of n access strings by breadth-first search in the transition graph of M such that S is prefix-closed and contains exactly one lexicographically least string of shortest possible length reaching each state of M from the initial state. Using Proposition 5, the algorithm may also construct a set E of at most n 2 distinguishing experiments that contains for each pair q 1 and q 2 of distinct states of M, an ω-word u(v) ω of length at most n 2 + n 4 that is accepted from exactly one of the states q 1 and q 2 .
Part Proof. The states of M reached from the initial state by the access strings in S must all be distinct, because for any pair of different strings s 1 , s 2 ∈ S, there exists a word u(v) ω ∈ E such that s 1 · u(v) ω and s 2 · u(v) ω have different labels in T Aut . Thus M must have at least n distinct states.
Assume that M has exactly n states. Given the state q of M reached by some s ∈ S and a symbol σ ∈ Σ, the labeled examples s · σ · u(v) ω in T Aut for all u(v) ω ∈ E uniquely determine which string s ∈ S corresponds to the state reached in M from q on input symbol σ. Thus the transition graph of M is isomorphic to the transition graph of M.

Learning the automaton
Let L denote the language to be learned, and M denote an acceptor of n states that is isomorphic to its right congruence automaton and recognizes L. Let the input sample of labeled examples be T . We now describe a learning algorithm A that makes use of the information in the given sample T to construct an automaton. If T subsumes T Aut the returned automaton will be isomorphic to the acceptor M.
From the sample T , the algorithm constructs as follows a set E of strings that serve as experiments used to distinguish states. For each labeled example (u(v) ω , l) in T , all of the elements of suffixes(u(v) ω ) are placed in E. Thus if the sample T includes T Aut , then for any pair of states of M the set E includes an experiment that distinguishes them.
Starting with the empty string ε, the algorithm attempts to build up a prefixclosed set S of finite strings that reach different states of M from the initial state.
Initially, S 1 = {ε}. After S k has been constructed, the algorithm attempts to determine, for each s ∈ S k and each symbol σ ∈ Σ in the ordering defined on Σ, whether s · σ reaches the same state as some string already in S k or a new state. If for each string s in S k , there exists some u(v) ω ∈ E such that the sample T has different labels for s · σ · u(v) ω and s · u(v) ω , then this is evidence that s · σ reaches a new state, and S k+1 is set to S k ∪ {s · σ}. If no such pair s and σ is found, then the final set S is S k . Because M has only n states, this case is reached with k ≤ n. If the sample T subsumes T Aut then this process will discover exactly the strings reaching all n states of M used in the construction of T Aut ; otherwise, it may terminate early.
In the second phase, the algorithm uses the strings in S as names for states and constructs a transition function δ using S and E. For each s ∈ S and σ ∈ Σ, we know that there is at least one s ∈ S such that there is no u(v) ω ∈ E for which s · σ · u(v) ω and s · u(v) ω have different labels in T (possibly because one or more of these examples are not in T at all.) The algorithm selects one such s and defines δ (s, σ) = s . If the strings in S actually reach all the states of M and the choice of s is unique in each case, then δ will be isomorphic to the transition function of M. This will be the case if the sample T includes T Aut because then among the elements of E will be experiments that distinguish any pair of states of M; otherwise, δ may not be correct.

Characteristic sample for a DPA
The construction of T Acc , the part of the characteristic sample used for retrieving the accepting condition of a DPA, builds on the construction of a forest of SCCs associated with a given DPA, which we term the canonical forest. Its properties and its construction are described next.

Constructing the canonical forest of a DPA
We start with some definition and simple claims.Let P = (Σ, Q, q ι , δ, κ) be a deterministic parity acceptor (DPA). A set of states C ⊆ Q is a strongly connected component (SCC) if and only if C is nonempty and for every q 1 , q 2 ∈ C, there exists a nonempty string v ∈ Σ + such that δ(q 1 , v) = q 2 and for all u v, δ(q 1 , u) ∈ C. Note that an SCC need not be maximal, and that a singleton {q} is an SCC if and only if the state q has a self-loop, that is, δ(q, σ) = q for some σ ∈ Σ. For any ω-word w, the set C of states visited infinitely often in the run of P on input w is an SCC of P. Claim 7. If C 1 and C 2 are SCCs of P and C 1 ∩ C 2 = ∅, then C 1 ∪ C 2 is also an SCC of P.
If P is a DPA and R ⊆ Q is any set of states, define SCCs(R) to be the set of all C such that C ⊆ R and C is an SCC of P. Also define maxSCCs(R) to be the maximal elements of SCCs(R) with respect to the subset ordering. Claim 8. If P is a DPA and R ⊆ Q is any set of states, then the elements of maxSCCs(R) are pairwise disjoint, and every set C ∈ SCCs(R) is a subset of exactly one element of maxSCCs(R).
If P is a DPA, we extend its coloring function κ to any nonempty set R of states by κ(R) = min{κ(q) | q ∈ R}. We define the parity of R to be 1 if κ(R) is odd, and 0 otherwise. For an ω-word w, if the SCC C is the set of states visited infinitely often in the run of P on w, then w is accepted by P iff the parity of C is 1. Note that the union of two sets of parity b is also of parity b. For any set of states R ⊆ Q, we define minStates(R) to be the set of states q ∈ R such that κ(q) = κ(R), that is, the states of R that are assigned the minimum color among all states of R.
The Canonical Forest Using these definitions we can show that there exists a forest associated with a DPA that has the following interesting properties. We provide an example for a canonical forest for a given DPA at the end of the current subsection.
Theorem 9. Let P = (Σ, Q, q 0 , δ, κ) be a DPA. There exists a canonical forest F * (P) that is unique up to isomorphism and has the following properties.
1. There are at most |Q| nodes in F * (P), each one a distinct SCC of P. Proof. The root nodes of F * (P) are the elements of maxSCCs(Q) and are SCCs that are pairwise disjoint, by Claim 8. Let C be one of them, and assume its parity is b. Let T be the set of SCCs that are subsets of C and of parity 1 − b.
If T = ∅ then C has no children and is a leaf of F * (P). Otherwise, the children of C are the maximal elements of T with respect to the subset ordering. The children of C must be pairwise disjoint because if they share a state, then their union is an SCC contained in C of parity 1 − b and is a proper superset of at least one of them, violating maximality. No child of C can contain an element of minStates(C) because otherwise the parity of the child would be b. Thus the union of the children of C must be a proper subset of C. These conditions imply that there are at most |Q| nodes in the forest, and that it is unique up to isomorphism. Let D be any SCC of P. Then D ∈ SCCs(Q), so by Claim 8, because the roots of F * (P) are the elements of maxSCCs(Q), there is a unique root node C 0 such that D ⊆ C 0 . Suppose the parity of C 0 is b. If D is not a subset of any of the children of C 0 , then it cannot have parity 1 − b, so the choice C = C 0 satisfies the required condition. If, however, D is a subset of some child C 1 of C 0 , then because the children of C 0 are pairwise disjoint, C 1 is the only child of C 0 that contains D. Again, if D is not a subset of any of the children of C 1 then D and C 1 must have the same parity, and the choice C = C 1 satisfies the condition. Otherwise, we continue down the tree rooted at C 0 until a node C is found that satisfies the condition. Note that if we arrive at a leaf C k , then D is not a subset of any of the children of C k (there are none) and D must have the same parity as C k because otherwise C k would have at least one child.
The Canonical Coloring The canonical forest F * (P) allows us to define a canonical coloring κ * for P, as follows. The states in (Q \ maxSCCs(Q)) are not contained in any SCC of P and do not affect the acceptance or rejection of any ω-word. For definiteness, we assign them κ * (q) = 0. For each node C of F * (P), we define ∆(C) to be the set of states of C that are not contained in the union of the children of C. For a root node C of parity b, we define κ * (q) = b for all q ∈ ∆(C). Let C be an arbitrary node of F * (P). If the states of ∆(C) have been assigned color k by κ * and D is a child of C, then the states of ∆(D) are assigned color k + 1 by κ * . We observe that if q 1 ∈ ∆(C) and q 2 is in a child of C, then κ * (q 1 ) < κ * (q 2 ), and κ * (q 1 ) is of the same parity as C.
Theorem 10. Let P = (Σ, Q, q 0 , δ, κ) be a DPA, and P be P with the canonical coloring κ * for P in place of κ. Then P and P recognize the same ω-language.
Proof. Let w be an ω-word and let D be the SCC consisting of the states visited infinitely often in the run of P (and also of P ) on input w. Let C be the unique node of F * (P) such that D is a subset of C and is not a subset of any of the children of C. Thus D contains at least one q ∈ ∆(C). In P the parity of D is the same as the parity of C, which is the same as the parity of κ * (q), which is equal to the parity of D in P . Thus either both P and P accept w or both reject w.
Computing the Canonical Forest We now show that, given a DPA P = (Σ, Q, q 0 , δ, κ), we can compute the canonical forest of P in polynomial time. We first define a (possibly non-canonical) forest F κ (P) using the given coloring κ. The root nodes are the elements of maxSCCs(Q), the set of all maximal SCCs of P. Once we have defined a node C of the forest, the children are the elements of the set maxSCCs(C \ minStates(C)), that is, the maximal SCCs contained in C with the set of states of minimum color removed. If this set is empty, the node has no children and is a leaf. Note that in contrast to the case of the canonical forest, in F κ (P) the children of a node are not constrained to be of parity opposite to that of the parent.
By construction each node in the forest F κ (P) is an SCC of P. If D is a descendant of C in the forest, then D is a proper subset of C, and κ(C) < κ(D). Because the roots are pairwise disjoint and the children of any node are pairwise disjoint, the sets minStates(C) for nodes C in the forest are pairwise disjoint and  nonempty, so there are at most |Q| nodes. Because a linear time algorithm for computing strongly connected components can be used to compute the children of a node, the forest F κ (P) may be computed in polynomial time in the size of the given DPA P.
To obtain the canonical forest F * (P) from the possibly non-canonical forest F κ (P), we may repeatedly merge pairs of adjacent nodes of the same parity until every pair of adjacent nodes are of different parity. That is, if C is a node of parity b and D is a child of C of parity b, then D ⊆ C, and we merge D into C by deleting D and making all the children of D direct children of C. Repeating this operation until there are no parent/child pairs of equal parity yields the canonical forest F * (P). This computation can be done in polynomial time.
Note that to obtain a canonical forest for a given DBA (resp., DCA) we can simply first color states in F by 1 (resp. 0) and in Q \ F by 2 (resp., 1) and then compute the canonical forest for the resulting DPA. In both cases the canonical forest will be of depth at most two, since in DBA an accepting SCC cannot be subsumed by a rejecting SCC (and vice versa in DCA). Figure 2(a) shows the transition graph of an example DPA P with states a through m, labeled by the colors assigned by κ. There is a directed edge from state q 1 to state q 2 if there exists a symbol σ ∈ Σ such that δ(q 1 , σ) = q 2 . Figure 2(b) shows the non-canonical SCC forest F κ (P) of P, with the nodes labeled by their parities. Figure 2(c) shows the canonical SCC forest F * (P) of P, with the nodes labeled by their parities. Figure 2(d) shows the transition graph of P re-colored using the canonical coloring κ * .

Constructing the characteristic sample for a DPA
We can now construct T Acc , the second part of the characteristic sample for a DPA P. The sample T Acc consists of one example u(v) ω for each node C of the canonical forest F * (P), where u is a string that reaches a state q in C from the initial state q 0 , and v is a nonempty string that, starting from q, visits every state of C and no state outside of C and returns to q. The length of the example u(v) ω can be taken to be bounded by n + n 2 . The example u(v) ω is labeled 1 if it is accepted by P and otherwise is labeled 0. Then T Acc contains at most

The learning algorithm for a DPA
We can now describe the learning algorithm A that makes use of the information in T L . Similar to Gold's construction, the algorithm optimistically assumes that the sample includes a characteristic sample, and if that assumption fails to produce an acceptor consistent with the sample, the algorithm defaults to producing a table-lookup acceptor to ensure that its hypothesis is consistent with the sample. The algorithm we describe is sufficient to establish the theoretical results, but for practical applications much more effort should be expended to find good heuristic choices to avoid defaulting too easily.
Let L denote the language to be learned, and P denote a DPA of n states that is isomorphic to its right congruence automaton and recognizes L. The first and second phases of the algorithm are as described in Section 6: in the first phase the algorithm builds the set S of states of the automaton, and in the second step it builds the transition relation δ . In the third phase, the acceptance (namely the coloring) is determined. In this phase, the algorithm may default to returning the table-lookup DPA for T . We first explain the construction of the table-lookup DPA then describe the third phase. A table-lookup DPA A table-lookup DPA for a given sample T is constructed by finding the shortest prefix of each example u(v) ω in T that distinguishes it from all other examples in T and placing these prefixes in a trie-like structure. At each leaf of the trie is a structure accepting (or rejecting, depending on the label of the example) the appropriate suffix of the unique example that arrives at that leaf. By Claim 1, this DPA can be constructed in time polynomial in the length of the sample T . Note that this construction is easily modified to give a DBA, DCA or DMA instead of a DPA. As an example, for the sample T = {(a(b) ω , 1), ((ab) ω , 1), (ab(baa) ω , 0)}, the corresponding prefixes are abbb, aba, and abba, and the table-lookup DPA for T is shown in Figure 3, with states labeled by colors 0 and 1.
Determining the coloring In the third phase, the algorithm attempts to define a coloring of the states in S. The algorithm constructs the set Z of all subsets C of S such that for some labeled example (u(v) ω , l) in T , the subset C is the set of elements of S that are visited infinitely often in the run on input u(v) ω starting at ε using the transition function δ . If in this process two examples with different labels are found to yield the same set C, the learning algorithm defaults to the table-lookup DPA for T . Otherwise, each set C in Z is associated with the label of the example(s) that yield C. The set Z is partially ordered by the subset relation. The learning algorithm then attempts to construct a forest F with nodes that are elements of Z, corresponding to the canonical forest of P.
Initially, F contains as roots all the maximal elements of Z. If these are not pairwise disjoint, it defaults to the table-lookup DPA for T . Otherwise, for each unprocessed element C in F , it computes the set of all D ∈ Z such that D ⊆ C, D has the opposite label to C, and D is maximal with these properties, and makes D a child of C. When all the children of a node C have been determined, the algorithm checks two conditions: (1) that the children of C are pairwise disjoint, and (2) there is at least one s ∈ C that is not in any child of C. If either of these conditions fail, then it defaults to the table-lookup DPA for T . If both conditions are satisfied, then the node C is marked as processed. When there are no more unprocessed nodes, the construction of F is complete. Note that F can have at most n nodes, because S has at most n elements. When the construction of F completes, for each node C in F let ∆(C) denote the elements of C that do not appear in any of its children. Then the learning algorithm assigns colors to the elements of S starting from the roots of F , as follows. If C is a root with label l, then κ (s) = l for all s ∈ ∆(C). If the elements of ∆(C) have been assigned color k and D is a child of C, then κ (s) = k + 1 for all s ∈ ∆(D). When this process is complete, any uncolored strings s are assigned κ (s) = 0. If the resulting DPA P is consistent with the sample T , the learning algorithm outputs P and halts. If the sample T includes both T Aut (to specify the automaton) and T Acc (to specify the coloring), then F will be isomorphic to the canonical forest F * (P) and κ will correspond to the canonical coloring κ * , and P will recognize the target language L.
If the process described above does not result in a DPA that is consistent with the sample T , then the algorithm defaults to constructing the table-lookup DPA for T .
The learning algorithm also works for the classes IB and IC: In the case of IB and IC we need to define a set F rather than a coloring κ. After constructing the forest, the set F is determined to contain the states in the root nodes that are not in the leaves. Thus we have the following. Theorem 11. The classes IB, IC and IP are identifiable in the limit using polynomial time and data. Moreover, characteristic samples can be computed in polynomial time.
A corollary of Theorem 11 is that the class of languages recognized by derministic weak parity acceptors (DWPA) which was shown to be polynomially learnable using membership and equivalence queries in [24] is identified in the limit using polynomial time and data. This class (which is equivalent to the intersection of classes DBA ∩ DCA) was shown to be a subset of IM in [30], and to be a subset of IP in [4].
Corollary 2. The class DWPA is identifiable in the limit using polynomial time and data. Moreover, characteristic samples can be computed in polynomial time. 9 The sample T Acc and the learning algorithm for a DMA The above results can be extended to the class IM. Recall that we define the size measure for a DMA to be max{|Σ|, |Q|, m}, where m is the number of sets in the acceptance condition. For the characteristic sample T L , T Aut remains the same, but T Acc contains for each accepting set C, an example u(v) ω for which C is the set of states visited infinitely often. In the learning algorithm, the construction of the transition function remains the same. Instead of attempting to construct a coloring function, the learning algorithm finds for each labeled example (u(v) ω , 1) ∈ T , the set C of states s that are visited infinitely often on input u(v) ω starting from ε and using the transition function δ , and adds C to the acceptance condition. If the construction does not result in a DMA consistent with T , then it defaults to producing a table-lookup DMA for T . Because in addition, as stated in Section 5.1, a characteristic samples can be computed in polynomial time, we have the following.
Theorem 12. The class IM is identifiable in the limit using polynomial time and data. Moreover, a characteristic sample can be computed in polynomial time.

Discussion
We have shown that the non-deterministic classes of ω-automata NBA, NPA, NMA and NCA cannot be identified in the limit using polynomial data. A negative result regarding query learning of the first three classes was recently obtained in [3]. That result makes a plausible assumption of cryptographic hardness, which is not required here. On the positive side we have shown that the classes IB, IC, IP and IM can be identified in the limit using polynomial time and data. And moreover, a characteristic sample can be constructed in polynomial time. The construction builds on the definition of a canonical forest for a DPA which may be of use in other contexts as well. The question whether the deterministic classes DBA, DPA, DMA and DCA can be polynomially learned in the limit remains open.