Abstract
We develop an algebraic theory for languages of data words. We prove that, under certain conditions, a language of data words is definable in first-order logic if and only if its syntactic monoid is aperiodic.
Keywords
Monoids, Nominal sets, Data words
1 Introduction
This paper is an attempt to combine two fields.
A classical theorem states that a regular language of finite words is definable in first-order logic if and only if its syntactic monoid is aperiodic. For instance, the language “words where there exists a position with label a” is defined by the first-order logic formula \(\exists x\ a(x)\) (this example does not even use the order on positions <, which is also allowed in general). The syntactic monoid of this language is isomorphic to {0,1} with multiplication, where 0 corresponds to the words that satisfy the formula, and 1 to the words that do not. Clearly, this monoid does not contain any nontrivial group. There are many results similar to the theorem above, each one providing a connection between seemingly unrelated concepts of logic and algebra, see e.g. the book [13].
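The example above can be checked mechanically. The following is a minimal sketch, assuming a two-letter alphabet {a, b} (the text does not fix an alphabet) and the hypothetical helper names `alpha`, `in_language` and `words_up_to`. It maps every word to an element of {0,1} and tests, on all short words, that the map is a monoid morphism into ({0,1}, multiplication).

```python
from itertools import product

ALPHABET = "ab"  # assumed alphabet; the text does not fix one


def in_language(word: str) -> bool:
    """Membership in the language 'some position has label a'."""
    return "a" in word


def alpha(word: str) -> int:
    """Map a word to {0, 1}: 0 if it contains an a, 1 otherwise."""
    return 0 if "a" in word else 1


def words_up_to(length: int):
    """Yield all words over ALPHABET of length at most `length`."""
    for n in range(length + 1):
        for letters in product(ALPHABET, repeat=n):
            yield "".join(letters)


def is_morphism(max_len: int = 4) -> bool:
    """Check alpha(uv) = alpha(u) * alpha(v) for all words up to max_len."""
    ws = list(words_up_to(max_len))
    return all(alpha(u + v) == alpha(u) * alpha(v) for u in ws for v in ws)
```

The language is the inverse image of {0}, so this morphism recognizes it; the check is exhaustive only up to the chosen length bound.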
The motivation of this paper is to combine the two fields, and prove a theorem that is analogous to the Schützenberger-McNaughton-Papert characterization theorem, but talks about languages of data words. If we want an analogue, we need to choose the notion of regular language for data words, the definition of first-order logic for data words, and then find the property corresponding to aperiodicity.
For the sake of simplicity, we assume that the alphabet is of the form \(A = \varSigma\times\mathbb{D}\), where Σ is a finite set, whose elements are called labels, and where \(\mathbb{D}\) is the infinite set of data values. Later, we will study a more general and abstract definition of an alphabet, which will cover other cases, e.g. the set of unordered pairs of data values.
As for the notion of regular language, we use the monoid approach. For languages of data words, we use the syntactic monoid, defined in the same way as it is for words over finite alphabets. That is, elements of the syntactic monoid are equivalence classes of the two-sided Myhill-Nerode congruence. When the alphabet is infinite, the syntactic monoid is infinite for almost every language of data words. For instance, in the case of the running example language \(L_{dd}\), two words \(w,w' \in\mathbb{D}^{*}\) are equivalent if and only if: either both belong to \(L_{dd}\), or both have the same first and last letters. Since there are infinitely many letters, there are infinitely many equivalence classes.
The contribution of this paper is a study of orbit-finite monoids. We develop the algebraic theory of orbit-finite monoids, and show that it resembles the theory of finite monoids. The main result of the paper is Theorem 9.1, which shows that the Schützenberger-McNaughton-Papert characterization also holds for languages of data words with orbit-finite syntactic monoids.
Nominal Sets
The key idea of using permutations of data values to act on the syntactic monoid of a language, or more generally, to act on any monoid, goes back to nominal sets. The theory of nominal sets originates from the work of Fraenkel in 1922, further developed by Mostowski in the 1930s. At that time, nominal sets were used to prove independence of the axiom of choice, and other axioms. In Computer Science, they were rediscovered by Gabbay and Pitts in [7], as an elegant formalism for modeling name binding. Since then, nominal sets have become a lively topic in semantics. They were also independently rediscovered by the concurrency community, as a basis for syntax-free models of name-passing process calculi, see [11, 12].
The definition of monoid used in this paper is the same thing as a monoid in the category of nominal sets. That is why we call it a nominal monoid. The restriction on orbit-finiteness is what corresponds to finiteness in the category of nominal sets (more precisely, a nominal set is orbit-finite if and only if it is a finitely presentable object). In other words, the theory of syntactic monoids for languages of data words turns out to be the same theory as the theory of finite monoids in the category of nominal sets.
Other Related Work
The idea to present effective characterizations of logics on data words was proposed by Benedikt, Ley and Puppis. In [1], they show that definability in first-order logic is undecidable if the input is a nondeterministic register automaton. Also, they provide some decidable characterizations, including a characterization of first-order logic with local data comparisons within the class of languages recognized by deterministic register automata. This result is incomparable to the one in this paper, because we characterize a different logic (data comparisons are not necessarily local), and inside a weaker class of recognizers (nominal monoids have less expressive power than deterministic register automata).
There are two papers on algebra for languages over infinite alphabets. One approach is described by Bouyer, Petit and Thérien [5]. Their monoids are different from ours in the following ways: the definition of a monoid explicitly talks about registers, there is no syntactic monoid, monoids have undecidable emptiness (not to mention questions such as aperiodicity or first-order definability). Our setting is closer to the approach of Francez and Kaminski from [6], but the latter talks about automata and not monoids, and does not study the connection with logics.
Progress Beyond the Conference Version
This article is a journal version of a conference paper [2]. The main difference is that the conference version was written without examining the connection to nominal sets; this shortcoming is fixed here. The paper is written entirely using the language of nominal sets; and theorems from the nominal literature are cited instead of reproved.
Using the abstract language of nominal sets requires more abstract and general definitions. An important example is our abstract notion of first-order logic and MSO logic for words in nominal sets. In the special case of data words with alphabets of the form \(\varSigma\times \mathbb{D}\), the abstract logic coincides with the standard notion of logic for data words.
Unlike the conference version, we use the more general notion of nominal sets, from [3], which allows more structure on data values, such as order instead of equality only.
2 Nominal Sets
In this section we describe nominal sets. The definition is based on [3], with slight modifications.
Group Action
Nominal Symmetry

The set \(\mathbb{D}\) is empty, and G has only the identity element. We call this the classical symmetry.

The set \(\mathbb{D}\) is a countable set, say the natural numbers. The group G consists of all bijections on \(\mathbb{D}\). We call this the equality symmetry.

The set \(\mathbb{D}=\mathbb{Q}\) is the set of rational numbers, and G is the set of monotone bijections. We call this the total order symmetry.
Nominal Set
Definition 2.1
Consider a data symmetry \((\mathbb{D},G)\), and a finite set \(C \subseteq \mathbb{D}\) of data values. A nominal set with support C is a \(G_{C}\)-set where every element has some finite support.
Nominal sets and nominal functions form a category, which is parametrized by the data symmetry \((\mathbb{D},G)\). We say a nominal set is equivariant if it has empty support. A nominal function is called equivariant if the set E in the definition is empty. In an equivariant function, the domain and codomain must be equivariant sets.
Observe that nominal sets in the classical symmetry are simply sets (equipped with the only possible action), and nominal functions are simply functions. Therefore the classical symmetry corresponds to classical set theory, without data values.
Nominal Subsets
Orbit Finite Sets
Let X be a nominal set with support C. We say that X is orbit-finite with respect to C if it has finitely many orbits under the action of \(G_{C}\).
Assumptions on the Data Symmetry

Least supports. Every element of a nominal set has a finite support that is least with respect to inclusion. This assumption is very useful in proofs. Intuitively it says that an element of a nominal set can be canonically represented by identifying its least support.

Cartesian products. Orbit-finite sets are preserved under Cartesian products. The usefulness of this assumption should be clear: it is very difficult to get any work done without pairs and other tuples.

Orbit refinement. Let C⊆D be supports of a nominal set X. If X is orbit-finite with respect to C, then it is also orbit-finite with respect to D. The reason for this assumption is that it establishes a robust notion of “finite” set, namely the notion of an orbit-finite set, see Lemma 2.2 below.
These assumptions are satisfied by the classical, equality and total order symmetries. This was shown in [3] for least supports and Cartesian products, and in [4] for orbit refinement. For other examples of data symmetries that satisfy these properties, see [3, 4].
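To make the Cartesian product assumption concrete, the sketch below counts orbits of n-tuples of data values in the equality and total order symmetries. It relies on the standard fact that two tuples lie in the same orbit exactly when they have the same equality pattern (respectively, the same order pattern); the function names `equality_type`, `order_type` and `count_orbits` are illustrative and do not come from the paper.

```python
from itertools import product


def equality_type(t):
    """Canonical orbit name under all bijections of the data values:
    each entry is replaced by the index of its first occurrence."""
    first_seen = {}
    return tuple(first_seen.setdefault(d, len(first_seen)) for d in t)


def order_type(t):
    """Canonical orbit name under monotone bijections:
    each entry is replaced by its rank among the distinct entries."""
    ranks = {d: i for i, d in enumerate(sorted(set(t)))}
    return tuple(ranks[d] for d in t)


def count_orbits(n, orbit_name):
    """Count orbits of n-tuples. Sampling n distinct values is enough,
    because an n-tuple mentions at most n data values."""
    return len({orbit_name(t) for t in product(range(n), repeat=n)})
```

For example, `count_orbits(2, equality_type)` distinguishes only equal and unequal pairs, while `count_orbits(2, order_type)` also separates the two possible strict orders. In both symmetries the count is finite for every n, as the assumption requires.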
An example of a data symmetry that violates all three assumptions is when the data values are integers \(\mathbb{Z}\), and the group consists of translations x↦x+y. The set \(\mathbb{Z}\) of data values itself is a counterexample to all three assumptions. It does not have least supports, since \(0 \in\mathbb{Z}\), or any other number, is supported by every singleton, but not by the empty set. The set \(\mathbb{Z}\) is orbit-finite, since it has just one orbit, but \(\mathbb{Z}\times \mathbb{Z}\) has infinitely many orbits, which are diagonals of the form \(\{(x,y+x) : x \in\mathbb{Z}\}\), one for each \(y \in\mathbb{Z}\). Finally, \(\mathbb{Z}\) is orbit-finite with respect to the empty set of data values, but it shatters into singleton orbits with respect to any nonempty set of data values.
As mentioned above, a corollary of orbit refinement is that we can simply say that a set is orbit-finite, without specifying the support C with respect to which it is orbit-finite, as stated by the following lemma.
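The failure of the Cartesian product assumption for translations can be illustrated with a short sketch. The helper names `translate`, `orbit_invariant` and `same_orbit` are hypothetical; the point is only that the difference of the two coordinates is preserved by every translation, so pairs with different differences lie in different orbits.

```python
def translate(point, shift):
    """Apply the translation x -> x + shift to every coordinate."""
    return tuple(x + shift for x in point)


def orbit_invariant(pair):
    """The coordinate difference, which no translation can change."""
    x, y = pair
    return y - x


def same_orbit(p, q):
    """Two pairs of integers are in one orbit iff a translation maps p to q.
    The only candidate is the shift that aligns the first coordinates."""
    shift = q[0] - p[0]
    return translate(p, shift) == tuple(q)
```

Since every integer occurs as a difference, the pairs (0, k) for different k are pairwise in different orbits, although each coordinate set on its own has a single orbit.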
Lemma 2.2
Let C,D be supports of a nominal set X. Then X is orbit-finite with respect to C if and only if it is orbit-finite with respect to D.
Proof
Suppose that X is orbit-finite with respect to C. By orbit refinement, it is orbit-finite with respect to C∪D⊇C. Because the group G _{ C∪D } is a subgroup of G _{ D }, orbit-equivalence with respect to G _{ C∪D } refines orbit-equivalence with respect to G _{ D }. It follows that X is also orbit-finite with respect to D. The converse implication follows by swapping the roles of C and D. □
Background
The definition of nominal sets presented above is based on, but not identical to, the one from [3]. The definition used in this paper is slightly more relaxed than the one in [3]; in [3] the category of nominal sets used only equivariant sets and equivariant objects. The difference between the nominal sets from [3] and this paper, and the nominal sets of Gabbay and Pitts [7], is that here and in [3] there is the additional parameter of a data symmetry, while in [7] the data symmetry is always assumed to be the same, namely \(\mathbb{D}\) is the natural numbers and G is the set of all permutations of natural numbers.
3 Nominal Monoids
Fix a data symmetry \((\mathbb{D},G)\). When talking about nominal sets and functions below, we mean nominal under this symmetry. In the category of nominal sets, there is a natural definition of a monoid:
Definition 3.1
A nominal monoid is a monoid, where the carrier is a nominal set, and the concatenation operation is nominal. If M,N are nominal monoids, then a nominal monoid morphism α:M→N is a function that is both a nominal function and a monoid morphism.
Example
The rest of this section is devoted to showing that the properties of monoids as language recognizers work just as well in the nominal setting, including the free monoid and the syntactic monoid.
3.1 Free Monoid
Lemma 3.2
Let M be a nominal monoid and A a nominal set. Every nominal function α:A→M can be uniquely extended to a nominal monoid morphism [α]:A ^{∗}→M.
Proof
Recognition
3.2 Syntactic Monoid
A nominal alphabet is any nominal orbit-finite set.
Examples of Alphabets

The set \(\mathbb{D}\) of data values.
 The product \(\varSigma\times\mathbb{D}\), where Σ is a finite set and the action changes only the data value, leaving the label fixed.

Pairs of data values: \(\mathbb{D}^{2}\).

Unordered pairs of data values: \(\{\{d,e\} : d \neq e \in \mathbb{D}\}\).

Recall that in the definition of a nominal set, Definition 2.1, a nominal set comes with a support C. In the examples above, the support was empty. An example of a set with nonempty support {d} is the set of data values with some chosen value d being excluded: \(\mathbb{D} \setminus \{d\}\).
Nominal Language
Consider a nominal alphabet A. A nominal language over A is a nominal subset of A ^{∗}. It is easy to see that any language recognized by a nominal monoid morphism is a nominal language. Using the syntactic morphism, we will also prove the converse in Lemma 3.3.
Two-Sided Myhill-Nerode Congruence
Lemma 3.3
If L⊆A ^{∗} is a nominal language, then its syntactic monoid is a nominal monoid, and the syntactic morphism is a nominal monoid morphism.
Proof
The assumption in the lemma is that for some support D that contains the support of the alphabet, L=Lπ holds for all π∈G _{ D }.
Because the syntactic morphism is \(w \mapsto[w]_{\equiv_{L}}\), the very definition of the action in the syntactic monoid proves that the syntactic morphism is a nominal function. □
The following lemma shows that the syntactic morphism is the “best morphism” for recognizing a language.
Lemma 3.4
Consider a nominal language L⊆A ^{∗}. For every surjective nominal monoid morphism α:A ^{∗}→M that recognizes L, there is a unique β such that β∘α is the syntactic morphism.
Proof
4 Logic
In this section, we give a definition of MSO and first-order logics for words in any data symmetry. The definition is abstract, but in the special case of the equality symmetry it specializes to the well known MSO and first-order logics, which access data values via a binary data equality predicate.
Let A be an orbit-finite alphabet. Define a relational vocabulary τ _{ A }, which has one binary relation symbol <, and one n-ary relation symbol R for every n-ary nominal subset R⊆A ^{ n }. The vocabulary is infinite.
When talking about nominal first-order logic for words over A, we refer to first-order logic over the vocabulary τ _{ A }. A sentence is said to be true in a word w if it is true in the structure \(\underline{w}\). Likewise for MSO.
Example 1
Example 2
4.1 Nominal vs Standard Logic
We now prove the result that was announced in Example 1, namely that nominal and standard first-order logics coincide in the special case of the equality symmetry and alphabets of the form \(\varSigma\times\mathbb{D}\).
Theorem 4.1
Consider the equality symmetry, and an alphabet \(A = \varSigma\times \mathbb{D}\), with Σ a finite set. Then the standard and nominal first-order logics for words over A have the same expressive power.
Lemma 4.2
Proof

For every i∈{1,…,n}, the label b _{ i } is a _{ i }.

For every i∈{1,…,n}, if d _{ i }∈C, then e _{ i }=d _{ i }.

For every i∈{1,…,n}, if d _{ i }∉C, then e _{ i }∉C.

For every i,j∈{1,…,n} if d _{ i }=d _{ j } then e _{ i }=e _{ j }.
Theorem 4.1 follows immediately from the lemma above, because the relations in standard first-order logic for data words can capture the predicates corresponding to the relations in the statement of Lemma 4.2. Observe also that, thanks to the “furthermore” clause in the lemma, if a formula of nominal first-order logic uses only equivariant predicates, then its corresponding formula in standard first-order logic does not need to use the predicates d(x) for \(d \in\mathbb{D}\).
The same proof as for Theorem 4.1 would work in other symmetries, such as the total order symmetry, except that ∼ would need to be replaced by the ordering on the data values.
5 Local Finiteness
A monoid M is called locally finite if every finitely generated submonoid of M is finite. For example, the monoid \(\mathbb{N}\) with addition is not locally finite, because the submonoid generated by {1} is the whole infinite set \(\mathbb{N}\). Below is an example of an infinite but locally finite monoid.
Example 3
Let X be an infinite set. Consider the monoid where elements are subsets of X, and the monoid operation is set union. This monoid is infinite. However, any finite set of n generators will generate a submonoid of at most 2^{ n } elements, hence the monoid is locally finite.
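The bound in Example 3 can be observed on a concrete instance. The sketch below, with the hypothetical helper `union_closure`, computes the submonoid generated by finitely many subsets under union; elements are represented as `frozenset` values and the identity is the empty set.

```python
def union_closure(generators):
    """Submonoid generated by `generators` (frozensets) under union.

    Starts from the identity (the empty set) and keeps adding unions with
    generators until no new element appears.
    """
    identity = frozenset()
    elements = {identity}
    frontier = [identity]
    while frontier:
        current = frontier.pop()
        for g in generators:
            new = current | g
            if new not in elements:
                elements.add(new)
                frontier.append(new)
    return elements
```

Every generated element is the union of some subset of the generators, which is why n generators can never produce more than 2^n elements.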
In this section we show that every orbit-finite nominal monoid is locally finite. As we shall see, this implies that most of Green’s theory can be used in orbit-finite monoids.
Theorem 5.1
Every orbit-finite nominal monoid is locally finite.
In the proof, we use the following lemma.
Lemma 5.2
Let X be an orbit-finite nominal set, and C a finite set of data values. There are finitely many elements in X that are supported by C.
Proof
We assume without loss of generality that X has one orbit. Let D be the least support of the set X itself.
If an element x∈X is supported by C, then its least support (which exists by the least support assumption) is a subset E⊆C, which must satisfy D⊆E. There are finitely many subsets of C. Therefore, the lemma will follow once we show that for every E⊇D there are finitely many elements x∈X whose least support is E.
Proof of Theorem 5.1
Aperiodic Monoid
Lemma 5.3

M does not contain a nontrivial subgroup.

M is aperiodic.
Proof
Suppose first that M is aperiodic. We show that there can be no nontrivial subgroup. For the sake of contradiction, suppose that G is a subgroup. Let g∈G be an element different from the group identity. By aperiodicity, there must be some \(i\in\mathbb{N}\) such that g ^{ i }=g ^{ i+1}, and by multiplying both sides by g ^{−i }, we get that 1=g, where 1 is the identity of G, a contradiction.
Observe that the two conditions in Lemma 5.3 are not equivalent in all monoids. For instance, the monoid of natural numbers with addition contains no nontrivial subgroup, but it is not aperiodic.
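For a finite monoid, the aperiodicity condition used in the proof above (some power of each element satisfies g^i = g^{i+1}) can be tested directly. The following is a sketch under the assumption that the monoid is given as a finite collection of elements together with a Python function for the operation; `is_aperiodic` is an illustrative name.

```python
def is_aperiodic(elements, multiply):
    """Return True if every element m satisfies m^i = m^(i+1) for some i >= 1.

    In a finite monoid with n elements the sequence of powers of m enters its
    cycle after at most n steps, so inspecting n + 1 powers is sufficient.
    """
    for m in elements:
        power = m
        stabilizes = False
        for _ in range(len(elements) + 1):
            next_power = multiply(power, m)
            if next_power == power:
                stabilizes = True
                break
            power = next_power
        if not stabilizes:
            return False
    return True
```

For instance, {0,1} with multiplication passes the test, while the cyclic group of order 3 (addition modulo 3) fails it, in line with the fact that it is a nontrivial group.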
6 Local Finiteness for MSO
As an added bonus, we prove a much stronger result than Theorem 5.1. This section is independent from the rest of the paper, and may be skipped by the reader who is only interested in the main result, Theorem 9.1.
Theorem 6.1
Let A be an orbit-finite alphabet, and let L⊆A ^{∗} be a language definable in nominal MSO. Then the syntactic monoid of L is locally finite.
Theorem 6.1 was stated in the conference paper but not proved. This theorem may prove useful in future work. For instance, we might want to develop an algebraic theory of languages recognized by deterministic, or even nondeterministic, finite nominal automata, as defined in [3]. These automata can be simulated in MSO, and therefore Theorem 6.1 says that Green’s theory can be used for them. The rest of this section is devoted to the proof of Theorem 6.1.
Fiber Bounded Morphism
Language Characterization of MSO
When proving Theorem 6.1, we want to avoid formulas of logic, so that we do not have to talk about free variables and other nuisances. That is why we state below a characterization of MSO that is purely in terms of languages.
Lemma 6.2
Languages definable in nominal MSO are included^{2} in the smallest class of languages that contains languages recognized by orbit-finite nominal monoids, and is closed under boolean combinations and images under fiber bounded morphisms.
Proof of Lemma 6.2
Example 4
Consider now the morphism \(f : \mathbb{D}^{*} \to a^{*}\) which replaces all letters by a. This morphism is not fiber bounded, because the inverse image of a is \(\mathbb{D}\). The image of L under f is the set of words of prime length. The syntactic monoid of that language is not locally finite, because it is finitely generated by {a} and infinite.
We now prove Theorem 6.1. By Lemma 6.2, it is enough to prove that the syntactic monoids of the class of languages in the lemma are locally finite. We first show two operations on monoids that correspond to boolean combinations and fiber bounded morphisms, respectively.
Monoid Operations
Lemma 6.3

Any boolean combination of L and K is recognized by the product M×N.

Any fiber bounded projection of L is recognized by P _{fin}(M).
A corollary of Lemmas 6.2 and 6.3 is that every language definable in MSO is recognized by a monoid from the least class of monoids that contains orbit-finite monoids and is closed under Cartesian products and finite powersets. By Theorem 5.1, all orbit-finite nominal monoids are locally finite. In Lemmas 6.4 and 6.5 below, we show that locally finite monoids are closed under Cartesian products and finite powersets, respectively. It follows that every language definable in MSO is recognized by a locally finite monoid. Finally, by Lemma 6.6 below, and Lemma 3.4, it follows that the syntactic monoid of every language definable in MSO is locally finite.
Lemma 6.4
If monoids M _{1},M _{2} are locally finite, then so is M _{1}×M _{2}.
Proof
Let X be a finite subset of M _{1}×M _{2}. For i∈{1,2} let N _{ i } be the submonoid of M _{ i } generated by the projection of X to the i-th coordinate. By assumption on M _{1},M _{2} being locally finite, the submonoids N _{1},N _{2} are finite. The submonoid generated by X in M _{1}×M _{2} is a subset of N _{1}×N _{2}, and therefore it is also finite. □
Lemma 6.5
If monoid M is locally finite, then so is P _{fin}(M).
Proof
Let \(\mathcal{X}\) be a finite set of elements in P _{fin}(M). Let X be the union \(\bigcup\mathcal{X}\), which is a finite set, because it is a finite union of finite sets. Let N be the submonoid of M that is generated by X. By local finiteness of M, the monoid N is finite. Let \(\mathcal{N}\) be the submonoid of P _{fin}(M) generated by \(\mathcal{X}\). By induction one shows that every element of \(\mathcal{N}\) is a subset of N. By finiteness of N, there are finitely many elements in \(\mathcal{N}\). □
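The argument of Lemma 6.5 can be traced on a small instance. The sketch below assumes that the operation of P _{fin}(M) is the usual elementwise product X·Y = {xy : x∈X, y∈Y} with identity {1}; this definition is an assumption here, since it is not restated in the surrounding text. The names `generated_submonoid` and `lift` are illustrative, and the example monoid is the natural numbers with `max`, which is infinite but locally finite.

```python
def generated_submonoid(generators, multiply, identity):
    """Closure of the generators under `multiply`, including the identity."""
    elements = {identity}
    frontier = [identity]
    while frontier:
        current = frontier.pop()
        for g in generators:
            new = multiply(current, g)
            if new not in elements:
                elements.add(new)
                frontier.append(new)
    return elements


def lift(multiply):
    """Lift a monoid operation to finite subsets (assumed elementwise product)."""
    def subset_product(xs, ys):
        return frozenset(multiply(x, y) for x in xs for y in ys)
    return subset_product
```

With generators {1,5} and {3} in the powerset monoid, the union X is {1,3,5}, the monoid N generated by X under `max` is {0,1,3,5}, and every element generated in the powerset monoid is a subset of N, as the proof claims.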
Lemma 6.6
Let α:M→N be a surjective monoid morphism. If M is locally finite, then so is N.
Proof
Let X be a finite subset of N. Using surjectivity, choose a finite subset Y of M such that α(Y)=X. By local finiteness of M, the monoid generated by Y in M is finite; and therefore so is its image in N, which is the same as the monoid generated by X in N. □
7 Green’s Relations for Nominal Monoids

\(m \le_{\mathcal{L}}n\) if Mm⊆Mn

\(m \le_{\mathcal{R}}n\) if mM⊆nM

\(m \le_{\mathcal{J}}n\) if MmM⊆MnM
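For a finite monoid, the three preorders above can be computed directly from the definitions by comparing ideals. The following sketch assumes the monoid is given as a finite collection of elements with a Python function for the operation; all function names are illustrative.

```python
def left_ideal(m, elements, multiply):
    """The set Mm."""
    return frozenset(multiply(x, m) for x in elements)


def right_ideal(m, elements, multiply):
    """The set mM."""
    return frozenset(multiply(m, x) for x in elements)


def two_sided_ideal(m, elements, multiply):
    """The set MmM."""
    return frozenset(
        multiply(multiply(x, m), y) for x in elements for y in elements
    )


def leq_L(m, n, elements, multiply):
    """m <=_L n iff Mm is contained in Mn."""
    return left_ideal(m, elements, multiply) <= left_ideal(n, elements, multiply)


def leq_R(m, n, elements, multiply):
    """m <=_R n iff mM is contained in nM."""
    return right_ideal(m, elements, multiply) <= right_ideal(n, elements, multiply)


def leq_J(m, n, elements, multiply):
    """m <=_J n iff MmM is contained in MnM."""
    return two_sided_ideal(m, elements, multiply) <= two_sided_ideal(
        n, elements, multiply
    )
```

A convenient non-commutative test case is the monoid of all functions on a two-element set, with functions encoded as tuples and composition as the operation; there the two constant functions are equivalent with respect to one of the one-sided preorders but not the other.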
It is easy to see that Green’s relations are nominal relations, with their support being the support of the monoid. The following lemma^{3} shows that two key properties of Green’s relations work in all locally finite monoids, which covers the case of orbit-finite nominal monoids thanks to Theorem 5.1.
Lemma 7.1

If \(n \ \mathcal{J}\ m\) and \(m \le_{\mathcal{R}}n\), then \(m\ \mathcal{R}\ n\). Likewise for \(\mathcal{L}\).

If M is aperiodic, then \(m\ \mathcal{L}\ n\) and \(m\ \mathcal {R}\ n\) imply m=n.
Proof
It is well known that the properties above hold in finite monoids.
The same argument works for the second property. □
8 First-Order Definable Functions
Example 5
B=2 and f says if some data value appears twice. More generally, the characteristic function of any first-order definable language is a first-order definable function, see Lemma 8.1.
Example 6
B=A and f returns the first letter. This is a partial function, because f is undefined on the empty word. Unlike in Example 5, the output set is infinite, and therefore f cannot be described just by using sentences of first-order logic.
Example 7
B is the family of two element subsets of A, and f returns the unordered set containing the first and last letters. Again, f is a partial function.
In the classical setting, where orbit-finite alphabets are simply finite, the notion of first-order definable function is just syntactic sugar on top of first-order definable languages. This is because a function f:A ^{∗}→B can be described as a finite set of sentences {φ _{ b }}_{ b∈B }, such that φ _{ b } defines the language f ^{−1}(b). However, when orbit-finite sets are actually infinite, the ability to return letters and data values becomes important. In Sect. 8.1, we come back to the idea of representing a function as a set {φ _{ b }}_{ b∈B }.
Definition of First-Order Definable Functions

Boolean properties. Suppose that φ is a first-order sentence. Then the characteristic function f:A ^{∗}→2 is a first-order definable function.

Letter selection. Suppose that φ(x) is a first-order formula with one free variable. Then the function f:A ^{∗}→A, which maps a word w to the label of the first position selected by φ(x), is first-order definable. The function f is partial, because in some words no position is selected by φ(x).
 Product and case. If functions f:A ^{∗}→B and g:A ^{∗}→C are first-order definable, then so are their product and sum. The product (f,g) is defined whenever both f and g are defined.

Information loss. If the function f:A ^{∗}→B is first-order definable, and g:B→C is any nominal function, then the composition f;g:A ^{∗}→C is also first-order definable.
It is easy to see that all the examples from the beginning of this section are first-order definable. Consider the function from Example 7. We use the product of two letter selectors to define a function h:A ^{∗}→A ^{2}, which maps a word to the ordered pair of its first and last letters. Then, we use information loss to forget the order in the pair.
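The construction for Example 7 can be mirrored in a small functional sketch. This is only an illustration of how the closure operations compose, not an evaluator for first-order logic: formulas with one free variable are modelled as Python predicates on a word and a position, words are tuples of letters, and `None` stands for "undefined". All names (`select_letter`, `pair`, `then`, `is_first`, `is_last`) are hypothetical.

```python
def select_letter(phi):
    """Letter selection: the letter at the first position satisfying phi."""
    def f(word):
        for x in range(len(word)):
            if phi(word, x):
                return word[x]
        return None  # no position selected: f is undefined on this word
    return f


def pair(f, g):
    """Product (f, g): defined only where both f and g are defined."""
    def h(word):
        b, c = f(word), g(word)
        return None if b is None or c is None else (b, c)
    return h


def then(f, g):
    """Information loss f;g: post-compose f with a function g on its outputs."""
    def h(word):
        b = f(word)
        return None if b is None else g(b)
    return h


def is_first(word, x):
    """Position x has no smaller position: for all y, not y < x."""
    return all(not (y < x) for y in range(len(word)))


def is_last(word, x):
    """Position x has no larger position: for all y, not x < y."""
    return all(not (x < y) for y in range(len(word)))


# Ordered pair of first and last letters, then forget the order.
first_and_last = then(pair(select_letter(is_first), select_letter(is_last)), frozenset)
```

On the empty word both selectors are undefined, so the composed function is undefined as well, matching the remark that f is partial.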
Lemma 8.1
A language L⊆A ^{∗} is first-order definable if and only if its characteristic function f _{ L }:A ^{∗}→2 is first-order definable.
Proof
The left-to-right implication is by definition of first-order definable functions. The right-to-left implication is by substituting formulas of first-order logic. □
8.1 Formulas as a Nominal Set
In this section we conjecture that first-order definable functions have a more elegant description than the one presented in the previous section.
Recall that the predicates in nominal first-order logic are obtained from nominal subsets R⊆A ^{ n } (except for the order relation). Suppose that the alphabet A has support C, i.e. one can use G _{ C } to act on elements of A. Extending this action coordinatewise to A ^{ n }, and then pointwise to subsets of A ^{ n }, we can apply any permutation π∈G _{ C } to a nominal relation R⊆A ^{ n }, and get a new nominal relation Rπ⊆A ^{ n }. We can then extend this action of G _{ C }, by simple structural induction, to all formulas of nominal first-order logic, so that a permutation π∈G _{ C } maps one formula φ to another formula φπ. It is easy to see that the set of formulas of nominal first-order logic is also a nominal set, with support C. Call this set FO_{ A }.

For every b, the sentence φ _{ b } defines the language f ^{−1}(b); and

The correspondence b↦φ _{ b } is a nominal function from B to FO_{ A }.
9 First-Order Logic
In this section we prove the main result of the paper, Theorem 9.1, which is an effective characterization of first-order logic in terms of aperiodic monoids. In the theorem, we talk about the \(\mathcal{J}\)-order, which is the relation \(\le_{\mathcal{J}}\) lifted to \(\mathcal{J}\)-classes.
Theorem 9.1
Let L be a nominal language, and let M _{ L } be its syntactic monoid. Assume that M _{ L } is orbit-finite and has a well-founded \(\mathcal{J}\)-order. Then L is definable in first-order logic if and only if M _{ L } is aperiodic.
Lemma 9.2 shows that the easier implication holds even without the assumptions on orbit-finiteness and well-foundedness.
Lemma 9.2
If L is definable in first-order logic, then M _{ L } is aperiodic.
Proof
This proof is the same as in the classical case, without data values.
In Sect. 9.1, we show the more difficult implication: if M _{ L } is aperiodic then L is first-order definable. First, we give examples which illustrate the assumptions in the theorem. Both examples show syntactic monoids where the \(\mathcal{J}\)-order is not well-founded. It could be the case that if a syntactic monoid is aperiodic and has a well-founded \(\mathcal{J}\)-order, then it is orbit-finite. In particular, in Theorem 9.1, it could be that the assumption on orbit-finiteness is not necessary.^{4}
Example 8
Example 9
Let A be an infinite but orbit-finite alphabet, e.g. \(\mathbb{D}\) in the equality symmetry. Consider the set of words where the number of distinct letters is even. The syntactic monoid of this language is the family of finite subsets of A, with union as the monoid operation. The monoid is aperiodic (and therefore a language that talks about parity can have an aperiodic syntactic monoid). However, the monoid is not orbit-finite, because subsets of different sizes are in different orbits. Also, the \(\mathcal{J}\)-order is not well-founded (because the \(\mathcal{J}\)-order is the superset relation, which is not well-founded).^{5}
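The recognizing morphism in Example 9 is easy to write down. In the sketch below, words are tuples of data values (integers stand in for elements of the alphabet, which is an assumption of the illustration), `alpha` maps a word to its set of letters, and `in_language` tests the parity condition; both names are hypothetical.

```python
def alpha(word):
    """Map a word to the finite set of letters occurring in it."""
    return frozenset(word)


def in_language(word):
    """Membership: the number of distinct letters is even."""
    return len(set(word)) % 2 == 0


def accepts(subset):
    """Accepting elements of the monoid: subsets of even size."""
    return len(subset) % 2 == 0
```

Concatenation of words corresponds to union of letter sets, membership depends only on the image under `alpha`, and every element s satisfies s ∪ s = s, which is the reason the monoid is aperiodic.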
Observe that in Example 8 we used the total order symmetry, and not the simpler equality symmetry. Indeed, it is impossible to provide an example that uses the equality symmetry, as shown by the following lemma.
Lemma 9.3
In the equality symmetry, the \(\mathcal{J}\)-order is well-founded in every orbit-finite monoid.
Proof
Observe that we have proved a slightly stronger result: if a permutation from G _{ C } can map one \(\mathcal{J}\)-class to another, then the \(\mathcal{J}\)-classes are either equal or incomparable in the \(\mathcal{J}\)-ordering. □
9.1 From Aperiodic to First-Order Definable
In this section we finish the proof of Theorem 9.1, by showing the more difficult implication: if M _{ L } is aperiodic then L is first-order definable. The proof is an induction, which needs a more detailed statement, as presented below.
Proposition 9.4
Consider a nominal monoid morphism α:A ^{∗}→M into a nominal monoid M that is orbit-finite and aperiodic. Let X⊆M be a nominal subset that is upward closed under \(\le_{\mathcal{J}}\). Then the partial function α _{ X }, which is α with domain restricted to the words whose image lies in X, is first-order definable.
Let C be a set that supports both M and X. The proof is by induction on the number of orbits in X, under the action of the group G _{ C }. The base of the induction is when there are zero orbits, in which case the function α _{ X } is easy to define.
Lemma 9.5
The set Z is upward closed under \(\le_{\mathcal{J}}\).
Proof
Thanks to Lemma 9.5, we can apply the induction assumption to Z, yielding a function α _{ Z } that is first-order definable. The rest of this section is devoted to extending the function from Z to X=Y∪Z.
For a word w∈A ^{∗}, we use the name type of w for α(w). Define R _{ w } to be the \(\mathcal{R}\)-class of the shortest prefix of w with type outside Z. Likewise, define L _{ w } to be the \(\mathcal{L}\)-class of the shortest suffix of w whose type is outside Z. Both R _{ w } and L _{ w } might be undefined, if the appropriate prefixes or suffixes do not exist.
Lemma 9.6
Proof
Suppose that a word w has type in Z. Because the \(\mathcal{J}\)-class of an infix decreases as the infix grows, and because Z is upward closed under the \(\mathcal{J}\)-ordering, it follows that all infixes of w have type in Z. Therefore, g is undefined on w.
Consider now a word w with type outside Z. Let u _{1} be the longest prefix of w with type in Z. Let a _{1} be the next letter after u _{1} in w. By choice of a _{1}, we have that R _{ w } is the \(\mathcal{R}\)-class of u _{1} a _{1}. Likewise, let u _{2} be the longest suffix of w with type in Z, and let a _{2} be the letter that precedes u _{2} in w. Again, L _{ w } is the \(\mathcal{L}\)-class of a _{2} u _{2}.
Lemma 9.7
If w has type in X, then that type is f(w).
Proof
Because X is partitioned into Y and Z, we consider two cases.
If w has type in Z, then the result follows from the definition of f.
Lemma 9.8
The set of words with type in Y is firstorder definable.
Proof
For the righttoleft implication, observe that all words in K have type outside X, and therefore also all words that contain an infix from K.
For the lefttoright implication, observe that the complement of X is downward closed in the \(\mathcal{J}\)order, by assumption on X being upward closed. Therefore, a word has type outside X if and only if it has an infix outside X. If we choose the infix to have minimal length, then removing the first or last letter of the infix gives a word in with type in X, for which we can use Lemma 9.7. This infix is the word u in the definition of K. □
We now conclude the proof of the induction step in Proposition 9.4. We need to define the function \(\alpha_X\). This is the disjoint union of the functions \(\alpha_Y\) and \(\alpha_Z\). The latter function is defined by the induction assumption. The former function is the restriction of f to the set of words with type in Y, by Lemma 9.7. By Lemma 9.8, this restriction can be done in first-order logic.
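Read literally, the construction in this paragraph amounts to the following case distinction. The third case is our reading of how the partial functions are used in this section, and is an assumption rather than a statement quoted from the text.

```latex
\[
\alpha_X(w) \;=\;
\begin{cases}
  \alpha_Z(w) & \text{if } \alpha(w) \in Z,\\
  f(w)        & \text{if } \alpha(w) \in Y,\\
  \text{undefined} & \text{if } \alpha(w) \notin X .
\end{cases}
\]
```

The first two cases are disjoint because X is partitioned into Y and Z, and the condition \(\alpha(w) \in Y\) is the one that Lemma 9.8 makes first-order definable.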
10 Further Work
Characterize Other Logics
It is natural to extend the characterization of first-order logic to other logics. Candidates that come to mind include first-order logic with two variables, or various logics inspired by XPath, or piecewise testable languages. Also, it would be interesting to see the expressive power of languages recognized by orbit-finite nominal monoids. This class of languages is incomparable in expressive power to first-order logic, e.g. the first-order definable language “some data value appears twice” is not recognized by an orbit-finite nominal monoid. It would be nice to see a logic, maybe a variant of monadic second-order logic, with the same expressive power as orbit-finite nominal monoids.
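To make the example language concrete, here is a small membership test for “some data value appears twice”. It mirrors the shape of a first-order sentence of the form “there exist positions x < y carrying the same data value”; the function name and the encoding of data words as Python lists are assumptions made for this sketch.

```python
from typing import Hashable, Sequence


def some_value_appears_twice(word: Sequence[Hashable]) -> bool:
    """Check whether two distinct positions x < y carry the same data value."""
    n = len(word)
    # The two nested ranges play the role of the two quantified positions.
    return any(
        word[x] == word[y]
        for x in range(n)
        for y in range(x + 1, n)
    )


print(some_value_appears_twice([1, 2, 3, 2]))  # True
print(some_value_appears_twice([1, 2, 3]))     # False
```

The test only compares data values for equality, so it is invariant under renaming of data values, as one expects in the equality symmetry.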
Use Mechanisms More Powerful than Monoids
As language recognizers, orbit-finite nominal monoids are very weak. In most data symmetries, such as the equality and total order symmetries, orbit-finite nominal monoids are strictly less expressive than orbit-finite deterministic automata.^{6} For example, the language “the first letter in the word appears also on some other position” is recognized by an orbit-finite deterministic automaton, but has a syntactic monoid with infinitely many orbits. Therefore, one can ask: is it decidable if an orbit-finite automaton recognizes a language that can be defined in first-order logic? We conjecture that this problem is decidable, and even that a necessary and sufficient condition is aperiodicity of the syntactic monoid (which need not be orbit-finite).
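The automaton for the example language can be sketched as a deterministic machine that stores the first letter and waits for it to reappear. The state encoding below (tuples tagged `"init"`, `"wait"`, `"acc"`) is an assumption of this sketch; up to renaming of data values there are only three kinds of states, which is the intuition behind orbit-finiteness.

```python
from typing import Hashable, Sequence, Tuple


def accepts_first_letter_repeats(word: Sequence[Hashable]) -> bool:
    """Simulate a deterministic automaton that remembers the first letter."""
    state: Tuple = ("init",)
    for d in word:
        if state[0] == "init":
            state = ("wait", d)      # store the first letter
        elif state[0] == "wait" and d == state[1]:
            state = ("acc",)         # the stored letter has reappeared
        # in all other cases the state is unchanged
    return state[0] == "acc"


print(accepts_first_letter_repeats([1, 2, 1]))  # True
print(accepts_first_letter_repeats([1, 2, 3]))  # False
```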
Footnotes
 1. This is related to Proposition 3 in [9].
 2. The inclusion is strict. Consider the language described in Sect. III.A of an unpublished manuscript, http://www.mimuw.edu.pl/~bojan/papers/atomturing.pdf. This is an example of a language that is not definable in nominal MSO (because it is not even recognised by a deterministic Turing machine with atoms, and the machines are more powerful than nominal MSO), but which is an image, under a fiber bounded morphism, of a language definable in nominal MSO (because having an even number of conflicts, as defined in the manuscript, is definable in nominal MSO).
 3. As pointed out by one of the anonymous referees, every locally finite monoid is group bound, which implies Lemma 7.1. See e.g. [8].
 4. I would like to thank an anonymous reviewer for pointing this out.
 5. A previous version of this paper claimed that the monoid has a well-founded \(\mathcal{J}\)-order. This error was pointed out by the anonymous referee. I am not aware of a syntactic monoid that is aperiodic, not orbit-finite, but has a well-founded \(\mathcal{J}\)-order.
 6.
Notes
Acknowledgements
I would like to thank Clemens Ley for introducing me to the subject. I would like to thank Sławomir Lasota and Bartek Klin for many stimulating discussions. Finally, I would also like to thank Michael Kaminski and the anonymous referees for their many helpful comments.
References
 1. Benedikt, M., Ley, C., Puppis, G.: Automata vs. logics on data words. In: Dawar, A., Veith, H. (eds.) CSL. Lecture Notes in Computer Science, vol. 6247, pp. 110–124. Springer, Berlin (2010)
 2. Bojańczyk, M.: Data monoids. In: Schwentick, T., Dürr, C. (eds.) STACS. LIPIcs, vol. 9, pp. 105–116. Schloss Dagstuhl, Leibniz-Zentrum für Informatik, Saarbrücken (2011)
 3. Bojańczyk, M., Klin, B., Lasota, S.: Automata with group actions. In: LICS, pp. 355–364. IEEE Computer Society, Los Alamitos (2011)
 4. Bojańczyk, M., Braud, L., Klin, B., Lasota, S.: Towards nominal computation. In: Field, J., Hicks, M. (eds.) POPL, pp. 401–412. ACM, New York (2012)
 5. Bouyer, P., Petit, A., Thérien, D.: An algebraic approach to data languages and timed languages. Inf. Comput. 182(2), 137–162 (2003)
 6. Francez, N., Kaminski, M.: An algebraic characterization of deterministic regular languages over infinite alphabets. Theor. Comput. Sci. 306(1–3), 155–175 (2003)
 7. Gabbay, M., Pitts, A.M.: A new approach to abstract syntax with variable binding. Form. Asp. Comput. 13(3–5), 341–363 (2002)
 8. Higgins, P.M.: Techniques of Semigroup Theory. Oxford University Press, London (1992)
 9. Kaminski, M., Francez, N.: Finite-memory automata. Theor. Comput. Sci. 134(2), 329–363 (1994)
 10. Segoufin, L.: Automata and logics for words and trees over an infinite alphabet. In: Ésik, Z. (ed.) CSL. Lecture Notes in Computer Science, vol. 4207, pp. 41–57. Springer, Berlin (2006)
 11. Montanari, U., Pistore, M.: Finite state verification for the asynchronous pi-calculus. In: Cleaveland, R. (ed.) TACAS. Lecture Notes in Computer Science, vol. 1579, pp. 255–269. Springer, Berlin (1999)
 12. Montanari, U., Pistore, M.: History-dependent automata: an introduction. In: Bernardo, M., Bogliolo, A. (eds.) SFM. Lecture Notes in Computer Science, vol. 3465, pp. 1–28. Springer, Berlin (2005)
 13. Straubing, H.: Finite Automata, Formal Languages, and Circuit Complexity. Birkhäuser, Boston (1994)
 14. Thomas, W.: Languages, automata, and logic. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Language Theory, vol. III, pp. 389–455. Springer, Berlin (1997)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.