1 Introduction and main results

Let K be a field. A univariate formal power series

$$\begin{aligned} S=\sum _{n \ge 0} s(n) x^n \in K\llbracket x \rrbracket \end{aligned}$$

is a rational series if it is the power series expansion of a rational function at 0. Necessarily, this rational function does not have a pole at 0. Equivalently, the coefficients of a rational series satisfy a linear recurrence relation, that is, there exist \(\alpha _1, \ldots , \alpha _m \in K\) such that

$$\begin{aligned} s(n+m) = \alpha _1 s(n+m-1) + \cdots + \alpha _m s(n) \quad \text {for all } n \ge 0. \end{aligned}$$
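The passage from a rational function to a linear recurrence can be checked numerically. The following is a minimal sketch, not taken from the paper: the helper `series_coefficients` and the sample function \(1/(1-x-x^2)\) are illustrative assumptions. It expands \(P/Q\) at 0 by comparing coefficients in \(S\cdot Q = P\) and then verifies the recurrence \(s(n+2) = s(n+1) + s(n)\) on the computed terms.

```python
from fractions import Fraction

def series_coefficients(P, Q, count):
    """First `count` coefficients of P/Q expanded at 0.

    P and Q are coefficient lists (constant term first); Q[0] must be nonzero,
    which corresponds to the rational function having no pole at 0.
    """
    if Q[0] == 0:
        raise ValueError("Q(0) must be nonzero (no pole at 0)")
    q0 = Fraction(Q[0])
    s = []
    for n in range(count):
        # Compare the coefficients of x^n on both sides of S * Q = P.
        acc = Fraction(P[n]) if n < len(P) else Fraction(0)
        for j in range(1, min(n, len(Q) - 1) + 1):
            acc -= Fraction(Q[j]) * s[n - j]
        s.append(acc / q0)
    return s

# Illustrative example (an assumption, not from the text): 1 / (1 - x - x^2).
s = series_coefficients([1], [1, -1, -1], 10)
# Here m = 2 and alpha_1 = alpha_2 = 1.
recurrence_holds = all(s[n + 2] == s[n + 1] + s[n] for n in range(8))
```

Exact arithmetic with `Fraction` is used so that the check is not affected by rounding.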

Pólya [18] considered arithmetical properties of rational series over \(K=\mathbb {Q}\), and characterized the univariate rational series whose coefficients are supported at finitely many prime numbers. This was later extended to number fields by Benzaghou [2, Chapitre 5], and to arbitrary fields, in particular fields of positive characteristic, by Bézivin [7]. Ultimately, they proved the following theorem.

We call a rational series \(S \in K\llbracket x \rrbracket \) a Pólya series if there exists a finitely generated subgroup \(G \le K^\times \), such that all coefficients of S are contained in \(G \cup \{0\}\).

Theorem 1.1

(Pólya; Benzaghou; Bézivin). Let K be a field, let \(S=P/Q \in K(x)\) be a rational function with \(Q(0) \ne 0\), and let

$$\begin{aligned} S = \sum _{n=0}^\infty s(n) x^n \in K\llbracket x\rrbracket \end{aligned}$$

be the power series expansion of S at 0. Suppose that S is a Pólya series. Then there exist a polynomial \(T \in K[x]\), \(d \in \mathbb {Z}_{\ge 0}\), and for each \(r \in [0,d-1]\) elements \(\alpha _r \in K\), \(\beta _r \in K^\times \) such that

$$\begin{aligned} S = T + \sum _{r=0}^{d-1} \frac{\alpha _r x^r}{1-\beta _r x^d}. \end{aligned}$$

Equivalently, there exists a finite set \(F \subseteq \mathbb {Z}_{\ge 0}\) such that

$$\begin{aligned} s(kd+r) = \alpha _r \beta _r^k \qquad \text {for all } k \ge 0 \text { and } r \in [0,d-1] \text { with } kd+r \not \in F. \end{aligned}$$

The converse of the previous theorem, that every series with such coefficients is a Pólya series, is of course trivial.
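To illustrate the shape of the decomposition in Theorem 1.1, consider the following small example (ours, chosen for illustration only), with \(K=\mathbb {Q}\), \(d=2\), \(T=0\), \(\alpha _0=\alpha _1=1\), \(\beta _0=2\), \(\beta _1=3\), and \(F=\emptyset \).

```latex
S = \frac{1}{1-2x^2} + \frac{x}{1-3x^2}
  = \frac{1 + x - 3x^2 - 2x^3}{1 - 5x^2 + 6x^4},
\qquad
s(2k) = 2^k, \quad s(2k+1) = 3^k \quad \text{for all } k \ge 0.
```

All coefficients lie in the subgroup of \(\mathbb {Q}^\times \) generated by 2 and 3, so S is a Pólya series. By contrast, \(1/(1-x)^2 = \sum _{n \ge 0} (n+1)x^n\) is rational but not a Pólya series: its coefficients include every prime number, whereas the elements of a finitely generated subgroup of \(\mathbb {Q}^\times \) involve only finitely many primes.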

Let R be a (commutative) domain. A noncommutative formal power series \(S=\sum _{w \in X^*} S(w) w \in R\langle \langle X \rangle \rangle \) is rational if it can be obtained from noncommutative polynomials in \(R\langle X\rangle \) by successive applications of addition, multiplication, and the star operation \(S^* = (1-S)^{-1} = \sum _{n \ge 0} S^n\) (if S has zero constant coefficient). See Sect. 2 below for formal definitions, and the book by Berstel–Reutenauer [6] for more background. Extending the correspondence between univariate rational series and linear recurrence relations to the noncommutative setting, a theorem of Schützenberger shows that S is rational if and only if it has a linear representation, or, equivalently, is recognized by a weighted finite automaton.

The definition of Pólya series extends to this noncommutative setting: A rational series \(S \in R\langle \langle X \rangle \rangle \) is a Pólya series if its nonzero coefficients are contained in a finitely generated subgroup \(G\le K^\times \) of the quotient field K of R. Noncommutative rational Pólya series were first studied by Reutenauer in 1979 [19]. Reutenauer introduced the notion of an unambiguous rational series and conjectured that these should be precisely the rational Pólya series [19]; see also [20, 21, §6] and [6, p.233, Open Problem 4]. He proved many equivalent characterizations of unambiguous rational series, for instance, showing that they are precisely those recognized by an unambiguous weighted finite automaton. The conjecture that rational Pólya series are unambiguous has, however, remained open so far.

The goal of the present paper is to prove this conjecture. We also obtain a new proof of Theorem 1.1 as a special case of our more general theorem, and give a characterization of those rational series recognized by a deterministic weighted finite automaton. Moreover, this resolves all three conjectures in [20, Chapitre 6].

A rational series is unambiguous if it can be obtained from noncommutative polynomials and the operations of addition, multiplication, and the star operation \(S^* = (1-S)^{-1} = \sum _{n \ge 0} S^n\) in such a way that, in these operations, one never forms a sum of two nonzero coefficients. (This is defined more formally in Definition 2.4 below; in Sect. 2 we also recall the definitions of rational series and (unambiguous) weighted automata.) A formal series \(S \in \mathbb {Z}\langle \langle X \rangle \rangle \) is linearly bounded if there exists \(C \ge 0\) such that \({|}S(w){|} \le C{|}w{|}\) for all nonempty words \(w \in X^*\).

Let R be a (commutative) domain and K its quotient field. An element \(a \in K\) is almost integral over R if there exists \(0 \ne c \in R\) such that \(ca^n \in R\) for all \(n \ge 0\). A domain R is completely integrally closed if it contains all such almost integral elements. We are mostly interested in the cases where \(R=K\) is a field or \(R=\mathbb {Z}\). In general, one cannot relax the completely integrally closed condition in the following theorem to integrally closed—see Remark 9.2 and the example following it.

Theorem 1.2

Let R be a completely integrally closed domain with quotient field K. Let X be a finite non-empty set, and let \(S \in R \langle \langle X \rangle \rangle \) be a rational series. Then the following statements are equivalent.

  1. (a)

    S is a Pólya series.

  2. (b)

    S is recognized by an unambiguous weighted finite automaton with weights in R.

  3. (c)

    S is unambiguous (over R).

  4. (d)

    There exist \(\lambda _1, \ldots , \lambda _k \in R \setminus \{0\}\), linearly bounded rational series \(a_1, \ldots , a_k \in \mathbb {Z}\langle \langle X\rangle \rangle \), and a rational language \(\mathcal {L}\subseteq X^*\) such that \({{\,\mathrm{supp}\,}}(a_i) \subseteq \mathcal {L}\) for all \(i \in [1,k]\) and

    $$\begin{aligned} S(w) = {\left\{ \begin{array}{ll} \lambda _1^{a_1(w)} \cdots \lambda _k^{a_k(w)} &{} \text {if } w \in \mathcal {L},\\ 0 &{} \text {if } w \not \in \mathcal {L}. \end{array}\right. } \end{aligned}$$
  5. (e)

    S is Hadamard sub-invertible, that is, the series

    $$\begin{aligned} \sum _{w \in {{\,\mathrm{supp}\,}}(S)} S(w)^{-1} w \in K\langle \langle X \rangle \rangle \end{aligned}$$

    is a rational series (over K).

The ‘hard’ part of this theorem is showing (a)\(\,\Rightarrow \,\)(b) in the case where \(R=K\) is a field. It involves the use of finiteness results on unit equations in characteristic 0 and their recent extension to positive characteristic by Derksen–Masser [8]. We also make use of a new invariant associated to a linear representation, its linear hull. The linear hull also allows a characterization of determinizable weighted automata; see Theorem 1.3 below.

The other implications of Theorem 1.2 are comparatively straightforward and are largely known. The equivalence (b)\(\,\Leftrightarrow \,\)(c) was first noted by Reutenauer [20, Chapitre VI, Théorème 1]. The implications (c)\(\,\Rightarrow \,\)(e) and (e) \(\,\Rightarrow \,\)(a) are also known [6, Exercise 3.1 of Chapter 6], and so once (a)\(\,\Rightarrow \,\)(b) is shown, the equivalence of (a), (b), (c), and (e) is clear. Finally, (a)\(\,\Leftrightarrow \,\)(d) appears in the proof of [19, Proposition 4(ii)]. Despite this, we opt to give a self-contained proof of all of Theorem 1.2 in the present paper.

Denote by \(\mathbb {1}_\mathcal {L}\) the characteristic series of a set \(\mathcal {L}\subseteq X^*\). By a theorem of Schützenberger [6, Corollary 9.2.6], any linearly bounded rational series \(a \in \mathbb {Z}\langle \langle X \rangle \rangle \) can be expressed as a \(\mathbb {Z}\)-linear combination of series of the form \(\mathbb {1}_\mathcal {L}\) and \(\mathbb {1}_\mathcal {L}\mathbb {1}_\mathcal {K}\) for rational languages \(\mathcal {L}\), \(\mathcal {K}\). This gives a more explicit description of the series appearing as exponents in (d) of Theorem 1.2.

In computer science, the question of whether a given weighted automaton is equivalent to a deterministic (sequential) one has received considerable attention; we mention the survey [15]. The question is of theoretical importance but also of practical relevance in natural language processing [4, 16, 17]. In this context, however, and in contrast to our setting, K is usually a tropical semiring. Nevertheless, one can also pose these questions for K a field, as is done for instance in [15, §5].

When K is a field, the linear hull (see Definition 3.6) allows a characterization of rational series recognized by a deterministic weighted automaton. See Sect. 10 for another characterization, using bounded variation, in the spirit of Mohri [16].

Theorem 1.3

Let K be a field and X a finite non-empty set. For a rational series \(S \in K\langle \langle X \rangle \rangle \), the following statements are equivalent.

  1. (a)

    S is recognized by a deterministic weighted automaton.

  2. (b)

    If \((u,\mu ,v)\) is a minimal linear representation of S, then its linear hull has dimension at most 1.

Rephrasing Theorem 1.3, a weighted automaton (with weights in a field K) is equivalent to a deterministic one if and only if (b) holds. We also obtain the following (already known) corollary.

Corollary 1.4

If \(S \in K\langle \langle X \rangle \rangle \) is a rational series whose coefficients take only finitely many values, then S is recognized by a deterministic weighted automaton. In particular, if K is a finite field, then every rational series over K is recognized by a deterministic weighted automaton.

1.1 Notation

Throughout the paper, let R be a (commutative) domain with quotient field K. Often we will be concerned only with the case where \(R=K\) is a field. Let X be a finite non-empty set. Let \(G \le K^\times \) be a finitely generated subgroup, and set \(G_0=G \cup \{0\}\). When considering Pólya series S, we will assume that G is such that \(G_0\) contains all coefficients of S.

1.2 Outline

The paper is organized as follows. In Sect. 2 we recall necessary background on rational series. In Sect. 3 we introduce a useful topology and the notion of a linear hull; we also make a first crucial reduction in Lemma 3.13 and obtain Corollary 1.4. In Sect. 4 we use unit equations to prove a key lemma, with the majority of the work dedicated to dealing with positive characteristic. In Sects. 3 and 4 we restrict to the case where \(R=K\) is a field. Now we can prove Theorem 1.2 over fields: in Sect. 5 we prove the hard direction (a)\(\,\Rightarrow \,\)(b). In Sect. 6 we show (b)\(\,\Leftrightarrow \,\)(c), and in Sect. 7 we show (c)\(\,\Rightarrow \,\)(d). The implications (c)\(\,\Rightarrow \,\)(e) and (e)\(\,\Rightarrow \,\)(a) are shown in Sect. 8. In Sect. 9 we put all these pieces together and extend the main implication (a)\(\,\Rightarrow \,\)(b) from fields to completely integrally closed domains, then prove Theorems 1.1 and 1.2. In Sect. 10 we conclude the proof of Theorem 1.3.

2 Preliminaries: rational series, linear representations, and weighted automata

We briefly recall the definitions of (noncommutative) rational series, linear representations, and weighted automata and how they relate to each other. We largely follow the notation and terminology from [6].

Let \(X^*\) denote the free monoid on the alphabet X. For a (noncommutative) formal power series \(S \in R\langle \langle X \rangle \rangle \) and a word \(w \in X^*\), we write S(w) for the coefficient of w, that is

$$\begin{aligned} S = \sum _{w \in X^*} S(w) w. \end{aligned}$$

The support of S is \({{\,\mathrm{supp}\,}}(S) = \{\, w \in X^* : S(w) \ne 0 \,\}\).

The ring of rational series in X is the smallest subring of the power series ring \(R\langle \langle X \rangle \rangle \) that contains the noncommutative polynomials \(R\langle X \rangle \) and is closed under addition, multiplication, and the partial operation

$$\begin{aligned} S \mapsto S^* :=(1-S)^{-1} = \sum _{n \ge 0} S^n \end{aligned}$$

whenever S has zero constant coefficient.

If \(X=\{x\}\) is a singleton, then \(R \langle \langle X \rangle \rangle = R\llbracket x \rrbracket \). One can easily check that \(S \in R\llbracket x \rrbracket \) is a rational series if and only if there exist polynomials \(P\), \(Q \in R[x]\) with \(Q(0) =1\) such that \(S=P/Q\) [6, Proposition 6.1.1]. In other words, S is rational if and only if it is the power series expansion, at the point 0, of a rational function not having a pole at 0. If \(R=K\) is a field, it is also well-known (and not hard to check) that this is the case if and only if the coefficients of S satisfy a linear recurrence relation. Equivalently, there exist vectors \(u \in K^{1 \times n}\), \(v \in K^{n \times 1}\), and a matrix \(A \in K^{n \times n}\) such that \(S(x^i) = u A^i v\) for every \(i \ge 0\).

A fundamental theorem of Schützenberger extends this description to multivariate noncommutative rational series. A linear representation of rank (dimension) n is a triple \((u,\mu ,v)\) where \(u\in R^{1 \times n}\) and \(v \in R^{n \times 1}\) are vectors, and \(\mu :X^* \rightarrow R^{n \times n}\) is a monoid homomorphism from the free monoid \(X^*\) to the multiplicative monoid of \(n \times n\)-matrices. Schützenberger showed that \(S \in R\langle \langle X \rangle \rangle \) is rational if and only if there exists a linear representation \((u,\mu ,v)\) such that \(S(w) = u \mu (w) v\) for every \(w \in X^*\); see [6, Theorem 1.7.1].
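The formula \(S(w) = u \mu (w) v\) is straightforward to evaluate once \(\mu \) is given on the letters of X. The sketch below is an illustration under our own assumptions: the function names and the rank-2 toy representation over \(\mathbb {Z}\) (whose series counts the occurrences of the letter a) are not taken from the paper.

```python
def mat_mul_row(row, M):
    """Row vector times square matrix."""
    n = len(M)
    return [sum(row[i] * M[i][j] for i in range(n)) for j in range(n)]

def evaluate(u, mu, v, word):
    """Coefficient S(word) = u * mu(word) * v, with mu given on letters.

    Since mu is a monoid homomorphism, mu(word) is the product of the
    matrices of the letters, taken in reading order.
    """
    row = list(u)
    for letter in word:
        row = mat_mul_row(row, mu[letter])
    return sum(r * c for r, c in zip(row, v))

# Hypothetical toy representation: S(w) = number of occurrences of 'a' in w.
u = [1, 0]
v = [0, 1]
mu = {"a": [[1, 1], [0, 1]], "b": [[1, 0], [0, 1]]}

counts = {w: evaluate(u, mu, v, w) for w in ["", "a", "aab", "abab", "bbb"]}
```

Updating a row vector letter by letter avoids forming the matrix \(\mu (w)\) explicitly; the intermediate rows are exactly the vectors \(u\mu (w')\) for the prefixes \(w'\) of w.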

Suppose \(R=K\) is a field. A linear representation is minimal if the dimension n is minimal among all possible linear representations of S. This is the case if and only if the span of \(u\mu (X^*) = \{\, u\mu (w) : w \in X^* \,\}\) is \(K^{1\times n}\) and the span of \(\mu (X^*)v\) is \(K^{n \times 1}\).

There is another, graph-theoretical, way to view linear representations over the domain R that will come in handy. A weighted (finite) automaton \(\mathcal {A}=(Q,I,E,T)\) over the alphabet X with weights in R consists of a finite set of states Q and three maps

$$\begin{aligned} I :Q \rightarrow R,\quad E :Q \times X \times Q \rightarrow R,\quad T :Q \rightarrow R. \end{aligned}$$

A triple \((p,x,q) \in Q \times X \times Q\) is an edge if \(E(p,x,q) \ne 0\). More specifically, we say that there is an edge from p to q labeled by x and with weight \(E(p,x,q)\). A state \(p \in Q\) is initial if \(I(p) \ne 0\) and terminal if \(T(p) \ne 0\).

A path is a sequence of edges

$$\begin{aligned} P = (p_0,x_1,p_1)(p_1,x_2,p_2)\cdots (p_{l-1},x_l,p_l). \end{aligned}$$

Its weight is \(E(P) = \prod _{i=1}^l E(p_{i-1},x_i,p_i)\) and its label is the word \(x_1 \cdots x_l \in X^*\). The path is accepting if \(p_0\) is an initial state and \(p_l\) is a terminal state. The automaton is trim if every state lies on an accepting path.

The series \(S \in R\langle \langle X \rangle \rangle \) is recognized by \(\mathcal {A}\) if

$$\begin{aligned} S(w) = \sum _{\begin{array}{c} p_0,p_1,\ldots ,p_l \in Q\\ w=x_1\cdots x_l,\, x_i \in X \end{array}} I(p_0)E(p_0,x_1,p_1) \cdots E(p_{l-1},x_l,p_l) T(p_l). \end{aligned} \tag{1}$$

Thus, the coefficient S(w) is obtained by summing the weights of all accepting paths labeled by the word w, with each path additionally weighted by the initial weight of its first state and the terminal weight of its last state. Two automata are equivalent if they recognize the same series. Obviously every weighted automaton is equivalent to a trim one.
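Formula (1) can be evaluated directly by brute force over all state sequences. The following sketch is only meant to make the definition concrete; the dictionary encoding of I, E, T and the two-state automaton over the alphabet \(\{a\}\) are assumptions made for this illustration.

```python
from itertools import product

def recognized_coefficient(states, I, E, T, word):
    """Evaluate formula (1): sum over all state sequences p_0, ..., p_l."""
    total = 0
    for path in product(states, repeat=len(word) + 1):
        # Initial weight of the first state times terminal weight of the last.
        weight = I.get(path[0], 0) * T.get(path[-1], 0)
        for i, x in enumerate(word):
            # Missing keys encode weight 0, i.e. the absence of an edge.
            weight *= E.get((path[i], x, path[i + 1]), 0)
        total += weight
    return total

# Hypothetical automaton: p --a/2--> p, p --a/1--> q, q --a/3--> q.
states = ["p", "q"]
I = {"p": 1}
T = {"q": 1}
E = {("p", "a", "p"): 2, ("p", "a", "q"): 1, ("q", "a", "q"): 3}

values = [recognized_coefficient(states, I, E, T, "a" * n) for n in range(4)]
```

For this automaton the word \(a^3\) labels three accepting paths, of weights 4, 6, and 9, so its coefficient is 19. The enumeration is exponential in the length of the word; the matrix formulation \(u\mu (w)v\) computes the same value far more efficiently.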

There is an easy correspondence between linear representations and weighted automata. Explicitly, the weighted automaton associated to a linear representation \((u,\mu ,v)\) is given by \(Q=[1,n]\), with \(I(k)=u_k\), with \(T(k)=v_k\), and \(E(k,x,l)=\mu (x)_{k,l}\); here the subscripts denote the corresponding coordinates of the vectors u and v, respectively the entries of the matrix \(\mu (x)\). Conversely, for a weighted automaton, one may without loss of generality assume \(Q=[1,n]\), and then the correspondence above yields a linear representation (a different labeling of the states gives a conjugate linear representation, corresponding to a permutation of the basis vectors). A series is recognized by the weighted finite automaton if and only if it is recognized by the associated linear representation. Hence series recognized by automata and series with linear representations are the same, and by Schützenberger’s Theorem coincide with rational series.

Definition 2.1

Let \(\mathcal {A}\) be a weighted automaton. Then \(\mathcal {A}\) is unambiguous if each \(w \in X^*\) labels at most one accepting path. It is deterministic (or sequential) if

  • there exists at most one initial state; and

  • for each \((p,x) \in Q \times X\), there exists at most one \(q \in Q\) with \(E(p,x,q) \ne 0\).

Note that for an unambiguous automaton, in the expression (1) for S(w), at most one summand is nonzero. Deterministic weighted automata are clearly unambiguous.
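The two notions of Definition 2.1 can be tested mechanically on small automata. In the sketch below, determinism is checked exactly from the definition, whereas unambiguity is only probed on all words up to a fixed length, which gives evidence rather than a proof. The function names and the length bound are our own choices; the sample data is the automaton associated to the linear representation of Example 3.8 below.

```python
from itertools import product

def is_deterministic(states, alphabet, I, E):
    """Check the two conditions of Definition 2.1 for determinism."""
    if sum(1 for p in states if I.get(p, 0) != 0) > 1:
        return False
    return all(
        sum(1 for q in states if E.get((p, x, q), 0) != 0) <= 1
        for p in states for x in alphabet
    )

def accepting_paths(states, I, E, T, word):
    """Number of accepting paths labeled by `word` (brute force)."""
    count = 0
    for path in product(states, repeat=len(word) + 1):
        if I.get(path[0], 0) == 0 or T.get(path[-1], 0) == 0:
            continue
        if all(E.get((path[i], x, path[i + 1]), 0) != 0
               for i, x in enumerate(word)):
            count += 1
    return count

# Automaton of Example 3.8: I(k) = u_k, T(k) = v_k, E(k, x, l) = mu(x)_{k,l}.
states = [1, 2, 3]
alphabet = "abc"
I = {1: 1, 2: 1}
T = {3: 1}
E = {(1, "a", 1): 2, (2, "a", 2): 3, (1, "b", 3): 1, (2, "c", 3): 1}

deterministic = is_deterministic(states, alphabet, I, E)
# Bounded check only: words of length at most 4.
max_paths = max(
    accepting_paths(states, I, E, T, "".join(w))
    for n in range(5) for w in product(alphabet, repeat=n)
)
```

This automaton has two initial states and is therefore not deterministic, while no word of length at most 4 labels more than one accepting path.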

Remark 2.2

For automata without weights (equivalently, weights in the Boolean semiring \(\mathcal {B}=\{0,1\}\) with \(1+1=1\)), it is well known that every automaton is equivalent to a deterministic one. This is no longer true for weighted automata; there exist unambiguous weighted automata that are not equivalent to deterministic ones, and there exist weighted automata that are not equivalent to unambiguous ones.

Definition 2.3

A rational series \(S \in R\langle \langle X \rangle \rangle \) is a Pólya series if there exists a finitely generated subgroup \(G \le K^\times \) of the quotient field K of R such that \(S(w) \in G_0=G \cup \{0\}\) for all \(w \in X^*\).

Let \(S\), \(T \in R\langle \langle X \rangle \rangle \) be two series with \(\mathcal {K}= {{\,\mathrm{supp}\,}}(S)\) and \(\mathcal {L}= {{\,\mathrm{supp}\,}}(T)\). The addition \(S+T\) is unambiguous if \(\mathcal {K}\cap \mathcal {L}= \emptyset \); the product ST is unambiguous if every \(w \in \mathcal {K}\mathcal {L}\) has a unique expression \(w=w_1 w_2\) with \(w_1 \in \mathcal {K}\) and \(w_2 \in \mathcal {L}\); and the star operation \(S^*\) is unambiguous if \(\mathcal {K}\) is a code, that is, every \(w \in \mathcal {K}^*\) has a unique expression \(w=w_1\cdots w_k\) with \(w_i \in \mathcal {K}\).
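For finite supports, these conditions can be checked by elementary means. The sketch below uses our own function names and sample sets. It tests unambiguity of a product exactly, and counts factorizations of a word over a finite set \(\mathcal {K}\) by dynamic programming; a word with two factorizations witnesses that \(\mathcal {K}\) is not a code.

```python
def product_is_unambiguous(K, L):
    """For finite sets K and L: is every word of KL uniquely a product k*l?"""
    seen = set()
    for k in K:
        for l in L:
            if k + l in seen:
                return False
            seen.add(k + l)
    return True

def count_factorizations(word, K):
    """Number of ways to write `word` as a concatenation of words from K."""
    ways = [0] * (len(word) + 1)
    ways[0] = 1  # the empty word has exactly one (empty) factorization
    for end in range(1, len(word) + 1):
        for k in K:
            # Skip the empty word, which can never belong to a code.
            if k and word[:end].endswith(k):
                ways[end] += ways[end - len(k)]
    return ways[len(word)]

# Illustrative sets (assumptions, not from the text).
ok_product = product_is_unambiguous({"a"}, {"b", "bb"})
bad_product = product_is_unambiguous({"a", "ab"}, {"b", "bb"})  # abb = a.bb = ab.b
witness = count_factorizations("aba", {"a", "ab", "ba"})        # a.ba and ab.a
```

Deciding in general whether a finite set is a code requires more than testing individual words (for instance, the Sardinas–Patterson algorithm); the counting function above only detects ambiguity on the words it is given.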

Definition 2.4

The set of unambiguous rational series is the smallest subset of \(R\langle \langle X \rangle \rangle \) that contains \(R\langle X \rangle \) and is closed under unambiguous addition, multiplication, and star operation.

Note that unambiguous operations are defined in such a way that every coefficient of the resulting series is a product of coefficients of the initial series. That is, one never forms a sum of two nonzero coefficients. We thus have the following.

Lemma 2.5

Every unambiguous rational series is a Pólya series.

3 The linear hull of a linear representation

In this section we consider only the case where \(R=K\) is a field. We introduce a topology and a related invariant of a linear representation that will be essential in the proof of the implication (a)\(\,\Rightarrow \,\)(b) of Theorem 1.2.

Definition 3.1

For a finite-dimensional vector space V, let \(\mathcal {F}(V)\) be the collection of all subsets \(Y \subseteq V\) of the form \(Y = V_1 \cup \cdots \cup V_l\) with \(l \in \mathbb {Z}_{\ge 0}\) and \(V_i \subseteq V\) vector subspaces.

Lemma 3.2

Every finite-dimensional vector space V has a noetherian topology for which \(\mathcal {F}(V)\) is the collection of closed sets.

Proof

Set \(\mathcal {F}=\mathcal {F}(V)\). Clearly \(V \in \mathcal {F}\) and \(\emptyset \in \mathcal {F}\) (with \(\emptyset \) represented by the empty union). By definition \(\mathcal {F}\) is closed under finite unions. To show that \(\mathcal {F}\) is the collection of closed sets of a topology, it remains to verify that \(\mathcal {F}\) is closed under intersections. Every \(Y \in \mathcal {F}\) is closed in the Zariski topology (identifying V with \({\mathbb {A}}_K^n\) for \(n=\dim _K V\)), which is noetherian. Hence, any intersection is equal to a finite subintersection. The claim follows since intersections distribute over unions, and intersections of vector subspaces are again vector subspaces. Since every \(Y \in \mathcal {F}\) is Zariski-closed, the topology is noetherian. \(\square \)

Definition 3.3

Let V be a finite-dimensional vector space. The linear Zariski topology on V is the topology whose collection of closed sets is \(\mathcal {F}(V)\).

If \(W \subseteq V\) is a vector subspace, then the subspace topology induced on W is the linear Zariski topology on W. All topological notions occurring in the remainder of the paper will refer to the linear Zariski topology. We mention [5, §II.4.1 and §II.4.2] and [23, Sections 004U and 0050] as references on irreducible and noetherian topological spaces.

A topological space X is irreducible if it is non-empty and \(X=Z_1 \cup Z_2\) with \(Z_1\), \(Z_2\) closed implies \(X=Z_1\) or \(X=Z_2\). A subset \(Z \subseteq X\) is an irreducible component if it is a maximal irreducible subspace. By \(\mathcal {Z}(X)\) we denote the set of all irreducible components of X. Then \(X = \bigcup _{Z \in \mathcal {Z}(X)} Z\).

Lemma 3.4

The closed irreducible subsets of a vector space V are exactly the

  1. (1)

    vector subspaces of V if K is infinite,

  2. (2)

    vector subspaces of V of dimension \(\le 1\) if K is finite.

Proof

If K is infinite, then a vector space cannot be expressed as a finite union of proper vector subspaces. It follows that the closed irreducible subsets of V are exactly the vector subspaces. If K is finite, we can write every nonzero vector subspace of V as a finite union of one-dimensional vector subspaces. \(\square \)
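The finite case of the proof can be confirmed by enumeration in small examples. The sketch below assumes that p is prime, so that arithmetic modulo p models the field \(\mathbb {F}_p\); the function name and the choice \(p=3\), \(n=2\) are ours. It lists the one-dimensional subspaces of \(\mathbb {F}_p^n\) and checks that their union is the whole space.

```python
from itertools import product

def lines_through_origin(p, n):
    """All one-dimensional subspaces of F_p^n (p prime), as sets of vectors."""
    lines = set()
    for v in product(range(p), repeat=n):
        if any(v):
            # The line spanned by v consists of all scalar multiples c * v.
            line = frozenset(tuple((c * x) % p for x in v) for c in range(p))
            lines.add(line)
    return lines

p, n = 3, 2
lines = lines_through_origin(p, n)
covered = set().union(*lines)
whole_space = set(product(range(p), repeat=n))
```

For \(p=3\) and \(n=2\) there are \((3^2-1)/(3-1) = 4\) lines, and together they cover all 9 vectors, so the plane \(\mathbb {F}_3^2\) is reducible in the linear Zariski topology.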

The dimension of a closed set is the maximal dimension of its irreducible components, with \(\dim \emptyset = -\infty \).

We recall the following basic properties, of which we will make use throughout.

Lemma 3.5

Let Y be a topological space.

  1. (1)

    If Y is irreducible, \(Y' \subsetneq Y\) is closed, and \(\Omega \subseteq Y\) is dense, then \(\overline{\Omega \setminus Y'} = Y\).

  2. (2)

    If \(Z \subseteq Y\) is irreducible, and \(f:Y \rightarrow Y'\) is continuous, then f(Z) is irreducible.

  3. (3)

    If \(Z \in \mathcal {Z}(Y)\) and \(f:Y \rightarrow Y'\) is continuous, then \(f(Z) \subseteq Z'\) for some \(Z' \in \mathcal {Z}(Y')\).

  4. (4)

    If Y is noetherian, then \(\mathcal {Z}(Y)\) is finite.

  5. (5)

    If \(\mathcal {Z}(Y)\) is finite, \(\Omega \subseteq Y\) is dense, and \(Z \in \mathcal {Z}(Y)\), then \(\overline{\Omega \cap Z} = Z\).

Proof

  1. (1)

    The set \(U = Y \setminus Y'\) is non-empty and open and therefore dense in Y by irreducibility [5, Proposition 1 of §II.4.1]. By basic topology, the intersection of a dense subset with an open subset is dense in the open subset. Thus \(\Omega \setminus Y' = \Omega \cap U\) is dense in U, and by transitivity, in Y.

  2. (2)

    [5, Proposition 4 of §II.4.1] or [23, Lemma 0379].

  3. (3)

    By (2), the subset f(Z) of \(Y'\) is irreducible. Every irreducible subset of \(Y'\) is a subset of an irreducible component by [5, Proposition 5 of §II.4.2] or [23, Lemma 004W].

  4. (4)

    [5, Proposition 10 of §II.4.2] or [23, Lemma 0052].

  5. (5)

    Let \(Z_1, \ldots , Z_m\) be the irreducible components of Y, with \(Z=Z_1\). Then \(Y = \bigcup _{i=1}^m Z_i\). Let \(U \subseteq Z_1\) be relatively open in \(Z_1\) and assume \(U \cap \Omega = \emptyset \). We have to show \(U = \emptyset \). Since irreducible components are closed ([5, Proposition 2 of §II.4.1] or [23, Lemma 004W]), the set \(Z_1 \setminus U\) is closed in Y. Therefore \(\Omega \subseteq (Z_1 \setminus U) \cup Z_2 \cup \cdots \cup Z_m\) implies \(Z_1 \subseteq {\overline{\Omega }} \subseteq (Z_1 \setminus U) \cup Z_2 \cup \cdots \cup Z_m\). The irreducibility of \(Z_1\), together with the incomparability of irreducible components, implies \(Z_1 \subseteq Z_1 \setminus U\), that is \(U=\emptyset \). \(\square \)

We can now define a key invariant associated to a linear representation.

Definition 3.6

Let \((u,\mu ,v)\) be a linear representation over the field K, and let

$$\begin{aligned} \Omega :=u \mu (X^*) = \{\, u \mu (w) : w \in X^* \,\} \end{aligned}$$

be the (left) reachability set. The closed set \({{\overline{\Omega }}}\) is the (left) linear hull of \((u,\mu ,v)\).

Before continuing, we illustrate the linear hull on two examples.

Example 3.7

Let \(K= \mathbb {Q}\), let \(X=\{a,b,c\}\), and define a linear representation \((u,\mu ,v)\) by \(u=(1,1,1)\), by \(v = (1,1,0)^T\), and by

$$\begin{aligned} \mu (a)&= \begin{pmatrix} 2 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad -2 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 3 \end{pmatrix},&\mu (b)&= \begin{pmatrix} 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1 \\ 1 &{}\quad 1 &{}\quad 0 \end{pmatrix},&\mu (c)&= \begin{pmatrix} 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 5 \end{pmatrix}. \end{aligned}$$

The corresponding automaton is depicted in Fig. 1. It is easy to see that \((u,\mu ,v)\) is minimal. We claim \({\overline{\Omega }} = \langle e_1+ e_2, e_3 \rangle \cup \langle e_1-e_2, e_3 \rangle \). The inclusion \(\subseteq \) follows from \(u \in \langle e_1+ e_2, e_3 \rangle \) and the fact that \(\langle e_1+ e_2, e_3 \rangle \cup \langle e_1-e_2, e_3 \rangle \) is closed under right action by \(\mu (x)\) for all \(x \in \{a,b,c\}\). For the inclusion \(\supseteq \), observe \(u\mu (a)^n = (2^n, (-1)^n2^n, 3^n)\). Therefore \(\{\, u\mu (a)^{2n} : n \ge 0\,\} \subseteq \Omega \) is dense in \(\langle e_1+e_2, e_3 \rangle \) while \(\{\, u\mu (a)^{2n+1} : n \ge 0 \,\} \subseteq \Omega \) is dense in \(\langle e_1-e_2, e_3 \rangle \).

One could check that all nonzero coefficients of the series \(S = 2 + 2b + 8a^2 + 10cb + 6ab + \cdots \) recognized by \((u,\mu ,v)\) are, up to sign, of the form \(2^e3^f5^g\) with \(e\), \(f\), \(g \ge 0\) (with the matrices as given above, a negative coefficient occurs for instance at \(S(abb) = -4\)), and that S is therefore a Pólya series. However, since we will construct an unambiguous linear representation recognizing the same series, we will see that S is Pólya a posteriori.
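The data of Example 3.7 is small enough to verify by direct computation. The sketch below uses our own helper names and a bound on the word length chosen for convenience. It recomputes the five coefficients listed above and checks that every reachable vector \(u\mu (w)\) with \({|}w{|} \le 5\) lies in \(\langle e_1+ e_2, e_3 \rangle \cup \langle e_1-e_2, e_3 \rangle \).

```python
from itertools import product

def act(row, M):
    """Right action of a 3x3 matrix on a row vector."""
    return [sum(row[i] * M[i][j] for i in range(3)) for j in range(3)]

# Linear representation of Example 3.7.
u = [1, 1, 1]
v = [1, 1, 0]
mu = {
    "a": [[2, 0, 0], [0, -2, 0], [0, 0, 3]],
    "b": [[0, 0, 0], [0, 0, 1], [1, 1, 0]],
    "c": [[0, 0, 0], [0, 0, 0], [0, 0, 5]],
}

def reach(word):
    """The reachable vector u * mu(word)."""
    row = u
    for x in word:
        row = act(row, mu[x])
    return row

def coefficient(word):
    """S(word) = u * mu(word) * v."""
    return sum(r * c for r, c in zip(reach(word), v))

listed = {w: coefficient(w) for w in ["", "b", "aa", "cb", "ab"]}

def in_claimed_hull(row):
    # <e1+e2, e3> consists of the vectors (x, x, z); <e1-e2, e3> of (x, -x, z).
    return row[0] == row[1] or row[0] == -row[1]

all_in_hull = all(
    in_claimed_hull(reach(w))
    for n in range(6) for w in product("abc", repeat=n)
)
```

Such a finite check only supports the inclusion \(\Omega \subseteq \langle e_1+ e_2, e_3 \rangle \cup \langle e_1-e_2, e_3 \rangle \) on short words; the density statements of the example are what give equality of the closure.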

Fig. 1

Left: Example 3.7. A weighted automaton recognizing a Pólya series. The automaton is minimal but ambiguous. A non-minimal unambiguous automaton on four states recognizes the same series (see Example 5.4 and Fig. 2). Right: Example 3.8. A weighted automaton that is not determinizable and has a two-dimensional left linear hull. The same series can be recognized by a co-deterministic weighted automaton (that is, deterministic when reading words from right to left). Correspondingly, the right linear hull is one-dimensional.

Example 3.8

Let \(K= \mathbb {Q}\), let \(X=\{a,b,c\}\), and define a minimal linear representation \((u,\mu ,v)\) by \(u=(1,1,0)\), \(v = (0,0,1)^T\), and

$$\begin{aligned} \mu (a)&= \begin{pmatrix} 2 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 3 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \end{pmatrix},&\mu (b)&= \begin{pmatrix} 0 &{}\quad 0 &{}\quad 1 \\ 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \end{pmatrix},&\mu (c)&= \begin{pmatrix} 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1 \\ 0 &{}\quad 0 &{}\quad 0 \end{pmatrix}. \end{aligned}$$

The automaton is depicted in Fig. 1. Here the (left) linear hull is \({\overline{\Omega }}=\overline{u \mu (X^*)} = \langle e_1,e_2 \rangle \cup \langle e_3 \rangle \). The dually defined right linear hull is \(\overline{\mu (X^*) v} = \langle e_1 \rangle \cup \langle e_2 \rangle \cup \langle e_3 \rangle \). Observe that neither the dimension nor the number of components of the left and right linear hull coincide.

Let S be the series recognized by \((u,\mu ,v)\). Then \(S(a^nb) = 2^n\) and \(S(a^nc) = 3^n\) for every \(n \ge 0\) and \(S(w) = 0\) for every other word w. By our convention, automata read words left to right. In view of Theorem 1.3, and its natural dual for reading words right to left, the asymmetry in the linear hull reflects that S can be recognized by a deterministic automaton when reading words right to left, but not when reading words left to right.
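The asymmetry between the left and right linear hulls in Example 3.8 is visible already on short words. The following sketch (the helper names and the length bound 4 are our own) computes the vectors \(u\mu (w)\) and \(\mu (w)v\) and checks that they lie in the closed sets stated above.

```python
from itertools import product

# Linear representation of Example 3.8.
u = [1, 1, 0]
v = [0, 0, 1]
mu = {
    "a": [[2, 0, 0], [0, 3, 0], [0, 0, 0]],
    "b": [[0, 0, 1], [0, 0, 0], [0, 0, 0]],
    "c": [[0, 0, 0], [0, 0, 1], [0, 0, 0]],
}

def left_vector(word):
    """u * mu(word), reading the word from left to right."""
    row = u
    for x in word:
        row = [sum(row[i] * mu[x][i][j] for i in range(3)) for j in range(3)]
    return row

def right_vector(word):
    """mu(word) * v, applying the letters from right to left."""
    col = v
    for x in reversed(word):
        col = [sum(mu[x][i][j] * col[j] for j in range(3)) for i in range(3)]
    return col

def S(word):
    return sum(r * c for r, c in zip(left_vector(word), v))

words = ["".join(w) for n in range(5) for w in product("abc", repeat=n)]
# Left hull <e1, e2> u <e3>: third coordinate zero, or first two zero.
left_ok = all(r[2] == 0 or (r[0] == 0 and r[1] == 0)
              for r in map(left_vector, words))
# Right hull <e1> u <e2> u <e3>: at most one nonzero coordinate.
right_ok = all(sum(1 for c in right_vector(w) if c != 0) <= 1 for w in words)
```

The vectors \(u\mu (a^n) = (2^n, 3^n, 0)\) span a plane, whereas every vector \(\mu (w)v\) is a multiple of a single basis vector, in line with the right linear hull being one-dimensional.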

Our next goal is to show that we can change the linear representation in such a way that the irreducible components of the linear hull, which are vector subspaces, form a direct sum (Lemma 3.13). We do this by forming the external direct sum of the irreducible components of the linear hull, thereby passing to a linear representation of possibly larger dimension. For instance, in Example 3.7, the linear hull is a union of two planes in affine 3-space, necessarily intersecting in a line, while the new linear representation will be defined in affine 4-space and have a linear hull consisting of two 2-dimensional planes (intersecting only in the origin). We will moreover do this in such a way that, if S is a Pólya series, then all vectors in \(\Omega \) have their coordinates in \(G_0\).

Let \(Y \subseteq V\) be a closed subset of a vector space V, and \(\varphi \in {{\,\mathrm{End}\,}}_K(V)\) with \(\varphi (Y) \subseteq Y\). Then \(\varphi \) is continuous in our topology. Thus, if Z is an irreducible component of Y, there exists an irreducible component \(Z'\) of Y such that \(\varphi (Z) \subseteq Z'\) (see (3) of Lemma 3.5). In particular, there exists a map \(f:\mathcal {Z}(Y) \rightarrow \mathcal {Z}(Y)\) such that \(\varphi (Z) \subseteq f(Z)\) for all \(Z \in \mathcal {Z}(Y)\). In general, there are several possible choices for this map if \(\varphi (Z)\) lies in an intersection of multiple irreducible components.

Definition 3.9

Let V be a finite-dimensional vector space and \(Y \subseteq V\) a closed set. We define a vector space \({{\widehat{Y}}}\) together with a homomorphism \(\sigma :{\widehat{Y}} \rightarrow V\) by

$$\begin{aligned} {\widehat{Y}} = \bigoplus _{Z \in \mathcal {Z}(Y)} Z \end{aligned}$$

and \(\sigma = \sum _{Z \in \mathcal {Z}(Y)} \pi _Z\), where \(\pi _Z :{\widehat{Y}} \rightarrow Z \subseteq V\) denotes the canonical projection.

For \(Z \in \mathcal {Z}(Y)\), let \(\varepsilon _Z :Z \rightarrow {\widehat{Y}}\) denote the canonical embedding. For \(\varphi \in {{\,\mathrm{End}\,}}_K(V)\) with \(\varphi (Y) \subseteq Y\) and a map \(f:\mathcal {Z}(Y) \rightarrow \mathcal {Z}(Y)\) with \(\varphi (Z) \subseteq f(Z)\) for all \(Z \in \mathcal {Z}(Y)\), let

$$\begin{aligned} {{\widehat{\varphi }}} = {}_f {{\widehat{\varphi }}} :{\widehat{Y}} \rightarrow {\widehat{Y}} \quad \text {be defined by}\quad {{\widehat{\varphi }}} \circ \varepsilon _Z = \varepsilon _{f(Z)} \circ \varphi |_Z. \end{aligned}$$

The definition of \({{\widehat{\varphi }}}\) strongly depends on the choice of f. However, as we will see in a moment, \(\sigma \circ {{\widehat{\varphi }}}\) does not depend on f. Since we will ultimately be interested in this composition, the particular choice of f will not matter, and we suppress f in the notation when this does not cause confusion.

Lemma 3.10

Let all notation be as in Definition 3.9.

  1. (1)

    For \(Z \in \mathcal {Z}(Y)\) and \(z \in Z\), we have \(\varphi (z) = \sigma \circ {{\widehat{\varphi }}} \circ \varepsilon _Z(z)\).

  2. (2)

    If \(\varphi \), \(\psi \in {{\,\mathrm{End}\,}}_K(V)\) with \(\varphi (Y) \subseteq Y\) and \(\psi (Y) \subseteq Y\), then

    $$\begin{aligned} \sigma \circ {}_h\widehat{(\psi \circ \varphi )} = \sigma \circ {}_g {{\widehat{\psi }}} \circ {}_f {{\widehat{\varphi }}} \end{aligned}$$

    for any choice of \(f\), \(g\), \(h :\mathcal {Z}(Y) \rightarrow \mathcal {Z}(Y)\) with \(\varphi (Z) \subseteq f(Z)\), \(\psi (Z) \subseteq g(Z)\), and \(\psi \circ \varphi (Z) \subseteq h(Z)\) for all \(Z \in \mathcal {Z}(Y)\).

Proof

  1. (1)

    We have

    $$\begin{aligned} \begin{aligned} \sigma \circ {{\widehat{\varphi }}} \circ \varepsilon _Z(z)&= \sigma \circ \varepsilon _{f(Z)} \circ \varphi (z) = \sum _{Z' \in \mathcal {Z}(Y)} \pi _{Z'} \circ \varepsilon _{f(Z)} \circ \varphi (z)\\&= \pi _{f(Z)} \circ \varepsilon _{f(Z)} \circ \varphi (z) = \varphi (z). \end{aligned} \end{aligned}$$
  2. (2)

    Let \(Z \in \mathcal {Z}(Y)\) and \(z \in Z\). Then, by applying (1), we have

    $$\begin{aligned} \sigma \circ {}_h\widehat{(\psi \circ \varphi )} \circ \varepsilon _Z(z) = \psi \circ \varphi (z). \end{aligned}$$

    On the other hand,

    $$\begin{aligned} \sigma \circ {}_g {{\widehat{\psi }}} \circ {}_f {{\widehat{\varphi }}} \circ \varepsilon _Z(z) = \sigma \circ {}_g {{\widehat{\psi }}} \circ \varepsilon _{f(Z)} \circ \varphi (z) = \psi (\varphi (z)), \end{aligned}$$

    where we have again applied (1) in the second equality. \(\square \)

Lemma 3.11

Let \(S \in K\langle \langle X \rangle \rangle \) be a rational series and \(\Gamma = \{\, S(w) : w \in X^* \,\} \subseteq K\). Then S has a minimal linear representation \((u, \mu , v)\) with \(u\mu (w) \in \Gamma ^{1\times n}\) for all \(w \in X^*\). If \(n\ge 1\), we can take \(v=e_1\).

Proof

Let \((u',\mu ',v')\) be a minimal linear representation of S. By minimality we have \(\langle \mu '(w)v' : w \in X^* \rangle _K = K^{n \times 1}\). Let \(w_1\), \(\ldots \,\)\(w_n \in X^*\) be such that the vectors \(b_i = \mu '(w_i)v'\) for \(i \in [1,n]\) form a basis of \(K^{n \times 1}\). If \(n\ge 1\), then \(v' \ne 0\) by minimality, and we may take \(w_1=1\) (the empty word), ensuring \(b_1 = v'\). Let \(B \in K^{n\times n}\) be the matrix whose i-th column is \(b_i\), and let \(\mu :X^* \rightarrow K^{n\times n}\) be defined by \(\mu (w) = B^{-1}\mu '(w)B\). Let \(u=u'B\) and \(v=B^{-1}v'\). Then \((u,\mu ,v)\) is a minimal linear representation of S.

Let \(w \in X^*\) and \(i \in [1,n]\). Then \(u \mu (w) = u'\mu '(w)B\). The i-th coordinate of this vector is

$$\begin{aligned} u'\mu '(w) b_i = u'\mu '(ww_i)v' = S(ww_i) \in \Gamma . \end{aligned}$$

By construction, we also have \(v=B^{-1}v'=e_1\) if \(n \ge 1\). \(\square \)
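
The change of basis in this proof can be traced on a toy example. The following sketch is only an illustration under assumptions that are not part of the text: a one-letter alphabet \(X=\{a\}\), the field \(K=\mathbb {Q}\), and the series with \(S(a^k)=2^k+3^k\), given by a diagonal representation. The helper names `matmul`, `inverse`, and `row_after` are ours. The sketch conjugates by B and lets one check that the rows \(u\mu (a^k)\) consist of coefficients of S and that \(v=e_1\).

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Inverse of an invertible square matrix via Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Assumed toy representation (u', mu', v') of S(a^k) = 2^k + 3^k.
u1 = [[1, 1]]
mu1 = [[2, 0], [0, 3]]
v1 = [[1], [1]]

# Columns b_1 = mu'(w_1) v' with w_1 the empty word, and b_2 = mu'(a) v'.
b1, b2 = v1, matmul(mu1, v1)
B = [[b1[i][0], b2[i][0]] for i in range(2)]
B_inv = inverse(B)

u = matmul(u1, B)
mu = matmul(matmul(B_inv, mu1), B)
v = matmul(B_inv, v1)

def S(k):
    """Coefficient of a^k in the assumed toy series."""
    return 2**k + 3**k

def row_after(k):
    """The row vector u * mu(a^k)."""
    row = u
    for _ in range(k):
        row = matmul(row, mu)
    return row[0]
```

In this example \(u\mu (a^k) = (S(a^k), S(a^{k+1}))\), in line with the computation of the i-th coordinate in the proof.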

Lemma 3.12

Let V be a vector space with basis \(e_1\), \(\ldots \,\)\(e_n\). Let \(\Gamma \subseteq K\). For every vector subspace \(W \subseteq V\), there exists a basis \(f_1\), \(\ldots \,\)\(f_m\) of W such that

$$\begin{aligned} (\Gamma e_1 + \cdots + \Gamma e_n) \cap W \subseteq \Gamma f_1 + \cdots + \Gamma f_m. \end{aligned}$$

Proof

Using Gaussian elimination and possibly renumbering the basis elements of V, we can find a basis \(f_1\), \(\ldots \,\)\(f_m\) of W with \(e_i^*(f_j) = \delta _{i,j}\) for i, \(j \in [1,m]\) (that is, a basis in reduced row echelon form).

Let

$$\begin{aligned} w=\alpha _1 e_1 + \cdots + \alpha _n e_n = \beta _1 f_1 + \cdots + \beta _m f_m \end{aligned}$$

with \(\alpha _1\), \(\ldots \,\)\(\alpha _n \in \Gamma \) and \(\beta _1\), \(\ldots \,\)\(\beta _m \in K\). Applying \(e_i^*\) to the equation shows \(\beta _i = \alpha _i \in \Gamma \) for \(i \in [1,m]\). \(\square \)
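
The basis used in this proof can be computed by row reduction. The sketch below is a minimal illustration over \(K=\mathbb {Q}\) with exact fractions; the function names `rref_basis` and `coefficients` and the sample subspace are our assumptions. Instead of renumbering the basis of V, it records the pivot columns \(c_1< \cdots < c_m\), so that \(e_{c_i}^*(f_j) = \delta _{i,j}\).

```python
from fractions import Fraction

def rref_basis(rows):
    """Return (basis, pivots) for the row space of `rows`.

    The basis is in reduced row echelon form: basis[j] has entry 1 in
    column pivots[j] and entry 0 in every other pivot column.
    """
    M = [[Fraction(x) for x in r] for r in rows]
    pivots, rank = [], 0
    for c in range(len(M[0])):
        p = next((r for r in range(rank, len(M)) if M[r][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        M[rank], M[p] = M[p], M[rank]
        pivot = M[rank][c]
        M[rank] = [x / pivot for x in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[rank])]
        pivots.append(c)
        rank += 1
    return M[:rank], pivots

def coefficients(w, pivots):
    """Coefficients of w (assumed to lie in W) in the reduced basis:
    they are simply the pivot coordinates of w."""
    return [w[c] for c in pivots]

# Assumed sample subspace W of Q^3, spanned by two vectors.
basis, pivots = rref_basis([[1, 2, 3], [2, 4, 7]])
```

Since the coefficients of \(w \in W\) in this basis are read off from coordinates of w, they lie in \(\Gamma \) whenever all coordinates of w do.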

Lemma 3.13

Let \((u,\mu ,v)\) be a linear representation representing a rational series \(S \in K\langle \langle X \rangle \rangle \) and \(\Omega = u \mu (X^*)\). Let \(\Gamma \subseteq K\) be such that \(\Omega \subseteq \Gamma ^{1\times n}\) and \(\Omega v \subseteq \Gamma \). Let \(\mathcal {Z}({{\overline{\Omega }}}) = \{ W_1, \ldots , W_k \}\), let \(m_i= \dim W_i\), and let \(m=m_1+\cdots +m_k\).

For \(i \in [1,k]\), let \(W_i' \subseteq K^{1 \times m}\) be the \(m_i\)-dimensional subspace spanned by the standard basis vectors \(e_{j} \in K^{1\times m}\) with \(j \in [m_1 + \cdots + m_{i-1}+1, m_1+\cdots +m_i]\), so that in particular

$$\begin{aligned} K^{1 \times m} = W_1' \oplus \cdots \oplus W_k'. \end{aligned}$$

Then there exists an m-dimensional linear representation \((u',\mu ',v')\) of S with

$$\begin{aligned} \mathcal {Z}({{\overline{\Omega }}}') = \{ W_1', \ldots , W_k' \} \end{aligned}$$

where \(\Omega ' = u' \mu '(X^*)\). Moreover \(\Omega ' \subseteq \Gamma ^{1 \times m}\) and \(\Omega 'v' \subseteq \Gamma \).

Proof

If \(n=0\), then \((u',\mu ',v')=(u,\mu ,v)\) trivially has the desired properties. To avoid this degenerate case, from now on assume \(n \ge 1\).

For each \(W_i\) choose a basis \(f_{(i,1)}\), \(\ldots \,\)\(f_{(i,m_i)}\) as in Lemma 3.12. Denote by \(e_1\), \(\ldots \,\)\(e_n\) the standard basis of \(K^{1\times n}\). For \(x \in X\) let \(\varphi _x :K^{1\times n} \rightarrow K^{1 \times n}\) denote the homomorphism that is represented by \(\mu (x)\), and let \(\psi :K^{1\times n} \rightarrow K\) denote the homomorphism represented by v. That is, \(u \mu (x_{1} \cdots x_{l}) v = \psi \circ \varphi _{x_l} \circ \cdots \circ \varphi _{x_1}(u)\) for \(x_1\), \(\ldots \,\)\(x_l \in X\).

For each \(x \in X\), the homomorphism \(\varphi _x\) is continuous and closed and \(\varphi _x(\Omega ) \subseteq \Omega \). Hence also \(\varphi _x({{\overline{\Omega }}}) \subseteq {{\overline{\Omega }}}\). Without restriction \(u \in W_1\). Let \(Y={{\overline{\Omega }}}\). We denote by \(\varepsilon _i :W_i \rightarrow {\widehat{Y}}\) the canonical embedding. By Lemma 3.10,

$$\begin{aligned} \psi \circ \varphi _{x_l} \circ \cdots \circ \varphi _{x_1}(u) = \psi \circ \sigma \circ {{\widehat{\varphi }}}_{x_l} \circ \cdots \circ {{\widehat{\varphi }}}_{x_1} \circ \varepsilon _1(u). \end{aligned}$$
(2)

Set \(Q = \{\, (i,j) : i \in [1,k],\, j \in [1,m_i] \,\}\). Then the family \((\varepsilon _i(f_{(i,j)}))_{(i,j)\in Q}\) is a basis of \({\widehat{Y}}\). With respect to this choice of basis, let \(u' \in K^{1\times Q}\) represent \(\varepsilon _1(u)\), let \(v' \in K^{Q \times 1}\) represent \(\psi \circ \sigma :K^{1 \times Q} \rightarrow K\), and let \(A'_x \in K^{Q \times Q}\) represent \({{\widehat{\varphi }}}_x :K^{1 \times Q} \rightarrow K^{1 \times Q}\). Setting \(\mu '(x)=A'_x\), the tuple \((u', \mu ', v')\) is a linear representation of S by Eq. (2).

Clearly \(\Omega ' \subseteq \varepsilon _1(W_1) \cup \cdots \cup \varepsilon _k(W_k)\), and we show that the right side is the decomposition of \(\overline{\Omega '}\) into irreducible components. The sets \(\varepsilon _i(W_i)\) are irreducible closed subsets of \({\widehat{Y}}\), and \(\varepsilon _i(W_i) \cap \varepsilon _j(W_j) = 0\) if \(i \ne j\). Let \(i \in [1,k]\). The set

$$\begin{aligned} \Omega _i = (\Omega \cap W_i) \setminus \bigcup \{\, W_j : j \in [1,k],\, i \ne j \,\} \end{aligned}$$

is dense in \(W_i\). If \(w \in X^*\) with \(u \mu (w) \in \Omega _i\), then necessarily \(u'\mu '(w) \in \varepsilon _i(W_i) \cap \Omega '\). Hence \(\varepsilon _i(W_i) \cap \Omega '\) is dense in \(\varepsilon _i(W_i)\).

Finally, by choice of the \(f_{(i,j)}\), we have

$$\begin{aligned} \Omega \cap W_i \subseteq (\Gamma e_1 + \cdots + \Gamma e_n) \cap W_i \subseteq \Gamma f_{(i,1)} + \cdots + \Gamma f_{(i,m_i)}. \end{aligned}$$

Thus \(u'\mu '(w) \in \Gamma ^{1 \times Q}\) for all \(w \in X^*\). If \(w \in X^*\) with \(u'\mu '(w) \in \Omega '\), then \(u\mu (w) \in \Omega \) and \(u'\mu '(w)v' = u\mu (w)v \in \Gamma \). \(\square \)

We are now in a position to deal with the (relatively easy) case in which the linear hull has dimension at most 1. As an immediate corollary we obtain the direction (a)\(\,\Rightarrow \,\)(b) of Theorem 1.2 in the special case where S has only finitely many distinct coefficients.

Proposition 3.14

Let \(S \in K\langle \langle X \rangle \rangle \) be a rational series with a linear representation \((u,\mu ,v)\) whose linear hull has dimension \(\le 1\). Then S is recognized by a deterministic weighted automaton.

Proof

Replacing the representation by one as in Lemma 3.13, we can assume that the spaces in \(\mathcal {Z}({{\overline{\Omega }}})\) form a direct sum. Thus, if \(W \in \mathcal {Z}({{\overline{\Omega }}})\), \(a \in W\), and \(x \in X\) with \(0 \ne a \mu (x)\), then there exists a unique \(W' \in \mathcal {Z}({{\overline{\Omega }}})\) with \(a \mu (x) \in W'\). This means that each row of \(\mu (x)\) contains at most one non-zero entry. Hence the weighted automaton associated to this linear representation is deterministic. \(\square \)
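
The determinism criterion can be tested mechanically. With row vectors acting as in \(u\mu (w)v\), the edges leaving a state p on the letter x correspond to the non-zero entries of row p of \(\mu (x)\). The following sketch checks only this transition condition (it does not inspect u or v); the function names and the two sample families of matrices are assumptions made for illustration.

```python
def is_row_monomial(matrix):
    """True if every row of the matrix has at most one non-zero entry."""
    return all(sum(1 for entry in row if entry != 0) <= 1 for row in matrix)

def has_deterministic_transitions(mu):
    """mu maps each letter to a square matrix given as a list of rows."""
    return all(is_row_monomial(A) for A in mu.values())

# Assumed sample data: the first family is row-monomial, the second is not.
mu_det = {"a": [[0, 2], [3, 0]], "b": [[5, 0], [0, 0]]}
mu_nondet = {"a": [[1, 1], [0, 1]]}
```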

Proof of Corollary 1.4

Choose a representation as in Lemma 3.11 with \(\Gamma = \{\, S(w): w \in X^* \, \}\) being finite. Then \(\Omega = u \mu (X^*) \subseteq \Gamma ^{1 \times n}\) is a finite set, and hence \({{\overline{\Omega }}}\) is a finite union of vector spaces of dimension \(\le 1\). Thus \(\dim {{\overline{\Omega }}} \le 1\), and Proposition 3.14 implies the claim. \(\square \)

Remark 3.15

Corollary 1.4 was known before. If G is finite, then S is a finite linear combination of characteristic series of rational languages [6, Corollary 3.2.6]. Since each of these characteristic series can be recognized by a deterministic automaton, the result follows [21, §6]. However, the proof we give here is more in line with the one for our general result.

4 An important lemma

In this section we again consider only the case where \(R=K\) is a field. We are now ready to prove a key lemma in characteristic 0. Its proof depends on unit equations. As a consequence, a variant for positive characteristic is more complicated and will follow at the end of the section.

We recall the fundamental finiteness result on unit equations in characteristic 0 that we will be using. For number fields it was proved independently by Evertse [11] and van der Poorten–Schlickewei [24]; the extension to arbitrary fields appears in [25]. We refer to [9, Chapter 6] or [3, Theorem 7.4.1] for more details.

Proposition 4.1

(Evertse; van der Poorten–Schlickewei). Suppose \({{\,\mathrm{char}\,}}K = 0\). Let \(m \ge 2\), and \(a_1\), \(\ldots \,\)\(a_m \in K^\times \). Then there exist only finitely many projective points \((x_1: \cdots : x_m)\) with coordinates \(x_1\), \(\ldots \,\)\(x_m \in G\) such that

$$\begin{aligned} a_1 x_1 + \cdots + a_m x_m = 0 \end{aligned}$$
(3)

and \(\sum _{i \in I} a_i x_i \ne 0\) for any non-empty, proper subset I of [1, m].

A solution \((x_1,\ldots ,x_m)\) of (3) with \(\sum _{i\in I} a_i x_i \ne 0\) for every \(\emptyset \ne I \subsetneq [1,m]\) is called non-degenerate. So, counted as projective points, there are only finitely many non-degenerate solutions with coordinates in G. It is easily seen that there can be infinitely many degenerate solutions (even when considered as projective points), but by definition, the affine coordinates of the degenerate solutions lie in a finite union of proper vector subspaces.
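
To make the distinction concrete, the following sketch tests non-degeneracy by brute force over all non-empty proper subsums. It is an illustration only: the function names are ours, and the sample data assume \(K=\mathbb {Q}\) and \(G=\langle -1, 2 \rangle \). The family \((1 : -1 : 2^k : -2^k)\) gives infinitely many pairwise distinct projective solutions of \(x_1+x_2+x_3+x_4=0\), all of them degenerate.

```python
from fractions import Fraction
from itertools import combinations

def is_solution(a, x):
    """True if a_1 x_1 + ... + a_m x_m = 0."""
    return sum(Fraction(ai) * Fraction(xi) for ai, xi in zip(a, x)) == 0

def is_nondegenerate(a, x):
    """True if x is a solution and no non-empty proper subsum vanishes."""
    if not is_solution(a, x):
        return False
    terms = [Fraction(ai) * Fraction(xi) for ai, xi in zip(a, x)]
    return all(sum(sub) != 0
               for size in range(1, len(terms))
               for sub in combinations(terms, size))

# Assumed sample data: one non-degenerate solution and a degenerate family.
nondegenerate_example = ([1, 1, 1], [1, 1, -2])
degenerate_family = [([1, 1, 1, 1], [1, -1, 2**k, -(2**k)]) for k in range(1, 6)]
```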

Lemma 4.2

Suppose \({{\,\mathrm{char}\,}}K=0\). Let V be a vector space with basis \(e_1\), \(\ldots \,\)\(e_n\). Suppose that \(\Omega \subseteq G_0e_1 + \cdots + G_0e_n\) is a dense subset of V. Then, for all \(\varphi \in {{\,\mathrm{Hom}\,}}_K(V,K)\) with \(\varphi (\Omega ) \subseteq G_0\), there exists at most one \(i \in [1,n]\) with \(\varphi (e_i) \ne 0\).

Proof

The claim is trivial for \(n=1\). Suppose \(n \ge 2\). With respect to the basis \((e_1,\ldots ,e_n)\), the homomorphism \(\varphi \) is represented by \((\alpha _1, \ldots , \alpha _n) \in K^{1\times n}\) with \(\alpha _i=\varphi (e_i)\). Enlarging G if necessary, we may assume \(\alpha _1\), \(\ldots \,\)\(\alpha _n \in G_0\).

Let \(I = \{\, i \in [1,n] : \alpha _i \ne 0 \,\}\). We must show \({|}I{|} \le 1\). Suppose to the contrary that \({|}I{|} \ge 2\). After renumbering the basis vectors if necessary, we may assume \(I = [1,m]\) for some \(m \ge 2\).

For \(\emptyset \ne J \subseteq I\) let

$$\begin{aligned} V_J = \big \{\, \lambda _1 e_1 + \cdots + \lambda _n e_n \in V : \sum _{j \in J} \alpha _j \lambda _j = 0 \,\big \}. \end{aligned}$$

By choice of I, each \(V_J\) is a proper subspace of V. Set \(Y = \bigcup _{\emptyset \ne J \subseteq I} V_J\). Since K is an infinite field, a vector space cannot be covered by a finite union of proper subspaces. Thus \(Y \subsetneq V\). Hence \(\Omega ' = \Omega \setminus Y\) is dense in V by Lemma 3.5.

If \(v = \lambda _1 e_1 + \cdots + \lambda _n e_n \in \Omega '\), then

$$\begin{aligned} \sum _{i \in I} \alpha _i \lambda _i = g, \end{aligned}$$
(4)

for some g in \(G_0\) by assumption on \(\varphi \). If \(\emptyset \ne J \subseteq I\), then

$$\begin{aligned} \sum _{j \in J} \alpha _j \lambda _j \ne 0, \end{aligned}$$

since \(v \not \in V_J\). Thus \((\alpha _1 \lambda _1, \ldots , \alpha _m \lambda _m, -g)\) is a non-degenerate solution of the unit equation \(X_1 + \cdots + X_{m+1} = 0\).

Hence there exists a finite subset \(M \subseteq {\mathbb {P}}^{m-1}(K)\) with \((\alpha _1 \lambda _1 :\cdots :\alpha _m \lambda _m) \in M\) for all \(v=\lambda _1 e_1 + \cdots + \lambda _n e_n \in \Omega '\). In particular, since \(m \ge 2\), we see that \(\lambda _1/\lambda _2\) can take only finitely many values. Thus \(\Omega '\) can be covered by finitely many proper vector subspaces of V, in contradiction to \(\overline{ \Omega '}=V\). \(\square \)
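
A small numerical experiment illustrates the lemma. As assumptions made only for this illustration, take \(K=\mathbb {Q}\), \(G=\langle 2 \rangle \), \(n=2\), \(\Omega = \{\, 2^a e_1 + 2^b e_2 : a, b \in \mathbb {Z} \,\}\), and \(\varphi = e_1^* + e_2^*\). The ratios of the two coordinates take infinitely many values, so \(\Omega \) is not contained in any finite union of proper subspaces of V. Since \(\varphi \) has two non-zero coefficients, the lemma predicts \(\varphi (\Omega ) \not \subseteq G_0\). The sketch below checks, on a finite range of non-negative exponents, that \(\varphi (v) \in G\) holds only on the line \(\lambda _1=\lambda _2\).

```python
def is_power_of_two(n):
    """True if n = 2^c for some integer c >= 0."""
    return n > 0 and n & (n - 1) == 0

def exponents_mapped_into_G(bound):
    """Pairs (a, b) with 0 <= a, b < bound such that 2^a + 2^b lies in <2>."""
    return [(a, b) for a in range(bound) for b in range(bound)
            if is_power_of_two(2**a + 2**b)]

hits = exponents_mapped_into_G(8)
```

In the notation of the proof, the points of \(\Omega \) that \(\varphi \) maps into G determine a single projective point \((\lambda _1 : \lambda _2) = (1:1)\), matching the finiteness of the set M.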

4.1 Positive characteristic

For this subsection, we now make the additional assumptions that \({{\,\mathrm{char}\,}}K = p > 0\), and that K is finitely generated over its prime field \(\mathbb {F}_p\).

In extending Lemma 4.2 to positive characteristic, we face the problem that unit equations may have infinitely many non-degenerate solutions. However, a result of Derksen and Masser [8] is useful in bounding the number of solutions of bounded height. In this way, we will be able to recover a version of Lemma 4.2 with the original density hypothesis replaced by a quantitative one.
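
The source of infinitely many solutions is the Frobenius map. As an illustration under assumptions of ours (the field \(K=\mathbb {F}_3(t)\), the group \(G=\langle t, 1-t \rangle \), and the equation \(x_1+x_2=1\)), the pairs \((t^{q}, (1-t)^{q})\) with \(q=3^e\) are solutions for every \(e \ge 0\), because \((1-t)^{q} = 1-t^{q}\) in characteristic 3. The sketch verifies this identity for small e and counts the solutions of this family whose degree in t, used here only as a stand-in for the height, is at most N.

```python
P = 3  # assumed characteristic for this illustration

def poly_mul(f, g, p):
    """Product of polynomials over F_p; coefficients listed from degree 0."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def poly_pow(f, n, p):
    """n-th power of f over F_p by repeated multiplication."""
    result = [1]
    for _ in range(n):
        result = poly_mul(result, f, p)
    return result

def frobenius_identity_holds(e):
    """Check (1 - t)^(P^e) == 1 - t^(P^e) over F_P."""
    q = P ** e
    lhs = poly_pow([1, P - 1], q, P)
    rhs = [1] + [0] * (q - 1) + [P - 1]
    return lhs == rhs

def count_family_up_to(N):
    """Number of e >= 0 with P^e <= N, that is, the number of solutions
    (t^(P^e), (1 - t)^(P^e)) of degree at most N."""
    count, q = 0, 1
    while q <= N:
        count += 1
        q *= P
    return count
```

The count grows like \(\log N\), which is the kind of growth that the bound obtained from the Derksen–Masser theorem allows.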

As in [8, Section 2], we can define a set of discrete valuations and associated absolute values on K in such a way that the absolute values satisfy the product formula. These absolute values depend on a choice of transcendence basis; we will always work with a fixed such set.

Associated to this set of absolute values, we define a (logarithmic, projective) height of an element \(a = (\alpha _1 : \cdots : \alpha _{n}) \in \mathbb {P}^{n-1}(K)\) by

$$\begin{aligned} h(a) = \log \prod _{v} \max \{ {|}\alpha _1{|}_v, \ldots , {|}\alpha _n{|}_v \}. \end{aligned}$$

For \(0 \ne (\alpha _1,\ldots ,\alpha _n) \in K^{1 \times n}\) we set \(h(\alpha _1, \ldots , \alpha _n) = h(\alpha _1 : \cdots : \alpha _n)\). This height satisfies the Northcott property, that is, for every \(N \ge 0\), the set \(\{\, a \in \mathbb {P}^{n-1}(K) : h(a) \le N \,\}\) is finite (at this point we are using that the prime field is finite). Moreover, for every \(A \in K^{n \times n}\) there exists a constant \(C_A\), such that for any \(a \in K^{1 \times n}\) with \(a \not \in \ker A\),

$$\begin{aligned} h(aA) \le h(a) + C_A. \end{aligned}$$

If A is invertible, then even

$$\begin{aligned} h(a) - C_A \le h(aA) \le h(a) + C_A. \end{aligned}$$

(See [12, Theorem B.2.5].)

If V is a finite-dimensional vector space, then any choice of basis gives an isomorphism \(V \rightarrow K^{1\times n}\), and therefore induces a corresponding height \(h_V\) on V and on the projective space \(\mathbb {P}(V)\). If \(h_V\), \(h_V'\) are two such heights, induced by different bases, then \(h_V'(a) = h_V(a) + O(1)\). The exact choice of height will not matter.

We shall also need the following property.

Lemma 4.3

The group

$$\begin{aligned} \sqrt{G} :=\{\, a \in K : a^n \in G \text { for some } n \ge 1\,\} \le K^\times \end{aligned}$$

is finitely generated.

Proof

By assumption K is finitely generated over its prime field \(\mathbb {F}_p\) and \(G \le K^\times \) is a finitely generated subgroup. Let \(\mathbb {F}_q\) be the algebraic closure of \(\mathbb {F}_p\) in K, so that K is a regular extension of \(\mathbb {F}_q\). Let R be the finitely generated \(\mathbb {F}_q\)-subalgebra of K generated by G. Then the integral closure \({\overline{R}}\) is a finitely generated R-module by [13, Proposition 2.4.1] or [10, Corollary 13.13]. Hence \({\overline{R}}\) also is a finitely generated \(\mathbb {F}_q\)-algebra. Now [13, Corollary 2.7.3] implies that \(\overline{R}^\times \) is a finitely generated group; since \(\sqrt{G} \subseteq \overline{R}^\times \), the claim follows. \(\square \)

Let

$$\begin{aligned} \mathbb {P}^{n-1}(G) = \{\, (\alpha _1: \cdots : \alpha _n) : \alpha _1, \ldots , \alpha _n \in G \,\}. \end{aligned}$$

As a consequence of a theorem of Derksen–Masser [8, Theorem 3], we have an upper bound on the number of solutions of bounded height of a unit equation in positive characteristic.

Lemma 4.4

Let \(n \ge 2\) and let \(S \subseteq \mathbb {P}^{n-1}(G)\) be the set of non-degenerate solutions in G to

$$\begin{aligned} a_1 x_1 + \cdots + a_n x_n = 0, \qquad {a_1, \ldots , a_n \in K^\times }, \end{aligned}$$
(5)

and for all \(N \ge 0\) let \(c(N) :={|} \{\, a \in S : h(a) \le N \,\} {|}\). Then there exists \(D \in \mathbb {Z}_{\ge 0}\) such that

$$\begin{aligned} c(N) = O(\log (N)^D). \end{aligned}$$
(6)

Proof

By Lemma 4.3, the group \(\sqrt{G}\) is finitely generated. A \(\sqrt{G}\)-automorphism is a map

$$\begin{aligned} \psi :\mathbb {P}^{n-1}(K) \rightarrow \mathbb {P}^{n-1}(K),\ (\alpha _1:\cdots :\alpha _n) \mapsto (g_1 \alpha _1: \ldots : g_n \alpha _n) \end{aligned}$$

with \(g_1\), \(\ldots \,\)\(g_n \in \sqrt{G}\). For q a power of p, let

$$\begin{aligned} \varphi _q(\alpha _1:\cdots :\alpha _n) = (\alpha _1^q: \cdots : \alpha _n^q). \end{aligned}$$

Finally, for \(\sqrt{G}\)-automorphisms \(\psi _1\), \(\ldots \,\)\(\psi _k\) and \(a \in \mathbb {P}^{n-1}(K)\) let

$$\begin{aligned}{}[ \psi _{1}, \ldots , \psi _{k} ]_q(a) :=\{\, (\psi _{1}^{-1}\varphi _q^{e_1} \psi _{1}) (\psi _{2}^{-1} \varphi _q^{e_2} \psi _{2}) \cdots (\psi _{k}^{-1} \varphi _q^{e_k} \psi _{k})(a) : e_1, \ldots , e_k \in \mathbb {Z}_{\ge 0} \,\}. \end{aligned}$$

(We suppress the composition operator \(\circ \) for brevity.)

By [8, Theorem 3], the set of solutions S of (5) is contained in a finite union of sets of the form \([ \psi _{1}, \ldots , \psi _{k} ]_q(a)\). It therefore suffices to show that (6) holds for S such a set. Thus, suppose \(S=[ \psi _{1}, \ldots , \psi _{k} ]_q(a)\) for some \(a \in \mathbb {P}^{n-1}(K)\) and \(\sqrt{G}\)-automorphisms \(\psi _1\), \(\ldots \,\)\(\psi _k\).

By induction on k, we show that S contains \(O((\log N)^k)\) elements of height at most N. The case \(k=0\) is clear. Suppose \(k \ge 1\) and that the claim holds for \(k-1\). There exists \(C' \ge 0\) such that \(h(\psi _1^{-1}(b)) \ge h(b) - C'\) and \(h(\psi _1(b)) \ge h(b) - C'\) for all \(b \in \mathbb {P}^{n-1}(K)\). Let

$$\begin{aligned} T_0&= \{\, b \in [ \psi _{2}, \ldots , \psi _{k} ]_q(a) : h(b) \le C'+1 \,\}, \text { and}\\ T_1&= \{\, b \in [ \psi _{2}, \ldots , \psi _{k} ]_q(a) : h(b) > C'+1 \,\}. \end{aligned}$$

The set \(T_0\) is finite. For each \(b \in T_0\) and \(b'=\psi _1^{-1} \varphi _q^{e_1} \psi _1(b)\) with \(e_1 \ge 0\), we have \(h(b') \ge q^{e_1} h(\psi _1(b)) - C'\). Hence, the number of such elements \(b'\) with \(h(b') \le N\) is \(O_b(\log N)\). Exploiting the finiteness of \(T_0\), altogether there are \(O(\log N)\) elements \(b' \in \psi _1^{-1}\varphi _q^{e_1}\psi _1(T_0)\), with \(e_1 \ge 0\), such that \(h(b') \le N\).

For \(b \in T_1\) and \(b'=\psi _1^{-1} \varphi _q^{e_1} \psi _1(b)\) with \(e_1 \ge 0\), we have

$$\begin{aligned} h( b' ) \ge q^{e_1}( h(b) - C') - C' > q^{e_1} - C'. \end{aligned}$$

Thus, if \(h(b') \le N\) then \(e_1 \le \log (N+C')/\log (q)\) and \(h(b) \le N + 2C'\). By the induction hypothesis, there are \(O(\log (N+2C')^{k-1}) = O(\log (N)^{k-1})\) elements \(b \in T_1\) with \(h(b) \le N + 2C'\). Thus there are \(O(\log (N)^k)\) elements \(b' \in \psi _1^{-1} \varphi _q^{e_1} \psi _1(T_1)\), with \(e_1 \ge 0\), for which \(h(b') \le N\).

Altogether,

$$\begin{aligned} S = \bigcup _{e_1 \ge 0} \psi _1^{-1} \varphi _q^{e_1} \psi _1(T_0) \,\cup \, \psi _1^{-1} \varphi _q^{e_1} \psi _1(T_1) \end{aligned}$$

contains \(O(\log (N)^k)\) elements of height at most N. \(\square \)

We can now obtain a variant of Lemma 4.2, using a slightly stronger hypothesis, that also holds in positive characteristic. To ensure that a vector subspace of \(K^{1 \times n}\) is irreducible in our topology, we do still need to assume that K is infinite.

For a subset S of a vector space V, let \(\mathbb {P}(S)\) be the image of \(S\setminus \{0\}\) in the projective space \(\mathbb {P}(V)\).

Lemma 4.5

Suppose K is an infinite field of positive characteristic, finitely generated over its prime field. Let V be a vector space with basis \(e_1\), \(\ldots \,\)\(e_n\), and \(\Omega \subseteq G_0e_1 + \cdots + G_0e_n\) a dense subset of V. Assume that, for every standard projection \(\pi :V \rightarrow W\) with \(W=\langle e_{i_1}, \ldots , e_{i_m} \rangle _K\) and \(m \ge 2\), every closed subset \(Y \subsetneq W\), and every \(C \in \mathbb {R}_{\ge 0}\), \(D \in \mathbb {Z}_{\ge 0}\), there exist arbitrarily large N such that

$$\begin{aligned} {|}\{\, a \in \mathbb {P}(\pi (\Omega ) \setminus Y) : h_W(a) \le N \,\}{|} > C \log (N)^D. \end{aligned}$$

Then, for all \(\varphi \in {{\,\mathrm{Hom}\,}}_K(V,K)\) with \(\varphi (\Omega ) \subseteq G_0\), there exists at most one \(i \in [1,n]\) with \(\varphi (e_i) \ne 0\).

Proof

The beginning of the proof and the overall strategy are analogous to Lemma 4.2. Let \(\varphi (e_i) = \alpha _i \in G_0\) and \(I = \{\, i \in [1,n] : \alpha _i \ne 0 \,\}\). Without restriction \(I = [1,m]\) and we have to show \(m \le 1\). Assume \(m \ge 2\). Let \(\pi :V \rightarrow W\) with \(W=\langle e_1,\ldots ,e_m \rangle _K\) denote the standard projection.

For \(\emptyset \ne J \subseteq I\), let \(W_J = \{\, \lambda _1e_1 + \cdots + \lambda _m e_m \in W : \sum _{j \in J} \alpha _j \lambda _j = 0\,\}\) and \(Y = \bigcup _{\emptyset \ne J \subseteq I} W_J\). Note that Y is closed and \(Y \subsetneq W\). Thus, applying our assumption to points of bounded height in \(\mathbb {P}(\pi (\Omega ) \setminus Y)\), we find that, for any \(C \in \mathbb {R}_{\ge 0}\) and \(D \in \mathbb {Z}_{\ge 0}\), there exist arbitrarily large N for which the set \(\mathbb {P}(\pi (\Omega ) \setminus Y)\) contains more than \(C \log (N)^D\) points of height at most N.

Suppose \(\lambda _1 e_1 + \cdots + \lambda _m e_m\) represents a point in \(\mathbb {P}(\pi (\Omega ) \setminus Y)\). Then there exists a \(g \in G\) with

$$\begin{aligned} \sum _{i=1}^m \alpha _i \lambda _i = g. \end{aligned}$$

Hence \((\alpha _1 \lambda _1 : \cdots : \alpha _m \lambda _m : -g)\) is a non-degenerate solution of the unit equation \(X_1 + \cdots + X_{m+1} = 0\). By Lemma 4.4, we conclude that there exist \(O(\log (N)^D)\) such points \((\alpha _1 \lambda _1 : \cdots : \alpha _m \lambda _m)\) of height at most N, a contradiction to the size of \(\mathbb {P}(\pi (\Omega ) \setminus Y)\). \(\square \)

The following example shows that the conclusion of the previous lemma is trivially false for finite fields.

Example 4.6

Let K be a finite field and \(n \ge 2\). Let \(A_1\), \(\ldots \,\)\(A_k \in {{\,\mathrm{GL}\,}}(n,K)\) be such that the residue classes generate \({{\,\mathrm{PGL}\,}}(n,K)\) as a semigroup, and let \(X=\{a_1,\ldots ,a_k\}\). Let \(0 \ne u \in K^{1 \times n}\), let \(0 \ne v \in K^{n \times 1}\), and let \(\mu (a_i)=A_i\) for \(i \in [1,k]\). Since \({{\,\mathrm{PGL}\,}}(n,K)=\mu (X^*)\) acts transitively on \(\mathbb {P}^{n-1}(K)\), the linear representation \((u,\mu ,v)\) is minimal and its linear hull is \(K^{1\times n}\), that is, the set \(\Omega =u\mu (X^*)\) is dense in all of \(K^{1 \times n}\). With \(G =K^\times \) and \(G_0 =K\), any choice of \(\lambda _1\), \(\ldots \,\)\(\lambda _n \in K\), yields a linear map \(\varphi :K^{1\times n} \rightarrow K, (\beta _1,\ldots ,\beta _n) \mapsto \lambda _1 \beta _1 + \cdots + \lambda _n \beta _n\) with \(\varphi (\Omega ) \subseteq G_0\). In particular, we may take all \(\lambda _i\) to be nonzero, in contrast to the conclusion of the previous lemma.

We will later apply Lemma 4.2, respectively Lemma 4.5, to an irreducible component V of the linear hull \({\overline{\Omega }}=\overline{u\mu (X^*)}\) of a linear representation \((u,\mu ,v)\). In characteristic 0 we may use Lemma 4.2, and in this case it is clear that \(\Omega \cap V\) is dense in V. However, in positive characteristic, where we need to apply Lemma 4.5, it is necessary to verify that the stronger conditions of this lemma are indeed satisfied. This is the subject of the next two lemmas.

Let \((u,\mu ,v)\) be a linear representation and \(\Omega = u \mu (X^*)\). For \(N \in \mathbb {Z}_{\ge 0}\), let \(\Omega _{\le N} = \{\, u\mu (w) : w \in X^*,\, {|}w{|} \le N \,\}\).
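
The sets \(\Omega _{\le N}\) can be enumerated by a breadth-first search over words. The following sketch is a minimal illustration; the function name `omega_up_to` and the sample representation over the integers are assumptions of ours.

```python
def omega_up_to(u, mu, N):
    """Return {u * mu(w) : |w| <= N} as a set of tuples.

    u is a row vector (sequence of numbers) and mu maps each letter to a
    square matrix given as a list of rows.
    """
    def act(row, A):
        # Row vector times matrix.
        return tuple(sum(r * A[i][j] for i, r in enumerate(row))
                     for j in range(len(A[0])))

    layer = {tuple(u)}
    seen = set(layer)
    for _ in range(N):
        # Vectors already seen were reached by a shorter word, so their
        # successors are found within the length bound anyway.
        layer = {act(row, A) for row in layer for A in mu.values()} - seen
        seen |= layer
    return seen

# Assumed sample representation on the alphabet {a, b}.
sample_u = (1, 0)
sample_mu = {"a": [[1, 1], [0, 1]], "b": [[2, 0], [0, 1]]}
```

For instance, `omega_up_to(sample_u, sample_mu, 1)` consists of the three vectors \((1,0)\), \((1,1)\), and \((2,0)\).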

Lemma 4.7

Let \(Y \subseteq K^{1\times n}\) be a closed set of dimension m with k irreducible components, and let \(N = k^{{|}X{|}(m-1)}+1\). If \(\Omega _{\le N} \subseteq Y\) then \(\Omega \subseteq Y\).

Proof

Starting from Y we iteratively construct a sequence of successively smaller closed subsets; it is easiest to keep track of the necessary data using disjoint unions of k rooted trees.

We thus construct a sequence of graphs \(\mathcal {T}_1\), \(\ldots \,\), \(\mathcal {T}_l\) whose vertices are labeled by vector spaces contained in Y, and with each \(\mathcal {T}_i\) having the following properties:

  1. (1)

    If \(W'\) labels a child of a vertex labeled by W, then \(W' \subsetneq W\).

  2. (2)

    \(\mathcal {T}_i\) is s-regular (except for the leaves) with \(s=k^{{|}X{|}}\).

  3. (3)

    If W labels a leaf and \(x \in X\), then there exists a vertex labeled by \(W'\) such that \(W \mu (x) \subseteq W'\).

  4. (4)

    If W labels an internal vertex and \(W_1\), \(\ldots \,\)\(W_s\) label its children, then

    $$\begin{aligned} \Omega _{\le N-i} \cap W \,\subseteq \, W_1 \cup \cdots \cup W_s. \end{aligned}$$

The graph \(\mathcal {T}_1\) has k roots labeled by the elements of \(\mathcal {Z}(Y)\). For \(W \in \mathcal {Z}(Y)\) and \(F:X \rightarrow \mathcal {Z}(Y)\), let

$$\begin{aligned} W_F = \{\, a \in W : a \mu (x) \in F(x) \text { for all } x \in X \,\}. \end{aligned}$$

If, for a fixed W, each of the vector spaces \(W_F\) is a proper subspace of W, then we attach s children to the root labeled by W. These children are labeled by \(W_F\) for \(F:X \rightarrow \mathcal {Z}(Y)\). On the other hand, if \(W=W_F\) for some F, we do not attach any children to the vertex labeled by W. In this case, observe that \(W\mu (x) \subseteq F(x)\) for every \(x \in X\).

It is clear that \(\mathcal {T}_1\) satisfies (1) and (2). Property (3) holds by the choice of the spaces. For (4) let \(a \in \Omega _{\le N-1} \cap W\). Taking any \(x \in X\), we have \(a\mu (x) \in W'\) for some \(W' \in \mathcal {Z}(Y)\). Then \(a \in W_F\) for any \(F :X \rightarrow \mathcal {Z}(Y)\) with \(F(x) = W'\).

We now iteratively construct \(\mathcal {T}_i\) from \(\mathcal {T}_{i-1}\) for \(i \ge 2\). If, for every leaf of \(\mathcal {T}_{i-1}\), say, labeled by W, and every \(x \in X\), there exists a leaf labeled by \(W'\) such that \(W \mu (x) \subseteq W'\), then we stop and set \(l = i-1\).

Otherwise, fix a leaf \(\alpha \) labeled by W and an \(x \in X\) such that \(W \mu (x)\) is not contained in any label of a leaf of \(\mathcal {T}_{i-1}\). By (3) there exists an internal vertex \(\beta \) labeled by \(W'\) such that \(W \mu (x) \subseteq W'\) but \(W\mu (x) \not \subseteq W''\) for any \(W''\) labeling a child of \(\beta \). By our construction \(W'\) has s children labeled by \(W'_1\), \(\ldots \,\)\(W'_s\). Set \(W_j = \{\, a \in W : a \mu (x) \in W'_j \,\}\), and attach s new children to \(\alpha \), labeled by \(W_1\), \(\ldots \,\)\(W_s\).

It is clear that (1)–(3) are preserved for \(\mathcal {T}_i\). Property (4) also carries over for vertices other than \(\alpha \). To verify (4) for \(\alpha \), let \(a \in \Omega _{\le N-i} \cap W\). Then \(a \mu (x) \in W' \cap \Omega _{\le N-(i-1)}\) and hence \(a\mu (x) \in W_j'\) for some j by (4). Thus \(a \in W_j\).

To see that this process terminates, note that each \(\mathcal {T}_i\) is s-regular (except for the leaves) of height at most m, and hence has at most \(s^{m}\) vertices. Since each \(\mathcal {T}_i\) has s vertices more than \(\mathcal {T}_{i-1}\), the process terminates after at most \(s^{m-1}\) steps, that is \(l \le s^{m-1}+1\).

Instead of (3), the final graph \(\mathcal {T}_l\) has the stronger property that if W labels a leaf and \(x \in X\), then there exists \(W'\) labeling a leaf of \(\mathcal {T}_l\) such that \(W \mu (x) \subseteq W'\). Defining \(Y'\) to be the union of all labels of leaves of \(\mathcal {T}_l\), this implies \(Y'\mu (x) \subseteq Y'\) for each \(x \in X\) and hence \(Y' \mu (X^*) \subseteq Y'\). As \(Y'\) contains \(u \in \Omega _{\le N - s^{m-1} -1}\), this implies \(\Omega \subseteq Y' \subseteq Y\). \(\square \)

We give an example in dimension \(n=3\) with \(\mathcal {T}_2 \ne \mathcal {T}_1\), illustrating the iterative construction in the previous proof.

Example 4.8

Let \(e_1\), \(e_2\), \(e_3 \in K^{1\times 3}\) be the standard unit vectors and let \(X=\{a\}\). Consider \((u,\mu ,v)\) with \(u=e_1\), with \(v=e_1^T\), and with \(\mu (a) \in K^{3\times 3}\) the permutation matrix defined by \(e_1\mu (a)=e_2\), \(e_2\mu (a)=e_3\), and \(e_3\mu (a)=e_1\). Clearly \(\Omega = \{e_1,e_2,e_3\}\) and \({{\overline{\Omega }}} = \langle e_1 \rangle \cup \langle e_2 \rangle \cup \langle e_3 \rangle \). Let \(Y=\langle e_1,e_2 \rangle \cup \langle e_2,e_3 \rangle \) and note \(\Omega \subseteq Y\).

Now \(\mathcal {T}_1\) has two roots, labeled by \(W_{1,2} :=\langle e_1,e_2 \rangle \) and \(W_{2,3} :=\langle e_2,e_3\rangle \). Since \(W_{1,2}\mu (a) = W_{2,3}\), no children are attached to \(W_{1,2}\) in \(\mathcal {T}_1\). However \(W_{2,3}\mu (a) = \langle e_1,e_3\rangle \not \subseteq Y\). Thus, in \(\mathcal {T}_1\), two children are attached to the root labeled by \(W_{2,3}\). These children are labeled by \(\langle e_2 \rangle = \{\,w \in W_{2,3} : w\mu (a) \in W_{2,3} \,\}\) and \(\langle e_3 \rangle = \{\, w \in W_{2,3} : w\mu (a) \in W_{1,2} \,\}\).

In the next step, we observe that \(W_{1,2}\) is not mapped into labels of leaves of \(\mathcal {T}_1\) by \(\mu (a)\). Thus, to the vertex labeled by \(W_{1,2}\), we attach two new children labeled by \(\langle e_1 \rangle = \{\, w \in W_{1,2} : w \mu (a) \in \langle e_2 \rangle \,\}\) and \(\langle e_2 \rangle = \{\, w \in W_{1,2} : w \mu (a) \in \langle e_3 \rangle \,\}\). (There are two different vertices with label \(\langle e_2 \rangle \), one attached to each of the two roots.) At this point the process stops, because every label of a leaf is mapped into a label of a leaf by \(\mu (a)\).
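
The two steps of this example can be replayed mechanically. The sketch below is an illustration that relies on a simplification valid only in this example: since \(\mu (a)\) is a permutation matrix and all labels are coordinate subspaces, a subspace \(\langle e_i : i \in A \rangle \) can be encoded by the index set A, and preimages of coordinate subspaces are again coordinate subspaces. All names in the code are ours.

```python
PERM = {1: 2, 2: 3, 3: 1}  # e_i * mu(a) = e_{PERM[i]}

def image(W):
    """Image of the coordinate subspace W under mu(a)."""
    return frozenset(PERM[i] for i in W)

def pullback(W, target):
    """Coordinate subspace {w in W : w * mu(a) in target}."""
    return frozenset(i for i in W if PERM[i] in target)

def maps_into_some(W, labels):
    """True if W * mu(a) is contained in one of the given labels."""
    return any(image(W) <= L for L in labels)

W12, W23 = frozenset({1, 2}), frozenset({2, 3})
roots = [W12, W23]

# T_1: a root receives children only if its image lies in no component of Y.
children_T1 = {W: ([] if maps_into_some(W, roots)
                   else [pullback(W, R) for R in roots])
               for W in roots}

# T_2: W12 is a leaf of T_1 whose image W23 is not contained in a leaf label,
# so it receives the pullbacks of the children of W23.
leaves_T1 = [W12] + children_T1[W23]
new_children_W12 = [pullback(W12, L) for L in children_T1[W23]]
leaves_T2 = children_T1[W23] + new_children_W12
```

The final check `all(maps_into_some(W, leaves_T2) for W in leaves_T2)` corresponds to the stopping condition of the construction.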

Lemma 4.9

Let V be an irreducible component of \({{\overline{\Omega }}}\). Let \(\pi :V \rightarrow W\) be an epimorphism with \(\dim W \ge 2\). If \(Y \subsetneq W\) is a closed subset, then there exists \(C \in \mathbb {R}_{>0}\) such that

$$\begin{aligned} {|}\{\, a \in \mathbb {P}(\pi (\Omega ) \setminus Y) : h_W(a) \le N \, \}{|} \ \ge \ C N^{\frac{1}{e{|}X{|}}} \qquad \text {for } N \in \mathbb {Z}_{\ge 0}, \end{aligned}$$

where \(m = \dim W\) and \(e = \max \{\dim Y, 1\} + n - m\).

Proof

Extend \(\pi \) to \(\pi :K^{1\times n} \rightarrow W\). Let \(C > 0\) be such that \(h(a\mu (x)) \le h(a) + C\) for all \(x \in X\) and \(a \in K^{1 \times n} \setminus \ker (\mu (x))\). We may moreover assume \(h_W(\pi (a)) \le h(a) + C\) for all \(a \in K^{1\times n} \setminus \ker (\pi )\). If \(a \in \Omega _{\le M} \setminus \ker (\pi )\), then \(h_W(\pi (a)) \le h(u) + (M+1)C\). Choosing \(M = (N - h(u)) / C - 1\) we find that all \(a \in \Omega _{\le M} \setminus \ker (\pi )\) have \(h_W(\pi (a)) \le N\). Thus it suffices to show that \(\mathbb {P}(\pi (\Omega _{\le M}) \setminus Y)\) contains at least \(C' M^{\frac{1}{e{|}X{|}}}\) points for some \(C' > 0\).

If \(\mathbb {P}(\pi (\Omega _{\le M}) \setminus Y)\) contains l points, then \(\Omega _{\le M} \subseteq \pi ^{-1}(Y) \cup \pi ^{-1}(P_1) \cup \cdots \cup \pi ^{-1}(P_l)\) for some 1-dimensional vector spaces \(P_1\), \(\ldots \,\)\(P_l\). Since \(\dim Y < m\) and \(\overline{\pi (\Omega )}=W\), we have \(\Omega \not \subseteq \pi ^{-1}(Y) \cup \pi ^{-1}(P_1) \cup \cdots \cup \pi ^{-1}(P_l)\).

By Lemma 4.7, for \(l \ge \max \{1,k\}\) with \(k = {|}\mathcal {Z}(Y){|}\),

$$\begin{aligned} M < (l+k)^{(e-1){|}X{|}} + 1 \le (2l)^{e{|}X{|}}. \qquad \square \end{aligned}$$

5 Proof of (a)\(\,\Rightarrow \,\)(b)

Having made the necessary preparations, in this section we prove (a)\(\,\Rightarrow \,\)(b) of Theorem 1.2. Let \(\mathcal {P}(Q)\) denote the power set of a set Q. The following lemmas very closely parallel the corresponding results on semi-monomial matrices used in the proof of a decomposition theorem for rational functions between free monoids; see [22, Chapter V.2] and Remark 5.5.

Lemma 5.1

Let \(\mathcal {A}=(Q,I,E,T)\) be a weighted automaton on the alphabet X with coefficients in R. Suppose that there exists \(\mathcal {S}\subseteq \mathcal {P}(Q)\) such that \(Q = \bigcup _{M \in \mathcal {S}} M\), and all of the following conditions are satisfied.

  1. (1)

    There exists an \(M \in \mathcal {S}\) containing all initial states of \(\mathcal {A}\).

  2. (2)

    For \(M \in \mathcal {S}\) and \(x \in X\), there exists an \(N \in \mathcal {S}\) such that, whenever there is an edge from \(p \in M\) to \(q \in Q\) labeled by x, then \(q \in N\).

  3. (3)

    For every state \(q \in Q\), \(x \in X\), and \(M \in \mathcal {S}\), there exists at most one state \(p \in M\) that has an edge from p to q labeled by x.

  4. (4)

    Every \(M \in \mathcal {S}\) contains at most one terminal state.

Then \(\mathcal {A}\) is unambiguous.

Proof

We need to show that for a word \(w=a_1\cdots a_l \in X^*\) with \(a_1\), \(\ldots \), \(a_l \in X\), there exists at most one accepting path in \(\mathcal {A}\) that is labeled by w. Suppose that there are two accepting paths

$$\begin{aligned} (p_0,a_1,p_1) (p_1,a_2,p_2) \cdots (p_{l-1},a_l,p_l) \quad \text {and}\quad (q_0,a_1,q_1) (q_1,a_2,q_2) \cdots (q_{l-1},a_l,q_l). \end{aligned}$$

We first show that for every \(j \in [0,l]\), there exists a set \(M_j \in \mathcal {S}\) with \(p_j\), \(q_j \in M_j\). For \(j=0\), note that \(p_0\) and \(q_0\) are initial states, hence by (1), there exists \(M_0 \in \mathcal {S}\) with \(p_0\), \(q_0 \in M_0\). Now, if \(j \in [1,l]\) and \(p_{j-1}\), \(q_{j-1} \in M_{j-1}\), then (2) implies that there exists \(M_j \in \mathcal {S}\) with \(p_j\), \(q_j \in M_j\).

Since the paths are accepting, \(p_l\) and \(q_l\) are terminal states. Since \(p_l\), \(q_l \in M_l\), condition (4) implies \(p_l=q_l\). If \(p_j=q_j\) for some \(j \in [1,l]\), then, since we already know \(p_{j-1}\), \(q_{j-1} \in M_{j-1}\), condition (3) implies \(p_{j-1}=q_{j-1}\). Thus, by descending induction on j, we have \(p_j=q_j\) for all \(j\in [0,l]\) and hence the two paths are the same. Thus we have shown that \(\mathcal {A}\) is unambiguous. \(\square \)
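The criterion of Lemma 5.1 can be tested mechanically. The following Python sketch is an illustration only and not part of the construction: the function names, the encoding of edges as triples \((p,x,q)\), and the four-state example automaton are assumptions made here. It checks conditions (1) to (4) for a given cover and compares the outcome with a brute-force count of accepting paths. Weights are ignored, since unambiguity depends only on which edges are present.

```python
from itertools import product

def satisfies_lemma_conditions(states, initial, terminal, edges, alphabet, cover):
    """Check conditions (1)-(4) of Lemma 5.1 for a cover of the state set.

    edges is a set of triples (p, x, q); cover is a list of sets of states.
    """
    if set().union(*cover) != set(states):
        return False

    def successors(block, x):
        return {q for (p, y, q) in edges if p in block and y == x}

    # (1) some block contains all initial states
    c1 = any(initial <= block for block in cover)
    # (2) all x-successors of a block lie in a common block
    c2 = all(any(successors(block, x) <= target for target in cover)
             for block in cover for x in alphabet)
    # (3) inside a block, every state q has at most one x-predecessor
    c3 = all(sum(1 for (p, y, r) in edges
                 if p in block and y == x and r == q) <= 1
             for block in cover for x in alphabet for q in states)
    # (4) every block contains at most one terminal state
    c4 = all(len(block & terminal) <= 1 for block in cover)
    return c1 and c2 and c3 and c4

def count_accepting_paths(word, initial, terminal, edges):
    """Count accepting paths labeled by word, by dynamic programming."""
    counts = {p: 1 for p in initial}
    for x in word:
        nxt = {}
        for (p, y, q) in edges:
            if y == x and p in counts:
                nxt[q] = nxt.get(q, 0) + counts[p]
        counts = nxt
    return sum(c for q, c in counts.items() if q in terminal)

# Hypothetical example: nondeterministic (state 0 has two a-successors),
# yet unambiguous by the criterion.
STATES = {0, 1, 2, 3}
ALPHABET = "ab"
INITIAL = {0}
TERMINAL = {1, 2}
EDGES = {(0, "a", 2), (0, "a", 3), (2, "a", 2),
         (2, "b", 0), (3, "b", 1), (1, "b", 1)}
COVER = [{0, 1}, {2, 3}]

MAX_PATHS = max(count_accepting_paths("".join(w), INITIAL, TERMINAL, EDGES)
                for n in range(7) for w in product(ALPHABET, repeat=n))
```

In this example the cover \(\{\{0,1\},\{2,3\}\}\) satisfies all four conditions, whereas the trivial cover consisting of the full state set violates (3) and (4); the brute-force count serves as an independent check on short words.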

For the statement of the next lemma we fix the following notation: Let \(e_1\), \(\ldots \), \(e_n\) denote the standard basis vectors of \(R^{1\times n}\). For \(M \subseteq [1,n]\) we set \(V(M) = \langle e_\nu : \nu \in M \rangle _K\). The subscripts \(v_i\) and \(\mu (x)_{\nu ,j}\) below refer to the respective coordinates.

Lemma 5.2

Let \((u, \mu , v)\) be a linear representation of rank n with coefficients in R. Suppose that there exists \(\mathcal {S}\subseteq \mathcal {P}([1,n])\) such that \([1,n] = \bigcup _{M \in \mathcal {S}} M\), and all of the following conditions are satisfied.

  1. (1)

    There exists an \(M \in \mathcal {S}\) with \(u \in V(M)\).

  2. (2)

    For every \(M \in \mathcal {S}\) and \(x \in X\), there exists \(N \in \mathcal {S}\) with \(V(M) \mu (x) \subseteq V(N)\).

  3. (3)

    For every \(M \in \mathcal {S}\), \(x \in X\), and \(j \in [1,n]\), there exists at most one \(\nu \in M\) with \(\mu (x)_{\nu ,j} \ne 0\).

  4. (4)

    For every \(M \in \mathcal {S}\) there exists at most one \(\nu \in M\) with \(v_\nu \ne 0\).

Then the weighted automaton \(\mathcal {A}\) associated to the linear representation \((u, \mu , v)\) is unambiguous.

Proof

Following the construction of the associated automaton \(\mathcal {A}\), the conditions above translate directly into the ones of Lemma 5.1. \(\square \)
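In matrix terms, the conditions of Lemma 5.2 are statements about the zero pattern of \(u\), \(\mu (x)\), and \(v\) relative to the blocks in \(\mathcal {S}\). The following Python sketch illustrates this under assumptions made here: coordinates are indexed from 0, matrices are lists of rows, and the rank-4 representation is a hypothetical example rather than one taken from the text.

```python
def support(vec):
    """Indices of the nonzero coordinates of a vector."""
    return {i for i, c in enumerate(vec) if c != 0}

def blocks_satisfy_conditions(u, mu, v, blocks):
    """Check conditions (1)-(4) of Lemma 5.2 for a list of index sets."""
    n = len(u)
    if set().union(*blocks) != set(range(n)):
        return False

    def image_support(block, mat):
        # V(block) * mat is spanned by the rows of mat indexed by block.
        return set().union(*(support(mat[i]) for i in block))

    # (1) u lies in V(M) for some block M
    c1 = any(support(u) <= block for block in blocks)
    # (2) V(M) mu(x) is contained in V(N) for some block N
    c2 = all(any(image_support(block, mat) <= target for target in blocks)
             for block in blocks for mat in mu.values())
    # (3) each column has at most one nonzero entry among the rows of M
    c3 = all(sum(1 for i in block if mat[i][j] != 0) <= 1
             for block in blocks for mat in mu.values() for j in range(n))
    # (4) v has at most one nonzero coordinate in each block
    c4 = all(sum(1 for i in block if v[i] != 0) <= 1 for block in blocks)
    return c1 and c2 and c3 and c4

# Hypothetical linear representation of rank 4 on the alphabet {a, b}.
U = [1, 0, 0, 0]
MU = {
    "a": [[0, 0, 2, 3],
          [0, 0, 0, 0],
          [0, 0, 1, 0],
          [0, 0, 0, 0]],
    "b": [[0, 0, 0, 0],
          [0, 5, 0, 0],
          [1, 0, 0, 0],
          [0, 1, 0, 0]],
}
V = [0, 1, 1, 0]
BLOCKS = [{0, 1}, {2, 3}]
```

Here `blocks_satisfy_conditions(U, MU, V, BLOCKS)` holds, while the single block `{0, 1, 2, 3}` fails conditions (3) and (4).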

We are now ready to prove the main implication over a field.

Proposition 5.3

Every rational Pólya series over K is recognized by an unambiguous weighted automaton (with weights in K).

Proof

Let S be a rational Pólya series, and let \((u, \mu , v)\) be a minimal linear representation of S, chosen as in Lemma 3.11. Hence \(\Omega :=u \mu (X^*) \subseteq G_0^{1\times n}\) and \(\Omega v \subseteq G_0\). If \(n=0\), then \(S=0\), and S is recognized by the trivial automaton with empty set of states. To avoid this corner case, from now on assume \(n \ge 1\). Then \(v = e_1 \in K^{n \times 1}\) is the first standard basis vector by Lemma 3.11.

We may replace K by the field generated by all coefficients in u, v, and \(\mu (x)\) for \(x \in X\). Thus, we may without restriction assume that K is finitely generated over its prime field. We may also assume that K is infinite; otherwise Proposition 3.14 implies the even stronger claim that S is recognized by a deterministic automaton.

Applying Lemma 3.13, we can assume \(K^{1\times n} = W_1 \oplus \cdots \oplus W_k\) with \(W_i = \langle e_{m_1+\cdots +m_{i-1}+1}, \ldots , e_{m_1+\cdots +m_i} \rangle _K\) and \(\mathcal {Z}({{\overline{\Omega }}}) = \{ W_1, \ldots , W_k \}\). Without restriction \(u \in W_1\). Note that, taking \(\Gamma =G_0\) in Lemma 3.13, also the properties \(\Omega \subseteq G_0^{1\times n}\) and \(\Omega v \subseteq G_0\) are preserved by this change of linear representation.

For \(i \in [1,k]\) let

$$\begin{aligned} M_i = [m_1+\cdots +m_{i-1}+1, m_1+\cdots +m_i], \end{aligned}$$

so that \(\{\, e_\nu : \nu \in M_i \,\}\) is a basis for \(W_i\). We show that the linear representation \((u,\mu ,v)\), with

$$\begin{aligned} \mathcal {S}= \{\, M_i : i \in [1,k] \,\} \end{aligned}$$

satisfies the conditions of Lemma 5.2, from which the claim will follow.

Indeed, (1) holds for \(M_1\) since \(u \in W_1\). Since \(\mu (x)\) is continuous, statement (3) of Lemma 3.5 implies that for every \(i \in [1,k]\) and \(x \in X\), there exists \(j \in [1,k]\) such that \(W_i \mu (x) \subseteq W_j\). This implies (2).

Let \(i \in [1,k]\), let \(x \in X\), and let \(j \in [1,n]\). Let \(\varphi :W_i \rightarrow K\) be defined by \(\varphi (a) = a \mu (x) e_j^T\). By (5) of Lemma 3.5, the set \(\Omega \cap W_i\) is dense in \(W_i\). Moreover \(\Omega \cap W_i \subseteq \sum _{\nu \in M_i} G_0 e_\nu \) since \(\Omega \subseteq G_0^{1 \times n}\). Finally, if \(a \in \Omega \), then \(a \mu (x) \in \Omega \subseteq G_0^{1\times n}\), so that \(\varphi (\Omega \cap W_i) \subseteq G_0\). If K has characteristic 0, we can thus apply Lemma 4.2 to the vector space \(W_i\), its dense subset \(\Omega \cap W_i\), and the homomorphism \(\varphi :W_i \rightarrow K\). We conclude that there exists at most one \(\nu \in M_i\) with \(0 \ne \varphi (e_\nu ) = e_\nu \mu (x) e_j^T\). Since \(e_\nu \mu (x) e_j^T\) is the \((\nu ,j)\)-entry of the matrix \(\mu (x)\), there is at most one \(\nu \in M_i\) with \(\mu (x)_{\nu ,j} \ne 0\). Thus (3) holds in characteristic 0.

If K has positive characteristic, we apply Lemma 4.5 instead of Lemma 4.2; the additional condition in this lemma is satisfied by Lemma 4.9. This shows (3) in positive characteristic.

Similarly, applying Lemma 4.2 in characteristic 0 (respectively Lemma 4.5 together with Lemma 4.9 in positive characteristic) to the map \(W_i \rightarrow K\) given by \(a \mapsto av\), we find that there exists at most one \(\nu \in M_i\) with \(v_\nu \ne 0\), implying (4). \(\square \)

Example 5.4

(Continuation of Example 3.7.) We illustrate the construction of the previous proof using the linear representation from Example 3.7. The linear hull decomposes as \({{\overline{\Omega }}} = W_1 \cup W_2\) with \(W_1 = \langle e_1+e_2,e_3 \rangle \) and \(W_2 = \langle e_1-e_2,e_3 \rangle \). Now \(W_1 \oplus W_2 \cong K^{1\times 4} = \langle e_1', \ldots , e_4' \rangle \), where we fix the embeddings \(W_1 \hookrightarrow K^{1 \times 4}\), \(e_1+e_2 \mapsto e_1'\) and \(e_3 \mapsto e_2'\), as well as \(W_2 \hookrightarrow K^{1\times 4}\), \(e_1-e_2 \mapsto e_3'\) and \(e_3 \mapsto e_4'\). For every \(x \in \{a,b,c\}\) we need to choose \(f_x:\{W_1,W_2\} \rightarrow \{W_1,W_2\}\) such that \(W_i\mu (x)\subseteq f_x(W_i)\) for \(i \in \{1,2\}\). For a and b there is a unique such choice. Since \(W_i\mu (c) \subseteq W_1 \cap W_2\), there are several choices for c and we pick \(f_c(W_i)=W_i\) for \(i \in \{1,2\}\). This choice fixes a deterministic automaton describing the transitions between irreducible components (left side of Fig. 2).

The newly constructed linear representation on \(K^{1\times 4}\) is given by \((u',\mu ',v')\) with \(u'=(1,1,0,0)\), with \(v'=(2,0,0,0)^T\), and with

Here the block structure is determined by the choice of transitions between irreducible components, e.g., a different choice of \(f_c\) would yield a different matrix \(\mu '(c)\). The resulting automaton is depicted in the right side of Fig. 2.

Fig. 2 (Example 5.4) Left: Our choice of transitions between irreducible components can be depicted as a deterministic automaton. Right: An unambiguous weighted automaton recognizing the same series as in Example 3.7 and Fig. 1

Remark 5.5

Let X, Y be finite sets. A rational function is a function \(f:X^* \rightarrow Y^*\) whose graph is a rational subset of \(X^* \times Y^*\). By the Decomposition Theorem of Elgot–Mezei every such rational function is a composition of a (pure) sequential function with a (pure) co-sequential function [22, Chapter V.2] (varying terminology is used, see for instance [1]; we follow Sakarovitch).

One of the proofs of this theorem produces, through the use of the Schützenberger covering, a semi-monomial linear representation. This is the same type of block-matrix structure we have obtained here. Consequently we can obtain an analogous decomposition of the weighted automaton: every Pólya series is recognized by a weighted finite automaton that is a composition of a sequential function followed by a co-deterministic weighted automaton. (A weighted automaton is co-deterministic if there is a unique final state and for every \(x \in X\) and \(q \in Q\) there is at most one \(p \in Q\) with \(E(p,x,q) \ne 0\).) As the construction is very similar to the one in [22, Chapter V.2.2], we omit the details.

If X is a singleton, then series recognized by an unambiguous weighted automaton have a particularly simple shape. In this way we will recover the full univariate result of Pólya, Benzaghou, and Bézivin [2, 7, 18].

Proposition 5.6

Suppose \(X=\{x\}\) consists of a single element, let \(\mathcal {A}\) be an unambiguous weighted automaton with weights in R, and let \(S \in R\langle \langle X \rangle \rangle \) be the series it recognizes. Then there exist a finite set \(F \subseteq \mathbb {Z}_{\ge 0}\), an element \(d \in \mathbb {Z}_{\ge 0}\), and for each \(r \in [0,d-1]\) elements \(a_r \in R\) and \(b_r \in R \setminus \{0\}\) such that

$$\begin{aligned} S(x^{kd+r}) = a_r b_r^k \qquad \text {for all } k \in \mathbb {Z}_{\ge 0} \text { and } r \in [0,d-1] \text { with } kd+r \not \in F. \end{aligned}$$

Proof

We may assume that \(\mathcal {A}\) is trim. Then, for any two states p, q and any \(n \ge 0\) there exists at most one path from p to q labeled by \(x^n\).

Let m be the maximal length of an acyclic path in \(\mathcal {A}\), that is, a path that does not visit any vertex twice. Then any cycle, that is, a path whose only repeated vertices are the first and the last one, has length bounded by \(m+1\). Let d be a common multiple of all lengths of cycles in \(\mathcal {A}\), e.g., \(d = (m+1)!\). Let \(r \in \{0,\ldots ,d-1\}\) and let us consider the claim for \(S(x^{kd+r})\) with \(k \ge 0\). If \(S(x^{kd+r}) =0\) for all but finitely many \(k \ge 0\), the claim holds with \(a_r=0\). Otherwise, let \(k_0 \ge 1\) with \(k_0d+r > m\) and \(S(x^{k_0d+r}) \ne 0\). The (unique) accepting path labeled by \(x^{k_0d+r}\) must contain a cycle, and so is of the form pcq with c a cycle of length l dividing d, and p, q paths. Let \(e \in \mathbb {N}\) with \(d=le\), let \(a_r :=S(x^{k_0d+r})\) and let \(b_r\) be the product of the weights along \(c^e\). For all \(n \ge 0\), the path \(pc(c^e)^nq\) is the unique accepting path for \(x^{(k_0+n)d+r}\). Hence \(S(x^{(k_0+n)d+r}) = a_r b_r^n\) for all \(n \ge 0\). Thus \(S(x^{kd+r}) = (a_rb_r^{-k_0}) b_r^k\) for all \(k \ge k_0\). \(\square \)
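The eventual geometric behaviour along residue classes modulo d can be observed numerically. The following Python sketch is a hedged illustration: the helper `unary_coefficients` and the four-state automaton with integer weights are hypothetical and chosen here only for demonstration. The automaton has a single cycle of length 2, so \(d=2\), and the product of the weights along the cycle is 6.

```python
def unary_coefficients(init, edges, term, count):
    """Return [S(x^0), ..., S(x^(count-1))] for a one-letter weighted automaton.

    init and term map states to initial and terminal weights; edges maps a
    pair (p, q) to the weight of the x-labeled edge from p to q.
    """
    coeffs = []
    row = dict(init)  # row[q] = total weight of paths read so far ending in q
    for _ in range(count):
        coeffs.append(sum(w * term.get(q, 0) for q, w in row.items()))
        nxt = {}
        for (p, q), w in edges.items():
            if p in row:
                nxt[q] = nxt.get(q, 0) + row[p] * w
        row = nxt
    return coeffs

# Hypothetical unambiguous automaton: 3 -> 0, cycle 0 -> 1 -> 0, exit 0 -> 2.
INIT = {3: 1}
EDGES = {(3, 0): 4, (0, 1): 2, (1, 0): 3, (0, 2): 5}
TERM = {0: 7, 2: 1}

COEFFS = unary_coefficients(INIT, EDGES, TERM, 20)
D, B = 2, 6
# Indices n at which the relation S(x^(n+D)) = B * S(x^n) fails.
EXCEPTIONS = [n for n in range(len(COEFFS) - D) if COEFFS[n + D] != B * COEFFS[n]]
```

The first coefficients are 0, 28, 20, 168, 120, 1008, 720, and the relation \(S(x^{n+2}) = 6\,S(x^n)\) fails only at \(n=0\), which corresponds to a finite exceptional set in the statement of Proposition 5.6.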

6 Proof of (b)\(\,\Leftrightarrow \,\)(c)

The following proof very closely follows [14, Proposition 1.3.5], where the same result is proved for deterministic automata without weights. A language \(\mathcal {L}\subseteq X^*\) is a code if the elements of \(\mathcal {L}\) are a basis of a free submonoid of \(X^*\).

Proposition 6.1

(Reutenauer). If a rational series \(S \in R\langle \langle X \rangle \rangle \) is recognized by an unambiguous weighted automaton with weights in R, then S is unambiguous over R.

Proof

Let \(\mathcal {A}\) be an unambiguous weighted automaton that recognizes S. We may without restriction assume that \(\mathcal {A}\) is trim. Then, for any two states p, q and any word \(w \in X^*\) there exists at most one path from p to q labeled by w.

For states p, \(q \in Q\) and a set \(P \subseteq Q\) define

$$\begin{aligned} S_{p,P,q} = \sum _{\begin{array}{c} p_1, \ldots , p_{l-1} \in P\\ p=p_0,\, p_l=q\\ a_1,\ldots ,a_l \in X\\ l \ge 1 \end{array}} E(p_0,a_1,p_1) \cdots E(p_{l-1},a_l,p_l)a_1\cdots a_l. \end{aligned}$$

In words, the sum is taken over all non-empty paths from p to q with the property that all states strictly in-between are in P. Since \(\mathcal {A}\) is unambiguous, the words in \({{\,\mathrm{supp}\,}}(S_{p,P,q})\) are in bijective correspondence with non-empty paths from p to q.

Then

$$\begin{aligned} S = \sum _{p, q \in Q} I(p)S_{p,Q,q}T(q) + \sum _{p \in Q} I(p)T(p), \end{aligned}$$

and the finite sum on the right is unambiguous because \(\mathcal {A}\) is unambiguous.

It suffices to show that each \(S_{p,P,q}\) is unambiguous, and we do so by induction on \({|}P{|}\). If \(P = \emptyset \), then \(S_{p,P,q}\) is a polynomial and hence unambiguous. If \(r \not \in P\), then

$$\begin{aligned} S_{p,P \cup \{r\},q} = S_{p,P,q} + S_{p,P,r} S_{r,P,r}^* S_{r,P,q}. \end{aligned}$$

Note that \({{\,\mathrm{supp}\,}}(S_{r,P,r})\) consists of the words labeling first returns to r, that is, non-empty paths starting and ending at r that do not pass through r in-between. Using that \(\mathcal {A}\) is unambiguous, it is easily seen that the words in \({{\,\mathrm{supp}\,}}(S_{r,P,r})\) are a code. Hence \(S_{r,P,r}^*\) is unambiguous. Similarly, we see that the products and the sum are unambiguous, by looking at when a path passes through r. \(\square \)

The converse of the previous implication is also easy to see.

Lemma 6.2

If \(S \in R\langle \langle X \rangle \rangle \) is unambiguous rational, then there exists an unambiguous weighted automaton with weights in R that recognizes S.

Proof

A suitable weighted automaton can inductively be constructed from an unambiguous rational decomposition of S. \(\square \)

7 Proof of (c)\(\,\Rightarrow \,\)(d)

A clever proof of (c)\(\,\Rightarrow \,\)(d) of Theorem 1.2 is given by Reutenauer in the proof of [19, Proposition 4, (iii)\(\,\Rightarrow \,\)(ii)]. We opt to give an alternative, somewhat longer but very straightforward, proof of the same result.

Lemma 7.1

Let A, \(B \in \mathbb {Z}\langle \langle X \rangle \rangle \), and let \(\mathcal {L}\), \(\mathcal {K}\) be rational languages with \({{\,\mathrm{supp}\,}}(A) \subseteq \mathcal {L}\) and \({{\,\mathrm{supp}\,}}(B) \subseteq \mathcal {K}\).

  1. (1)

    Suppose \(\mathcal {K}\cap \mathcal {L}= \emptyset \). Then \(C= A+B\) is a rational series with \({{\,\mathrm{supp}\,}}(C) \subseteq \mathcal {K}\cup \mathcal {L}\) and

    $$\begin{aligned} C(w) = {\left\{ \begin{array}{ll} A(w) &{}\text {if } w \in \mathcal {L}, \\ B(w) &{}\text {if } w \in \mathcal {K}. \end{array}\right. } \end{aligned}$$
  2. (2)

    Suppose \(\mathcal {L}\mathcal {K}\) is unambiguous. Then \(C=A \mathbb {1}_{{\mathcal {K}}} + \mathbb {1}_{{\mathcal {L}}} B\) is a rational series with \({{\,\mathrm{supp}\,}}(C) \subseteq \mathcal {L}\mathcal {K}\). For \(w=uv\) with \(u \in \mathcal {L}\), \(v \in \mathcal {K}\),

    $$\begin{aligned} C(w) = A(u) + B(v). \end{aligned}$$
  3. (3)

    Suppose that \(\mathcal {L}\) is a code. Then

    $$\begin{aligned} C = (1 - \mathbb {1}_{{\mathcal {L}^*}} A)\big ((\mathbb {1}_{{\mathcal {L}}} + A)^* - \mathbb {1}_{{\mathcal {L}^*}}\big ) \end{aligned}$$

    is a rational series with \({{\,\mathrm{supp}\,}}(C) \subseteq \mathcal {L}^*\). For \(w=w_1\cdots w_l\) with \(w_1\), \(\ldots \), \(w_l \in \mathcal {L}\),

    $$\begin{aligned} C(w) = A(w_1) + \cdots + A(w_l). \end{aligned}$$

Proof

Throughout, we use that the characteristic series \(\mathbb {1}_{{\mathcal {L}}} \in \mathbb {Z}\langle \langle X\rangle \rangle \) of a rational language \(\mathcal {L}\) is rational.

  1. (1)

    Clear.

  2. (2)

    For \(w \in X^*\) we have \(A \mathbb {1}_{{\mathcal {K}}}(w) = \sum _{w=uv} A(u)\mathbb {1}_{{\mathcal {K}}}(v)\). A term \(A(u)\mathbb {1}_{{\mathcal {K}}}(v)\) is nonzero if and only if \(u \in {{\,\mathrm{supp}\,}}(A) \subseteq \mathcal {L}\) and \(v \in \mathcal {K}\). Since \(\mathcal {L}\mathcal {K}\) is unambiguous there is at most one such term. Thus \(A \mathbb {1}_{{\mathcal {K}}}(w) = A(u)\) if \(w \in \mathcal {L}\mathcal {K}\) with \(w=uv\) where \(u \in \mathcal {L}\), \(v \in \mathcal {K}\), and \(A \mathbb {1}_{{\mathcal {K}}}(w)=0\) if \(w \not \in \mathcal {L}\mathcal {K}\). An analogous claim holds for \(\mathbb {1}_{{\mathcal {L}}}{B}\).

  3. (3)

    For \(w \in \mathcal {L}^*\) there are uniquely determined \(w_1\), \(\ldots \), \(w_l \in \mathcal {L}\) with \(w=w_1\cdots w_l\). Then

    $$\begin{aligned} (\mathbb {1}_{{\mathcal {L}}} + A)^*(w) = \sum _{k=0}^l \sum _{1 \le i_1< \cdots < i_k \le l} A(w_{i_1}) \cdots A(w_{i_k}). \end{aligned}$$

    For \(D = (\mathbb {1}_{{\mathcal {L}}} + A)^* - \mathbb {1}_{{\mathcal {L}^*}}\) we obtain an analogous sum with \(k \in [1,l]\).

Now, \(\mathbb {1}_{{\mathcal {L}^*}} A(1)=0\) since \(1 \not \in \mathcal {L}\) and, for \(j \ge 1\),

$$\begin{aligned} \mathbb {1}_{{\mathcal {L}^*}} A(w_1 \cdots w_j) = \sum _{i=0}^j \mathbb {1}_{{\mathcal {L}^*}}(w_1\cdots w_i) A(w_{i+1} \cdots w_j) = A(w_j). \end{aligned}$$

Therefore

$$\begin{aligned} \begin{aligned} \mathbb {1}_{{\mathcal {L}^*}} AD(w)&= \sum _{j=1}^l A(w_j)D(w_{j+1}\cdots w_l) \\&= \sum _{j=1}^l A(w_j) \sum _{k=1}^{l-j} \sum _{j+1 \le i_1< \cdots< i_k \le l} A(w_{i_1}) \cdots A(w_{i_k}) \\&= \sum _{k=2}^l \sum _{1 \le i_1< \cdots < i_k \le l} A(w_{i_1}) \cdots A(w_{i_k}). \end{aligned} \end{aligned}$$

Thus \((1-\mathbb {1}_{{\mathcal {L}^*}}A)D(w) = \sum _{k=1}^l A(w_k)\). \(\square \)
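The identity in part (3) can be checked on a small example by computing with series truncated at a fixed word length. The following Python sketch is only an illustration: the representation of series as dictionaries from words to integers, the helper names, the prefix code \(\mathcal {L}=\{a, ba\}\), and the values \(A(a)=2\), \(A(ba)=3\) are all assumptions made here.

```python
from itertools import product

def add(s, t, sign=1):
    """Return s + sign * t for series given as dicts word -> coefficient."""
    out = dict(s)
    for w, c in t.items():
        out[w] = out.get(w, 0) + sign * c
    return {w: c for w, c in out.items() if c != 0}

def mul(s, t, n):
    """Cauchy product of s and t, truncated to words of length <= n."""
    out = {}
    for u, a in s.items():
        for v, b in t.items():
            if len(u) + len(v) <= n:
                out[u + v] = out.get(u + v, 0) + a * b
    return {w: c for w, c in out.items() if c != 0}

def star(s, n):
    """Truncation of s^* = 1 + s + s^2 + ...; requires zero constant term."""
    assert "" not in s
    result, power = {"": 1}, {"": 1}
    for _ in range(n):
        power = mul(power, s, n)
        result = add(result, power)
    return result

def code_factorization(w):
    """Factor w over the prefix code {a, ba}; None if w is not in L^*."""
    parts = []
    while w:
        if w.startswith("a"):
            parts.append("a")
            w = w[1:]
        elif w.startswith("ba"):
            parts.append("ba")
            w = w[2:]
        else:
            return None
    return parts

N = 6                              # truncation length
ONE = {"": 1}
IND_L = {"a": 1, "ba": 1}          # characteristic series of L
A = {"a": 2, "ba": 3}              # series with support in L
IND_LSTAR = star(IND_L, N)         # characteristic series of L^*, as L is a code

D = add(star(add(IND_L, A), N), IND_LSTAR, sign=-1)
C = mul(add(ONE, mul(IND_LSTAR, A, N), sign=-1), D, N)

# Expected values A(w_1) + ... + A(w_l) on nonempty words of L^*.
EXPECTED = {}
for length in range(1, N + 1):
    for letters in product("ab", repeat=length):
        word = "".join(letters)
        parts = code_factorization(word)
        if parts is not None:
            EXPECTED[word] = sum(A[p] for p in parts)
```

For instance, \(aba = a \cdot ba\) gives \(C(aba) = A(a) + A(ba) = 5\), and the truncated series `C` agrees with `EXPECTED` on all words of length at most 6.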

A series \(a \in \mathbb {Z}\langle \langle X\rangle \rangle \) is linearly bounded if there exists \(C \ge 0\) such that \({|}a(w){|} \le C {|}w{|}\) for all nonempty words \(w \in X^*\).

Proposition 7.2

Let \(S \in R \langle \langle X \rangle \rangle \) be an unambiguous rational series. Then there exist \(\lambda _1\), \(\ldots \), \(\lambda _k \in R \setminus \{0\}\), linearly bounded rational series \(a_1\), \(\ldots \), \(a_k \in \mathbb {Z}\langle \langle X\rangle \rangle \), and a rational language \(\mathcal {L}\) such that \({{\,\mathrm{supp}\,}}(a_i) \subseteq \mathcal {L}\) for all \(i \in [1,k]\) and

$$\begin{aligned} S(w) = {\left\{ \begin{array}{ll} \lambda _1^{a_1(w)} \cdots \lambda _k^{a_k(w)} &{} \text {if } w \in \mathcal {L},\\ 0 &{}\text {if } w \not \in \mathcal {L}. \end{array}\right. } \end{aligned}$$

Proof

The claim is trivially true if S is a polynomial. We show that the property is preserved under unambiguous \(+\), \(\cdot \), and \({}^*\) constructions.

Let S, T be rational series such that there exist \(\lambda _1\), \(\ldots \), \(\lambda _k \in R \setminus \{0\}\), linearly bounded rational series \(a_1\), \(\ldots \), \(a_k\), \(b_1\), \(\ldots \), \(b_k \in \mathbb {Z}\langle \langle X \rangle \rangle \), and rational languages \(\mathcal {L}\), \(\mathcal {K}\) such that \({{\,\mathrm{supp}\,}}(a_i) \subseteq \mathcal {L}\), \({{\,\mathrm{supp}\,}}(b_i) \subseteq \mathcal {K}\), and

$$\begin{aligned} S(w) = {\left\{ \begin{array}{ll} \lambda _1^{a_1(w)} \cdots \lambda _k^{a_k(w)} &{} \text {if } w \in \mathcal {L},\\ 0 &{}\text {if } w \not \in \mathcal {L}; \end{array}\right. } \\ T(w) = {\left\{ \begin{array}{ll} \lambda _1^{b_1(w)} \cdots \lambda _k^{b_k(w)} &{} \text {if } w \in \mathcal {K},\\ 0 &{}\text {if } w \not \in \mathcal {K}. \end{array}\right. } \end{aligned}$$

(We can assume that the \(\lambda _i\)’s are the same, as we can always extend the set of constants, and set \(a_i = 0\), respectively, \(b_i = 0\), if \(\lambda _i\) does not appear in the expression for S, respectively, T.)

We first consider \(S + T\) with \(\mathcal {L}\cap \mathcal {K}= \emptyset \). Then \(\mathcal {L}\cup \mathcal {K}\) is a rational language, \((S+T)(w)=0\) if \(w \not \in \mathcal {L}\cup \mathcal {K}\), and

$$\begin{aligned} (S+T)(w) = S(w) + T(w) = {\left\{ \begin{array}{ll} S(w) = \lambda _1^{a_1(w)} \cdots \lambda _k^{a_k(w)} &{}\text {if } w \in \mathcal {L},\\ T(w) = \lambda _1^{b_1(w)} \cdots \lambda _k^{b_k(w)} &{}\text {if } w \in \mathcal {K}. \end{array}\right. } \end{aligned}$$

Since \(\mathcal {L}\cap \mathcal {K}= \emptyset \), we get \((S+T)(w) = \lambda _1^{a_1(w)+b_1(w)} \cdots \lambda _k^{a_k(w)+b_k(w)}\) for all \(w \in \mathcal {L}\cup \mathcal {K}\). Clearly \(a_i + b_i\) is a linearly bounded rational series, and \({{\,\mathrm{supp}\,}}(a_i+b_i) \subseteq \mathcal {L}\cup \mathcal {K}\).

Now consider ST with \(\mathcal {L}\mathcal {K}\) unambiguous. Then \(\mathcal {L}\mathcal {K}\) is a rational language, and for \(w=uv\) with \(u \in \mathcal {L}\), \(v \in \mathcal {K}\),

$$\begin{aligned} (ST)(w) = S(u)T(v) = \lambda _1^{a_1(u)+b_1(v)} \cdots \lambda _k^{a_k(u) + b_k(v)}. \end{aligned}$$

Define series \(c_i\) by \(c_i(w) = a_i(u) + b_i(v)\) if \(w=uv \in \mathcal {L}\mathcal {K}\) with \(u \in \mathcal {L}\), \(v \in \mathcal {K}\), and \(c_i(w) =0\) for \(w \not \in \mathcal {L}\mathcal {K}\). Clearly \(c_i\) is linearly bounded. Since \(\mathcal {L}\mathcal {K}\) is unambiguous, Lemma 7.1 implies that \(c_i\) is rational.

Now suppose that \(\mathcal {L}= {{\,\mathrm{supp}\,}}(S)\) is a code and consider \(S^*\). Then \(\mathcal {L}^*\) is a rational language. For \(w \in \mathcal {L}^*\) there exist uniquely determined \(w_1\), \(\ldots \), \(w_l \in \mathcal {L}\) with \(w=w_1\cdots w_l\). We have

$$\begin{aligned} S^*(w) = S^*(w_1\cdots w_l) = S(w_1)\cdots S(w_l) = \lambda _1^{a_1(w_1) + \cdots + a_1(w_l)} \cdots \lambda _k^{a_k(w_1) + \cdots + a_k(w_l)}. \end{aligned}$$

Define

$$\begin{aligned} c_i(w_1 \cdots w_l) = a_i(w_1) + \cdots + a_i(w_l), \end{aligned}$$

and \(c_i(w) = 0\) if \(w \not \in \mathcal {L}^*\). Then \(c_i\) is linearly bounded and, by Lemma 7.1, again rational. \(\square \)

8 Hadamard sub-invertibility

It is known that every unambiguous rational series is Hadamard sub-invertible, and every Hadamard sub-invertible rational series is a Pólya series [6, Exercise 3.1 of Chapter 6]. This is particularly easy for \(K=\mathbb {Q}\). For arbitrary fields, the same argument works but requires a theorem of Roquette; hence we give the proof in full.

Lemma 8.1

Every unambiguous rational series is Hadamard sub-invertible.

Proof

Every noncommutative polynomial is Hadamard sub-invertible. If S, T are Hadamard sub-invertible, then it is easy to see that unambiguous sums, products, and star operations preserve this property. \(\square \)

Lemma 8.2

Every Hadamard sub-invertible series is a Pólya series.

Proof

Let \(S \in R\langle \langle X \rangle \rangle \subseteq K\langle \langle X \rangle \rangle \) be a Hadamard sub-invertible rational series. If \({{\,\mathrm{char}\,}}K > 0\), let k be the (finite) prime field of K; if \({{\,\mathrm{char}\,}}K = 0\), let \(k = \mathbb {Z}\). It is immediate from the definition of a rational series that there exists a finitely generated k-subalgebra A of K containing all coefficients of S. Since \(\sum _{w \in {{\,\mathrm{supp}\,}}(S)} S(w)^{-1} w\) is also rational, we may moreover assume that A also contains all \(S(w)^{-1}\) with \(S(w) \ne 0\). Hence, the nonzero coefficients of S are contained in \(A^\times \). The group \(A^\times \) is finitely generated by a theorem of Roquette [13, Corollary 7.5] in characteristic 0, and a slightly easier argument in positive characteristic [13, Corollary 7.3]. \(\square \)

9 Putting it all together

The proofs of Theorems 1.1 and 1.2 in the case where \(R=K\) is a field are now a formality. To obtain the more general result for domains, we first need to extend Proposition 5.3 to completely integrally closed domains.

Proposition 9.1

Let R be a completely integrally closed domain. Every rational Pólya series in \(R\langle \langle X \rangle \rangle \) is unambiguous rational (over R).

Proof

Let K be the quotient field of R. If \(S \in R\langle \langle X \rangle \rangle \) is a rational Pólya series, Proposition 5.3 together with Proposition 6.1 shows that S is unambiguous rational as a series over K. What remains to be shown is that we can obtain an unambiguous rational decomposition of S in such a way that all the component series have their coefficients in R.

For a domain A, let \(U_0(A) = A\langle X \rangle \). For \(k \ge 1\), inductively define \(U_k(A)\) as the set of all series obtained as unambiguous sums (that is, having pairwise disjoint support) of series of the form

$$\begin{aligned} S = \lambda w_0 S_1^* w_1 S_2^* w_2 \cdots S_l^* w_l, \end{aligned}$$
(7)

with \(l \ge 0\), with \(0 \ne \lambda \in A\), with \(w_0\), \(\ldots \), \(w_l \in X^*\), with \(S_1\), \(\ldots \), \(S_l \in \bigcup _{k'=0}^{k-1} U_{k'}(A) \setminus \{0\}\) and satisfying the following conditions:

  1. (1)

    The operations \(S_i^*\) are unambiguous, that is \({{\,\mathrm{supp}\,}}(S_i)\) is a code, for all \(i \in [1,l]\), and

  2. (2)

    the products in (7) are unambiguous, that is, for every word \(w \in {{\,\mathrm{supp}\,}}(S)\) and every \(i \in [1,l]\), there exists a unique word \(w_i' \in {{\,\mathrm{supp}\,}}(S_i^*)\) such that \(w=w_0w_1'w_1 \cdots w_l'w_l\).

Let \(\mathcal {U}(A)\) denote the set of all unambiguous sums of elements of \(\bigcup _{k\ge 0} U_k(A)\). By construction, each series in \(\mathcal {U}(A)\) is unambiguous rational. Note that \(A\langle X \rangle \subseteq \mathcal {U}(A)\) and that \(\mathcal {U}(A)\) is closed under unambiguous sums and the unambiguous star operation. Moreover, by distributivity we see that \(\mathcal {U}(A)\) is closed under unambiguous products. Thus \(\mathcal {U}(A)\) is the set of all unambiguous rational series over A.

To conclude the proof of the proposition, we show \(U_k(K) \cap R\langle \langle X \rangle \rangle = U_k(R)\) by induction on k. For \(k=0\) the claim is trivial because \(K\langle X \rangle \cap R \langle \langle X \rangle \rangle = R\langle X \rangle \). Suppose now \(k \ge 1\) and the claim has been established for \(k' < k\). Let \(S \in U_k(K) \cap R \langle \langle X \rangle \rangle \). Decomposing along unambiguous sums, it suffices to consider S as in (7). Let \(i \in [1,l]\) and let \(a_i\) be a nonzero coefficient of \(S_i\). For each \(j \in [1,l] \setminus \{i\}\) pick an arbitrary nonzero coefficient \(a_j\) of \(S_j\). Then \(\lambda a_1 \cdots a_{i-1} a_i^m a_{i+1} \cdots a_l\) is a coefficient of S for every \(m \ge 0\), and hence contained in R. Since R is completely integrally closed, we conclude \(a_i \in R\). Thus \(S_i \in U_{k'}(K) \cap R\langle \langle X \rangle \rangle = U_{k'}(R)\) for some \(k' < k\). Since \(\lambda \) appears as coefficient of \(w_0w_1\cdots w_l\) (taking the empty word in each \(S_j^*\)), we also must have \(\lambda \in R\). Thus \(S \in U_{k}(R)\). \(\square \)

Proof of Theorem 1.2

The equivalence (b)\(\,\Leftrightarrow \,\)(c) is shown in Sect. 6. For \(R=K\) a field, the implication (a)\(\,\Rightarrow \,\)(b) follows by Proposition 5.3. More generally, for R a completely integrally closed domain, the implication (a)\(\,\Rightarrow \,\)(c) is shown in Proposition 9.1. Next, the implication (c)\(\,\Rightarrow \,\)(d) follows from Proposition 7.2. The implication (d)\(\,\Rightarrow \,\)(a) is trivial.

Finally, the implication (c)\(\,\Rightarrow \,\)(e) follows from Lemma 8.1, and (e) \(\,\Rightarrow \,\)(a) holds by Lemma 8.2. \(\square \)

Remark 9.2

Every completely integrally closed domain is integrally closed, and a noetherian domain is completely integrally closed if and only if it is integrally closed. Krull domains, and thus in particular factorial domains such as \(\mathbb {Z}\), are completely integrally closed. The ring of all algebraic integers is a non-noetherian, completely integrally closed domain.

To illustrate that some condition needs to be imposed on the domain R, the following example gives a Pólya series over an integrally closed, but not completely integrally closed, domain R that is not unambiguous rational over R.

Example 9.3

Let \(R = \mathbb {Z}[y^iz : i \ge 1] \subseteq \mathbb {Z}[y,z]\) and \(S = \sum _{i \ge 1} y^izx^i \in R\llbracket x \rrbracket \). Then S is a Pólya series, and indeed, over \(\mathbb {Z}[y,z]\) it is unambiguous rational as \(S=zyx (yx)^*\).

However, suppose S were unambiguous rational over R. Then Proposition 5.6 applies. In particular, there exist \(d > 0\), \(r \ge 0\), and f, \(g \in R\) such that \(y^{dk+r}z=fg^k\) for every \(k \ge 0\). However, in R the element \(y^iz\) is an irreducible element for each \(i \ge 1\), so we must have \(g = \pm 1\) and \(y^{dk+r}z = \pm f\) for all \(k \ge 0\), a contradiction.

The ring R is not completely integrally closed, because \((yz) y^i \in R\) for all \(i \ge 0\), but \(y \not \in R\). However it is integrally closed: Note that \(R \subseteq \mathbb {Z}[yz,y]\) and the latter ring is factorial, hence integrally closed, so that the integral closure of R must be contained in \(\mathbb {Z}[yz,y]\). Let \(a \in K\) be integral over R. Then \(a \in \mathbb {Z}[yz,y]\) and hence \(a=a' + a''\) with \(a' \in \mathbb {Z}[y]\) and \(a'' \in R\). Then \(a'\) is integral over R. Hence there exist \(m \ge 1\) and \(b_0\), \(\ldots \), \(b_{m-1} \in R\) such that

$$\begin{aligned} (a')^m + b_{m-1} (a')^{m-1} + \cdots + b_0 = 0. \end{aligned}$$

Taking this equation modulo z, we see that \(a' \in \mathbb {Z}[y]\) is integral over \(\mathbb {Z}\), forcing \(a' \in \mathbb {Z}\). Hence \(a \in R\).

The previous example is no coincidence; more generally the following holds.

Lemma 9.4

Suppose R is integrally closed but not completely integrally closed. Then there exists a rational Pólya series \(S \in R\langle \langle X \rangle \rangle \) such that S is not unambiguous rational.

Proof

Since R is not completely integrally closed, there exist \(0 \ne a \in R\) and \(b \in K \setminus R\) such that \(ab^i \in R\) for all \(i \ge 0\). Let \(S = \sum _{i \ge 0} ab^ix^i\). We show that S is not unambiguous rational. Suppose to the contrary that it is. Then it is recognized by an unambiguous weighted automaton with weights in R, and Proposition 5.6 shows that there exist \(m > 0\), \(n \ge 0\) and c, \(d \in R \setminus \{0\}\) such that \(ab^{mk+n} = cd^k\) for all \(k \ge 0\). Then \((b^md^{-1})^k=ca^{-1}b^{-n}\) is constant for all \(k \ge 0\). Substituting \(k=0\) and \(k=1\), we see \(b^md^{-1}=1\). Thus b is a root of the polynomial \(t^m - d \in R[t]\), hence integral over R. Since R is integrally closed, this gives \(b \in R\), a contradiction. \(\square \)

Finally, we deduce the classical theorem for the univariate case as a special case of our result.

Proof of Theorem 1.1

The series S is rational by [6, Proposition 6.1.1]. By Theorem 1.2, there exists an unambiguous weighted automaton \(\mathcal {A}\) on the alphabet \(X=\{x\}\) recognizing S. The claim now follows from Proposition 5.6. \(\square \)

10 Determinizability

We finish by showing (a)\(\,\Rightarrow \,\)(b) of Theorem 1.3. The approach in this section is inspired by [16, Theorem 9]. There, Mohri shows that a deterministic weighted automaton over the tropical semiring \((\mathbb {R},\max ,+)\) has bounded variation, and that an unambiguous weighted automaton with bounded variation is determinizable.

Definition 10.1

Let \((G,\cdot )\) be a group. A map \(\ell :G \rightarrow \mathbb {R}_{\ge 0}\) is a length function if

  (1) \(\ell (1_G) = 0\).

  (2) \(\ell (gh) \le \ell (g) + \ell (h)\) for all \(g, h \in G\).

  (3) \(\ell (g) = \ell (g^{-1})\) for all \(g \in G\).

If K has an absolute value \(|\cdot |\), and \(G \le K^\times \), then \(\ell (g) = {|}\log ({|}g{|}){|}\) defines a length function. On \((\mathbb {Z}^r,+)\) we have a length function \((a_1,\ldots ,a_r) \mapsto {|}a_1{|} + \cdots + {|}a_r{|}\). This induces a length function \(\ell \) on any finitely generated abelian group G, by composing with the quotient map \(G \rightarrow G/G_{\text {tor}} \cong \mathbb {Z}^r\). Since \(G_{\text {tor}}\) is finite, this length function satisfies

$$\begin{aligned} {|}\{\, g \in G : \ell (g) \le C \,\}{|} < \infty \quad \text {for all } C \ge 0. \qquad (*) \end{aligned}$$
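The length function on \((\mathbb {Z}^r,+)\) and the finiteness property (*) can be checked by hand on small cases. The Python sketch below does this for \(r=2\), writing the group additively; the helper names `l1_length`, `add`, `neg` and `ball` are hypothetical, and the check only samples finitely many elements, so it illustrates the axioms rather than proving them.

```python
from itertools import product

def l1_length(g):
    """Length of g in (Z^r, +): the sum of the absolute values of its coordinates."""
    return sum(abs(a) for a in g)

def add(g, h):
    """Group operation in Z^r."""
    return tuple(a + b for a, b in zip(g, h))

def neg(g):
    """Inverse in Z^r."""
    return tuple(-a for a in g)

def ball(r, C):
    """All g in Z^r with l1_length(g) <= C, for an integer C >= 0."""
    rng = range(-C, C + 1)
    return [g for g in product(rng, repeat=r) if l1_length(g) <= C]

# Spot-check the three axioms of Definition 10.1 on a finite sample of Z^2.
sample = ball(2, 3)
axioms_hold = (
    l1_length((0, 0)) == 0
    and all(l1_length(add(g, h)) <= l1_length(g) + l1_length(h)
            for g in sample for h in sample)
    and all(l1_length(g) == l1_length(neg(g)) for g in sample)
)
```

Every coordinate of an element of length at most C lies in \([-C,C]\), which is why `ball` only needs to search a finite box; this is the reason (*) holds for \(\mathbb {Z}^r\).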

There is a metric \(\texttt {d}:X^* \times X^* \rightarrow \mathbb {Z}_{\ge 0}\), given by

$$\begin{aligned} \texttt {d}(u,v) = {|}u{|} + {|}v{|} - 2 {|}{\text {lgcd}}(u,v){|}, \end{aligned}$$

where \({\text {lgcd}}(u,v)\) is the longest common prefix of u and v.
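As a small illustration of this metric, the following Python sketch computes \(\texttt {d}(u,v)\) for words represented as strings. The function names `lcp_length` and `prefix_distance` are hypothetical and introduced only for this example.

```python
def lcp_length(u, v):
    """Length of the longest common prefix of the words u and v."""
    n = 0
    for a, b in zip(u, v):
        if a != b:
            break
        n += 1
    return n

def prefix_distance(u, v):
    """d(u, v) = |u| + |v| - 2 * |lgcd(u, v)|."""
    return len(u) + len(v) - 2 * lcp_length(u, v)
```

For instance, `prefix_distance("abc", "abd")` removes one letter and appends one letter, giving distance 2, while a word and its extension by a single letter are at distance 1.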

Definition 10.2

Let G be a group and \(\ell :G \rightarrow \mathbb {R}_{\ge 0}\) a length function. A function \(f:X^* \rightarrow G_0\) has bounded \(\ell \)-variation if, for every \(c \ge 0\), there exists \(C \ge 0\) such that for all \(u, v \in X^*\) with \(f(u) \ne 0\) and \(f(v) \ne 0\),

$$\begin{aligned} \texttt {d}(u,v) \le c \quad \text {implies}\quad \ell (f(u)f(v)^{-1}) \le C. \end{aligned}$$

Lemma 10.3

Let \(\mathcal {A}\) be a deterministic weighted automaton, and let S be the series recognized by \(\mathcal {A}\). Let \(G \le K^\times \) be such that

  • \(S(w) \in G_0\) for all \(w \in X^*\);

  • all edge and terminal weights of \(\mathcal {A}\) are contained in \(G_0\).

If \(\ell :G \rightarrow \mathbb {R}_{\ge 0}\) is a length function, then S has bounded \(\ell \)-variation.

Proof

Define

$$\begin{aligned} C =\max \big \{\, \ell (E(p,a,q)),\, \ell (T(q)) : p,q \in Q, a \in X \text { with } T(q) \ne 0, E(p,a,q) \ne 0 \,\big \}. \end{aligned}$$

Let \(w, w' \in X^*\). Assume \(S(w)\ne 0\) and \(S(w') \ne 0\), as otherwise there is nothing to show.

We may suppose

$$\begin{aligned} w = u_1 \cdots u_k v_{1} \cdots v_l \quad \text {and}\quad w' = u_1\cdots u_k v_1'\cdots v_m', \end{aligned}$$

with \(u_i, v_i, v_i' \in X\) and \(\texttt {d}(w,w') = l+m\). Let c and \(c'\) denote the accepting paths labeled by w and \(w'\). Since \(\mathcal {A}\) is deterministic, we must have

$$\begin{aligned} c&= (p_0,u_1,p_1) \cdots (p_{k-1},u_k,p_k) (p_k,v_1,q_1) \cdots (q_{l-1},v_l,q_l), \quad \text { and} \\ c'&= (p_0,u_1,p_1) \cdots (p_{k-1},u_k,p_k) (p_k,v_1',q_1') \cdots (q_{m-1}',v_m',q_m'), \end{aligned}$$

with states \(p_i\), \(q_i\), \(q_i'\). For notational convenience, set \(p_k=q_0=q_0'\).

Now

$$\begin{aligned} \begin{aligned} \ell \bigg (\frac{S(w)}{S(w')}\bigg )&= \ell \bigg (\frac{I(p_0)E(c)T(q_l)}{I(p_0)E(c')T(q_m')}\bigg ) \\&\le \sum _{i=1}^l \ell (E(q_{i-1},v_i,q_i)) + \sum _{i=1}^m \ell (E(q_{i-1}',v_i',q_i')) + \ell (T(q_l)) + \ell (T(q_m'))\\&\le (2+m+l)C = (\texttt {d}(w,w') + 2)C. \end{aligned} \end{aligned}$$

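If \(w = w'\), then \(\ell (S(w)S(w')^{-1}) = \ell (1) = 0\). If \(w \ne w'\), then \(\texttt {d}(w,w') \ge 1\), so \((\texttt {d}(w,w')+2)C \le 3C\,\texttt {d}(w,w')\). Choosing \(C'=3C\), we have \(\ell ( S(w)S(w')^{-1} ) \le C'\texttt {d}(w,w')\) in all cases, and hence S has bounded \(\ell \)-variation. \(\square \)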
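The estimate \(\ell (S(w)S(w')^{-1}) \le (\texttt {d}(w,w')+2)C\) from the proof can be tested numerically on a toy example. The Python sketch below uses a hypothetical deterministic automaton on \(X=\{a,b\}\) with weights in the group generated by 2 and 3; a weight \(2^i3^j\) is stored as the exponent vector \((i,j)\), so that multiplication becomes vector addition and \(\ell \) is the sum of the absolute values of the exponents. The automaton, its weights and all function names are assumptions made for illustration only.

```python
from itertools import product

def l1(g):
    """Length function on exponent vectors: sum of absolute values."""
    return sum(abs(a) for a in g)

def mul(g, h):
    """Product in the group generated by 2 and 3, via exponent vectors."""
    return tuple(a + b for a, b in zip(g, h))

def div(g, h):
    """Quotient g * h^{-1}, via exponent vectors."""
    return tuple(a - b for a, b in zip(g, h))

# Hypothetical deterministic automaton (illustration only).
# A weight (i, j) stands for 2**i * 3**j; missing entries have weight 0.
INITIAL = "p"  # unique initial state, initial weight 1
EDGES = {
    ("p", "a"): ("p", (1, 0)),   # p --a--> p with weight 2
    ("p", "b"): ("q", (0, 1)),   # p --b--> q with weight 3
    ("q", "a"): ("p", (-1, 2)),  # q --a--> p with weight 9/2
}
TERMINAL = {"p": (0, 0), "q": (2, -1)}  # T(p) = 1, T(q) = 4/3

def coefficient(word):
    """Return S(word) as an exponent vector, or None if S(word) = 0."""
    state, weight = INITIAL, (0, 0)
    for letter in word:
        if (state, letter) not in EDGES:
            return None
        state, w = EDGES[(state, letter)]
        weight = mul(weight, w)
    if state not in TERMINAL:
        return None
    return mul(weight, TERMINAL[state])

def prefix_distance(u, v):
    """d(u, v) = |u| + |v| - 2 * (length of the longest common prefix)."""
    n = 0
    for x, y in zip(u, v):
        if x != y:
            break
        n += 1
    return len(u) + len(v) - 2 * n

# The constant C from the proof: maximal length of a nonzero weight.
C = max([l1(w) for _, w in EDGES.values()] + [l1(t) for t in TERMINAL.values()])

# Check the estimate on all words of length at most 5 in the support of S.
words = ["".join(t) for n in range(6) for t in product("ab", repeat=n)]
support = [w for w in words if coefficient(w) is not None]
bound_holds = all(
    l1(div(coefficient(u), coefficient(v))) <= (prefix_distance(u, v) + 2) * C
    for u in support for v in support
)
```

Because the automaton is deterministic, the weights along the common prefix of two words cancel in the quotient, which is exactly the cancellation used in the displayed estimate.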

Lemma 10.4

Let S be a Pólya series with coefficients in \(G_0\) for some \(G \le K^\times \). Let \(\ell :G \rightarrow \mathbb {R}_{\ge 0}\) be a length function satisfying (*). Suppose that S has bounded \(\ell \)-variation. If \((u,\mu ,v)\) is a minimal linear representation of S, its linear hull has dimension at most 1.

Proof

Let \(w_1, \ldots , w_n \in X^*\) be such that the vectors \(b_i=\mu (w_i)v\) form a basis of \(K^{n \times 1}\). Let B be the matrix whose i-th column is \(b_i\). Let \(\Omega = u \mu (X^*) \subseteq K^{1\times n}\). We have to show that \(\Omega \) can be covered by finitely many 1-dimensional vector spaces. Since B is invertible, it suffices to show the same for \(\Omega B = \{\, u \mu (w)B : w \in X^* \,\}\).

Since \(\texttt {d}(ww_i,ww_j) \le {|}w_i{|} + {|}w_j{|}\) is bounded independently of w and S has bounded \(\ell \)-variation, there exists \(C \ge 0\) such that for every \(w \in X^*\) and \(i, j \in [1,n]\)

$$\begin{aligned} \Big |\ell \big ( S(ww_i) S(ww_j)^{-1} \big )\Big |\le C, \end{aligned}$$

whenever \(S(ww_i) \ne 0\) and \(S(ww_j) \ne 0\). By (*), this implies that the ratio \(S(ww_i)S(ww_j)^{-1}\) can only take finitely many values (when \(S(ww_j) \ne 0\)). Noting that \(S(ww_i) = u \mu (w) b_i\) is the i-th coordinate of \(u\mu (w)B\), we conclude that we can cover \(\Omega B\) by finitely many 1-dimensional vector spaces. \(\square \)

Proof of Theorem 1.3

(a)\(\,\Rightarrow \,\)(b) Let \(\mathcal {A}\) be a deterministic weighted automaton that recognizes S. There exists a finitely generated \(G \le K^\times \) such that \(S(w) \in G_0\) for all \(w \in X^*\). This follows from Theorem 1.2, but can also be easily seen directly: Since \(\mathcal {A}\) is deterministic, and in particular unambiguous, each coefficient is a product of some weights of the automaton. By enlarging G if necessary, we can further assume that all edge and terminal weights are contained in \(G_0\).

Let \(\ell :G \rightarrow \mathbb {R}_{\ge 0}\) be a length function on G satisfying (*). By Lemma 10.3, the series S has bounded \(\ell \)-variation. Lemma 10.4 implies the claim.

(b)\(\,\Rightarrow \,\)(a) By Proposition 3.14. \(\square \)