Pushdown automata and constant height: decidability and bounds

It cannot be decided whether a pushdown automaton accepts using a pushdown height that does not depend on the input length, i.e., whether it accepts in constant height. Furthermore, when a pushdown automaton accepts in constant height, the height can be arbitrarily large with respect to the size of the description of the machine: there is no recursive function in the size of the description of the machine that bounds the height of the pushdown. In contrast, in the restricted case of pushdown automata over a one-letter input alphabet, i.e., unary pushdown automata, the situation is different. First, acceptance in constant height is decidable. Moreover, in the case of acceptance in constant height, the height is at most exponential with respect to the size of the description of the pushdown automaton. We also prove a matching lower bound. Finally, if a unary pushdown automaton uses nonconstant height to accept, then the height must grow at least as the logarithm of the input length. This bound is optimal.


Introduction
The investigation of computational devices working with a limited amount of resources is a classical topic in automata theory. It is well known that by limiting the memory size of a device by some constant, the computational power of the resulting model cannot exceed that of finite automata. For instance, if we consider pushdown automata in which the maximum height of the pushdown is limited by some constant, the resulting devices, called constant-height pushdown automata, can recognize regular languages only. Despite their limited computational power, constant-height pushdown automata are interesting since they allow more succinct representations of regular languages than finite automata [8]. Further properties of these devices have been recently considered. A double-exponential size increase when converting a nondeterministic constant-height pushdown automaton into an equivalent deterministic one, which cannot be avoided in the worst case, has been proven in [4]. A double-exponential size gap also holds for the conversion of deterministic and nondeterministic constant-height pushdown automata with a two-way input head into equivalent one-way devices [3]. Tight bounds for the size costs of Boolean operations on constant-height pushdown automata have been stated in [5].
A natural generative counterpart of constant-height pushdown automata is given by nonselfembedding context-free grammars, roughly context-free grammars without "true" recursion [7], which have recently been shown to be polynomially related in size to constant-height pushdown automata [10].
In this paper, we focus on pushdown automata with an unrestricted pushdown store, namely classical pushdown automata, that, however, are able to accept their inputs by making use only of a constant amount of pushdown store. More precisely, we say that a pushdown automaton M accepts in constant height h, for some given integer h ≥ 0, if, for each word in the language accepted by M, there exists at least one accepting computation in which the maximum height reached by the store is bounded by h. Notice that this does not prevent the existence of accepting or rejecting computations using an unbounded pushdown height. However, M can be converted into an equivalent constant-height pushdown automaton, which stops and rejects each time a computation tries to exceed the height limit h, and has a description whose size is a polynomial in both h and the size of the description of M.
While studying these size relationships, we tried to understand how large h can be with respect to the size of the description of M. We discovered that h can be arbitrarily large. Indeed, in the first part of the paper we prove that there is no recursive function bounding the maximal height reached by the pushdown store in a pushdown automaton accepting in constant height, with respect to the size of its description.
We also prove that it cannot be decided if a pushdown automaton accepts in constant height. We point out that this problem is different from the classical problem of deciding if a given context-free language is regular, which was proven to be undecidable a long time ago [2]. In fact, there exist pushdown automata that recognize regular languages using nonconstant height (an example is presented in the paper). Hence, while acceptance in constant height is sufficient for the regularity of the accepted language, it is not necessary.
In the second part of the paper, we restrict the attention to the case of pushdown automata with a one-letter input alphabet, namely unary pushdown automata. By studying the structure of the computations of these devices, we are able to prove that, in contrast to the general case, it can be decided whether or not they accept in constant height. Furthermore, we also prove that if a unary pushdown automaton M accepts in height h, constant with respect to the input length, then h is bounded by an exponential function in the size of M. By presenting a suitable family of pushdown automata, we show that this bound cannot be reduced.
In the final part of the paper, we consider pushdown automata that accept using height which is not constant in the input length. Our aim is to investigate how the pushdown height grows. In particular, we want to know if there exists a minimum growth of the pushdown height, with respect to the length of the input, when it is not constant. The answer to this question is already known, and it derives from results on Turing machines: the height of the store should grow at least as a double logarithmic function [1]. This lower bound cannot be increased, because a matching upper bound has been recently obtained in [6], where a witness language defined over an alphabet of 6 letters is presented. Using standard arguments, such a language can be encoded on a binary alphabet, without changing the use of the pushdown store. Hence, in the case of an input alphabet with at least two letters, there are languages accepted by using a pushdown height which is double logarithmic with respect to the input length. When we restrict to a unary alphabet, the situation is different. In fact, as a consequence of the constructions presented in the second part of the paper, we are able to prove that if a unary pushdown automaton accepts using height that is not constant with respect to the input length, then the height should grow at least as a logarithmic function. We also show that this logarithmic lower bound cannot be further increased, by presenting a unary pushdown automaton accepting every word using logarithmic pushdown height.

Preliminaries
We assume the reader to be familiar with the standard notions from formal language and automata theory, including the concepts of configurations and computations of recognizing devices, as presented in classical textbooks, e.g., [12]. As usual, the cardinality of a set S is denoted by #S, the length of a string x is denoted by |x|, the empty string is denoted by ε.
We first recall the notion of pushdown automata and present the form for these devices that will be used in the paper. A pushdown automaton (pda, for short) is a tuple M = ⟨Q, Σ, Γ, δ, q_I, Z_0, q_F⟩, where Q is the finite set of states, Σ is the input alphabet, Γ is the pushdown alphabet, q_I ∈ Q is the initial state, Z_0 ∈ Γ is the start symbol, and q_F ∈ Q is the final state. We shall specify the transition function δ below, according to Items 3 and 4. Without loss of generality, we make the following assumptions about pdas:
1. at the start of the computation the pushdown store contains only the start symbol Z_0, being at height 0, the input head is scanning the first input symbol, and the finite control contains the initial state q_I;
2. the input is accepted if and only if the automaton reaches the final state q_F, the pushdown store contains only Z_0, and all the input has been scanned;
3. when the automaton reads an input symbol, it moves the head to the next symbol, and it does not make any change on the pushdown. Notice that this implies that the contents of the pushdown store can be changed only by ε-moves;
4. every push operation adds exactly one symbol on the pushdown.
The transition function δ of a pda M in this form can be written as: δ : Q × (Σ ∪ {ε}) × Γ → 2^{Q × ({−, pop} ∪ {push(B) | B ∈ Γ})}. In particular, for q, p ∈ Q, A, B ∈ Γ, σ ∈ Σ, (p, −) ∈ δ(q, σ, A) means that the pda M, in the state q, with A at the top of the pushdown, by consuming the input σ, can reach the state p without changing the pushdown contents; (p, pop) ∈ δ(q, ε, A) ((p, push(B)) ∈ δ(q, ε, A), (p, −) ∈ δ(q, ε, A), respectively) means that M, in the state q, with A at the top of the pushdown, without reading any input symbol, can reach the state p by popping the symbol A off the top of the pushdown (by pushing the symbol B onto the top of the pushdown, without changing the pushdown, respectively).
As usual, a configuration of a pda at a given instant represents its instantaneous description. According to [12], it records the internal state of the pda, the portion of the input that has not been scanned yet, and the pushdown contents. Accepting configurations can be described as indicated above, according to Item 2. Notice that in any accepting computation the occurrence of the start symbol Z_0 at the bottom of the pushdown is never removed, otherwise the next move would be undefined, so halting in a nonaccepting configuration. Now we present the main measure we consider in the paper, namely the pushdown height. The height of a pda M in a given configuration is the number of symbols in the pushdown store besides the occurrence of the start symbol Z_0 at the bottom. Hence, in the initial and in the accepting configurations the height is 0. The height of a computation C is the maximum height reached in the configurations occurring in C.
We say that M uses height h(x) on an accepted input x ∈ Σ * if and only if h(x) is the minimum pushdown height necessary to accept such a string, namely there exists a computation accepting x of height h(x), and no computation accepting x of height smaller than h(x). Moreover, if x is rejected, then h(x) = 0. To study pushdown height with respect to the input length, we consider the worst case among all possible inputs of the same length. Hence, for each integer n ≥ 0, we define h(n) = max {h(x) | x ∈ Σ * , |x| = n}. When there is a constant H such that, for each n, h(n) is bounded by H , we say that M accepts in constant height. Each pda accepting in constant height can be easily transformed into an equivalent finite automaton. So the language accepted by it is regular.
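The definitions of h(x) and h(n) can be made concrete with a small search. The following is only a sketch under stated assumptions: the toy pda for {a^n b^n | n ≥ 0}, its state names, the dictionary encoding of δ, and the parameter `cap` (an upper limit on the heights tried) are all hypothetical and introduced for illustration. Since the search stops at `cap`, a word that needs more height than `cap` is reported as if it were rejected.

```python
from collections import deque
from itertools import product

# Hypothetical toy pda in the normal form of this section, for {a^n b^n | n >= 0}.
# Keys are (state, input symbol or "" for an ε-move, top symbol).
# Input-reading moves never touch the pushdown (Item 3); pushes add one symbol (Item 4).
DELTA = {
    ("q0", "a", "Z"): [("pa", "-")],
    ("q0", "a", "A"): [("pa", "-")],
    ("pa", "", "Z"): [("q0", ("push", "A"))],
    ("pa", "", "A"): [("q0", ("push", "A"))],
    ("q0", "", "Z"): [("q1", "-")],
    ("q0", "", "A"): [("q1", "-")],
    ("q1", "b", "A"): [("pb", "-")],
    ("pb", "", "A"): [("q1", "pop")],
    ("q1", "", "Z"): [("qF", "-")],
}
INITIAL, FINAL, START = "q0", "qF", "Z"


def accepts_within(x, bound):
    """True if some accepting computation on x never exceeds height `bound`."""
    start = (INITIAL, 0, (START,))
    seen, queue = {start}, deque([start])
    while queue:
        state, pos, stack = queue.popleft()
        if state == FINAL and pos == len(x) and stack == (START,):
            return True
        if not stack:  # start symbol removed: no move is defined
            continue
        top, successors = stack[-1], []
        if pos < len(x):
            for p, _ in DELTA.get((state, x[pos], top), []):
                successors.append((p, pos + 1, stack))
        for p, op in DELTA.get((state, "", top), []):
            if op == "-":
                successors.append((p, pos, stack))
            elif op == "pop":
                successors.append((p, pos, stack[:-1]))
            elif len(stack) - 1 < bound:  # op == ("push", B); height is len(stack) - 1
                successors.append((p, pos, stack + (op[1],)))
        for conf in successors:
            if conf not in seen:
                seen.add(conf)
                queue.append(conf)
    return False


def height_used(x, cap):
    """h(x): least height of an accepting computation, or 0 if none is found up to cap."""
    for h in range(cap + 1):
        if accepts_within(x, h):
            return h
    return 0


def height_on_length(n, cap, alphabet="ab"):
    """h(n): worst case of h(x) over all inputs of length n."""
    return max(height_used("".join(w), cap) for w in product(alphabet, repeat=n))
```

For this toy machine, h(a^n b^n) = n, so h(n) is not bounded by a constant and the machine does not accept in constant height.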
In the following, by the size of a pda we mean the length of its description. Notice that for each pda in the above-defined form, over a fixed input alphabet Σ, the size is O((#Q)^2 (#Γ)^2), namely a polynomial in the cardinalities of the set of states and of the pushdown alphabet.
If we consider pdas in different forms, such as that given in [12], in which any push operation can replace the top of the pushdown by a string of symbols, to define the size we have to take into account also the number of symbols that can be pushed on the store in one single operation. However, pdas in that form can be turned into the form we consider here with a polynomial increase in size and by preserving the property of being constant height. For a further discussion on this point, we refer the reader to [4].
We now present some technical notions and results that will be useful in order to state our results. Let M = ⟨Q, Σ, Γ, δ, q_I, Z_0, q_F⟩ be a fixed pda.
A surface pair is defined by a state q ∈ Q and a symbol A ∈ Γ, and it is denoted by [qA]. The surface pair in a given configuration is defined by the current state and the topmost pushdown symbol, namely the only part of the store which is relevant in order to decide the next move.
A surface triple is defined by two states q, p ∈ Q and a symbol A ∈ Γ, and it is denoted by [qAp]. Surface triples are used to study parts of computations starting and ending at the same pushdown height and that do not go below that height in between. More precisely, a [qAp]-computation on a string x ∈ Σ* is a computation C which starts from the state q with A on the top of the pushdown at some height h and, after reading x from the input tape, ends in the state p with A on the top of the pushdown at the same height h, without reaching a pushdown height smaller than h in between. We also say that C consumes the string x. Notice that, at the beginning of C, the input head is on the tape cell containing the leftmost symbol of x, while at the end it is one position to the right of the rightmost symbol of x (this includes also the case of inputs whose suffix is x, and the possibility of ε-transitions after reading the rightmost symbol of x). We point out that, during C, the symbol A at height h is never replaced. Hence, C does not depend on h and on the symbols stored in the pushdown below A. The pushdown increment during C is the difference between the pushdown height of C and the pushdown height at the beginning and at the end of C. If a [qAp]-computation C consists of three parts, namely it begins with a prefix X, followed by a proper [qAp]-subcomputation C′ using the same triple [qAp], and ends with a suffix Y, such that the middle part C′ starts and ends with the pushdown higher than at the beginning of C, then the pair (X, Y) is called a vertical loop. Note that, during the execution of X, a nonempty string Aα is saved on the top of the pushdown store above the symbol A which was on the top at the beginning of C, and this string is popped off during the execution of Y.
A context-free grammar is a tuple G = ⟨V, Σ, P, S⟩, where V is the set of variables, Σ is the set of terminals, P is the set of productions of the form A → β, where A ∈ V and β ∈ (V ∪ Σ)*, and S ∈ V is the start symbol. If all productions in P are of the form A → BC or A → a, where A, B, and C are variables and a is a terminal, then G is in Chomsky normal form. Here, we will consider grammars in binary normal form, an extension of Chomsky normal form where also unit productions A → B and ε-productions A → ε are allowed.
It is well known that context-free languages defined over a one-letter alphabet, i.e., unary context-free languages, are regular [9]. The size cost of the conversion of unary context-free grammars and pushdown automata into equivalent nondeterministic and deterministic finite automata (nfas and dfas, respectively) has been investigated in [16]. In the paper, we will use the following small extension of [16, Thms. 4, 6]: for a unary context-free grammar G in binary normal form with v variables, there exist
- an equivalent nfa with at most 2^{2v−1} + 1 states, and
- an equivalent dfa with less than 2^{v^2} states.
Proof By applying a standard construction (see, e.g., [12]), from G we can obtain a grammar G′ in Chomsky normal form, having the same set of variables as G and generating the same language, with the possible exception of the empty word (if generated by G).
Using Theorems 4 and 6 in [16], we can convert G′ into equivalent finite automata satisfying the bounds on the number of the states given in the statement of the lemma. By inspecting the proofs of those results, it can be observed that there are no transitions entering the initial states of the resulting automata. This allows us to safely mark the initial states as accepting, in the case G generates the empty word, in order to make the resulting automata equivalent to the original grammar G.
The following result, related to Diophantine equations, will be used in the paper:
Lemma 3 Let i_0, i_1, …, i_s be integers with 0 < i_j ≤ n, j = 0, …, s, and let z ≥ 0. If the equation i_0 x_0 + i_1 x_1 + ⋯ + i_s x_s = z has a solution in natural numbers, then it also has a solution in natural numbers satisfying i_1 x_1 + ⋯ + i_s x_s ≤ n^2.
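The statement on Diophantine equations can be checked exhaustively on small instances. The sketch below is not part of the paper: the function names and the choice to test only one or two coefficients besides i_0 are assumptions made for illustration, and a successful run is a sanity check on small values, not a proof.

```python
from itertools import combinations


def representable(coeffs, limit):
    """reach[t] is True iff t <= limit is a natural-number combination of coeffs."""
    reach = [True] + [False] * limit
    for t in range(1, limit + 1):
        reach[t] = any(c <= t and reach[t - c] for c in coeffs)
    return reach


def solvable(i0, rest, z):
    """Does i0*x0 + sum(rest[j]*xj) = z have a solution in natural numbers?"""
    return representable([i0] + list(rest), z)[z]


def small_solution_exists(i0, rest, z, n):
    """Is there a solution whose part not involving i0 sums to at most n^2?"""
    reach = representable(rest, min(z, n * n))
    # t is the value contributed by i1..is; the remainder must be a multiple of i0.
    return any(ok and (z - t) % i0 == 0 for t, ok in enumerate(reach))


def check_lemma(n, z_max):
    """Exhaustively test the statement for coefficients in 1..n and z in 0..z_max."""
    for i0 in range(1, n + 1):
        for s in (1, 2):
            for rest in combinations(range(1, n + 1), s):
                for z in range(z_max + 1):
                    if solvable(i0, rest, z) and not small_solution_exists(i0, rest, z, n):
                        return False
    return True
```

For instance, with n = 4, i_0 = 3, i_1 = 4 and z = 40, the solution x_0 = 0, x_1 = 10 violates the bound, but x_0 = 12, x_1 = 1 satisfies it.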

Undecidability and nonrecursive bounds
In this section, we prove that the problem of whether a given pda M accepts in constant height is not decidable. In addition, with respect to the size of M that does accept in constant height, neither the maximal height, which is reached by the pushdown store of M, nor the number of states of the minimal finite automaton equivalent to M can be bounded by any recursive function.
These results are proven by using a technique introduced in [11], based on suitable encodings of single-tape Turing machine computations. Roughly, configurations of such a machine T with state set Q and alphabet Γ are denoted in a standard way as strings from Γ*QΓ*. A computation consisting of m configurations α_1, α_2, …, α_m is encoded as a string of blocks, separated by a delimiter $ ∉ Q ∪ Γ, where the ith block is α_i when i is odd, and α_i^R when i is even (in the following, we use α_i^(R) to denote either α_i^R or α_i according to the parity of the index i).
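Hence, the (encoding of a) valid computation of T on input w is a string C = α_1 $ α_2^R $ ⋯ $ α_m^(R) such that:
1. α_i ∈ Γ*QΓ*, for i = 1, …, m, i.e., each block encodes a configuration of T;
2. α_1 is the initial configuration of T on input w;
3. α_{i+1} is reachable from α_i in one step, for i = 1, …, m − 1;
4. α_m is a halting configuration of T, namely a configuration from which no move is possible.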
A partial valid computation is defined in a similar way, by dropping Condition 4.
As proven in [11], the complement of the set of all valid computations of T is a context-free language.
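The block encoding with alternating reversal can be sketched as follows. This is only an illustration: the function names are hypothetical, configurations are plain strings with single-character state symbols, and whether the encoding ends with a trailing delimiter is an assumption (here it does not). Reversing every second block is what lets a pushdown store, which returns symbols in reverse order, compare a block with the next one.

```python
def encode(configs, delimiter="$"):
    """Join configurations α_1, ..., α_m, reversing the even-indexed blocks."""
    blocks = [c if i % 2 == 1 else c[::-1] for i, c in enumerate(configs, start=1)]
    return delimiter.join(blocks)


def decode(encoding, delimiter="$"):
    """Inverse of encode: undo the reversal of the even-indexed blocks."""
    blocks = encoding.split(delimiter)
    return [b if i % 2 == 1 else b[::-1] for i, b in enumerate(blocks, start=1)]


# Three made-up configurations over tape symbols {a, b, x, y} and states {p, q}.
example = ["qab", "xpb", "xyq"]
encoded = encode(example)  # second block is stored reversed
```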

Theorem 1 It is undecidable whether a pda accepts in constant height.
Proof We give a reduction from the halting problem. Let T be a deterministic Turing machine. With an easy modification, we suppose that arbitrarily long computations use arbitrarily large amounts of tape (to this aim, it is sufficient to modify T by adding to the tape a track where the machine, between any two original consecutive moves, marks a tape cell not yet visited).
By adapting the techniques used in [11] to prove the above-mentioned result, we show that the complement of the language partial(T, w) of partial computations of T on a given input w, denoted (partial(T, w))^c, is accepted by a pda M_{T,w} in the following way.
Given an input string D consisting of r blocks β_i^(R), i = 1, …, r, in order to decide whether D ∈ (partial(T, w))^c, M_{T,w} guesses which one among Conditions 1, 2 and 3 is not satisfied. For the first two conditions, the verification of the guess is done by only using the finite control. For the third condition, M_{T,w} nondeterministically selects one block β_i^(R), 1 ≤ i ≤ r, copies it on the pushdown store and then makes the verification. If i < r, this is done by scanning the (i + 1)th block and by suitably comparing it with the block saved on the pushdown store. If i = r, then the verification fails immediately.
We remind the reader that the pushdown height used to accept any input string x is the minimum height of accepting computations on x. Hence, if D does not satisfy Condition 1 or Condition 2, then it is accepted with pushdown height 0; otherwise, the height is bounded by the length of the first block β_i^(R) for which Condition 3 is not satisfied, i.e., the block corresponding to the largest i such that β_j = α_j for j = 1, …, i, where α_1, α_2, … is the (possibly infinite) sequence of configurations in the computation of T on w.
If T halts on w in m steps, then the maximum height of the pushdown store used to accept strings in (partial(T, w))^c is equal to |α_m|. Otherwise, for each arbitrarily large integer h, we can find an index i > 0 such that |α_i| > h. To accept any string … This allows us to conclude that T halts on input w if and only if M_{T,w} accepts in constant height. Hence, it cannot be decided whether a pda accepts in constant height.
In Theorem 8, we will present a pda which recognizes a regular language but does not accept in constant height. Hence, the problem of deciding whether a pda accepts in constant height is different from the regularity problem for context-free languages, namely the problem of deciding if a given context-free language is regular, which is also undecidable [2].
We point out that in the restricted case of deterministic context-free languages, namely languages accepted by deterministic pushdown automata, the regularity problem is decidable [18]. Even the property in Theorem 1 becomes decidable when we consider deterministic pdas. Indeed, it is already decidable in the case of unambiguous pdas [13].
Each pda M accepting in height h can be converted into an equivalent pda M′ in which the height of each computation is bounded by h. This can be done by attaching a counter either to the pushdown symbols or to the states to keep track, in any configuration, of the current height, in order to stop and reject when a computation tries to exceed the height limit. By encoding the pushdown store of M′ in a finite control, equivalent nfas and dfas with a number of states exponential and double exponential in h, respectively, are easily obtained. In the worst case, these bounds cannot be reduced [8]. We now show that, however, h cannot be bounded by any recursive function in the size of M.
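To see where the exponential and double exponential figures come from, one can count the pairs (state, store contents) of a pda whose height never exceeds h. The sketch below is a naive count under assumptions not stated in the text: every string of at most h pushdown symbols above Z_0 is counted as a possible store content, and the dfa figure is the generic subset-construction bound. It is meant as an order-of-magnitude illustration, not as the construction of [8].

```python
def nfa_state_bound(num_states, num_symbols, h):
    """Number of (state, store) pairs when at most h symbols lie above Z_0."""
    stores = sum(num_symbols ** k for k in range(h + 1))
    return num_states * stores


def dfa_state_bound(num_states, num_symbols, h):
    """Generic subset-construction bound on top of the nfa count."""
    return 2 ** nfa_state_bound(num_states, num_symbols, h)
```

With 3 states, 2 pushdown symbols and h = 4, the first count is 3 · (1 + 2 + 4 + 8 + 16) = 93.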

Theorem 2 For any recursive function f : N → N and for infinitely many integers n there exists a pda of size n accepting in constant height H(n), where H(n) cannot be bounded by f(n).
Proof The argument is derived from [15, Prop. 7]. From among all single-tape deterministic Turing machines having n states and tape alphabet Γ = {1, b} (a finite number of machines), let us take all those that, starting with the empty tape, halt after a finite number of steps (clearly, a finite number of machines again) and, from among them, let BB_n be one (called a busy beaver) that stops with the largest number of 1's, denoted as Σ(n), written down on the tape. (There may exist more than one such machine; in that case, we can take the first one in some fixed enumeration of Turing machines.) It is known that Σ(n) cannot be bounded by any recursive function [17]. Hence, also the maximal length of configurations occurring in such a computation cannot be bounded by any recursive function.
Let C_n be the encoding of the valid computation of BB_n on ε. By adapting the arguments used to prove Theorem 1, we can define a pda M_n which accepts all the strings over (Q_n ∪ Γ ∪ {$})* different from C_n, using height bounded by the length of the longest configuration occurring in C_n. Since n is fixed, M_n accepts in constant height. Moreover, it uses a constant number of states for testing whether one of Conditions 1, 2, and 4, which must hold for the input to be a valid computation, is violated. For Condition 3, namely to check whether two configurations of BB_n are not reachable in one step, the machine has to compare the parts of configurations representing the cells of the tape different from the head position in the first configuration, which can be done using the pushdown store and a constant number of states, and the part (state and symbol) that is modified according to the transition function of BB_n. Each transition can be checked using a constant number of states. Because BB_n has n states and its working alphabet is fixed, it has O(n) transitions. Hence, to check Condition 3, M_n uses O(n) states. Summing up, the number of states of M_n is O(n) and its pushdown alphabet has cardinality O(n). So, according to Sect. 2, M_n has size O(n^4).
Furthermore, by suitably modifying C_n (with the same method we applied in the last part of the proof of Theorem 1 to a prefix of the string encoding the infinite computation of the machine T on input w), we can obtain a string that requires height equal to the maximal length of configurations occurring in C_n to be accepted by M_n.
This allows us to conclude that the pushdown height used by M_n cannot be bounded by any recursive function in the size of M_n.
The pda M_n used to prove Theorem 2 accepts the complement of the singleton language {C_n}. This implies that each equivalent deterministic automaton requires more than |C_n| states. Hence, we obtain the following:
Corollary 1 There is no recursive function bounding the size blowup from pdas accepting in constant height to finite automata.

Constant height decidability in the unary case
In Sect. 3, we proved that it cannot be decided if a pda accepts in constant height. This section is devoted to showing that this property turns out to be decidable in the restricted case of pdas with a one-letter input alphabet. We point out that it is well known that unary context-free languages are regular [9], so unary pdas can always be converted into equivalent dfas and therefore also into equivalent pdas with constant height equal to zero. The problem considered here is whether the given pda works or does not work with a pushdown of constant height. We first give an informal outline of the argument.
Any accepting computation on a sufficiently long input should contain horizontal or vertical loops. The use of vertical loops can lead to computations using unbounded height. However, we prove that if an accepting computation on an input a^ℓ visits a surface pair on which there exists a horizontal loop, then there is another accepting computation for the same input in which almost all occurrences of the vertical loops are replaced by occurrences of such a horizontal loop. The number of vertical loops that remain in the resulting computation is bounded by a constant that only depends on the automaton. Hence, the height of such a computation is bounded by a constant. As a consequence, a^ℓ is accepted in constant height. This result is obtained by refining pumping arguments on grammars and by using the fact that, in the unary case, input symbols commute. In contrast, if no accepting computation on a long string a^ℓ visits any surface pair having a horizontal loop, then vertical loops and an increase of the pushdown height cannot be avoided. Hence, the given pda works in constant height if and only if the language L_v \ L_h is finite, where L_h (L_v, resp.) is the set of strings which are accepted by a computation visiting a (not visiting any, resp.) surface pair having a horizontal loop. Since the languages L_v and L_h are accepted by pdas, they are context free and, being defined over a one-letter alphabet, they are regular by a well-known result proved in [9]. So the finiteness of their difference is decidable.
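The last step of the outline, deciding whether the difference of two unary regular languages is finite, can be sketched as follows. The representation is an assumption made for illustration: a unary dfa is a pair `(accepting, back)`, where state i moves to state i + 1, the last state moves to state `back`, and `accepting[i]` tells whether state i is final. The sketch starts from such dfas and does not cover how they would be obtained from the pdas accepting L_v and L_h.

```python
from math import gcd


def member(dfa, length):
    """Is a^length accepted by the unary dfa (accepting, back)?"""
    accepting, back = dfa
    m = len(accepting)
    # Past the last state, the automaton walks around the cycle back..m-1.
    state = length if length < m else back + (length - back) % (m - back)
    return accepting[state]


def difference_is_finite(dfa_v, dfa_h):
    """Is L(dfa_v) minus L(dfa_h) finite?"""
    start = max(len(dfa_v[0]), len(dfa_h[0]))  # both automata are in their cycles
    cycle_v = len(dfa_v[0]) - dfa_v[1]
    cycle_h = len(dfa_h[0]) - dfa_h[1]
    period = cycle_v * cycle_h // gcd(cycle_v, cycle_h)  # least common multiple
    # Beyond `start`, membership in both languages repeats with this period,
    # so one full period decides whether the difference has arbitrarily long words.
    return not any(
        member(dfa_v, n) and not member(dfa_h, n) for n in range(start, start + period)
    )


EVEN = ([True, False], 0)                       # lengths that are multiples of 2
AT_LEAST_3 = ([False, False, False, True], 3)   # lengths >= 3
MULT_OF_4 = ([True, False, False, False], 0)    # lengths that are multiples of 4
```

Here EVEN minus AT_LEAST_3 is {a^0, a^2}, which is finite, while EVEN minus MULT_OF_4 contains a^2, a^6, a^10 and so on.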
To obtain these results, we refine some of the arguments given in [16] to study the size costs of the transformations of unary context-free grammars and pushdown automata into equivalent finite automata.

Loops and grammars
In the following, we consider a grammar G = ⟨V, Σ, P, S⟩ in binary normal form and we denote by v = #V the number of its variables.
If T is a derivation tree whose root is labeled with a variable A ∈ V and such that the labels of the leaves, from left to right, form a string γ ∈ (V ∪ Σ)*, then we write T : A ⇒* γ. Furthermore, we indicate by ν(T) the set of variables occurring as labels of the nodes in T. As usual, the height of a derivation tree T is the maximum number of edges along the path from the root to a leaf in T.
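The notions of frontier, ν(T) and height of a derivation tree can be illustrated with a small sketch. The tree representation, a pair `(label, children)` with an empty list of children for leaves, and the example productions are assumptions introduced only for this illustration.

```python
def height(tree):
    """Maximum number of edges on a path from the root to a leaf."""
    _, children = tree
    return 0 if not children else 1 + max(height(c) for c in children)


def frontier(tree):
    """Labels of the leaves, read from left to right."""
    label, children = tree
    if not children:
        return [label]
    return [symbol for c in children for symbol in frontier(c)]


def variables_of(tree, variables):
    """ν(T): the variables occurring as labels of nodes in the tree."""
    label, children = tree
    found = {label} if label in variables else set()
    for c in children:
        found |= variables_of(c, variables)
    return found


# Hypothetical tree for S -> A B, A -> a, B -> S C, C -> a, with the inner S left as a leaf.
VARIABLES = {"S", "A", "B", "C"}
TREE = ("S", [("A", [("a", [])]), ("B", [("S", []), ("C", [("a", [])])])])
```

The frontier of `TREE` is a S a, so this tree also serves as an example of a tree whose root variable reappears among the leaves.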
A gap tree T from a variable A ∈ V, also called A-gap tree, is a tree corresponding to a nonempty derivation of the form A ⇒+ xAy, with x, y ∈ Σ*. When x = y = ε, the gap tree T is said to be trivial, otherwise, i.e., when it has at least one leaf labeled by a terminal, T is nontrivial.
Proof Given a derivation tree T : A ⇒* γ of height h > (|γ| + 1)v, let n_1, n_2, …, n_h be the sequence of the internal nodes which are encountered on a longest path in T, moving from the leaf to the root. With each node n_k, we associate the pair (A_k, γ_k), where A_k is the variable labeling n_k and γ_k is the string generated by the subtree rooted at n_k.
Hence, for k = 2, …, h, γ_{k−1} is a factor of γ_k and γ_h = γ. Considering that γ_1 could be ε, the number of possible different second components in these pairs is bounded by |γ| + 1.
Since h > (|γ| + 1)v, this implies that there is a sequence of v + 1 consecutive nodes whose pairs have the same second component. Two of these nodes, say n_i and n_j with i < j, are also labeled by the same variable, so (A_i, γ_i) = (A_j, γ_j), namely the tree T̂ obtained by removing from the subtree of T rooted at n_j the subtree rooted at n_i is a trivial A_i-gap tree. By removing T̂ from T, i.e., by replacing the subtree rooted at n_j by the subtree rooted at n_i, we obtain a tree T′ : A ⇒* γ with a smaller number of nodes than T.
We can iterate this process until we obtain a tree for a derivation A ⇒* γ of height bounded by (|γ| + 1)v.

Lemma 5 Let T : A ⇒* γ be a derivation tree.
1. Let k be the length of the longest path from the root to a leaf labeled by a terminal symbol, if any. Then, γ contains at most 2^{k−1} terminal symbols, and less than 2^{k−1} terminal symbols when γ contains at least one variable.
2. If γ contains only terminal symbols, i.e., γ ∈ Σ*, and the height of T is k, then |γ| ≤ 2^{k−1}.
3. If γ = xAy, xy ∈ Σ+, and T has a minimal number of nodes among all nontrivial A-gap trees, then |xy| < 2^{2v−1}.
Proof The proof is given by adapting standard properties of derivation trees of grammars in Chomsky normal form (see, e.g., [12]).
1. The statement can be proven by induction on k. If k = 1, then the tree consists only of the root, labeled by A, with one son, labeled by a ∈ Σ, where A → a is a production of G. In this case, the statement is trivial. If k > 1 then, in any subtree of the root, the longest path to a leaf labeled by a terminal symbol has length at most k − 1, so generating, by induction hypothesis, a string containing at most 2^{k−2} terminal symbols. Due to the form of the grammar, the root can have at most 2 subtrees. Hence, the number of terminal symbols in γ is bounded by 2^{k−1}. Furthermore, when γ contains one variable, one of the subtrees of the root derives a factor of γ containing such a variable. Hence, by induction, it generates a number of terminals which is strictly less than 2^{k−2}. As a consequence, the number of terminals in γ is less than 2^{k−1}.
2. Consequence of Item 1.
3. Let n_1, n_2, …, n_k be the sequence of the internal nodes on a longest path in T from a leaf labeled by a terminal symbol a ∈ Σ to the root. With each node n_i, i = 1, …, k, we associate a pair (A_i, b_i), where A_i ∈ V is the variable labeling n_i and b_i ∈ {0, 1} is 1 if and only if the factor of γ generated by the subtree rooted at n_i contains the variable A. Hence, the pair associated with n_1 is (B, 0) for some B ∈ V having the production B → a, while the pair associated with the root n_k is (A, 1). Suppose k > 2v. Then, there are two nodes n_i, n_j, with 1 ≤ i < j ≤ k, associated with the same pair. By replacing in T the subtree rooted at n_j by the subtree rooted at n_i, we obtain an A-gap tree T′ which still generates at least one terminal symbol and has fewer nodes than T, which is a contradiction. Hence, in T, each path connecting the root and a leaf labeled by a terminal symbol should have length at most 2v, which, according to Item 1, implies |xy| < 2^{2v−1}.
From now on, let us suppose that G is unary, i.e., Σ = {a}. The following modified version of Lemma 2(ii) in [16] is derived from the arguments of the classical "pumping lemma" for context-free languages. In other words, we can find two nodes n_x and n_y, 0 < x < y ≤ v^2 + 1, such that (A_x, α_x) = (A_y, α_y) and y − x ≤ v. By replacing in T the subtree rooted at n_y by the subtree rooted at n_x, we get a new tree T_1 : S ⇒* a^s with s ≤ ℓ. Let T_2 be the gap tree obtained from T by taking as root n_y and by deleting the subtree rooted at n_x. Then, T_2 : A_x ⇒+ a^i A_x a^j, for some integers i, j with s + i + j = ℓ. Furthermore, since the root of T_2 is n_y, its height is at most v^2 + 1. By Lemma 5(1), this implies i + j < 2^{v^2}.
Finally, we observe that in case i + j = 0 and s = ℓ, we can repeat the same argument after replacing T by T_1. Since the number of nodes in the "new" T is smaller than in the "old" one, by iterating this process, at some point we will finally obtain a tree T_1 producing a shorter string and a gap tree T_2 producing at least one terminal symbol.
The following lemma will be crucial to obtain our main result. We prove that each long enough string a^ℓ can be derived by pumping a derivation tree of some short string by many occurrences of the same gap tree. Furthermore, such a gap tree can be arbitrarily chosen among "small" nontrivial A-gap trees, with A occurring in the derivation of a^ℓ.

Lemma 7 For any derivation tree T : S ⇒* a^ℓ and for any A-gap tree T_A : A ⇒+ a^i A a^j, with 0 < i + j < 2^{2v−1} and A ∈ ν(T), there exists a derivation tree T′ : S ⇒* a^ℓ which is obtained by pumping a tree T
Proof If ≤ 2 2v 2 − 3 · 2 v 2 −1 + 1, then we take T 0 = T , 0 = , and k = 0. Otherwise, we repeatedly apply Lemma 6 to "unpump" the tree T up to find a tree T r : S ⇒ a r , with r ≤ 2 v 2 −1 and ν(T r ) = ν(T ).
Let {i 1 , . . . , i s } ⊆ {1, . . . , 2 v 2 − 1} be the set of numbers of terminals that are generated by the gap trees removed during this process. Hence, = r + i 1 x 1 + · · · + i s x s , where, for t = 1, . . . , s, x t > 0 is the number of gap trees generating i t terminal symbols that have been removed to obtain T r . Let i 0 = i + j < 2 2v−1 ≤ 2 v 2 be the number of terminals generated by the tree T A . By Lemma 3 (applied with z = − r and x 0 = 0), we can find integers x 0 , x 1 , . . . , x s ≥ 0 in such a way that = r + i 0 x 0 + i 1 x 1 + · · · + i s x s and i 1 x 1 + · · · + i s x s ≤ (2 v 2 − 1) 2 . This means that we can pump the tree T r with a suitable number of occurrences of some of the gap trees removed in the previous process, in order to get a tree T 0 : S ⇒ a 0 , with 0 = r +i 1 and ν(T 0 ) = ν(T ). Furthermore, by pumping T 0 with x 0 occurrences of T A , we finally get a tree T : S ⇒ a .

Simulating vertical loops by a horizontal loop
From now on, let us consider a fixed pda $M = \langle Q, \Sigma, \Gamma, \delta, q_I, Z_0, q_F \rangle$. We are going to define a context-free grammar $G = \langle V, \Sigma, P, S \rangle$, in binary normal form, which generates the language accepted by $M$. We give the same construction as in [16], which is a minor variation of that used in classical textbooks (see, e.g., [12]) to present the standard transformation of pdas into cfgs. The grammar $G$ is defined as follows:
- $P$ contains the following productions:
We point out that the number of variables of $G$ is $v = (\#Q)^2 \cdot \#\Gamma$. Furthermore, $G$ is in binary normal form.
We are going to prove that G generates the same language accepted by M. Since we are interested in the height of M's computations, we state such equivalence in a stronger form, which also considers the use of the pushdown.
In particular, we relate the pushdown increment to the unit production height which, for a derivation tree T of the above grammar G, is defined as the maximum number of edges corresponding to unit productions, i.e., productions of the form A → B, with A, B ∈ V , in a path from the root to a leaf of T .
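The definition of unit production height can be made concrete with a small tree traversal. The sketch below is illustrative only: the `Node` class, its `is_variable` flag, and the example tree (including the labels `[q'Bp']`, `[q'Br]`, `[rBp']` and `[sCt]`) are assumptions introduced here, not part of the construction in the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    is_variable: bool = True
    children: List["Node"] = field(default_factory=list)


def unit_production_height(node: Node) -> int:
    """Maximum number of unit-production edges on a root-to-leaf path."""
    if not node.children:
        return 0
    # The edge below `node` comes from a unit production A -> B exactly
    # when `node` has a single child and that child is a variable.
    is_unit = len(node.children) == 1 and node.children[0].is_variable
    best = max(unit_production_height(child) for child in node.children)
    return best + (1 if is_unit else 0)


def terminal(symbol: str) -> Node:
    return Node(symbol, is_variable=False)


# Hypothetical tree: a unit production at the root, a binary production
# below it, and one more unit production in the right branch.
example = Node("[qAp]", children=[
    Node("[q'Bp']", children=[
        Node("[q'Br]", children=[terminal("a")]),
        Node("[rBp']", children=[Node("[sCt]", children=[terminal("a")])]),
    ])
])
```

In this example the path through the right branch crosses two unit-production edges, while the left one crosses only one.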

Lemma 8 For any $x \in \Sigma^*$, $q, p \in Q$, $A \in \Gamma$, $h \in \mathbb{N}$, there exists a derivation tree $T : [qAp] \Rightarrow x$ with unit production height $h$ if and only if there exists a $[qAp]$-computation $C$ on $x$ with pushdown increment $h$.
Proof Let $T : [qAp] \stackrel{k}{\Rightarrow} x$, for some $k > 0$, be a derivation tree with unit production height $h$. We prove by induction on $k$ that there exists a $[qAp]$-computation $C$ on $x$ with pushdown increment $h$.
If k = 1, then the tree contains only the root and one leaf and it corresponds to the use of one of the productions of the form 3 or 4. So the statement is trivial.
If $k > 1$, then the production used at the root level of $T$ is either of the form 1 or of the form 2. In the first case, the production is $[qAp] \to [qAr][rAp]$ for some $r \in Q$, and the two subtrees of the root are $T' : [qAr] \Rightarrow x'$ and $T'' : [rAp] \Rightarrow x''$, with $x = x'x''$. Then, the unit production height $h$ of $T$ is the maximum between the unit production heights $h'$ and $h''$ of $T'$ and $T''$, respectively, i.e., $h = \max\{h', h''\}$. According to the induction hypothesis, there exist a $[qAr]$-computation on $x'$ and a $[rAp]$-computation on $x''$ with pushdown increment $h'$ and $h''$, respectively. By concatenating these two computations, we obtain a $[qAp]$-computation on $x$ in which the pushdown increment is $\max\{h', h''\}$, namely $h$.
In case the production applied to the root of $T$ is $[qAp] \to [q'Bp']$, let $T' : [q'Bp'] \Rightarrow x$ be the subtree of $T$ rooted at the only son of the root. Since in $T$ at the top level a unit production is used, the unit production height of $T'$ is $h - 1$. Also in this case, from the induction hypothesis we obtain a $[q'Bp']$-computation on $x$ with pushdown increment $h - 1$. By adding to this computation the initial push and the final pop from which the production $[qAp] \to [q'Bp']$ is defined, we obtain a $[qAp]$-computation on $x$ with pushdown increment $h$.

Conversely, let us consider a $[qAp]$-computation $C$ on $x$ with pushdown increment $h$.
We proceed by induction on the number $k$ of steps of $C$.
If k = 1, then the computation C does not make any pushdown increment and it can only correspond to a one-step derivation consisting of a production of the form 3 or 4. So the statement is trivial.
If $k > 1$, then we consider two cases, depending on whether or not at some configuration in $C$, after the first and before the last configuration, the pushdown is at the same height as at the beginning and at the end of $C$.
- If such a configuration exists, then we split $C$ at that configuration into a $[qAr]$-computation $C'$ and a $[rAp]$-computation $C''$, for some $r \in Q$, consuming some $x'$, $x''$, with $x'x'' = x$ and with pushdown increment $h'$, $h''$, respectively. Then, the pushdown increment in $C$ is $\max\{h', h''\}$. Using the induction hypothesis, we find two trees $T'$ and $T''$ corresponding to such computations, with unit production heights $h'$ and $h''$, respectively. We can suitably combine $T'$ and $T''$, using a production of form 1, in order to obtain a tree $T$ which derives $x$ and has unit production height equal to $\max\{h', h''\}$.
- If such a configuration does not exist, then the computation $C$ should start with a push of a symbol $B$ which is removed in the last step. Let $C'$ be the $[q'Br']$-subcomputation of $C$ which is obtained by removing the first and the last step. If $h$ is the pushdown increment in $C$, then the pushdown increment in $C'$ is $h - 1$. Let $T' : [q'Br'] \Rightarrow x$ be the tree corresponding to $C'$, obtained according to the induction hypothesis. Its unit production height is $h - 1$. The tree $T$, which is obtained by taking $T'$ as the only subtree of a root with label $[qAp]$, derives $x$ and has unit production height $h$.
As a consequence of Lemma 8, we get:

Corollary 2 For any integer h ≥ 0, a string x is accepted by M using pushdown height h if and only if there is a derivation tree T of x in G with unit production height h.
Combining Corollary 2 with Lemma 4, we get the following upper bound for the height of the pushdown store necessary to accept a string $x$:
Lemma 9 Each string $x$ accepted by $M$ is accepted by a computation using pushdown height at most $(|x| + 1)v$.
Proof By contradiction, suppose that each computation of $M$ accepting $x$ uses pushdown height greater than $(|x| + 1)v$. As a consequence of Corollary 2, each derivation tree of $x$ in $G$ has unit production height, and so height, greater than $(|x| + 1)v$, which is a contradiction to Lemma 4.
Let us go back to the case of unary pushdown automata. Hence, from now on let $M = \langle Q, \{a\}, \Gamma, \delta, q_I, Z_0, q_F \rangle$ be a fixed unary pda. Using Lemma 8, we can reformulate Lemma 7 in terms of pushdown automata. Roughly, we can say that for each computation $C$ accepting a "long" input, there is another computation accepting the same input, which is obtained by pumping a suitable computation $C_0$, chosen from a finite set, with a repeated pattern which is arbitrarily selected from another finite set that depends on $C_0$. We will use this property to replace, in any accepting computation $C$, almost all the vertical loops with many occurrences of a horizontal loop, in the case a surface pair $[rB]$ having a horizontal loop occurs in $C$. In this way, we will be able to obtain an accepting computation of bounded height on the same input.

Theorem 3 Let $C$ be an accepting computation on input $a^\ell$ which visits a surface pair
According to Lemma 7, we can obtain another tree $T' : S \Rightarrow a^\ell$ by pumping a tree $T_0$ : We observe that in the tree $T'$, some of the $k$ occurrences of $T_{[rBr]}$, say $t$, could be nested, possibly giving a pushdown height of the corresponding computation which linearly increases with $k$. To fix this problem, we modify $T'$ as we now describe.
Let $u$ be a node of $T_0$ labeled by $[rBr]$ and $T_u$ be the subtree of $T_0$ rooted at $u$, such that $T_0$ is pumped starting from $u$ with $t$ nested occurrences of $T_{[rBr]}$, $1 < t \le k$. (The subtree rooted at $u$ after the pumping is shown in Fig. 1, on  we append the tree $T_u$, and to each of the remaining $t - 1$ leaves labeled $[rBr]$ we append one leaf labeled with the empty word (we remind the reader that $[rBr] \to \varepsilon$ is a production of $G$). The subtree rooted at $u$ after the pumping and the one obtained after the rearrangement are shown in Fig. 1.
Let $T''$ be the tree obtained after this modification, which still generates $a^\ell$. Using Corollary 2, we now estimate the height of the computation $C''$ corresponding to $T''$, by calculating the unit production height of $T''$, which is bounded by the maximum number $h_0$ of edges corresponding to unit productions in any path in $T_0$ plus the maximum number $h_1$ of such edges in any path in $T_{[rBr]}$ which, in turn, are bounded by the heights of $T_0$ and $T_{[rBr]}$, respectively. Using Lemma 4, we get $h_0 \le (\ell_0 + 1)v$ and $h_1 \le (i + j + 2)v$ (we remind the reader that the tree $T_{[rBr]}$ generates a string of length $i + j + 1$). Hence, the height of the pushdown is bounded by $h_0 + h_1 \le (\ell_0 + i + j + 3)v$. Considering the bounds on $\ell_0$ and $i + j$, we
In the case $v = 1$, the pda $M$ can have only one state $q$, which is both initial and final, and only one pushdown symbol $Z_0$. Since in the form we are considering for pdas transitions consuming input symbols do not change the pushdown (cf. Sect. 2), the only possibility to read an input symbol is that of having the transition $(q, -) \in \delta(q, a, Z_0)$. If this is the case, then any string in $a^*$ can be accepted by a computation which does not use the pushdown.

Vertical increase without horizontal loops
Now we evaluate the increase of the pushdown in computations that do not use horizontal loops, i.e., between any two repetitions of a same surface pair [r B] at the same height either no input is consumed or there is at least one configuration with lower pushdown height.

Lemma 10 Let $C$ be a $[qAp]$-computation on $a^\ell$ with pushdown increment bounded by $h$ and without horizontal loops. Then $\ell \le (\#Q - 1)^{h+1}$.
Proof We give the proof by induction on $h$. Let $h_0$ be the pushdown height at the beginning and at the end of $C$. We preliminarily observe that, since in $C$ the pushdown height cannot be lower than $h_0$ and there are no horizontal loops, between any two repetitions of the same state at pushdown height $h_0$ no input symbols can be consumed, or else we would have a horizontal loop, a contradiction. Hence, we can remove from $C$ the part between any two repetitions of such a state, to obtain a shorter $[qAp]$-computation on the same input having pushdown increment bounded by $h$. By iterating this process, we finally get $C$ with at most $\#Q$ configurations at pushdown height $h_0$. If $h = 0$, i.e., the pushdown height is never incremented, then $C$ consists of at most $\#Q - 1$ moves. Hence, $\ell \le \#Q - 1 = (\#Q - 1)^{h+1}$. Otherwise, we decompose $C$ into $k < \#Q$ subcomputations $C_1, \ldots, C_k$, where, for $i = 1, \ldots, k$, $C_i$ starts with a push of a symbol, which is popped off the pushdown only in the last move of $C_i$. Let $C'_i$ be the subcomputation obtained by removing from $C_i$ the first and the last move and let $a^{\ell_i}$ be the input consumed during it. Then, the pushdown increment in $C'_i$ is at most $h - 1$. By the induction hypothesis, this implies $\ell_i \le (\#Q - 1)^h$. Since push and pop moves do not consume input symbols, we get that $\ell \le k(\#Q - 1)^h \le (\#Q - 1)^{h+1}$.
As a consequence of Lemma 10, the recognition of arbitrarily long strings without making use of horizontal loops requires unbounded pushdown height. This fact will be used later to derive a lower bound for such a pushdown height.
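To see how fast the required height grows, the bound $(\#Q - 1)^{h+1}$ obtained in the proof of Lemma 10 can be inverted numerically. This is a minimal sketch; the function names are introduced here for illustration, and the value returned is only the lower bound on the pushdown increment implied by the inequality, not the height of any specific machine.

```python
from typing import Optional


def max_input_length(num_states: int, increment: int) -> int:
    """The bound (#Q - 1)^(h + 1) on the input consumed without
    horizontal loops when the pushdown increment is at most h."""
    return (num_states - 1) ** (increment + 1)


def min_increment(num_states: int, length: int) -> Optional[int]:
    """Smallest h with length <= (#Q - 1)^(h + 1), or None when no
    increment satisfies the inequality (this happens when #Q <= 2)."""
    base = num_states - 1
    if length <= base:
        return 0
    if base <= 1:
        # Powers of 0 or 1 never grow, so longer inputs are out of reach.
        return None
    h = 0
    while max_input_length(num_states, h) < length:
        h += 1
    return h
```

For a fixed number of states greater than 2, `min_increment` grows like the logarithm of `length`, in line with the remark that arbitrarily long strings need unbounded pushdown height.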

Decidability
Using the tools we developed so far, we are now able to prove the main result of this section: Clearly, the language $L$ accepted by $M$ is the union of $L_h$ and $L_v$. According to Theorem 3, all strings in $L_h$ are accepted in constant height. More precisely, from $M$ we can build a unary pda $M_h$ which accepts $L_h$ by simulating $M$ and by accepting when the simulated computation is accepting and visits at least one surface pair having a horizontal loop, which can be decided according to Lemma 1.

Theorem 4 Let
To implement $M_h$, we double the cardinality of the state set, in order to remember whether some surface pair having a horizontal loop has been reached during the computation. That is, for each state $q$ we create a copy $q'$. Thus, the simulation is straightforward but, after visiting a surface pair having a horizontal loop, $M_h$ switches to $q'$ instead of $q$. Hence, the final state of $M_h$ is $q'_F$. From $M_h$, we can obtain an equivalent grammar in binary normal form with $(2n)^2 m$ variables. However, in such a grammar, the triples $[q'Ap]$, where $q$ and $p$ are states of $M$, cannot generate any string (in fact, once a pair having a horizontal loop is reached, the computation of $M_h$ can only visit states in the copy of $Q$). This allows us to reduce the number of variables to $3n^2 m = 3v$. According to Theorem 3, each string in $L_h$ can be accepted using height smaller than $2^{2(3v)^2 + \log_2(3v)} = 2^{18v^2 + \log_2 v + \log_2 3}$.
If the set $L_v \setminus L_h$ is infinite, then it should contain arbitrarily long strings; by Lemma 10, an arbitrarily high pushdown is required to accept them.
Otherwise, when $L_v \setminus L_h$ is finite, $M$ accepts in constant height, which is bounded by the maximum between the height used to accept strings in $L_h$ and the height used to accept strings in $L_v \setminus L_h$. To estimate the latter amount, first we notice that $L_v$ is accepted by a pda $M_v$, which can be obtained by just removing from $M$ all the transitions defined from surface pairs $[rB]$ having horizontal loops. Hence, $L_v$ is generated by a context-free grammar in binary normal form with $v = n^2 m$ variables. According to Lemma 2, from $M_h$ and $M_v$, we obtain equivalent dfas with less than $2^{9v^2}$ and $2^{v^2}$ states, respectively. From them, using a standard product construction, we can obtain a dfa with less than $2^{10v^2}$ states accepting $L_v \setminus L_h$. Since such a language is finite, the length of each string in it is less than the number of states of such a dfa, i.e., it is bounded by $2^{10v^2}$. By Lemma 9, this implies that each string in $L_v \setminus L_h$ is accepted using height bounded by $v 2^{10v^2}$, which is lower than the bound we obtained for strings in $L_h$. Summarizing, we can conclude that if $M$ accepts in constant height, then it accepts in height smaller than $2^{18v^2 + \log_2 v + \log_2 3}$.
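The comparison between the two bounds can be checked with exact integer arithmetic, using the identity $2^{18v^2 + \log_2 v + \log_2 3} = 3v \cdot 2^{18v^2}$. The function names below are introduced only for this sketch.

```python
def height_bound_horizontal(v: int) -> int:
    """2^(18 v^2 + log2 v + log2 3), written as 3 * v * 2^(18 v^2):
    the bound for strings accepted via a horizontal loop."""
    return 3 * v * 2 ** (18 * v * v)


def height_bound_vertical(v: int) -> int:
    """v * 2^(10 v^2): the bound for the finitely many remaining strings."""
    return v * 2 ** (10 * v * v)


# The second bound is dominated by the first for every v >= 1 tried here.
for v in range(1, 6):
    assert height_bound_vertical(v) < height_bound_horizontal(v)
```

Python integers have arbitrary precision, so no rounding is involved even though the values are astronomically large.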

Theorem 5 It is decidable whether a unary pda accepts in constant height.
Proof As seen in the proof of Theorem 4, the language accepted by a pda $M$ is the union of the languages $L_h$ (composed of the strings accepted by the computations of $M$ which visit

Size versus height in the unary case
The arguments used in Sect. 4 to prove that it is decidable whether a unary pda accepts in constant height give an exponential upper bound for the maximum pushdown height, with respect to the size of a pda working in constant height (see Theorem 4). In this section, we prove that such an exponential bound cannot be reduced.
To prove this result, we will make use of some modifications of the pda described in the following example.
Example 1 Let us consider the language $L_k = \{a^{2^k}\}$, where $k > 0$ is a given integer. A deterministic pda $A_k$ for $L_k$ might work as follows. The automaton can exploit its pushdown to implement the recursive function in order to read $f(k) = 2^k$ input symbols. To this aim, it uses the state set $Q = \{q, r, p\}$ and the pushdown alphabet $\Gamma = \{A_0, A_1, \ldots, A_k, B_0, B_1, \ldots, B_{k-1}\}$.
One call to $f(i)$ is implemented by a $[qX_ip]$-computation, with $X_i \in \Gamma$. For $i = 0$, such a computation consists of one move which reads one input symbol (Transitions 1 or 2). Otherwise, the computation is split into two parts, both consuming $2^{i-1}$ input symbols, as depicted in Fig. 2. We point out that the size of $A_k$ is linear in the parameter $k$, while the minimum dfa accepting $L_k$ has $2^k + 1$ states.
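The recursive scheme behind Example 1 can be simulated directly. The sketch below is an assumption about the shape of $f$, consistent with the description above: a call at level 0 reads one symbol, and a call at level $i > 0$ performs two calls at level $i - 1$. The name `simulate_f` and the use of recursion depth as a stand-in for pushdown height are choices made here for illustration; the actual transitions of $A_k$ are not reproduced.

```python
from typing import Tuple


def simulate_f(k: int) -> Tuple[int, int]:
    """Return (symbols read, maximum nesting depth) for a call to f(k)."""
    stats = {"read": 0, "max_depth": 0}

    def f(i: int, depth: int) -> None:
        stats["max_depth"] = max(stats["max_depth"], depth)
        if i == 0:
            # Base case: one move reading one input symbol.
            stats["read"] += 1
            return
        # Two halves, each consuming 2^(i-1) input symbols.
        f(i - 1, depth + 1)
        f(i - 1, depth + 1)

    f(k, 0)
    return stats["read"], stats["max_depth"]
```

Under these assumptions the number of symbols read doubles with each increment of `k`, while the nesting depth grows only by one.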
We now present the main result of this section: a family of pdas accepting in height which is constant in the input length but exponential with respect to the size of the machines.
Theorem 6 For each integer $k > 0$, there exists a pda $M_k$ having a size linear in $k$ and accepting in height which is constant with respect to the input length but exponential in $k$.
Proof For each integer $k > 0$, let us consider two automata $A'_k$ and $A''_k$, accepting the languages $(a^{2^k})^*$ and $(a^{2^k+1})^*$, respectively, obtained by modifying the automaton $A_k$ of Example 1 as follows:
- $A'_k$ is obtained by adding to $A_k$ the transition $\delta(p, \varepsilon, A_k) = (q, -)$ and by choosing $q$ as final state. This allows $A'_k$ to recognize $(a^{2^k})^*$ with pushdown height $k$, using 3 states and a pushdown alphabet of size $2k + 1$. We point out that, from such a definition, each accepting computation of $A'_k$ visits the surface pair $[qA_k]$ which has a horizontal loop.
- $A''_k$, at the beginning of the computation, guesses how many repetitions of the word $a^{2^k+1}$ are concatenated in the input word. This is done, in a preliminary phase, by pushing one occurrence of the symbol $A_k$ on the store for each guessed repetition (Transitions 9 below). Then, for any such occurrence, $A''_k$ makes the following operations:
  - it reads one $a$ from the input (Transition 10),
  - it simulates one execution of $A_k$, using Transitions 1-8 in Example 1,
  - it pops the symbol $A_k$ off the pushdown (Transition 11).
, and $\delta''$ is a copy of $\delta$ with the addition of the following nondeterministic transitions: Notice that $A''_k$ has 5 states and $2k + 2$ pushdown symbols. Furthermore, the pushdown height used to accept the string $a^{\beta(2^k+1)}$ is $\beta + k$. So, $A''_k$ does not accept in constant height.
It is easy to see that the automaton $M_k$ obtained by concatenating the automata $A'_k$ and $A''_k$ using standard techniques (after renaming the states in such a way that the two sets of states are disjoint) recognizes the language $H_k = \{a^t \mid t = \alpha 2^k + \beta(2^k + 1),\ \alpha, \beta \ge 0\}$ and has 8 states and a pushdown alphabet of $2k + 2$ symbols. By construction, the first part of each accepting computation of $M_k$ is an accepting computation of $A'_k$ which, as observed above, visits a surface pair having a horizontal loop. Hence, from Theorem 3 it follows that $M_k$ accepts in constant height, with respect to the input length.
We now prove that a height exponential in $k$ is necessary. Let us consider the string $a^t \in H_k$ obtained by choosing $\alpha = 0$ and $\beta = 2^k - 1$, namely $t = (2^k - 1)(2^k + 1) = 2^{2k} - 1$. We are going to prove that there is only one accepting computation on $a^t$.
Hence, to accept a t an exponential height, with respect to the size of M k , is necessary.
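The arithmetic fact underlying this argument, namely that $t = 2^{2k} - 1$ can be written as $\alpha 2^k + \beta(2^k + 1)$ in only one way, can be confirmed by brute force for small $k$. The helper `decompositions` is introduced here for illustration; it checks only the number-theoretic part, not the behavior of $M_k$ itself.

```python
from typing import List, Tuple


def decompositions(t: int, k: int) -> List[Tuple[int, int]]:
    """All pairs (alpha, beta) of non-negative integers with
    t == alpha * 2**k + beta * (2**k + 1)."""
    low, high = 2 ** k, 2 ** k + 1
    result = []
    for beta in range(t // high + 1):
        rest = t - beta * high
        if rest % low == 0:
            result.append((rest // low, beta))
    return result


# For t = 2^(2k) - 1 the only decomposition has alpha = 0, beta = 2^k - 1.
for k in range(1, 9):
    assert decompositions(2 ** (2 * k) - 1, k) == [(0, 2 ** k - 1)]
```

With $\beta = 2^k - 1$ forced, the part of the computation handled by the second automaton uses pushdown height $\beta + k = 2^k - 1 + k$, which is exponential in $k$.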

An optimal lower bound for nonconstant height
In this section, we turn our attention to pdas accepting in nonconstant height. First of all, we mention that each nondeterministic Turing machine, with a two-way read-only input tape, which accepts in o(log log n) space, where n is the input length and the space is measured by considering the portion of an auxiliary work tape used during the least expensive computation, actually uses only a constant amount of space [1]. Since pushdown automata can be seen as a special case of this kind of machines, as a direct consequence, the height of the pushdown store in any pda accepting in nonconstant height should be at least log log n, for infinitely many n's. Furthermore, this lower bound is optimal [6].
We show that in the unary case the optimal bound increases to a logarithmic function. Let us start by proving the lower bound:
Theorem 7 Let $M$ be a unary pda using height $h(n)$. Then, either $h(n)$ is bounded by a constant or there exists $c > 0$ such that $h(n) \ge c \log n$ infinitely often.
Proof According to the proof of Theorem 4, if $h(n)$ is not constant, then there exist infinitely many strings in $L_v \setminus L_h$ that are accepted only by computations that use vertical loops and do not visit surface pairs having horizontal loops. We are going to prove that to accept all these strings a logarithmic pushdown height is necessary. To this aim, let us consider the pda $M_v$ accepting $L_v$, introduced in the proof of Theorem 4. This pda is obtained from $M$ by removing all transitions from surface pairs having horizontal loops. Hence, $M_v$ uses the same set of states $Q$ and the same pushdown alphabet $\Gamma$ as $M$, but accepts without using surface pairs having horizontal loops. Let us fix an integer $n$ such that $a^n \in L_v \setminus L_h$. We first notice that, by the construction of $M_v$, the sets of accepting computations of $M$ and $M_v$ on $a^n$ coincide. Hence, $M_v$ accepts $a^n$ in the same height $h(n)$ as $M$. Let us consider the pda $M_{h(n)}$ obtained by bounding the height of the pushdown of $M_v$ to $h(n)$, which is a constant since $n$ is fixed. In this way, the language accepted by $M_{h(n)}$ is a subset of the language accepted by $M_v$. However, $M_{h(n)}$ still accepts $a^n$. To obtain $M_{h(n)}$, the pushdown alphabet of $M_v$ is extended in order to keep track of the pushdown height, together with each symbol pushed on the pushdown. Hence, since the only symbol that appears at height 0 is $Z_0$, the cardinality of the pushdown alphabet of $M_{h(n)}$ is bounded by $\#\Gamma \cdot h(n) + 1$. According to the construction in Sect. 4.2, $M_{h(n)}$ can be converted into an equivalent grammar in binary normal form with $(\#Q)^2 \cdot (\#\Gamma \cdot h(n) + 1)$ variables, from which, using Lemma 2, we can obtain an nfa $N_{h(n)}$ which is equivalent to $M_{h(n)}$, and whose number of states is $2^{O(h(n))}$.
Since $M_{h(n)}$ has pushdown height bounded by $h(n)$, it cannot have vertical loops. Furthermore, since accepting computations of $M_v$ do not use surface pairs with horizontal loops, accepting computations of $M_{h(n)}$ do not use horizontal loops either. This allows us to conclude that the language accepted by $M_{h(n)}$ is finite. Thus, in the equivalent nfa $N_{h(n)}$, the string $a^n$ is accepted by a path without any repeated state. Hence, the number of states of $N_{h(n)}$, which we already observed to be $2^{O(h(n))}$, must be greater than $n$.
To complete the proof, we finally notice that the previous argument can be applied to each string in $L_v \setminus L_h$. Since $L_v \setminus L_h$ is infinite, this allows us to conclude that $2^{O(h(n))} > n$ infinitely often, thus implying the existence of a constant $c$ such that $h(n) \ge c \log n$ for infinitely many integers $n$.
In the next theorem, we prove that this lower bound is optimal, by exhibiting a pda that matches it. The language accepted by the pda we present is $a^*$. It should be clear that such a pda is not the best machine for this language: instead of a trivial one-state finite automaton, we use an inefficient pda which requires an unbounded pushdown store.
Theorem 8 There exists a unary pda accepting every word $a^\ell$, $\ell > 0$, using pushdown height exactly $\lfloor \log_2 \ell \rfloor + 1$ and the empty word using height 0.
Proof Consider the pda $A = \langle Q, \{a\}, \Gamma, \delta, q_I, Z_0, q_F \rangle$, where $Q = \{q_I, q_1, q_2, q_F\}$, $\Gamma = \{Z_0, 0, 1\}$, and the transition function $\delta$ is defined as follows:
Fig. 3 The evolution of the pushdown store of $A$ during the recursive subroutine leading from $q_I$ to $q_F$, when recursive calls are made. The dashed lines should be replaced either by an $\varepsilon$-move or, recursively, by the same pattern.
According to the recursive subroutine implemented by $A$ (see also Fig. 3), we can write the following recurrence: which has solution $\ell(h) = 2^h - 1$. As a consequence, pushdown height $h$ is sufficient to accept all strings of length up to $2^h - 1$. Furthermore, since $\ell(h - 1) = 2^{h-1} - 1$ is the maximal length of strings accepted using height $h - 1$, we conclude that pushdown height $h$ is necessary and sufficient to accept all strings of length $\ell$, with $2^{h-1} \le \ell < 2^h$. Hence, for $\ell > 0$, the string $a^\ell$ is accepted using pushdown height exactly $\lfloor \log_2 \ell \rfloor + 1$.
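The last step of the proof, going from the closed form $\ell(h) = 2^h - 1$ to the height $\lfloor \log_2 \ell \rfloor + 1$, can be verified numerically. The sketch below takes the closed form as given and does not reconstruct the recurrence or the transitions of the pda; the function names are introduced here for illustration.

```python
def max_length(height: int) -> int:
    """Closed form 2^h - 1: the maximal length accepted within height h."""
    return 2 ** height - 1


def required_height(length: int) -> int:
    """Smallest h such that a string of the given length fits within
    height h, i.e. length <= 2^h - 1."""
    h = 0
    while max_length(h) < length:
        h += 1
    return h


# For length > 0 the result is floor(log2(length)) + 1, which is exactly
# the number of bits in the binary representation of the length.
for length in range(1, 1025):
    assert required_height(length) == length.bit_length()
```

Using `int.bit_length` avoids floating-point logarithms, which can be off by one near powers of two.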
Funding Open access funding provided by Università degli Studi di Milano within the CRUI-CARE Agreement.