The Complexity of Compressed Membership Problems for Finite Automata
Abstract
In this paper, a compressed membership problem for finite automata, both deterministic (DFAs) and nondeterministic (NFAs), with compressed transition labels is studied. The compression is represented by straight-line programs (SLPs), i.e. context-free grammars generating exactly one string. A novel technique of dealing with SLPs is employed: the SLPs are recompressed, so that substrings of the input word are encoded in the SLPs labelling the transitions of the NFA (DFA) in the same way as in the SLP representing the input text. To this end, the SLPs are locally decompressed and then recompressed in a uniform way. Furthermore, in order to reflect the recompression in the NFA, we need to modify it only a little; in particular, its size stays polynomial in the input size.
Using this technique it is shown that the compressed membership for NFA with compressed labels is in NP, thus confirming the conjecture of Plandowski and Rytter (Jewels Are Forever, pp. 262–272, Springer, Berlin, 1999) and extending the partial result of Lohrey and Mathissen (in CSR, LNCS, vol. 6651, pp. 275–288, Springer, Berlin, 2011); as this problem is known to be NP-hard (in Plandowski and Rytter, Jewels Are Forever, pp. 262–272, Springer, Berlin, 1999), we settle its exact computational complexity. Moreover, the same technique applied to the compressed membership for DFA with compressed labels yields that this problem is in P, and this problem is known to be P-hard (in Markey and Schnoebelen, Inf. Process. Lett. 90(1):3–6, 2004; Beaudry et al., SIAM J. Comput. 26(1):138–152, 1997).
Keywords
Compressed membership problem, SLP, Finite automata, Algorithms for compressed data
1 Introduction
1.1 Compression and Straight-Line Programmes
Due to the ever-increasing amount of data, compression methods are widely applied in order to decrease the data’s size. The stored data is processed from time to time and decompressing it on each occasion is wasteful. Thus there is a large demand for algorithms working directly on the compressed representation of the data, without explicit decompression. Such a task is not as hopeless as it may seem at first sight: it is a popular outlook that compression basically extracts the hidden structure of the text and if the compression rate is high, the text must have a lot of internal structure. So if data is compressed well, it has a structure that can be exploited by algorithms. In some sense such an intuition is correct: efficient algorithms for fundamental text operations (pattern matching, checking equality, etc.) are known for various practically used compression methods (LZ, LZW, their variants etc.) [10, 11, 12, 13, 15].
Practical compression methods, like LZW or LZ variants, differ both in the main idea and in details, and so the algorithms for data compressed using the various methods differ as well. This leads to a plethora of algorithms for various compression variants and string operations [2, 4, 7, 8, 10, 11, 12, 13, 15, 16, 17, 22, 23, 35, 36]. However, a different approach is also explored: for some applications and for most theory-oriented considerations it would be useful to model the practical compression standard by a more mathematically well-founded method. This idea lay at the foundations of the notion of Straight-Line Programmes (SLPs), whose instances can be simply seen as context-free grammars generating exactly one string. In particular, SLPs belong to a broader family of grammar-based compression methods.
SLPs are the most popular theoretical model of compression. This is on one hand motivated by a simple, ‘clean’ and appealing definition; on the other hand, they model well the LZ compression standard: each LZ-compressed text can be converted into an equivalent SLP with \(\mathcal {O}(\log( N / n))\) multiplicative increase in length and in \(\mathcal {O}(n \log(N/n))\) time (where N is the size of the decompressed text) [5, 40], while each SLP can be converted to an equivalent LZ-compressed text with a constant increase of length (and in linear time).
The approach of modelling compression by SLPs in order to develop efficient algorithms turned out to be fruitful. Algorithmic problems for SLP-compressed strings were considered and successfully solved [25, 26, 36]. In particular, the recent state-of-the-art efficient algorithms for pattern matching in LZ-compressed text essentially use the reformulation of LZ methods in terms of SLPs [11]. SLPs found their usage also in programme verification [14] as well as in verifying the bisimulation [6, 24]. Surprisingly, while SLPs were introduced mainly as a model for practical applications, they turned out to be useful also in strictly theoretical branches of computer science, for instance, in word equations [37, 38]; in particular, the currently best PSPACE bound was obtained in this fashion by Plandowski [37].
For more information about SLPs and their applications, see the recent survey of Lohrey [29].
1.2 Membership Problem
As SLPs are used both in theoretical and applied research in computer science, tools for dealing with them should be developed. In particular, one should be aware that whenever working with strings, these may be supplied as respective SLPs. Hence, all the usual string problems should be reinvestigated in the compressed setting, as the classical algorithms may not apply directly, be inefficient or the problems themselves may become computationally difficult.
From the language-theoretic point of view, the crucial question stated in terms of strings is that of compressed string recognition. To be more precise, we consider classic membership problems, i.e. recognition by automata, generation by a grammar etc., in which the input is supplied as an SLP. We refer to such problems as compressed membership problems. These were first studied in the pioneering work of Plandowski and Rytter [39], who considered the compressed membership problem for various formalisms for defining languages. Already in this work it was observed that we should precisely specify what part of the input is compressed. Clearly the input string is, but what about the language representation (i.e. regular expression, automaton, grammar, etc.)? Should it also be compressed or not? Both variants of the problem are usually considered, with the following naming convention: when only the input string is compressed, we use the name compressed membership; when the language representation is compressed as well, we prepend fully to the name.
In years to come, the compressed membership problem was investigated for various language classes [15, 20, 21, 27, 28, 39]. Compressed word problem for groups and monoids [27, 31, 32], which can be seen as a generalisation of membership problem, was also considered.
Despite the large attention in the research community, the exact computational complexity of some problems remained open. The most notorious of those is the fully compressed membership problem (FCMP) for NFA, considered already in the work of Plandowski and Rytter [39]. Here, the compression of NFA is done by allowing it to have transitions by strings, instead of single letters, and representing these strings as SLPs.
It is relatively easy to observe that the compressed membership problem for the NFA is in P; however, the status of the fully compressed variant remained open for over a decade. Some partial results were already obtained by Plandowski and Rytter [39], who observed that it is in PSPACE and is NP-hard for the case of a one-letter alphabet, both of these bounds being relatively natural. Moreover, they showed that this problem is in NP for some particular cases, for instance, for a one-letter alphabet. Further work on the problem was done by Lohrey and Mathissen [30], who demonstrated that if the strings defined by SLPs have polynomial periods, the problem is in NP, and when all strings are highly aperiodic, it is in P. Concerning the case of DFAs, it is known that even for a fixed regular language, the compressed membership is P-hard [3, 33], and no upper bound better than PSPACE (which holds for NFAs as well) was known.
1.3 Our Results and Techniques
We establish the computational complexity of fully compressed membership problems for both NFAs and DFAs.
Theorem 1
Fully compressed membership problem for NFA is in NP, for DFA it is in P.
Our approach to the problem is essentially different from the ones of Plandowski and Rytter [39] and Lohrey and Mathissen [30]. The earlier work focused on the properties of strings described by SLPs and tried to use this knowledge in order to analyse the automaton and its input, in the hope of obtaining an efficient algorithm. We take a completely different route: we analyse and change the way strings are described by the SLPs in the instance. That is, we focus on the SLPs, and not on the encoded strings. Roughly, our algorithm replaces a substring of two letters ab appearing in the input string with a new letter c throughout the instance and iterates this process. In this way strings in the instance are compressed ‘in the same way’. The intuition behind this is that already testing the equality of SLPs is a nontrivial task [36] and consequently performing other operations on SLPs is even more involved. When all strings are compressed in the same way, two appearances of the same string in the instance are represented in a canonical way, so, for instance, testing equivalence is trivial, and in fact other operations on the SLPs can be performed more efficiently.
In order to perform the compressions of a pair ab we first decompress the SLPs, so that appearances of ab can be easily identified. Since the decompressed text can be exponentially long, we do the decompression locally: we introduce explicit strings into the rules’ bodies. Then, we compress these explicit strings uniformly. Since such pieces of text are compressed in the same way, we can ‘forget’ about the original substrings of the input and treat the introduced nonterminals as atomic letters. Such recompression shortens the text significantly: one ‘round’ of recompression, in which every pair of letters that was present at the beginning of the ‘round’ is compressed, should shorten the encoded strings by a constant factor.
1.4 Similar Techniques
While the application of the idea of recompression to the compressed membership problem is new, related approaches were previously employed and to some extent inspired the presented technique: most notably, the idea of replacing short strings by a fresh letter and iterating this procedure was used by Mehlhorn et al. [34] in their work on a data structure for equality testing for dynamic strings. They viewed this process as ‘hashing’ or ‘signature building’. In particular their method can be straightforwardly applied to equality testing for SLPs, yielding a nearly cubic algorithm (as observed by Gawrychowski [9]); a faster implementation of the employed data structure was also proposed [1] and it leads to a nearly quadratic algorithm for SLP equivalence testing [9]. However, the internal technical details of the construction make the extension to FCMP problematic: while this method can be used to build ‘canonical’ SLPs for strings in the instance, there is no apparent way to control what these SLPs actually look like and how they encode the strings. Thus it is unknown how to modify the NFA and its transitions in the process of building canonical SLPs.
The (mentioned earlier) recent approach to FCMP by Mathissen and Lohrey [30] already implemented the idea of replacing strings with fresh letters as well as modifications of the instance that make such replacement possible. Also, while it was known that the problem is NP-hard already in the case of a one-letter alphabet [39], their algorithm used nondeterminism only in this case, thus suggesting that the nondeterminism in the problem is strongly connected to long blocks of the same letter. However, the replacement was not iterated, and the newly introduced letters could not be further compressed. Also, they replaced blocks of letters chosen by some predefined criteria; in particular, only parts of the instance were compressed.
1.5 Other Applications of the Technique
The technique of local recompression was also successfully applied to fully compressed pattern matching [18], obtaining a faster (i.e. almost quadratic) algorithm for this problem. Furthermore, a variant of this method was also applied in the area of word equations. While not claiming any essentially new results, the recompression approach yielded much simpler proofs and algorithms with smaller memory consumption (though still polynomial) for many classical results in the area, like the PSPACE algorithm for solving word equations, the double exponential bound on the size of the solution, the exponential bound on the exponent of periodicity, context-sensitiveness in the case of \(\mathcal {O}(1)\) variables, etc. [19].
2 Preliminaries
2.1 Straight Line Programmes
Formally, a straight-line programme (SLP) is a CFG such that each nonterminal has a unique rule and it cannot derive itself, i.e. the grammar is acyclic. Usually it is assumed that G is in Chomsky normal form, i.e. each production is either of the form X→YZ or X→a. By this assumption, strings defined by G’s nonterminals have length at most \(2^{n}\), where n is the number of nonterminals of the grammar; since our algorithm will replace some substrings by shorter ones, no string defined by the SLPs during the run of the algorithm will exceed this length.
Remark 1
During the run of CompMem, each string derived by a nonterminal has length at most \(2^{n}\).
We denote the unique string derived by nonterminal A by \(\operatorname {val}(A)\) (like value). A symbol is either a letter or a nonterminal. The notion of \(\operatorname {val}\) extends to strings of symbols in an obvious way.
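To make the definition concrete, here is a minimal Python sketch of an SLP together with \(\operatorname {val}\) and the lengths of the derived strings. The dictionary representation and the names `slp_lengths` and `val` are assumptions made for this illustration only; they are not part of the paper. The lengths are computed bottom-up without decompression, which is what makes the bound of Remark 1 easy to check even when the strings themselves are astronomically long.

```python
def slp_lengths(rules):
    """Length of val(X) for every nonterminal X, without decompressing.

    `rules` maps a nonterminal to the tuple of symbols in its rule body;
    every symbol that is not a key of `rules` is treated as a letter.
    """
    memo = {}

    def length(sym):
        if sym not in rules:          # a letter contributes one character
            return 1
        if sym not in memo:
            memo[sym] = sum(length(s) for s in rules[sym])
        return memo[sym]

    return {x: length(x) for x in rules}


def val(sym, rules):
    """Explicit decompression; exponential in general, use on small inputs."""
    if sym not in rules:
        return sym
    return "".join(val(s, rules) for s in rules[sym])


# X1 -> a, X_i -> X_{i-1} X_{i-1}: n nonterminals derive a string of length 2^(n-1).
rules = {"X1": ("a",)}
for i in range(2, 61):
    rules[f"X{i}"] = (f"X{i-1}", f"X{i-1}")

lengths = slp_lengths(rules)
```

With 60 nonterminals the last one derives a string of length 2^59, which respects the bound 2^n but could not be written out explicitly.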
2.2 Input
The instance of the fully compressed membership problem (FCMP) for NFA consists of an input string, represented by an SLP, and an NFA N, whose transitions are labelled by SLPs.
Strings of letters that appear in a rule’s body are said to appear explicitly in this rule and are called explicit strings; this notion is introduced to distinguish them from the substrings of \(\operatorname {val}(X_{i})\).
Without loss of generality we may assume that the input string starts and ends with designated, unique symbols, denoted as $ and #. These are not essential; however, the first and last letter of \(\operatorname {val}(X_{n})\) need to be treated in a somewhat special manner, and the transitions by $ and # in the NFA are also treated in a special way (for instance, there is a unique transition by each of them). Having special symbols for the first and last letter makes the analysis smoother.
2.3 Input Size, Complexity Classes
The size |G| of the representation of grammar G is the sum of the lengths of G’s rules’ bodies (we count ϵ as occupying a single entry). The size |N| of the representation of NFA N is the sum of the number of its states and transitions. The size |Σ| of alphabet Σ is simply the number of elements in Σ.
By npolytime (polytime) we denote the nondeterministic polynomial running time (deterministic, respectively), with respect to |N|, |Σ|, |G| and n. As usual, depending on the context, this describes either the running time of a specific algorithm, or a class of algorithms running in such time; the context will always uniquely determine which of these meanings applies. By NP (P, respectively) we denote the complexity classes of the decision problems solvable in npolytime (polytime, respectively).
The input instance size is polynomial in |N|, |G|, |Σ| and n, which denotes the number of nonterminals in G. One of the crucial properties of our algorithm is that n only decreases during the run of the algorithm. For this reason we modify the input instance so that its size is polynomial in n alone:
Remark 2
Without loss of generality we may assume that initially |N|, |Σ| and |G| are all at most n.
To satisfy this additional condition it is enough to add dummy nonterminals (with rules) that are not used anywhere in the instance. Clearly this increases the size of the instance by a constant factor.
2.4 Automata, Paths and Labels, Determinism
Since we investigate automata, proofs deal mainly with (accepting) paths for strings. The constructed NFAs have transitions labelled with either letters, or nonterminals of G. That is, \(\delta \subseteq Q \times (\varSigma \cup \mathcal{X}) \times Q\). Consequently, a path \(\mathcal{P}\) from state \(p_{1}\) to \(p_{k+1}\) (with intermediate states \(p_{2}, \ldots, p_{k}\)) is a sequence \(\alpha_{1} \alpha_{2} \dots \alpha_{k}\), where \(\alpha_{i} \in \varSigma \cup \mathcal{X}\) and \(\delta(p_{i},\alpha_{i},p_{i+1})\). We write that \(\mathcal{P}\) induces such a list of labels. The \(\operatorname {val}(\mathcal{P})\) defined by such a path \(\mathcal{P}\) is simply \(\operatorname {val}(\alpha_{1}\dots \alpha_{k})\). We also say that \(\mathcal{P}\) is a path for a string \(\operatorname {val}(\mathcal{P})\). A path is accepting, if it ends in an accepting state. We usually consider paths beginning in a starting state. A string w is accepted by N if there is an accepting path from the starting state for w.
We consider also DFAs with compressed labels. Let us comment on what ‘determinism’ means here: an NFA with compressed labels is deterministic, when for each state q and any two transitions from q labelled with α and α′, the first letters of \(\operatorname {val}(\alpha)\) and \(\operatorname {val}(\alpha')\) are different. Other known (meaningful) definitions of determinism are polynomially equivalent to the given one.
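As a point of reference for the definitions of paths and acceptance, the following Python sketch decides FCMP naively by decompressing everything. It is not the algorithm of this paper: it takes time proportional to the length of the decompressed strings, which may be exponential in the instance size. The names `expand` and `accepts_naive`, the triple representation of transitions and the assumption that letters are single characters are all choices made for this illustration.

```python
def expand(sym, rules):
    """val(sym): letters are single characters, nonterminals are keys of `rules`."""
    if sym not in rules:
        return sym
    return "".join(expand(s, rules) for s in rules[sym])


def accepts_naive(delta, start, accepting, rules, input_symbol):
    """Search over pairs (state, position in the decompressed input).

    `delta` is a collection of triples (p, label, q), where label is a
    letter or a nonterminal that does not derive the empty string.
    """
    w = expand(input_symbol, rules)
    seen = {(start, 0)}
    stack = [(start, 0)]
    while stack:
        q, i = stack.pop()
        if i == len(w) and q in accepting:
            return True               # an accepting path for the whole of w
        for (p, label, r) in delta:
            if p != q:
                continue
            s = expand(label, rules)
            if w.startswith(s, i):    # the label matches w at position i
                nxt = (r, i + len(s))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return False


rules = {
    "A": ("a", "b"),
    "B": ("A", "A"),
    "X": ("$", "B", "A", "#"),        # val(X) = $ababab#
    "Y": ("$", "B", "a", "#"),        # val(Y) = $ababa#
}
delta = [(0, "$", 1), (1, "A", 1), (1, "#", 2)]   # accepts $(ab)*#
```

Here `accepts_naive(delta, 0, {2}, rules, "X")` follows the path with labels $, A, A, A, #, while for `"Y"` no path exists.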
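The determinism condition can be checked without decompression, since the first letter of \(\operatorname {val}(\alpha)\) is found by following the first symbol of each rule body. A small Python sketch, assuming the same hypothetical representation (rules as a dictionary, transitions as triples) and assuming that no nonterminal derives ϵ:

```python
def first_letter(sym, rules):
    """First letter of val(sym); follows leftmost symbols of rule bodies."""
    while sym in rules:
        sym = rules[sym][0]
    return sym


def is_deterministic(delta, rules):
    """True if no state has two outgoing labels whose values start alike."""
    seen = set()
    for (p, label, _q) in delta:
        key = (p, first_letter(label, rules))
        if key in seen:
            return False
        seen.add(key)
    return True


det_rules = {"A": ("a", "b"), "B": ("a", "c")}
```

For example, transitions from the same state by A and by the letter b are allowed together, while transitions by A and by B are not, as both values start with a.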
2.5 Known Results
We use the following basic result: when the input string is over a one-letter alphabet, the FCMP is in NP for NFA and in P for DFA.
Lemma 1
(cf. [39, Theorem 5])
The FCMP restricted to the input string over an alphabet Σ={a} is in NP for NFA and in P for DFA.
Proof
Notice that if the input string w is over an alphabet {a}, no accepting path in the NFA for w may use transitions that denote strings having letters other than a. Thus, any such transitions can be deleted from N and we end up with an instance in which Σ={a}, i.e. also transitions in the NFA are labelled either with the letter a or by nonterminals defining some powers of a. Then the result of Plandowski and Rytter [39, Theorem 5] can be applied directly, giving an NP upper bound.
Their proof follows by an observation that when Σ={a} then an accepting path in the NFA exists if and only if the NFA satisfies an Eulerian-type condition: each state is entered and left the same number of times. Since each transition is used at most exponentially many times, a description of such a path can be guessed and then it can be verified whether it satisfies the condition and defines a word of appropriate length.
Consider now the deterministic automaton. As shown in the beginning, we can limit ourselves to transitions by powers of a. Since the automaton is deterministic, for each state there is at most one transition labelled with a power of a, and so the path for w cycles after at most n transitions. As the length of the cycle can be calculated in polytime, the whole problem can be easily checked in polytime. □
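The deterministic case of the proof can be sketched in Python as follows. The sketch assumes that the lengths of the powers of a labelling the transitions have already been computed from the SLPs (they fit in polynomially many bits) and that the input is \(a^{m}\); the function name and the dictionary representation are hypothetical. The walk detects the first repeated state, skips as many full cycles as fit, and then finishes the remaining part step by step.

```python
def unary_dfa_accepts(trans, start, accepting, m):
    """Does the unary DFA accept a^m?

    `trans` maps a state to (length, target): its single outgoing
    transition, labelled by a power a^length with length >= 1.
    """
    first_seen = {}                   # state -> letters consumed on first visit
    state, used = start, 0
    skipped = False
    while used < m:
        if state not in trans:
            return False              # stuck before the input is consumed
        if not skipped and state in first_seen:
            cycle = used - first_seen[state]
            used += (m - used) // cycle * cycle   # skip whole cycles at once
            skipped = True
            continue
        first_seen[state] = used
        length, target = trans[state]
        if used + length > m:
            return False              # the label overshoots the end of the input
        used += length
        state = target
    return state in accepting


# 0 -a^3-> 1 -a^2-> 2 -a^5-> 1; state 2 is reached after 5 + 7k letters.
trans = {0: (3, 1), 1: (2, 2), 2: (5, 1)}
```

All arithmetic is on integers of polynomially many bits, so the walk stays polynomial even for inputs such as \(a^{5 + 7 \cdot 10^{30}}\).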
3 Basic Classifications and Outline of the Algorithm
In this section we present the outline of the algorithm for FCMP for NFAs. Its main part consists of recompression, i.e. replacing strings appearing in \(\operatorname {val}(X_{n})\) by shorter ones. In some cases such replacing is harder, in others easier, and identifying the hard cases is the first step in this section.
3.1 (Non)Crossing Appearances, Maximal Blocks
We say that a string s has a crossing appearance in a (unique string derived by) nonterminal \(X_{i}\) with a production \(X_{i} \to u X_{j} v X_{k} w\), if s appears in \(\operatorname {val}(X_{i})\), but this appearance is contained in none of u, v, w, \(\operatorname {val}(X_{j})\) and \(\operatorname {val}(X_{k})\). Intuitively, this appearance ‘crosses’ the symbols in \(u\operatorname {val}(X_{j})v\operatorname {val}(X_{k})w\), i.e. at the same time part of s is in the explicit substring (u, v or w) and part is in the compressed strings (\(\operatorname {val}(X_{j})\) or \(\operatorname {val}(X_{k})\)). This notion is defined similarly for nonterminals whose productions contain only one nonterminal, i.e. are of the form \(X_{i} \to u X_{j} v\); productions of the form \(X_{i} \to u\) clearly do not have crossing appearances.
A string s has a crossing appearance in the NFA N, if there is a path in N inducing a list of labels \(\alpha_{1} \alpha_{2}\), where \(\alpha_{1}, \alpha_{2} \in \mathcal{X} \cup \varSigma\) with at least one of \(\alpha_{1}\), \(\alpha_{2}\) being a nonterminal, such that s appears in \(\operatorname {val}(\alpha_{1}\alpha_{2})\), but this appearance is contained neither in \(\operatorname {val}(\alpha_{1})\) nor in \(\operatorname {val}(\alpha_{2})\). The intuition is similar to the case of a crossing appearance in a rule: it is possible that a string s is split between two transitions’ labels. Still, there is nothing difficult in consecutive letter transitions, thus we treat such a case as a simple one.
We say that a pair of letters ab is a crossing pair, if ab has a crossing appearance of any kind. Otherwise, such a pair is noncrossing. Unless explicitly written, whenever we talk about crossing/noncrossing pair ab we assume that a≠b.
We say that a letter a∈Σ has a crossing block, if for some ℓ the a ^{ ℓ } has a crossing appearance; otherwise, a has no crossing block. This can be equivalently characterised by saying that a has a crossing block if and only if aa is a crossing pair.
The letters with crossing blocks and crossing pairs correspond to the intuitive notion of being ‘hard’ to compress.
The following lemma shows that while G may encode long strings, they have relatively few different short substrings and that these can be established efficiently (recall that n is the number of all nonterminals in G; more precisely, they are \(X_{1}, \ldots, X_{n}\)).
Lemma 2
There are at most 2n different letters with crossing blocks and at most |G|+4n different pairs of letters appearing in \(\operatorname {val}(X_{1}), \ldots, \operatorname {val}(X_{n})\).
The set of letters with crossing blocks, the set of crossing pairs and the set of noncrossing pairs appearing in \(\operatorname {val}(X_{1}), \ldots, \operatorname {val}(X_{n})\) can be computed in polytime.
Proof
Since a letter a has a crossing block if and only if aa is a crossing pair, it follows that if a has a crossing block then it is either the first, or the last letter of some \(\operatorname {val}(X_{i})\). Since there are at most n nonterminals, there are at most 2n letters with a crossing block. In order to calculate the set of letters with a crossing block, it is enough to calculate the set of crossing pairs, as a has a crossing block if and only if aa is a crossing pair. Hence, in the rest of this proof a crossing pair may also be of the form aa.
For a rule \(X_{i} \to u X_{j} v X_{k} w\), a pair ab has a crossing appearance in \(X_{i}\) in each of the following cases:
• a is the last letter of u and b is the first letter of \(\operatorname {val}(X_{j})\),
• a is the last letter of \(\operatorname {val}(X_{j})\) and b is the first letter of v,
• a is the last letter of v and b is the first letter of \(\operatorname {val}(X_{k})\),
• a is the last letter of \(\operatorname {val}(X_{k})\) and b is the first letter of w.
The above description can be turned into a straightforward algorithm computing both the list of all noncrossing and the list of all crossing pairs appearing in \(\operatorname {val}(X_{1})\), …, \(\operatorname {val}(X_{n})\). First, the list of all pairs of letters with such appearances is calculated: clearly, it is enough to read every rule (for \(X_{i}\)) and store the pairs that appear in the explicit strings and the pairs that are assigned to \(X_{i}\). Then, for each pair of letters it should be decided whether it is crossing. To this end, we check whether it has a crossing appearance in any nonterminal or in N, which can be done in polytime. Such pairs are crossing, the others are noncrossing. Lastly, we select the pairs of the form aa from these lists, which gives the list of letters with crossing blocks. □
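The grammar part of this computation can be sketched in Python. The sketch first computes the first and last letter of every \(\operatorname {val}(X_{i})\) and then reads each rule body once, recording the pair formed at every boundary where at least one of the two adjacent symbols is a nonterminal. It covers crossing appearances in the rules only, not in the NFA; it assumes that no nonterminal derives ϵ; and the names and the dictionary representation are hypothetical. Pairs of the form aa are reported as well, so letters with crossing blocks can be read off the result.

```python
def boundary_letters(rules):
    """First and last letter of val(X) for every nonterminal X."""
    first, last = {}, {}

    def visit(x):
        if x not in rules:            # a letter is its own first and last letter
            return x, x
        if x not in first:
            body = rules[x]
            first[x] = visit(body[0])[0]
            last[x] = visit(body[-1])[1]
        return first[x], last[x]

    for x in rules:
        visit(x)
    return first, last


def crossing_pairs_in_grammar(rules):
    """Pairs ab with a crossing appearance in some rule of the grammar."""
    first, last = boundary_letters(rules)
    pairs = set()
    for body in rules.values():
        for s, t in zip(body, body[1:]):
            if s in rules or t in rules:      # the boundary touches a nonterminal
                pairs.add((last.get(s, s), first.get(t, t)))
    return pairs


cp_rules = {
    "X1": ("a", "b"),                 # val = ab
    "X2": ("X1", "a", "X1"),          # val = abaab
    "X3": ("b", "X2", "c"),           # val = babaabc
}
```

On this grammar the result is {ba, aa, bc}: the pair ab appears only inside explicit strings and is therefore noncrossing in the rules, while a has a crossing block.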
The notions of (non)crossing pairs do not apply to aa; still, an analogue can be defined: for a letter a∈Σ we say that \(a^{\ell}\) is an a’s maximal block of length ℓ (or simply an ℓ-block), if it appears in some string defined by some nonterminal and it is surrounded by letters other than a; formally, if there exist two letters x,y∈Σ, where x≠a≠y, and a nonterminal \(X_{i}\), such that \(x a^{\ell} y\) is a substring of \(\operatorname {val}(X_{i})\). Similarly to crossing pairs, it can be shown that there are not too many different maximal blocks of a.
Lemma 3
For a letter a there are at most |G|+4n different lengths of a’s maximal blocks in \(\operatorname {val}(X_{1})\), …, \(\operatorname {val}(X_{n})\). The set of these lengths can be calculated in polytime.
The proof of Lemma 3 is similar to the proof of Lemma 2; however, Lemma 3 is not shown now, instead, we give a proof of a stronger Lemma 13 in a later section.
3.2 Outline of the Algorithm
• blocks compression of a: For each \(a^{\ell}\) that is an ℓ-block in \(\operatorname {val}(X_{n})\) and ℓ>0, replace all a’s ℓ-blocks in \(\operatorname {val}(X_{1})\), …, \(\operatorname {val}(X_{n})\) by a fresh letter \(a_{\ell}\). Modify N accordingly.
• pair compression of ab: For two different letters a, b such that substring ab appears in \(\operatorname {val}(X_{1})\), …, \(\operatorname {val}(X_{n})\) replace each substring ab in \(\operatorname {val}(X_{1})\), …, \(\operatorname {val}(X_{n})\) by a fresh letter c. Modify N accordingly.
We denote the string obtained from w by a’s blocks compression by BC _{ a }(w), and the string obtained by compression of a pair ab into c by PC _{ ab→c }(w).
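On explicit (uncompressed) strings, the two operations can be sketched in Python as below. This only illustrates what \(BC_{a}\) and \(PC_{ab \to c}\) do to a string; the difficulty addressed in the paper is performing them on SLPs and on the NFA. The sketch represents a string as a list of symbols and the fresh letter \(a_{\ell}\) as the tuple `(a, l)`; it leaves blocks of length 1 untouched and also compresses blocks at the very ends of the list. All of these are simplifying assumptions of the illustration.

```python
def pair_compression(w, a, b, c):
    """PC_{ab->c}(w) for a != b: replace every occurrence of ab by c."""
    assert a != b                     # occurrences of ab cannot overlap
    out, i = [], 0
    while i < len(w):
        if w[i] == a and i + 1 < len(w) and w[i + 1] == b:
            out.append(c)
            i += 2
        else:
            out.append(w[i])
            i += 1
    return out


def block_compression(w, a):
    """BC_a(w): replace each maximal block a^l with l >= 2 by the symbol (a, l)."""
    out, i = [], 0
    while i < len(w):
        if w[i] != a:
            out.append(w[i])
            i += 1
            continue
        j = i
        while j < len(w) and w[j] == a:   # scan to the end of the maximal block
            j += 1
        out.append((a, j - i) if j - i >= 2 else a)
        i = j
    return out
```

For instance, `pair_compression(list("abaab"), "a", "b", "c")` yields the symbols c, a, c, and `block_compression(list("$aabaaab#"), "a")` yields $, (a, 2), b, (a, 3), b, #.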
We adopt the following notational convention throughout the rest of the paper: whenever we refer to a letter \(a_{\ell}\), it means that the last block compression was done for a and \(a_{\ell}\) replaced a’s ℓ-blocks.
The preprocessing, which is described in detail later, modifies the instance slightly, so that the number of crossing pairs can be upper-bounded in terms of n: note that as crossing pairs can come from the NFA transitions, in general there can be as many as Ω(|N|) crossing pairs, which is more than the analysis can handle.
Some remarks about CompMem:
• There is no explicit nondeterministic operation in the code; however, it appears implicitly in the term ‘modify the NFA accordingly’ in lines 12 and 14. Roughly, to perform such a modification, one needs to solve FCMP for the string \(a^{\ell}\), and this is known to be NP-complete.
• The compression (both of pairs and blocks) is never applied to $, nor to #. The markers were introduced so that we do not bother with strange behaviour when the first or last letter is compressed, and so we do not touch the markers.
• CompMem, as presented, is deterministic; however, some of its subprocedures are nondeterministic. This can give the false impression that it is in the class \(\mathrm{P}^{\mathrm{NP}}\). However, we never alter the results returned by the nondeterministic procedures, and so CompMem is in fact in NP; this is formally stated and proved in the later sections.
Ideally, each letter of the input is compressed and so the \(\operatorname {val}(X_{n})\) halves in an iteration of the main loop. The worst case scenario is not far from the ideal behaviour.
Lemma 4
There are \(\mathcal {O}(n)\) executions of the loop in line 1 of CompMem.
Proof
Consider any 2 consecutive letters ab, where a≠$ and b≠#, appearing in \(\operatorname {val}(X_{n})\) at the beginning of the loop starting in line 1. We show that at least one of these two letters is compressed before the next execution of this loop. In this way, if we partition \(\operatorname {val}(X_{n})\) into blocks of 4 consecutive letters, each block is shortened by at least one letter in each iteration of the loop from line 1. Thus the length of \(\operatorname {val}(X_{n})\) drops to at most 3/4 of its previous value in each iteration and so this loop is executed at most \(\mathcal {O}( n )\) times, as the input instance satisfies \(|\operatorname {val}(X_{n})| \leq 2^{n}\), see Remark 1.
Assume for the sake of contradiction that none of letters a, b is compressed during this iteration of the loop.
If a≠b, then ab is going to be included in P or P′ in line 3 or 4, respectively, depending on whether ab is crossing or not. Then CompMem will attempt to compress ab, either in line 6 or 8, and this fails only if one of the letters a or b was already compressed. This contradicts the assumption that none of the a, b was compressed.
So suppose now that a=b and that none of these letters is compressed. In particular, none of these two appearances of a were compressed in line 6 or 8. Thus, a will be listed in L′ in line 10 or in L in line 9, depending on whether it has crossing appearances or not. In the latter case, CompMem will compress the maximal block, in which these letters appear, in line 12, in the former in line 14. In either case, these letters a are compressed, a contradiction with the assumption that they were not. □
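The counting behind the bound of Lemma 4 can be spelled out as follows. This is a rough calculation that ignores the two end markers and rounding; the symbol \(\ell_{k}\), denoting the length of \(\operatorname {val}(X_{n})\) after k iterations of the main loop, is introduced here only for illustration.

\[
\ell_{0} \leq 2^{n}, \qquad \ell_{k+1} \leq \tfrac{3}{4}\,\ell_{k}
\quad\Longrightarrow\quad
\ell_{k} \leq \left(\tfrac{3}{4}\right)^{k} 2^{n},
\]
\[
\left(\tfrac{3}{4}\right)^{k} 2^{n} \leq 1
\iff
k \geq \frac{n}{\log_{2}(4/3)} \approx 2.41\, n .
\]

Hence after roughly 2.41n iterations the string cannot be shortened any further, which is consistent with the \(\mathcal {O}(n)\) bound.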
Remark 3
Notice that pair compression \(PC_{ab \to c}\) is in fact introducing a new nonterminal c with a production c→ab; similarly, \(BC_{a}\) introduces nonterminals \(a_{\ell}\) with productions \(a_{\ell} \to a^{\ell}\) (notice that in order to transform this production to Chomsky normal form, the introduction of some other nonterminals is needed). Hence, CompMem creates new SLPs that encode strings from the instance. However, these new nonterminals are never expanded; they are always treated as individual symbols. Thus it is better to think of them as letters. Moreover, the analysis of the running time of CompMem relies on the fact that no new nonterminals are introduced by CompMem.
4 Details
In this section we describe in detail how to implement the block compression and pair compression and how to modify the NFA. In particular, we are going to formulate the connections between NFA and SLPs preserved during CompMem.
4.1 Invariants
(Aut 1) every transition of N is labelled by a single letter of Σ (letter transition) or by a nonterminal (nonterminal transition) that does not define ϵ; each nonterminal labels at most one transition. No transition is labelled with \(X_{n}\).
(Aut 2) there is a unique starting state that has a unique outgoing transition labelled by letter $, and no incoming transitions; there is no other transition by $. Similarly, there is a unique accepting state that has a unique incoming transition labelled by letter #; it does not have any outgoing transitions; there is no other transition by # in N.
We assume that the input instance satisfies (SLP 1)–(Aut 2), moreover that the input grammar is in the Chomsky normal form. It is routine to transform (in polytime) the input instances not satisfying these conditions into equivalent instances that satisfy them; the only (seemingly) nontrivial one is the second requirement of (Aut 2) that there is a unique accepting state with a unique incoming transition for a DFA: to satisfy this condition we add two symbols #_{1}# to the end of the X _{ n } and create two new states in N, p _{1} and p. Then we make a transition by #_{1} from each accepting state to p _{1} and a unique transition from p _{1} to p by #. Lastly, p becomes the unique accepting state.
4.2 Compression of Pairs
The compression of noncrossing pairs is intuitively easy: whenever these appear in strings encoded by G or on paths in N, they cannot be split between nonterminals or between transitions. So we replace their explicit appearances in the grammar and in the NFA. This is formalised and shown in the first subsection.
The compression of the crossing pairs does not directly follow this approach; however, for a fixed crossing pair ab we show that a simple transformation of the SLP makes ab a noncrossing pair, so that it consequently can be compressed using the known procedure. Thus, the compression of crossing pairs also can be done: first we fix a pair ab, then transform the instance so that ab is noncrossing, and then compress ab using the already described procedure for compression of a noncrossing pair. There can be as many as Ω(|N|) crossing pairs; however, a simple preprocessing reduces this number to \(\mathcal {O}(n)\), see Lemma 10 later in this section. In particular, the transformation is used in total \(\mathcal {O}(n)\) many times. This is described in detail in the second subsection.
4.2.1 Compression of Noncrossing Pairs
To distinguish between the input and output G and N, we utilise the following convention: ‘unprimed’ names refer to the input (like G, X _{ i }, N), while ‘primed’ symbols refer to the output (like G′, \(X_{i}'\), N′). This convention is used in lemmata concerning algorithms throughout the paper.
Lemma 5
PairComp(ab,c) runs in polytime and preserves (SLP 1)–(Aut 2). When applied to a noncrossing pair of letters ab, where a,b∉{$,#}, it implements the pair compression, i.e. \(\operatorname {val}(X_{i}') = PC_{ab \to c}(\operatorname {val}(X_{i}))\), for each X _{ i }.
N′ recognises \(\operatorname {val}(X_{n}')\) if and only if N recognises \(\operatorname {val}(X_{n})\). If N is a DFA, so is N′.
If de was a noncrossing pair in G, N and d≠c≠e then de is also a noncrossing pair in G′, N′.
Proof
The bound on the running time is obvious from the code.
Since PairComp only modifies the grammar by shortening some strings in the productions (it does not create ϵ-rules), and it does not affect $ and # in the rules, (SLP 1)–(SLP 2) are preserved. The only modification in N is the introduction of new transitions by a single letter (namely, by c) between states that are joined by a path for ab. Moreover, if there is a new transition δ _{ N′}(p′,c,q′), then p′ had an outgoing transition by a∉{$,#}, and so it was neither a starting nor an accepting state, and q′ had an incoming transition by b∉{$,#}, and similarly it was neither a starting nor an accepting state. Thus (Aut 1)–(Aut 2) hold for N′ as well. Notice that if N is deterministic, so is N′: suppose that there are two different transitions starting with a letter d from state p in N′. If d≠c, then these two transitions are also present in N and they begin with the same letter, which is not possible, as N is deterministic. If d=c, then either in N there are two transitions from p whose strings begin with a or there is a unique such transition, but δ(p,a) has two transitions whose strings begin with b. In both cases this is a contradiction.
We now show that N′ recognises \(\operatorname {val}(X_{n}')\) if and only if N recognises \(\operatorname {val}(X_{n})\). To this end we demonstrate, how PairComp affects \(\operatorname {val}(X_{i})\):
Claim 1
Proof
Notice that as a≠b, PC _{ ab→c } is well defined for each string.
The second claim similarly establishes how the pair compression of a noncrossing pair affects the NFA; more precisely, it describes what happens to a string defined by a path in the NFA after pair compression is applied to the underlying NFA.
Claim 2
Proof
Similarly as in Claim 1, notice that as ab is a noncrossing pair, the appearance of ab in the string defined by \(\mathcal{P}\) cannot be split between a nonterminal and a string (or another nonterminal). Thus, replacement of pairs ab takes place either wholly inside string u or inside \(\operatorname {val}(X_{i})\). The former is done explicitly by PC _{ ab→c }, while (2) establishes the form of the latter. This ends the proof of the claim. □
After proving Claims 1–2, it is easy to show the main thesis of the lemma, i.e. that \(\operatorname {val}(X_{n}')\) is accepted by N′ if and only if \(\operatorname {val}(X_{n})\) is accepted by N.

if there is a transition δ _{ N }(p,d,q) for a letter d∈Σ in N, then there is the same transition δ _{ N′}(p,d,q) in N′.

if there is a path from p to q for a string ab in N then there is a transition δ _{ N′}(p,c,q) in N′.

if there is a letter transition δ _{ N′}(p,c,q) in N′, there is a path from p to q for a string ab in N.

if there is a letter transition δ _{ N′}(p,d,q) for a letter d≠c, there is the same transition δ _{ N }(p,d,q) in N.
It is left to show that \(\operatorname {val}(\mathcal{P}) = \operatorname {val}(X_{n})\). Since PC _{ ab→c } is a one-to-one function on strings that do not contain c, and neither \(\operatorname {val}(\mathcal{P})\) nor \(\operatorname {val}(X_{n})\) contains c, it is enough to show that \(PC_{ab \to c}(\operatorname {val}(\mathcal{P})) = PC_{ab \to c}(\operatorname {val}(X_{n}))\). Notice that the latter equals \(\operatorname {val}(X_{n}')\), by (2).
Finally, consider the last claim of the lemma: let a pair de∈P be a noncrossing pair. Since the whole modification to G is the replacement of pairs ab by a letter c, if de (where d≠c≠e) had no crossing appearances in G, then it does not have them in G′. Similarly, observe that transitions by d and e in N and N′ are the same, and if \(\operatorname {val}(X_{i}')\) begins with e (ends with d) then also \(\operatorname {val}(X_{i})\) begins with e (ends with d, respectively). Thus, if de has a crossing appearance in N′ then it also has it in N. □
4.2.2 Crossing Pair Compression
In this section we first show how to transform a crossing pair to a noncrossing one, so that PairComp can be applied to it. Then, we show how to perform a preprocessing after which the number of different crossing pairs is at most linear.
There are two possible reasons why ab is a crossing pair: it may be that it is a crossing pair for some nonterminal, or it has a crossing appearance in the NFA. In both cases, the problem is caused by the fact that b is the first letter of some \(\operatorname {val}(X_{i})\) or a is the last letter of some \(\operatorname {val}(X_{i})\). To ‘fix’ this, we ‘pop’ such b and a from the respective nonterminals: consider b and suppose that \(\operatorname {val}(X_{i}) = bw\). Then we modify G so that \(\operatorname {val}(X_{i}') = w\) and modify N to reflect it, if X _{ i } labels a transition in N. Clearly, after performing such an operation for each nonterminal (including the symmetric procedure for a), b can still be the first letter of \(\operatorname {val}(X_{i})\) (or a can be the last letter of \(\operatorname {val}(X_{i})\), respectively); however, we show that ab is no longer a crossing pair and that these procedures can be easily performed.
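The popping of a leading b can be sketched in Python as follows. This is a minimal sketch of the grammar part only, under the assumptions that rules are lists of letters (one-character strings) and nonterminal indices (integers, numbered from 0, referring only to earlier rules) and that no input rule is empty; the function names are hypothetical, the paper's own LeftPop may differ in details, and the NFA update is omitted.

```python
def val(rules, i):
    """Expand nonterminal i into the string it generates."""
    return "".join(x if isinstance(x, str) else val(rules, x) for x in rules[i])


def left_pop(rules, b):
    """Pop one leading b from every nonterminal whose value begins with b."""
    rules = [list(body) for body in rules]
    for i in range(len(rules)):
        # Earlier nonterminals beginning with b were already replaced by
        # b X', so the value of X_i begins with b iff its rule does.
        if not rules[i] or rules[i][0] != b:
            continue
        rules[i] = rules[i][1:]            # remove the leading b from the rule
        emptied = not rules[i]             # X_i now generates the empty string
        for j in range(i + 1, len(rules)):
            new = []
            for x in rules[j]:
                if isinstance(x, int) and x == i:
                    new.append(b)          # X_i becomes b X_i'
                    if not emptied:
                        new.append(i)      # drop X_i' if it generates epsilon
                else:
                    new.append(x)
            rules[j] = new
    return rules


# Toy grammar (hypothetical).
rules = [
    ["b", "a"],             # X_0 -> ba
    [0, "b"],               # X_1 -> X_0 b
    ["b"],                  # X_2 -> b
    ["a", 2, 1],            # X_3 -> a X_2 X_1
    ["$", 3, 0, "#"],       # X_4 -> $ X_3 X_0 #
]
popped = left_pop(rules, "b")
```

Processing the nonterminals in increasing order is what makes the single pass sufficient: by the time a rule is inspected, every earlier nonterminal that began with b has already pushed its b into that rule.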
Lemma 6
LeftPop runs in polytime and when applied to b∉{$,#} it preserves (SLP 1)–(Aut 2). If \(\operatorname {val}(X_{i})=bu\) for some u∈Σ ^{∗} then \(\operatorname {val}(X_{i}') = u\); otherwise \(\operatorname {val}(X_{i}') = \operatorname {val}(X_{i})\).
N′ accepts \(\operatorname {val}(X_{n}')\) if and only if N accepts \(\operatorname {val}(X_{n})\). If N is deterministic, so is N′.
Proof
The loop is executed n times, and also each line of the code can be performed in polytime, and so in total LeftPop runs in polytime.
Concerning the preservation of invariants: since the only operation performed on G is replacing nonterminal X _{ i } by \(bX_{i}'\) and then deleting the first letter of the nonterminal, and nonterminals generating ϵ are explicitly removed from the rules, the resulting grammar is in the form (1a)–(1c). Notice that the invariant (SLP 1) is clearly preserved by the listed operations. As b∉{#,$}, the first and last symbol of \(\operatorname {val}(X_{n})\) are not modified, and so (SLP 2) holds as well.
Let us move to the NFA invariants: the only change applied to NFA is the replacement of the transitions δ _{ N }(p,X _{ i },q) by a path δ _{ N′}(p,b,p _{1}), δ _{ N′}(p _{1},X _{ i },q), or by δ _{ N′}(p,b,q), where b is the first letter of \(\operatorname {val}(X_{i})\). Clearly this does not affect (Aut 1)–(Aut 2). For the same reason, if N is deterministic, so is N′.
 if the rule for X _{1} is X _{1}→bu for some u∈Σ ^{∗}

then this b is removed from the rule and each appearance of X _{1} in the rules’ bodies is replaced with \(b X_{1}'\). Hence, \(\operatorname {val}(X_{1}) = b\operatorname {val}(X_{1}')\) and for each other nonterminal \(\operatorname {val}(X_{j}) = \operatorname {val}(X_{j}')\).
 if the rule for X _{1} is X _{1}→u for some u not beginning with b

then nothing is changed and so \(\operatorname {val}(X_{j}) = \operatorname {val}(X_{j}')\).
So consider an inductive step, let LeftPop consider the nonterminal X _{ i+1}. We distinguish three copies of nonterminals now: the original one (so X _{ i+1}), the one obtained after the processing of X _{ i } but before X _{ i+1} (those are denoted with primes, i.e. \(X_{i+1}'\)) and the ones that are obtained after considering X _{ i+1} (which are denoted with double prime, i.e. \(X_{i+1}''\)). By the inductive assumption \(\operatorname {val}(X_{i+1}) = \operatorname {val}(X_{i+1}')\). If \(\operatorname {val}(X_{i+1})\) does not begin with b, then the rule for \(X_{i+1}'\) does not begin with b either and so LeftPop performs no action and we are done. So suppose that \(\operatorname {val}(X_{i+1})\) begins with b. By the inductive assumption \(X_{i+1}'\) defines the same string, and so begins with b as well. We claim that the first letter in the rule for \(X_{i+1}'\) is b: as \(\operatorname {val}(X_{i+1}')\) begins with b, the only other option is that it begins with some nonterminal \(X_{k}'\) for k<i. But then a contradiction is easily obtained: \(\operatorname {val}(X_{k}')\) begins with b and so it can be concluded that \(\operatorname {val}(X_{k})\) begins with b as well, as by inductive assumption \(\operatorname {val}(X_{k}) = \operatorname {val}(X_{k}')\) or \(\operatorname {val}(X_{k}) = b\operatorname {val}(X_{k}')\), and both these strings begin with b. But as \(\operatorname {val}(X_{k})\) begins with b, by the code of LeftPop, X _{ k } was replaced with \(bX_{k}'\) and this b is still in the rule for \(X_{i+1}'\).
So the first letter in the rule for \(X_{i+1}'\) is b, and LeftPop removes this b from the rule and replaces each \(X_{i+1}'\) in the rules by \(bX_{i+1}''\). Thus \(\operatorname {val}(X_{i+1}) = \operatorname {val}(X_{i+1}') = b\operatorname {val}(X_{i+1}'')\). And in the rules, each \(X_{i+1}'\) was replaced with \(bX_{i+1}''\), which evaluate to the same string, so values of other nonterminals have not changed.
We now show the second claim that N′ accepts exactly the same strings as N. The only change done in the NFA is the replacement of transitions of the form δ _{ N }(p,X _{ i },q) by a path whose list of labels consists of b and \(X_{i}'\), where b is the first letter of \(\operatorname {val}(X_{i})\), or by a transition δ _{ N }(p,b,q), when \(\operatorname {val}(X_{i}) = b\). Let us consider the former case, the latter is similar. Notice that \(\operatorname {val}(X_{i}) = b\operatorname {val}(X_{i}')\) and so the new path denotes the same string as the replaced transition. Furthermore, the newly introduced state in the middle of this path has only one incoming and one outgoing transition. Since b∉{#,$}, the starting and accepting states were not modified, and so both automata recognise the same strings. Since \(\operatorname {val}(X_{n}) = \operatorname {val}(X_{n}')\) (again by b∉{#,$}), this shows the claim. □
A symmetric variant RightPop of LeftPop, which pops the ending a from each nonterminal is easily defined. It satisfies the symmetric analogue of Lemma 6:
Lemma 7
RightPop runs in polytime and preserves (SLP 1)–(Aut 2). If \(\operatorname {val}(X_{i})=ua\) then \(\operatorname {val}(X_{i}') = u\); otherwise \(\operatorname {val}(X_{i}') = \operatorname {val}(X_{i})\).
N′ accepts \(\operatorname {val}(X_{n}')\) if and only if N accepts \(\operatorname {val}(X_{n})\). If N is deterministic, so is N′.
Running both of LeftPop(b) and RightPop(a) makes a pair ab noncrossing.
Lemma 8
After running LeftPop(b) and RightPop(a) the pair ab is noncrossing.
Proof
There are two possible reasons why the pair ab is crossing: it is crossing in a rule or it is crossing in the NFA. Consider first the former. As previously, let primed nonterminals, like \(X_{i}'\), denote the nonterminals in the instance after the application of LeftPop(b) and RightPop(a), and X _{ i } the ones before this application.

\(aX_{j}'\) appears in the rule and the first letter of \(\operatorname {val}(X_{j}')\) is b,

\(X_{j}'b\) appears in the rule and a is the last letter of \(\operatorname {val}(X_{j}')\),

\(X_{j}'X_{k}'\) appears in the rule, where a is the last letter of \(\operatorname {val}(X_{j}')\) and b is the first letter of \(\operatorname {val}(X_{k}')\).
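The three configurations listed above can be checked mechanically. The sketch below is only an illustration for the grammar part: it assumes rules given as nonempty lists of letters (strings) and earlier nonterminal indices (integers, from 0), and the function names are hypothetical. Crossing appearances in the NFA are not covered.

```python
def first_last(rules):
    """First and last letter of val(X_i) for every i (no rule body is empty)."""
    fl = []
    for body in rules:
        x, y = body[0], body[-1]
        first = x if isinstance(x, str) else fl[x][0]
        last = y if isinstance(y, str) else fl[y][1]
        fl.append((first, last))
    return fl


def is_crossing_in_rules(rules, a, b):
    """True iff ab has a crossing appearance in some rule of the grammar."""
    fl = first_last(rules)
    for body in rules:
        for x, y in zip(body, body[1:]):
            if isinstance(x, str) and isinstance(y, str):
                continue    # two explicit letters: not a crossing appearance
            left = x if isinstance(x, str) else fl[x][1]    # last letter on the left
            right = y if isinstance(y, str) else fl[y][0]   # first letter on the right
            if (left, right) == (a, b):
                return True
    return False


# Toy grammar (hypothetical).
rules = [
    ["b", "a"],     # X_0 -> ba
    ["a", 0],       # X_1 -> a X_0, so ab crosses the boundary of X_0
    [1, 0],         # X_2 -> X_1 X_0
]
```

Only the first and last letter of each nonterminal are needed, so the check never expands the grammar.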
So suppose that ab has a crossing appearance in the NFA. Then there are two consecutive transitions α and β, such that the last letter of \(\operatorname {val}(\alpha)\) is a and the first letter of \(\operatorname {val}(\beta)\) is b. Furthermore, at least one of α, β is a nonterminal. Without loss of generality assume that \(\beta = X_{i}'\) and let the transition be from state p to state q. As in the previous case, from the fact that the first letter of \(\operatorname {val}(X_{i}')\) is b we conclude that LeftPop modified X _{ i }. In particular, the unique transition going to p is labelled with b, thus α=b, which contradicts the assumption that the last letter of \(\operatorname {val}(\alpha)\) is a. □
Lemma 9
CrPairComp runs in polytime and preserves (SLP 1)–(Aut 2). N′ accepts \(\operatorname {val}(X_{n}')\) if and only if N accepts \(\operatorname {val}(X_{n})\). If N is deterministic, so is N′.
It implements pair compression for ab, in the sense that \(\operatorname {val}(X_{n}') = PC_{ab \to c}(\operatorname {val}(X_{n}))\).
Proof
The running time follows by Lemmata 5, 6 and 7.
Note that by Lemma 8 after the application of LeftPop(b) and RightPop(a) the pair ab is noncrossing.
By Lemmata 6–7, LeftPop(b) and RightPop(a) preserve (SLP 1)–(Aut 2) and after their application N recognises \(\operatorname {val}(X_{n})\) if and only if N′ recognises \(\operatorname {val}(X_{n}')\). As ab is noncrossing, Lemma 5 guarantees that after the application of PairComp(ab,c), N recognises \(\operatorname {val}(X_{n})\) if and only if N′ recognises \(\operatorname {val}(X_{n}')\).
Lastly, by Lemmata 6–7 the value of \(\operatorname {val}(X_{n})\) does not change when LeftPop(b) and RightPop(a) are applied and as ab is a noncrossing pair when PairComp(ab,c) is applied, Lemma 5 guarantees that \(\operatorname {val}(X_{n}') = PC_{ab \to c}(\operatorname {val}(X_{n}))\). □
We apply CrPairComp to each of the crossing pairs separately, and each such application increases the size of N and G. Thus it would be good to bound the number of crossing pairs in terms of n rather than in terms of N and n. In general, this is not possible, however, a simple preprocessing reduces the number of crossing pairs to \(\mathcal {O}(n)\). It is enough to ‘pop’ the first and last letter from each of the nonterminals.
Lemma 10
N′ accepts \(\operatorname {val}(X_{n}')\) if and only if N accepts \(\operatorname {val}(X_{n})\). If N is deterministic, so is N′.
Proof
The proof is similar to the proof of Lemma 6 and thus it is omitted. The only (slight) difference is that now we need to show that for each nonterminal X _{ i }, when it is considered, its rule begins and ends with a letter. Still, this is easy: suppose that the rule for X _{ i } begins with X _{ j }. But then, when X _{ j } was considered, X _{ j } was replaced in all rules with aX _{ j }, for some letter a, see line 4 of PreProc. In particular, this was done in the rule for X _{ i } and so the rule for X _{ i } cannot start with a nonterminal. Symmetric analysis shows that the rule cannot end with a nonterminal. □
As promised, after PreProc the number of crossing pairs is \(\mathcal {O}(n)\):
Lemma 11
After PreProc there are at most 2n crossing pairs.
Proof
 1. in some rule there is a substring aX _{ i } (or X _{ j } X _{ i });
 2. there is a pair of consecutive transitions α and X _{ i } in the NFA;
 3. in some rule there is a substring X _{ i } a (or X _{ i } X _{ j });
 4. there is a pair of consecutive transitions X _{ i } and α in the NFA.
So consider substrings aX _{ i } and a′X _{ i } that both appear in rules. Then both a and a′ were obtained in the same way: they were introduced in the respective rules in line 4 of PreProc and there was no way to change them afterwards. Hence, a=a′ is the unique letter popped in line 4 of PreProc. Note that this analysis also shows that a substring X _{ j } X _{ i } cannot appear in any rule. In the same way, consider the transition by X _{ i } in the NFA N, let it be from p to q. Line 11 of PreProc guarantees that there is a unique incoming transition to p, which is labelled by the same letter as the one popped in line 4 of PreProc, i.e. a. Consequently, cases (1)–(2) can introduce one crossing pair per nonterminal.
A symmetric analysis applies to (3)–(4). □
4.3 Blocks Compression
The block compression is very similar in spirit to pair compression; the only additional difficulty is the fact that blocks can be long, i.e. up to exponential. In the case of letters without a crossing block, the compression can be done as in the case of noncrossing pairs, i.e. by replacing explicit blocks in G and adding some transitions to N. The case of a letter with a crossing block is reduced to the simpler case of a letter without such a block: similarly to the compression of crossing pairs, we need to ‘pop’ letters from the beginning and the end of a nonterminal. However, this time popping one letter is not enough: we need to remove the whole a-prefix and a-suffix.
4.3.1 Compression of Noncrossing Blocks
Consider a letter a that has no crossing block. Then every maximal block of a in \(\operatorname {val}(X_{1})\), …, \(\operatorname {val}(X_{n})\) is an explicit substring in one of the rules’ bodies; so we simply replace explicit a ^{ ℓ } by a fresh letter a _{ ℓ } in rules’ bodies, for each ℓ. Furthermore, as a’s block cannot have a crossing appearance in N, when a ^{ ℓ } is a substring of a string defined by a path in N, then a ^{ ℓ } appears wholly inside a nonterminal transition, or a ^{ ℓ } labels a path using letter transitions only. The former case is taken care of by the compression of a’s maximal blocks in G, and in the latter case for each a ^{ ℓ } and each pair of states p and q we check whether there is a path for a ^{ ℓ } from p to q using letter transitions only, which is done using the method from Lemma 1.
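To make the effect of maximal block compression on a single string concrete, here is a small Python sketch. It works on an explicit (uncompressed) string purely for illustration; the function name `block_comp`, the naming of fresh letters as strings such as `a_3`, and the `min_len` parameter (which lets the caller decide whether blocks of length 1 are renamed) are assumptions of this sketch rather than part of the construction above.

```python
from itertools import groupby


def block_comp(s, a, min_len=1):
    """Replace each maximal block a^l with l >= min_len by a fresh letter 'a_l'.

    The result is a list of letters, since fresh letters have longer names.
    """
    out = []
    for letter, run in groupby(s):
        length = len(list(run))          # length of this maximal block
        if letter == a and length >= min_len:
            out.append(f"{a}_{length}")  # fresh letter a_l
        else:
            out.extend([letter] * length)
    return out
```

For example, `block_comp("baaabaab", "a")` yields the letters b, a_3, b, a_2, b. In the setting of this section the blocks can be exponentially long, so the actual procedure never expands them; it renames explicit blocks in the rules instead.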
Lemma 12
Suppose that BlockCompNcr is applied for a letter a∉{$,#} without crossing blocks. Then it preserves (SLP 1)–(Aut 2) and properly implements maximal block compression, i.e. \(\operatorname {val}(X_{i}') = BC_{a}(\operatorname {val}(X_{i}))\) for each X _{ i }.
The operations in line 6 of BlockCompNcr can be performed in npolytime, other operations can be performed in polytime.
N recognises \(\operatorname {val}(X_{n})\) if and only if N′ recognises \(\operatorname {val}(X_{n}')\) for some nondeterministic choices. If N is a DFA, so is N′.
If b that is not of the form a _{ ℓ } for any ℓ had no crossing blocks in G, N, then it does not have them in G′, N′.
This lemma is shown in a stronger version in the next section, as Lemma 17.
4.3.2 Removing Crossing Blocks of a Letter
It was already mentioned that a has a crossing block if and only if aa is a crossing pair. We know a method transforming a crossing pair to a noncrossing pair, see Lemma 8; however, it essentially assumes that the pair consists of two different letters: in short, when aX _{ i } appears in the rule and we left-pop a letter a from X _{ i }, then the pair aa is still crossing whenever \(\operatorname {val}(X_{i})\) still begins with a. This is fixed in the most straightforward manner: we keep left-popping the letters from X _{ i }, until the first letter of \(\operatorname {val}(X_{i})\) is different from a. In other words, it is enough to remove each nonterminal’s a-prefix (and a-suffix). To be more precise: fix i and let \(\operatorname {val}(X_{i}) = a^{\ell_{i}} u a^{r_{i}}\), where u neither starts nor ends with a. Then our goal is to modify G so that \(\operatorname {val}(X_{i}') = u\). (If \(\operatorname {val}(X_{i})\) is a power of a, we simply set u=ϵ and r _{ i }=0.) This can be done in a bottom-up fashion, by two subprocedures, one of which removes the prefix and the other the suffix; these subprocedures work similarly to LeftPop and RightPop. We need to modify the NFA accordingly: it is enough to replace the transition labelled with X _{ i } by a path consisting of three transitions, labelled with \(a^{\ell_{i}}\), \(X_{i}'\) and \(a^{r_{i}}\).
The removed a-prefixes and a-suffixes can be exponentially long, and so we store them in the rules in a succinct way, i.e. a ^{ ℓ } is represented as (a,ℓ); the size of the representation of ℓ is \(\mathcal {O}(\log \ell)\), i.e. \(\mathcal {O}(n)\), see Remark 1. We say that such a grammar is in an a-succinct form. The NFA N might have transitions labelled with a ^{ ℓ }, which are stored in a succinct way as well. We say that N satisfies a-relaxed (Aut 1), if its transitions are labelled by nonterminals, a single letter or by a ^{ ℓ }, where ℓ≤2^{ n }. The semantics of the new automaton should be clear: it can traverse the transition labelled with a ^{ ℓ } when consuming a ^{ ℓ } from the beginning of the input word. Similarly, this automaton is deterministic, if for any two transitions, labelled with α and β, originating from the same state, the first letters of \(\operatorname {val}(\alpha)\) and \(\operatorname {val}(\beta)\) are different.
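The removal of the a-prefix with powers stored as pairs (a,ℓ) can be sketched as follows. This is a minimal sketch of the grammar part only, not the paper's CutPref itself: rules are assumed to be lists whose symbols are letters (strings), powers of a (tuples `(a, l)`), or earlier nonterminal indices (integers, from 0), the function names are hypothetical, and the NFA update is omitted.

```python
def val(rules, i):
    """Expand X_i (feasible only for small exponents; used for checking)."""
    out = []
    for x in rules[i]:
        if isinstance(x, int):
            out.append(val(rules, x))
        elif isinstance(x, tuple):
            out.append(x[0] * x[1])      # the power (a, l) stands for a^l
        else:
            out.append(x)
    return "".join(out)


def cut_pref(rules, a):
    """Remove the a-prefix of every nonterminal, bottom-up."""
    rules = [list(body) for body in rules]
    for i in range(len(rules)):
        body, ell, k = rules[i], 0, 0
        # Sum the lengths of the explicit a's at the beginning of the rule.
        while k < len(body):
            x = body[k]
            if x == a:
                ell += 1
            elif isinstance(x, tuple) and x[0] == a:
                ell += x[1]
            else:
                break                    # another letter or a nonterminal
            k += 1
        if ell == 0:
            continue
        rules[i] = body[k:]
        emptied = not rules[i]           # X_i was a power of a
        for j in range(i + 1, len(rules)):
            new = []
            for x in rules[j]:
                if isinstance(x, int) and x == i:
                    new.append((a, ell)) # X_i becomes a^l X_i', stored succinctly
                    if not emptied:
                        new.append(i)
                else:
                    new.append(x)
            rules[j] = new
    return rules


# Toy grammar (hypothetical).
rules = [
    ["a", "a", "b"],        # X_0 -> aab
    ["a", "a", "a"],        # X_1 -> aaa
    [1, 0, "a"],            # X_2 -> X_1 X_0 a
    [("a", 5), 2, "b"],     # X_3 -> a^5 X_2 b
    ["$", 3, 1, "#"],       # X_4 -> $ X_3 X_1 #
]
cut = cut_pref(rules, "a")
```

Stopping the scan at the first nonterminal is safe because every earlier nonterminal has already lost its a-prefix and, if it became empty, was removed from the rule.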
Before stating the appropriate algorithms, we show that even with such a succinct representation, the number of different lengths of maximal blocks of a is also linearly bounded, similarly as in Lemma 3.
Lemma 13
(stronger variant of Lemma 3)
For a letter a and a grammar G, which can be given in an a-succinct form, there are at most |G|+4n different lengths of a’s maximal blocks in \(\operatorname {val}(X_{1}), \ldots, \operatorname {val}(X_{n})\). The set of these lengths can be calculated in polytime.
Proof
Consider first the maximal blocks of a that are fully contained within some rule’s body. Then each symbol (a letter a or a string of letters a ^{ ℓ }, represented by one symbol) can be uniquely assigned to the maximal block to which it belongs; in particular, there are at most |G| such blocks. To calculate the lengths of such blocks, it is enough to read the explicit strings in the rules, adding the appropriate lengths, which can be done in polytime, as these lengths are at most 2^{ n }.
The other maximal blocks are the ones with crossing appearances. Assign a maximal block to the nonterminal X _{ i } with the smallest i, such that a ^{ ℓ } is a substring of \(\operatorname {val}(X_{i})\) and a ^{ ℓ } has a crossing appearance in X _{ i }. Then there are at most 4 such blocks assigned to this rule: suppose it is of the form X _{ i }→uX _{ j } vX _{ k } w, then a ^{ ℓ } stretches over u and \(\operatorname {val}(X_{j})\) or over \(\operatorname {val}(X_{j})\) and v or v and \(\operatorname {val}(X_{k})\) or \(\operatorname {val}(X_{k})\) and w. Hence, there are at most 4n such lengths of maximal blocks. To calculate the lengths of these crossing blocks it is enough to calculate first the length of the a-prefix and a-suffix of each nonterminal, which is done in a straightforward bottom-up manner. Then it is enough to look at the rules and calculate the lengths of the a blocks stretching over both explicit letters and nonterminals in the rule. This is easily doable in polytime. □
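The bottom-up calculation from the proof can be sketched in Python as follows. The sketch keeps, for each nonterminal, only the lengths of its a-prefix and a-suffix and whether its value is a pure power of a, and never decompresses the grammar. The encoding of rules (letters as strings, powers of a as tuples `(a, l)`, earlier nonterminals as integers from 0) and the function name are assumptions made for illustration.

```python
def block_lengths(rules, a):
    """Lengths of the maximal blocks of a in val(X_0), ..., val(X_{n-1})."""
    info = []          # info[i] = (a-prefix length, a-suffix length, is pure power of a)
    lengths = set()
    for body in rules:
        # Translate the rule into runs of a (integers) and separators (None).
        pieces = []
        for x in body:
            if x == a:
                pieces.append(1)
            elif isinstance(x, tuple) and x[0] == a:
                pieces.append(x[1])
            elif isinstance(x, int):
                pre, suf, pure = info[x]
                # A pure power contributes one run; otherwise its a-prefix and
                # a-suffix are separated by the non-a interior of val(X_x).
                pieces.extend([pre] if pure else [pre, None, suf])
            else:
                pieces.append(None)      # a letter different from a
        # Merge adjacent runs into maximal blocks.
        runs, cur = [], 0
        for p in pieces:
            if p is None:
                runs.append(cur)
                cur = 0
            else:
                cur += p
        runs.append(cur)
        lengths.update(r for r in runs if r > 0)
        info.append((runs[0], runs[-1], len(runs) == 1))
    return lengths


# Toy grammar (hypothetical); note the block a^9 crossing X_2, a and X_0 in X_3.
rules = [
    ["a", "a", "b", "a"],       # X_0 -> aaba
    [("a", 3)],                 # X_1 -> a^3
    [0, 1, "b", 1, 1],          # X_2 -> X_0 X_1 b X_1 X_1
    ["$", 2, "a", 0, "#"],      # X_3 -> $ X_2 a X_0 #
]
found = block_lengths(rules, "a")
```

Since Python integers are unbounded, the same function handles exponents such as `2**100` without expanding anything, which mirrors the point that the lengths are at most 2^{ n } and have a representation linear in n.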
Lemma 14
Let \(\operatorname {val}(X_{i}) = a^{\ell_{i}}u_{i}\), where u _{ i } does not begin with a. After CutPref(a) \(\operatorname {val}(X_{i}') = u_{i}\).
N accepts \(\operatorname {val}(X_{n})\) if and only if N′ accepts \(\operatorname {val}(X_{n}')\). If N is a DFA, so is N′.
Proof
We first explain how to calculate the a-prefix \(a^{\ell_{i}}\) of \(\operatorname {val}(X_{i})\): since G is in a-succinct form, this might be nonobvious. It is enough to scan the explicit strings stored in the productions’ right-hand sides, summing the lengths of the consecutive appearances of a. This clearly works in polytime also for G stored in an a-succinct form, as the powers of a may have length at most 2^{ n }, see Remark 1, and so the length of their representation is linear in n (the correctness of this approach is shown later).
The running time of CutPref(a) is in polytime, as the loop has n iterations and all lines can be performed in polytime.
Concerning the preservation of the invariants: as in each rule there are at most two nonterminals and each nonterminal introduces at most one block of a to the rule (in line 5 of CutPref), in each rule of the grammar at most 2 maximal blocks of a are introduced. They may be long; however, in compressed form we treat them as single symbols. In this way the rules of G are stored in an a-succinct form, which was explicitly allowed. Then the a-prefix is removed and if X _{ i } defines ϵ, it is removed from the right-hand sides of the productions. This does not affect (SLP 1)–(SLP 2) (recall that a is neither $ nor #). Since the NFA is also changed, we inspect the invariants regarding N: introducing a new state p _{1} and replacing the transition δ _{ N }(p,X _{ i },q) by two transitions \(\delta_{N'}(p,a^{\ell_{i}},p_{1})\), \(\delta_{N'}(p_{1},X_{i}',q)\) preserves (Aut 1)–(Aut 2), with the exception that it a-relaxes (Aut 1); the same holds in the case when \(\operatorname {val}(X_{i}') = \epsilon\) and the transition δ(p,X _{ i },q) is replaced with \(\delta(p,a^{\ell_{i}},q)\).
Notice that if N is deterministic, so is N′: as already mentioned, the only change done to N is the replacement of a transition by X _{ i } by a path of two transitions \(a^{\ell_{i}}\) and \(X_{i}'\), such that the first letter of \(\operatorname {val}(X_{i})\) is a and the state in the middle has exactly one incoming and one outgoing transition. This preserves determinism of the automaton: let X _{ i } label a transition from p to q and let p _{1} be the new state in the middle. Then p _{1} has one outgoing transition; furthermore, the first letter of the word defined by the label of the transition from p to p _{1} is a, so the same as it used to be for X _{ i }, which led from p to q. As N was deterministic, no other transition from p had a as the first letter, and so also no such transition, except the one to p _{1}, exists in N′.

CutPref correctly calculates the length of the a-prefix of \(\operatorname {val}(X_{i})\), i.e. ℓ _{ i },

\(\operatorname {val}(X_{i}) = a^{\ell_{i}} \operatorname {val}(X_{i}')\).
For i=1 notice that the whole production for X _{1} is stored explicitly, and so CutPref correctly calculates the a-prefix of \(\operatorname {val}(X_{1})\) and after its removal, \(\operatorname {val}(X_{1}) = a^{\ell_{1}} \operatorname {val}(X_{1}')\). Furthermore, as each X _{1} in the rules was replaced with \(a^{\ell_{1}} X_{1}'\), the \(\operatorname {val}(X_{j})\) for j>1 is not changed.
It is left to show that N accepts \(\operatorname {val}(X_{n})\) if and only if N′ accepts \(\operatorname {val}(X_{n}')\). To this end, notice that the only modification to N is the replacement of the transition of the form δ _{ N }(p,X _{ i },q) by a path labelled with \(a^{\ell_{i}},X_{i}'\) (or by \(a^{\ell_{i}}\) alone). Furthermore, the vertex inside this path has only one incoming and one outgoing transition. The path labelled with \(a^{\ell_{i}},X_{i}'\) defines the string \(a^{\ell_{i}}\operatorname {val}(X_{i}')\), which was already shown to be \(\operatorname {val}(X_{i})\). It is left to observe that the newly introduced state in the middle of the path is not accepting, nor starting. Hence the starting (accepting) states of N and N′ coincide, and so each string is accepted by N if and only if it is accepted by N′. □
A symmetric algorithm CutSuff, which removes the a-suffix, is also defined; it has properties similar to those of CutPref.
Lemma 15
CutSuff(a) for a∉{$,#} runs in polytime and preserves (SLP 1)–(Aut 2), except that it a-relaxes (Aut 1). G′ is in the a-succinct form.
Let \(\operatorname {val}(X_{i}) = u_{i}a^{r_{i}}\), where u _{ i } does not end with a. After CutSuff(a) the \(\operatorname {val}(X_{i}') = u_{i}\).
N accepts \(\operatorname {val}(X_{n})\) if and only if N′ accepts \(\operatorname {val}(X_{n}')\). If N is a DFA, so is N′.
In accordance with the intuition provided at the beginning of this subsection, and similarly to the case of crossing pairs and the procedures LeftPop and RightPop, it can be easily shown that after applying CutPref(a) and CutSuff(a) the letter a no longer has crossing blocks.
Lemma 16
After the application of CutPref(a) and CutSuff(a), for each nonterminal X _{ i } neither the first nor the last letter of \(\operatorname {val}(X_{i})\) is a; in particular, the letter a has no crossing blocks.
Proof
Observe that by Lemmata 14 and 15 the letter a is neither the first nor the last letter of any \(\operatorname {val}(X_{i})\). Hence, it cannot have crossing blocks. □
Since after CutPref and CutSuff the letter a no longer has crossing blocks, we may compress its maximal blocks using BlockCompNcr. Some small tweaks are needed to accommodate the a-succinct form of G and the fact that N is a-relaxed: the nontrivial part of BlockCompNcr was the application of Lemma 1, which works in npolytime also for such large powers of a. Other actions of BlockCompNcr generalise in a simple way.
Lemma 17
BlockCompNcr can be extended, so that it applies to instances satisfying (SLP 1)–(SLP 2) with G in the a-succinct form and a-relaxed (Aut 1)–(Aut 2). The output satisfies (SLP 1)–(Aut 2) and the claim of Lemma 12 applies to such an extension.
Proof

restrict NFA N to transitions by powers of a,

make p the unique starting state,

make q the unique accepting state.
Concerning the preservation of invariants: we first show that G′ is not in the a-succinct form and that N′ is not a-relaxed. When BlockCompNcr finishes its work, all maximal blocks of a are replaced; in particular, there are no succinct representations of powers of a inside the grammar. Similarly, there should be no transition labelled with a ^{ ℓ } in the NFA. To this end a new line should be added at the end of BlockCompNcr, so that all transitions by powers of a are removed.
Now we can return to showing the preservation of the invariants: since the only change to the productions consists of replacing maximal blocks of a by a single letter, (SLP 1)–(SLP 2) are preserved. Also, the only modification to the NFA is the addition of new letter transitions. Thus, (Aut 1) holds. To see that also (Aut 2) holds, notice that if p receives a new incoming (outgoing) transition in N′, this transition is of the form a _{ ℓ } and p had an incoming (outgoing, respectively) transition by a ^{ ℓ } in N. In particular, the starting and accepting states remain unaffected and no transitions by $ and # are introduced. Thus, also (Aut 2) holds for N′.
Notice that if N is deterministic, so is N′: suppose that there are two different transitions in N′ whose strings start with d. If d is not one of the new letters, then the same transitions were present in N, contradiction. So suppose it is one of the new letters, say a _{ ℓ }. Observe that by Lemma 16 none of the strings \(\operatorname {val}(X_{i})\) started with a, and so after the block compression none of them starts with a _{ ℓ }. So this means that there are two letter transitions starting from state p and labelled with a _{ ℓ }. Thus, in N there are two paths for the string a ^{ ℓ } starting from p, which is a contradiction.
Before showing the main property of BlockCompNcr, we briefly comment on the last claim of the lemma: if BlockCompNcr is applied to a letter a and b≠a _{ ℓ } had no crossing blocks before this application, then it also does not have after the application. This should be obvious: application of BlockCompNcr(a) introduces some new transitions to N, by letters other than b, changes transitions by a ^{ ℓ } into fresh letters (other than b) and changes a’s blocks in G into fresh letters (again other than b); so all these operations do not influence, whether b has crossing blocks or not.
We proceed to the proof of the main property of BlockCompNcr: N accepts \(\operatorname {val}(X_{n})\) if and only if for some nondeterministic choices N′ accepts \(\operatorname {val}(X_{n}')\). To this end we first show, how application of BlockCompNcr affects the words defined by G and the NFA N.
Claim 3
Claim 4
The proofs are analogous to the proofs of Claims 1–2 in Lemma 5 and are thus omitted. Notice that the properties stated in Claims 3–4 do not depend on the nondeterministic choices of BlockCompNcr.
It is left to show the main claims of the lemma: N recognises \(\operatorname {val}(X_{n})\) if and only if the NFA N′ obtained for some nondeterministic choices recognises \(\operatorname {val}(X_{n}')\).

if there is a transition \(\delta_{N'}(p,X_{i}',q)\) in N′ then there is a transition δ _{ N }(p,X _{ i },q) in N;

if there is a transition δ _{ N′}(p,b,q) for \(b \neq a_{\ell_{k}}\) in N′, then there is a transition δ _{ N }(p,b,q) in N;

if there is a transition \(\delta_{N'}(p,a_{\ell_{k}},q)\) in N′ for some \(a^{\ell_{k}}\) that is a maximal block in one of \(\operatorname {val}(X_{1}), \ldots, \operatorname {val}(X_{n})\), then there is a path from p to q for the string \(a^{\ell_{k}}\) in N.
Suppose now that N accepts \(\operatorname {val}(X_{n})\). Consider the case in which BlockCompNcr always made a correct nondeterministic choice, i.e. that each time it correctly guessed in line 6.

if there is a transition δ _{ N }(p,X _{ i },q) in N then there is a transition \(\delta_{N'}(p,X_{i}',q)\) in N′;

if there is a transition δ _{ N }(p,b,q) for b≠a in N, then there is a transition δ _{ N′}(p,b,q) in N′;

if there is a path in N from p to q for a string a ^{ ℓ } that is a maximal block in \(\operatorname {val}(X_{n})\), then there is a transition δ _{ N′}(p,a _{ ℓ },q) in N′ (by the assumption that BlockCompNcr guessed correctly).
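The correspondence between paths for a ^{ ℓ } in N and letter transitions by a _{ ℓ } in N′ can be sketched for a simplified setting. The sketch below assumes that every a-transition of the automaton is labelled by the single letter a, so that reachability by a ^{ ℓ } can be computed deterministically by repeated squaring of the one-step relation. This is not the procedure of the text, where transitions may carry compressed labels and the check is a nondeterministic guess (line 6 of BlockCompNcr). All names (`compose`, `power`, `add_block_transitions`) and the encoding of a _{ ℓ } as `(a, l)` are hypothetical.

```python
def compose(r, s):
    """Relational composition: pairs (p, q) with p -r-> m -s-> q for some m."""
    succ = {}
    for m, q in s:
        succ.setdefault(m, set()).add(q)
    return {(p, q) for p, m in r for q in succ.get(m, ())}


def power(rel, states, l):
    """Pairs (p, q) joined by a path of exactly l steps of `rel` (l >= 0)."""
    result = {(p, p) for p in states}  # zero steps: identity relation
    base = set(rel)
    while l:
        if l & 1:
            result = compose(result, base)
        base = compose(base, base)     # square the relation
        l >>= 1
    return result


def add_block_transitions(states, transitions, a, lengths):
    """Add (p, (a, l), q) whenever a^l labels a path from p to q.

    Assumption: transitions are triples (p, letter, q) with single-letter
    labels; existing transitions are kept, new ones are only added.
    """
    step = {(p, q) for p, x, q in transitions if x == a}
    new = set(transitions)
    for l in lengths:
        for p, q in power(step, states, l):
            new.add((p, (a, l), q))
    return new
```

In this simplified setting the added transitions depend only on the block lengths ℓ that actually occur, which mirrors the fact that only maximal blocks of \(\operatorname {val}(X_{1}), \ldots, \operatorname {val}(X_{n})\) receive fresh letters.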
CutPref, CutSuff and BlockCompNcr can now be used to implement the block compression for an arbitrary letter, whether or not it has crossing blocks.
Lemma 18
It works in npolytime; the only operation requiring nondeterminism is line 6 of BlockCompNcr, other operations can be performed in polytime.
N recognises \(\operatorname {val}(X_{n})\) if and only if N′ recognises \(\operatorname {val}(X_{n}')\) for some nondeterministic choices. If N is DFA, so is N′.
If b, which is not of the form a _{ ℓ } for any ℓ, had no crossing blocks in G, N, then it does not have them in G′, N′.
Proof
This easily follows from Lemmas 14, 15 and 17. □
4.4 Running Time and Correctness
Since the running time of each algorithm is npolytime, it is enough to show that the sizes of Σ, G and N are always polynomial in n (recall that n is unchanged throughout CompMem).
Lemma 19
During CompMem, the sizes of Σ, G, N are polynomial in n.
Proof
We first bound the size of G. We show that at the beginning of each iteration of the main loop in CompMem each right-hand side of a production has at most 64n+16 explicit letters, and that inside each iteration of the main loop of CompMem at most 16n+4 new symbols are added to a rule (we exclude the letters replacing compressed strings). Notice that during the run of CompMem the grammar G may be in succinct form, and accordingly we treat a ^{ ℓ } as one symbol.
 popping letters done by PreProc

(line 4) In this way at most 4 new letters can be introduced to a rule.
 popping letters by LeftPop

(line 5) Each invocation of LeftPop introduces at most 2 new symbols to a rule. As LeftPop is used once for each crossing pair and there are at most 2n such pairs, see Lemma 11, at most 4n new symbols are added to a rule in this way in each iteration of the main loop of CompMem.
 popping letters by RightPop

An analysis symmetric to the previous case gives the same bound of 4n on the number of letters introduced in this way to a rule.
 cutting a prefix in CutPref

(line 5) There are at most 2 new powers of a (all possibly in succinct form) that may be introduced to a rule in one invocation of CutPref. While these powers of a are written in a succinct representation, they will be all replaced by single letters in the later blocks compression for a. In total, CutPref is invoked for each letter with a crossing block, and there are at most 2n such letters, by Lemma 2. So there are at most 4n new symbols introduced in this way for one iteration of the main loop of CompMem.
 cutting a suffix in CutSuff

An analysis symmetric to the one for the cutting of a prefix yields that at most 4n letters are introduced to a rule in one phase.
 compression of a noncrossing pair

(line 2 of PairComp) Each compression of a noncrossing pair decreases the total length of explicit strings used in G by at least 1. Since the size of each right-hand side is at most 64n+16 at the beginning of the iteration and at most 16n+4 new letters are added to a rule in each iteration, there can be at most 80n ^{2}+20n such compressions, and so at most as many new letters are added in this way.
 compression of a crossing pair

(call to PairComp made by CrPairComp) The crossing pairs compression is run for each of the crossing pairs, and there are at most 2n of them, see Lemma 11. So, at most 2n letters are introduced in this way.
 block compression of a letter without a crossing block

(line 4 of BlockCompNcr) The same argument as in the case of compression of noncrossing pairs applies.
 block compression of a letter with a crossing block

(call to BlockCompNcr in BlockComp) There are at most 2n letters with crossing blocks, by Lemma 2, and each of them has at most |G|+4n different lengths of maximal blocks, by Lemma 13. Compressing all of them introduces at most 2n(|G|+4n) new letters to Σ.
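The per-rule bounds from the popping and cutting cases above can be tallied as follows. This only restates the figures quoted in the list; the final multiplication by n assumes, as a reading of the argument, that G has at most n productions.

```latex
\begin{align*}
\underbrace{4}_{\text{PreProc}}
+ \underbrace{4n}_{\text{LeftPop}}
+ \underbrace{4n}_{\text{RightPop}}
+ \underbrace{4n}_{\text{CutPref}}
+ \underbrace{4n}_{\text{CutSuff}} &= 16n + 4, \\
(64n + 16) + (16n + 4) &= 80n + 20, \\
n \cdot (80n + 20) &= 80n^{2} + 20n.
\end{align*}
```

The first line is the number of new symbols added to a rule in one iteration, the second bounds the length of a right-hand side before any compression is applied, and the third is the total over all rules, which matches the stated bound on the number of noncrossing pair compressions.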
 popping letters in PreProc

(line 10) This introduces at most two states per nonterminal transition (one for popping the first letter and one for the last letter). As there are at most n nonterminal transitions, by (Aut 1), this adds at most 2n states.
 leftpopping letters in LeftPop

(line 11) LeftPop introduces one state per nonterminal transition and there are at most n such transitions, by (Aut 1). Furthermore, LeftPop is invoked once per each crossing pair, i.e. at most 2n times, see Lemma 11. So, at most 2n ^{2} states are introduced in this way.
 rightpopping letters in RightPop

The same analysis as in the previous case yields that at most 2n ^{2} states are introduced in this way.
 cutting a prefix in CutPref

(in line 11) CutPref introduces one state per nonterminal transition. By (Aut 1) there are at most n such transitions. So it is enough to estimate, how many times CutPref is invoked. CutPref is run for each letter with a crossing block, and there are at most 2n such letters, see Lemma 2. Thus, one iteration of the main loop of CompMem adds at most 2n ^{2} states in total.
 cutting a suffix in CutSuff

The same analysis as in the previous case yields that at most 2n ^{2} states are introduced in this way.
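Summing the five cases as stated gives a bound on the number of states added in one iteration of the main loop; the display below is just this arithmetic, under the assumption that each case contributes at most its stated bound once per iteration.

```latex
\[
\underbrace{2n}_{\text{PreProc}}
+ \underbrace{2n^{2}}_{\text{LeftPop}}
+ \underbrace{2n^{2}}_{\text{RightPop}}
+ \underbrace{2n^{2}}_{\text{CutPref}}
+ \underbrace{2n^{2}}_{\text{CutSuff}}
= 8n^{2} + 2n .
\]
```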
Using Lemmas 5–19 it is now possible to conclude that CompMem correctly solves the FCMP for NFA, in nondeterministic polynomial (in n) time. The only source of nondeterminism is the one in Lemma 1, and so for DFA the corresponding problem can be solved deterministically.
Proof of Theorem 1
The proof follows by showing that CompMem correctly verifies whether \(\operatorname {val}(X_{n})\) is accepted by N and that CompMem runs in npolytime.
Let us first show the correctness of CompMem. All subroutines of CompMem (nondeterministically) modify the instance, changing G, N and X _{ n } into G′, N′ and \(X_{n}'\) (notice that the output depends on the nondeterministic choices). Let N ^{(i)}, G ^{(i)}, \(X_{n}^{(i)}\) for i=1, …, k be the consecutively obtained instances, with i=1 representing the input instance. Then N ^{(i)} accepts \(\operatorname {val}(X_{n}^{(i)})\) if and only if for some nondeterministic choices the resulting N ^{(i+1)} accepts \(\operatorname {val}(X_{n}^{(i+1)})\). This is shown in Lemmata 5, 9, 10, 12, 18 (if some of the procedures are deterministic, then the output does not depend on any choices). So, if N ^{(1)} does not accept \(\operatorname {val}(X_{n}^{(1)})\), then N ^{(k)} does not accept \(\operatorname {val}(X_{n}^{(k)})\) either. On the other hand, if N ^{(1)} accepts \(\operatorname {val}(X_{n}^{(1)})\), then there exists a sequence of instances (representing proper nondeterministic guesses) such that for each i the NFA N ^{(i)} accepts \(\operatorname {val}(X_{n}^{(i)})\). In particular, N ^{(k)} accepts \(\operatorname {val}(X_{n}^{(k)})\), and as \(|\operatorname {val}(X_{n}^{(k)})| < n\), the word \(\operatorname {val}(X_{n}^{(k)})\) can be decompressed and acceptance by N ^{(k)} can be checked naively in polytime.
Now we show that the running time is indeed (nondeterministic) polynomial. Lemmata 5, 9, 10, 12, 18 claim that each of the subroutines runs in npolytime in the size of the current instance. However, by Lemma 19, the size of this instance is always polynomial in n. Furthermore, each such application introduces a new letter to Σ, and we know by Lemma 19 that the final size of Σ is polynomial in n. Therefore these subroutines are run at most polynomially many (in n) times. Hence, the total running time is npolytime.
It is left to show that if the input is a DFA, CompMem can be determinised. Firstly, notice that by Lemmata 5, 9, 10, 12, 18, if the instance consisted of a DFA, each instance kept by CompMem is also a DFA.
The only nondeterministic choices in CompMem are performed when calling a subroutine for a fully compressed membership problem for a string over an alphabet consisting of a single letter (see Lemma 1). However, the same lemma states that when the input consists of a deterministic automaton, the problem is in P. Thus, there is no nondeterminism in CompMem when it is applied to a DFA. □
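The final step of the correctness argument, decompressing a short word and checking acceptance naively, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the SLP is a dictionary `rules` mapping each nonterminal to its right-hand side, the automaton is given by letter transitions only (any nonterminal labels are assumed to have been expanded beforehand), and the names `expand` and `accepts` are hypothetical.

```python
def expand(rules, symbol):
    """Decompress the word generated from `symbol`.

    `rules` maps a nonterminal to the list of symbols on its right-hand
    side; any symbol that is not a key of `rules` is treated as a letter.
    Intended only for instances whose generated word is short.
    """
    if symbol not in rules:
        return [symbol]
    word = []
    for s in rules[symbol]:
        word.extend(expand(rules, s))
    return word


def accepts(transitions, start, accepting, word):
    """Simulate an NFA on `word` by tracking the set of reachable states.

    `transitions` is a set of triples (p, letter, q).
    """
    current = {start}
    for letter in word:
        current = {q for p, x, q in transitions if p in current and x == letter}
    return bool(current & accepting)


if __name__ == "__main__":
    rules = {"X1": ["a", "b"], "X2": ["X1", "X1", "a"]}
    nfa = {(0, "a", 1), (1, "b", 0)}
    print(accepts(nfa, 0, {1}, expand(rules, "X2")))
```

The subset simulation takes time polynomial in the length of the word and the size of the automaton, which is why the bound \(|\operatorname {val}(X_{n}^{(k)})| < n\) on the final instance suffices for this last check.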
References
 1. Alstrup, S., Brodal, G.S., Rauhe, T.: Pattern matching in dynamic texts. In: Proc. 11th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 819–828 (2000). doi: 10.1145/338219.338645
 2. Amir, A., Benson, G., Farach, M.: Let sleeping files lie: pattern matching in Z-compressed files. In: SODA, pp. 705–714 (1994)
 3. Beaudry, M., McKenzie, P., Péladeau, P., Thérien, D.: Finite monoids: from word to circuit evaluation. SIAM J. Comput. 26(1), 138–152 (1997)
 4. Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings. In: Randall, D. (ed.) SODA, pp. 373–389. SIAM, Philadelphia (2011)
 5. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)
 6. Czerwiński, W., Lasota, S.: Fast equivalence-checking for normed context-free processes. In: Lodaya, K., Mahajan, M. (eds.) FSTTCS, LIPIcs, vol. 8, pp. 260–271. Schloss Dagstuhl – Leibniz-Zentrum fuer Informatik, Wadern (2010)
 7. Farach, M., Thorup, M.: String matching in Lempel-Ziv compressed strings. In: STOC, pp. 703–712. ACM Press, New York (1995)
 8. Ferragina, P., Muthukrishnan, S., de Berg, M.: Multi-method dispatching: a geometric approach with applications to string matching problems. In: STOC, pp. 483–491 (1999)
 9. Gawrychowski, P.: Personal communication (2011)
 10. Gawrychowski, P.: Optimal pattern matching in LZW compressed strings. In: Randall, D. (ed.) SODA, pp. 362–372. SIAM, Philadelphia (2011)
 11. Gawrychowski, P.: Pattern matching in Lempel-Ziv compressed strings: fast, simple, and deterministic. In: Demetrescu, C., Halldórsson, M.M. (eds.) ESA. LNCS, vol. 6942, pp. 421–432. Springer, Berlin (2011)
 12. Gawrychowski, P.: Simple and efficient LZW-compressed multiple pattern matching. In: CPM. LNCS. Springer, Berlin (2012)
 13. Gawrychowski, P.: Tying up the loose ends in fully LZW-compressed pattern matching. In: Dürr, C., Wilke, T. (eds.) STACS, LIPIcs, vol. 14, pp. 624–635. Schloss Dagstuhl – Leibniz-Zentrum fuer Informatik, Wadern (2012)
 14. Genest, B., Muscholl, A.: Pattern matching and membership for hierarchical message sequence charts. Theory Comput. Syst. 42(4), 536–567 (2008). doi: 10.1007/s00224-007-9054-1
 15. Gąsieniec, L., Karpiński, M., Plandowski, W., Rytter, W.: Efficient algorithms for Lempel-Ziv encoding. In: Karlsson, R.G., Lingas, A. (eds.) SWAT. LNCS, vol. 1097, pp. 392–403. Springer, Berlin (1996)
 16. Gąsieniec, L., Karpiński, M., Plandowski, W., Rytter, W.: Randomized efficient algorithms for compressed strings: the fingerprint approach (extended abstract). In: Hirschberg, D.S., Myers, E.W. (eds.) CPM. LNCS, vol. 1075, pp. 39–49. Springer, Berlin (1996)
 17. Gąsieniec, L., Rytter, W.: Almost optimal fully LZW-compressed pattern matching. In: Data Compression Conference, pp. 316–325 (1999)
 18. Jeż, A.: Faster fully compressed pattern matching by recompression. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds.) ICALP. LNCS, vol. 7391, pp. 533–544. Springer, Berlin (2012)
 19. Jeż, A.: Recompression: a simple and powerful technique for word equations. In: STACS 2013 Conference, LIPIcs. Schloss Dagstuhl – Leibniz-Zentrum fuer Informatik, Wadern (2013). Available at http://arxiv.org/abs/1203.3705
 20. Jeż, A., Okhotin, A.: Complexity of equations over sets of natural numbers. Theory Comput. Syst. 48(2), 319–342 (2011)
 21. Jeż, A., Okhotin, A.: One-nonterminal conjunctive grammars over a unary alphabet. Theory Comput. Syst. 49(2), 319–342 (2011)
 22. Kida, T., Takeda, M., Shinohara, A., Miyazaki, M., Arikawa, S.: Multiple pattern matching in LZW compressed text. In: Data Compression Conference, pp. 103–112 (1998)
 23. Kosaraju, S.R.: Pattern matching in compressed texts. In: Thiagarajan, P.S. (ed.) FSTTCS. LNCS, vol. 1026, pp. 349–362. Springer, Berlin (1995)
 24. Lasota, S., Rytter, W.: Faster algorithm for bisimulation equivalence of normed context-free processes. In: Královič, R., Urzyczyn, P. (eds.) MFCS. LNCS, vol. 4162, pp. 646–657. Springer, Berlin (2006). doi: 10.1007/11821069_56
 25. Lifshits, Y.: Solving classical string problems on compressed texts. In: Ahlswede, R., Apostolico, A., Levenshtein, V.I. (eds.) Combinatorial and Algorithmic Foundations of Pattern and Association Discovery, Dagstuhl Seminar Proceedings, vol. 06201. IBFI, Schloss Dagstuhl, Wadern (2006)
 26. Lifshits, Y., Lohrey, M.: Querying and embedding compressed texts. In: Královič, R., Urzyczyn, P. (eds.) MFCS. LNCS, vol. 4162, pp. 681–692. Springer, Berlin (2006)
 27. Lohrey, M.: Word problems and membership problems on compressed words. SIAM J. Comput. 35(5), 1210–1240 (2006). doi: 10.1137/S0097539704445950
 28. Lohrey, M.: Compressed membership problems for regular expressions and hierarchical automata. Int. J. Found. Comput. Sci. 21(5), 817–841 (2010)
 29. Lohrey, M.: Algorithmics on SLP-compressed strings: a survey. Groups Complex. Cryptol. 4(2), 241–299 (2012)
 30. Lohrey, M., Mathissen, C.: Compressed membership in automata with compressed labels. In: Kulikov, A.S., Vereshchagin, N.K. (eds.) CSR. LNCS, vol. 6651, pp. 275–288. Springer, Berlin (2011). doi: 10.1007/978-3-642-20712-9_21
 31. Lohrey, M., Schleimer, S.: Efficient computation in groups via compression. In: Diekert, V., Volkov, M.V., Voronkov, A. (eds.) CSR. LNCS, vol. 4649, pp. 249–258. Springer, Berlin (2007). doi: 10.1007/978-3-540-74510-5_26
 32. MacDonald, J.: Compressed words and automorphisms in fully residually free groups. Int. J. Autom. Comput. 20(3), 343–355 (2010)
 33. Markey, N., Schnoebelen, P.: A PTIME-complete matching problem for SLP-compressed words. Inf. Process. Lett. 90(1), 3–6 (2004)
 34. Mehlhorn, K., Sundar, R., Uhrig, C.: Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica 17(2), 183–198 (1997)
 35. Navarro, G., Raffinot, M.: Practical and flexible pattern matching over Ziv-Lempel compressed text. J. Discrete Algorithms 2(3), 347–371 (2004)
 36. Plandowski, W.: Testing equivalence of morphisms on context-free languages. In: van Leeuwen, J. (ed.) ESA, LNCS, vol. 855, pp. 460–470. Springer, Berlin (1994). doi: 10.1007/BFb0049431
 37. Plandowski, W.: Satisfiability of word equations with constants is in PSPACE. J. ACM 51(3), 483–496 (2004). doi: 10.1145/990308.990312
 38. Plandowski, W., Rytter, W.: Application of Lempel-Ziv encodings to the solution of words equations. In: Larsen, K.G., Skyum, S., Winskel, G. (eds.) ICALP. LNCS, vol. 1443, pp. 731–742. Springer, Berlin (1998). doi: 10.1007/BFb0055097
 39. Plandowski, W., Rytter, W.: Complexity of language recognition problems for compressed words. In: Karhumäki, J., Maurer, H.A., Paun, G., Rozenberg, G. (eds.) Jewels Are Forever, pp. 262–272. Springer, Berlin (1999)
 40. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1–3), 211–222 (2003)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.