1 Introduction

A finite language L is decomposable (or composite) if there are two nontrivial languages L1 and L2 such that L = L1L2. Otherwise, L is prime. It has been proved that deciding primality of a finite language is NP-hard [1].

The main contribution of this paper is an adaptive parallel algorithm that finds all decompositions L1L2 of L, or concludes that L is prime when no decomposition is found. The algorithm is based on an exhaustive search and incorporates several original methods for pruning the search space. Moreover, the algorithm is adaptive; it changes its behavior based on data acquired at runtime about the performance of its recursive phase. Since decomposing finite languages is computationally hard, a motivation of our work is to investigate to what extent the power of parallel computing makes it possible to tackle that question for large instances of finite languages.

The decomposition algorithm has a number of applications. It is useful for determining the prime decomposition of a supercode. A finite language L can be represented by a list of its words. However, such a representation is memory intensive if L is large. Therefore a better option is to decompose L and store its factors L1 and L2. The decomposition algorithm can also be used to find a context-free grammar G. Given the sets of words S+ and S−, called examples and counterexamples, a grammar G should be found that accepts the words in set S+ and rejects the words in set S−. We describe these applications in more detail in Examples 3–6.

The rest of the paper is organized into six sections. Section 2 presents a survey of previous work on finite language decomposition. Section 3 recalls selected concepts of the theory of languages and automata. Section 4 describes the basic algorithm, which has been a starting point for developing the adaptive algorithm proposed in Section 5. Section 6 reports on the results of the computational experiments conducted using the algorithms. Section 7 contains conclusions and future work.

2 Related work

M. Martin and T. Kutsia [2] proposed a representation of regular languages by linear systems of language equations, which is suitable for computing left and right factors of a regular language. An n-subfactorization of a regular language L is a tuple of languages (L1, L2, …, Ln) for some n ≥ 1 such that the concatenation L1L2⋯Ln is a proper subset of L, so \(L_{1} L_{2} {\ldots } L_{n}\subsetneq L\). A left (resp. right) factor of L is the leftmost (resp. rightmost) term in a factorization of L. An algorithm for computing the sets of left and right factors of L is proposed.

S. Afonin and D. Golomazov [3] presented an algorithm for constructing a minimal union-free decomposition of a regular language L. A representation L = L1 ∪ L2 ∪ … ∪ Lk is called a union-free decomposition of L iff Li is a union-free language for all i = 1, 2, …, k. The decomposition is called minimal iff there is no other union-free decomposition of L with fewer factors. An algorithm for constructing a minimal union-free decomposition of a given regular language L is provided. The algorithm involves an exhaustive search, so its computational complexity is exponential in the size of the DFA accepting L.

W. Wieczorek and A. Nowakowski [4] considered the problem of finding a multi-decomposition of a finite language W = {w1,w2,…,wn}, n ≥ 1, that contains words wi of fixed length d ≥ 2, composed of symbols taken from a nonempty finite alphabet Σ. It is assumed that a desired multi-decomposition is a set of concatenations LiRi whose union contains W, that is \(W\subseteq L_{1} R_{1}\cup L_{2} R_{2} \cup {\ldots } \cup L_{m} R_{m}\) for some m ≥ 1 depending on the size of W. The nonempty sets Li and Ri, which include the prefixes and suffixes of words wi ∈ W, respectively, are subsets of the complete sets of prefixes and suffixes obtained from all possible splits of words wi ∈ W. Denote the splits by wi = uijvij, i ∈ [1,n], and j ∈ [1,d − 1]. The prefix uij consists of the j leading symbols of wi, while the suffix vij consists of the d − j trailing symbols of wi. A multi-decomposition is related to the cliques of an undirected graph G = (V,E) with the set of vertices V = {(uij,vij) | wi = uijvij, wi ∈ W} and the set of edges E = {((uij,vij),(ukl,vkl)) | uijvkl, uklvij ∈ W}. It can be shown that each concatenation LiRi of a multi-decomposition is represented by the corresponding clique in graph G, that is \(\{(u_{t_{1}},v_{t_{1}})\), \((u_{t_{2}},v_{t_{2}})\), …, \((u_{t_{r}},v_{t_{r}})\}\) where \(L_{i}=\cup _{j=1}^{r} \{u_{t_{j}}\}\) and \(R_{i}=\cup _{j=1}^{r} \{v_{t_{j}}\}\). The authors provide a randomized algorithm to find all cliques in graph G, which runs in polynomial time with respect to the size of W. As an example, consider the language W = {aa,ab}. The graph G for it has two vertices labeled (a,a) and (a,b) (u11 = v11 = a, u21 = a, v21 = b), and one edge ((a,a),(a,b)), as u11v21 = ab and u21v11 = aa are in W. There is only one clique in the graph, so L1 = {u11} ∪ {u21} = {a} and R1 = {v11} ∪ {v21} = {a,b}, and finally W = {a}{a,b}. The proposed method of finding a multi-decomposition was applied to create an opening book for the Toads-and-Frogs game.
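To make the graph construction concrete, the following C sketch (an illustration only, not code from [4]; it builds the vertices and edges for the small language W = {aa, ab} above and omits the clique search and the randomized part of the algorithm) enumerates all splits and tests the edge condition:

#include <stdio.h>
#include <string.h>

#define MAXV 64          /* illustrative limits on vertices and word length */
#define MAXL 32

static const char *W[] = { "aa", "ab" };   /* example language from the text */
static const int n = 2;

static int in_W(const char *s) {
    for (int i = 0; i < n; i++) if (strcmp(W[i], s) == 0) return 1;
    return 0;
}

int main(void) {
    char u[MAXV][MAXL], v[MAXV][MAXL];
    int m = 0;                               /* number of vertices (u, v)     */

    /* vertices: all splits w = uv of every word in W */
    for (int i = 0; i < n; i++) {
        int d = (int)strlen(W[i]);
        for (int j = 1; j < d; j++) {
            strncpy(u[m], W[i], (size_t)j); u[m][j] = '\0';
            strcpy(v[m], W[i] + j);
            m++;
        }
    }
    /* edges: (u_i,v_i)-(u_k,v_k) iff both u_i v_k and u_k v_i belong to W */
    for (int i = 0; i < m; i++)
        for (int k = i + 1; k < m; k++) {
            char a[2 * MAXL], b[2 * MAXL];
            sprintf(a, "%s%s", u[i], v[k]);
            sprintf(b, "%s%s", u[k], v[i]);
            if (in_W(a) && in_W(b))
                printf("edge: (%s,%s) -- (%s,%s)\n", u[i], v[i], u[k], v[k]);
        }
    return 0;
}

For W = {aa, ab} the sketch prints the single edge ((a,a),(a,b)), in agreement with the worked example above.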

H. J. Bravo et al. [5] investigate a manufacturing problem that involves batches of identical or similar items produced together in production runs of different sizes. A production run consists of a fixed number of interleaved workcycles and is controlled by a monolithic supervisor, whose operation is described using a deterministic finite automaton (DFA). The automaton accepts a regular language R over some finite alphabet Σ, with states representing points in time, and transitions representing events occurring within a production run. Events are operations that are performed by machines (for example, take a workpiece, process it, etc.) to obtain the product. An event a has a specific time of execution d(a), and belongs to the alphabet of the DFA, a ∈ Σ. A production run, being a sequence s = a1a2⋯an of events, has a makespan equal to the sum of the event execution times, \({\sum }_{i=1}^{n} d(a_{i})\). For a product there can be a number of sequences s that differ in the order of individual events ai, with each sequence corresponding to a route in the DFA. The paper proposes a factorization-based approach to compute a route with the minimum makespan. As for a given product P the production run length is finite (measured by the number of events), a finite language L with \(L\subseteq R\) is extracted from the regular language R, which describes all possible routes for P. Then L is decomposed into factors L = L1L2⋯Lk for some k, where each factor Li describes a set of possible subroutes between the so-called symmetrically reachable idle states (a notion introduced in the paper). Finally, for all sets Li, i ∈ [1,k], the subroutes ri of minimum makespan are determined, which give the optimal route r = r1r2⋯rk. The authors claim that the proposed approach mitigates the computational complexity of finding route r.

Several decomposition algorithms have been developed for codes, which are sets of words and hence formal languages. Codes are categorized by their defining properties, for example, prefix-freeness, suffix-freeness, infix-freeness, etc.

Y.-S. Han and K. Salomaa [6] studied solid codes defined as follows: a set S of words is a solid code if S satisfies two conditions: (i) no word of S is a subword of another word of S (infix-freeness), and (ii) no prefix of a word in S is a suffix of a word in S (overlap-freeness). In other words, S has to be an infix code and no two words of S may overlap. Moreover, a language L is a regular solid code if L is regular and a solid code. The paper proposed two algorithms related to the decomposition problem for solid codes. The first algorithm determines in polynomial time whether or not a given regular language is a solid code. The second algorithm efficiently finds a prime solid code decomposition of a regular solid code that is not prime.

K. V. Hung [7] considered the prime decomposition of supercodes. A finite language L is a supercode if no word in L is a proper permu-subword of another word in it. Let u and v be words of L. A word u is called a permu-subword of v if u is a subword of v in which symbol permutations are allowed. A supercode L is prime if L ≠ L1L2 for any supercodes L1 and L2. A linear-time algorithm was provided that, for a given supercode L, either discovered that it was prime or returned the unique sequence of prime supercodes L1, L2, …, Lk+1 such that L = L1L2⋯Lk+1 with k ≥ 1.

In addition to the above recent work on supercodes, there were previous works on this subject. J. Czyzowicz et al. [8] proved that for a given prefix-free regular language L the prime prefix-free decomposition is unique, and that, when L is not prime, the decomposition can be computed in O(m) worst-case time, where m is the size of the minimal DFA accepting L. Y.-S. Han et al. [9] investigated the prime infix-free decomposition of infix-free regular languages and demonstrated that the prime infix-free decomposition is not unique. An algorithm for the infix-free primality test of an infix-free regular language was given. It was also shown that the prime infix-free decomposition can be computed in polynomial time. Y.-S. Han and D. Wood [10] investigated finite outfix-free regular languages. A word x is an outfix of a word y if there is a word w such that x1wx2 = y and x = x1x2. A set X of words is outfix-free if no word in X is an outfix of any other word in X. A polynomial-time algorithm was developed to determine outfix-freeness of regular languages. Furthermore, a linear-time algorithm that computes a prime outfix-free decomposition for outfix-free regular languages was given. There are also papers on theoretical issues related to the problem of formal language decomposition [11,12,13,14,15,16].

Let us note that all the works discussed above differ from our approach to solving the decomposition problem. Firstly, we consider finite languages that do not have to satisfy any specific conditions. Secondly, the parallel algorithm we propose is intended to compute all decompositions of a finite language L in the form of L = L1L2.

This paper builds on our previous efforts in developing sequential and parallel algorithms for finite language decomposition. Let us briefly review the results obtained in those efforts. A sequential algorithm for finding a decomposition of a finite language was proposed in [17]. The algorithm returned only the first decomposition found of a given language. A threshold parameter T (see p. 12) that impacted the operation of the algorithm was kept constant, meaning it was not adaptively adjusted while the algorithm was running. The article introduced the concept of a significant state along with a proof that for every composite language there is at least one decomposition based solely on significant states. In the experimental part of the paper, 240 languages of size less than 2000 words were studied, including 120 prime languages. The implementation was done in Python, and the average running time of the algorithm for the test languages was on the order of a few seconds.

An approach to finding the first decomposition of a finite language by using selected meta-heuristics was discussed in [18]. The paper presented results for the simulated annealing algorithm, tabu search, and genetic and randomized algorithms, all implemented sequentially in Python. Computational experiments were carried out on 1200 languages of sizes less than 2000 words with algorithm execution time limits of 10 and 60 seconds. Within these limits, the algorithms returned a considerable number of wrong answers for composite languages, claiming they were prime.

A basic parallel algorithm (a short description is included in Section 4) for finding all decompositions of a finite language using the concept of significant state was given in [19]. The algorithm consisted of two phases. In the first phase, each process executed the same code, and in the second phase the computation was spread across the available processes based on their ranks. The algorithm was implemented in the C language with the Message Passing Interface (MPI). The experiments, conducted with up to 22 processes, covered four languages with word counts between 800 and 6583 words.

A preliminary version of an adaptive parallel algorithm for solving the decomposition problem was given in [20]. The adaptive algorithm was based on a modified concept of significant state compared with that given in [17]. The algorithm consisted of two phases and included a method for pruning the search space and a simplified verification of prospective decomposition sets. It also involved an adaptive way of adjusting the threshold parameter T to keep the balance between the times spent in the two phases of the algorithm. The experiments concerned nine languages of up to 90000 words in size, solved with 32 processes in run times of a few minutes. The algorithm was implemented in the C language with the MPI interface.

To summarize, our previous work encompassed several sequential and parallel algorithms to solve the decomposition problem for finite languages. The sequential algorithms were able to decompose the languages of size up to 2000 words, while the parallel algorithms could tackle the languages of size up to 90000 words with 32 processes.

In the current paper we provide an advanced adaptive parallel algorithm, which can solve languages of size between 160000 and more than 200000 words in run times of tens of minutes by using 128 processes. The results of comprehensive computational experiments on a variety of 4000 benchmark languages are also given. To the best of our knowledge, a parallel algorithm and an in-depth experimental study of the problem under consideration have not been reported in the literature thus far.

3 Preliminaries

Below, we recall selected concepts from the theory of formal languages and automata. For more details the reader may refer to the textbooks [21, 22].

An alphabet Σ is a nonempty finite set of symbols. A word w is a sequence of zero or more symbols taken from Σ. The length of a word is denoted by |w|, with the special case of a zero-length word being the empty word λ, for which |λ| = 0. A prefix of a word is any number of leading symbols of that word, and a suffix is any number of trailing symbols. By convention, Σ* denotes the set of all words over an alphabet Σ, and Σ+ denotes the set Σ* − {λ}.

A finite language L ⊆ Σ* is a finite set of words w ∈ Σ*. A finite language L is trivial if it consists of the empty word λ, that is L = {λ}, and it is nontrivial if it contains at least one nonempty word. Let u, v, w ∈ Σ*. A concatenation of words u and v produces a word w = uv, where w is created by making a copy of word u and following it by a copy of word v. Let \(U, V, W \subseteq {\varSigma }^{*}\). Then the concatenation (or product) of sets U and V produces a set W = UV such that W = {w : w = uv, u ∈ U, v ∈ V }.

A deterministic finite automaton (DFA) is defined by a quintuple A = (Q, Σ, δ, s, QF), where Q denotes a finite set of automaton states, Σ is an alphabet, \(\delta : Q\times {\varSigma }\rightarrow Q\) is a transition function, s ∈ Q is the initial (start) state, and \(Q_{F} \subseteq Q\) is a set of final (accepting) states. An automaton is deterministic iff for all states q ∈ Q and symbols a ∈ Σ, |δ(q,a)| ≤ 1; in other words, each state q has at most one out-transition marked by a. A deterministic automaton is finite iff its state set Q is finite. Let us extend the transition function δ to words over Σ. Formally we define δ(q,λ) = q, and for all words w and input symbols a, δ(q,wa) = δ(δ(q,w),a). So \(\delta (q,w)=q^{\prime }\) means that the word w takes A from the state q to the state \(q^{\prime }\).

A directed graph G = (V,E), called a transition diagram, is associated with a finite automaton A, where V = Q, \(E=\{p\overset {a}\rightarrow q | q=\delta (p,a)\}\), and \(p\overset {a}\rightarrow q\) denotes an arc labeled a in the transition diagram. Given a word w = a1a2⋯am, ai ∈ Σ, i ∈ [1,m], a w path is a sequence of transitions labeled with the symbols of w in the transition diagram. A w path is denoted by the sequence of states q1, q2, …, qm+1, where qj ∈ Q, j ∈ [1,m + 1], lying on the path. A deterministic automaton A accepts a word w if there is the w path q1q2⋯qm+1 leading from the initial state to a final state of A, so q1 = s and qm+1 ∈ QF.

Define the left (resp. right) language of a state q ∈ Q as \(\overleftarrow {q} = \{w : \delta (s, w) = q\}\) (resp. \(\overrightarrow {q} = \{w : \delta (q, w) \in Q_{F}\}\)). Put simply, the left (resp. right) language of q consists of words w for which there is a w path from the initial state s to q (resp. from q to a final state).

A finite automaton A = (Q,Σ,δ,s,QF) is said to accept a language L when for each word w ∈ L there is a w path beginning in the state s and ending in a state q ∈ QF, or more formally L = {w | δ(s,w) ∈ QF}. Deterministic and nondeterministicFootnote 1 finite automata accept the same set of languages, namely the set of regular languages. Minimum-state acyclic DFAs accept the set of finite languages. Given all states p, q ∈ Q, p ≠ q, a DFA is minimum-state iff \(\overrightarrow {p}\neq \overrightarrow {q}\),Footnote 2 and it is acyclic iff δ(q,w) ≠ q for every word w ∈ Σ+ and state q ∈ Q.

In what follows, we only consider minimum-state acyclic deterministic automata, so each such automaton will be referred to as automaton herein.

Let L be a nontrivial finite language. A decomposition of L of index k, where k ≥ 2, is a family of languages Li for i ∈ [1,k], called factors, that once concatenated give language L, so we have L = L1L2⋯Lk. The decomposition is nontrivial if the languages Li are nontrivial, that is Li ≠ {λ} for i ∈ [1,k]. Otherwise, the decomposition is trivial. In every nontrivial decomposition of a finite language L, the number of factors, k, is at most equal to the length of the longest word in L. Clearly, any language L has the trivial decompositions L{λ} and {λ}L. In what follows, by a decomposition we always mean a nontrivial decomposition. A language L is called prime if L has no decomposition of index 2, otherwise it is called composite (or decomposable). The prime decomposition of L is a decomposition L = L1L2⋯Lk, k ≥ 2, where each language Li for i ∈ [1,k] is prime. It has been proven that every finite language is prime or has a prime decomposition, which is generally not true for infinite languages [13,14,15,16].

In this paper, we investigate the problem of finding all decompositions of index 2 of a nontrivial finite language L. For a particular decomposition we are looking for two nontrivial finite languages L1 and L2 that once concatenated give language L, so we have L = L1L2. Note that a given language L can have many decompositions of index 2.
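Checking a particular candidate pair (L1, L2) amounts to concatenating the two finite sets of words and comparing the result with L. The following C sketch shows such a check under the simplifying assumption that languages are stored as plain arrays of strings; it is an illustration only and not the data representation used later in the paper.

#include <stdio.h>
#include <string.h>

/* does the word w belong to language L of size n? */
static int member(const char *w, const char **L, int n) {
    for (int i = 0; i < n; i++) if (strcmp(L[i], w) == 0) return 1;
    return 0;
}

/* return 1 iff L = L1 L2 (as sets of words) */
static int is_decomposition(const char **L, int n,
                            const char **L1, int n1,
                            const char **L2, int n2) {
    char buf[256];
    /* every concatenation of a word of L1 with a word of L2 must be in L */
    for (int i = 0; i < n1; i++)
        for (int j = 0; j < n2; j++) {
            snprintf(buf, sizeof buf, "%s%s", L1[i], L2[j]);
            if (!member(buf, L, n)) return 0;
        }
    /* every word of L must be producible as such a concatenation */
    for (int k = 0; k < n; k++) {
        int found = 0;
        for (int i = 0; i < n1 && !found; i++) {
            size_t p = strlen(L1[i]);
            if (strncmp(L[k], L1[i], p) == 0 && member(L[k] + p, L2, n2)) found = 1;
        }
        if (!found) return 0;
    }
    return 1;
}

int main(void) {
    const char *L[]  = { "aa", "ab", "ba", "bb" };  /* L = {a,b}{a,b} */
    const char *L1[] = { "a", "b" };
    const char *L2[] = { "a", "b" };
    printf("L = L1L2 ? %s\n", is_decomposition(L, 4, L1, 2, L2, 2) ? "yes" : "no");
    return 0;
}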

Let us introduce the notion of a decomposition set that is suitable for the study of decompositions of regular languages.Footnote 3 The notion is related to the left quotients of regular languages. Let L be a regular language over an alphabet Σ, and let A = (Q,Σ,δ,s,QF) be the minimum-state finite deterministic automaton accepting L. For any nonempty set \(D\subseteq Q\), we define the left and right languages of D:Footnote 4

$$ {L_{1}^{D}} = \cup_{g\in D} \overleftarrow{g}~~~\text{and}~~~ {L_{2}^{D}} = \cap_{g \in D} \overrightarrow{g}. $$
(1)

Note that the languages \({L_{1}^{D}}\) and \({L_{2}^{D}}\) are regular, being a finite union and a finite intersection, respectively, of left and right languages of states of A, which are themselves regular.

Theorem 1

Having L and A defined as above, assume that L is composite, so L = L1L2 for regular languages L1 and L2. Define a set \(D\subseteq Q\), called a decomposition set, by

$$ D=\{z\in Q~|~\delta(s,w)=z,~\textit{for~some~}w\in L_{1}\}. $$

Then \(L_{1}\subseteq {L_{1}^{D}}\), \(L_{2}\subseteq {L_{2}^{D}}\) and

$$ L = {L_{1}^{D}} {L_{2}^{D}} $$
(2)

is the decomposition of L into two regular languages.

Proof

See [1, 16]. □

The decomposition \(L={L_{1}^{D}} {L_{2}^{D}}\) is referred to as the decomposition induced by the set D. Theorem 1 implies that every decomposition of the regular language L is included in a decomposition of L induced by some decomposition set. The decomposition L = L1L2 is said to be included in the decomposition \(L=L^{\prime }_{1}L^{\prime }_{2}\) if \(L_{i}\subseteq L^{\prime }_{i}\) for i = 1,2.

Corollary 1

To solve the problem of finding all decompositions of index 2 of a nontrivial finite language L, we need to check (2) for all subsets D of Q. If none of these subsets induces a decomposition, we conclude that L is prime.

It follows from the corollary that solving this problem is equivalent to solving the primality problem.

Problem 1

(Primality) Let L be a finite language over a finite alphabet Σ given as a DFA. Answer the question whether L is prime.

It was shown that the primality problem for finite languages is NP-hard [1], and for regular languages it is PSPACE-complete [23, 24].

Example 1

As an illustration of Theorem 1, consider a finite composite language L = {λ, a, b, aa, ab, aaa, aab, aaaa} that has a single decomposition of index 2. The transition diagram for the minimum-state deterministic automaton A accepting language L is depicted in Fig. 1. Note that L = L1L2, where L1 = {λ,a} and L2 = {λ,b,aa,ab,aaa}.

Fig. 1
figure 1

Transition diagram for automaton accepting language L from Example 1

Then, according to Theorem 1 there is a decomposition set \(D\subseteq Q\) where Q = {s,q,r,p,t} such that \(L = {L_{1}^{D}} {L_{2}^{D}}\), \(L_{1}\subseteq {L_{1}^{D}}\) and \(L_{2}\subseteq {L_{2}^{D}}\). Indeed, in our case D = {s,q}, the left language \({L_{1}^{D}}\) includes words w satisfying δ(s,w) ∈ D: \({L_{1}^{D}} = \overleftarrow {s}\cup \overleftarrow {q}= \{\lambda \}\cup \{a\} =\{\lambda , a\}\), and the right language \({L_{2}^{D}}\) includes words w satisfying δ(g,w) ∈ QF where g ∈ D: \({L_{2}^{D}} = \overrightarrow {s}\cap \overrightarrow {q} = \{\lambda , a, b, \textit {aa}, \textit {ab}, \textit {aaa}, \textit {aab}, \textit {aaaa} \} \cap \{\lambda , a, b, \textit {aa}, \textit {ab}, \textit {aaa}\} = \{\lambda , a, b, \textit {aa}, \textit {ab}, \textit {aaa}\}\). Moreover, \(L_{1}={L_{1}^{D}}\) and \(L_{2}\subset {L_{2}^{D}}\). Observe that the words in \({L_{1}^{D}}\) and \({L_{2}^{D}}\) are prefixes and suffixes of L, and each word w ∈ L is divided by at least one state g ∈ D such that w = xgy where x and y are a prefix and suffix of w, respectively. For example, considering the words of L we have (s, q ∈ D): λsλ, λsa, aqλ, λsb, λsaa, aqa, λsab, aqb, λsaaa, aqaa, aqab, aqaaa.

Example 2

A finite language may have more than one decomposition of index 2. The following language L = {ab, aba, abb, bb, bba, bbb} (Fig. 2) has two decompositions, the first defined by D = {p}, \(L_{1} = {L_{1}^{D}} = \overleftarrow {p} = \{a, b\}\), \(L_{2} = {L_{2}^{D}} = \overrightarrow {p} = \{b, ba, bb\}\), and the second by \(D^{\prime }=\{q\}\), \({L_{1}}^{\prime } = {L_{1}}^{D^{\prime }} = \overleftarrow {q} = \{ab, bb\}\), \({L_{2}}^{\prime } = {L_{2}}^{D^{\prime }} = \overrightarrow {q} = \{\lambda , a, b\}\).

Fig. 2
figure 2

Transition diagram of automaton A accepting language L from Example 2

Below we give a few examples of application of finite language decomposition.

Example 3

As already mentioned in Section 2, K. V. Hung considered the factorization of supercodes [7]. He designed an algorithm to decompose a supercode L into prime components. The algorithm is based on so-called bridge states, which are found in a non-returning and non-exiting acyclic deterministic finite automaton (N-ADFA) \({\mathscr{A}}\) accepting the code-words w ∈ L. An automaton \({\mathscr{A}}\) is non-returning if its start state has no in-transitions, and it is non-exiting if its final states have no out-transitions. A state b in automaton \({\mathscr{A}}\) is called a bridge state if b is neither the start state nor a final state, and each w path in \({\mathscr{A}}\) passes through b. Assume that \({\mathscr{A}}\) is the minimal N-ADFA accepting a supercode L that has k, k ≥ 1, bridge states. Then L can be decomposed into k + 1 prime supercodes L1, L2, …, Lk+1 such that L = L1L2⋯Lk+1. The bridge states of a given automaton \({\mathscr{A}}\) can be identified in O(|Q| + |δ|) time where |Q| and |δ| are the numbers of states and transitions of \({\mathscr{A}}\), respectively. As an example, the supercode L = {ab2ac,ab5c,ac3bac,ac3b4c} is considered.Footnote 5 The minimal N-ADFA \({\mathscr{A}}\) accepting L has four bridge states, so L may be decomposed uniquely into five prime supercodes: L = {a}{b,c3}{b}{a,b3}{c} [7].

Instead of employing the notion of bridge states, one can readily find this factorization by calling recursively our parallel decomposition algorithm (described in Section 5). The sequence of calls gives the following: L = L1L2 where L1 = {ab,ac3}, L2 = {bac,b4c}; then \(L_{1}={L_{1}^{1}} {L_{2}^{1}}\) where \({L_{1}^{1}}=\{a\}\), \({L_{2}^{1}}=\{b,c^{3}\}\); \(L_{2}={L_{1}^{2}} {L_{2}^{2}}\) where \({L_{1}^{2}}=\{b\}\), \({L_{2}^{2}}=\{ac,b^{3}c\}\); and finally \({L_{2}^{2}}={L_{1}^{3}} {L_{2}^{3}}\) where \({L_{1}^{3}}=\{a,b^{3}\}\), \({L_{2}^{3}}=\{c\}\). In general, to find the prime decomposition of a supercode consisting of n prime components, at most n calls of the decomposition algorithm are needed.

Example 4

Deterministic finite automata (DFAs) are of importance in many fields of science. They also have many practical applications—in the design of compilers and text processors, in natural language processing, and in speech recognition, among others. One of the concerns regarding DFAs is the memory-efficient storage of their representation. A DFA \({\mathscr{A}}\) can be represented by the list of words of the language L (assuming it is finite) accepted by \({\mathscr{A}}\). Such a representation can be memory intensive if L is large, therefore a better option is to decompose L and store its factors L1 and L2. Table 1 shows the reduction in size of the languages (expressed in word count) from Section 6, obtained by using the decomposition-based option. As can be seen, the reduction rate exceeds 99% for all languages in the table.

Table 1 Storage reduction R obtained by replacing language L by its factors, L = L1L2

Example 5

A company signed a contract to supply pipes of lengths 30, 31, 32, 36, 37, and 38 m. For technical reasons, the company can manufacture pipes of any length, but no greater than 30 m. A longer pipe can be produced by welding shorter pipes. The main part of the pipe production cost is the preparation of a mold of a given length, so the number of molds needed to complete the contract should be minimized. The question is what lengths the molds prepared to fulfill the supply contract mentioned above should have. To answer this question, let us define the set of words L = {a30, a31, a32, a36, a37, a38}.

Then, among all decompositions L = L1L2, we look for the one for which the size of the set L1 ∪ L2 is minimal, and each word in L1 and L2 is no longer than 30. The solution is the decomposition L = {a15, a16, a17}{a15, a21}, so we need only four molds of lengths 15, 16, 17, and 21 m.
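Because all words here are unary, verifying this decomposition reduces to integer arithmetic on word lengths: every sum of a length from L1 and a length from L2 must be an ordered pipe length, and every ordered length must be obtainable as such a sum. A small illustrative C check (identifiers are ours) follows.

#include <stdio.h>

int main(void) {
    int order[] = { 30, 31, 32, 36, 37, 38 };   /* word lengths a^k in L      */
    int L1[]    = { 15, 16, 17 };               /* mold lengths of factor L1  */
    int L2[]    = { 15, 21 };                   /* mold lengths of factor L2  */
    int n = 6, n1 = 3, n2 = 2, ok = 1;

    /* every concatenation a^i a^j must be an ordered length ...              */
    for (int i = 0; i < n1; i++)
        for (int j = 0; j < n2; j++) {
            int s = L1[i] + L2[j], found = 0;
            for (int k = 0; k < n; k++) if (order[k] == s) found = 1;
            if (!found) ok = 0;
        }
    /* ... and every ordered length must be a sum of two mold lengths         */
    for (int k = 0; k < n; k++) {
        int found = 0;
        for (int i = 0; i < n1; i++)
            for (int j = 0; j < n2; j++)
                if (L1[i] + L2[j] == order[k]) found = 1;
        if (!found) ok = 0;
    }
    printf("decomposition valid: %s\n", ok ? "yes" : "no");
    return 0;
}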

Example 6

Given the sets of words S+ and S−, called examples and counterexamples, find a context-free grammar G = (V,Σ,P,S) such that \(S_{+}\subseteq {\mathscr{L}}(G)\), and \(S_{-}\cap {\mathscr{L}}(G)=\emptyset \) where \({\mathscr{L}}(G)\) denotes the language generated by G. The basic decomposition algorithm (described in Section 4) can be readily modified to find an incomplete decomposition that allows a finite language L to be written as L = L1L2 ∪ R, where the concatenation L1L2 is the greatest possible and R is the set of the remaining words belonging to L.

Let S+ = {ab,ba,aabb,abab,baba,bbaa,baab,abba} be the set of examples, and let S− = {a,b}d − S+ with d ≤ 4 be the set of counterexamples. Based on these sets the context-free grammar can be found in two stages. In the first stage, using the modified basic decomposition algorithm, we create a series of incomplete decompositions. In particular, we get S+ = LL ∪ R where L = {ab,ba} and R = {ab,ba,aabb,bbaa}; then R = AB ∪ F where A = {a}, B = {b,abb}, and F = {ba,bbaa}; and further F = HI where H = {b} and I = {a,baa}, and so on. This procedure is iterated for the subsequent sets of words until a grammar in the Chomsky normal form is obtained (a context-free grammar G not generating the empty word λ is said to be in Chomsky normal form if all of its production rules are of the form \(A\rightarrow BC\) or \(A \rightarrow a\) where A, B, and C are nonterminal symbols and a is a terminal symbol). The sets S+, L, R, A, B, F, H, I, etc., are represented by grammar variables, and each set concatenation (decomposition) corresponds to a grammar production rule. The second stage simplifies the grammar by examining whether each pair of grammar variables can be merged. A merger is allowed if after it the grammar does not accept any word from the set S−. Furthermore, the unit production rules are eliminated. For the sets S+ and S− specified above, we get the following context-free grammar:

$$ \begin{array}{rcl@{\qquad}rcl} S & \!\rightarrow\! & S S | A B | B A & A & \!\rightarrow\! & a \\ B & \!\rightarrow\! & b | A C | B S & C & \!\rightarrow\! & B B \end{array} $$

which defines the infinite language of words having the same number of symbols a and b. For more details, see [25, p. 71], [26, 27].

4 Basic parallel decomposition algorithm

In this section we outline the basic parallel algorithm (or basic algorithm for short) for finite language decomposition [19]. It has been a starting point for devising the adaptive parallel decomposition algorithm (or adaptive algorithm). The pseudocode of the basic algorithm, expressed as a recursive procedure DECOMPOSE(W,D), is shown in Fig. 3.

Let A = (Q,Σ,δ,s,QF) be the minimum-state acyclic DFAFootnote 6 accepting an input language L = {wi}, i ∈ [1,n]. The basic algorithm explores the set of states Q to find the decomposition sets D where \(D\subseteq Q\). According to Theorem 1 each decomposition set D induces a decomposition \(L=L_1^D L_2^D\). The algorithm finds all decompositions of the input language L by employing an exhaustive search of Q with pruning.

Let W be a set of pairs \((w_i,Q_{w_i})\) where the word wi = a1a2⋯am, wi ∈ L, aj ∈ Σ, j ∈ [1,m], and \(Q_{w_i}=\{q_1, q_2, \ldots , q_{m+1}\}\) is a set of states qk ∈ Q, k ∈ [1,m + 1], lying on the wi path, where q1 = s and qm+1 ∈ QF. Let STATES(W) be the function that returns the set of states appearing in all sets \(Q_{w_i}\) for wi ∈ L, and let MINSTATES(W) be the function that returns a pair \((w_i,Q_{w_i})\in W\) in which the set \(Q_{w_i}\) of states dividing wi is minimal.

Assume that D is a decomposition set to be found (see Phase1 in Fig. 3). Initially this set is empty, and it is gradually built up as the basic algorithm processes the words wi ∈ L. The words to be processed are selected in order of increasing sizes of their sets \(Q_{w_i}\) by using the function MINSTATES(W). Each state q ∈ Q that divides a word wi ∈ L into two parts is inserted in set D, which is then checked to see if it is a decomposition set. The basic algorithm is recursive, so each subset of states in Q that can be a candidate for a decomposition set is examined.

Definition 1

Suppose a word w ∈ L is divided by a state q ∈ Q into two parts, so w = xy where \(x\in \overleftarrow {q}\) and \(y\in \overrightarrow {q}\), and suppose the state q is inserted in set D. Then each state r ∈ STATES(W) for which \(y\notin \overrightarrow {r}\) is considered redundant (or significant otherwise).

In words, suppose w = xy. If state q ∈ Q is inserted in D, then each state r ∈ Q that does not have y in its right language becomes redundant. Note that a given state r may be either redundant or significant depending on the states that are in set D in the current recursive execution of the algorithm.

Lemma 1

Let D, where \(D\subseteq Q\), be a decomposition set for a finite language L. Then each state q ∈ D is significant.

Fig. 3
figure 3

Basic algorithm

Proof

If D is a decomposition set for L, then each w ∈ L is divided into two parts by at least one state q ∈ D, that is w = xy, \(x\in \overleftarrow {q}\) and \(y\in \overrightarrow {q}\). Once state q is inserted in D, suffix y appears in the right language \(L_2^D\). The definition \(L_2^D = \cap _{r \in D} \overrightarrow {r}\) involves the intersection of sets \(\overrightarrow {r}\), thus suffix y must be in the set \(\overrightarrow {r}\) for each r ∈ D. If a state r does not satisfy this condition, it is redundant and can be omitted during further recursive search. To conclude, only significant states can occur in a decomposition set [17]. □

Let SUFFIX(w,q) be the function that returns the suffix y of a word w = xy that is divided by a state q. Suppose the set D has already been built based on words w1, w2, …, wi−1, and we want to extend D for a word wi with \(Q_{w_i}=\{q_1, q_2, q_3, \ldots , q_{m+1}\}\) for some m. With this aim, we first select state q1 that divides wi (\(w_i=x_i^1 y_i^1\)), add q1 to D, and remove redundant states (based on Definition 1) from all sets \(Q_{w_k}\in W\) for k = i + 1, i + 2, …, n, using the procedure REMOVERED(W,y) with \(y=y_i^1\) (Fig. 4). Once the recursive call DECOMPOSE(W, D ∪{q1}) completes, we carry out the above operations for states q2, q3, …, qm+1, and the different divisions \(w_i=x_i^2 y_i^2\), \(w_i=x_i^3 y_i^3\), etc. Due to the removal of redundant states, the number of states in structure W decreases in subsequent recursive executions of procedure DECOMPOSE(W,D). To sum up, Phase1 of the basic algorithm builds up the decomposition set D by processing words w1, w2, …, and so on, while downscaling the set \({\mathscr{Q}}\) = STATES(W) − D of candidates to extend the set D.

Fig. 4
figure 4

Procedure REMOVERED

The basic algorithm shown in Fig. 3 is run by a set of sequential processes \({\mathscr{P}}_r\), r ∈ [0,π − 1], where r is the rank (index) of a process, and π is the number of processes available. Each process \({\mathscr{P}}_r\) executes the code of procedure DECOMPOSE(W,D), and all processes in the set are running in parallel.

A process is executed by a conventional processor, or core of a multi-core processor (from now on, we will use term processor for both of these computing devices).

The execution of the basic algorithm in each process consists of two phases. In Phase1 subsequent sets D are established, which are then processed in Phase2. The code executed in Phase1 is the same in all processes, while in Phase2 the computations performed by processes differ from each other.

As mentioned above, Phase1 gradually builds decomposition sets D while reducing the sets \({\mathscr{Q}}\) containing states that are candidates to extend sets D. Notice that Phase1 does not determine the complete set D but only a partial set \({\mathscr{D}}\) where \({\mathscr{D}}\subset D\).Footnote 7 When \({\mathscr{Q}}\) becomes small enough due to the removal of redundant states, that is \(|{\mathscr{Q}}| \leq T\) for some threshold T, the algorithm moves to Phase2, in which each process takes a collection of prospective sets \({\mathscr{D}}\cup C_j\) to verify whether they constitute decomposition sets for L, where each Cj is a subset of \({\mathscr{Q}}\) for \(j\in [0,2^{|{\mathscr{Q}}|}-1]\).

More specifically, process \({\mathscr{P}}_0\) takes sets \({\mathscr{D}}\cup C_j\) with Cj = C0, Cπ, C2π, …, process \({\mathscr{P}}_1\) takes sets \({\mathscr{D}}\cup C_j\) with Cj = C1, Cπ+1, C2π+1, …, etc. The advantage of such an arrangement is that each collection of sets \({\mathscr{D}}\cup C_j\) can be verified separately from the other collections. As a result, it allows the work related to verifying sets \({\mathscr{D}}\cup C_j\) to be readily spread across the processes \({\mathscr{P}}_r\).
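One natural encoding of this scheme identifies each subset Cj with the bitmask of its index j, so that process P_r simply iterates over j = r, r + π, r + 2π, …. The C fragment below sketches this enumeration for a single process; the verification step is left as a stub, and the encoding is our own illustration rather than the exact implementation.

#include <stdio.h>

/* placeholder for the real check of whether D u C_j induces a decomposition */
static int verify(unsigned mask) { (void)mask; return 0; }

int main(void) {
    int rank = 3, pi = 16;          /* rank r of this process, number of processes */
    int qsize = 8;                  /* |Q| = number of candidate states (<= T)     */
    unsigned total = 1u << qsize;   /* 2^|Q| subsets C_j                           */

    /* process r verifies subsets C_r, C_{r+pi}, C_{r+2*pi}, ...                   */
    for (unsigned j = (unsigned)rank; j < total; j += (unsigned)pi) {
        /* bit k of j set  <=>  the k-th candidate state belongs to C_j            */
        if (verify(j))
            printf("process %d: subset %u induces a decomposition\n", rank, j);
    }
    return 0;
}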

In order to boost performance by making the most of parallel computing capabilities, the basic algorithm controls the depth of recursion in Phase1. For this purpose, the threshold T, already mentioned, is imposed on the size of the set \({\mathscr{Q}}\) containing candidate states to extend the set \({\mathscr{D}}\) built to date. The threshold T is a parameter of the algorithm (the parameter is adjusted at runtime in the adaptive algorithm, see Section 5). When the size of \({\mathscr{Q}}\) becomes reasonably small as a result of the downscaling process, then instead of looking for the complete decomposition set at a deep level of recursion in Phase1, the processes move to Phase2 to verify sets \({\mathscr{D}}\cup C_j\). It is worth noting that while running the basic algorithm the processes do not communicate with one another. They only synchronize their actions at the beginning of the computation to read in the input language, and at the end of the computation when the results obtained by the processes are collected.

Let us consider the average time complexity of the basic algorithm, \({\mathscr{T}}_b(\pi ,n)\). Let d(n,T) be the average number of sets \({\mathscr{D}}\) found in Phase1. Let t1(n,T) be the average time to find a single set \({\mathscr{D}}\) in Phase1, and let t2(n,ψT) be the average time to verify whether a set \({\mathscr{D}}\cup C_j\), j = 0, 1, …, 2ψT − 1, is a final decomposition set for L in Phase2. The value of ψT determines the average size of the sets \({\mathscr{Q}}\) processed in Phase2, where ψ ∈ (0.0,1.0]. Considering the above, the average time complexity of the basic algorithm is as follows:Footnote 8

$$ \mathscr{T}_{b}(\pi, n) = d(n,T) \cdot (t_{1}(n,T) + (t_{2}(n,\psi T) \cdot 2^{\psi T})/\pi). $$
(3)

There are two components in this equation: the first, d(n,T) ⋅ t1(n,T), determines the total average run time of Phase1, and the second, d(n,T) ⋅ (t2(n,ψT) ⋅ 2ψT)/π, the total average run time of Phase2. For a fixed value of n, the run time of Phase2 grows exponentially as a function of ψT. This growth is due to the exponential number of prospective decomposition sets \({\mathscr{D}}\cup C_j\) to be verified in Phase2.

5 Adaptive parallel decomposition algorithm

With the aim of improving the basic algorithm, we propose several refinements. The first three refinements introduce into the adaptive algorithm effective methods for pruning the search space (Fig. 5). The fourth refinement adjusts the threshold T while the adaptive algorithm is executed, based on data acquired at runtime about the performance of Phase1.

Fig. 5
figure 5

Adaptive algorithm

Let us discuss the refinements in more detail. The first refinement concentrates on removing the redundant states q ∈ Q of the automaton accepting the input language L. Redundant states have already been eliminated in the basic algorithm (see procedure REMOVERED(W,y)). We extend the scope of this elimination in the adaptive algorithm.

Definition 2

A state q ∈ Qw, \(Q_w\subseteq Q\), is considered redundant (or significant otherwise) if for a word w = xy, w ∈ L, \(x\in \overleftarrow {q}\), and \(y\in \overrightarrow {q}\), the following holds:

$$ U(y)\cdot |\overrightarrow{q}| < |L| $$
(4)

where U(y) denotes the number of occurrences of the suffix y in all words w ∈ L.

To justify (4), suppose q is the only state in the decomposition set D defined in Theorem 1. Then, given a word w = xy that is divided by state q, the values of U(y) and \(|\overrightarrow {q}|\) determine the sizes of the sets \(L_1^D = \cup _{q\in D} \overleftarrow {q}\) and \(L_2^D = \cap _{q \in D} \overrightarrow {q}\), respectively. In fact, the value of U(y) is the number of prefixes x belonging to the set \(L_1^D\), since U(y) is counted over all words w = xy, w ∈ L, with prefix x followed by suffix y. In view of the above, the product \(U(y)\cdot |\overrightarrow {q}|\) on the left side of (4) determines the upper limit of the number of words that could be created by concatenating the sets \(L_1^D\) and \(L_2^D\). Now, if this product is less than |L|, then \(L=L_1^D L_2^D\) is not satisfied, which means that D cannot be the decomposition set for L. So the state q ∈ D is redundant.

Elimination of redundant states q ∈ Qw satisfying (4) is implemented in the procedure REMOVERED2(W,y) (Fig. 6). When one or more states are found redundant and removed from the sets Qw ∈ W, which is indicated by the variable f, this can cause a decrease in the number of occurrences of the suffix y, given by U(y), and also in the size of the right language \(\overrightarrow {q}\). As a result, more redundant states can be removed from the sets Qw.
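The test (4) itself is inexpensive: given a suffix y, it suffices to count the words of L ending with y and to multiply the count by the size of the right language of q. A minimal illustrative sketch in C (with the language kept as a plain array of strings, a simplification of the data structures used by REMOVERED2) is given below; the numbers correspond to the language analyzed in Example 7 further on.

#include <stdio.h>
#include <string.h>

/* number of words of L that have y as a suffix, i.e. U(y) */
static int count_suffix(const char *y, const char **L, int n) {
    int c = 0;
    size_t ly = strlen(y);
    for (int i = 0; i < n; i++) {
        size_t lw = strlen(L[i]);
        if (lw >= ly && strcmp(L[i] + lw - ly, y) == 0) c++;
    }
    return c;
}

/* state q dividing w = xy is redundant iff U(y) * |right(q)| < |L|, condition (4) */
static int is_redundant(const char *y, int right_size, const char **L, int n) {
    return count_suffix(y, L, n) * right_size < n;
}

int main(void) {
    const char *L[] = { "a", "aaa", "aab", "b" };   /* L = {a, aaa, aab, b}   */
    /* the word aab divided by state q1 gives the suffix "ab"; |right(q1)| = 3 */
    printf("q1 redundant for suffix ab: %s\n",
           is_redundant("ab", 3, L, 4) ? "yes" : "no");   /* 1*3 < 4 -> yes    */
    return 0;
}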

Fig. 6
figure 6

Procedure REMOVERED2 – improved version of REMOVERED

Using (4), we can also remove redundant states before the adaptive algorithm begins. Once automaton A has been constructed, the procedure BUILDW(A,W) (Fig. 7) builds the structure W, which is the input parameter of procedure DECOMPOSE2(W,D) (Fig. 5). While creating the sets Qw, only significant states of Q are considered. Hence, the number of states in W that are then processed by the adaptive algorithm is smaller than the number of states in A.

Fig. 7
figure 7

Procedure BUILDW

Example 7

To clarify how structure \(W=\{(w_i,Q_{w_i})\}\) is built, consider the language L = {a, aaa, aab, b} (Fig. 8).

Fig. 8
figure 8

Transition diagram of automaton accepting language L = {a, aaa, aab, b}

The suffixes of words wi ∈ L along with their frequency counts in L are (λ,4), (a,2), (aaa,1), (aa,1), (aab,1), (ab,1), (b,2), and the sizes of the right languages are \(|\overrightarrow {q0}| = 4\), \(|\overrightarrow {q1}| = 3\), \(|\overrightarrow {q2}| = 2\) (we omit state q3, because it leads to the trivial decomposition L = L{λ}). To establish a pair \((w_i,Q_{w_i})\), we need to check (4) for the word wi. Let us check this condition for wi = aab, which is divided by states q0, q1, and q2. The corresponding suffixes are aab, ab, b, and the checks give \(U(aab) \cdot |\overrightarrow {q0}| = 4 \ge |L|\), \(U(ab) \cdot |\overrightarrow {q1}| = 3 < |L|\), and \(U(b) \cdot |\overrightarrow {q2}| = 4 \ge |L|\). Thus, states q0 and q2 are significant, while state q1 is redundant. Continuing the similar analysis for the remaining words wi ∈ L, we end up with W = {(a,{q0,q1}), (aaa,{q0,q2}), (aab,{q0,q2}), (b,{q0})}. Note that the decomposition set for L is D = {q0,q2} with \(L_1^D=\{\lambda , aa\}\) and \(L_2^D=\{a, b\}\).

The second refinement implements a method for reducing the search space by skipping, where possible, the verification of prospective decomposition sets \(D={\mathscr{D}}\cup C_j\). The verification is carried out in the basic algorithm by checking the condition \(L=L_1^D L_2^D\) (Fig. 3). To make this verification more efficient, we determine an upper bound on the size of the language generated by the set \({\mathscr{D}}\cup C_j\) (7). If the size of the input language L exceeds this bound, then we can omit the verification of \({\mathscr{D}}\cup C_j\).

Lemma 2

Let A = (Q,Σ,δ,s,QF) be the automaton accepting a finite language L, let \(D\subseteq Q\) be the final decomposition set for L, and let the sets \(L^D_1\) and \(L^D_2\) be defined as in (1). Then the size of the set \(L^D_1L^D_2\) is bounded from above as follows:

$$ |{L^{D}_{1}}{L^{D}_{2}}| \leqslant |{L^{D}_{1}}|\cdot \min_{q \in D}|\overrightarrow{q}| $$
(5)

where \(|L^D_1|\) is the size of \(L^D_1\), and \(\min \limits _{q \in D}|\overrightarrow {q}|\) is the minimum size of the right language over the states q ∈ D.

Proof

The lemma follows directly from the definition of set \(L^D_2 = \bigcap _{q \in D} \overrightarrow {q}\). The size of \(L^D_2\) defined as the intersection of right languages \(\overrightarrow {q}\) cannot be greater than \(\min \limits _{q \in D}|\overrightarrow {q}|\). □

Lemma 3

The necessary condition for a finite language L to be decomposed by set \(D\subseteq Q\) is:

$$ |L| \leqslant |{L^{D}_{1}}{L^{D}_{2}}|. $$
(6)

Proof

Suppose we have L = L1L2. By Theorem 1, \(L_i\subseteq L_i^D\) for i = 1,2. Hence, \(L\subseteq L_1^DL_2^D\), and then \(|L|\leqslant |L_1^DL_2^D| \leqslant |L_1^D|\cdot |L_2^D|\). □

Combining (5) and (6) we get the upper bound for |L|:

$$ |L| \leqslant |{L^{D}_{1}}|\cdot \min_{q \in D}|\overrightarrow{q}| $$
(7)

which makes it possible to verify whether D can be the final decomposition set. The full verification first requires computing the sets \({L}_{1}^{D}\) and \({L}_{2}^{D}\), then concatenating them, and finally checking whether \(L={L}_{1}^{D} {L}_{2}^{D}\). However, if (7) does not hold, then these operations can be avoided. The procedure VERIFY(D) (Fig. 9) performs a double check of the constraints bounding the size of the language induced by D from above. Either check may result in the rejection of the set D. We carry them out consecutively because the cost of the first check is lower than the cost of the second one.
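Logically, the two checks in VERIFY are instances of the same necessary condition and differ only in cost. The following compact C sketch of the double check (an illustration; computing the actual sets \(L_1^D\) and \(L_2^D\) is abstracted into the arguments) reproduces the numbers used in Example 8 below.

#include <stdio.h>

/* returns 1 if D survives both size checks and must still undergo the full
   concatenation test, 0 if D can be rejected immediately                    */
static int verify_bounds(int size_L,        /* |L|                            */
                         int size_L1D,      /* |L_1^D|                        */
                         int min_right,     /* min over q in D of |right(q)|  */
                         int size_L2D)      /* |L_2^D|, computed only if needed */
{
    /* cheap first check (7): |L| <= |L_1^D| * min_{q in D} |right(q)|        */
    if (size_L > size_L1D * min_right) return 0;
    /* second check: |L| <= |L_1^D| * |L_2^D| (cf. Lemma 3)                   */
    if (size_L > size_L1D * size_L2D) return 0;
    return 1;
}

int main(void) {
    /* Example 8 data: |L| = 14, |L_1^D| = 3, min |right(q)| = 5, |L_2^D| = 4 */
    printf("D = {q0,q2} still a candidate: %s\n",
           verify_bounds(14, 3, 5, 4) ? "yes" : "no");  /* rejected: prints no */
    return 0;
}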

Fig. 9
figure 9

Procedure VERIFY

Example 8

To illustrate the procedure VERIFY let us examine the language L = {a, aa, aaa, aaab, aaaab, aab, ab, abab, abb, b, ba, baab, bab, bb} (Fig. 10). Assume that the adaptive algorithm moves from Phase1 to Phase2 with \({\mathscr{D}} = \{q0\}\) and \({\mathscr{Q}} = \{q1, q2\}\) (such assignments are made when the considered word a is divided by state q0). Suppose the procedure VERIFY is called with D = {q0,q2}. Then \(|L_1^D| = 3\) (\(L_1^D=\{\lambda , \textit {aa}, b\}\)), and \(\min \limits _{q \in D} |\overrightarrow {q}| = 5\) (\(\overrightarrow {q0} = L\) and \(\overrightarrow {q2} = \{\lambda , a, aab, ab, b\}\)). So the result of the first check, \(|L|\leqslant |L_1^D|\cdot \min \limits _{q \in D} |\overrightarrow {q}|\), is positive as 14 ≤ 3 ⋅ 5. The size \(|L_2^D| = 4\) as \(\overrightarrow {q0} \cap \overrightarrow {q2} = \{a, aab, ab, b\}\). Thus the condition \(|L|\leqslant |L_1^D|\cdot |L^D_2|\) in the second check is not met, because 14 > 3 ⋅ 4. Therefore D = {q0,q2} is not a decomposition set for L. Thanks to the double check of conditions in the procedure under consideration, we do not need to concatenate the sets \(L_1^D\) and \(L_2^D\), nor compare the concatenation result with L. The decomposition set for the language in question is D = {q1,q2,q3}, with L1 = {a, b, aa, ab, ba, aaa} and L2 = {λ, b, ab}.

Fig. 10
figure 10

Transition diagram of DFA accepting language L from Example 8

The approach taken in the third refinement is similar to that of the second refinement. We establish a lower bound (Lemma 4) related to the subsets Cj that complete the partial decomposition set \({\mathscr{D}}\). A subset Cj—and consequently the set \({\mathscr{D}}\cup C_j\) as well—can be disregarded when the size of Cj is below this lower bound.

Lemma 4

Let \({\mathscr{D}}\) be a partial decomposition set for L. Let \(C_j\subseteq {\mathscr{Q}}\) be an arbitrary subset of candidate states to extend \({\mathscr{D}}\). Let \({\mathscr{D}}\cup C_j\) be a prospective decomposition set for L. Then

$$ |L| \leqslant |L_{1}^{\mathscr{D}} \cup L_{1}^{C_{j}}| \cdot \min_{q \in \mathscr{D}}|\overrightarrow{q}|. $$
(8)

Proof

By Lemma 3, \(|L| \le |L^{{\mathscr{D}} \cup C_j}_1 L^{{\mathscr{D}}\cup C_j}_2|\), and by Lemma 2, \(|L^{{\mathscr{D}}\cup C_j}_1 L^{{\mathscr{D}}\cup C_j}_2| \leq |L_1^{{\mathscr{D}}} \cup L^{C_j}_1| \cdot \min \limits _{q \in {\mathscr{D}}} |\overrightarrow {q}|\), as \({\mathscr{D}}\cup C_j\) is a prospective decomposition set for L. □

From (8) we can derive a lower bound on the size of \(L_1^{C_j}\) for an arbitrary subset Cj:

$$ |L_{1}^{C_{j}}| \geqslant |L|/ \min_{q \in \mathscr{D}}|\overrightarrow{q}| - |L_{1}^{\mathscr{D}}|. $$
(9)

Based on (9), the procedure COMPLOWBOUND (Fig. 11) helps to reduce the number of subsets \(C_j\subseteq {\mathscr{Q}}\), \(j\in [0, 2^{|{\mathscr{Q}}|}-1]\), which are verified in Phase2 of the adaptive algorithm. Recall that the set \({\mathscr{Q}}\) = STATES\((W)-{\mathscr{D}}\) includes the candidate states to extend the partial decomposition set \({\mathscr{D}}\) obtained in Phase1. The set \(L_1^{C_j}\) occurring on the left side of (9) is the union of left languages: \(L_1^{C_j}=\bigcup _{q\in C_j}\overleftarrow {q}\) where \(C_j\subseteq {\mathscr{Q}}\) (see (1)). The procedure COMPLOWBOUND computes the minimum cardinality of Cj such that the sum of the sizes of the left languages generated by states q ∈ Cj, determined by the function PREFIXSUM, is greater than or equal to the value appearing on the right-hand side of (9). Based on this required minimum cardinality, a subset Cj is either processed or discarded from further analysis in Phase2.
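Since \(|L_1^{C_j}|\) cannot exceed the sum of the sizes of the left languages of the states in Cj, a valid lower bound on |Cj| is the smallest number of candidate states whose left-language sizes add up to the right-hand side of (9). One plausible prefix-sum sketch of this computation in C is shown below (the PREFIXSUM routine of the actual implementation is not reproduced; taking the largest sizes first is our own choice, made so that the bound stays valid for every subset Cj).

#include <stdio.h>
#include <stdlib.h>

static int cmp_desc(const void *a, const void *b) {
    return *(const int *)b - *(const int *)a;
}

/* minimum cardinality c such that the c largest left-language sizes sum to
   at least  |L| / min_{q in D} |right(q)| - |L_1^D|                          */
static int comp_low_bound(int *left_sizes, int m,
                          int size_L, int min_right, int size_L1D) {
    double bound = (double)size_L / min_right - size_L1D;
    if (bound <= 0) return 0;                  /* no restriction on C_j       */
    qsort(left_sizes, (size_t)m, sizeof(int), cmp_desc);
    long sum = 0;
    for (int c = 1; c <= m; c++) {
        sum += left_sizes[c - 1];              /* prefix sum of largest sizes */
        if ((double)sum >= bound) return c;
    }
    return m + 1;                              /* bound unreachable: reject all C_j */
}

int main(void) {
    /* Example 9 data: |L| = 16, min |right(q)| = 3, |L_1^D| = 4,
       left-language sizes of candidates q1, q2, q6 are all 1                 */
    int sizes[] = { 1, 1, 1 };
    printf("minimum |C_j| = %d\n", comp_low_bound(sizes, 3, 16, 3, 4));  /* 2 */
    return 0;
}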

Fig. 11
figure 11

Procedure COMPLOWBOUND

Example 9

To illustrate the procedure COMPLOWBOUND consider the language L = {a, aa, aaa, aaab, aaaab, aab, ab, abab, abb, b, ba, baab, bab, bb, bbab, bbb} (Fig. 12). Suppose we enter Phase2 with \({\mathscr{D}}=\{q3\}\) and \({\mathscr{Q}}=\{q1, q2, q6\} \) (such assignments result from dividing the word aaa by state q3). Then we have \(\min \limits _{q \in {\mathscr{D}}} |\overrightarrow {q}| = |\overrightarrow {q3}| = 3\) (based on the words λ, ab, b) and \(|L_1^{{\mathscr{D}}}| =|\overleftarrow {q3}| = 4\) (based on the words aaa, ab, ba, bb), and thus \(|L_1^{C_j}| \geq |L|/\min \limits _{q\in {\mathscr{D}}}|\overrightarrow {q}| - |L_1^{{\mathscr{D}}}|= 16/3 - 4 = 1.33\).Footnote 9 The sizes of the left languages for states q1, q2, and q6 are all 1, so the prefix sums are S = {1,2,3}, which means that the value of c returned from COMPLOWBOUND is 2. Consequently, we do not need to analyze single-state subsets of \(\mathcal {Q}\). The final decomposition set for L = L1L2 is {q1,q2,q3,q6}, where L1 = {a, aa, aaa, ab, b, ba, bb} and L2 = {λ, ab, b}.

Fig. 12
figure 12

Transition diagram of DFA accepting language L from Example 9

The fourth refinement makes the algorithm adaptive. We have found that, depending on the input language, the time to execute Phase1 of the basic algorithm can be much longer than the time to execute Phase2. This is a disadvantage, as in Phase1 the processes run the same code while in Phase2 they work in parallel verifying sets \({\mathscr{D}}\cup C_j\). Therefore, when the run time of Phase2 is short compared to Phase1, the capacity to take advantage of parallel computation is not fully utilized.

The purpose of Phase1 is to reduce the set \({\mathscr{Q}}\) = STATES\((W)-{\mathscr{D}}\) so that its size becomes smaller than the threshold T. As it turns out, the cause of a long run time of Phase1 is that the size of \({\mathscr{Q}}\) remains constant through a series of recursive runs of Phase1. So instead of repeating Phase1, it is better to start Phase2 earlier by increasing the value of the threshold T. Setting the new value of T (procedure ADJUSTT, Fig. 13) is triggered when a specified number of recursive runs of Phase1 is completed with no change of \({\mathscr{Q}}\). More precisely, when the number of runs wherein old_s = s reaches the fixed value of e, the value of T is increased. However, the rate of growth of T should be controlled so that it does not become too large. Once the value of T is doubled (or tripled) in relation to T0, the number of recursive runs e to be performed before T is increased again is also doubled (or quadrupled).
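The adjustment rule can be read as follows: count the consecutive recursive runs in which |𝒬| does not shrink, and whenever the count reaches e, increase T; as T grows past multiples of its initial value T0, the patience e grows as well. The C sketch below is one possible reading of ADJUSTT; the exact increment of T and the constants are not taken from the paper and should be treated as placeholders.

#include <stdio.h>

struct adapt {
    int T, T0;        /* current and initial threshold                          */
    int e, e0;        /* current and initial number of runs without progress    */
    int stalled;      /* consecutive Phase1 runs in which |Q| did not change     */
    int old_s;        /* |Q| observed in the previous run                        */
};

/* called after each recursive run of Phase1 with the current size s = |Q|      */
static void adjust_T(struct adapt *a, int s) {
    if (s == a->old_s) a->stalled++;
    else               a->stalled = 0;
    a->old_s = s;

    if (a->stalled >= a->e) {            /* no progress for e runs: raise T      */
        a->T++;                          /* increment size is a placeholder here */
        a->stalled = 0;
        /* slow down further growth of T once it gets large relative to T0       */
        if      (a->T >= 3 * a->T0) a->e = 4 * a->e0;
        else if (a->T >= 2 * a->T0) a->e = 2 * a->e0;
    }
}

int main(void) {
    struct adapt a = { 10, 10, 5, 5, 0, -1 };
    for (int run = 0; run < 12; run++) {   /* simulate runs in which |Q| stays 40 */
        adjust_T(&a, 40);
        printf("run %2d: T = %d, e = %d\n", run, a.T, a.e);
    }
    return 0;
}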

Fig. 13
figure 13

Threshold adjustment procedure

Note that a greater value of the threshold T causes the size of the set \({\mathscr{Q}}\) to grow. Consequently, the number of subsets Cj where \(C_j\subseteq {\mathscr{Q}}\), and thus the number of sets \({\mathscr{D}}\cup C_j\) to verify, increases (the number of subsets Cj is exponential and equal to \(2^{|{\mathscr{Q}}|}\), as the subsets are members of the power set of \({\mathscr{Q}}\)). This means that the degree of parallelism grows, which is desirable since we may use more processes to conduct the search.

The average time complexity of the adaptive algorithm is given by

$$ \mathscr{T}_{a}(\pi, n) = d(n,\widehat{T}) \cdot (t_{1}(n,\widehat{T}) + (t_{2}(n,\varepsilon,\psi\widehat{T}) \cdot \varepsilon \cdot 2^{\psi\widehat{T}})/\pi) $$
(10)

which is similar to that of the basic algorithm (3). There are, however, two differences. First, the coefficient ε takes into account the fraction of sets \({\mathscr{D}}\cup C_j\) that skip verification in Phase2. Second, the threshold T that was kept constant in the basic algorithm can now be increased adaptively, so it holds that \(\widehat {T}\geq T\). The coefficient ε, where ε ∈ (0.0,1.0], can considerably reduce the total average run time \(d(n,\widehat {T}) \cdot (t_2(n,\varepsilon ,\psi \widehat {T}) \cdot \varepsilon \cdot 2^{\psi \widehat {T}})/\pi \) of Phase2 (see Table 8). Similarly, the growing average value of \(\widehat {T}\) resulting from adaptation reduces the amount of computation in Phase1 while increasing the amount of computation in Phase2, which is distributed among π processes.

To conclude, the adaptive algorithm introduces three refinements aimed at pruning the search space. In contrast to the basic algorithm, which only eliminates particular redundant states, the adaptive algorithm also discards whole candidate sets of states that cannot induce decompositions. The fourth refinement, involving the adjustment of the threshold T, ensures not only a better balance between the run times of Phase1 and Phase2, but also a better exploitation of the parallelism inherent in the decomposition problem. However, the refinements do not reduce the order of complexity of the adaptive algorithm, which remains exponential.

6 Computational experiments

This section reports on the comprehensive experiments conducted to evaluate the performance of the basic and adaptive algorithms. The run times to solve the decomposition problem were measured for almost 1450 languages over an alphabet of size |Σ| = 3–5, and for more than 2700 languages over binary and unary alphabets, and over an alphabet of size |Σ| = 10 (in what follows we refer to these alphabets as Σ3−5, Σ2, Σ1, and Σ10). Furthermore, the impact of the adaptive setting on the results obtained, and the speed-ups of the adaptive algorithm, were studied.

The basic and adaptive algorithmsFootnote 10 were implemented in the C language using the MPI library functions in the Intel MPI 5.1.1.109 version. Each process ran a sequential stream of instructions defined by the DECOMPOSE (or DECOMPOSE2) procedure. The processes running the algorithms were independent of one another, and synchronized their operation only at the beginning and end of the computation. The implementation structure, based on the master-worker paradigm, is shown in Fig. 14. The role of the master process was to send the input language L to all the workers, and to collect the decompositions of L found (in the actual implementation, the master process M and worker process W0 were combined into a single process).

Fig. 14
figure 14

Master-worker structure of implementation of algorithms (M – master process, Wi, i = 0, 1, …, π − 1 – worker processes)
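A minimal MPI skeleton of this master-worker arrangement might look as follows. It is a sketch under our simplifying assumptions (the language is broadcast as a single character buffer and each process merely reports the number of decompositions it found); the data structures and result collection of the actual implementation are not shown in the paper.

#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* placeholder for Phase1 + Phase2 of DECOMPOSE2 restricted to the subsets
   C_j with j mod size == rank; returns the number of decompositions found */
static int decompose_part(const char *lang, int rank, int size) {
    (void)lang; (void)rank; (void)size;
    return 0;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    static char lang[1 << 20];           /* words separated by '\n'           */
    int len = 0;
    if (rank == 0) {                     /* master: read/prepare the language */
        strcpy(lang, "ab\nba\naabb\n");
        len = (int)strlen(lang) + 1;
    }
    /* master sends the input language to all workers */
    MPI_Bcast(&len, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(lang, len, MPI_CHAR, 0, MPI_COMM_WORLD);

    int found = decompose_part(lang, rank, size);

    /* master collects the numbers of decompositions found by the workers */
    int total = 0;
    MPI_Reduce(&found, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("decompositions found: %d\n", total);

    MPI_Finalize();
    return 0;
}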

The experiments were carried out on the Tryton supercomputer with a computation speed of 1.48 Pflop/s, running the Linux kernel 2.6.32-754.3.5.el6.x86_64 along with the Slurm utility (Simple Linux utility for resource management). The supercomputer is composed of 1607 compute nodes, each equipped with two 12-core Intel Haswell processors (Xeon E5 v3) operating at 2.3 GHz, with 128 GB of RAM. The processors are connected by a 56 Gb/s Infiniband fat-tree network. The complete system with a cluster architecture, located in the Computer Centre in Gdańsk, Poland (http://task.gda.pl/centre-en), houses 3214 processors (38568 cores) and 48 Nvidia Tesla accelerators.

6.1 Benchmark languages

For the purpose of the experiments, we generated four sets of languages. The sets E1 and E2 contained composite languages, while the sets P1 and P2 contained prime languages. The languages in sets E1 and P1 included between 6000 and 15000 words, and those in sets E2 and P2 between 60000 and 90000 words (Table 2a). The composite languages were created using random grammars [17]. Let Σ = {a1, a2, …, al} be the set of terminal symbols, l ≥ 1, let V = {V1, V2, …, Vr} be the set of nonterminal symbols, r > l, and let Vr be the initial symbol. The grammars for composite languages were obtained as follows:

  1. For each terminal symbol ai ∈ Σ, create a production \(V_i \rightarrow a_i\).

  2. For each nonterminal symbol Vj where j = l + 1, l + 2, …, r − 1:

    • Draw at random a terminal symbol a ∈ Σ. Create a production \(V_j \rightarrow a\).

    • Draw at random l pairs (a, Vi), where a ∈ Σ and Vi ∈ V, i < j. Create a production \(V_j \rightarrow aV_i\).

  3. Create a production \(V_r \rightarrow V_{r - 2}V_{r - 1}\).

Table 2 Characteristics of languages (a) and automata built by basic (b) and adaptive (c) algorithm (N – number of languages in the set, |Σ| – alphabet size; Min, Max – minimum and maximum values, 1st-q, 3rd-q – first and third quartiles, Med (or 2nd quartile) – median; |L| – number of words in the language, |Q| and \(|Q^{\prime }|\) – numbers of states searched by algorithms)

When creating the composite languages, we rejected the grammars that generated a number of words outside the ranges 6000–15000 (for set E1) and 60000–90000 (for set E2). The values of l and r were selected from the ranges 3–5 and 11–21, respectively, so the maximum length of a word in a composite language was 2(r − l) − 1 = 35. The sets of prime languages P1 and P2 were created based on the sets E1 and E2. Let \({\mathscr{L}}\) be a composite language in set E1 (or E2). The language \({\mathscr{L}}\) can be transformed into a prime language \({\mathscr{L}}^{\prime }\) belonging to set P1 (or P2) using the following steps: (1) find the longest word \(\omega \in {\mathscr{L}}\); (2) generate a random word ωr over Σ such that |ωr| = |ω|; (3) if \(\omega _r\notin {\mathscr{L}}\) then copy language \({\mathscr{L}}\) into \({\mathscr{L}}^{\prime }\), and replace \(\omega \in {\mathscr{L}}^{\prime }\) with ωr. (Note that these steps do not guarantee that \({\mathscr{L}}^{\prime }\) will always be prime. However, the probability of obtaining a composite language is small. If \({\mathscr{L}}^{\prime }\) is composite, one can repeat the steps.)
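Steps (1)–(3) can be sketched in a few lines of C; the fragment below is an illustration operating on an in-memory array of words (the word arrays are assumed to be writable and long enough to hold the replacement, and the random generator is left unseeded for brevity).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* replace the longest word of L (n words over alphabet sigma) by a random
   word of the same length that does not already belong to L: steps (1)-(3) */
static void make_prime_candidate(char **L, int n, const char *sigma) {
    int longest = 0;
    for (int i = 1; i < n; i++)
        if (strlen(L[i]) > strlen(L[longest])) longest = i;           /* step (1) */

    size_t len = strlen(L[longest]), k = strlen(sigma);
    char *wr = malloc(len + 1);
    for (;;) {
        for (size_t j = 0; j < len; j++) wr[j] = sigma[rand() % k];   /* step (2) */
        wr[len] = '\0';
        int in_L = 0;
        for (int i = 0; i < n; i++) if (strcmp(L[i], wr) == 0) in_L = 1;
        if (!in_L) break;                                             /* step (3) */
    }
    strcpy(L[longest], wr);   /* assumes L[longest] has room for len+1 characters */
    free(wr);
}

int main(void) {
    char w0[] = "ab", w1[] = "ba", w2[] = "aabb";
    char *L[] = { w0, w1, w2 };
    make_prime_candidate(L, 3, "ab");
    printf("modified language: {%s, %s, %s}\n", L[0], L[1], L[2]);
    return 0;
}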

Consider the size of the input data of the algorithms. There are three independent variables defining this size: the number n of words in L, the size |Σ| of the alphabet, and the maximum length h of a word in L. In Section 6.2 we limit the alphabet size to |Σ| = 5, and the maximum length of a word to h = 35. Consequently, the values of |Σ| and h become the parameters of the algorithms. So we can assume that the only variable defining the size of the input data of the decomposition problem is the number n of words in L.

6.2 Experimental results

While performing the experiments, we ran the basic and adaptive algorithms by employing 16 processesFootnote 11 for languages in sets E1, E2, P1, and P2. We set the maximum run time allowed to solve a given language L to six hours. By solving the language we mean that the algorithm either determines all decompositions of L, or establishes that L is prime. Since the basic algorithm failed to solve some languages within the six-hour limit, we defined the success rate as

$$ R = \frac{N_{s}}{N} \cdot 100\% \qquad\qquad (11) $$

where Ns was the number of languages in the set that the given algorithm solved within the time limit, and N was the number of languages in the set. As shown in Table 3, the adaptive algorithm outperformed the basic algorithm with respect to success rates for all sets under consideration.

Table 3 Success rates of algorithms

A comparison of run times measured by the MPI_Wtime() function is shown in Table 4 and Fig. 15. Out of a total of 1446 languages, the comparison relates only to 1168 languages that were solved by both algorithms within the six-hour limit. The box plots of Fig. 15 depict the times through their quartiles. The bottom and top of each box are the first and third quartiles of measurements, and the band inside the box is the second quartile (the median). The lines extending vertically from the boxes, the so-called whiskers, indicate minimum and maximum measurements.

Table 4 Run times (in seconds) of algorithms (Min, Max – minimum and maximum times, 1st-q, 3rd-q – first and third quartile, Med – median)
Fig. 15

Comparison of run times (in seconds) of algorithms for 1168 languages

The median run times in Table 4 show that both algorithms solve prime languages faster than composite languages. As the aim is to find all decompositions of a language L, the algorithms have to explore the whole solution space for L. The size of this space is similar for both types of languages, because the cardinalities of the languages in sets E1 and P1, and in sets E2 and P2, are the same. The experiments show, however, that for prime languages both algorithms verify fewer sets DCj than for composite languages (Table 5). Consequently, the run times for prime languages are shorter, because fewer decomposition sets need to be verified.

Table 5 Numbers of decomposition sets DCj verified by the algorithms (IQR – interquartile range, difference between third and first quartile)

Comparing the median run times (Table 4), we can see that the adaptive algorithm outperforms the basic algorithm for sets E1, E2, and P1. However, for set P2 the adaptive algorithm shows slightly worse performance. One of the refinements restricts the search to the significant states of the automaton A accepting the language L. Due to this refinement, fewer decomposition sets DCj are verified. The redundant states are removed in the course of building structure W, which is created by the BUILDW procedure. Its execution takes a certain amount of time, but one can expect this cost to be amortized by the smaller number of sets DCj that need to be verified. However, such amortization did not occur for set P2, because the adaptive algorithm discovered no sets DCj for those languages at all (Table 5).

Considering the above, we claim that the adaptive algorithm is faster than the basic algorithm when solving composite and prime languages. We test this claim statistically on the four pairs of data samples by using the one-sided two-sample test for comparing two means (Table 6). The data samples were created by eliminating the outliers. For example, the basic algorithm produced a set of 323 measurements for set E1 (column Ns in Table 3). From this set, 28 measurements were eliminated as outliers (n1 = 295 in Table 6). A measurement was considered an outlier if it fell outside the range [m,M], where m = 1st-q − 1.5 ⋅ (3rd-q − 1st-q) and M = 3rd-q + 1.5 ⋅ (3rd-q − 1st-q) (Table 4).
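The outlier rule can be stated compactly in code. The sketch below only illustrates that rule; the quartile interpolation used and the sample measurements are placeholders of our own, since the paper does not specify the quartile convention.

/* Sketch of the outlier rule used to form the data samples: a
 * measurement is dropped if it lies outside
 * [q1 - 1.5*(q3 - q1), q3 + 1.5*(q3 - q1)]. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Quartile by linear interpolation on sorted data (one possible convention). */
static double quartile(const double *sorted, int n, double p)
{
    double pos = p * (n - 1);
    int lo = (int)pos;
    double frac = pos - lo;
    return lo + 1 < n ? sorted[lo] * (1.0 - frac) + sorted[lo + 1] * frac
                      : sorted[lo];
}

/* Keeps the non-outlying measurements in place and returns their count. */
static int drop_outliers(double *t, int n)
{
    qsort(t, (size_t)n, sizeof *t, cmp_double);
    double q1 = quartile(t, n, 0.25), q3 = quartile(t, n, 0.75);
    double m = q1 - 1.5 * (q3 - q1), M = q3 + 1.5 * (q3 - q1);
    int k = 0;
    for (int i = 0; i < n; i++)
        if (t[i] >= m && t[i] <= M)
            t[k++] = t[i];
    return k;
}

int main(void)
{
    double t[] = { 10.2, 11.0, 11.4, 12.1, 12.3, 13.0, 95.0 };  /* placeholder run times */
    int k = drop_outliers(t, 7);
    printf("%d measurements kept\n", k);     /* the 95.0 s run is dropped */
    return 0;
}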

Table 6 Data samples \(d_1=(n_1,\bar {y}_1,s_1)\) and \(d_2=(n_2,\bar {y}_2,s_2)\) to test the equality of mean run times for the algorithms; the threshold was initially set to T = 20 (n1,n2 – numbers of measurements; \(\bar {y}_1,\bar {y}_2\) – means, in seconds; s1,s2 – standard deviations)

For samples d1,d2, and sets Ei,Pi, i = 1,2, we set up the null and alternative hypotheses

$$ H_{0}: \mu_{1}=\mu_{2} \quad\text{and}\quad H_{1}: \mu_{1}>\mu_{2} $$

where μ1 and μ2 are the mean run times of the basic and adaptive algorithm, respectively, over the population of composite (or prime) languages represented by the given set. Using the test statistic:

$$ Z=\frac{\bar{y}_{1}-\bar{y}_{2}}{\sqrt{\frac{{s_{1}^{2}}}{n_{1}}+ \frac{{s_{2}^{2}}}{n_{2}}}} $$

where ni, \(\bar {y}_i\), and si are the components of sample di, the hypothesis H0 is rejected at the α = 0.01 significance level if Z > zα, where zα = 2.326. The sizes of our data samples are in the range of 115–345 (Table 6). A sample size of ni ≥ 30 is commonly considered large enough for the distribution of the sample mean to be approximately normal (by the central limit theorem). Thus, the critical value zα is determined based on the standard normal distribution N(0,1). Clearly, all values of Z are above the critical value of 2.326 (Table 6), so we reject the null hypothesis H0 in favor of the alternative hypothesis H1. This means that there is sufficient evidence at the α = 0.01 level of significance to claim that the adaptive algorithm solves composite and prime languages faster than the basic algorithm.Footnote 12
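For completeness, the test statistic can be evaluated directly from the sample summaries (ni, \(\bar {y}_i\), si). The numbers passed to ztest() below are placeholders, not the values from Table 6.

/* Sketch of the one-sided two-sample Z test used above: H0 is rejected
 * when Z exceeds the critical value 2.326 (alpha = 0.01, standard
 * normal). The sample summaries are placeholders, not Table 6 values. */
#include <math.h>
#include <stdio.h>

static double ztest(int n1, double mean1, double sd1,
                    int n2, double mean2, double sd2)
{
    return (mean1 - mean2) / sqrt(sd1 * sd1 / n1 + sd2 * sd2 / n2);
}

int main(void)
{
    const double z_alpha = 2.326;              /* critical value for alpha = 0.01 */
    double Z = ztest(295, 120.0, 40.0,         /* d1: basic algorithm (placeholder) */
                     300,  80.0, 30.0);        /* d2: adaptive algorithm (placeholder) */
    printf("Z = %.3f, reject H0: %s\n", Z, Z > z_alpha ? "yes" : "no");
    return 0;
}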

Several methods for reducing the search space are proposed in Section 5. The idea behind these methods is to avoid the full verification of whether a set DCj is a decomposition set. When the set is large, its verification can be computationally expensive. However, Lemmas 2–4 allow us to discard a vast majority of the sets without verification. As seen in Table 5, the numbers of sets verified by the adaptive algorithm for composite languages are reduced by several orders of magnitude compared with the corresponding numbers for the basic algorithm. This indicates that the proposed methods of pruning the search space are effective.

Another way to prune the search space is the removal of redundant states of the automaton A accepting the input language L. Due to such removal, the state sets of A decrease by approximately 10% for the composite languages, and by approximately 16%–17% for the prime languages (see Med |Q| and Med \(|Q^{\prime }|\) in Table 2b–c). The basic algorithm discovered some prospective decomposition sets for prime languages, while the adaptive algorithm did not find any set of that type (Table 5). We believe that the reason for this was the smaller automata processed by the adaptive algorithm compared with the basic algorithm.

The adaptive algorithm changes its behavior by setting the value of threshold T (denoted then by \(\widehat {T}\)) at runtime. The adaptation was most beneficial for languages of set E2, which turned out to be most demanding in terms of solving the decomposition problem. For these languages the values of \(\widehat {T}\) varied in a wide range of 20–58 (Table 7). The capability of adaptation was exploited to a lesser extent for languages in set E1 with the values of \(\widehat {T}\) varying within a range of 20–32, and for the prime languages the adaptive adjustment of \(\widehat {T}\) did not occur.

Table 7 Values of threshold \(\widehat {T}\) set by the adaptive algorithm

The time complexity formulas \({\mathscr{T}}_b(\pi ,n)\) and \({\mathscr{T}}_a(\pi ,n)\) for the algorithms include several terms and coefficients. To investigate the variability of these quantities, we took the measurements reported in Table 8. The Avg entry contains the average value of a quantity, q, calculated over a language set. The Range entry describes the variability of q through a pair \((q_{\min \limits },q_{\max \limits })\), where \(q_{\min \limits }\) and \(q_{\max \limits }\) are the minimum and maximum of the average values of q calculated over the distinguished intervals of language size. The range [6000,15000] of language size for sets E1 and P1 was divided into nine equal intervals, and the range [60000,90000] for E2 and P2 into ten intervals.

Table 8 Variability and average values of terms and coefficients appearing in time complexity formulas given in (3) and (10); times t1 and t2 are in seconds

The Range values indicate that the times t1 and t2 are slowly increasing functions of the language size n (Table 8). Recall that t1 is the average time to find a partial decomposition set D in Phase1, and t2 the average time to verify a set DCj in Phase2. Small values of coefficient ε indicate that pruning of the search space is effective. The values of d(n,T) and t1(n,T) allow us to estimate the average run time of Phase1 of the basic algorithm for set E2. This time is 2.1 ⋅ 45.6 ≈ 95.8 s. For the adaptive algorithm the run time of Phase1 is equal to 9.7 s. Since we have the average run times of the complete algorithms (\(\bar {y}_1\) and \(\bar {y}_2\) for E2 in Table 6), we can calculate the execution times of Phase2 for both algorithms. We get 1230.7 − 95.8 = 1134.9 s for the basic algorithm, and 23.6 − 9.7 = 13.9 s for the adaptive algorithm. Clearly, the run time balance between Phase1 and Phase2 is much better for the adaptive algorithm (9.7 vs. 13.9 s) than for the basic algorithm (95.8 vs. 1134.9 s). The better balance was achieved due to the effective pruning of the search space, and to the increase in threshold T performed by the adaptive algorithm at runtime.

We also conducted experiments on languages over a comparatively large alphabet Σ10, and over small alphabets, in particular on binary and unary languages. The setting was the same as before: the adaptive algorithm was run by 16 processes, and the time limit for solving a language was six hours. The languages over the alphabets Σ2 and Σ10 were created in a similar fashion as described in Section 6.1. To produce the unary languages, words consisting of a random number of ones were generated.
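A generator of such unary languages can be as simple as the following sketch; the word count and maximum word length are placeholders of our own choosing, not the sizes used in the experiments.

/* Sketch of a unary-language generator: each word is a run of ones
 * whose length is drawn at random; duplicate lengths are skipped so
 * that all words of the language are distinct. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    enum { MAX_LEN = 2000, N_WORDS = 500 };   /* placeholder sizes */
    static int used[MAX_LEN + 1];
    srand((unsigned)time(NULL));

    for (int k = 0; k < N_WORDS; ) {
        int len = 1 + rand() % MAX_LEN;       /* random word length */
        if (used[len]) continue;
        used[len] = 1;
        for (int i = 0; i < len; i++) putchar('1');
        putchar('\n');
        k++;
    }
    return 0;
}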

The experiments have shown that the languages over the alphabet Σ10 were easy to solve. Their decomposition times, on the order of seconds (column Med in Table 9, and Fig. 16a–d), compared favorably with those of the languages over an alphabet Σ3 − 5 (column Med in Table 4, and Fig. 15). The experiments revealed that the binary languages were harder to solve, while the unary languages were the worst-case input data for the problem. The median run times for the binary languages were on the order of tens of seconds, and for the unary languages somewhat longer than 80 minutes (column Med in Table 9, and Figs. 17 and 16e–f). A major difficulty in solving these languages was the large size of the automata that had to be searched. The ranges of the median sizes of automata accepting the binary and unary languages were, respectively, [104,261] and [1994,1997] (column Med \(|Q^{\prime }|\) in Table 10). As a result, the run times for these larger automata were longer than for the languages over the alphabet Σ10, for which the sizes of automata were in the range of [50,106].

Table 9 Run times (in seconds) of adaptive algorithm for languages over alphabets Σ10, Σ2, and Σ1 where R – range of language size, N – number of languages in the set
Fig. 16

Run times (in seconds) of adaptive algorithm for languages over alphabet Σ10, and for unary languages

Fig. 17

Run times (in seconds) of adaptive algorithm for binary languages

Table 10 Sizes of state sets \(Q^{\prime }\) of automata searched by adaptive algorithm

6.2.1 Scalability study

The results of the adaptive algorithm speed-up evaluation are presented in Fig. 18.Footnote 13 The speed-ups achieved for the binary languages (set \({\mathscr{E}}^b\)) and for the languages over an alphabet Σ3 − 5 in set E1 were quite good. For set E2 the result was satisfactory.

Fig. 18

Speed-ups, S, for binary languages (a)–(b), and for languages over alphabet Σ3 − 5: sets E1 (c)–(d), E2 (e)–(f), and large sets (g)–(i), (|L| – number of words in the language, p – number of processes)

As can be seen, the speed-ups obtained are not linear. The reason for this is that the parallel processes execute the same code in Phase1, so an overhead of redundant computation occurs, which decreases the speed-up. We have significantly reduced that overhead by shortening the run time of Phase1 (compare the Avg times t1 for the basic and adaptive algorithms in Table 8). This was achieved by means of the algorithmic refinements, in particular by removing redundant states in the REMOVERED2 and BUILDW procedures, and by the adaptive adjustment of threshold T.
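This effect can be made explicit with a rough Amdahl-style estimate, which we add here as an approximation rather than a result of the complexity analysis: if the Phase1 work, taking time \(t_{P1}\), is replicated on every process, while the Phase2 work, taking time \(t_{P2}\) on a single process, divides evenly among the π processes, then

$$ S(\pi) \approx \frac{t_{P1}+t_{P2}}{t_{P1}+t_{P2}/\pi}, \qquad \lim_{\pi\to\infty} S(\pi) = 1+\frac{t_{P2}}{t_{P1}}, $$

so shortening Phase1 (for set E2, from about 95.8 s in the basic algorithm to about 9.7 s in the adaptive one, as estimated above) directly raises the attainable speed-up.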

The large languages, ranging in size from 160000 to more than 200000 words, scaled very well (Fig. 18g–i). The computational work for these languages was higher, so the impact of the overhead on the obtained speed-up was smaller. The times to find decompositions of the large languages using 128 processes varied in a range of 7–37 minutes.

As mentioned before, we solve the problem of finding all decompositions of a finite language L in the form of L = L1L2. The language L does not have to satisfy any specific conditions. To the best of our knowledge, parallel algorithms to solve this problem have not been presented in the literature so far. Therefore we could not compare the outcome of our experiments with the results of other algorithms.

7 Conclusions and future work

In this paper the problem of finite language decomposition is investigated. The problem under consideration, assuming that a language is given as a DFA, is NP-hard. The main contribution of the paper is the adaptive parallel algorithm, based on an exhaustive search, for finding all decompositions of a given finite language. The algorithm implements several methods for pruning the search space. Furthermore, the algorithm is adaptive; it modifies its behavior at runtime by adjusting one of its parameters based on performance data acquired during execution. As a consequence, a substantial reduction in the amount of computation necessary to solve the problem has been achieved.

Comprehensive computational experiments carried out on almost 1450 languages over an alphabet Σ3 − 5 showed that the methods for pruning the search space proposed in Lemmas 2–4 were very effective. These methods allowed the adaptive algorithm to reduce the search space by several orders of magnitude compared with the basic algorithm. As a result, the median run time to solve the languages in set E2 by the adaptive algorithm was approximately 15 s, whereas by the basic algorithm it was 1296 s. The adaptive feature of the algorithm proved most beneficial for the languages in set E2, for which the value of threshold T varied in a range of 20–58. A higher value of T is advantageous because it increases the degree of computational parallelism, which enables better use of the available processes.

We also tested more than 2700 languages over a large alphabet Σ10 and over small alphabets, specifically the binary and unary languages. The results indicated that the languages over an alphabet Σ10 were easier to solve than those over an alphabet Σ3 − 5. Furthermore, it took longer to decompose the binary languages in comparison to the languages over the alphabets Σ10 and Σ3 − 5, while the unary languages turned out to be the worst-case input data to solve the decomposition problem. Based on these findings, we conclude that finite languages over small alphabets are more difficult to decompose than those over large alphabets.

The scalability study revealed that the binary languages, and the languages over an alphabet Σ3 − 5 containing from 6000 to more than 200000 words, scaled well, especially those of larger sizes.

In terms of future work, two issues can be investigated. The first is the adaptive setting of the algorithm, which we believe has the potential to be improved. Presently, the algorithm establishes the value of threshold T based solely on the number of recursive runs of Phase1. We suppose that the number of processes executing the algorithm should also be considered while determining the value of T. Another issue to investigate is the further scalability of the adaptive algorithm. At present, using 16 processes, the algorithm can solve language instances of up to 90000 words in median run times of tens of seconds, and, using 128 processes, languages of between 160000 and more than 200000 words in run times of tens of minutes. The question is to what extent the language size could be increased by increasing the number of processes, while maintaining a short run time of the algorithm and possibly high processor utilization.