An adaptive parallel algorithm for finite language decomposition

The computationally hard problem of finite language decomposition is investigated. A finite language L is decomposable if there are two languages L1 and L2 such that L = L1L2; otherwise, L is prime. The main contribution of the paper is an adaptive parallel algorithm for finding all decompositions L1L2 of L. The algorithm is based on an exhaustive search and incorporates several original methods for pruning the search space. Moreover, the algorithm is adaptive in that it changes its behavior based on runtime-acquired data related to its performance. Comprehensive computational experiments on more than 4000 benchmark languages generated over alphabets of various sizes have been carried out. The experiments showed that, by harnessing the power of parallel computing, decompositions of languages containing more than 200000 words can be found. Decompositions of languages of that size have not been reported in the literature so far.


Introduction
A finite language L is decomposable (or composite) if there are two nontrivial languages L 1 and L 2 such that L = L 1 L 2 . Otherwise, L is prime. It was proved that deciding the primality of a finite language is NP-hard [1].
The main contribution of this paper is an adaptive parallel algorithm that finds all decompositions L 1 L 2 of L, or concludes that L is prime when no decomposition is found. The algorithm is based on an exhaustive search, and incorporates several original methods for pruning the search space. Moreover, the algorithm is adaptive; it changes its behavior based on the runtime acquired data related to the performance of its recursive phase. Since the question of decomposing finite languages is computationally hard, a motivation of our work is to investigate to what extent the power of parallel computing makes it possible to tackle that question for large-size instances of finite languages.
The decomposition algorithm has a number of applications. It is useful for determining the prime decomposition of a supercode. It also enables a memory-efficient representation of a finite language: a finite language L can be represented by the list of its words, but such a representation is memory intensive if L is large, so a better option is to decompose L and store its factors L 1 and L 2 . Finally, the algorithm can be used to find a context-free grammar G: given sets of words S + and S − , called examples and counterexamples, a grammar G should be found that accepts the words in set S + and rejects the words in set S − . We describe these applications in more detail in Examples 3-6.
The rest of the paper is organized into six sections. Section 2 presents a survey of previous work on finite language decomposition. Section 3 recalls selected concepts of the theory of languages and automata. Section 4 describes the basic algorithm, which has been a starting point for developing the adaptive algorithm proposed in Section 5. Section 6 reports on the results of the computational experiments conducted using the algorithms. Section 7 contains conclusions and future work.
H. J. Bravo et al. [5] investigate a manufacturing problem that involves batches of identical or similar items produced together in different-sized production runs. A production run consists of a fixed number of interleaved workcycles and is controlled by a monolithic supervisor, whose operation is described using a deterministic finite automaton (DFA). The automaton accepts a regular language R over some finite alphabet Σ, with states representing points in time and transitions representing events occurring within a production run. Events are operations that are performed by machines (for example, take a workpiece, process it, etc.) to obtain the product. An event a has a specific time of execution d(a), and belongs to the alphabet of the DFA, a ∈ Σ. A production run, being a sequence s = a 1 a 2 . . . a n of events, has a makespan equal to the sum of the event execution times, ∑_{i=1}^{n} d(a i ). For a product there can be a number of sequences s that differ in the order of individual events a i , with each sequence corresponding to a route in the DFA. The paper proposes a factorization-based approach to compute a route with the minimum makespan. As for a given product P the production run length is finite (measured by the number of events), a finite language L, L ⊆ R, describing all possible routes for P , is extracted from the regular language R. Then L is decomposed into factors L = L 1 L 2 . . . L k for some k, where each factor L i describes a set of possible subroutes between the so-called symmetrically reachable idle states (a notion introduced in the paper). Finally, for all sets L i , i ∈ [1, k], the subroutes r i of minimum makespan are determined, which give the optimal route r = r 1 r 2 . . . r k . The authors claim that the proposed approach mitigates the computational complexity of finding route r.
Several decomposition algorithms were developed for codes, which are sets of words and hence they are formal languages. Codes are categorized by their defining properties, for example, prefix-freeness, suffix-freeness, infix-freeness, etc.
Y. -S. Han and K. Salomaa [6] studied solid codes defined as follows: a set S of words is a solid code, if S satisfies two conditions: (i) no word of S is a subword of another word of S (infix-freeness), and (ii) no prefix of a word in S is a suffix of a word in S (overlap-freeness). In other words, S has to be an infix code and all words of S should not overlap. Moreover, a language L is a regular solid code if L is regular and a solid code. The paper proposed two algorithms related to the decomposition problem for solid codes. The first algorithm determines in polynomial time whether or not a given regular language is a solid code. The second efficient algorithm finds a prime solid code decomposition for a regular solid code when it is not prime.
K. V. Hung [7] considered the prime decomposition of supercodes. A finite language L is a supercode if no word in L is a proper permu-subword of another word in it. Let u and v be words of L. A word u is called a permu-subword of v if u is a subword of v in which symbol permutations are allowed. A supercode L is prime if L ≠ L 1 L 2 for any supercodes L 1 and L 2 . A linear-time algorithm was provided that, for a given supercode L, either discovers that L is prime, or returns the unique sequence of prime supercodes L 1 , L 2 , . . . , L k+1 such that L = L 1 L 2 . . . L k+1 with k ≥ 1.
In addition to the above recent work on supercodes, there were previous works on this subject. J. Czyzowicz et al. [8] proved that for a given prefix-free regular language L, the prime prefix-free decomposition is unique, and the decomposition for L that is not prime can be computed in O(m) worst-case time, where m is the size of the minimal DFA accepting L. Y. -S. Han et al. [9] investigated the prime infix-free decomposition of infix-free regular languages and demonstrated that the prime infix-free decomposition is not unique. An algorithm for the infix-free primality test of an infix-free regular language was given. It was also shown that the prime infix-free decomposition can be computed in polynomial time. Y. -S. Han and D. Wood [10] investigated finite outfix-free regular languages. A word x is an outfix of a word y if there is a word w such that x 1 wx 2 = y and x = x 1 x 2 . A set X of words is outfix-free if no word in X is an outfix of any other word in X. A polynomial-time algorithm was developed to determine outfix-freeness of regular languages. Furthermore, a linear-time algorithm that computed a prime outfix-free decomposition for outfix-free regular languages was given. There are also papers on theoretical issues related to the problem of formal language decomposition [11][12][13][14][15][16].
Let us note that all the works discussed above differ from our approach to solving the decomposition problem. Firstly, we consider finite languages that do not have to satisfy any specific conditions. Secondly, the parallel algorithm we propose is intended to compute all decompositions of a finite language L in the form of L = L 1 L 2 .
This paper builds on our previous efforts in developing sequential and parallel algorithms for finite language decomposition. Let us briefly review the results obtained in those efforts. A sequential algorithm for finding the decomposition of a finite language was proposed in [17]. The algorithm returned only the first decomposition found for a given language. A threshold parameter T (see p. 12) that impacted the operation of the algorithm was kept constant, meaning it was not adaptively adjusted while the algorithm was running. The article introduced the concept of significant state, along with the proof that for every composite language there is at least one decomposition based solely on significant states. In the experimental part of the paper, 240 languages of size less than 2000 words were studied, including 120 prime languages. The implementation was done in Python, and the average running time of the algorithm for the test languages was in the order of a few seconds.
An approach to finding the first decomposition of a finite language by using selected meta-heuristics was discussed in [18]. The paper presented results for the simulated annealing algorithm, tabu search, and genetic and randomized algorithms, all sequentially implemented in Python. Computational experiments were carried out on 1200 languages of sizes less than 2000 words, with algorithm execution time limits of 10 and 60 seconds. Within these limits, the algorithms returned quite a lot of wrong answers for composite languages, claiming they were prime.
A basic parallel algorithm (its short description is included in Section 4) for finding all decompositions of a finite language using the concept of significant state was given in [19]. The algorithm consisted of two phases. In the first phase, each process executed the same code, and in the second phase the computation was spread across the available processes based on their ranks. The algorithm was implemented in the C language with the Message Passing Interface (MPI). The experiments, conducted with up to 22 processes, covered four languages with word counts between 800 and 6583.
A preliminary version of an adaptive parallel algorithm for solving the decomposition problem was given in [20]. The adaptive algorithm was based on a modified concept of significant state compared with that given in [17]. The algorithm consisted of two phases and included a method of pruning the search space and a simplified verification of prospective decomposition sets. It also involved an adaptive way of adjusting the threshold parameter T to keep the balance between the times spent in the two phases of the algorithm. The experiments concerned nine languages of up to 90000 words in size, solved with 32 processes in run times of a few minutes. The algorithm was implemented in the C language and the MPI interface.
To summarize, our previous work encompassed several sequential and parallel algorithms to solve the decomposition problem for finite languages. The sequential algorithms were able to decompose the languages of size up to 2000 words, while the parallel algorithms could tackle the languages of size up to 90000 words with 32 processes.
In the current paper we provide an advanced adaptive parallel algorithm, which can solve languages of size between 160000 and more than 200000 words in run times of tens of minutes by using 128 processes. The results of comprehensive computational experiments on a variety of more than 4000 benchmark languages are also given. To the best of our knowledge, such a parallel algorithm and an in-depth experimental study of the problem under consideration have not been reported in the literature thus far.

Preliminaries
Below, we recall selected concepts from the theory of formal languages and automata. For more details the reader may refer to the textbooks [21,22].
An alphabet Σ is a nonempty finite set of symbols. A word w is a sequence of zero or more symbols taken from Σ. The length of a word is denoted by |w|, with the special case of zero-length word being the empty word λ, for which |λ| = 0. A prefix of a word is any number of leading symbols of that word, and a suffix is any number of trailing symbols. By convention, Σ * denotes the set of all words over an alphabet Σ, and Σ + denotes the set Σ * − {λ}.
A finite language L ⊂ Σ * is a finite set of words w ∈ Σ * . A finite language L is trivial if it consists of the empty word λ, that is L = {λ}, and it is nontrivial if it contains at least one nonempty word. Let u, v, w ∈ Σ * . A concatenation of words u and v produces a word w = uv, where w is created by making a copy of word u and following it by a copy of word v. Let U, V , W ⊆ Σ * . Then the concatenation (or product) of sets U and V produces a set W = UV = {uv | u ∈ U, v ∈ V }.
A finite automaton is a quintuple A = (Q, Σ, δ, s, Q F ), where Q denotes a finite set of automaton states, Σ is an alphabet, δ : Q × Σ → Q is a transition function, s ∈ Q is the initial (start) state, and Q F ⊆ Q is a set of final (accepting) states. An automaton is deterministic iff for all states q ∈ Q and symbols a ∈ Σ, |δ(q, a)| ≤ 1; in other words, each state q has at most one out-transition marked by a. A deterministic automaton is finite iff its state set Q is finite. Let us extend the transition function δ to words over Σ. Formally, we define δ(q, λ) = q, and for all words w and input symbols a, δ(q, wa) = δ(δ(q, w), a). So δ(q, w) = q′ means that the word w takes A from the state q to the state q′.
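The basic operations on finite languages are easy to mirror in code. The following minimal sketch (ours, not from the paper; λ is written as the empty string "") implements the concatenation of two finite languages as sets of strings:

```python
def concat(U, V):
    """Concatenation (product) UV = {uv : u in U, v in V} of two finite
    languages, modelled as Python sets of strings ("" is the empty word)."""
    return {u + v for u in U for v in V}

# note that |UV| can be smaller than |U|*|V| because of collisions:
# here "a"+"b" and "ab"+"" both yield "ab"
print(sorted(concat({"a", "ab"}, {"", "b"})))   # ['a', 'ab', 'abb']
```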
A directed graph G = (V , E), called a transition diagram, is associated with a finite automaton A, where V = Q, E = {p a → q | q = δ(p, a)}, and p a → q denotes an arc labeled a in the transition diagram. Given a word w = a 1 a 2 . . . a m , a i ∈ Σ, i ∈ [1, m], a w path is a sequence of transitions labeled with the symbols of w in the transition diagram. A w path is denoted by the sequence of states q 1 , q 2 , . . . , q m+1 , where q j ∈ Q, j ∈ [1, m + 1], lying on the path. A deterministic automaton A accepts a word w if there is a w path q 1 q 2 . . . q m+1 leading from the initial state to a final state of A, so q 1 = s and q m+1 ∈ Q F .
Define the left (resp. right) language of a state q ∈ Q as ←q = {w : δ(s, w) = q} (resp. →q = {w : δ(q, w) ∈ Q F }). Put simply, the left (resp. right) language of q consists of words w for which there is a w path from the initial state s to q (resp. from q to a final state).
A finite automaton A = (Q, Σ, δ, s, Q F ) is said to accept a language L when for each word w ∈ L there is a w path beginning in the state s and ending in a state q ∈ Q F , or more formally L = {w | δ(s, w) ∈ Q F }. Deterministic and nondeterministic 1 finite automata accept the same set of languages, namely the set of regular languages. Minimum-state acyclic DFAs accept the set of finite languages. Given all states p, q ∈ Q, p ≠ q, a DFA is minimum-state iff →p ≠ →q, 2 and it is acyclic iff δ(q, w) ≠ q for every word w ∈ Σ + and state q ∈ Q.
In what follows, we only consider minimum-state acyclic deterministic automata, so each such automaton will be referred to as automaton herein.
Let L be a nontrivial finite language. A decomposition of L of index k where k ≥ 2 is the family of languages L i for i ∈ [1, k] called factors that once concatenated give language L, so we have L = L 1 L 2 . . . L k . The decomposition is nontrivial if languages L i are nontrivial, that is L i = {λ} for i ∈ [1, k]. Otherwise, the decomposition is trivial. In every nontrivial decomposition of a finite language L, the number of factors, k, is at most equal to the length of the longest word in L. Clearly, any language L has the trivial decompositions L{λ} and {λ}L. In what follows, by a decomposition we always mean a nontrivial decomposition. A language L is called prime if L has no decomposition of index 2, otherwise it is called composite (or decomposable). The prime decomposition of L is a decomposition L = L 1 L 2 . . . L k , k ≥ 2, where each language L i for i ∈ [1, k] is prime. It has been proven that every finite language is prime or has a prime decomposition, which is generally not true for infinite languages [13][14][15][16].
In this paper, we investigate the problem of finding all decompositions of index 2 of a nontrivial finite language L. For a particular decomposition we are looking for two nonempty finite languages L 1 and L 2 that once concatenated give language L, so we have L = L 1 L 2 . Note that a given language L can have many decompositions of index 2.
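For intuition, the search for all index-2 decompositions of a very small language can be sketched by brute force (our illustration; far simpler and slower than the algorithms developed in this paper). Each candidate L 1 is a set of prefixes of words of L, and the largest compatible L 2 is then forced by L 1 :

```python
from itertools import combinations

def concat(U, V):
    return {u + v for u in U for v in V}

def decompositions(L):
    """Exhaustively find nontrivial decompositions L = L1 L2 of a small
    finite language. For each candidate L1 (a subset of the prefixes of
    words in L), the largest L2 with L1 L2 contained in L is forced, so one
    decomposition per feasible L1 is reported. Feasible only for tiny L."""
    prefixes = sorted({w[:i] for w in L for i in range(len(w) + 1)})
    found = []
    for r in range(1, len(prefixes) + 1):
        for L1 in map(set, combinations(prefixes, r)):
            if L1 == {""}:          # skip the trivial factor {λ}
                continue
            # y belongs to L2 iff x + y is in L for every x in L1
            L2 = set.intersection(
                *({w[len(x):] for w in L if w.startswith(x)} for x in L1))
            if L2 and L2 != {""} and concat(L1, L2) == L:
                found.append((L1, L2))
    return found
```

For instance, `decompositions({"ab", "aba", "abb", "bb", "bba", "bbb"})` reports, among others, the pairs ({a, b}, {b, ba, bb}) and ({ab, bb}, {λ, a, b}).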
Let us introduce the notion of a decomposition set that is suitable for the study of decompositions of regular languages. 3 The notion is related to the left quotients of regular languages. Let L be a regular language over an alphabet Σ, and let A = (Q, Σ, δ, s, Q F ) be the minimum-state finite deterministic automaton accepting L. For any nonempty set D ⊆ Q, we define the left and right languages of D:

L D 1 = ∪ q∈D ←q,  L D 2 = ∩ q∈D →q.

Note that languages L D 1 and L D 2 are regular as they are subsets of the regular language L.

Theorem 1 Having L and A defined as above, assume that L is composite, so L = L 1 L 2 for regular languages L 1 and L 2 . Define a set D ⊆ Q, called a decomposition set, by D = {δ(s, w) : w ∈ L 1 }. Then

L = L D 1 L D 2  (2)

is the decomposition of L into two regular languages.

1 A state q ∈ Q of a nondeterministic automaton may have more than one out-transition marked by a given symbol a ∈ Σ; then |δ(q, a)| > 1.
2 When two distinct states have the same right language, they could be merged into one, making a smaller deterministic automaton. Two states p, q can be merged, giving a single state (←p ∪ ←q, →p), by combining their in-transitions and using the out-transition from just one of them.
3 Recall that the finite languages we deal with are regular.
The decomposition L = L D 1 L D 2 is referred to as the decomposition induced by the set D. Theorem 1 implies that every decomposition of the regular language L is included in the decomposition of L induced by the corresponding decomposition set. The decomposition L = L′ 1 L′ 2 is said to be included in the decomposition L = L 1 L 2 if L′ i ⊆ L i for i = 1, 2.

Corollary 1
To solve the problem of finding all decompositions of index 2 of a nontrivial finite language L, we need to check (2) for all subsets D of Q. If none of these subsets induces a decomposition, we conclude that L is prime.
It follows from the corollary that solving this problem is equivalent to solving the primality problem.
Problem 1 (Primality) Let L be a finite language over a finite alphabet Σ given as a DFA. Answer the question whether L is prime.
It was shown that the primality problem for finite languages is NP-hard [1], and for regular languages it is PSPACE-complete [23,24].
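Although the problem is hard in general, the subset check prescribed by Corollary 1 is directly executable for toy instances. The sketch below is our own: the DFA and its state names s, q, r, t, f are hand-built for the language L = {λ, a, b, aa, ab, aaa, aab, aaaa} used in Example 1 below (not taken from the paper's figures). Left and right languages are collected by dividing each accepted word at every position, and every nonempty D ⊆ Q is tested via (2):

```python
from itertools import combinations

def concat(U, V):
    return {u + v for u in U for v in V}

# Hand-built minimum-state acyclic DFA (assumed state names) for
# L = {λ, a, b, aa, ab, aaa, aab, aaaa}, λ written as "".
DELTA = {("s", "a"): "q", ("s", "b"): "f",
         ("q", "a"): "r", ("q", "b"): "f",
         ("r", "a"): "t", ("r", "b"): "f",
         ("t", "a"): "f"}
START = "s"
STATES = {"s", "q", "r", "t", "f"}
L = {"", "a", "b", "aa", "ab", "aaa", "aab", "aaaa"}

def left_right():
    """Collect the left and right language of every state by dividing each
    accepted word w = x y at every position along its path; for an acyclic
    DFA accepting exactly L this yields exactly the left/right languages."""
    left = {q: set() for q in STATES}
    right = {q: set() for q in STATES}
    for w in L:
        path = [START]
        for ch in w:
            path.append(DELTA[(path[-1], ch)])
        for i, q in enumerate(path):
            left[q].add(w[:i])
            right[q].add(w[i:])
    return left, right

def decomposition_sets():
    """Check every nonempty D subset of Q, as Corollary 1 prescribes."""
    left, right = left_right()
    found = []
    for r in range(1, len(STATES) + 1):
        for D in map(set, combinations(sorted(STATES), r)):
            L1 = set().union(*(left[q] for q in D))
            L2 = set.intersection(*(right[q] for q in D))
            if L1 != {""} and L2 and L2 != {""} and concat(L1, L2) == L:
                found.append((D, L1, L2))
    return found
```

In this toy run the set D = {s, q} is reported as a decomposition set with L D 1 = {λ, a}, in line with Example 1.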
Example 1 As an illustration of Theorem 1, consider the finite composite language L = {λ, a, b, aa, ab, aaa, aab, aaaa} that has a single decomposition of index 2. The transition diagram for the minimum-state deterministic automaton A accepting language L is depicted in Fig. 1. Note that L = L 1 L 2 , where L 1 = {λ, a} and L 2 = {λ, b, aa, ab, aaa}. Then, according to Theorem 1, there is the decomposition L = L D 1 L D 2 induced by the decomposition set D = {s, q}. Observe that the words in L D 1 and L D 2 are prefixes and suffixes of words of L, and each word w ∈ L is divided by at least one state g ∈ D such that w = xgy, where x and y are a prefix and a suffix of w, respectively. For example, considering the words of L we have (s, q ∈ D): λsλ, λsa, aqλ, λsb, λsaa, aqa, λsab, aqb, λsaaa, aqaa, aqab, aqaaa.
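The decomposition stated in Example 1 can be checked mechanically (a two-line verification in Python, with λ written as the empty string):

```python
def concat(U, V):
    return {u + v for u in U for v in V}

# Example 1's language and its index-2 decomposition
L  = {"", "a", "b", "aa", "ab", "aaa", "aab", "aaaa"}
L1 = {"", "a"}
L2 = {"", "b", "aa", "ab", "aaa"}
assert concat(L1, L2) == L
```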
Example 2 A finite language may have more than one decomposition of index 2. The language L = {ab, aba, abb, bb, bba, bbb} (Fig. 2) has two decompositions, the first, defined by D = {p}, being L = {a, b}{b, ba, bb}, and the second being L = {ab, bb}{λ, a, b}.

Below we give a few examples of application of finite language decomposition.
Example 3 As already mentioned in Section 2, K. V. Hung considered the factorization of supercodes [7]. He designed an algorithm to decompose a supercode L into prime components. The algorithm is based on so-called bridge states, which are found in a non-returning and non-exiting acyclic deterministic finite automaton (N-ADFA) A accepting the code-words w ∈ L. An automaton A is non-returning if its start state has no in-transitions, and it is non-exiting if its final states have no out-transitions. A state b in automaton A is called a bridge state if b is neither the start state nor a final state, and each w path in A passes through b. Assume that A is the minimal N-ADFA accepting a supercode L that has k, k ≥ 1, bridge states. Then L can be decomposed into k + 1 prime supercodes L 1 , L 2 , . . . , L k+1 such that L = L 1 L 2 . . . L k+1 . The bridge states for a given automaton A can be identified in O(|Q| + |δ|) time, where |Q| and |δ| are the number of states and transitions of A, respectively. As an example, the supercode L = {ab^2ac, ab^5c, ac^3bac, ac^3b^4c} is considered.
Instead of employing the notion of bridge states, one can readily find this factorization by calling recursively our parallel decomposition algorithm (described in Section 5). The sequence of calls gives the prime decomposition L = {a}{b, c^3}{b}{a, b^3}{c}. In general, to find the prime decomposition of a supercode consisting of n prime components, at most n calls of the decomposition algorithm are needed.
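The recursive scheme of this example can be imitated on toy inputs. In the Python sketch below (ours; a naive exponential search stands in for a single call of the parallel algorithm, and superscripts such as b^2 are spelled out as plain strings), repeated decomposition calls yield the prime factors of the supercode:

```python
from functools import reduce
from itertools import combinations

def concat(U, V):
    return {u + v for u in U for v in V}

def first_decomposition(L):
    """Naive stand-in for one call of the decomposition algorithm: return
    some nontrivial pair (L1, L2) with L = L1 L2, or None when L is prime.
    Candidate sets L1 range over subsets of prefixes of words of L, and L2
    is forced as the largest set compatible with L1. Tiny inputs only."""
    prefixes = sorted({w[:i] for w in L for i in range(len(w) + 1)})
    for r in range(1, len(prefixes) + 1):
        for L1 in map(set, combinations(prefixes, r)):
            if L1 == {""}:
                continue
            L2 = set.intersection(
                *({w[len(x):] for w in L if w.startswith(x)} for x in L1))
            if L2 and L2 != {""} and concat(L1, L2) == L:
                return L1, L2
    return None

def prime_factors(L):
    """Recursive prime decomposition by repeated decomposition calls."""
    d = first_decomposition(L)
    if d is None:
        return [L]
    L1, L2 = d
    return prime_factors(L1) + prime_factors(L2)

# the supercode of Example 3, with ab^2ac etc. spelled out as plain strings
supercode = {"abbac", "abbbbbc", "acccbac", "acccbbbbc"}
factors = prime_factors(supercode)
assert reduce(concat, factors) == supercode
```

Under these assumptions the search returns five prime factors, {a}, {b, c^3}, {b}, {a, b^3}, {c}, in agreement with the uniqueness of the prime supercode decomposition noted in Section 2.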
Example 4 Deterministic finite automata (DFAs) are of importance in many fields of science. They also have many practical applications: in the design of compilers and text processors, in natural language processing, and in speech recognition, among others. One of the concerns regarding DFAs is the memory-efficient storage of their representation. A DFA A can be represented by the list of words of the (finite) language L accepted by A. Such a representation can be memory intensive if L is large; therefore, a better option is to decompose L and store its factors L 1 and L 2 . Table 1 shows the reduction of the size of languages (expressed in word count) from Section 6, obtained by using the decomposition-based option. As can be seen, the reduction rate exceeds 99% for all languages in the table.
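The saving from the decomposition-based representation can be quantified in word counts. Below is a small sketch; the 100 × 100 unary factors are our hypothetical illustration, not one of the paper's benchmark languages from Table 1:

```python
def concat(U, V):
    return {u + v for u in U for v in V}

def reduction_rate(L, L1, L2):
    """Fraction of words saved by storing the factors L1, L2 instead of
    L = L1 L2 itself (word-count measure; actual memory savings also
    depend on word lengths)."""
    return 1.0 - (len(L1) + len(L2)) / len(L)

# hypothetical factors: 100 x 100 blocks give a 10000-word product
# stored as only 200 words
L1 = {"a" * i for i in range(1, 101)}
L2 = {"b" * j for j in range(1, 101)}
L = concat(L1, L2)
assert len(L) == 10_000
print(f"{reduction_rate(L, L1, L2):.0%}")   # 98%
```

Larger products, such as those reported in Section 6, push the rate above 99%.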
Example 5 A company signed a contract to supply pipes with lengths 30, 31, 32, 36, 37, and 38 m. For technical reasons, the company can manufacture pipes of any length, but no greater than 30 m. A longer pipe can be produced by welding shorter pipes. The main part of the pipe production cost is the preparation of a mold of the given length, so the number of molds needed to complete the contract should be minimized. The question is which lengths the molds must have to implement the supply contract mentioned above. To answer this question, let us define the set of words L = {a^30, a^31, a^32, a^36, a^37, a^38}.
Then, among all decompositions L = L 1 L 2 , we look for the one for which the size of the set L 1 ∪ L 2 is minimal, and each word in L 1 and L 2 is no longer than 30. The solution is the decomposition L = {a^15, a^16, a^17}{a^15, a^21}, so we need only four molds, of lengths 15, 16, 17, and 21 m.
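Since all words here are unary, concatenation reduces to a sumset of exponents, and the stated solution can be verified directly (a sketch; the word a^n is encoded as the integer n):

```python
def sumset(A, B):
    """Concatenation of unary languages {a^n : n in A}{a^m : m in B},
    encoded as the sumset of the exponent sets."""
    return {a + b for a in A for b in B}

L = {30, 31, 32, 36, 37, 38}       # required pipe lengths (a^n as n)
A, B = {15, 16, 17}, {15, 21}      # the decomposition stated in the text
assert sumset(A, B) == L
assert all(x <= 30 for x in A | B) # every mold is manufacturable
print(sorted(A | B))               # [15, 16, 17, 21] -> four molds
```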

Example 6
Given the sets of words S + and S − , called examples and counterexamples, find a context-free grammar G = (V , Σ, P , S) such that S + ⊆ L (G) and S − ∩ L (G) = ∅, where L (G) denotes the language generated by G. The basic decomposition algorithm (described in Section 4) can be readily modified to find an incomplete decomposition that allows a finite language L to be written as L = L 1 L 2 ∪ R, where the concatenation L 1 L 2 covers as many words of L as possible, and R is the set of the remaining words of L.
Let S + = {ab, ba, aabb, abab, baba, bbaa, baab, abba} be the set of examples, and let S − = {a, b} d − S + with d ≤ 4 be the set of counterexamples. Based on these sets, the context-free grammar can be found in two stages. In the first stage, using the modified basic decomposition algorithm, we create a series of incomplete decompositions; each incomplete decomposition corresponds to a grammar production rule. The second stage simplifies the grammar by examining whether each pair of grammar variables can be merged. A merger is allowed if, after it, the grammar does not accept any word from the set S − . Furthermore, the unit production rules are eliminated. For the sets S + and S − specified above, we get a context-free grammar that defines the infinite language of words having the same number of symbols a and b. For more details, see [25, p. 71], [26,27].
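The first stage's incomplete decompositions can be imitated on the example set. The sketch below is ours; for brevity it restricts candidate sets L 1 to at most two prefixes, a simplification of the modified algorithm. It maximizes the covered part L 1 L 2 and returns the leftover set R:

```python
from itertools import combinations

def concat(U, V):
    return {u + v for u in U for v in V}

def incomplete_decomposition(L, max_size=2):
    """Find L1, L2 making the product part of L = L1 L2 ∪ R as large as
    possible, with R the words of L left uncovered. Only candidate sets L1
    of at most `max_size` prefixes are tried (a simplification)."""
    prefixes = sorted({w[:i] for w in L for i in range(len(w) + 1)})
    best = (set(), set(), set(L))          # (L1, L2, R), initially R = L
    for r in range(1, max_size + 1):
        for L1 in map(set, combinations(prefixes, r)):
            if L1 == {""}:
                continue
            # the largest L2 with L1 L2 contained in L is forced by L1
            L2 = set.intersection(
                *({w[len(x):] for w in L if w.startswith(x)} for x in L1))
            if not L2 or L2 == {""}:
                continue
            R = L - concat(L1, L2)
            if len(R) < len(best[2]):
                best = (L1, L2, R)
    return best

S_plus = {"ab", "ba", "aabb", "abab", "baba", "bbaa", "baab", "abba"}
L1, L2, R = incomplete_decomposition(S_plus)
# e.g. L1 = {ab, ba}, L2 = {λ, ab, ba}, leaving R = {aabb, bbaa}
```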

Basic parallel decomposition algorithm
In this section we outline the basic parallel algorithm (or shortly, basic algorithm) for finite language decomposition [19]. It has been a starting point for devising the adaptive parallel decomposition algorithm (or, adaptive algorithm). The pseudocode of the basic algorithm, expressed as a recursive procedure DECOMPOSE(W, D), is shown in Fig. 3. Let A = (Q, Σ, δ, s, Q F ) be the minimum-state acyclic DFA 6 accepting an input language L = {w i }, i ∈ [1, n]. The basic algorithm explores the set of states Q to find the decomposition sets D, D ⊆ Q. According to Theorem 1, each decomposition set D induces a decomposition L = L D 1 L D 2 of the input language.

6 There are several algorithms of time complexity O(∑ w i ∈L |w i |) to construct such an automaton for a given language L; see for example [28].
The algorithm finds all decompositions of the input language L by employing an exhaustive search of Q with pruning.
Let W be a set of pairs (w i , Q w i ), where the word w i = a 1 a 2 . . . a m , w i ∈ L, a j ∈ Σ, j ∈ [1, m], and Q w i = {q 1 , q 2 , . . . , q m+1 } is the set of states q k ∈ Q, k ∈ [1, m + 1], lying on the w i path, with q 1 = s and q m+1 ∈ Q F . Let STATES(W ) be the function that returns the set of states appearing in all sets Q w i for w i ∈ L, and let MINSTATES(W ) be the function that returns a pair (w i , Q w i ) ∈ W with the smallest set Q w i .
Assume that D is a decomposition set to be found (see Phase1 in Fig. 3). Initially this set is empty, and it is gradually built up as the basic algorithm processes the words w i ∈ L. The words to be processed are selected in order of increasing sizes of their sets Q w i by using the function MINSTATES(W ). Each state q ∈ Q that divides a word w i ∈ L into two parts is inserted in set D, which is then checked to see if it is a decomposition set. The basic algorithm is recursive, so each subset of states in Q that can be a candidate for a decomposition set is examined.
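The selection rule can be sketched as follows (our illustration; W is modelled as a dictionary from words to their state sets Q w ):

```python
def minstates(W):
    """Sketch of MINSTATES(W): return the pair (w, Qw) whose candidate
    state set is smallest, so that words with the fewest division choices
    are processed first."""
    w = min(W, key=lambda v: len(W[v]))
    return w, W[w]

W = {"ab": {"s", "p", "f"}, "b": {"s", "f"}}
assert minstates(W) == ("b", {"s", "f"})
```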
Definition 1 Let a word w ∈ L be divided by a state q into two parts w = xy, where x ∈ ←q and y ∈ →q, and suppose the state q is inserted in set D. Then each state r ∈ STATES(W ) for which y ∉ →r is considered redundant (or significant otherwise).
In words, suppose w = xy. If state q ∈ Q is inserted in D, then each state r ∈ Q that does not have y in its right language becomes redundant. Note that a given state r may be either redundant or significant depending on the states that are in set D in the current recursive execution of the algorithm.
Lemma 1 Let D, D ⊆ Q, be a decomposition set for a finite language L. Then each state q ∈ D is significant.
Proof If D is a decomposition set for L, then each w ∈ L is divided into two parts by at least one state q ∈ D, that is, w = xy with x ∈ ←q and y ∈ →q. Once state q is inserted in D, suffix y appears in the right language L D 2 . The definition L D 2 = ∩ r∈D →r involves the intersection of the right languages of the states in D, thus suffix y must be in the set →r for each r ∈ D. If a state r does not satisfy this condition, it is redundant and can be omitted during further recursive search. To conclude, only significant states can occur in a decomposition set [17].
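Definition 1 translates into a simple filtering step, sketched below (our illustration; `right` maps each state to its right language, standing in for the automaton):

```python
def remove_redundant(W, right, y):
    """Sketch of REMOVERED(W, y): after committing a division with suffix
    y, drop from every candidate-state set Q_w the states r whose right
    language does not contain y (Definition 1). W maps each word to its
    set Q_w; the filtering is applied in place."""
    for w in W:
        W[w] = {r for r in W[w] if y in right[r]}

right = {"p": {"b", "ba"}, "r": {"a"}, "f": {""}}
W = {"ab": {"p", "r", "f"}}
remove_redundant(W, right, "b")   # only states with b in their right language survive
assert W == {"ab": {"p"}}
```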
Let SUFFIX(w, q) be the function that returns suffix y of a word w = xy that is divided by a state q. Suppose the set D has already been built based on words w 1 , w 2 , . . . , w i−1 , and we want to extend D for a word w i with Q w i = {q 1 , q 2 , q 3 , . . . , q m+1 } for some m. With this aim, we first select state q 1 that divides w i (w i = x 1 i y 1 i ), add q 1 to D, and remove redundant states (based on Definition 1) from all sets Q w k ∈ W for k = i + 1, i + 2, . . . , n, using the procedure REMOVERED(W, y) with y = y 1 i (Fig. 4). Once the recursive call DECOMPOSE(W, D ∪ {q 1 }) completes, we carry out the above operations for states q 2 , q 3 , . . . , q m+1 and the corresponding divisions of w i .
The basic algorithm shown in Fig. 3 is run by a set of sequential processes P r , r ∈ [0, π − 1], where r is the rank (index) of a process, and π is the number of processes available. Each process P r executes the code of procedure DECOMPOSE(W, D), and all processes in the set are running in parallel.
A process is executed by a conventional processor, or core of a multi-core processor (from now on, we will use term processor for both of these computing devices).
The execution of the basic algorithm in each process consists of two phases. In Phase1 subsequent sets D are established, which are then processed in Phase2. The code executed in Phase1 is the same in all processes, while in Phase2 the computations performed by processes differ from each other.
As mentioned above, Phase1 gradually builds decomposition sets D while reducing the sets Q containing states that are candidates to extend the sets D. Notice that Phase1 does not determine a complete set D but only a partial set D′ with D′ ⊂ D (in the pseudocode in Fig. 3, the partial set is denoted simply by D). When Q becomes small enough due to the removal of redundant states, that is, |Q| ≤ T for some threshold T , the algorithm moves to Phase2, in which each process takes a collection of prospective sets D′ ∪ C j and verifies whether they constitute decomposition sets for L, where each C j is a subset of Q, j ∈ [0, 2^|Q| − 1].
More specifically, process P 0 takes sets D′ ∪ C j with C j = C 0 , C π , C 2π , . . ., process P 1 takes sets D′ ∪ C j with C j = C 1 , C π+1 , C 2π+1 , . . ., etc. The advantage of such an arrangement is that each collection of sets D′ ∪ C j can be verified separately from the other collections. As a result, the work related to verifying the sets D′ ∪ C j can be readily spread across the processes P r .
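The round-robin split of Phase2 work can be sketched by identifying each subset C j of a size-|Q| set with a bitmask index j (our illustration; the actual implementation distributes work by MPI ranks):

```python
def subset_indices(rank, nprocs, nbits):
    """Round-robin assignment of Phase2 work: the candidate subsets C_j of
    a size-`nbits` set Q are identified with bitmask indices j in
    [0, 2**nbits - 1], and process `rank` of `nprocs` verifies every
    nprocs-th index starting at its own rank."""
    return range(rank, 2 ** nbits, nprocs)

# with 4 processes and |Q| = 3 (8 candidate subsets), process 1 gets j = 1, 5
assert list(subset_indices(1, 4, 3)) == [1, 5]
```

Because the collections are disjoint and jointly cover all indices, no inter-process communication is needed during verification.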
In order to boost performance by making the most of parallel computing capabilities, the basic algorithm controls the depth of recursion in Phase1. For this purpose, a threshold T , already mentioned, is imposed on the size of the set Q containing candidate states to extend the set D built to date. The threshold T is a parameter of the algorithm (the parameter is adjusted at runtime in the adaptive algorithm, see Section 5). When the size of Q becomes reasonably small as a result of this downscaling process, then instead of looking for the complete decomposition set at a deep level of recursion in Phase1, the processes move to Phase2 to verify the sets D ∪ C j . It is worth noting that while running the basic algorithm the processes do not communicate with one another. They only synchronize their actions at the beginning of the computation to enter the input language, and at the end of the computation when the results obtained by the processes are collected.

Fig. 4 Procedure REMOVERED
Let us consider the average time complexity of the basic algorithm, T b (π, n). Let d(n, T ) be the average number of sets D found in Phase1, let t 1 (n, T ) be the average time to find a single set D in Phase1, and let t 2 (n, ψT ) be the average time to verify whether a set D ∪ C j , j = 0, 1, . . . , 2^ψT − 1, is a final decomposition set for L in Phase2. The value of ψT determines the average size of the sets Q processed in Phase2, where ψ ∈ (0.0, 1.0]. Considering the above, the average time complexity of the basic algorithm is as follows:

T b (π, n) = d(n, T ) · t 1 (n, T ) + d(n, T ) · (t 2 (n, ψT ) · 2^ψT )/π.

There are two components in this equation: the first, d(n, T ) · t 1 (n, T ), determines the total average run time of Phase1, and the second, d(n, T ) · (t 2 (n, ψT ) · 2^ψT )/π, the total average run time of Phase2. For a fixed value of n, the run time of Phase2 grows exponentially as a function of ψT . This growth is due to the exponential number of prospective decomposition sets D ∪ C j to be verified in Phase2.

Adaptive parallel decomposition algorithm
With the aim of improving the basic algorithm, we propose several refinements. The first three refinements introduce into the adaptive algorithm effective methods for pruning the search space (Fig. 5). The fourth refinement adjusts the threshold T while the adaptive algorithm is executed, based on runtime-acquired data related to the performance of Phase1.
Let us discuss the refinements in more detail. The first refinement concentrates on removing the redundant states q ∈ Q of the automaton accepting the input language L. The redundant states have already been eliminated in the basic algorithm (see procedure REMOVERED(W, y)). We extend the scope of such an elimination in the adaptive algorithm.
A word w = xy ∈ L is divided by a state q ∈ Q_w into a prefix x ∈ ←q and a suffix y ∈ →q. The state q is redundant for this division if the following holds:

U(y) · |→q| < |L|   (4)

where U(y) denotes the number of occurrences of suffix y in all words w ∈ L.
To justify (4), suppose q is the only state in the decomposition set D defined in Theorem 1. Then, given a word w = xy that is divided by state q, the values of U(y) and |→q| determine the sizes of sets L^D_1 and L^D_2, respectively. In fact, the value of U(y) is the number of prefixes x belonging to set L^D_1, since U(y) is counted over all words w = xy, w ∈ L, with prefix x followed by suffix y. In view of the above, the product U(y) · |→q| on the left side of (4) determines the upper limit of the number of words that could be created by concatenating sets L^D_1 and L^D_2. Now, if this product is less than |L|, then L = L^D_1 L^D_2 cannot be satisfied, which means that D cannot be the decomposition set for L. So the state q ∈ D is redundant.
Elimination of redundant states q ∈ Q_w satisfying (4) is implemented in the procedure REMOVERED2(W, y) (Fig. 6). When one or more states are found redundant and removed from sets Q_w ∈ W, which is indicated by variable f, this can cause a decrease in the number of occurrences of suffix y, given by U(y), and also in the size of the right language →q. As a result, more redundant states can be removed from sets Q_w.
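The check behind condition (4) can be sketched as follows. This is our simplified illustration, not the authors' REMOVERED2 implementation; the names `divisions` and `right_size` are assumptions:

```python
from collections import Counter

def redundant_divisions(L, divisions, right_size):
    """Return the divisions (q, x, y) that satisfy (4), i.e.
    U(y) * |->q| < |L|, so state q is redundant for them.
    divisions: iterable of (state, prefix, suffix) triples;
    right_size: mapping state -> |->q| (size of its right language)."""
    U = Counter(y for _, _, y in divisions)   # U(y): occurrences of suffix y
    return [(q, x, y) for (q, x, y) in divisions
            if U[y] * right_size[q] < len(L)]
```

After removal, U(y) and the right-language sizes shrink, so the check can be iterated until a fixed point, as described above.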
Using (4), we can also remove redundant states before the adaptive algorithm begins. Once automaton A has been constructed, the procedure BUILDW(A, W ) (Fig. 7) builds structure W , which is the input parameter to procedure DECOMPOSE2(W, D) (Fig. 5). While creating sets Q w only significant states of Q are considered. Hence, the number of states in W that are then processed by the adaptive algorithm is smaller than the number of states in A.

Example 7
To clarify how structure W = {(w i , Q w i )} is built, consider the language L = {a, aaa, aab, b} (Fig. 8).
The suffixes of words w_i ∈ L along with their frequency counts in L are (λ, 4), (a, 2), (aaa, 1), (aa, 1), (aab, 1), (ab, 1), (b, 2), and the sizes of the right languages are |→q_0| = 4, ...

The second refinement implements a method for reducing the search space by skipping the verification, where possible, of prospective decomposition sets D = D ∪ C_j. The verification is carried out in the basic algorithm by checking the condition L = L^D_1 L^D_2 (Fig. 3). To make this verification more efficient, we determine an upper bound on the size of the language generated by set D ∪ C_j (7). If the size of the input language L exceeds this bound, then we can omit the verification of D ∪ C_j.

Lemma 2 Let A = (Q, Σ, δ, s, Q_F) be the automaton accepting a finite language L, let D ⊆ Q be the final decomposition set for L, and let sets L^D_1 and L^D_2 be defined as in (1). Then the upper bound on the size of set L^D_1 L^D_2 is:

|L^D_1 L^D_2| ≤ |L^D_1| · min_{q∈D} |→q|   (5)

where |L^D_1| is the size of L^D_1, and min_{q∈D} |→q| is the minimum size of the right language for states q ∈ D.
Proof The lemma follows directly from the definition of set L^D_2 = ∩_{q∈D} →q. The size of L^D_2, defined as the intersection of right languages →q, cannot be greater than min_{q∈D} |→q|.

Lemma 3 The necessary condition for a finite language L to be decomposed by set D ⊆ Q is:

|L| ≤ |L^D_1| · min_{q∈D} |→q|   (7)
Proof Suppose we have L = L_1 L_2. By Theorem 1, L = L^D_1 L^D_2, so

|L| = |L^D_1 L^D_2|   (6)

Combining (5) and (6) we get the upper bound (7) for |L|, which makes it possible to verify whether D can be the final decomposition set. The full verification first requires computing the sets L^D_1 and L^D_2, then concatenating them, and finally checking whether L = L^D_1 L^D_2. However, if (7) does not hold, then these operations can be avoided. The procedure VERIFY(D) (Fig. 9) performs a double check of the constraints related to the upper bound on the size of set D. Both checks may result in the rejection of set D. We implement them consecutively because the cost of the first check is lower than the cost of the second one.

The approach taken in the third refinement is similar to that of the second refinement. We have established a lower bound on the size of subsets C_j (Lemma 4), which complete the partial decomposition set D. A subset C_j, and consequently a set D ∪ C_j as well, can be disregarded when the size of C_j is below the lower bound.
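The cheap rejection test of Lemma 3 amounts to one multiplication and one comparison before any concatenation is attempted; a sketch under assumed inputs (the names are ours):

```python
def may_decompose(lang_size, left_size, right_sizes):
    """Necessary condition (7): |L| <= |L^D_1| * min over q in D of |->q|.
    If it fails, the expensive full check L = L^D_1 L^D_2 can be skipped."""
    return lang_size <= left_size * min(right_sizes)
```

A candidate D is rejected outright when the bound is violated, mirroring the early exits of VERIFY.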

Lemma 4
Let D be a partial decomposition set for L. Let C_j ⊆ Q be an arbitrary subset of candidate states to extend D. Let D ∪ C_j be a prospective decomposition set for L. Then

|L| ≤ (|L^D_1| + |L^{C_j}_1|) · min_{q∈D∪C_j} |→q|   (8)

From (8) we can derive the lower bound on the size of L^{C_j}_1 for an arbitrary subset C_j:

|L^{C_j}_1| ≥ |L| / min_{q∈D∪C_j} |→q| − |L^D_1|   (9)

Based on (9), the procedure COMPLOWBOUND (Fig. 11) helps to reduce the number of subsets C_j ⊆ Q, j ∈ [0, 2^{|Q|} − 1], which are verified in Phase2 of the adaptive algorithm. Recall that set Q = STATES(W) − D includes candidate states to extend the partial decomposition set D obtained in Phase1. The set L^{C_j}_1 occurring on the left side of (9) is the sum of left languages: L^{C_j}_1 = ∪_{q∈C_j} ←q (cf. (1)). The procedure COMPLOWBOUND computes the minimum cardinality of C_j such that the sum of the sizes of left languages generated by states q ∈ C_j, determined by the function PREFIXSUM, is greater than or equal to the value appearing on the right-hand side of (9). Using the required minimum cardinality of a subset C_j, the subset is either processed or discarded from further analysis in Phase2.
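The minimum-cardinality computation can be sketched greedily: taking candidate states in decreasing order of their prefix counts |←q| yields the fewest states whose summed counts can reach the right-hand side of (9); any subset C_j smaller than that cardinality can be discarded. This is our illustration of the idea, not the published COMPLOWBOUND procedure:

```python
def min_cardinality(prefix_counts, bound):
    """Minimum number of candidate states whose summed prefix counts
    can reach `bound`; None when even all states together cannot."""
    total, k = 0, 0
    for c in sorted(prefix_counts, reverse=True):
        if total >= bound:
            break
        total += c
        k += 1
    return k if total >= bound else None
```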
Example 9 To illustrate procedure COMPLOWBOUND, consider language L = {a, aa, aaa, aaab, aaaab, aab, ab, abab, abb, b, ba, baab, bab, bb, bbab, bbb} (Fig. 12; note that on input to COMPLOWBOUND the set passed coincides with the partial decomposition set D).

The fourth refinement makes the algorithm adaptive. We have found that, depending on the input language, the time to execute Phase1 of the basic algorithm could be much longer than the time to execute Phase2. This is a disadvantage, as in Phase1 the processes run the same code, while in Phase2 they work in parallel verifying sets D ∪ C_j. Therefore, when the run time of Phase2 is short compared to Phase1, the capacity to take advantage of parallel computation is not fully utilized.
The purpose of Phase1 is to reduce the set Q = STATES(W) − D so that its size becomes smaller than the threshold T. As it turns out, the cause of a long run time of Phase1 is that the size of Q remains constant through a series of recursive runs of Phase1. So instead of repeating Phase1, it is better to start Phase2 by increasing the value of threshold T. Setting the new value of T (procedure ADJUSTT, Fig. 13) is triggered when a specified number of recursive runs of Phase1 is completed with no change of Q. More precisely, when the number of runs in which s_old = s reaches the fixed value e, the value of T is increased. However, the rate of growth of T should be controlled so that it does not become too large. Once the value of T is doubled (or tripled) in relation to T_0, the number of recursive runs e to be performed before T is increased again is also doubled (or quadrupled).
Note that a greater value of threshold T causes the size of set Q to grow. Consequently, the number of subsets C_j, where C_j ⊆ Q, and thus the number of sets D ∪ C_j to verify, increases (the number of subsets C_j is exponential and equal to 2^{|Q|}, as the subsets are members of the power set of Q). This means that the degree of parallelism grows, which is desirable since we may use more processes to conduct the search.
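The adjustment rule can be sketched as a small state machine. The concrete increments below (raising T by T_0/2, doubling e once T reaches 2T_0) are our assumptions; the published ADJUSTT procedure may use different constants:

```python
class AdjustT:
    """Raise threshold T after e consecutive Phase1 runs with no change
    of Q; slow the growth of T by enlarging e as T moves away from T0."""

    def __init__(self, T0, e0):
        self.T0, self.T, self.e, self.stalled = T0, T0, e0, 0

    def record_run(self, q_size_changed):
        self.stalled = 0 if q_size_changed else self.stalled + 1
        if self.stalled >= self.e:
            self.T += self.T0 // 2        # assumed increment
            self.stalled = 0
            if self.T >= 2 * self.T0:     # T doubled: require more stalls next time
                self.e *= 2
        return self.T
```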
The average time complexity of the adaptive algorithm is given by

T_a(π, n) = d(n, T̄) · (t_1(n, T̄) + (t_2(n, ε, ψT̄) · ε · 2^{ψT̄})/π)   (10)

which is similar to that of the basic algorithm (3). There are, however, two differences. First, the coefficient ε takes into account the fraction of sets D ∪ C_j that skip verification in Phase2. Second, the threshold T̄, which was kept constant in the basic algorithm, can now be increased adaptively, so it holds that T̄ ≥ T. The coefficient ε, where ε ∈ (0.0, 1.0], can considerably reduce the total average run time d(n, T̄) · (t_2(n, ε, ψT̄) · ε · 2^{ψT̄})/π of Phase2 (see Table 8). Similarly, the growing average value of T̄ resulting from adaptation reduces the amount of computation in Phase1 while increasing the amount of computation in Phase2, which is distributed among π processes.
To conclude, the adaptive algorithm introduces three refinements aimed at pruning the search space. In contrast to the basic algorithm, which only eliminates particular redundant states, the adaptive algorithm also discards whole sets of states that cannot generate resultant decompositions. The fourth refinement, the adjustment of threshold T, not only ensures a better balance between the run times of Phase1 and Phase2, but also provides better exploitation of the parallelism in the decomposition problem. However, none of the refinements reduces the order of complexity of the adaptive algorithm, which remains exponential.

Computational experiments
This section reports on the comprehensive experiments conducted to evaluate the performance of the basic and adaptive algorithms. The run times to solve the decomposition problem were measured for almost 1450 languages over alphabets of size |Σ| = 3-5, and for more than 2700 languages over binary and unary alphabets and over an alphabet of size |Σ| = 10 (in what follows we refer to these alphabets as Σ_3-5, Σ_2, Σ_1, and Σ_10). Furthermore, the impact of the adaptive setting on the results obtained, and the speed-ups of the adaptive algorithm, were studied. The basic and adaptive algorithms were implemented in the C language using MPI library functions (Intel MPI 5.1.1.109). Each process ran a sequential stream of instructions defined by the DECOMPOSE (or DECOMPOSE2) procedure. The processes running the algorithms were independent of one another, and synchronized their operation only at the beginning and end of computation. The implementation structure, based on the master-worker paradigm, is shown in Fig. 14. The role of the master process was to send the input language L to all the workers, and to collect the decompositions of L found (in the actual implementation, the master process M and worker process W_0 were combined into a single process).
The experiments were carried out on the Tryton supercomputer with a computation speed of 1.48 Pflop/s, running Linux kernel 2.6.32-754.3.5.el6.x86_64 along with the Slurm utility (Simple Linux Utility for Resource Management). The supercomputer is composed of 1607 compute nodes, each equipped with two 12-core Intel Haswell processors (Xeon E5 v3) operating at 2.3 GHz, with 128 GB of RAM. The processors are connected by a 56 Gb/s InfiniBand fat-tree network. The complete system, with a cluster architecture, is located in the Computer Centre in Gdańsk, Poland (http://task.gda.pl/centre-en), and houses 3214 processors (38568 cores) and 48 Nvidia Tesla accelerators.

Benchmark languages
For the purpose of the experiments, we generated four sets of languages. The sets E_1 and E_2 contained composite languages, while the sets P_1 and P_2 contained prime languages. The sets E_1 and P_1 included between 6000 and 15000 words, and the sets E_2 and P_2 between 60000 and 90000 words (Table 2a). The composite languages were created using random grammars [17]. Let Σ = {a_1, a_2, ..., a_l} be the set of terminal symbols, l ≥ 1, let V = {V_1, V_2, ..., V_r} be the set of nonterminal symbols, r > l, and let V_r be the initial symbol. The grammars for composite languages were obtained as follows:

1. For each terminal symbol a_i ∈ Σ, create a production V_i → a_i.
2. For each nonterminal symbol V_j where j = l + 1, l + 2, ..., r − 1:
   - Draw at random a terminal symbol a ∈ Σ. Create a production V_j → a.
   - Draw at random l pairs (a, V_i), where a ∈ Σ and V_i ∈ V, i < j. Create a production V_j → aV_i.
3. Create a production V r → V r−2 V r−1 .
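The construction above can be sketched as follows; this uses 0-based nonterminal indices with V_{r-1} as the start symbol, and the (head, body) representation of productions is ours:

```python
import random

def random_grammar(sigma, r, seed=0):
    """Steps 1-3 above. Nonterminals are 0..r-1, V_{r-1} is the start
    symbol. Productions are (head, body) pairs; integers in bodies
    denote nonterminals, strings denote terminals."""
    rng = random.Random(seed)
    l = len(sigma)
    prods = [(i, (sigma[i],)) for i in range(l)]       # step 1: V_i -> a_i
    for j in range(l, r - 1):                          # step 2
        prods.append((j, (rng.choice(sigma),)))        # V_j -> a
        for _ in range(l):                             # l pairs (a, V_i), i < j
            prods.append((j, (rng.choice(sigma), rng.randrange(j))))
    prods.append((r - 1, (r - 3, r - 2)))              # step 3: V_r -> V_{r-2} V_{r-1}
    return prods
```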
When creating the composite languages, we rejected the grammars that generated a number of words outside the ranges 6000-15000 (for set E_1) and 60000-90000 (for set E_2). The values of l and r were selected from the ranges 3-5 and 11-21, respectively, so the maximum length of a word in the composite languages was 2(r − l) − 1 = 35.

The sets of prime languages P_1 and P_2 were created based on sets E_1 and E_2. Let L′ be a composite language in set E_1 (or E_2). The language L′ can be transformed into a prime language L″ belonging to set P_1 (or P_2) using the following steps: (1) find the longest word ω ∈ L′; (2) generate a random word ω_r over Σ such that |ω_r| = |ω|; (3) if ω_r ∉ L′, then copy language L′ into L″, and replace ω ∈ L″ with ω_r. (Note that these steps do not guarantee that L″ will always be prime. However, the probability of getting a composite language is small. If L″ is composite, one can repeat the steps.)

Consider the size of the input data of the algorithms. There are three independent variables defining this size: the number n of words in L, the size |Σ| of the alphabet, and the maximum length h of a word in L. In Section 6.2 we limit the alphabet size to |Σ| = 5, and the maximum length of a word to h = 35. Consequently, the values of |Σ| and h become parameters of the algorithms, so we can assume that the only variable defining the size of the input data of the decomposition problem is the number n of words in L.
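The composite-to-prime transformation follows directly from steps (1)-(3); a minimal sketch, with the caveat from the text that primality of the result is not guaranteed:

```python
import random

def make_probably_prime(L, sigma, seed=0):
    """Replace the longest word of L with a fresh random word of equal
    length. May loop if every word of that length over sigma is in L."""
    rng = random.Random(seed)
    longest = max(L, key=len)                                          # step (1)
    while True:
        w_r = "".join(rng.choice(sigma) for _ in range(len(longest)))  # step (2)
        if w_r not in L:                                               # step (3)
            return (L - {longest}) | {w_r}
```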

Experimental results
While performing the experiments, we ran the basic and adaptive algorithms employing 16 processes for the languages in sets E_1, E_2, P_1, and P_2. We set the maximum run time allowed to solve a given language L to six hours. By solving the language we mean that the algorithm either determines all decompositions of L, or finds out that L is prime. Since the basic algorithm failed to solve some languages within the six-hour limit, we defined the success rate as s = N_s/N, where N_s was the number of languages solved by both algorithms, and N was the number of languages in the set. As shown in Table 3, the adaptive algorithm outperformed the basic algorithm with respect to success rates for all sets under consideration. A comparison of run times measured by the MPI_Wtime() function is shown in Table 4 and Fig. 15. Out of a total of 1446 languages, the comparison relates only to the 1168 languages that were solved by both algorithms within the six-hour limit. The box plots of Fig. 15 depict the times through their quartiles. The bottom and top of each box are the first and third quartiles of the measurements, and the band inside the box is the second quartile (the median). The lines extending vertically from the boxes, the so-called whiskers, indicate the minimum and maximum measurements.
The median run times in Table 4 show that both algorithms solve prime languages faster than composite languages. As the aim is to find all decompositions of a language L, the algorithms have to explore the whole solution space for L. The size of this space is similar for both types of languages, because the cardinalities of languages in sets E 1 and P 1 , and E 2 and P 2 are the same. The experiments prove that the number of sets D ∪ C j verified by both algorithms for prime languages is smaller compared to composite languages (Table 5). Consequently, the run times for prime languages are shorter, because fewer decomposition sets need to be verified. Comparing the median run times (Table 4), we can see that the adaptive algorithm outperforms the basic algorithm for sets E 1 , E 2 , and P 1 . However, for set P 2 the adaptive algorithm shows a slightly worse performance. One of the refinements considers only significant states of automaton A accepting the language L. Due to this refinement, fewer decomposition sets D ∪ C j are verified. The redundant states are removed in the course of building structure W , which is created by procedure BUILDW. Its execution takes a certain amount of time, but one can expect that this amount will be amortized by fewer sets D ∪ C j that need to be verified. However, such amortization did not occur for  set P 2 , because no sets D ∪ C j for those languages were discovered by the adaptive algorithm (Table 5).
Considering the above, we claim that the adaptive algorithm is faster than the basic algorithm in solving composite and prime languages. We test this claim statistically on the four pairs of data samples using the one-sided two-sample test for comparing two means (Table 6). The data samples were created by eliminating the outliers. For example, for the basic algorithm, a set of 323 measurements was acquired for set E_1 (column N_s in Table 3). From this set, 28 measurements were eliminated as outliers (n_1 = 295 in Table 6). A measurement was considered an outlier if it fell outside the range [m, M], where m = 1st-q − 1.5 · (3rd-q − 1st-q) and M = 3rd-q + 1.5 · (3rd-q − 1st-q) (Table 4).
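The outlier rule can be reproduced with the standard library; `statistics.quantiles` is used here in place of the quartiles taken from Table 4 in the paper, and the data in the usage example are made up:

```python
import statistics

def drop_outliers(times):
    """Keep measurements within [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(times, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [t for t in times if lo <= t <= hi]
```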
For samples d_1, d_2, and sets E_i, P_i, i = 1, 2, we set up the null and alternative hypotheses (Table 6). In statistics, a sample size of n_i ≥ 30 is considered large enough to assume that its distribution is normal. Thus, the critical value z_α is determined based on the standard normal distribution.

Another way to prune the search space is the removal of redundant states of automaton A accepting the input language L. Due to such removal, the state sets of A for the composite languages decrease by approximately 10%, and for the prime languages by approximately 16%-17% (see Med|Q| and Med|Q′| in Table 2b-c). The basic algorithm discovered some prospective decomposition sets for the prime languages, while the adaptive algorithm did not find any set of that type (Table 5). We believe that the reason for this was the smaller automata processed by the adaptive algorithm compared with the basic algorithm.
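The statistic of the one-sided two-sample test can be sketched as follows; this is the standard large-sample z statistic, and the figures in the assertions are illustrative, not the values from Table 6:

```python
import math

def z_statistic(ybar1, s1, n1, ybar2, s2, n2):
    """Large-sample statistic for H0: mu1 = mu2 vs H1: mu1 > mu2;
    H0 is rejected at level alpha when z exceeds the standard-normal
    critical value z_alpha."""
    return (ybar1 - ybar2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
```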
The adaptive algorithm changes its behavior by setting the value of threshold T (denoted then by T̄) at runtime. The adaptation was most beneficial for the languages of set E_2, which turned out to be the most demanding in terms of solving the decomposition problem. For these languages the values of T̄ varied in a wide range of 20-58 (Table 7). The capability of adaptation was exploited to a lesser extent for the languages in set E_1, with the values of T̄ varying within a range of 20-32, and for the prime languages the adaptive adjustment of T did not occur.
The time complexity formulas T b (π, n) and T a (π, n) for the algorithms include several terms and coefficients. To investigate the variability of these quantities, we took the measurements reported in Table 8. The Avg entry contains the average value of a quantity, q, calculated over a language set. The Range entry describes the variability of q through a pair (q min , q max ) where q min and q max are the minimum and maximum average values of q calculated over average values in a distinguished interval. The range [6000, 15000] of language size for sets E 1 and P 1 was divided into nine equal intervals, and the range [60000, 90000] for E 2 and P 2 into ten intervals.
The Range values indicate that the times t 1 and t 2 are slowly increasing functions of language size n ( Table 8).
Recall that t_1 is the average time to find a partial decomposition set D in Phase1, and t_2 the average time to verify a set D ∪ C_j in Phase2. Small values of coefficient ε indicate that the verification of most sets D ∪ C_j was skipped. Consider the average run time of Phase1 of the basic algorithm for set E_2, that is, d(n, T) · t_1(n, T). This time is 2.1 · 45.6 ≈ 95.8 s. For the adaptive algorithm the run time of Phase1 is equal to 9.7 s. Since we have the average run times of the complete algorithms (ȳ_1 and ȳ_2 for E_2 in Table 6), we can calculate the execution times of Phase2 for both algorithms. We get 1230.7 − 95.8 = 1134.9 s for the basic algorithm, and 23.6 − 9.7 = 13.9 s for the adaptive algorithm. Clearly, the run time balance between Phase1 and Phase2 is much better for the adaptive algorithm (9.7 vs. 13.9 s) compared to the basic algorithm (95.8 vs. 1134.9 s). The better balance was achieved due to effective pruning of the search space, and to the increase in threshold T made by the adaptive algorithm at runtime.
We also conducted the experiments on languages over a comparatively large alphabet Σ 10 , and over small alphabets, in particular on the binary and unary languages. The setting was the same as before. The adaptive algorithm was run by 16 processes, and the time limit for solving a language was six hours. The languages over alphabets Σ 2 and Σ 10  were created in a similar fashion as described in Section 6.1.
To produce the unary languages, a random number of ones forming the words of a language was generated. The experiments have shown that the languages over alphabet Σ_10 were easy to solve. Their decomposition times, in the order of seconds (column Med in Table 9, and Fig. 16a-d), compared favorably with the languages over alphabet Σ_3-5 (column Med in Table 4, and Fig. 15). The experiments revealed that the binary languages were harder to solve, while the unary languages were the worst-case input data for the problem. The median run times for the binary languages were in the order of tens of seconds, and for the unary languages somewhat longer than 80 minutes (column Med in Table 9, and Figs. 17 and 16e-f). A major difficulty in solving these languages was the large size of the automata that had to be searched. The ranges of the median sizes of the automata accepting the binary and unary languages were, respectively, [104, 261] and [1994, 1997] (column Med|Q′| in Table 10). As a result, the run times for those larger automata were longer, compared to the languages over alphabet Σ_10, for which the sizes of automata were in the range [50, 106].

Scalability study
The results of the adaptive algorithm speed-up evaluation are presented in Fig. 18. The speed-ups achieved for the binary languages (set E_b) and the sets of languages over alphabet Σ_3-5 (set E_1) were quite good. For set E_2 the result was satisfactory.
As can be seen, the speed-ups obtained are not linear. The reason is that the parallel processes execute the same code in Phase1, so an overhead of excess computation performed by the processes arises, which decreases the speed-up. We have significantly reduced that overhead by shortening the run time of Phase1 (compare the Avg times t_1 for the basic and adaptive algorithms in Table 8). This was done by means of the algorithmic refinements, in particular by removing redundant states in procedures REMOVERED2 and BUILDW, and by the adaptive adjustment of threshold T.
The large languages, ranging in size from 160000 to more than 200000 words, scaled very well (Fig. 18g-i). The computational work for these languages was higher, and so the impact of the overhead on the speed-up was smaller. The times to find decompositions of the large languages using 128 processes varied in a range of 7-37 minutes.
As mentioned before, we solve the problem of finding all decompositions of a finite language L in the form of L = L 1 L 2 . The language L does not have to satisfy any specific conditions. To the best of our knowledge, parallel algorithms to solve this problem have not been presented in the literature so far. Therefore we could not compare the outcome of our experiments with the results of other algorithms.

Conclusions and future work
In this paper the problem of finite language decomposition is investigated. The problem under consideration, assuming that a language is given as a DFA, is NP-hard. The main contribution of the paper is the adaptive parallel algorithm based on an exhaustive search used for finding all decompositions of a given finite language. The algorithm implements several methods for pruning the search space. Furthermore, the algorithm is adaptive; it modifies its behavior at the time it is run by adjusting one of the parameters based on the runtime acquired data related to its performance. As a consequence, a substantial reduction in the amount of computation necessary to solve the problem has been achieved.
Comprehensive computational experiments carried out on almost 1450 languages over an alphabet Σ 3−5 proved that the methods for pruning the search space proposed in Lemmas 2-4 were very effective. The methods allowed the adaptive algorithm to reduce the search space by several orders of magnitude compared with the basic algorithm. As a result, the median run time to solve the languages in set E 2 by the adaptive algorithm was equal to approximately 15 s whereas by the basic algorithm it was 1296 s. The adaptive feature of the algorithm proved most beneficial for languages from set E 2 for which the value of threshold T varied in a range of 20-58. The higher value of T is advantageous, because it gives rise to an increase in computational parallelism, which enables better use of available processes.
We also tested more than 2700 languages over a large alphabet Σ 10 and over small alphabets, specifically the binary and unary languages. The results indicated that the languages over an alphabet Σ 10 were easier to solve than those over an alphabet Σ 3−5 . Furthermore, it took longer to decompose the binary languages in comparison to the languages over the alphabets Σ 10 and Σ 3−5 , while the unary languages turned out to be the worst-case input data to solve the decomposition problem. Based on these findings, we conclude that finite languages over small alphabets are more difficult to decompose than those over large alphabets.
The scalability study revealed that the binary languages, and the languages generated over an alphabet |Σ| = 3-5, containing from 6000 to more than 200000 words, scaled well, especially those with larger sizes.
In terms of future work, two issues can be investigated. The first is the adaptive setting of the algorithm, which we believe has the potential to be improved. Presently, the algorithm establishes the value of threshold T based solely on the number of recursive runs of Phase1. We suppose that the number of processes executing the algorithm should also be considered while determining the value of T. The other issue is the further scalability of the adaptive algorithm. At present, using 16 processes the algorithm can solve language instances of up to 90000 words in median run times of tens of seconds, and using 128 processes, languages of between 160000 and more than 200000 words in run times of tens of minutes. The question is to what extent the language size could be enlarged by increasing the number of processes while maintaining a short run time of the algorithm, and possibly high processor utilization.
University of Technology grant for maintaining and developing research potential. The computation of the project was carried out using the infrastructure supported by the Silesian BIO-FARMA project POIG.02.01.00-00-166/08 and POIG.02.03.01-24-099/13 grant GeCONiI. We thank the following computing centres where the computation of our project was also carried out: Academic Computer Center in Gdańsk, Interdisciplinary Centre for Mathematical and Computational Modeling at Warsaw University (computing grant G27-9), and Wrocław Centre for Networking and Supercomputing (computing grant 30).
Funding The research was funded by the National Science Centre Poland (NCN) project 2016/21/B/ST6/02158, and by the Silesian University of Technology grant for maintaining and developing research potential.
Code and data availability The source code of the algorithms and selected benchmark languages used in the experiments are available on the GitHub website and service (https://github.com/tjastrzab/ai).

Conflicts of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.