1 Introduction

A finite language L is decomposable (or composite) if there are two nontrivial languages L1 and L2 such that L = L1L2. Otherwise, L is prime. It has been proved that deciding primality of a finite language is NP-hard [1].

The main contribution of this paper is an adaptive parallel algorithm that finds all decompositions L1L2 of L, or concludes that L is prime when no decomposition is found. The algorithm is based on an exhaustive search and incorporates several original methods for pruning the search space. Moreover, the algorithm is adaptive; it changes its behavior based on data acquired at runtime about the performance of its recursive phase. Since decomposing finite languages is computationally hard, a motivation of our work is to investigate to what extent the power of parallel computing makes it possible to tackle that question for large instances of finite languages.

The decomposition algorithm has a number of applications. It is useful for determining the prime decomposition of a supercode. A finite language L can be represented by a list of its words. However, such a representation is memory intensive if L is large. Therefore a better option is to decompose L and store its factors L1 and L2. The decomposition algorithm can also be used to find a context-free grammar G. Given the sets of words S+ and S−, called examples and counterexamples, a grammar G should be found that accepts the words in set S+ and rejects the words in set S−. We describe these applications in more detail in Examples 3–6.

The rest of the paper is organized into six sections. Section 2 presents a survey of previous work on finite language decomposition. Section 3 recalls selected concepts of the theory of languages and automata. Section 4 describes the basic algorithm, which has been a starting point for developing the adaptive algorithm proposed in Section 5. Section 6 reports on the results of the computational experiments conducted using the algorithms. Section 7 contains conclusions and future work.

2 Related work

M. Martin and T. Kutsia [2] proposed a representation of regular languages by linear systems of language equations, which is suitable for computing left and right factors of a regular language. An n-subfactorization of a regular language L is a tuple of languages (L1, L2, …, Ln) for some n ≥ 1 such that the concatenation L1L2⋯Ln is a proper subset of L, so \(L_{1} L_{2} {\ldots } L_{n}\subsetneq L\). A left (resp. right) factor of L is the leftmost (resp. rightmost) term in a factorization of L. An algorithm for computing the sets of left and right factors of L is proposed.

S. Afonin and D. Golomazov [3] presented an algorithm for constructing a minimal union-free decomposition of a regular language L. A representation L = L1 ∪ L2 ∪ … ∪ Lk is called a union-free decomposition of L iff Li is a union-free language for all i = 1, 2, …, k. The decomposition is called minimal iff there is no other union-free decomposition of L with fewer factors. An algorithm for constructing a minimal union-free decomposition of a given regular language L is provided. The algorithm involves an exhaustive search, so its computational complexity is exponential in the size of the DFA accepting L.

W. Wieczorek and A. Nowakowski [4] considered the problem of finding a multi-decomposition of a finite language W = {w1,w2,…,wn}, n ≥ 1, that contains words wi of fixed length d ≥ 2, composed of symbols taken from a nonempty finite alphabet Σ. It is assumed that a desired multi-decomposition is a set of concatenations LiRi whose union contains W, that is \(W\subseteq L_{1} R_{1}\cup L_{2} R_{2} \cup {\ldots } \cup L_{m} R_{m}\) for some m ≥ 1 depending on the size of W. The nonempty sets Li and Ri, which include the prefixes and suffixes of words wi ∈ W, respectively, are subsets of the complete sets of prefixes and suffixes obtained from all possible splits of words wi ∈ W. Denote the splits by wi = uijvij, i ∈ [1,n], and j ∈ [1,d − 1]. The prefix uij consists of the j leading symbols of wi, while the suffix vij consists of the d − j trailing symbols of wi. A multi-decomposition is related to the cliques of an undirected graph G = (V,E) with the set of vertices V = {(uij,vij) | wi = uijvij, wi ∈ W} and the set of edges E = {((uij,vij),(ukl,vkl)) | uijvkl, uklvij ∈ W}. It can be shown that each concatenation LiRi of a multi-decomposition is represented by the corresponding clique in graph G, that is \(\{(u_{t_{1}},v_{t_{1}})\), \((u_{t_{2}},v_{t_{2}})\), …, \((u_{t_{r}},v_{t_{r}})\}\) where \(L_{i}=\cup _{j=1}^{r} \{u_{t_{j}}\}\) and \(R_{i}=\cup _{j=1}^{r} \{v_{t_{j}}\}\). The authors provide a randomized algorithm to find all cliques in graph G, which runs in polynomial time with respect to the size of W. As an example, consider the language W = {aa,ab}. The graph G for it has two vertices labeled (a,a) and (a,b) (u11 = v11 = a, u21 = a, v21 = b), and one edge ((a,a),(a,b)), as u11v21 = ab and u21v11 = aa are in W. There is only one clique in the graph, so L1 = {u11} ∪ {u21} = {a} and R1 = {v11} ∪ {v21} = {a,b}, and finally W = {a}{a,b}. The proposed method of finding a multi-decomposition was applied to create an opening book for the Toads-and-Frogs game.
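To make the graph construction concrete, the following C sketch (an illustration only, not code from [4]; it builds the vertices and edges for the small language W = {aa, ab} above and omits the clique search and the randomized part of the algorithm) enumerates all splits and tests the edge condition:

#include <stdio.h>
#include <string.h>

#define MAXV 64          /* illustrative limits on vertices and word length */
#define MAXL 32

static const char *W[] = { "aa", "ab" };   /* example language from the text */
static const int n = 2;

static int in_W(const char *s) {
    for (int i = 0; i < n; i++) if (strcmp(W[i], s) == 0) return 1;
    return 0;
}

int main(void) {
    char u[MAXV][MAXL], v[MAXV][MAXL];
    int m = 0;                               /* number of vertices (u, v)     */

    /* vertices: all splits w = uv of every word in W */
    for (int i = 0; i < n; i++) {
        int d = (int)strlen(W[i]);
        for (int j = 1; j < d; j++) {
            strncpy(u[m], W[i], (size_t)j); u[m][j] = '\0';
            strcpy(v[m], W[i] + j);
            m++;
        }
    }
    /* edges: (u_i,v_i)-(u_k,v_k) iff both u_i v_k and u_k v_i belong to W */
    for (int i = 0; i < m; i++)
        for (int k = i + 1; k < m; k++) {
            char a[2 * MAXL], b[2 * MAXL];
            sprintf(a, "%s%s", u[i], v[k]);
            sprintf(b, "%s%s", u[k], v[i]);
            if (in_W(a) && in_W(b))
                printf("edge: (%s,%s) -- (%s,%s)\n", u[i], v[i], u[k], v[k]);
        }
    return 0;
}

For W = {aa, ab} the sketch prints the single edge ((a,a),(a,b)), in agreement with the worked example above.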

H. J. Bravo et al. [5] investigate a manufacturing problem that involves batches of identical or similar items produced together in production runs of different sizes. A production run consists of a fixed number of interleaved workcycles and is controlled by a monolithic supervisor, whose operation is described using a deterministic finite automaton (DFA). The automaton accepts a regular language R over some finite alphabet Σ, with states representing points in time, and transitions representing events occurring within a production run. Events are operations that are performed by machines (for example, take a workpiece, process it, etc.) to obtain the product. An event a has a specific time of execution d(a), and belongs to the alphabet of the DFA, a ∈ Σ. A production run, being a sequence s = a1a2⋯an of events, has a makespan equal to the sum of the event execution times, \({\sum }_{i=1}^{n} d(a_{i})\). For a product there can be a number of sequences s that differ in the order of individual events ai, with each sequence corresponding to a route in the DFA. The paper proposes a factorization-based approach to compute a route with the minimum makespan. As for a given product P the production run length is finite (measured by the number of events), a finite language L with \(L\subseteq R\) is extracted from the regular language R, which describes all possible routes for P. Then L is decomposed into factors L = L1L2⋯Lk for some k, where each factor Li describes a set of possible subroutes between the so-called symmetrically reachable idle states (a notion introduced in the paper). Finally, for all sets Li, i ∈ [1,k], the subroutes ri of minimum makespan are determined, which give the optimal route r = r1r2⋯rk. The authors claim that the proposed approach mitigates the computational complexity of finding route r.

Several decomposition algorithms have been developed for codes, which are sets of words and hence formal languages. Codes are categorized by their defining properties, for example, prefix-freeness, suffix-freeness, infix-freeness, etc.

Y.-S. Han and K. Salomaa [6] studied solid codes defined as follows: a set S of words is a solid code if S satisfies two conditions: (i) no word of S is a subword of another word of S (infix-freeness), and (ii) no prefix of a word in S is a suffix of a word in S (overlap-freeness). In other words, S has to be an infix code and no two words of S may overlap. Moreover, a language L is a regular solid code if L is regular and a solid code. The paper proposed two algorithms related to the decomposition problem for solid codes. The first algorithm determines in polynomial time whether or not a given regular language is a solid code. The second algorithm efficiently finds a prime solid code decomposition of a regular solid code that is not prime.

K. V. Hung [7] considered the prime decomposition of supercodes. A finite language L is a supercode if no word in L is a proper permu-subword of another word in it. Let u and v be words of L. A word u is called a permu-subword of v if u is a subword of v in which symbol permutations are allowed. A supercode L is prime if L ≠ L1L2 for any supercodes L1 and L2. A linear-time algorithm was provided that, for a given supercode L, either discovered that it was prime or returned the unique sequence of prime supercodes L1, L2, …, Lk+1 such that L = L1L2⋯Lk+1 with k ≥ 1.

In addition to the above recent work on supercodes, there were previous works on this subject. J. Czyzowicz et al. [8] proved that for a given prefix-free regular language L the prime prefix-free decomposition is unique, and that, when L is not prime, the decomposition can be computed in O(m) worst-case time, where m is the size of the minimal DFA accepting L. Y.-S. Han et al. [9] investigated the prime infix-free decomposition of infix-free regular languages and demonstrated that the prime infix-free decomposition is not unique. An algorithm for the infix-free primality test of an infix-free regular language was given. It was also shown that the prime infix-free decomposition can be computed in polynomial time. Y.-S. Han and D. Wood [10] investigated finite outfix-free regular languages. A word x is an outfix of a word y if there is a word w such that x1wx2 = y and x = x1x2. A set X of words is outfix-free if no word in X is an outfix of any other word in X. A polynomial-time algorithm was developed to determine outfix-freeness of regular languages. Furthermore, a linear-time algorithm that computes a prime outfix-free decomposition for outfix-free regular languages was given. There are also papers on theoretical issues related to the problem of formal language decomposition [11,12,13,14,15,16].

Let us note that all the works discussed above differ from our approach to solving the decomposition problem. Firstly, we consider finite languages that do not have to satisfy any specific conditions. Secondly, the parallel algorithm we propose is intended to compute all decompositions of a finite language L in the form of L = L1L2.

This paper builds on our previous efforts in developing sequential and parallel algorithms for finite language decomposition. Let us briefly review the results obtained in those efforts. A sequential algorithm for finding a decomposition of a finite language was proposed in [17]. The algorithm returned only the first decomposition found of a given language. A threshold parameter T (see p. 12) that impacted the operation of the algorithm was kept constant, meaning it was not adaptively adjusted while the algorithm was running. The article introduced the concept of a significant state along with a proof that for every composite language there is at least one decomposition based solely on significant states. In the experimental part of the paper, 240 languages of size less than 2000 words were studied, including 120 prime languages. The implementation was done in Python, and the average running time of the algorithm for the test languages was on the order of a few seconds.

An approach to finding the first decomposition of a finite language by using selected meta-heuristics was discussed in [18]. The paper presented results for the simulated annealing algorithm, tabu search, and genetic and randomized algorithms, all implemented sequentially in Python. Computational experiments were carried out on 1200 languages of sizes less than 2000 words with algorithm execution time limits of 10 and 60 seconds. Within these limits, the algorithms returned a considerable number of wrong answers for composite languages, claiming they were prime.

A basic parallel algorithm (a short description is included in Section 4) for finding all decompositions of a finite language using the concept of significant state was given in [19]. The algorithm consisted of two phases. In the first phase, each process executed the same code, and in the second phase the computation was spread across the available processes based on their ranks. The algorithm was implemented in the C language with the Message Passing Interface (MPI). The experiments, conducted with up to 22 processes, covered four languages with word counts between 800 and 6583 words.

A preliminary version of an adaptive parallel algorithm for solving the decomposition problem was given in [20]. The adaptive algorithm was based on a modified concept of significant state compared with that given in [17]. The algorithm consisted of two phases and included a method for pruning the search space and a simplified verification of prospective decomposition sets. It also involved an adaptive way of adjusting the threshold parameter T to keep the balance between the times spent in the two phases of the algorithm. The experiments concerned nine languages of up to 90000 words in size, solved with 32 processes in run times of a few minutes. The algorithm was implemented in the C language with the MPI interface.

To summarize, our previous work encompassed several sequential and parallel algorithms to solve the decomposition problem for finite languages. The sequential algorithms were able to decompose the languages of size up to 2000 words, while the parallel algorithms could tackle the languages of size up to 90000 words with 32 processes.

In the current paper we provide an advanced adaptive parallel algorithm, which can solve languages of size between 160000 and more than 200000 words in run times of tens of minutes by using 128 processes. The results of comprehensive computational experiments on a variety of 4000 benchmark languages are also given. To the best of our knowledge, a parallel algorithm and an in-depth experimental study of the problem under consideration have not been reported in the literature thus far.

3 Preliminaries

Below, we recall selected concepts from the theory of formal languages and automata. For more details the reader may refer to the textbooks [21, 22].

An alphabet Σ is a nonempty finite set of symbols. A word w is a sequence of zero or more symbols taken from Σ. The length of a word is denoted by |w|, with the special case of a zero-length word being the empty word λ, for which |λ| = 0. A prefix of a word is any number of leading symbols of that word, and a suffix is any number of trailing symbols. By convention, Σ* denotes the set of all words over an alphabet Σ, and Σ+ denotes the set Σ* − {λ}.

A finite language L ⊆ Σ* is a finite set of words w ∈ Σ*. A finite language L is trivial if it consists of the empty word λ, that is L = {λ}, and it is nontrivial if it contains at least one nonempty word. Let u, v, w ∈ Σ*. A concatenation of words u and v produces a word w = uv, where w is created by making a copy of word u and following it by a copy of word v. Let \(U, V, W \subseteq {\varSigma }^{*}\). Then the concatenation (or product) of sets U and V produces a set W = UV such that W = {w : w = uv, u ∈ U, v ∈ V }.

A deterministic finite automaton (DFA) is defined by a quintuple A = (Q, Σ, δ, s, QF), where Q denotes a finite set of automaton states, Σ is an alphabet, \(\delta : Q\times {\varSigma }\rightarrow Q\) is a transition function, s ∈ Q is the initial (start) state, and \(Q_{F} \subseteq Q\) is a set of final (accepting) states. An automaton is deterministic iff for all states q ∈ Q and symbols a ∈ Σ, |δ(q,a)| ≤ 1; in other words, each state q has at most one out-transition marked by a. A deterministic automaton is finite iff its state set Q is finite. Let us extend the transition function δ to words over Σ. Formally we define δ(q,λ) = q, and for all words w and input symbols a, δ(q,wa) = δ(δ(q,w),a). So \(\delta (q,w)=q^{\prime }\) means that the word w takes A from the state q to the state \(q^{\prime }\).

A directed graph G = (V,E), called a transition diagram, is associated with a finite automaton A, where V = Q, \(E=\{p\overset {a}\rightarrow q | q=\delta (p,a)\}\), and \(p\overset {a}\rightarrow q\) denotes an arc labeled a in the transition diagram. Given a word w = a1a2⋯am, ai ∈ Σ, i ∈ [1,m], a w path is a sequence of transitions labeled with the symbols of w in the transition diagram. A w path is denoted by the sequence of states q1, q2, …, qm+1, where qj ∈ Q, j ∈ [1,m + 1], lying on the path. A deterministic automaton A accepts a word w if there is the w path q1q2⋯qm+1 leading from the initial state to a final state of A, so q1 = s and qm+1 ∈ QF.

Define the left (resp. right) language of a state q ∈ Q as \(\overleftarrow {q} = \{w : \delta (s, w) = q\}\) (resp. \(\overrightarrow {q} = \{w : \delta (q, w) \in Q_{F}\}\)). Put simply, the left (resp. right) language of q consists of words w for which there is a w path from the initial state s to q (resp. from q to a final state).

A finite automaton A = (Q,Σ,δ,s,QF) is said to accept a language L when for each word w ∈ L there is a w path beginning in the state s and ending in a state q ∈ QF, or more formally L = {w | δ(s,w) ∈ QF}. Deterministic and nondeterministicFootnote 1 finite automata accept the same set of languages, namely the set of regular languages. Minimum-state acyclic DFAs accept the set of finite languages. Given all states p, q ∈ Q, p ≠ q, a DFA is minimum-state iff \(\overrightarrow {p}\neq \overrightarrow {q}\),Footnote 2 and it is acyclic iff δ(q,w) ≠ q for every word w ∈ Σ+ and state q ∈ Q.

In what follows, we only consider minimum-state acyclic deterministic automata, so each such automaton will be referred to as automaton herein.

Let L be a nontrivial finite language. A decomposition of L of index k, where k ≥ 2, is a family of languages Li for i ∈ [1,k], called factors, that once concatenated give language L, so we have L = L1L2⋯Lk. The decomposition is nontrivial if the languages Li are nontrivial, that is Li ≠ {λ} for i ∈ [1,k]. Otherwise, the decomposition is trivial. In every nontrivial decomposition of a finite language L, the number of factors, k, is at most equal to the length of the longest word in L. Clearly, any language L has the trivial decompositions L{λ} and {λ}L. In what follows, by a decomposition we always mean a nontrivial decomposition. A language L is called prime if L has no decomposition of index 2, otherwise it is called composite (or decomposable). The prime decomposition of L is a decomposition L = L1L2⋯Lk, k ≥ 2, where each language Li for i ∈ [1,k] is prime. It has been proven that every finite language is prime or has a prime decomposition, which is generally not true for infinite languages [13,14,15,16].

In this paper, we investigate the problem of finding all decompositions of index 2 of a nontrivial finite language L. For a particular decomposition we are looking for two nontrivial finite languages L1 and L2 that once concatenated give language L, so we have L = L1L2. Note that a given language L can have many decompositions of index 2.
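Checking a particular candidate pair (L1, L2) amounts to concatenating the two finite sets of words and comparing the result with L. The following C sketch shows such a check under the simplifying assumption that languages are stored as plain arrays of strings; it is an illustration only and not the data representation used later in the paper.

#include <stdio.h>
#include <string.h>

/* does the word w belong to language L of size n? */
static int member(const char *w, const char **L, int n) {
    for (int i = 0; i < n; i++) if (strcmp(L[i], w) == 0) return 1;
    return 0;
}

/* return 1 iff L = L1 L2 (as sets of words) */
static int is_decomposition(const char **L, int n,
                            const char **L1, int n1,
                            const char **L2, int n2) {
    char buf[256];
    /* every concatenation of a word of L1 with a word of L2 must be in L */
    for (int i = 0; i < n1; i++)
        for (int j = 0; j < n2; j++) {
            snprintf(buf, sizeof buf, "%s%s", L1[i], L2[j]);
            if (!member(buf, L, n)) return 0;
        }
    /* every word of L must be producible as such a concatenation */
    for (int k = 0; k < n; k++) {
        int found = 0;
        for (int i = 0; i < n1 && !found; i++) {
            size_t p = strlen(L1[i]);
            if (strncmp(L[k], L1[i], p) == 0 && member(L[k] + p, L2, n2)) found = 1;
        }
        if (!found) return 0;
    }
    return 1;
}

int main(void) {
    const char *L[]  = { "aa", "ab", "ba", "bb" };  /* L = {a,b}{a,b} */
    const char *L1[] = { "a", "b" };
    const char *L2[] = { "a", "b" };
    printf("L = L1L2 ? %s\n", is_decomposition(L, 4, L1, 2, L2, 2) ? "yes" : "no");
    return 0;
}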

Let us introduce the notion of a decomposition set that is suitable for the study of decompositions of regular languages.Footnote 3 The notion is related to the left quotients of regular languages. Let L be a regular language over an alphabet Σ, and let A = (Q,Σ,δ,s,QF) be the minimum-state finite deterministic automaton accepting L. For any nonempty set \(D\subseteq Q\), we define the left and right languages of D:Footnote 4

$$ {L_{1}^{D}} = \cup_{g\in D} \overleftarrow{g}~~~\text{and}~~~ {L_{2}^{D}} = \cap_{g \in D} \overrightarrow{g}. $$
(1)

Note that the languages \({L_{1}^{D}}\) and \({L_{2}^{D}}\) are regular, being a finite union and a finite intersection, respectively, of left and right languages of states of A, which are themselves regular.

Theorem 1

Having L and A defined as above, assume that L is composite, so L = L1L2 for regular languages L1 and L2. Define a set \(D\subseteq Q\), called a decomposition set, by

$$ D=\{z\in Q~|~\delta(s,w)=z,~\textit{for~some~}w\in L_{1}\}. $$

Then \(L_{1}\subseteq {L_{1}^{D}}\), \(L_{2}\subseteq {L_{2}^{D}}\) and

$$ L = {L_{1}^{D}} {L_{2}^{D}} $$
(2)

is the decomposition of L into two regular languages.

Proof

See [1, 16]. □

The decomposition \(L={L_{1}^{D}} {L_{2}^{D}}\) is referred to as the decomposition induced by the set D. Theorem 1 implies that every decomposition of the regular language L is included in a decomposition of L induced by some decomposition set. The decomposition L = L1L2 is said to be included in the decomposition \(L=L^{\prime }_{1}L^{\prime }_{2}\) if \(L_{i}\subseteq L^{\prime }_{i}\) for i = 1,2.

Corollary 1

To solve the problem of finding all decompositions of index 2 of a nontrivial finite language L, we need to check (2) for all subsets D of Q. If none of these subsets induces a decomposition, we conclude that L is prime.

It follows from the corollary that solving this problem is equivalent to solving the primality problem.

Problem 1

(Primality) Let L be a finite language over a finite alphabet Σ given as a DFA. Answer the question whether L is prime.

It was shown that the primality problem for finite languages is NP-hard [1], and for regular languages it is PSPACE-complete [23, 24].

Example 1

As an illustration of Theorem 1, consider a finite composite language L = {λ, a, b, aa, ab, aaa, aab, aaaa} that has a single decomposition of index 2. The transition diagram for the minimum-state deterministic automaton A accepting language L is depicted in Fig. 1. Note that L = L1L2, where L1 = {λ,a} and L2 = {λ,b,aa,ab,aaa}.

Fig. 1
figure 1

Transition diagram for automaton accepting language L from Example 1

Then, according to Theorem 1 there is a decomposition set \(D\subseteq Q\) where Q = {s,q,r,p,t} such that \(L = {L_{1}^{D}} {L_{2}^{D}}\), \(L_{1}\subseteq {L_{1}^{D}}\) and \(L_{2}\subseteq {L_{2}^{D}}\). Indeed, in our case D = {s,q}, the left language \({L_{1}^{D}}\) includes words w satisfying δ(s,w) ∈ D: \({L_{1}^{D}} = \overleftarrow {s}\cup \overleftarrow {q}= \{\lambda \}\cup \{a\} =\{\lambda , a\}\), and the right language \({L_{2}^{D}}\) includes words w satisfying δ(g,w) ∈ QF where g ∈ D: \({L_{2}^{D}} = \overrightarrow {s}\cap \overrightarrow {q} = \{\lambda , a, b, \textit {aa}, \textit {ab}, \textit {aaa}, \textit {aab}, \textit {aaaa} \} \cap \{\lambda , a, b, \textit {aa}, \textit {ab}, \textit {aaa}\} = \{\lambda , a, b, \textit {aa}, \textit {ab}, \textit {aaa}\}\). Moreover, \(L_{1}={L_{1}^{D}}\) and \(L_{2}\subset {L_{2}^{D}}\). Observe that the words in \({L_{1}^{D}}\) and \({L_{2}^{D}}\) are prefixes and suffixes of L, and each word w ∈ L is divided by at least one state g ∈ D such that w = xgy where x and y are a prefix and suffix of w, respectively. For example, considering the words of L we have (s, q ∈ D): λsλ, λsa, aqλ, λsb, λsaa, aqa, λsab, aqb, λsaaa, aqaa, aqab, aqaaa.

Example 2

A finite language may have more than one decomposition of index 2. The following language L = {ab, aba, abb, bb, bba, bbb} (Fig. 2) has two decompositions, the first defined by D = {p}, \(L_{1} = {L_{1}^{D}} = \overleftarrow {p} = \{a, b\}\), \(L_{2} = {L_{2}^{D}} = \overrightarrow {p} = \{b, ba, bb\}\), and the second by \(D^{\prime }=\{q\}\), \({L_{1}}^{\prime } = {L_{1}}^{D^{\prime }} = \overleftarrow {q} = \{ab, bb\}\), \({L_{2}}^{\prime } = {L_{2}}^{D^{\prime }} = \overrightarrow {q} = \{\lambda , a, b\}\).

Fig. 2
figure 2

Transition diagram of automaton A accepting language L from Example 2

Below we give a few examples of application of finite language decomposition.

Example 3

As already mentioned in Section 2, K. V. Hung considered the factorization of supercodes [7]. He designed an algorithm to decompose a supercode L into prime components. The algorithm is based on so-called bridge states, which are found in a non-returning and non-exiting acyclic deterministic finite automaton (N-ADFA) \({\mathscr{A}}\) accepting the code-words w ∈ L. An automaton \({\mathscr{A}}\) is non-returning if its start state has no in-transitions, and it is non-exiting if its final states have no out-transitions. A state b in automaton \({\mathscr{A}}\) is called a bridge state if b is neither the start state nor a final state, and each w path in \({\mathscr{A}}\) passes through b. Assume that \({\mathscr{A}}\) is the minimal N-ADFA accepting a supercode L that has k, k ≥ 1, bridge states. Then L can be decomposed into k + 1 prime supercodes L1, L2, …, Lk+1 such that L = L1L2⋯Lk+1. The bridge states of a given automaton \({\mathscr{A}}\) can be identified in O(|Q| + |δ|) time where |Q| and |δ| are the numbers of states and transitions of \({\mathscr{A}}\), respectively. As an example, the supercode L = {ab2ac,ab5c,ac3bac,ac3b4c} is considered.Footnote 5 The minimal N-ADFA \({\mathscr{A}}\) accepting L has four bridge states, so L may be decomposed uniquely into five prime supercodes: L = {a}{b,c3}{b}{a,b3}{c} [7].

Instead of employing the notion of bridge states, one can readily find this factorization by calling recursively our parallel decomposition algorithm (described in Section 5). The sequence of calls gives the following: L = L1L2 where L1 = {ab,ac3}, L2 = {bac,b4c}; then \(L_{1}={L_{1}^{1}} {L_{2}^{1}}\) where \({L_{1}^{1}}=\{a\}\), \({L_{2}^{1}}=\{b,c^{3}\}\); \(L_{2}={L_{1}^{2}} {L_{2}^{2}}\) where \({L_{1}^{2}}=\{b\}\), \({L_{2}^{2}}=\{ac,b^{3}c\}\); and finally \({L_{2}^{2}}={L_{1}^{3}} {L_{2}^{3}}\) where \({L_{1}^{3}}=\{a,b^{3}\}\), \({L_{2}^{3}}=\{c\}\). In general, to find the prime decomposition of a supercode consisting of n prime components, at most n calls of the decomposition algorithm are needed.

Example 4

Deterministic finite automata (DFAs) are of importance in many fields of science. They also have many practical applications—in the design of compilers and text processors, in natural language processing, and in speech recognition, among others. One of the concerns regarding DFAs is the memory-efficient storage of their representation. A DFA \({\mathscr{A}}\) can be represented by the list of words of the language L (assuming it is finite) accepted by \({\mathscr{A}}\). Such a representation can be memory intensive if L is large, therefore a better option is to decompose L and store its factors L1 and L2. Table 1 shows the reduction in size of the languages (expressed in word count) from Section 6, obtained by using the decomposition-based option. As can be seen, the reduction rate exceeds 99% for all languages in the table.

Table 1 Storage reduction R obtained by replacing language L by its factors, L = L1L2

Example 5

A company signed a contract to supply pipes of lengths 30, 31, 32, 36, 37, and 38 m. For technical reasons, the company can manufacture pipes of any length, but no greater than 30 m. A longer pipe can be produced by welding shorter pipes. The main part of the pipe production cost is the preparation of a mold of a given length, so the number of molds needed to complete the contract should be minimized. The question is what lengths the molds prepared to fulfill the supply contract mentioned above should have. To answer this question, let us define the set of words L = {a30, a31, a32, a36, a37, a38}.

Then, among all decompositions L = L1L2, we look for the one for which the size of the set L1 ∪ L2 is minimal, and each word in L1 and L2 is no longer than 30. The solution is the decomposition L = {a15, a16, a17}{a15, a21}, so we need only four molds of lengths 15, 16, 17, and 21 m.
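Because all words here are unary, verifying this decomposition reduces to integer arithmetic on word lengths: every sum of a length from L1 and a length from L2 must be an ordered pipe length, and every ordered length must be obtainable as such a sum. A small illustrative C check (identifiers are ours) follows.

#include <stdio.h>

int main(void) {
    int order[] = { 30, 31, 32, 36, 37, 38 };   /* word lengths a^k in L      */
    int L1[]    = { 15, 16, 17 };               /* mold lengths of factor L1  */
    int L2[]    = { 15, 21 };                   /* mold lengths of factor L2  */
    int n = 6, n1 = 3, n2 = 2, ok = 1;

    /* every concatenation a^i a^j must be an ordered length ...              */
    for (int i = 0; i < n1; i++)
        for (int j = 0; j < n2; j++) {
            int s = L1[i] + L2[j], found = 0;
            for (int k = 0; k < n; k++) if (order[k] == s) found = 1;
            if (!found) ok = 0;
        }
    /* ... and every ordered length must be a sum of two mold lengths         */
    for (int k = 0; k < n; k++) {
        int found = 0;
        for (int i = 0; i < n1; i++)
            for (int j = 0; j < n2; j++)
                if (L1[i] + L2[j] == order[k]) found = 1;
        if (!found) ok = 0;
    }
    printf("decomposition valid: %s\n", ok ? "yes" : "no");
    return 0;
}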

Example 6

Given the sets of words S+ and S−, called examples and counterexamples, find a context-free grammar G = (V,Σ,P,S) such that \(S_{+}\subseteq {\mathscr{L}}(G)\), and \(S_{-}\cap {\mathscr{L}}(G)=\emptyset \) where \({\mathscr{L}}(G)\) denotes the language generated by G. The basic decomposition algorithm (described in Section 4) can be readily modified to find an incomplete decomposition that allows a finite language L to be written as L = L1L2 ∪ R, where the concatenation L1L2 is the greatest possible and R is the set of the remaining words belonging to L.

Let S+ = {ab,ba,aabb,abab,baba,bbaa,baab,abba} be the set of examples, and let S− = {a,b}d − S+ with d ≤ 4 be the set of counterexamples. Based on these sets the context-free grammar can be found in two stages. In the first stage, using the modified basic decomposition algorithm, we create a series of incomplete decompositions. In particular, we get S+ = LL ∪ R where L = {ab,ba} and R = {ab,ba,aabb,bbaa}; then R = AB ∪ F where A = {a}, B = {b,abb}, and F = {ba,bbaa}; and further F = HI where H = {b} and I = {a,baa}, and so on. This procedure is iterated for the subsequent sets of words until a grammar in the Chomsky normal form is obtained (a context-free grammar G not generating the empty word λ is said to be in Chomsky normal form if all of its production rules are of the form \(A\rightarrow BC\) or \(A \rightarrow a\) where A, B, and C are nonterminal symbols and a is a terminal symbol). The sets S+, L, R, A, B, F, H, I, etc., are represented by grammar variables, and each set concatenation (decomposition) corresponds to a grammar production rule. The second stage simplifies the grammar by examining whether each pair of grammar variables can be merged. A merger is allowed if after it the grammar does not accept any word from the set S−. Furthermore, the unit production rules are eliminated. For the sets S+ and S− specified above, we get the following context-free grammar:

$$ \begin{array}{rcl@{\qquad}rcl} S & \!\rightarrow\! & S S | A B | B A & A & \!\rightarrow\! & a \\ B & \!\rightarrow\! & b | A C | B S & C & \!\rightarrow\! & B B \end{array} $$

which defines the infinite language of words having the same number of symbols a and b. For more details, see [25, p. 71], [26, 27].

4 Basic parallel decomposition algorithm

In this section we outline the basic parallel algorithm (or basic algorithm for short) for finite language decomposition [19]. It has been a starting point for devising the adaptive parallel decomposition algorithm (or adaptive algorithm). The pseudocode of the basic algorithm, expressed as a recursive procedure DECOMPOSE(W,D), is shown in Fig. 3.

Let A = (Q,Σ,δ,s,QF) be the minimum-state acyclic DFAFootnote 6 accepting an input language L = {wi}, i ∈ [1,n]. The basic algorithm explores the set of states Q to find the decomposition sets D where \(D\subseteq Q\). According to Theorem 1 each decomposition set D induces a decomposition \(L=L_1^D L_2^D\). The algorithm finds all decompositions of the input language L by employing an exhaustive search of Q with pruning.

Let W be a set of pairs \((w_i,Q_{w_i})\) where the word wi = a1a2⋯am, wi ∈ L, aj ∈ Σ, j ∈ [1,m], and \(Q_{w_i}=\{q_1, q_2, \ldots , q_{m+1}\}\) is a set of states qk ∈ Q, k ∈ [1,m + 1], lying on the wi path, where q1 = s and qm+1 ∈ QF. Let STATES(W) be the function that returns the set of states appearing in all sets \(Q_{w_i}\) for wi ∈ L, and let MINSTATES(W) be the function that returns a pair \((w_i,Q_{w_i})\in W\) in which the set \(Q_{w_i}\) of states dividing wi is minimal.

Assume that D is a decomposition set to be found (see Phase1 in Fig. 3). Initially this set is empty, and it is gradually built up as the basic algorithm processes the words wi ∈ L. The words to be processed are selected in order of increasing sizes of their sets \(Q_{w_i}\) by using the function MINSTATES(W). Each state q ∈ Q that divides a word wi ∈ L into two parts is inserted in set D, which is then checked to see if it is a decomposition set. The basic algorithm is recursive, so each subset of states in Q that can be a candidate for a decomposition set is examined.

Definition 1

Suppose a word w ∈ L is divided by a state q ∈ Q into two parts, so w = xy where \(x\in \overleftarrow {q}\) and \(y\in \overrightarrow {q}\), and suppose the state q is inserted in set D. Then each state r ∈ STATES(W) for which \(y\notin \overrightarrow {r}\) is considered redundant (or significant otherwise).

In words, suppose w = xy. If state q ∈ Q is inserted in D, then each state r ∈ Q that does not have y in its right language becomes redundant. Note that a given state r may be either redundant or significant depending on the states that are in set D in the current recursive execution of the algorithm.

Lemma 1

Let D, where \(D\subseteq Q\), be a decomposition set for a finite language L. Then each state q ∈ D is significant.

Fig. 3
figure 3

Basic algorithm

Proof

If D is a decomposition set for L, then each w ∈ L is divided into two parts by at least one state q ∈ D, that is w = xy, \(x\in \overleftarrow {q}\) and \(y\in \overrightarrow {q}\). Once state q is inserted in D, suffix y appears in the right language \(L_2^D\). The definition \(L_2^D = \cap _{r \in D} \overrightarrow {r}\) involves the intersection of sets \(\overrightarrow {r}\), thus suffix y must be in the set \(\overrightarrow {r}\) for each r ∈ D. If a state r does not satisfy this condition, it is redundant and can be omitted during further recursive search. To conclude, only significant states can occur in a decomposition set [17]. □

Let SUFFIX(w,q) be the function that returns the suffix y of a word w = xy that is divided by a state q. Suppose the set D has already been built based on words w1, w2, …, wi−1, and we want to extend D for a word wi with \(Q_{w_i}=\{q_1, q_2, q_3, \ldots , q_{m+1}\}\) for some m. With this aim, we first select state q1 that divides wi (\(w_i=x_i^1 y_i^1\)), add q1 to D, and remove redundant states (based on Definition 1) from all sets \(Q_{w_k}\in W\) for k = i + 1, i + 2, …, n, using the procedure REMOVERED(W,y) with \(y=y_i^1\) (Fig. 4). Once the recursive call DECOMPOSE(W, D ∪{q1}) completes, we carry out the above operations for states q2, q3, …, qm+1, and the different divisions \(w_i=x_i^2 y_i^2\), \(w_i=x_i^3 y_i^3\), etc. Due to the removal of redundant states, the number of states in structure W decreases in subsequent recursive executions of procedure DECOMPOSE(W,D). To sum up, Phase1 of the basic algorithm builds up the decomposition set D by processing words w1, w2, …, and so on, while downscaling the set \({\mathscr{Q}}\) = STATES(W) − D of candidates to extend the set D.

Fig. 4
figure 4

Procedure REMOVERED

The basic algorithm shown in Fig. 3 is run by a set of sequential processes \({\mathscr{P}}_r\), r ∈ [0,π − 1], where r is the rank (index) of a process, and π is the number of processes available. Each process \({\mathscr{P}}_r\) executes the code of procedure DECOMPOSE(W,D), and all processes in the set are running in parallel.

A process is executed by a conventional processor, or core of a multi-core processor (from now on, we will use term processor for both of these computing devices).

The execution of the basic algorithm in each process consists of two phases. In Phase1 subsequent sets D are established, which are then processed in Phase2. The code executed in Phase1 is the same in all processes, while in Phase2 the computations performed by processes differ from each other.

As mentioned above, Phase1 gradually builds decomposition sets D while reducing the sets \({\mathscr{Q}}\) containing states that are candidates to extend sets D. Notice that Phase1 does not determine the complete set D but only a partial set \({\mathscr{D}}\) where \({\mathscr{D}}\subset D\).Footnote 7 When \({\mathscr{Q}}\) becomes small enough due to the removal of redundant states, that is \(|{\mathscr{Q}}| \leq T\) for some threshold T, the algorithm moves to Phase2, in which each process takes a collection of prospective sets \({\mathscr{D}}\cup C_j\) to verify whether they constitute decomposition sets for L, where each Cj is a subset of \({\mathscr{Q}}\) for \(j\in [0,2^{|{\mathscr{Q}}|}-1]\).

More specifically, process \({\mathscr{P}}_0\) takes sets \({\mathscr{D}}\cup C_j\) with Cj = C0, Cπ, C2π, …, process \({\mathscr{P}}_1\) takes sets \({\mathscr{D}}\cup C_j\) with Cj = C1, Cπ+1, C2π+1, …, etc. The advantage of such an arrangement is that each collection of sets \({\mathscr{D}}\cup C_j\) can be verified separately from the other collections. As a result, it allows the work related to verifying sets \({\mathscr{D}}\cup C_j\) to be readily spread across the processes \({\mathscr{P}}_r\).
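One natural encoding of this scheme identifies each subset Cj with the bitmask of its index j, so that process P_r simply iterates over j = r, r + π, r + 2π, …. The C fragment below sketches this enumeration for a single process; the verification step is left as a stub, and the encoding is our own illustration rather than the exact implementation.

#include <stdio.h>

/* placeholder for the real check of whether D u C_j induces a decomposition */
static int verify(unsigned mask) { (void)mask; return 0; }

int main(void) {
    int rank = 3, pi = 16;          /* rank r of this process, number of processes */
    int qsize = 8;                  /* |Q| = number of candidate states (<= T)     */
    unsigned total = 1u << qsize;   /* 2^|Q| subsets C_j                           */

    /* process r verifies subsets C_r, C_{r+pi}, C_{r+2*pi}, ...                   */
    for (unsigned j = (unsigned)rank; j < total; j += (unsigned)pi) {
        /* bit k of j set  <=>  the k-th candidate state belongs to C_j            */
        if (verify(j))
            printf("process %d: subset %u induces a decomposition\n", rank, j);
    }
    return 0;
}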

In order to boost performance by making the most of parallel computing capabilities, the basic algorithm controls the depth of recursion in Phase1. For this purpose, the threshold T, already mentioned, is imposed on the size of the set \({\mathscr{Q}}\) containing candidate states to extend the set \({\mathscr{D}}\) built to date. The threshold T is a parameter of the algorithm (the parameter is adjusted at runtime in the adaptive algorithm, see Section 5). When the size of \({\mathscr{Q}}\) becomes reasonably small as a result of the downscaling process, then instead of looking for the complete decomposition set at a deep level of recursion in Phase1, the processes move to Phase2 to verify sets \({\mathscr{D}}\cup C_j\). It is worth noting that while running the basic algorithm the processes do not communicate with one another. They only synchronize their actions at the beginning of the computation to read in the input language, and at the end of the computation when the results obtained by the processes are collected.

Let us consider the average time complexity of the basic algorithm, \({\mathscr{T}}_b(\pi ,n)\). Let d(n,T) be the average number of sets \({\mathscr{D}}\) found in Phase1. Let t1(n,T) be the average time to find a single set \({\mathscr{D}}\) in Phase1, and let t2(n,ψT) be the average time to verify whether a set \({\mathscr{D}}\cup C_j\), j = 0, 1, …, 2ψT − 1, is a final decomposition set for L in Phase2. The value of ψT determines the average size of the sets \({\mathscr{Q}}\) processed in Phase2, where ψ ∈ (0.0,1.0]. Considering the above, the average time complexity of the basic algorithm is as follows:Footnote 8

$$ \mathscr{T}_{b}(\pi, n) = d(n,T) \cdot (t_{1}(n,T) + (t_{2}(n,\psi T) \cdot 2^{\psi T})/\pi). $$
(3)

There are two components in this equation: the first, d(n,T) ⋅ t1(n,T), determines the total average run time of Phase1, and the second, d(n,T) ⋅ (t2(n,ψT) ⋅ 2ψT)/π, the total average run time of Phase2. For a fixed value of n, the run time of Phase2 grows exponentially as a function of ψT. This growth is due to the exponential number of prospective decomposition sets \({\mathscr{D}}\cup C_j\) to be verified in Phase2.

5 Adaptive parallel decomposition algorithm

With the aim of improving the basic algorithm, we propose several refinements. The first three refinements introduce into the adaptive algorithm effective methods for pruning the search space (Fig. 5). The fourth refinement adjusts the threshold T while the adaptive algorithm is executed, based on data acquired at runtime about the performance of Phase1.

Fig. 5
figure 5

Adaptive algorithm

Let us discuss the refinements in more detail. The first refinement concentrates on removing the redundant states q ∈ Q of the automaton accepting the input language L. Redundant states have already been eliminated in the basic algorithm (see procedure REMOVERED(W,y)). We extend the scope of this elimination in the adaptive algorithm.

Definition 2

A state q ∈ Qw, \(Q_w\subseteq Q\), is considered redundant (or significant otherwise) if for a word w = xy, w ∈ L, \(x\in \overleftarrow {q}\), and \(y\in \overrightarrow {q}\), the following holds:

$$ U(y)\cdot |\overrightarrow{q}| < |L| $$
(4)

where U(y) denotes the number of occurrences of the suffix y in all words w ∈ L.

To justify (4), suppose q is the only state in the decomposition set D defined in Theorem 1. Then, given a word w = xy that is divided by state q, the values of U(y) and \(|\overrightarrow {q}|\) determine the sizes of the sets \(L_1^D = \cup _{q\in D} \overleftarrow {q}\) and \(L_2^D = \cap _{q \in D} \overrightarrow {q}\), respectively. In fact, the value of U(y) is the number of prefixes x belonging to the set \(L_1^D\), since U(y) is counted over all words w = xy, w ∈ L, with prefix x followed by suffix y. In view of the above, the product \(U(y)\cdot |\overrightarrow {q}|\) on the left side of (4) determines the upper limit of the number of words that could be created by concatenating the sets \(L_1^D\) and \(L_2^D\). Now, if this product is less than |L|, then \(L=L_1^D L_2^D\) is not satisfied, which means that D cannot be the decomposition set for L. So the state q ∈ D is redundant.

Elimination of redundant states q ∈ Qw satisfying (4) is implemented in the procedure REMOVERED2(W,y) (Fig. 6). When one or more states are found redundant and removed from the sets Qw ∈ W, which is indicated by the variable f, this can cause a decrease in the number of occurrences of the suffix y, given by U(y), and also in the size of the right language \(\overrightarrow {q}\). As a result, more redundant states can be removed from the sets Qw.
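The test (4) itself is inexpensive: given a suffix y, it suffices to count the words of L ending with y and to multiply the count by the size of the right language of q. A minimal illustrative sketch in C (with the language kept as a plain array of strings, a simplification of the data structures used by REMOVERED2) is given below; the numbers correspond to the language analyzed in Example 7 further on.

#include <stdio.h>
#include <string.h>

/* number of words of L that have y as a suffix, i.e. U(y) */
static int count_suffix(const char *y, const char **L, int n) {
    int c = 0;
    size_t ly = strlen(y);
    for (int i = 0; i < n; i++) {
        size_t lw = strlen(L[i]);
        if (lw >= ly && strcmp(L[i] + lw - ly, y) == 0) c++;
    }
    return c;
}

/* state q dividing w = xy is redundant iff U(y) * |right(q)| < |L|, condition (4) */
static int is_redundant(const char *y, int right_size, const char **L, int n) {
    return count_suffix(y, L, n) * right_size < n;
}

int main(void) {
    const char *L[] = { "a", "aaa", "aab", "b" };   /* L = {a, aaa, aab, b}   */
    /* the word aab divided by state q1 gives the suffix "ab"; |right(q1)| = 3 */
    printf("q1 redundant for suffix ab: %s\n",
           is_redundant("ab", 3, L, 4) ? "yes" : "no");   /* 1*3 < 4 -> yes    */
    return 0;
}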

Fig. 6
figure 6

Procedure REMOVERED2 – improved version of REMOVERED

Using (4), we can also remove redundant states before the adaptive algorithm begins. Once automaton A has been constructed, the procedure BUILDW(A,W) (Fig. 7) builds the structure W, which is the input parameter of procedure DECOMPOSE2(W,D) (Fig. 5). While creating the sets Qw, only significant states of Q are considered. Hence, the number of states in W that are then processed by the adaptive algorithm is smaller than the number of states in A.

Fig. 7
figure 7

Procedure BUILDW

Example 7

To clarify how structure \(W=\{(w_i,Q_{w_i})\}\) is built, consider the language L = {a, aaa, aab, b} (Fig. 8).

Fig. 8
figure 8

Transition diagram of automaton accepting language L = {a, aaa, aab, b}

The suffixes of words wi ∈ L along with their frequency counts in L are (λ,4), (a,2), (aaa,1), (aa,1), (aab,1), (ab,1), (b,2), and the sizes of the right languages are \(|\overrightarrow {q0}| = 4\), \(|\overrightarrow {q1}| = 3\), \(|\overrightarrow {q2}| = 2\) (we omit state q3, because it leads to the trivial decomposition L = L{λ}). To establish a pair \((w_i,Q_{w_i})\), we need to check (4) for the word wi. Let us check this condition for wi = aab, which is divided by states q0, q1, and q2. The corresponding suffixes are aab, ab, b, and the checks give \(U(aab) \cdot |\overrightarrow {q0}| = 4 \ge |L|\), \(U(ab) \cdot |\overrightarrow {q1}| = 3 < |L|\), and \(U(b) \cdot |\overrightarrow {q2}| = 4 \ge |L|\). Thus, states q0 and q2 are significant, while state q1 is redundant. Continuing the similar analysis for the remaining words wi ∈ L, we end up with W = {(a,{q0,q1}), (aaa,{q0,q2}), (aab,{q0,q2}), (b,{q0})}. Note that the decomposition set for L is D = {q0,q2} with \(L_1^D=\{\lambda , aa\}\) and \(L_2^D=\{a, b\}\).

The second refinement implements a method for reducing the search space by skipping, where possible, the verification of prospective decomposition sets \(D={\mathscr{D}}\cup C_j\). The verification is carried out in the basic algorithm by checking the condition \(L=L_1^D L_2^D\) (Fig. 3). To make this verification more efficient, we determine an upper bound on the size of the language generated by the set \({\mathscr{D}}\cup C_j\) (7). If the size of the input language L exceeds this bound, then we can omit the verification of \({\mathscr{D}}\cup C_j\).

Lemma 2

Let A = (Q,Σ,δ,s,QF) be the automaton accepting a finite language L, let \(D\subseteq Q\) be the final decomposition set for L, and let the sets \(L^D_1\) and \(L^D_2\) be defined as in (1). Then the size of the set \(L^D_1L^D_2\) is bounded from above as follows:

$$ |{L^{D}_{1}}{L^{D}_{2}}| \leqslant |{L^{D}_{1}}|\cdot \min_{q \in D}|\overrightarrow{q}| $$
(5)

where \(|L^D_1|\) is the size of \(L^D_1\), and \(\min \limits _{q \in D}|\overrightarrow {q}|\) is the minimum size of the right language over the states q ∈ D.

Proof

The lemma follows directly from the definition of set \(L^D_2 = \bigcap _{q \in D} \overrightarrow {q}\). The size of \(L^D_2\) defined as the intersection of right languages \(\overrightarrow {q}\) cannot be greater than \(\min \limits _{q \in D}|\overrightarrow {q}|\). □

Lemma 3

The necessary condition for a finite language L to be decomposed by set \(D\subseteq Q\) is:

$$ |L| \leqslant |{L^{D}_{1}}{L^{D}_{2}}|. $$
(6)

Proof

Suppose we have L = L1L2. By Theorem 1, \(L_i\subseteq L_i^D\) for i = 1,2. Hence, \(L\subseteq L_1^DL_2^D\), and then \(|L|\leqslant |L_1^DL_2^D| \leqslant |L_1^D|\cdot |L_2^D|\). □

Combining (5) and (6) we get the upper bound for |L|:

$$ |L| \leqslant |{L^{D}_{1}}|\cdot \min_{q \in D}|\overrightarrow{q}| $$
(7)

which makes it possible to verify whether D can be the final decomposition set. The full verification first requires computing the sets \({L}_{1}^{D}\) and \({L}_{2}^{D}\), then concatenating them, and finally checking whether \(L={L}_{1}^{D} {L}_{2}^{D}\). However, if (7) does not hold, then these operations can be avoided. The procedure VERIFY(D) (Fig. 9) performs a double check of the constraints bounding the size of the language induced by D from above. Either check may result in the rejection of the set D. We carry them out consecutively because the cost of the first check is lower than the cost of the second one.
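Logically, the two checks in VERIFY are instances of the same necessary condition and differ only in cost. The following compact C sketch of the double check (an illustration; computing the actual sets \(L_1^D\) and \(L_2^D\) is abstracted into the arguments) reproduces the numbers used in Example 8 below.

#include <stdio.h>

/* returns 1 if D survives both size checks and must still undergo the full
   concatenation test, 0 if D can be rejected immediately                    */
static int verify_bounds(int size_L,        /* |L|                            */
                         int size_L1D,      /* |L_1^D|                        */
                         int min_right,     /* min over q in D of |right(q)|  */
                         int size_L2D)      /* |L_2^D|, computed only if needed */
{
    /* cheap first check (7): |L| <= |L_1^D| * min_{q in D} |right(q)|        */
    if (size_L > size_L1D * min_right) return 0;
    /* second check: |L| <= |L_1^D| * |L_2^D| (cf. Lemma 3)                   */
    if (size_L > size_L1D * size_L2D) return 0;
    return 1;
}

int main(void) {
    /* Example 8 data: |L| = 14, |L_1^D| = 3, min |right(q)| = 5, |L_2^D| = 4 */
    printf("D = {q0,q2} still a candidate: %s\n",
           verify_bounds(14, 3, 5, 4) ? "yes" : "no");  /* rejected: prints no */
    return 0;
}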

Fig. 9
figure 9

Procedure VERIFY

Example 8

To illustrate the procedure VERIFY let us examine the language L = {a, aa, aaa, aaab, aaaab, aab, ab, abab, abb, b, ba, baab, bab, bb} (Fig. 10). Assume that the adaptive algorithm moves from Phase1 to Phase2 with \({\mathscr{D}} = \{q0\}\) and \({\mathscr{Q}} = \{q1, q2\}\) (such assignments are made when the considered word a is divided by state q0). Suppose the procedure VERIFY is called with D = {q0,q2}. Then \(|L_1^D| = 3\) (\(L_1^D=\{\lambda , \textit {aa}, b\}\)), and \(\min \limits _{q \in D} |\overrightarrow {q}| = 5\) (\(\overrightarrow {q0} = L\) and \(\overrightarrow {q2} = \{\lambda , a, aab, ab, b\}\)). So the result of the first check, \(|L|\leqslant |L_1^D|\cdot \min \limits _{q \in D} |\overrightarrow {q}|\), is positive as 14 ≤ 3 ⋅ 5. The size \(|L_2^D| = 4\) as \(\overrightarrow {q0} \cap \overrightarrow {q2} = \{a, aab, ab, b\}\). Thus the condition \(|L|\leqslant |L_1^D|\cdot |L^D_2|\) in the second check is not met, because 14 > 3 ⋅ 4. Therefore D = {q0,q2} is not a decomposition set for L. Thanks to the double check of conditions in the procedure under consideration, we do not need to concatenate the sets \(L_1^D\) and \(L_2^D\), nor compare the concatenation result with L. The decomposition set for the language in question is D = {q1,q2,q3}, with L1 = {a, b, aa, ab, ba, aaa} and L2 = {λ, b, ab}.

Fig. 10
figure 10

Transition diagram of DFA accepting language L from Example 8

The approach taken in the third refinement is similar to that of the second refinement. We establish a lower bound (Lemma 4) related to the subsets Cj that complete the partial decomposition set \({\mathscr{D}}\). A subset Cj—and consequently the set \({\mathscr{D}}\cup C_j\) as well—can be disregarded when the size of Cj is below this lower bound.

Lemma 4

Let \({\mathscr{D}}\) be a partial decomposition set for L. Let \(C_j\subseteq {\mathscr{Q}}\) be an arbitrary subset of candidate states to extend \({\mathscr{D}}\). Let \({\mathscr{D}}\cup C_j\) be a prospective decomposition set for L. Then

$$ |L| \leqslant |L_{1}^{\mathscr{D}} \cup L_{1}^{C_{j}}| \cdot \min_{q \in \mathscr{D}}|\overrightarrow{q}|. $$
(8)

Proof

By Lemma 3, \(|L| \le |L^{{\mathscr{D}} \cup C_j}_1 L^{{\mathscr{D}}\cup C_j}_2|\), and by Lemma 2, \(|L^{{\mathscr{D}}\cup C_j}_1 L^{{\mathscr{D}}\cup C_j}_2| \leq |L_1^{{\mathscr{D}}} \cup L^{C_j}_1| \cdot \min \limits _{q \in {\mathscr{D}}} |\overrightarrow {q}|\), as \({\mathscr{D}}\cup C_j\) is a prospective decomposition set for L. □

From (8) we can derive a lower bound on the size of \(L_1^{C_j}\) for an arbitrary subset Cj:

$$ |L_{1}^{C_{j}}| \geqslant |L|/ \min_{q \in \mathscr{D}}|\overrightarrow{q}| - |L_{1}^{\mathscr{D}}|. $$
(9)

Based on (9), the procedure COMPLOWBOUND (Fig. 11) helps to reduce the number of subsets \(C_j\subseteq {\mathscr{Q}}\), \(j\in [0, 2^{|{\mathscr{Q}}|}-1]\), which are verified in Phase2 of the adaptive algorithm. Recall that the set \({\mathscr{Q}}\) = STATES\((W)-{\mathscr{D}}\) includes the candidate states to extend the partial decomposition set \({\mathscr{D}}\) obtained in Phase1. The set \(L_1^{C_j}\) occurring on the left side of (9) is the union of left languages: \(L_1^{C_j}=\bigcup _{q\in C_j}\overleftarrow {q}\) where \(C_j\subseteq {\mathscr{Q}}\) (see (1)). The procedure COMPLOWBOUND computes the minimum cardinality of Cj such that the sum of the sizes of the left languages generated by states q ∈ Cj, determined by the function PREFIXSUM, is greater than or equal to the value appearing on the right-hand side of (9). Based on this required minimum cardinality, a subset Cj is either processed or discarded from further analysis in Phase2.
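Since \(|L_1^{C_j}|\) cannot exceed the sum of the sizes of the left languages of the states in Cj, a valid lower bound on |Cj| is the smallest number of candidate states whose left-language sizes add up to the right-hand side of (9). One plausible prefix-sum sketch of this computation in C is shown below (the PREFIXSUM routine of the actual implementation is not reproduced; taking the largest sizes first is our own choice, made so that the bound stays valid for every subset Cj).

#include <stdio.h>
#include <stdlib.h>

static int cmp_desc(const void *a, const void *b) {
    return *(const int *)b - *(const int *)a;
}

/* minimum cardinality c such that the c largest left-language sizes sum to
   at least  |L| / min_{q in D} |right(q)| - |L_1^D|                          */
static int comp_low_bound(int *left_sizes, int m,
                          int size_L, int min_right, int size_L1D) {
    double bound = (double)size_L / min_right - size_L1D;
    if (bound <= 0) return 0;                  /* no restriction on C_j       */
    qsort(left_sizes, (size_t)m, sizeof(int), cmp_desc);
    long sum = 0;
    for (int c = 1; c <= m; c++) {
        sum += left_sizes[c - 1];              /* prefix sum of largest sizes */
        if ((double)sum >= bound) return c;
    }
    return m + 1;                              /* bound unreachable: reject all C_j */
}

int main(void) {
    /* Example 9 data: |L| = 16, min |right(q)| = 3, |L_1^D| = 4,
       left-language sizes of candidates q1, q2, q6 are all 1                 */
    int sizes[] = { 1, 1, 1 };
    printf("minimum |C_j| = %d\n", comp_low_bound(sizes, 3, 16, 3, 4));  /* 2 */
    return 0;
}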

Fig. 11
figure 11

Procedure COMPLOWBOUND

Example 9

To illustrate the procedure COMPLOWBOUND consider the language L = {a, aa, aaa, aaab, aaaab, aab, ab, abab, abb, b, ba, baab, bab, bb, bbab, bbb} (Fig. 12). Suppose we enter Phase2 with \({\mathscr{D}}=\{q3\}\) and \({\mathscr{Q}}=\{q1, q2, q6\} \) (such assignments result from dividing the word aaa by state q3). Then we have \(\min \limits _{q \in {\mathscr{D}}} |\overrightarrow {q}| = |\overrightarrow {q3}| = 3\) (based on the words λ, ab, b) and \(|L_1^{{\mathscr{D}}}| =|\overleftarrow {q3}| = 4\) (based on the words aaa, ab, ba, bb), and thus \(|L_1^{C_j}| \geq |L|/\min \limits _{q\in {\mathscr{D}}}|\overrightarrow {q}| - |L_1^{{\mathscr{D}}}|= 16/3 - 4 = 1.33\).Footnote 9 The sizes of the left languages for states q1, q2, and q6 are all 1, so the prefix sums are S = {1,2,3}, which means that the value of c returned from COMPLOWBOUND is 2. Consequently, we do not need to analyze single-state subsets of \(\mathcal {Q}\). The final decomposition set for L = L1L2 is {q1,q2,q3,q6}, where L1 = {a, aa, aaa, ab, b, ba, bb} and L2 = {λ, ab, b}.

Fig. 12
figure 12

Transition diagram of DFA accepting language L from Example 9

The fourth refinement makes the algorithm adaptive. We have found that, depending on the input language, the time to execute Phase1 of the basic algorithm can be much longer than the time to execute Phase2. This is a disadvantage, as in Phase1 the processes run the same code while in Phase2 they work in parallel verifying sets \({\mathscr{D}}\cup C_j\). Therefore, when the run time of Phase2 is short compared to Phase1, the capacity to take advantage of parallel computation is not fully utilized.

The purpose of Phase1 is to reduce the set \({\mathscr{Q}}\) = STATES\((W)-{\mathscr{D}}\) so that its size becomes smaller than the threshold T. As it turns out, the cause of a long run time of Phase1 is that the size of \({\mathscr{Q}}\) remains constant through a series of recursive runs of Phase1. So instead of repeating Phase1, it is better to start Phase2 earlier by increasing the value of the threshold T. Setting the new value of T (procedure ADJUSTT, Fig. 13) is triggered when a specified number of recursive runs of Phase1 is completed with no change of \({\mathscr{Q}}\). More precisely, when the number of runs wherein old_s = s reaches the fixed value of e, the value of T is increased. However, the rate of growth of T should be controlled so that it does not become too large. Once the value of T is doubled (or tripled) in relation to T0, the number of recursive runs e to be performed before T is increased again is also doubled (or quadrupled).
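The adjustment rule can be read as follows: count the consecutive recursive runs in which |𝒬| does not shrink, and whenever the count reaches e, increase T; as T grows past multiples of its initial value T0, the patience e grows as well. The C sketch below is one possible reading of ADJUSTT; the exact increment of T and the constants are not taken from the paper and should be treated as placeholders.

#include <stdio.h>

struct adapt {
    int T, T0;        /* current and initial threshold                          */
    int e, e0;        /* current and initial number of runs without progress    */
    int stalled;      /* consecutive Phase1 runs in which |Q| did not change     */
    int old_s;        /* |Q| observed in the previous run                        */
};

/* called after each recursive run of Phase1 with the current size s = |Q|      */
static void adjust_T(struct adapt *a, int s) {
    if (s == a->old_s) a->stalled++;
    else               a->stalled = 0;
    a->old_s = s;

    if (a->stalled >= a->e) {            /* no progress for e runs: raise T      */
        a->T++;                          /* increment size is a placeholder here */
        a->stalled = 0;
        /* slow down further growth of T once it gets large relative to T0       */
        if      (a->T >= 3 * a->T0) a->e = 4 * a->e0;
        else if (a->T >= 2 * a->T0) a->e = 2 * a->e0;
    }
}

int main(void) {
    struct adapt a = { 10, 10, 5, 5, 0, -1 };
    for (int run = 0; run < 12; run++) {   /* simulate runs in which |Q| stays 40 */
        adjust_T(&a, 40);
        printf("run %2d: T = %d, e = %d\n", run, a.T, a.e);
    }
    return 0;
}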

Fig. 13
figure 13

Threshold adjustment procedure

Note that a greater value of the threshold T causes the size of the set \({\mathscr{Q}}\) to grow. Consequently, the number of subsets Cj where \(C_j\subseteq {\mathscr{Q}}\), and thus the number of sets \({\mathscr{D}}\cup C_j\) to verify, increases (the number of subsets Cj is exponential and equal to \(2^{|{\mathscr{Q}}|}\), as the subsets are members of the power set of \({\mathscr{Q}}\)). This means that the degree of parallelism grows, which is desirable since we may use more processes to conduct the search.

The average time complexity of the adaptive algorithm is given by

$$ \mathscr{T}_{a}(\pi, n) = d(n,\widehat{T}) \cdot (t_{1}(n,\widehat{T}) + (t_{2}(n,\varepsilon,\psi\widehat{T}) \cdot \varepsilon \cdot 2^{\psi\widehat{T}})/\pi) $$
(10)

which is similar to that of the basic algorithm (3). There are, however, two differences. First, the coefficient ε takes into account the fraction of sets \({\mathscr{D}}\cup C_j\) that skip verification in Phase2. Second, the threshold T that was kept constant in the basic algorithm can now be increased adaptively, so it holds that \(\widehat {T}\geq T\). The coefficient ε, where ε ∈ (0.0,1.0], can considerably reduce the total average run time \(d(n,\widehat {T}) \cdot (t_2(n,\varepsilon ,\psi \widehat {T}) \cdot \varepsilon \cdot 2^{\psi \widehat {T}})/\pi \) of Phase2 (see Table 8). Similarly, the growing average value of \(\widehat {T}\) resulting from adaptation reduces the amount of computation in Phase1 while increasing the amount of computation in Phase2, which is distributed among π processes.

To conclude, the adaptive algorithm introduces three refinements aimed at pruning the search space. In contrast to the basic algorithm, which only eliminates particular redundant states, the adaptive algorithm also discards whole candidate sets of states that cannot induce decompositions. The fourth refinement, involving the adjustment of the threshold T, ensures not only a better balance between the run times of Phase1 and Phase2, but also a better exploitation of the parallelism inherent in the decomposition problem. However, the refinements do not reduce the order of complexity of the adaptive algorithm, which remains exponential.

6 Computational experiments

This section reports on the comprehensive experiments conducted to evaluate the performance of the basic and adaptive algorithms. The run times to solve the decomposition problem were measured for almost 1450 languages over an alphabet of size |Σ| = 3–5, and for more than 2700 languages over binary and unary alphabets, and over an alphabet of size |Σ| = 10 (in what follows we refer to these alphabets as Σ3−5, Σ2, Σ1, and Σ10). Furthermore, the impact of the adaptive setting on the results obtained, and the speed-ups of the adaptive algorithm, were studied.

The basic and adaptive algorithmsFootnote 10 were implemented in the C language using the MPI library functions in the Intel MPI 5.1.1.109 version. Each process ran a sequential stream of instructions defined by the DECOMPOSE (or DECOMPOSE2) procedure. The processes running the algorithms were independent of one another, and synchronized their operation only at the beginning and end of the computation. The implementation structure, based on the master-worker paradigm, is shown in Fig. 14. The role of the master process was to send the input language L to all the workers, and to collect the decompositions of L found (in the actual implementation, the master process M and worker process W0 were combined into a single process).

Fig. 14
figure 14

Master-worker structure of implementation of algorithms (M – master process, Wi, i = 0, 1, …, π − 1 – worker processes)
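A minimal MPI skeleton of this master-worker arrangement might look as follows. It is a sketch under our simplifying assumptions (the language is broadcast as a single character buffer and each process merely reports the number of decompositions it found); the data structures and result collection of the actual implementation are not shown in the paper.

#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* placeholder for Phase1 + Phase2 of DECOMPOSE2 restricted to the subsets
   C_j with j mod size == rank; returns the number of decompositions found */
static int decompose_part(const char *lang, int rank, int size) {
    (void)lang; (void)rank; (void)size;
    return 0;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    static char lang[1 << 20];           /* words separated by '\n'           */
    int len = 0;
    if (rank == 0) {                     /* master: read/prepare the language */
        strcpy(lang, "ab\nba\naabb\n");
        len = (int)strlen(lang) + 1;
    }
    /* master sends the input language to all workers */
    MPI_Bcast(&len, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(lang, len, MPI_CHAR, 0, MPI_COMM_WORLD);

    int found = decompose_part(lang, rank, size);

    /* master collects the numbers of decompositions found by the workers */
    int total = 0;
    MPI_Reduce(&found, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("decompositions found: %d\n", total);

    MPI_Finalize();
    return 0;
}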

The experiments were carried out on the Tryton supercomputer with a computation speed of 1.48 Pflop/s, running the Linux kernel 2.6.32-754.3.5.el6.x86_64 along with the Slurm utility (Simple Linux utility for resource management). The supercomputer is composed of 1607 compute nodes, each equipped with two 12-core Intel Haswell processors (Xeon E5 v3) operating at 2.3 GHz, with 128 GB of RAM. The processors are connected by a 56 Gb/s Infiniband fat-tree network. The complete system with a cluster architecture, located in the Computer Centre in Gdańsk, Poland (http://task.gda.pl/centre-en), houses 3214 processors (38568 cores) and 48 Nvidia Tesla accelerators.

6.1 Benchmark languages

For the purpose of the experiments, we generated four sets of languages. The sets E1 and E2 contained composite languages, while the sets P1 and P2 contained prime languages. The languages in sets E1 and P1 included between 6000 and 15000 words, and those in sets E2 and P2 between 60000 and 90000 words (Table 2a). The composite languages were created using random grammars [17]. Let Σ = {a1, a2, …, al} be the set of terminal symbols, l ≥ 1, let V = {V1, V2, …, Vr} be the set of nonterminal symbols, r > l, and let Vr be the initial symbol. The grammars for composite languages were obtained as follows:

  1. For each terminal symbol ai ∈ Σ, create a production \(V_i \rightarrow a_i\).

  2. For each nonterminal symbol Vj where j = l + 1, l + 2, …, r − 1:

    • Draw at random a terminal symbol a ∈ Σ. Create a production \(V_j \rightarrow a\).

    • Draw at random l pairs (a, Vi), where a ∈ Σ and Vi ∈ V, i < j. Create a production \(V_j \rightarrow aV_i\).

  3. Create a production \(V_r \rightarrow V_{r - 2}V_{r - 1}\).

Table 2 Characteristics of languages (a) and automata built by basic (b) and adaptive (c) algorithm (N – number of languages in the set, |Σ| – alphabet size; Min, Max – minimum and maximum values, 1st-q, 3rd-q – first and third quartiles, Med (or 2nd quartile) – median; |L| – number of words in the language, |Q| and \(|Q^{\prime }|\) – numbers of states searched by algorithms)

When creating the composite languages, we rejected the grammars that generated a number of words outside the ranges 6000–15000 (for set E1) and 60000–90000 (for set E2). The values of l and r were selected from the ranges 3–5 and 11–21, respectively, so the maximum length of a word in a composite language was 2(r − l) − 1 = 35. The sets of prime languages P1 and P2 were created based on the sets E1 and E2. Let \({\mathscr{L}}\) be a composite language in set E1 (or E2). The language \({\mathscr{L}}\) can be transformed into a prime language \({\mathscr{L}}^{\prime }\) belonging to set P1 (or P2) using the following steps: (1) find the longest word \(\omega \in {\mathscr{L}}\); (2) generate a random word ωr over Σ such that |ωr| = |ω|; (3) if \(\omega _r\notin {\mathscr{L}}\) then copy language \({\mathscr{L}}\) into \({\mathscr{L}}^{\prime }\), and replace \(\omega \in {\mathscr{L}}^{\prime }\) with ωr. (Note that these steps do not guarantee that \({\mathscr{L}}^{\prime }\) will always be prime. However, the probability of obtaining a composite language is small. If \({\mathscr{L}}^{\prime }\) is composite, one can repeat the steps.)
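Steps (1)–(3) can be sketched in a few lines of C; the fragment below is an illustration operating on an in-memory array of words (the word arrays are assumed to be writable and long enough to hold the replacement, and the random generator is left unseeded for brevity).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* replace the longest word of L (n words over alphabet sigma) by a random
   word of the same length that does not already belong to L: steps (1)-(3) */
static void make_prime_candidate(char **L, int n, const char *sigma) {
    int longest = 0;
    for (int i = 1; i < n; i++)
        if (strlen(L[i]) > strlen(L[longest])) longest = i;           /* step (1) */

    size_t len = strlen(L[longest]), k = strlen(sigma);
    char *wr = malloc(len + 1);
    for (;;) {
        for (size_t j = 0; j < len; j++) wr[j] = sigma[rand() % k];   /* step (2) */
        wr[len] = '\0';
        int in_L = 0;
        for (int i = 0; i < n; i++) if (strcmp(L[i], wr) == 0) in_L = 1;
        if (!in_L) break;                                             /* step (3) */
    }
    strcpy(L[longest], wr);   /* assumes L[longest] has room for len+1 characters */
    free(wr);
}

int main(void) {
    char w0[] = "ab", w1[] = "ba", w2[] = "aabb";
    char *L[] = { w0, w1, w2 };
    make_prime_candidate(L, 3, "ab");
    printf("modified language: {%s, %s, %s}\n", L[0], L[1], L[2]);
    return 0;
}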

Consider the size of the input data of the algorithms. There are three independent variables defining this size: the number n of words in L, the size |Σ| of the alphabet, and the maximum length h of a word in L. In Section 6.2 we limit the alphabet size to |Σ| = 5, and the maximum length of a word to h = 35. Consequently, the values of |Σ| and h become the parameters of the algorithms. So we can assume that the only variable defining the size of the input data of the decomposition problem is the number n of words in L.

6.2 Experimental results

While performing the experiments, we ran the basic and adaptive algorithms by employing 16 processesFootnote 11 for languages in sets E1, E2, P1, and P2. We set the maximum run time allowed to solve a given language L to six hours. By solving the language we mean that the algorithm either determines all decompositions of L, or establishes that L is prime. Since the basic algorithm failed to solve some languages within the six-hour limit, we defined the success rate as

$$ R = \frac{N_{s}}{N} \cdot 100\% \qquad\qquad (11) $$

where Ns was the number of languages in the set that the given algorithm solved within the time limit, and N was the number of languages in the set. As shown in Table 3, the adaptive algorithm outperformed the basic algorithm with respect to success rates for all sets under consideration.

Table 3 Success rates of algorithms

A comparison of run times measured by the MPI_Wtime() function is shown in Table 4 and Fig. 15. Out of a total of 1446 languages, the comparison relates only to 1168 languages that were solved by both algorithms within the six-hour limit. The box plots of Fig. 15 depict the times through their quartiles. The bottom and top of each box are the first and third quartiles of measurements, and the band inside the box is the second quartile (the median). The lines extending vertically from the boxes, the so-called whiskers, indicate minimum and maximum measurements.

Table 4 Run times (in seconds) of algorithms (Min, Max – minimum and maximum times, 1st-q, 3rd-q – first and third quartile, Med – median)
Fig. 15

Comparison of run times (in seconds) of algorithms for 1168 languages

The median run times in Table 4 show that both algorithms solve prime languages faster than composite languages. As the aim is to find all decompositions of a language L, the algorithms have to explore the whole solution space for L. The size of this space is similar for both types of languages, because the cardinalities of the languages in sets E1 and P1, and in sets E2 and P2, are the same. The experiments show, however, that for prime languages both algorithms verify fewer sets DCj than for composite languages (Table 5). Consequently, the run times for prime languages are shorter, because fewer decomposition sets need to be verified.

Table 5 Numbers of decomposition sets DCj verified by the algorithms (IQR – interquartile range, difference between third and first quartile)

Comparing the median run times (Table 4), we can see that the adaptive algorithm outperforms the basic algorithm for sets E1, E2, and P1. However, for set P2 the adaptive algorithm shows slightly worse performance. One of the refinements restricts the search to the significant states of the automaton A accepting the language L. Due to this refinement, fewer decomposition sets DCj are verified. The redundant states are removed in the course of building structure W, which is created by the BUILDW procedure. Its execution takes a certain amount of time, but one can expect this cost to be amortized by the smaller number of sets DCj that need to be verified. However, such amortization did not occur for set P2, because the adaptive algorithm discovered no sets DCj for those languages at all (Table 5).

Considering the above, we claim that the adaptive algorithm is faster than the basic algorithm when solving composite and prime languages. We test this claim statistically on the four pairs of data samples by using the one-sided two-sample test for comparing two means (Table 6). The data samples were created by eliminating the outliers. For example, the basic algorithm produced a set of 323 measurements for set E1 (column Ns in Table 3). From this set, 28 measurements were eliminated as outliers (n1 = 295 in Table 6). A measurement was considered an outlier if it fell outside the range [m,M], where m = 1st-q − 1.5 ⋅ (3rd-q − 1st-q) and M = 3rd-q + 1.5 ⋅ (3rd-q − 1st-q) (Table 4).
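The outlier rule can be stated compactly in code. The sketch below only illustrates that rule; the quartile interpolation used and the sample measurements are placeholders of our own, since the paper does not specify the quartile convention.

/* Sketch of the outlier rule used to form the data samples: a
 * measurement is dropped if it lies outside
 * [q1 - 1.5*(q3 - q1), q3 + 1.5*(q3 - q1)]. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Quartile by linear interpolation on sorted data (one possible convention). */
static double quartile(const double *sorted, int n, double p)
{
    double pos = p * (n - 1);
    int lo = (int)pos;
    double frac = pos - lo;
    return lo + 1 < n ? sorted[lo] * (1.0 - frac) + sorted[lo + 1] * frac
                      : sorted[lo];
}

/* Keeps the non-outlying measurements in place and returns their count. */
static int drop_outliers(double *t, int n)
{
    qsort(t, (size_t)n, sizeof *t, cmp_double);
    double q1 = quartile(t, n, 0.25), q3 = quartile(t, n, 0.75);
    double m = q1 - 1.5 * (q3 - q1), M = q3 + 1.5 * (q3 - q1);
    int k = 0;
    for (int i = 0; i < n; i++)
        if (t[i] >= m && t[i] <= M)
            t[k++] = t[i];
    return k;
}

int main(void)
{
    double t[] = { 10.2, 11.0, 11.4, 12.1, 12.3, 13.0, 95.0 };  /* placeholder run times */
    int k = drop_outliers(t, 7);
    printf("%d measurements kept\n", k);     /* the 95.0 s run is dropped */
    return 0;
}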

Table 6 Data samples \(d_1=(n_1,\bar {y}_1,s_1)\) and \(d_2=(n_2,\bar {y}_2,s_2)\) to test the equality of mean run times for the algorithms; the threshold was initially set to T = 20 (n1,n2 – numbers of measurements; \(\bar {y}_1,\bar {y}_2\) – means, in seconds; s1,s2 – standard deviations)

For samples d1,d2, and sets Ei,Pi, i = 1,2, we set up the null and alternative hypotheses

$$ H_{0}: \mu_{1}=\mu_{2} \quad\text{and}\quad H_{1}: \mu_{1}>\mu_{2} $$

where μ1 and μ2 are the mean run times of the basic and adaptive algorithm, respectively, over the population of composite (or prime) languages represented by the given set. Using the test statistic:

$$ Z=\frac{\bar{y}_{1}-\bar{y}_{2}}{\sqrt{\frac{{s_{1}^{2}}}{n_{1}}+ \frac{{s_{2}^{2}}}{n_{2}}}} $$

where ni, \(\bar {y}_i\), and si are the components of sample di, the hypothesis H0 is rejected at the α = 0.01 significance level if Z > zα, where zα = 2.326. The sizes of our data samples are in the range of 115–345 (Table 6). A sample size of ni ≥ 30 is commonly considered large enough for the distribution of the sample mean to be approximately normal (by the central limit theorem). Thus, the critical value zα is determined based on the standard normal distribution N(0,1). Clearly, all values of Z are above the critical value of 2.326 (Table 6), so we reject the null hypothesis H0 in favor of the alternative hypothesis H1. This means that there is sufficient evidence at the α = 0.01 level of significance to claim that the adaptive algorithm solves composite and prime languages faster than the basic algorithm.Footnote 12
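For completeness, the test statistic can be evaluated directly from the sample summaries (ni, \(\bar {y}_i\), si). The numbers passed to ztest() below are placeholders, not the values from Table 6.

/* Sketch of the one-sided two-sample Z test used above: H0 is rejected
 * when Z exceeds the critical value 2.326 (alpha = 0.01, standard
 * normal). The sample summaries are placeholders, not Table 6 values. */
#include <math.h>
#include <stdio.h>

static double ztest(int n1, double mean1, double sd1,
                    int n2, double mean2, double sd2)
{
    return (mean1 - mean2) / sqrt(sd1 * sd1 / n1 + sd2 * sd2 / n2);
}

int main(void)
{
    const double z_alpha = 2.326;              /* critical value for alpha = 0.01 */
    double Z = ztest(295, 120.0, 40.0,         /* d1: basic algorithm (placeholder) */
                     300,  80.0, 30.0);        /* d2: adaptive algorithm (placeholder) */
    printf("Z = %.3f, reject H0: %s\n", Z, Z > z_alpha ? "yes" : "no");
    return 0;
}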

Several methods for reducing the search space are proposed in Section 5. The idea behind these methods is to avoid the full verification of whether a set DCj is a decomposition set. When the set is large, its verification can be computationally expensive. However, Lemmas 2–4 allow us to discard a vast majority of the sets without verification. As seen in Table 5, the numbers of sets verified by the adaptive algorithm for composite languages are reduced by several orders of magnitude compared with the corresponding numbers for the basic algorithm. This indicates that the proposed methods of pruning the search space are effective.

Another way to prune the search space is the removal of redundant states of the automaton A accepting the input language L. Due to such removal, the state sets of A decrease by approximately 10% for the composite languages, and by approximately 16%–17% for the prime languages (see Med |Q| and Med \(|Q^{\prime }|\) in Table 2b–c). The basic algorithm discovered some prospective decomposition sets for prime languages, while the adaptive algorithm did not find any set of that type (Table 5). We believe that the reason for this was the smaller automata processed by the adaptive algorithm compared with the basic algorithm.

The adaptive algorithm changes its behavior by setting the value of threshold T (denoted then by \(\widehat {T}\)) at runtime. The adaptation was most beneficial for languages of set E2, which turned out to be most demanding in terms of solving the decomposition problem. For these languages the values of \(\widehat {T}\) varied in a wide range of 20–58 (Table 7). The capability of adaptation was exploited to a lesser extent for languages in set E1 with the values of \(\widehat {T}\) varying within a range of 20–32, and for the prime languages the adaptive adjustment of \(\widehat {T}\) did not occur.

Table 7 Values of threshold \(\widehat {T}\) set by the adaptive algorithm

The time complexity formulas \({\mathscr{T}}_b(\pi ,n)\) and \({\mathscr{T}}_a(\pi ,n)\) for the algorithms include several terms and coefficients. To investigate the variability of these quantities, we took the measurements reported in Table 8. The Avg entry contains the average value of a quantity, q, calculated over a language set. The Range entry describes the variability of q through a pair \((q_{\min \limits },q_{\max \limits })\), where \(q_{\min \limits }\) and \(q_{\max \limits }\) are the minimum and maximum of the average values of q calculated over the distinguished intervals of language size. The range [6000,15000] of language size for sets E1 and P1 was divided into nine equal intervals, and the range [60000,90000] for E2 and P2 into ten intervals.

Table 8 Variability and average values of terms and coefficients appearing in time complexity formulas given in (3) and (10); times t1 and t2 are in seconds

The Range values indicate that the times t1 and t2 are slowly increasing functions of the language size n (Table 8). Recall that t1 is the average time to find a partial decomposition set D in Phase1, and t2 the average time to verify a set DCj in Phase2. Small values of coefficient ε indicate that pruning of the search space is effective. The values of d(n,T) and t1(n,T) allow us to estimate the average run time of Phase1 of the basic algorithm for set E2. This time is 2.1 ⋅ 45.6 ≈ 95.8 s. For the adaptive algorithm the run time of Phase1 is equal to 9.7 s. Since we have the average run times of the complete algorithms (\(\bar {y}_1\) and \(\bar {y}_2\) for E2 in Table 6), we can calculate the execution times of Phase2 for both algorithms. We get 1230.7 − 95.8 = 1134.9 s for the basic algorithm, and 23.6 − 9.7 = 13.9 s for the adaptive algorithm. Clearly, the run time balance between Phase1 and Phase2 is much better for the adaptive algorithm (9.7 vs. 13.9 s) than for the basic algorithm (95.8 vs. 1134.9 s). The better balance was achieved due to the effective pruning of the search space, and to the increase in threshold T performed by the adaptive algorithm at runtime.

We also conducted experiments on languages over a comparatively large alphabet Σ10, and over small alphabets, in particular on binary and unary languages. The setting was the same as before: the adaptive algorithm was run by 16 processes, and the time limit for solving a language was six hours. The languages over the alphabets Σ2 and Σ10 were created in a similar fashion as described in Section 6.1. To produce the unary languages, words consisting of a random number of ones were generated.
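A generator of such unary languages can be as simple as the following sketch; the word count and maximum word length are placeholders of our own choosing, not the sizes used in the experiments.

/* Sketch of a unary-language generator: each word is a run of ones
 * whose length is drawn at random; duplicate lengths are skipped so
 * that all words of the language are distinct. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    enum { MAX_LEN = 2000, N_WORDS = 500 };   /* placeholder sizes */
    static int used[MAX_LEN + 1];
    srand((unsigned)time(NULL));

    for (int k = 0; k < N_WORDS; ) {
        int len = 1 + rand() % MAX_LEN;       /* random word length */
        if (used[len]) continue;
        used[len] = 1;
        for (int i = 0; i < len; i++) putchar('1');
        putchar('\n');
        k++;
    }
    return 0;
}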

The experiments have shown that the languages over the alphabet Σ10 were easy to solve. Their decomposition times, on the order of seconds (column Med in Table 9, and Fig. 16a–d), compared favorably with those of the languages over an alphabet Σ3 − 5 (column Med in Table 4, and Fig. 15). The experiments revealed that the binary languages were harder to solve, while the unary languages were the worst-case input data for the problem. The median run times for the binary languages were on the order of tens of seconds, and for the unary languages somewhat longer than 80 minutes (column Med in Table 9, and Figs. 17 and 16e–f). A major difficulty in solving these languages was the large size of the automata that had to be searched. The ranges of the median sizes of automata accepting the binary and unary languages were, respectively, [104,261] and [1994,1997] (column Med \(|Q^{\prime }|\) in Table 10). As a result, the run times for these larger automata were longer than for the languages over the alphabet Σ10, for which the sizes of automata were in the range of [50,106].

Table 9 Run times (in seconds) of adaptive algorithm for languages over alphabets Σ10, Σ2, and Σ1 where R – range of language size, N – number of languages in the set
Fig. 16

Run times (in seconds) of adaptive algorithm for languages over alphabet Σ10, and for unary languages

Fig. 17

Run times (in seconds) of adaptive algorithm for binary languages

Table 10 Sizes of state sets \(Q^{\prime }\) of automata searched by adaptive algorithm

6.2.1 Scalability study

The results of the adaptive algorithm speed-up evaluation are presented in Fig. 18.Footnote 13 The speed-ups achieved for the binary languages (set \({\mathscr{E}}^b\)) and for the languages over an alphabet Σ3 − 5 in set E1 were quite good. For set E2 the result was satisfactory.

Fig. 18

Speed-ups, S, for binary languages (a)–(b), and for languages over alphabet Σ3 − 5: sets E1 (c)–(d), E2 (e)–(f), and large sets (g)–(i), (|L| – number of words in the language, p – number of processes)

As can be seen, the speed-ups obtained are not linear. The reason for this is that the parallel processes execute the same code in Phase1, so an overhead of redundant computation occurs, which decreases the speed-up. We have significantly reduced that overhead by shortening the run time of Phase1 (compare the Avg times t1 for the basic and adaptive algorithms in Table 8). This was achieved by means of the algorithmic refinements, in particular by removing redundant states in the REMOVERED2 and BUILDW procedures, and by the adaptive adjustment of threshold T.
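This effect can be made explicit with a rough Amdahl-style estimate, which we add here as an approximation rather than a result of the complexity analysis: if the Phase1 work, taking time \(t_{P1}\), is replicated on every process, while the Phase2 work, taking time \(t_{P2}\) on a single process, divides evenly among the π processes, then

$$ S(\pi) \approx \frac{t_{P1}+t_{P2}}{t_{P1}+t_{P2}/\pi}, \qquad \lim_{\pi\to\infty} S(\pi) = 1+\frac{t_{P2}}{t_{P1}}, $$

so shortening Phase1 (for set E2, from about 95.8 s in the basic algorithm to about 9.7 s in the adaptive one, as estimated above) directly raises the attainable speed-up.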

The large languages, ranging in size from 160000 to more than 200000 words, scaled very well (Fig. 18g–i). The computational work for these languages was higher, so the impact of the overhead on the obtained speed-up was smaller. The times to find decompositions of the large languages using 128 processes varied in a range of 7–37 minutes.

As mentioned before, we solve the problem of finding all decompositions of a finite language L in the form of L = L1L2. The language L does not have to satisfy any specific conditions. To the best of our knowledge, parallel algorithms to solve this problem have not been presented in the literature so far. Therefore we could not compare the outcome of our experiments with the results of other algorithms.

7 Conclusions and future work

In this paper the problem of finite language decomposition is investigated. The problem under consideration, assuming that a language is given as a DFA, is NP-hard. The main contribution of the paper is the adaptive parallel algorithm, based on an exhaustive search, for finding all decompositions of a given finite language. The algorithm implements several methods for pruning the search space. Furthermore, the algorithm is adaptive; it modifies its behavior at runtime by adjusting one of its parameters based on performance data acquired during execution. As a consequence, a substantial reduction in the amount of computation necessary to solve the problem has been achieved.

Comprehensive computational experiments carried out on almost 1450 languages over an alphabet Σ3 − 5 showed that the methods for pruning the search space proposed in Lemmas 2–4 were very effective. These methods allowed the adaptive algorithm to reduce the search space by several orders of magnitude compared with the basic algorithm. As a result, the median run time to solve the languages in set E2 by the adaptive algorithm was approximately 15 s, whereas by the basic algorithm it was 1296 s. The adaptive feature of the algorithm proved most beneficial for the languages in set E2, for which the value of threshold T varied in a range of 20–58. A higher value of T is advantageous because it increases the degree of computational parallelism, which enables better use of the available processes.

We also tested more than 2700 languages over a large alphabet Σ10 and over small alphabets, specifically the binary and unary languages. The results indicated that the languages over an alphabet Σ10 were easier to solve than those over an alphabet Σ3 − 5. Furthermore, it took longer to decompose the binary languages in comparison to the languages over the alphabets Σ10 and Σ3 − 5, while the unary languages turned out to be the worst-case input data to solve the decomposition problem. Based on these findings, we conclude that finite languages over small alphabets are more difficult to decompose than those over large alphabets.

The scalability study revealed that the binary languages, and the languages over an alphabet Σ3 − 5 containing from 6000 to more than 200000 words, scaled well, especially those of larger sizes.

In terms of future work, two issues can be investigated. The first is the adaptive setting of the algorithm, which we believe has the potential to be improved. Presently, the algorithm establishes the value of threshold T based solely on the number of recursive runs of Phase1. We suppose that the number of processes executing the algorithm should also be considered while determining the value of T. Another issue to investigate is the further scalability of the adaptive algorithm. At present, using 16 processes, the algorithm can solve language instances of up to 90000 words in median run times of tens of seconds, and, using 128 processes, languages of between 160000 and more than 200000 words in run times of tens of minutes. The question is to what extent the language size could be increased by increasing the number of processes, while maintaining a short run time of the algorithm and possibly high processor utilization.