Networks of splicing processors: simulations between topologies

Networks of splicing processors are one of the theoretical computational models that take inspiration from nature to efficiently solve problems that are beyond the reach of our current computational techniques. One of the issues hindering their practical implementation is the arbitrariness of the underlying graph, since our computational systems usually conform to a predefined topology. We propose simulations of networks of splicing processors having arbitrary underlying graphs by networks whose underlying graphs are of a predefined topology: complete, star, and grid graphs. We show that all of these simulations are time efficient in the sense that they preserve the time complexity of the original network: each computational step in that network is simulated by a fixed number of computational steps in the network with the new topology. Moreover, these simulations do not modify the order of magnitude of the network size.


Introduction
The formal operation of splicing on strings was introduced in [5] as an abstraction of the biological phenomenon of DNA recombination under the effect of restriction enzymes and ligases. The biological phenomenon is illustrated in Fig. 1. We give here a few informal explanations. Two DNA molecules (the blue and the red ones) are cut by a restriction enzyme (in this case, EcoRI). This process yields fragments with Watson-Crick complementary tails called "sticky ends". These sticky ends may join again, leading to the recombination of DNA. To fix the new combination, an enzyme called ligase seals the gaps after the sticky ends are joined.
We follow [15] with the formal definition of splicing as an operation on pairs of strings. First, we need to define what a splicing rule is: a quadruple of strings specifying the subsequences in the two strings where the strings are cut. A splicing rule is thus intended to abstract the restriction enzymes, and its subsequences indicate the sites where the enzymes cut. Different computational models based on the iteration of this operation may be defined. For instance, a generating splicing system starts a computation from a given finite set of strings (axioms) and iteratively applies splicing rules from a given finite set of such rules, eventually producing a language. This computational model was introduced in [5]; since then, the model and its variants have been intensively investigated. The splicing operation, as a formal operation on words and languages, has been actively studied for more than two decades. Many papers have been published, as well as several books containing chapters devoted to this topic. We mention here just a few of them: [6,9,16], which contain extensive chapters about splicing, and [7,8], which contain chapters discussing various applications.
There are two types of splicing systems: generating systems, which generate a language by iteratively applying splicing rules to the strings obtained starting from a finite set of strings, and accepting systems, which start out with just one initial string and a finite set of axioms and initiate an iterative splicing as above. The computation of an accepting system halts when at least one string from a predefined set is obtained, and the input string is accepted as soon as the system halts. The accepting splicing system was introduced by Mitrana et al. in [13], while different variants have been studied in [1,4,14], etc.
In [10] a highly parallel and distributed computational model based on the splicing operation was introduced: the network of splicing processors (NSP). This model consists of an undirected graph whose nodes each host a splicing processor. A splicing processor consists of a finite set of splicing rules, a finite set of strings (axioms), and four sets of symbols, two of which define the input filter while the other two define the output filter. A computation in a network of splicing processors (NSP, for short) is a sequence of splicing and communication steps which alternate with each other. In a splicing step, each processor applies, in parallel, the splicing rules it contains to all the strings existing at that moment in the processor. Note that we assume that each string appearing in a processor at some moment actually appears in an unlimited number of identical copies, so that different copies may be rewritten by different splicing rules. In a communication step, all the strings existing in the network nodes are simultaneously expelled from their nodes, provided that they can pass the output filters of the nodes. In the same communication step, an arbitrarily large number of copies of each string expelled from one node (the sender) enter all the nodes (the receivers) connected to the sender, provided that the string can pass the input filters of the receivers. The computation halts as soon as a predefined node, called Halt, contains at least one string.
Several variants of NSP have been considered so far, most of them being computationally complete, see, e.g., [2,3,10,11,12]. These networks have an ad hoc underlying graph structure. For various reasons, such as possible implementations, uniformity, and comparisons, it would be useful to have networks with a fixed and well-known topology such as the complete graph, the star, or the grid. This is the aim of this work: to investigate the possibility of transforming a given NSP into an equivalent NSP with an underlying graph of such a predefined structure. We are interested not only in the construction of these networks but also in comparing the computational time and size of the constructed networks with those of the original ones.

Basic definitions
In this section we introduce the main concepts and notations that will be used in the sequel. For those notions not defined here we refer to [17].
An alphabet is a finite and nonempty set of symbols. The cardinality of a finite set $A$ is written $card(A)$. Any finite sequence of symbols from an alphabet $V$ is called a string over $V$. The set of all strings over $V$ is denoted by $V^*$ and the empty string is denoted by $\varepsilon$. The length of a string $x$ is denoted by $|x|$, while $alph(x)$ denotes the minimal alphabet $W$ such that $x \in W^*$. A language over the alphabet $V$ is a set $L \subseteq V^*$.
We now give the formal definition of the splicing operation following [15]. A splicing rule over a finite alphabet $V$ is a quadruple of strings of the form $[(u_1, u_2); (v_1, v_2)]$ such that $u_1$, $u_2$, $v_1$, and $v_2$ are in $V^*$. For a splicing rule $r = [(u_1, u_2); (v_1, v_2)]$ and for $x, y, z \in V^*$, we say that $r$ produces $z$ from $x$ and $y$ (denoted by $(x, y) \vdash_r z$) if there exist some $x_1, x_2, y_1, y_2 \in V^*$ such that $x = x_1 u_1 u_2 x_2$, $y = y_1 v_1 v_2 y_2$, and $z = x_1 u_1 v_2 y_2$. For a language $L$ over $V$ and a set of splicing rules $R$ we define

$$\sigma_R(L) = \{ z \in V^* \mid \exists x, y \in L,\ \exists r \in R \text{ such that } (x, y) \vdash_r z \}.$$

A short discussion is in order here. As one can see, the splicing rule defined above is a 1-splicing rule in the sense of [6]. However, in the rest of the paper we do not distinguish between the two strings a splicing rule is applied to; therefore, we may say that the rules are actually 2-splicing rules.
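The definition above can be illustrated with a minimal Python sketch. It assumes that every symbol is a single character and that strings are ordinary Python strings; the names `SplicingRule`, `splice`, and `sigma` are illustrative choices, not notation from the paper. The function `sigma` ranges over all ordered pairs of strings, so both orders of a pair are covered, in line with the remark that the two strings are not distinguished.

```python
from typing import Iterable, Iterator, NamedTuple, Set


class SplicingRule(NamedTuple):
    """A rule [(u1, u2); (v1, v2)]; field names are illustrative."""
    u1: str
    u2: str
    v1: str
    v2: str


def find_all(s: str, sub: str) -> Iterator[int]:
    """Yield every start index at which `sub` occurs in `s` (overlaps included)."""
    start = 0
    while True:
        i = s.find(sub, start)
        if i == -1:
            return
        yield i
        start = i + 1


def splice(rule: SplicingRule, x: str, y: str) -> Set[str]:
    """All z such that x = x1 u1 u2 x2, y = y1 v1 v2 y2 and z = x1 u1 v2 y2."""
    results = set()
    for i in find_all(x, rule.u1 + rule.u2):
        prefix = x[: i + len(rule.u1)]            # x1 u1
        for j in find_all(y, rule.v1 + rule.v2):
            suffix = y[j + len(rule.v1):]         # v2 y2
            results.add(prefix + suffix)
    return results


def sigma(rules: Iterable[SplicingRule], language: Set[str]) -> Set[str]:
    """Strings produced by some rule from some ordered pair of strings in `language`."""
    out = set()
    for r in rules:
        for x in language:
            for y in language:
                out |= splice(r, x, y)
    return out


rule = SplicingRule("a", "b", "c", "d")
produced = splice(rule, "xaby", "zcdw")   # x1 = "x", y2 = "w", so z = "xa" + "dw"
```

Here `produced` contains the single string `"xadw"`: the first string is cut between `a` and `b`, the second between `c` and `d`, and the left part of the first is joined with the right part of the second.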
Let $V$ be an alphabet; we now define two predicates, one with strong conditions (s) and another with weak restrictions (w), for a string $z \in V^+$ and two disjoint subsets $P, F$ of $V$ as follows:

$$\varphi^{(s)}(z; P, F) \equiv P \subseteq alph(z) \ \wedge\ F \cap alph(z) = \emptyset,$$
$$\varphi^{(w)}(z; P, F) \equiv alph(z) \cap P \neq \emptyset \ \wedge\ F \cap alph(z) = \emptyset.$$

In the definition of these predicates, the set $P$ is a set of permitting symbols while the set $F$ is a set of forbidding symbols. Informally, both conditions require that no forbidding symbol occurs in $z$. As one can see, the former condition is stronger than the latter, since it requires that all permitting symbols are present in $z$, while the latter requires only that at least one permitting symbol appears in $z$.
These predicates are extended to a language $L \subseteq V^*$ by $\varphi^{\beta}(L; P, F) = \{ z \in L \mid \varphi^{\beta}(z; P, F) \}$, with $\beta \in \{(s), (w)\}$.
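The two filter conditions described above can be sketched in Python as follows. This is a minimal illustration that assumes single-character symbols, so that the set of characters of a string plays the role of alph(z); the function names `phi_strong`, `phi_weak`, and `filter_language` are hypothetical.

```python
from typing import Callable, Set


def phi_strong(z: str, P: Set[str], F: Set[str]) -> bool:
    """Strong condition: every permitting symbol occurs in z, no forbidding one does."""
    symbols = set(z)
    return P <= symbols and not (F & symbols)


def phi_weak(z: str, P: Set[str], F: Set[str]) -> bool:
    """Weak condition: at least one permitting symbol occurs in z, no forbidding one does."""
    symbols = set(z)
    return bool(P & symbols) and not (F & symbols)


def filter_language(L: Set[str], P: Set[str], F: Set[str],
                    predicate: Callable[[str, Set[str], Set[str]], bool]) -> Set[str]:
    """Extension of a predicate to a language: keep the strings that satisfy it."""
    return {z for z in L if predicate(z, P, F)}


P, F = {"a", "b"}, {"d"}
passing_weak = filter_language({"ab", "ac", "ad"}, P, F, phi_weak)
passing_strong = filter_language({"ab", "ac", "ad"}, P, F, phi_strong)
```

With these sets, `"ac"` passes the weak filter but not the strong one (it lacks `b`), while `"ad"` passes neither because it contains the forbidding symbol `d`.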
A splicing processor over an alphabet $V$ is a 6-tuple $(S, A, PI, FI, PO, FO)$, where:
- $S$ is a finite set of splicing rules over $V$.
- $A$ is a finite set of auxiliary strings over $V$. These auxiliary strings are to be used, together with the existing strings, in the splicing steps of the processors. Auxiliary strings are available at any moment.
- $PI, FI \subseteq V$ are the sets of permitting and forbidding symbols, respectively, which form the input filter of the processor.
- $PO, FO \subseteq V$ are the sets of permitting and forbidding symbols, respectively, which form the output filter of the processor.
The set of splicing processors over $V$ is denoted by $SP_V$. A network of splicing processors is a 9-tuple $\Gamma = (V, U, \langle, \rangle, G, N, \beta, In, Halt)$, where:
• $V$ and $U$ are the input and network alphabet, respectively, $V \subseteq U$, and $\langle, \rangle \in U \setminus V$ are two special symbols.
• $G = (X_G, E_G)$ is an undirected graph without loops with the set of nodes $X_G$ and the set of edges $E_G$. Each edge is given in the form of a binary set. $G$ is called the underlying graph of the network.
• $N : X_G \to SP_U$ is a mapping, which associates with each node $x \in X_G$ the splicing processor $N(x) = (S_x, A_x, PI_x, FI_x, PO_x, FO_x)$.
• $\beta : X_G \to \{(s), (w)\}$ defines the type of the filters of a node.
• $In, Halt \in X_G$ are the input and the halting node of $\Gamma$, respectively.
The size of an NSP $\Gamma$ is defined as the number of nodes of the graph, i.e., $card(X_G)$. A configuration of an NSP $\Gamma$ is a mapping $C : X_G \to 2^{U^*}$, which associates a set of strings with every node of the graph. Although a configuration is a multiset of strings, each one appearing in an arbitrary number of copies, for the sake of simplicity we work with the support of this multiset. A configuration can be seen as the sets of strings, except the auxiliary ones, which are present in the nodes at some moment. For a string $w \in V^*$, we define the initial configuration of $\Gamma$ on $w$ by $C_0^{(w)}(In) = \{\langle w \rangle\}$ and $C_0^{(w)}(x) = \emptyset$ for all other $x \in X_G$.
A configuration is followed by another configuration either by a splicing step or by a communication step. A configuration $C'$ follows a configuration $C$ by a splicing step if each component $C'(x)$, for some node $x$, is the result of applying all the splicing rules in the set $S_x$ that can be applied to the strings in the set $C(x)$ together with those in $A_x$. Formally, configuration $C'$ follows the configuration $C$ by a splicing step, written as $C \Rightarrow C'$, iff for all $x \in X_G$, the following holds:

$$C'(x) = \sigma_{S_x}(C(x) \cup A_x),$$

that is, $C'(x)$ consists of all strings produced by the rules of $S_x$ from pairs of strings in $C(x) \cup A_x$.
In a communication step, the following actions take place simultaneously for every node $x$: (i) all the strings that can pass the output filter of a node are sent out of that node; (ii) all the strings that left their nodes enter all the nodes connected to their original ones, provided that they can pass the input filter of the receiving nodes.
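Note that, according to this definition, those strings that are sent out of a node and cannot pass the input filter of any node are lost. Formally, a configuration $C'$ follows a configuration $C$ by a communication step (we write $C \models C'$) iff for all $x \in X_G$,

$$C'(x) = \bigl(C(x) \setminus \varphi^{\beta(x)}(C(x); PO_x, FO_x)\bigr) \cup \bigcup_{\{x,y\} \in E_G} \bigl(\varphi^{\beta(y)}(C(y); PO_y, FO_y) \cap \varphi^{\beta(x)}(C(y); PI_x, FI_x)\bigr)$$

holds, where $\varphi^{\beta(x)}$ denotes the filter predicate of the type assigned to node $x$.
For an NSP $\Gamma$, a computation on an input string $w$ is defined as a sequence of configurations $C_0^{(w)}, C_1^{(w)}, C_2^{(w)}, \dots$, where $C_0^{(w)}$ is the initial configuration of $\Gamma$ on $w$, $C_{2i}^{(w)} \Rightarrow C_{2i+1}^{(w)}$ and $C_{2i+1}^{(w)} \models C_{2i+2}^{(w)}$, for all $i \ge 0$. A computation on an input string $w$ halts if there exists $k \ge 1$ such that $C_k^{(w)}(Halt)$ is non-empty. Such a computation is called an accepting computation. As the halting node is used just for ending the computation, we shall consider that $S_{Halt} = A_{Halt} = \emptyset$. Furthermore, because the computation halts as soon as a string enters $Halt$ and no string goes out, we may also consider that $PO_{Halt} = FO_{Halt} = \emptyset$.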
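The alternation of splicing and communication steps described above can be sketched in Python. This is a simplified model under stated assumptions, not the paper's formalism: symbols are single characters, the ASCII characters `<` and `>` stand in for the special bracket symbols, a configuration is a dictionary from node names to sets of strings, and the names `Processor`, `splicing_step`, and `communication_step` are hypothetical. A node without rules produces the empty set in a splicing step, which follows the literal reading that a splicing step keeps only the produced strings.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Rule = Tuple[str, str, str, str]  # (u1, u2, v1, v2), illustrative encoding


@dataclass
class Processor:
    rules: Set[Rule] = field(default_factory=set)
    axioms: Set[str] = field(default_factory=set)
    PI: Set[str] = field(default_factory=set)
    FI: Set[str] = field(default_factory=set)
    PO: Set[str] = field(default_factory=set)
    FO: Set[str] = field(default_factory=set)
    strong: bool = False  # True: filters of type (s); False: type (w)


def passes(z: str, P: Set[str], F: Set[str], strong: bool) -> bool:
    """Filter check: no forbidding symbol, and all (strong) or some (weak) permitting ones."""
    symbols = set(z)
    if F & symbols:
        return False
    return P <= symbols if strong else bool(P & symbols)


def occurrences(s: str, sub: str) -> List[int]:
    """Start indices of all occurrences of `sub` in `s`."""
    return [i for i in range(len(s) - len(sub) + 1) if s.startswith(sub, i)]


def splice(rule: Rule, x: str, y: str) -> Set[str]:
    """All z = x1 u1 v2 y2 with x = x1 u1 u2 x2 and y = y1 v1 v2 y2."""
    u1, u2, v1, v2 = rule
    return {x[: i + len(u1)] + y[j + len(v1):]
            for i in occurrences(x, u1 + u2)
            for j in occurrences(y, v1 + v2)}


def splicing_step(config: Dict[str, Set[str]],
                  procs: Dict[str, Processor]) -> Dict[str, Set[str]]:
    """Each node applies its rules to all pairs drawn from its strings and axioms."""
    new = {}
    for node, p in procs.items():
        pool = config[node] | p.axioms
        new[node] = {z for r in p.rules for x in pool for y in pool
                     for z in splice(r, x, y)}
    return new


def communication_step(config: Dict[str, Set[str]], procs: Dict[str, Processor],
                       edges: Iterable[FrozenSet[str]]) -> Dict[str, Set[str]]:
    """Strings passing an output filter leave; copies enter neighbours whose input filter they pass."""
    out = {n: {z for z in config[n] if passes(z, p.PO, p.FO, p.strong)}
           for n, p in procs.items()}
    new = {n: config[n] - out[n] for n in procs}
    for e in edges:
        a, b = tuple(e)
        for src, dst in ((a, b), (b, a)):
            p = procs[dst]
            new[dst] |= {z for z in out[src] if passes(z, p.PI, p.FI, p.strong)}
    return new


# A two-node toy network: In rewrites the input, Halt only receives.
procs = {
    "In": Processor(rules={("a", "b", "#", "c")}, axioms={"#c!"}, PO={"!"}),
    "Halt": Processor(PI={"!"}),
}
edges = [frozenset({"In", "Halt"})]
c0 = {"In": {"<ab>"}, "Halt": set()}
c1 = splicing_step(c0, procs)              # In now holds "<ac!"
c2 = communication_step(c1, procs, edges)  # "<ac!" moves to Halt
halted = bool(c2["Halt"])
```

In this toy run the string `<ab>` is spliced with the axiom `#c!` into `<ac!`, which passes the output filter of `In` and the input filter of `Halt`, so the computation halts after one splicing step and one communication step.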
The language accepted by $\Gamma$ is defined as

$$L(\Gamma) = \{ w \in V^* \mid \text{the computation of } \Gamma \text{ on } w \text{ is an accepting one} \}.$$

Given an NSP $\Gamma$ with the input alphabet $V$, we define the following computational complexity measure. The time complexity of the finite computation $C^{(x)}$ [...] $Halt'\}$ defined as follows:
• node $In'$: [...]
• node $x^{comp}$: [...]

We now analyze a computation of $\Gamma'$ on the input string $\langle w \rangle$. In the input node $In'$, the symbol $t_0$ is attached to the end of the string. Next, the symbol $t_0$ is replaced by $t_1$ in the node $x^{comp}$. When the string goes out, it can only enter $x^s_1$, and the simulation of a computation in $\Gamma$ starts. Thus, the string $\langle w \rangle t_1$ lies in $x^s_1$, while the string $\langle w \rangle$ is found in $x_1$, the input node of $\Gamma$. More generally, we may assume that a string $z t_i$ is found in a node $x^s_i \in \Gamma'$ if and only if the corresponding string $z$ lies in $x_i \in \Gamma$, for all $1 \le i \le n - 1$. Note that the strings can never return to $In'$ because of the input filter of this node.
Let $x_i$ be a splicing node, where a rule $[(a, b); (u, v)]$ is applied to $w$ yielding $w'$ and $w''$. Then, the same rule is applied in $x^s_i$ and strings of the form $w' t_i$ and $w'' t_i$ are produced. Indeed, since all the strings in $A_{x^s_i}$ and any string entering $x^s_i$ have the symbol $t_i$ at the end, the splicing rule will always yield strings keeping the character $t_i$ as the last one. Since both the node $x_i$ and the node $x^s_i$ have the same output filters and the produced strings only differ in this last character $t_i$, it follows that a string can leave $x^s_i$ if and only if its original counterpart can exit $x_i$. Once it leaves, the string returns to $x^{comp}$ and the character $t_i$ is replaced with $t_j$ characters in different copies, provided that $\{x_i, x_j\} \in E_G$. Each of the copies is sent to the corresponding connected node $x^s_j$ and the process described above restarts. It immediately follows that $L(\Gamma') = L(\Gamma)$.
It is easy to notice that $Time_{\Gamma'}(w) = 2\,Time_{\Gamma}(w)$ for every $w \in L(\Gamma)$, hence the second statement is proved.
Finally, this construction needs two more nodes, therefore the size of $\Gamma'$ is the size of $\Gamma$ plus two.

[Figure: The underlying graph of $\Gamma'$]
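The routing role of the central node in the proof above can be sketched as follows. This models only the net effect described in the text (the tag $t_i$ is replaced by $t_j$ in separate copies, one for each neighbour $x_j$ of $x_i$ in $G$); it does not reproduce the actual splicing rules of the node. Representing a tagged string $z t_i$ as a pair `(z, i)` and the name `retag` are assumptions made for the illustration.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Tagged = Tuple[str, int]  # (z, i) stands for the string z followed by the tag t_i


def retag(tagged: Tagged, edges: Iterable[FrozenSet[int]]) -> Set[Tagged]:
    """One copy (z, j) for every edge {x_i, x_j} of the original graph G."""
    z, i = tagged
    return {(z, j) for e in edges if i in e for j in e if j != i}


# Original graph G on nodes 1..4 (illustrative data).
edges = [frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 4})]
copies = retag(("<w>", 1), edges)
```

For the string tagged with $t_1$, the copies carry the tags $t_2$ and $t_3$, which are exactly the neighbours of node 1 in this example graph; each copy can then only be accepted by the node whose tag it carries.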

Simulating arbitrary NSP by star NSP
Theorem 2 For every NSP $\Gamma$ one can construct a star NSP $\Gamma'$ such that the following conditions are satisfied: [...]

Proof The simulation is identical to the one for complete graphs. The node $x^{comp}$ is set as the center of the star network, while all the other nodes defined in the previous proof are connected to it, as shown in Fig. 3. Clearly, each computation in $\Gamma'$ goes as in the previous construction, hence all the statements of the theorem follow. ◻

Simulating arbitrary NSP by grid NSP
Theorem 3 For every NSP $\Gamma$ one can construct a grid NSP $\Gamma'$ such that the following conditions are satisfied: [...]

The underlying graph of the network $\Gamma'$ is the grid graph with width 3 and height $n + 1$ from Fig. 4 below and its nodes are defined as follows:
• node $In'$: [...]
• nodes $x^s_i$, $1 \le i \le n - 1$: [...]
• nodes $D$, $D'$: [...]

Fig. 4 The underlying graph of $\Gamma'$

We now analyze a computation of $\Gamma'$ on the input string $\langle w \rangle$. In the input node $In'$, the symbol $t_1$ is attached at the end. Next, the string enters $x^s_1$ and the simulation of a computation in $\Gamma$ starts. Thus, the string $\langle w \rangle t_1$ lies in $x^s_1$ while the string $\langle w \rangle$ is found in $x_1$, the input node of $\Gamma$. More generally, we may assume that a string $z t_i$ is found in a node $x^s_i \in \Gamma'$ if and only if the corresponding string $z$ lies in $x_i \in \Gamma$. Note that the strings can no longer return to $In'$ because of its $FI$ filter. Note also that the node $In'$ and the nodes $D$ and $D'$ will not accept any string from now on because of their $PI$ filters. Consequently, the first row can be disregarded for the rest of the computation.
Let $x_i$ be a splicing node, where a rule $[(a, b); (u, v)]$ is applied to $w$ yielding $w'$ and $w''$. Then, the same rule is applied in $x^s_i$ and strings of the form $w' t_i$ and $w'' t_i$ are produced. Indeed, since all the strings in $A_{x^s_i}$ and any string entering $x^s_i$ have the symbol $t_i$ at the end, the splicing rule will always yield strings keeping the character $t_i$ as the last one. Since both the node $x_i$ and the node $x^s_i$ have the same output filters and the produced strings only differ in this last character $t_i$, it follows that a string can leave $x^s_i$ if and only if its original counterpart can exit $x_i$. Once it leaves the node, the string can only enter the linked node $x^{comp}_i$ and, depending on whether $i$ is odd or even, the character $t_i$ is replaced with $t^1_j$ or $t^2_j$ characters in different copies, respectively, provided that $\{x_i, x_j\} \in E_G$. Because of this last transformation, the yielded strings can only enter the node [...], provided that $i - 1 \ge 1$, $i + 1 \le n$. In this way, the string eventually arrives at the node $x^{connect}_j$, and either the character $t^1_j$ or the symbol $t^2_j$ is replaced with $t'_j$, blocking the string from continuing through the column of nodes $x^{connect}_i$. Lastly, this last character is replaced by $t_j$ in $x^{comp}_j$ and the string enters the intended node $x^s_j$, provided that it meets the requirements set by the input filters of this last node. Otherwise, it is lost.
Summarizing, we consider a splicing step in $\Gamma$ that produces a string $z'$ from $z$ in node $x_i$, $1 \le i \le n$, which is further sent to $x_j$, $j > i$ (the case $j < i$ is analogous). These two steps (splicing and communication) are simulated in $\Gamma'$ by a series of splicing steps such that the string $z t_i$ is transformed into $z' t_i$ in $x^s_i$, then sent, via an itinerary that starts with the node $x$ [...] and $x^s_j$. Therefore, the induction step is valid. From this reasoning, we infer that $L(\Gamma) = L(\Gamma')$.
Following the explanations closely, we note that each splicing step in the node $x_i$ of $\Gamma$ is simulated by at most $n + 3$ splicing steps in $\Gamma'$. This is done as follows: one step in $x^s_i$, followed by one step in $x^{comp}_i$, and then at most $n$ splicing steps in the nodes from $x^{connect}_i$ to $x^{connect}_j$. Finally, one more step is done in $x^{comp}_j$ before the string enters $x^s_j$. Since the size of $\Gamma$ is constant, the second statement of the theorem follows. The third statement follows immediately from Fig. 4. ◻
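The bound of $n + 3$ can be read off by tallying the steps listed in the paragraph above. The display below is only a restatement of that count; the sub- and superscript placement on the node names is a rendering assumption.

```latex
\[
\text{steps} \;\le\;
\underbrace{1}_{x^{s}_{i}}
+ \underbrace{1}_{x^{comp}_{i}}
+ \underbrace{n}_{x^{connect}_{i},\,\dots,\,x^{connect}_{j}}
+ \underbrace{1}_{x^{comp}_{j}}
\;=\; n + 3 .
\]
```

Since $n$ is the number of nodes of the simulated network and does not depend on the input string, this factor is a constant with respect to the input length.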

Conclusions and further work
Motivated by possible implementations, we have investigated the possibility of transforming an NSP with an arbitrary underlying graph into an equivalent NSP (the two have the same computational power) with an underlying graph of a predefined topology. We have considered here the complete, star, and grid graphs. We have proposed constructions for these transformations such that: (i) these constructions do not increase the time complexity, and (ii) these constructions do not increase the network size by more than a constant. The protocol of communication of the networks considered here is based on some random context conditions. We would like to investigate whether or not similar constructions can be obtained for networks of polarized splicing processors, where the protocol of communication is regulated by the polarization of the nodes and a mapping that defines the polarization of data.