Abstract
There are various definitions of a gene cluster determined by two genomes, and various methods for finding these clusters. However, there is little work on characterizing the configurations of genes that are eligible to be a cluster according to a given definition. For example, given a set of genes in a genome, is it always possible to find two genomes such that their intersection is exactly this cluster? In one version of this problem, we use graph theory to reformulate it as follows: given a graph G with n vertices, do there exist two θ-powers of paths \(G_S=(V_S,E_S)\) and \(G_T=(V_T,E_T)\) such that \(G_S \cap G_T\) contains G as an induced subgraph? In this work, we divide the problem into two cases, depending on whether or not G is an induced subgraph of \(G_S\) or \(G_T\). We show an \(\mathcal{O}(n^{2})\) time algorithm that generates the smallest θ-powers of paths \(G_S\) and \(G_T\) (with respect to θ and the number of vertices) that contain G as an induced subgraph. Finally, we discuss the problem when G is an induced subgraph of neither \(G_S\) nor \(G_T\), and we present a method for finding the smallest power of a path when G is a cycle \(C_n\).
1 Introduction
Due to recent research on genetic mapping, a large amount of information is available and stored in the databases of research centers around the world. Processing these data in order to obtain relevant biological conclusions is one of the challenges in biology. One way to structure these data is genome comparison, i.e., the search for similarities and differences between two or more organisms. This paper addresses a problem in this area by asking: given a set of genes in a genome, called a cluster, is it always possible to find two genomes such that their intersection is exactly this cluster? First, we present the model of Adam et al. [1] and Sankoff and Xu [8], which is used throughout this paper.
A marker is a gene with a known location on a chromosome. Let \(V_X\) be the set of n markers in the genome X. These markers are partitioned among a number of total orders called chromosomes. For markers g and h in \(V_X\) on the same chromosome in X, let \(gh \in E_X\) if the number of genes intervening between g and h in X is less than θ, where θ≥1 is a fixed neighborhood parameter. We call \(G_X=(V_X,E_X)\) a θ-adjacency graph if its edges are determined by a neighborhood parameter θ.
Consider the θ-adjacency graphs \(G_S=(V_S,E_S)\) and \(G_T=(V_T,E_T)\) with a non-empty set of common vertices \(V_{ST}=V_S \cap V_T\). We say that a subset \(V \subseteq V_{ST}\) is a generalized adjacency cluster if it consists of the vertices of a maximal connected subgraph of \(G_{ST}=(V_{ST},E_S \cap E_T)\). We denote by \(G=G_{ST}[V]\) the subgraph induced by the set V.
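The two definitions above can be illustrated with a minimal Python sketch. The function names and the toy genomes are hypothetical, and markers are assumed to be unique within a genome; "fewer than θ intervening genes" is read as positions that differ by at most θ on the same chromosome.

```python
from itertools import combinations


def theta_adjacency_edges(chromosomes, theta):
    """Edge set E_X: pairs of markers on one chromosome with fewer than
    theta genes between them (positions differ by at most theta)."""
    edges = set()
    for chrom in chromosomes:
        for i, j in combinations(range(len(chrom)), 2):
            if j - i - 1 < theta:  # j - i - 1 genes lie strictly between
                edges.add(frozenset((chrom[i], chrom[j])))
    return edges


def generalized_adjacency_clusters(genome_s, genome_t, theta):
    """Vertex sets of the connected components of (V_ST, E_S & E_T)."""
    common = ({g for c in genome_s for g in c}
              & {g for c in genome_t for g in c})
    shared = (theta_adjacency_edges(genome_s, theta)
              & theta_adjacency_edges(genome_t, theta))
    neighbours = {v: set() for v in common}
    for edge in shared:
        a, b = edge
        neighbours[a].add(b)
        neighbours[b].add(a)
    clusters, seen = [], set()
    for start in sorted(common):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first search from `start`
            v = stack.pop()
            if v not in component:
                component.add(v)
                stack.extend(neighbours[v] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Toy genomes (hypothetical data), one chromosome each.
genome_s = [["a", "b", "c", "d", "e"]]
genome_t = [["a", "c", "b", "x", "e", "d"]]
clusters = generalized_adjacency_clusters(genome_s, genome_t, theta=1)
print(clusters)
```

In this toy input, marker x occurs only in the second genome, so it is excluded from \(V_{ST}\) but still counts as an intervening gene there.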
Let G=(V(G),E(G)) be a graph with vertex set V(G) and edge set E(G), such that |V(G)|=n. Let v, \(\bar{v} \in V(G)\). The distance between vertices v and \(\bar{v}\), denoted by \(d_{G}(v,\bar{v})\), is the number of edges in a shortest path between v and \(\bar{v}\) in G. A path between two vertices \(v_1\) and \(v_t\) of graph G is a sequence of vertices \(v_1,v_2,\ldots,v_t\) such that \(v_iv_{i+1}\) is an edge of G, 1≤i≤t−1. Let \(P_n\) be a graph that is a path with n vertices. A θ-power of a path \(P_{n_{\theta}}\), denoted by \(P^{\theta}_{n_{\theta}}\), θ>0, is the graph such that \(V(P^{\theta}_{n_{\theta}}) = V(P_{n_{\theta}})\) and \(E(P_{n_{\theta}}^{\theta}) = \{v\bar{v} : d_{P_{n_{\theta}}}(v, \bar{v})\leq \theta \ \mathrm{with} \ v, \bar{v} \in V(P_{n_{\theta}}^{\theta})\}\). For brevity, we denote the power of a path \(P^{\theta}_{n_{\theta}}\) by \(P^{\theta}\). A chromosome with \(n_{\theta}\) markers in a θ-adjacency graph is analogous to a power of a path \(P^{\theta}_{n_{\theta}}\). Now, the central question of this work can be reformulated as follows:
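As a small illustration of the definition of a θ-power of a path, the following Python sketch builds its edge set from the list of path vertices. The function name `power_of_path` and the vertex labels are hypothetical choices for this example.

```python
def power_of_path(path, theta):
    """Edges of the theta-power of the path whose vertices are listed,
    in path order, in `path`: all pairs at path distance 1..theta."""
    n = len(path)
    return {
        frozenset((path[i], path[j]))
        for i in range(n)
        for j in range(i + 1, min(i + theta, n - 1) + 1)
    }


p5 = ["u1", "u2", "u3", "u4", "u5"]
square = power_of_path(p5, 2)
print(len(square))  # 4 edges at distance 1 plus 3 edges at distance 2
```

For θ≥n−1 every pair of vertices is within distance θ, so the power of the path is a complete graph.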
Question 1
Given a connected graph G, do there exist \(G_S\) and \(G_T\), two θ-powers of paths \(P_S\) and \(P_T\), whose intersection contains G as an induced subgraph?
If the answer is yes, we are also interested in finding the minimum value of the power θ and of the number of vertices \(n_{\theta}\) for these two θ-powers of paths.
In order to contribute to this challenging problem, we divide our study into two cases, depending on whether or not G is an induced subgraph of \(G_S\) or \(G_T\). First, we give some definitions. We say that G is a unit interval graph if there exists a family I of intervals (a,b) on the real line such that each v∈V(G) can be put in a one-to-one correspondence with \((a_v,b_v) \in I\); the intervals in I all have the same length; and \(v\bar{v}\) is an edge of E(G) if, and only if, \((a_{v}, b_{v}) \cap (a_{\bar{v}}, b_{\bar{v}}) \neq \emptyset\). This family of intervals is called an interval model for G. Lin et al. [6] and Soulignac [9] present a proof that the class of proper interval graphs is precisely the class of unit interval graphs. There exist linear-time recognition algorithms for unit interval graphs, for example those of Figueiredo et al. [4] and Corneil et al. [3].
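To make the definition of an interval model concrete, here is a minimal Python sketch that builds the graph of a family of open intervals of equal length. The function name, the integer endpoints and the common length 10 are assumptions chosen for illustration; two open intervals \((a, a+L)\) and \((b, b+L)\) intersect exactly when \(|a-b| < L\).

```python
def unit_interval_graph(left_endpoints, length):
    """Edges of the intersection graph of the open intervals
    (a_v, a_v + length), one interval per vertex v."""
    verts = list(left_endpoints)
    edges = set()
    for i, v in enumerate(verts):
        for w in verts[i + 1:]:
            if abs(left_endpoints[v] - left_endpoints[w]) < length:
                edges.add(frozenset((v, w)))
    return edges


# Hypothetical model: five intervals of common length 10.
model = {"v1": 0, "v2": 4, "v3": 8, "v4": 17, "v5": 26}
edges = unit_interval_graph(model, length=10)
print(sorted(sorted(e) for e in edges))
```

The resulting graph is a triangle on v1, v2, v3 followed by the path v3, v4, v5.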
Brandstädt et al. [2] and Lin et al. [5] proved independently the following structural property:
Theorem 1
A graph G is an induced subgraph of a power of a path if, and only if, G is a unit interval graph.
Thus, given a unit interval graph G with n vertices, there exists a θ-power of a path \(P_{n_{\theta}}\) that contains G as an induced subgraph. However, the proofs of the structural characterization given by Theorem 1 [2, 5] do not lead to an algorithm that constructs \(G_S\) and \(G_T\) for Question 1 with minimum values of the power θ and of the number of vertices \(n_{\theta}\).
In [6], the authors show an \(\mathcal{O}(n)\) time algorithm that inserts new intervals into a proper interval model I of a connected graph G, constructing an extended model I′ containing I. This extended model I′ gives an implicit representation of a power of a path for every proper interval graph G, but neither the number of inserted intervals nor the size of the power θ is guaranteed to be minimum. The authors also remark that any explicit representation would require \(\mathcal{O}(n^{2})\) steps.
We present in this work an \(\mathcal{O}(n^{2})\) time algorithm that generates, from a connected unit interval graph G, an explicit representation of the smallest θ-power of path, G S (with respect to θ and to the number of vertices), that contains G as an induced subgraph. Next, we construct G T , a θ-power of a path with the same number of vertices of G S , such that the intersection G S ∩G T contains G as an induced subgraph.
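For very small graphs, the notion of a smallest power of a path containing G as an induced subgraph can be explored by exhaustive search. The sketch below is not Algorithm CPP: it is a brute-force check with exponential running time, and it assumes that "smallest" means minimizing θ first and then the number of vertices. The function name, the search bounds and the example graph are hypothetical.

```python
from itertools import combinations, permutations


def smallest_power_of_path(vertices, edges, max_theta, max_n):
    """Try (theta, n) in lexicographic order and return the first pair for
    which G embeds as an induced subgraph of the theta-power of a path on
    n vertices, together with one placement; None if no pair fits."""
    edge_set = {frozenset(e) for e in edges}
    pairs = list(combinations(vertices, 2))
    for theta in range(1, max_theta + 1):
        for n in range(len(vertices), max_n + 1):
            for positions in permutations(range(n), len(vertices)):
                pos = dict(zip(vertices, positions))
                # Induced: adjacent in G exactly when within distance theta.
                if all((abs(pos[u] - pos[v]) <= theta)
                       == (frozenset((u, v)) in edge_set)
                       for u, v in pairs):
                    return theta, n, pos
    return None


# Triangle v1 v2 v3 followed by the path v3 v4 v5 (a unit interval graph).
g_vertices = ["v1", "v2", "v3", "v4", "v5"]
g_edges = [("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5")]
found = smallest_power_of_path(g_vertices, g_edges, max_theta=3, max_n=7)
print(found[:2])

# The claw is not a unit interval graph, so the search finds nothing.
claw = smallest_power_of_path(
    ["c", "x", "y", "z"], [("c", "x"), ("c", "y"), ("c", "z")], 3, 6)
print(claw)
```

The search needs one extra vertex for the example graph: with θ=2 and only five vertices, the power of the path has seven edges while G has five.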
This paper is organized as follows. In Sects. 2 and 3, we present the algorithm and we prove its correctness and complexity. In Sect. 4, we discuss the problem when G is an induced subgraph neither of G S nor of G T and we present a method of finding the smallest power of a path when graph G is a cycle C n .
2 The algorithm
Our result is based on the ordering of the vertex set of G, given by Algorithm Recognize [3], which satisfies the property proved by Roberts in [7]:
Property 2
A graph G is a unit interval graph if and only if there is an order < on its vertices such that, for every vertex v, the closed neighborhood of v is a set of consecutive vertices with respect to the order <.
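Property 2 is easy to test for a given ordering. The following Python sketch checks that every closed neighborhood occupies consecutive positions; the function name and the example graph are hypothetical, and the sketch only verifies a supplied ordering (it does not search for one, which is the job of a recognition algorithm such as the one in [3]).

```python
def satisfies_property_2(order, edges):
    """True if, for every vertex, the closed neighborhood is a set of
    consecutive vertices in the given order."""
    edge_set = {frozenset(e) for e in edges}
    index = {v: i for i, v in enumerate(order)}
    for v in order:
        closed = sorted(index[w] for w in order
                        if w == v or frozenset((v, w)) in edge_set)
        if closed[-1] - closed[0] + 1 != len(closed):
            return False  # a gap inside N[v]
    return True


g_edges = [("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5")]
good = satisfies_property_2(["v1", "v2", "v3", "v4", "v5"], g_edges)
bad = satisfies_property_2(["v1", "v3", "v2", "v4", "v5"], g_edges)
print(good, bad)
```

In the second ordering the closed neighborhood of v4 is {v3, v4, v5}, which is interrupted by v2.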
Since all powers of paths are unit interval graphs, we can insert the vertices of V(G) in the vertex set of a power of a path \(P^{\theta}_{n_{\theta}}\) until this power of a path contains G as an induced subgraph.
This construction is done by Algorithm CPP as follows. First, let \(v_1<v_2<\cdots<v_n\) be an ordering of V(G) given by Algorithm Recognize [3]. We set \(\theta_0\) to the number of vertices of the maximal clique that contains \(v_1\), minus one, and we insert the vertices of this clique in \(P^{\theta_{0}}\). Algorithm CPP constructs a sequence of powers of paths \(P^{\theta_{0}} \subset P^{\theta_{1}} \subset \cdots \subset P^{\theta_{l-1}} \subset P^{\theta_{l}}\) such that \(\theta_i=\theta_{i-1}+1\).
Let v be the first vertex non-adjacent to v1 in the order on V(G). If v is adjacent to v2, Algorithm CPP must insert v in the vertex of \(P^{\theta_{0}}\) that is at distance θ0+1 from vertex v1 in \(P^{\theta_{0}}\). Similarly, if v is not adjacent to v t , but is adjacent to vt+1, Algorithm CPP must insert v in the vertex of \(P^{\theta_{0}}\) that is at a distance θ0+1 from vertex v t in \(P^{\theta_{0}}\). This is done by inserting t−1 vertices between the vertex of largest index adjacent to v1 and v in \(P^{\theta_{0}}\). Now, suppose that there exist at least two vertices v, \(\bar{v}\) that are not adjacent to v1 and adjacent to v2. Let \(\bar{v}\) be the second vertex of this set. In order to minimize the number of vertices of \(P^{\theta_{0}}\), vertex \(\bar{v}\) must be a vertex of \(P^{\theta_{0}}\) at distance θ0+2 of vertex v1 in \(P^{\theta_{0}}\). Then Algorithm CPP must call Procedure SHIFT to increase θ0 to θ1:=θ0+1 because of the edge \(\bar{v}v_{2}\). On the other hand, this increase adds several edges in \(P^{\theta_{0}}\) which are not in E(G). Thus, Procedure SHIFT adjusts the power of a path \(P^{\theta_{0}}\) for the new θ1, by inserting vertices in \(P^{\theta_{0}}\) in order to preserve the adjacencies and non-adjacencies between vertices of G and generates a new \(P^{\theta_{1}}\). Algorithm CPP proceeds until all vertices of V(G) are included in \(P_{n_{\theta}}^{\theta}\), a smallest power of a path with respect to θ and n θ .
Before describing Algorithm CPP, we borrow some definitions from [3]. Given an ordering of V(G) returned by Algorithm Recognize [3], \(\mathrm{order}_{G}(v)\) is the position of vertex v in this ordering; \(\xi_{G}(v) = \max \{\mathrm{order}_{G}(\bar{v}) : \bar{v} \in N_{G}[v]\}\) and \(\eta_{G}(v) = \min \{\mathrm{order}_{G}(\bar{v}) : \bar{v} \in N_{G}[v]\}\), where \(N_G[v]=\{w \in V(G) : vw \in E(G)\} \cup \{v\}\). Let v∈V(G) and \(u \in V(P^{\theta})\). We refer to \(\mathrm{order}_{P^{\theta}}(v)\) as the position of vertex v in the ordering of the vertex set of \(P^{\theta}\), i.e., \(\mathrm{order}_{P^{\theta}}(v)= i\) if \(u_i=v\) in \(P^{\theta}\). We denote \(\xi_{P^{\theta}}(u)= \max \{\mathrm{order}_{P^{\theta}}(\bar{u}) : \bar{u} \in N_{P^{\theta}}[u]\}\) and \(\eta_{P^{\theta}}(u) = \min \{\mathrm{order}_{P^{\theta}}(\bar{u}) : \bar{u} \in N_{P^{\theta}}[u]\}\).
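The quantities \(\xi_G\) and \(\eta_G\) can be computed directly from an ordering and an edge list. The Python sketch below uses 1-based positions, as in the text; the function name and the example graph are hypothetical.

```python
def xi_eta(order, edges):
    """Return (xi, eta): for each vertex v, the largest and smallest
    1-based position of a vertex in the closed neighborhood N[v]."""
    position = {v: i + 1 for i, v in enumerate(order)}
    closed = {v: {v} for v in order}
    for a, b in edges:
        closed[a].add(b)
        closed[b].add(a)
    xi = {v: max(position[w] for w in closed[v]) for v in order}
    eta = {v: min(position[w] for w in closed[v]) for v in order}
    return xi, eta


order = ["v1", "v2", "v3", "v4", "v5"]
g_edges = [("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5")]
xi, eta = xi_eta(order, g_edges)
print(xi, eta)
```

For this example \(\xi_G(v_1)=3\), which gives the initial power \(\theta_0 = \xi_G(v_1) - 1 = 2\) used for the first clique.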
Next, we present Algorithm CPP and Procedure SHIFT.
Algorithm CONSTRUCTING_POWER_OF_PATH (CPP)
Procedure SHIFT receives as input a smallest power of a path \(P^{\theta}\) that contains \(G[v_1,\ldots,v_{l-1}]\), \(\xi_G(v_1)+1 \leq l \leq n\), as an induced subgraph. The power \(P^{\theta}\) also contains the last vertex \(v_l\) inserted by Algorithm CPP. Vertex \(v_l\) triggers Procedure SHIFT because \(v_l\) is not adjacent to some vertex \(v_{l-t}\) in \(P^{\theta}\), although \(v_{l-t}v_l \in E(G)\).
Procedure SHIFT
Algorithm CPP returns \(P_{n_{\theta}}^{\theta}\), the smallest power of a path (with respect to θ and \(n_{\theta}\)) that contains the unit interval graph G as an induced subgraph. We construct two powers of paths, \(G_T=(V_T,E_T)\) and \(G_S=(V_S,E_S)\), from \(P_{n_{\theta}}^{\theta}\) as follows. First, \(V_{T}= V_{S} = V(P^{\theta}_{n_{\theta}})\). Then, the vertices of \(V_T\) that are not in V receive labels different from those of the vertices in \(V(P_{n_{\theta}}^{\theta})\).
We show an example of a unit interval graph G in Fig. 1. For this graph G, Algorithm CPP returns \(G_S\), the 2-power of the path \(P_S=v_1,v_2,v_3,0,v_4,v_5\). Then, \(G_T\) is the 2-power of the path \(P_T=v_1,v_2,v_3,v_b,v_4,v_5\).
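The relabeling step can be checked on the two paths of this example. In the Python sketch below, the extra vertex is given the hypothetical labels "a" in \(P_S\) and "b" in \(P_T\); since Fig. 1 is not reproduced here, the graph G is taken to be whatever the two paths induce on \(v_1,\ldots,v_5\), which is an assumption.

```python
def power_of_path(path, theta):
    """Edges of the theta-power of the path listed in `path`."""
    n = len(path)
    return {
        frozenset((path[i], path[j]))
        for i in range(n)
        for j in range(i + 1, min(i + theta, n - 1) + 1)
    }


# "a" and "b" are hypothetical labels for the extra, non-shared vertices.
p_s = ["v1", "v2", "v3", "a", "v4", "v5"]
p_t = ["v1", "v2", "v3", "b", "v4", "v5"]
g_s = power_of_path(p_s, 2)
g_t = power_of_path(p_t, 2)

# Edges touching "a" or "b" cannot be in both graphs, so they drop out.
common_edges = g_s & g_t
print(sorted(sorted(e) for e in common_edges))
```

Because the extra vertices carry different labels in the two paths, every edge incident to them disappears from the intersection, and only the adjacencies among \(v_1,\ldots,v_5\) remain.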
3 Proofs
In this section, we present the proofs of correctness of the Procedure SHIFT (Lemma 1) and Algorithm CPP (Theorem 4).
Lemma 1
Let \(P^{\theta}\) be a smallest power of a path that contains \(G_{l-1}=G[v_1,\ldots,v_{l-1}]\) as an induced subgraph, with respect to the ordering \(v_1<\cdots<v_{l-1}\). Let \(v_l \in V(G)\) be the next vertex inserted in \(P^{\theta}\), with \(v_{l-t-1}v_{l} \not\in E(G)\), \(v_{l-t}v_{l} \in E(G)\) and \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l})= \theta+1\). Then the output of Procedure SHIFT, the power of a path \(P^{\theta+1}\), is a smallest power of a path that contains \(G_l=G[v_1,\ldots,v_{l-1},v_l]\) as an induced subgraph, with respect to the ordering \(v_1<\cdots<v_{l-1}<v_l\).
Proof
Since \(v_{l-t-1}v_{l} \not\in E(G)\), vl−tv l ∈E(G) and \(\theta+1=\allowbreak d_{P_{n_{\theta}}}(v_{l-t}, v_{l})\), the Procedure SHIFT must increase the power θ by one unit (Step 1). But the increase of θ to θ+1 creates several adjacencies in Pθ between pairs of vertices of the set {v1,…,v l } that are non-adjacent in G. In order to preserve the adjacencies and non-adjacencies between vertices of G in Pθ, Procedure SHIFT is forced to insert one vertex between the vertex that received vl−t−1 in Pθ and its consecutive vertex in Pθ. Again, counting in descending order from vertex vl−t−1, the adjacencies were violated in each “block” of θ vertices in Pθ. So, the procedure must insert one vertex to each θ+1 vertices in descending order, from vertex vl−t−1 in Pθ. We observe that the set formed by the initial vertices of V(Pθ) has cardinality less than or equal to θ+1, because dividing \(\mathrm {order}_{P^{\theta}}(v_{l-t-1}) \ \) by θ+1 the remainder is greater than or equal to 1 and less than or equal to θ+1.
In each step, the procedure inserts the smallest number of vertices necessary to guarantee that the power of a path Pθ+1, created by Procedure SHIFT, contains G l [v1,…,v l ] as an induced subgraph. So, the power θ+1 and the number of inserted vertices are minimum and, consequently, Pθ+1 is a smallest power of a path that contains G l [v1,…,v l ] as an induced subgraph. □
First, we prove that Algorithm CPP correctly returns a smallest power of a path according to the ordering given by Algorithm Recognize [3].
Lemma 2
Let G be a connected unit interval graph. Algorithm CPP generates the smallest power of a path \(P_{n_{\theta}}^{\theta}\), with respect to θ and \(n_{\theta}\), that contains G as an induced subgraph according to the ordering \(v_1<\cdots<v_n\) given as input to CPP.
Proof
Algorithm CPP constructs a sequence of powers of paths \(P^{\theta_{0}} \subseteq P^{\theta_{1}} \subseteq \cdots \subseteq P^{\theta}\), where θ i =θi−1+1. This is done by successively adding, in each \(P^{\theta_{i}}\), vertices of G following the input ordering, preserving the adjacencies and non-adjacencies between vertices of G and minimizing θ and n θ . Initially, the power of a path \(P^{\theta_{0}}\) receives the maximal clique containing v1, i.e., \(V(P^{\theta_{0}}) =\{u_{1}, \ldots, u_{\xi_{P^{\theta_{0}}}(v_{1})}\}\) and θ0=ξ G (v1)−1. This is the smallest power of a path that contains \(G[v_{1}, \ldots, v_{\xi_{G}(v_{1})}]\) as an induced subgraph.
Suppose that the first l−1 vertices, i.e., \(\{v_1,\ldots,v_{l-1}\}\), have already been inserted by Algorithm CPP in the power of a path \(P^{\theta}\), i.e., \(P^{\theta}\) is the smallest power of a path, with respect to θ and \(n_{\theta}\), that contains \(G[v_1,\ldots,v_{l-1}]\) as an induced subgraph. Let \(v_l \in V(G)\) be the next vertex to be inserted by Algorithm CPP in \(P^{\theta}\). Suppose that \(v_l\) is adjacent, in \(P^{\theta}\), to \(\{v_{l-t},\ldots,v_{l-1}\}\). Vertex \(v_l\) must be inserted in \(P^{\theta}\) between positions \(\xi_{P^{\theta}}(v_{l-t-1})+1\) and \(\xi_{P^{\theta}}(v_{l-t})\) so that \(G[v_1,\ldots,v_l]\) is an induced subgraph of \(P^{\theta}\). Then, \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) \geq \theta+1\) and \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). We consider two cases with respect to the adjacencies of \(v_l\) in G. From now on, we refer to Figs. 2, 3 and 4, where dashed lines represent adjacencies.
Case 1: If t=θ, then after insertion of v l , \(\theta + 1 \leq d_{P_{n_{\theta}}}(v_{l-t-1},v_{l})\), because the set {vl−t,…,vl−1} has t=θ elements (see Fig. 2). In order to minimize θ and n θ , Algorithm CPP must insert v l in the consecutive vertex to vl−1 in the power of a path Pθ, and as a consequence \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta + 1\). In effect, since vl−1 is adjacent to vl−t in Pθ, by hypothesis, vl−1 was inserted in Pθ such that \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) \leq \theta\) and v l was inserted in the consecutive vertex to vl−1 in Pθ, then the claim is true. If \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\), Algorithm CPP inserted v l without changing θ, the number of vertices of Pθ became n θ +1, and so this insertion was minimum. If \(d_{P_{n_{\theta}}}(v_{l-t},v_{l}) = \theta+1\), Algorithm CPP called the Procedure SHIFT and, by Lemma 1, we conclude the proof.
Case 2: If 1<t<θ, Algorithm CPP must insert v l in Pθ such that \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) \geq \theta + 1\) so that vl−t−1 and v l are not adjacent. We observe the position of vl−1 in Pθ. If vl−1 is not adjacent to vl−t−1 in Pθ (see Fig. 3), in order to minimize the number of vertices of Pθ, Algorithm CPP inserts v l in the consecutive vertex to vl−1. By hypothesis, vertex vl−1 was inserted in Pθ so that \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1})\leq \theta\). Then, if \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) < \theta\), we have \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). Thus v l was inserted in Pθ without changing θ, the number of vertices of Pθ became n θ +1, and so this insertion was minimum. If \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l-1}) = \theta\), we have \(d_{P_{n_{\theta}}}(v_{l-t},v_{l}) = \theta + 1\), Procedure SHIFT was called and, by Lemma 1, we conclude the proof.
If vl−1 is adjacent to vl−t−1 in Pθ (see Fig. 4), the position of vl−t−1 in Pθ is between l−t+1 and \(\xi_{P^{\theta}}(v_{l-t-1})\), including them. Again, in order to minimize the number of vertices of Pθ, vertex v l is inserted \((\xi_{P^{\theta}}(v_{l-t-1}) - \mathrm {order}_{P^{\theta}}(v_{l-1}))\) vertices after vertex vl−1 in Pθ. Thus,
Since \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) < d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta +1\), we have \(d_{P_{n_{\theta}}}(v_{l-t}, v_{l}) \leq \theta\). So, v l was inserted in Pθ without changing θ, and the number of vertices of Pθ became \(n_{\theta}+(\xi_{P^{\theta}}(v_{l-t-1}) - \mathrm {order}_{P^{\theta}}(v_{l-1}))+1\). This insertion was minimum, because \(d_{P_{n_{\theta}}}(v_{l-t-1}, v_{l}) = \theta +1\).
This concludes the proof of the Lemma 2. □
In order to show that Algorithm CPP returns the smallest power of a path containing G as an induced subgraph, we present two results about a given power of a path \(P^{\sigma}\) containing G as an induced subgraph. First, we recall some notation from [3]. Given a unit interval graph G and a unit interval model \(I=\{v_1,v_2,\ldots,v_n\}\) associated to its vertices, we recall that the interval associated to vertex v is \((a_v,b_v)\). We say that \(v_1,v_2,\ldots,v_n\) is a natural labeling for the vertices of G if \(a_{v_{i}} \leq a_{v_{i+1}}\) for each 1≤i≤n−1. The ordering \(v_1<v_2<\cdots<v_n\) is a natural ordering if \(v_1,v_2,\ldots,v_n\) is a natural labeling for V(G). A vertex is a left anchor if it can receive the label \(v_1\) in some natural labeling for V(G). Consider the model I′ obtained by mirroring a unit interval model I (that is, replacing each interval (a,b) by (−b,−a)). Model I′ is also a valid unit interval model for G, so the rightmost interval in I is also a left anchor.
In the next results, we show properties of the ordering of V(G) induced by a natural ordering that is generated by the subscripts of a natural labeling of a power of a path.
Lemma 3
Let \(P_{n_{\sigma}}^{\sigma}\) be a power of a path that contains G as an induced subgraph. The ordering of the vertices of V(G) induced by a natural ordering of the vertices of \(V(P^{\sigma})\) satisfies Property 2.
Proof
Suppose that this ordering of V(G) does not satisfy Property 2. Then, there exist three vertices \(v_r,v_s,v_t \in V(G)\) such that \(v_r<v_s<v_t\) with \(v_{r}v_{s}\not\in E(G)\) and \(v_rv_t \in E(G)\). It follows that \(v_rv_t \in E(G) \subset E(P^{\sigma})\). Therefore, \(1\leq |\mathrm{order}_{P^{\sigma}}(v_{r}) - \mathrm{order}_{P^{\sigma}}(v_{t})|\leq \sigma\). Since \(v_r<v_s<v_t\) in \(V(P^{\sigma})\), we have \(|\mathrm{order}_{P^{\sigma}}(v_{r}) - \mathrm{order}_{P^{\sigma}}(v_{s})|\leq |\mathrm{order}_{P^{\sigma}}(v_{r}) - \mathrm{order}_{P^{\sigma}}(v_{t})|\), and then \(1\leq |\mathrm{order}_{P^{\sigma}}(v_{r}) - \mathrm{order}_{P^{\sigma}}(v_{s})|\leq \sigma\). Consequently, \(v_rv_s \in E(P^{\sigma})\) and \(v_{r}v_{s} \not\in E(G)\), i.e., \(P_{n_{\sigma}}^{\sigma}\) does not contain G as an induced subgraph, a contradiction. □
Vertices \(v, \overline{v} \in V(G)\) are indistinguishable vertices (twin vertices) if \(N_{G}[v] = N_{G}[\overline{v}]\). The next result states that indistinguishable vertices of V(G) can exchange positions in a natural ordering of \(V(P^{\sigma})\).
Lemma 4
Let v, \(\overline{v} \in V(G)\) be such that \(N_{G}[v] = N_{G}[\overline{v}]\), with \(v=u_i\) and \(\overline{v} = u_{j}\) in \(V(P^{\sigma})\). If we exchange the positions of vertices v and \(\overline{v}\) in \(P^{\sigma}\), i.e., \(v=u_j\) and \(\overline{v} = u_{i}\), then G is still an induced subgraph of \(P^{\sigma}\).
Proof
Without loss of generality, suppose i<j. By Lemma 3, the ordering of V(G) induced by a natural ordering of V(Pσ) satisfies Property 2. So, \(N_{G}[v]=\{v_{\eta_{G}(v)},\ldots, v_{\xi_{G}(v)}\}\) and \(N_{G}[\overline{v}]=\{v_{\eta_{G}(\overline{v})}, \ldots,v_{\xi_{G}(\overline{v})}\}\). Since \(N[v] = N[\overline{v}]\), we have \(\xi_{G}(\overline{v}) =\xi_{G}(v)\) and \(\eta_{G}(\overline{v}) = \eta_{G}(v)\). Then, \(v_{\xi_{G}(v)} =v_{\xi_{G}(\overline{v})}\), \(v_{\eta_{G}(v)} = v_{\eta_{G}(\overline{v})}\), \(v_{\xi_{G}(v)+1}=v_{\xi_{G}(\overline{v})+1}\) and \(v_{\eta_{G}(v)-1} = v_{\eta_{G}(\overline{v})-1}\). Thus, by changing the positions of vertices v and \(\overline{v}\) in Pσ, we have \(\mathrm {order}_{P^{\sigma}}(\overline{v}) - \mathrm {order}_{P^{\sigma}}(v_{\eta_{G}(\overline{v})-1}) \geq \sigma +1\); then edge \(\overline{v}v_{\eta_{G}(\overline{v})-1} \not\in E(P^{\sigma})\). Also, for any v′∈V(G) with \(\mathrm {order}_{G}(v') < \mathrm {order}_{G}(v_{\eta_{G}(\overline{v})-1})\), edge \(\overline{v}v' \not\in E(P^{\sigma})\). Similarly, \(\mathrm {order}_{P^{\sigma}}(v_{\xi_{G}(v)+1}) - \mathrm {order}_{P^{\sigma}}(v)\geq \sigma + 1\), i.e., edge \(vv_{\xi_{G}(v)+1} \not\in E(P^{\sigma})\) and also, for any v′∈V(G) with \(\mathrm {order}_{G}(v_{\xi_{G}(v)+1}) < \mathrm {order}_{G}(v')\), edge \(vv' \not\in E(P^{\sigma})\).
Analogously, \(\sigma \geq \mathrm {order}_{P^{\sigma}}(\overline{v})- \mathrm {order}_{P^{\sigma}}(v_{\eta_{G}(v)})\), i.e., edge \(\overline{v}v_{\eta_{G}(\overline{v})} \in E(P^{\sigma})\) and, for any v′∈V(G) with \(\mathrm {order}_{G}(v_{\eta_{G}(\overline{v})}) < \mathrm {order}_{G}(v') < \mathrm {order}_{G}(\overline{v})\), edge \(\overline{v}v' \in E(P^{\sigma})\). Similarly \(\sigma \geq \mathrm {order}_{P^{\sigma}}(v_{\xi_{G}(\overline{v})}) - \mathrm {order}_{P^{\sigma}}(v)\), i.e., edge \(vv_{\xi_{G}(v)} \in E(P^{\sigma})\) and, for any v′∈V(G) with \(\mathrm {order}_{G}(v) < \mathrm {order}_{G}(v') <\mathrm {order}_{G}(v_{\xi_{G}(v)})\), edge vv′∈E(Pσ). □
In what follows, we write \(v_i <_B v_j\) if \(\mathrm{order}_G(v_i)<\mathrm{order}_G(v_j)\) in the ordering of V(G) given by Algorithm Recognize [3]. Before proving Theorem 4, we need two results.
Theorem 3
(Theorem 2.2 [3])
Let I be a unit interval model of a unit interval graph G with natural labeling \(v_1,\ldots,v_n\). Then, for all vertices \(\bar{v},\ v \in V(G)\), if \(a_{\bar{v}} < a_{v}\) but \(v <_{B} \bar{v}\), we have \(N_{G}[v]=N_{G}[\bar{v}]\).
As a consequence of Theorem 2.3 of [3], we have the following result.
Lemma 5
([3])
Let \(v'_{1} <_{B} v'_{2} <_{B} \cdots <_{B} v'_{n}\) be an ordering of V(G) given by Algorithm Recognize [3] for a unit interval graph G. Given a natural labeling \(v_1,\ldots,v_n\), then \(N_{G}[v'_{1}]=N_{G}[v_{1}]\) or \(N_{G}[v'_{1}]=N_{G}[v_{n}]\).
Finally, the correctness of Algorithm CPP is given by the theorem below.
Theorem 4
Let G be a unit interval graph. Algorithm CPP returns the smallest power of a path \(P^{\theta}_{n_{\theta}}\), with respect to θ and \(n_{\theta}\), that contains G as an induced subgraph.
Proof
Let \(P^{\sigma}_{n_{\sigma}}\) be the smallest power of a path that contains G as an induced subgraph. Let \(\overline{u}_{1} < \cdots < \overline{u}_{n_{\sigma}}\) be a natural ordering of V(Pσ) and let \(\overline{v}_{1} < \cdots < \overline{v}_{n}\) be the ordering of V(G) induced by the natural ordering of V(Pσ). Clearly, \(\overline{v}_{1}, \ldots, \overline{v}_{n}\) is a natural labeling of V(G). Let I be a family of intervals for this labeling of V(G), such that each v∈V(G) is associated to (a v ,b v )∈I.
If we prove that \(\overline{v}_{1} < \overline{v}_{2} < \cdots < \overline{v}_{n}\) is equal to \(v'_{1} <_{B}v'_{2} <_{B} \cdots <_{B} v'_{n}\) up to indistinguishable vertices, we have θ=σ and \(n_{\theta}=n_{\sigma}\). In fact, since \(P^{\sigma}\) is the smallest power of a path that contains G as an induced subgraph, σ≤θ and \(n_{\sigma} \leq n_{\theta}\). On the other hand, by Lemma 2, the power of a path \(P^{\theta}\) returned by Algorithm CPP is the smallest power of a path that contains G as an induced subgraph with respect to the ordering \(v'_{1}<_{B} v'_{2} <_{B} \cdots <_{B} v'_{n}\). So, if this ordering is equal to \(\overline{v}_{1} < \overline{v}_{2} < \cdots < \overline{v}_{n}\) up to indistinguishable vertices, then, by Lemma 4, \(P^{\sigma}\) contains G as an induced subgraph with respect to the ordering \(v'_{1} <_{B} v'_{2} <_{B}\cdots <_{B} v'_{n}\). Then, by minimality of θ and \(n_{\theta}\) with respect to \(v'_{1} <_{B} v'_{2} <_{B}\cdots <_{B} v'_{n}\), we have σ≥θ and \(n_{\sigma} \geq n_{\theta}\).
First, suppose that the left anchor \(\overline{v}_{1}\) is equal to \(v'_{1}\). Suppose, for the sake of contradiction, that there exist \(v, \ \tilde{v} \in V(G)\) such that \(v < \tilde{v}\), \(\tilde{v} <_{B} v\) and \(N_{G}[v] \neq N_{G}[\tilde{v}]\). Since \(v < \tilde{v}\), we have \(a_{v} \leq a_{\tilde{v}}\). If \(a_{v} = a_{\tilde{v}}\), then, since all intervals of I have the same length, \(b_{v} = b_{\tilde{v}}\) and hence \(N_{G}[v] =N_{G}[\tilde{v}]\), a contradiction. If \(a_{v} < a_{\tilde{v}}\), then, since \(\tilde{v} <_{B} v\), Theorem 3 gives \(N_{G}[v] = N_{G}[\tilde{v}]\), again a contradiction. Thus, for every pair of vertices \(v, \tilde{v} \in V(G)\) such that \(v < \tilde{v}\) and \(\tilde{v} <_{B} v\), we have \(N_{G}[v] = N_{G}[\tilde{v}]\). Consequently, σ=θ and \(n_{\sigma}=n_{\theta}\).
Now, suppose that the left anchor \(\overline{v}_{1}\) is different from \(v'_{1}\). By Lemma 5, either \(N_{G}[\overline{v}_{1}] = N_{G}[v'_{1}]\) or \(N_{G}[\overline{v}_{n}] =N_{G}[v'_{1}]\). If \(N_{G}[\overline{v}_{1}] = N_{G}[v'_{1}]\), by Lemma 4, we can change the positions of these vertices in V(Pσ), i.e., \(\overline{u}_{\mathrm {order}_{P^{\sigma}}(v'_{1})}=\overline{v}_{1}\) and \(\overline{u}_{\mathrm {order}_{P^{\sigma}}(\overline{v}_{1})} = v'_{1}\) and G will still be an induced subgraph of Pσ. After this change \(v'_{1} < \overline{v}_{2} < \cdots <\overline{v}_{1} < \cdots < \overline{v}_{n}\) is the new ordering of V(G) induced by the ordering of V(Pσ). We repeat the same argument used in the previous case, where \(\overline{v}_{1}\) is equal to \(v'_{1}\) and we conclude the proof. If \(N_{G}[\overline{v}_{n}] = N_{G}[v'_{1}]\), since \(\overline{v}_{n}\) is the left anchor of the natural labeling \(\overline{v}_{n} < \overline{v}_{n-1} <\cdots < \overline{v}_{1}\) of V(G) induced by the natural ordering \(\overline{u}_{n_{\sigma}} < \cdots < \overline{u}_{1}\) of V(Pσ) then, we can repeat the previous argument for the natural labeling \(\overline{v}_{n} < \overline{v}_{n-1} < \cdots < \overline{v}_{1}\) and so we conclude the proof. □
The Algorithm CPP analyzes each vertex of G in the ordering returned by Algorithm Recognize [3] a single time. In the worst case, the Algorithm CPP calls Procedure SHIFT for each vertex v l ∈V(G) only once. Since for each vertex v l the Procedure SHIFT analyzes the set of vertices of G l at most once, the complexity of the Algorithm CPP is \(\mathcal{O}(n^{2})\).
4 G is not an induced subgraph of G S and G T
If we relax the constraint that G must be an induced subgraph of G S or G T then even for unit interval graphs it is possible to find two powers of paths, whose intersection contains G as an induced subgraph, smaller than the answer given by Algorithm CPP. See an example in Fig. 5.
If G is a unit interval graph, then G contains no induced claw (Fig. 6), \(S_3\) (Fig. 7), \(\overline{S}_{3}\) (Fig. 8) or cycle \(C_n\), n≥4. If G is a cycle \(C_n\), n≥4, then the smallest θ-powers of paths \(G_S\) and \(G_T\) such that \(G_S \cap G_T\) contains \(C_n\) as an induced subgraph can be obtained as follows. First, we construct \(G_S\): for \(1 \leq j \leq \lceil \frac{n}{2}\rceil\), \(u_{2j-1}:=v_j\); and for \(1 \leq j \leq \lfloor \frac{n}{2}\rfloor\), \(u_{2j}:=v_{n+1-j}\). Now, we construct \(G_T\): for \(1 \leq j \leq \lceil \frac{n}{2}\rceil\), \(w_{2j-1}:=v_{j+1}\); and for \(1 \leq j \leq \lfloor \frac{n}{2}\rfloor\), \(w_{2j}:=v_k\), where \(k = (n+2-j) \operatorname{mod} n\) and \(v_0\) is read as \(v_n\). See an example when G is \(C_6\) in Fig. 9.
Theorem 5
Let \(G_S\) and \(G_T\) be the 2-powers of paths with n vertices constructed by the previous method. Then \(G_S \cap G_T\) is \(C_n\), n≥4.
Proof
Let G S be the 2-power of path P S =u1,…,u n , and let G T be the 2-power of path P T =w1,…,w n constructed by the previous method. Since the distance between consecutive vertices of G in G S (resp. G T ) is less than or equal to 2, G S (resp. G T ) contains G as subgraph.
For each v i ∈C n , \(i \in \{2, \ldots, \lceil\frac{n}{2}\rceil,\lceil\frac{n}{2}\rceil +2, \ldots, n\}\), and 3≤j≤n−2, if u j =v i with j odd then wj−2=v i ; if j is even, we have wj+2=v i .
Now, let v i ∈C n , if u j =v i , 3≤j≤n−2 with j odd (resp. even), then wj−2=v i (resp. wj+2=v i ), and its neighbors uj−1=v k =wj−1+2 (resp. uj−1=vk−1=wj−1−2) and uj+1=vk−1=wj+1+2 (resp. uj+1=v k =wj+1−2). We conclude that \(d_{P_{S}}(v_{i},v_{k})= d_{P_{S}}(v_{i}, v_{k-1}) = 1\), \(d_{P_{T}}(v_{i}, v_{k}) = 3\) and \(d_{P_{T}}(v_{i}, v_{k-1}) = 5\), i.e., v i v k ,v i vk−1∈E(G S ) and \(v_{i}v_{k},v_{i}v_{k-1} \not \in E(G_{T})\). Hence, \(v_{i}v_{k},v_{i}v_{k-1} \not \in G_{S} \cap G_{T}\). □
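The construction of Sect. 4 and the statement of Theorem 5 can be checked numerically for small n. The Python sketch below represents vertex \(v_i\) by the integer i and assumes that a residue k=0 in the formula for \(w_{2j}\) denotes \(v_n\); the function names are hypothetical, and the loop is a sanity check for 4≤n≤12, not a proof.

```python
def power_of_path(path, theta):
    """Edges of the theta-power of the path listed in `path`."""
    n = len(path)
    return {
        frozenset((path[i], path[j]))
        for i in range(n)
        for j in range(i + 1, min(i + theta, n - 1) + 1)
    }


def cycle_paths(n):
    """Vertex orders of P_S and P_T for the cycle v_1 ... v_n (as 1..n)."""
    u = [None] * (n + 1)  # 1-based; index 0 unused
    w = [None] * (n + 1)
    for j in range(1, (n + 1) // 2 + 1):  # 1 <= j <= ceil(n/2)
        u[2 * j - 1] = j
        w[2 * j - 1] = j + 1
    for j in range(1, n // 2 + 1):        # 1 <= j <= floor(n/2)
        u[2 * j] = n + 1 - j
        k = (n + 2 - j) % n
        w[2 * j] = k if k != 0 else n     # assumption: residue 0 means v_n
    return u[1:], w[1:]


def cycle_edges(n):
    """Edges of C_n on vertices 1..n."""
    return {frozenset((i, i % n + 1)) for i in range(1, n + 1)}


checks = {}
for n in range(4, 13):
    p_s, p_t = cycle_paths(n)
    checks[n] = (power_of_path(p_s, 2) & power_of_path(p_t, 2)) == cycle_edges(n)

print(cycle_paths(6))
print(checks)
```

For n=6 the two orders are \(v_1,v_6,v_2,v_5,v_3,v_4\) and \(v_2,v_1,v_3,v_6,v_4,v_5\): each 2-power contains three chords of the cycle, and the two sets of chords are disjoint.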
5 Conclusion
In this work, we developed an \(\mathcal{O}(n^{2})\) time algorithm that generates, from a connected unit interval graph G, an explicit representation of the smallest θ-power of path G S (with respect to θ and to the number of vertices) that contains G as an induced subgraph. We construct G T , a θ-power of a path with the same number of vertices of G S , such that the intersection G S ∩G T contains G as an induced subgraph.
We remark that θ can be greater than or equal to the size of a maximum clique of the graph G, ω(G). We present in Fig. 10 an example where G has ω(G)=4 and Algorithm CPP returns θ=5, but the difference between θ and ω(G) can be greater than 1.
When G is an induced subgraph of neither \(G_S\) nor \(G_T\), we show a method that generates \(G_S\) and \(G_T\), 2-powers of paths with n vertices, whose intersection is \(C_n\), n≥4.
As future work, we intend to investigate this problem for other classes of graphs. We remark that all remaining forbidden induced subgraphs of unit interval graphs (Figs. 6, 7 and 8), have answer YES to Question 1.
For a Claw graph, we see that G S is the 2-power of path P S =v2,a,v1,v3,v4; and G T is the 2-power of path P T =v3,b,v1,v2,v4. For a 3-sun graph, we find that G S and G T are 4-powers of paths P S =v5,a,b,v4,v6,v3,v1,v2 and P T =v1,v6,x,v5,v2,v4,y,v3, respectively. For a Net graph, we see that G S and G T are 2-powers of paths P S =v4,v2,v1,v5,v3,a,v6 and P T =v4,b,v1,v5,v3,v2,v6, respectively.
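The three pairs of paths listed above can be intersected mechanically. The Python sketch below computes \(G_S \cap G_T\) for each pair and prints its degree sequence; since Figs. 6, 7 and 8 are not reproduced here, the comparison with the claw, the 3-sun and the net is made through edge counts and degree sequences rather than through the figures' vertex labels, which is an assumption of this check.

```python
def power_of_path(path, theta):
    """Edges of the theta-power of the path listed in `path`."""
    n = len(path)
    return {
        frozenset((path[i], path[j]))
        for i in range(n)
        for j in range(i + 1, min(i + theta, n - 1) + 1)
    }


def degree_sequence(edges):
    """Sorted list of vertex degrees of the graph given by `edges`."""
    degree = {}
    for edge in edges:
        for v in edge:
            degree[v] = degree.get(v, 0) + 1
    return sorted(degree.values())


# (theta, P_S, P_T) as given in the text; a, b, x, y are the extra vertices.
examples = {
    "claw": (2, ["v2", "a", "v1", "v3", "v4"],
                ["v3", "b", "v1", "v2", "v4"]),
    "3-sun": (4, ["v5", "a", "b", "v4", "v6", "v3", "v1", "v2"],
                 ["v1", "v6", "x", "v5", "v2", "v4", "y", "v3"]),
    "net": (2, ["v4", "v2", "v1", "v5", "v3", "a", "v6"],
               ["v4", "b", "v1", "v5", "v3", "v2", "v6"]),
}

intersections = {
    name: power_of_path(p_s, theta) & power_of_path(p_t, theta)
    for name, (theta, p_s, p_t) in examples.items()
}
for name, edges in intersections.items():
    print(name, len(edges), degree_sequence(edges))
```

The intersections have 3, 9 and 6 edges respectively: a star centered at \(v_1\), a triangle \(v_2v_4v_6\) with each of \(v_1, v_3, v_5\) joined to two of its vertices, and a triangle \(v_1v_3v_5\) with one pendant vertex attached to each of its vertices.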
References
1. Adam Z, Choi V, Sankoff D, Zhu Q (2008) Generalized gene adjacencies, graph bandwidth and clusters in yeast evolution. In: Lecture Notes in Bioinformatics, vol 4983, pp 134–145
2. Brandstädt A, Hundt C, Mancini F, Wagner P (2010) Rooted directed path graphs are leaf powers. Discrete Math 310:897–910
3. Corneil DG, Kim H, Natarajan S, Olariu S, Sprague A (1995) Simple linear time recognition of unit interval graphs. Inf Process Lett 55:99–104
4. Figueiredo CMH, Meidanis J, Mello CP (1995) A linear-time algorithm for proper interval graph recognition. Inf Process Lett 56:179–184
5. Lin MC, Rautenbach D, Soulignac FJ, Szwarcfiter JL (2011) Powers of cycles, powers of paths, and distance graph. Discrete Appl Math 159:621–627
6. Lin MC, Soulignac FJ, Szwarcfiter JL (2009) Short models for unit interval graphs. Electron Notes Discrete Math 35:247–255
7. Roberts FS (1968) Representations of indifference relations. Stanford University, Stanford
8. Sankoff D, Xu X (2008) Tests for gene clusters satisfying the generalized criterion. Lect Notes Comput Sci 5167:152–160
9. Soulignac FJ (2010) On proper and helly circular-arc graphs. Universidad de Buenos Aires, Buenos Aires
Acknowledgements
This research was supported by CNPq and FAPERJ.
We are really grateful to professor Jayme Szwarcfiter for having presented to us the paper [5] in the very beginning of our work and for fruitful discussions on this topic. We are also thankful to the anonymous referees for their careful reading and valuable contributions.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Cite this article
Costa, V., Dantas, S., Sankoff, D. et al. Gene clusters as intersections of powers of paths. J Braz Comput Soc 18, 129–136 (2012). https://doi.org/10.1007/s13173-012-0064-8
DOI: https://doi.org/10.1007/s13173-012-0064-8