1 Introduction

Given a query string s and a directed graph G whose vertices are labeled with strings (referred to as a labeled graph), the matching and the approximate matching of s to G ask for a path (not necessarily simple) in G that represents s, that is, by concatenating the labels of the vertices on the path we obtain s or an approximate occurrence of s.

The matching and the approximate matching of a query string to a labeled graph have applications in different areas, from graph databases and data mining to genome research. The problems were introduced in the context of pattern matching in hypertext [1, 3, 10, 15], but have recently found new applications. Indeed, in computational biology a representation of variants of related sequences is often provided by a labeled graph [11, 17], and the query of a string in a labeled graph has found application in computational pan-genomics [13, 18].

The exact matching problem is known to be in P [1, 3, 15]. Furthermore, conditional lower bounds for this problem have recently been given in [7].

The approximate string to graph matching problem, referred to as String to Graph Approximate Matching, has the goal of minimizing the number of edit operations (of the query string or of the labels of the graph) such that there exists a path p in G whose labels match the query string. String to Graph Restricted Approximate Matching denotes the variant where edit operations are allowed only on the graph labels. String to Graph Approximate Matching and String to Graph Restricted Approximate Matching are known to be NP-hard [12], even for a binary alphabet [9]. When the edit operations are allowed only on the query string, String to Graph Approximate Matching is polynomial-time solvable [9]. Moreover, when the input graph is a Directed Acyclic Graph (DAG), String to Graph Approximate Matching and String to Graph Restricted Approximate Matching are polynomial-time solvable [10].

In this contribution, we consider the String to Graph Approximate Matching problem and the String to Graph Restricted Approximate Matching problem, with the goal of deepening the understanding of their complexity. Notice that the edit operations we consider are symbol substitutions of the graph labels or of the query string. Other variants with different edit operations have been considered in the literature [3, 9].

We introduce a variant of String to Graph Restricted Approximate Matching, called String to Graph Compatibility Matching, that asks whether it is possible to find an occurrence of a query string in a graph with any number of edit operations of the graph labels. This decision problem is helpful to characterize whether a feasible solution of String to Graph Restricted Approximate Matching exists or not. We show in Sect. 3 that String to Graph Compatibility Matching is NP-complete, even when the labels of the graph have length one or when the alphabet is binary. The reduction also shows that String to Graph Compatibility Matching, when parameterized by the length of the query string, is unlikely to have a polynomial kernel (for details on kernelization we refer to [6, 14]). A consequence of the intractability of String to Graph Compatibility Matching is that String to Graph Restricted Approximate Matching cannot be approximated within any factor in polynomial time. Notice that if we allow edit operations of the query string, then the existence of a path that represents an approximate matching of the query string can be decided in polynomial time. Indeed, it is enough to check whether the input graph G contains a (not necessarily simple) path p that represents a string of length |s|.
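The polynomial-time check mentioned above can be sketched as a simple dynamic program over prefix lengths. The following Python fragment is only an illustration under our own assumptions: the function name `has_walk_of_length`, the dictionary-based graph encoding, and the requirement that every label is non-empty are not part of the original presentation.

```python
def has_walk_of_length(labels, edges, target):
    """Decide whether some walk of the labeled graph spells exactly `target` symbols.

    labels: dict vertex -> non-empty label string (assumption of this sketch)
    edges:  iterable of directed edges (u, v)
    """
    assert all(len(lab) > 0 for lab in labels.values())
    succ = {v: [] for v in labels}
    for u, v in edges:
        succ[u].append(v)
    # ends[t] = vertices v such that some walk ending in v spells exactly t symbols
    ends = [set() for _ in range(target + 1)]
    for v, lab in labels.items():
        if len(lab) <= target:
            ends[len(lab)].add(v)
    for t in range(1, target + 1):
        for u in ends[t]:
            for v in succ[u]:
                nt = t + len(labels[v])  # nt > t because labels are non-empty
                if nt <= target:
                    ends[nt].add(v)
    return bool(ends[target])
```

If such a walk exists, substituting the symbols of the query string position by position yields an approximate matching, which is why no further condition has to be tested in this variant.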

We consider in Sect. 4 the parameterized complexity of String to Graph Restricted Approximate Matching and of String to Graph Approximate Matching and we show that they are W[2]-hard when parameterized by the number of edit operations, even for a labeled graph having distance one from a DAG. This result shows that, while String to Graph Restricted Approximate Matching and String to Graph Approximate Matching are solvable in polynomial time when the labeled graph is a DAG [10], even for graphs that are very close to DAG they become hard. The reduction designed to prove this latter result allows us to show that String to Graph Approximate Matching is not approximable within factor \(\varOmega (\log (|V|))\) and \(\varOmega (\log (|s|))\), for a labeled graph \(G=(V,E)\) and a query string s.

In Sect. 5, we provide a fixed-parameter tractable algorithm for String to Graph Compatibility Matching, when parameterized by the size of the query string and when the graph labels have length one. We conclude the paper in Sect. 6 with some open problems, while in Sect. 2 we introduce some definitions and the problems we are interested in. Some of the proofs are not included due to the page limit.

2 Definitions

Given an alphabet \(\varSigma \) and a string s over \(\varSigma \), we denote by |s| the length of s, by s[i], with \(1 \le i \le |s|\), the i-th symbol of s and by s[i, j], with \(1 \le i \le j \le |s|\), the substring of s that starts at position i and ends at position j.

Every graph we consider in this paper is directed. Given a graph \(G=(V,E)\) and a vertex \(v \in V\), we define \(N^+(v)= \{ u \in V: (v,u) \in E \}\) and \(N^-(v)= \{ w \in V: (w,v) \in E \}\).

A labeled graph \(G = (V,E,\sigma )\) is a graph whose vertices are labeled with strings, formally assigned by a labeling function \(\sigma : V \rightarrow \varSigma ^*\), where \(\varSigma \) is an alphabet of symbols. Notice that \(\sigma (v)\), with \(v \in V\), denotes the string associated by \(\sigma \) to vertex v. Let \(p = v_1 v_2 \dots v_z\) be a path (not necessarily simple) in G; the set of vertices that induces p is denoted by V(p) and the string associated with p is defined as \(\sigma (p) = \sigma (v_1) \sigma (v_2) \dots \sigma (v_z)\), that is, \(\sigma (p)\) is obtained by concatenating the strings that label the vertices of path p.

Consider a string s on alphabet \(\varSigma \) and a labeled graph \(G=(V,E,\sigma )\). We say that a path p in G is an occurrence of s if \(\sigma (p) = s\); in this case we call \(\sigma (p)\) an exact matching of s and we say that p matches s.

An edit operation of a string s is a substitution of the symbol in a position i, with \(1 \le i \le |s|\), of s with a different symbol in \(\varSigma \). An edit operation of \(G=(V,E, \sigma )\) is an edit operation of a string \(\sigma (v)\), with \(v \in V\). A path p in G is an approximate matching of s if, after \(k_1 \ge 0\) edit operations of labels of G, \(\sigma (p) = s'\), where \(s'\) is a string obtained with \(k_2 \ge 0\) edit operations of s. In this case, we say that the approximate matching requires \(k = k_1 + k_2\) edit operations. We say that p in G is a restricted approximate matching of s if, after \(k \ge 0\) edit operations to labels of G, \(s = \sigma (p)\) (that is, the edit operations are allowed only on the labels of G).
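To make the definition of restricted approximate matching concrete, the following Python sketch computes, for a given path, the number of label substitutions that turn it into an occurrence of s. It is a minimal illustration, not part of the original presentation: the name `restricted_cost` and the dictionary encoding of \(\sigma \) are our assumptions. The sketch relies on two observations that follow from the definition: substitutions never change the length of a label, and a vertex visited several times by a non-simple path still carries a single label.

```python
def restricted_cost(labels, edges, path, s):
    """Number of label substitutions needed so that `path` spells `s`, or None.

    labels: dict vertex -> label string; edges: iterable of (u, v);
    path: list of vertices (not necessarily simple); s: query string.
    """
    edge_set = set(edges)
    if any((u, v) not in edge_set for u, v in zip(path, path[1:])):
        return None  # not a path of G
    if sum(len(labels[v]) for v in path) != len(s):
        return None  # substitutions cannot change the length of sigma(p)
    required = {}  # vertex -> label it must carry after the edits
    pos = 0
    for v in path:
        piece = s[pos:pos + len(labels[v])]
        pos += len(piece)
        if required.setdefault(v, piece) != piece:
            return None  # v is visited twice and would need two different labels
    # each edited position of a label is counted once, however often v is visited
    return sum(a != b
               for v, piece in required.items()
               for a, b in zip(labels[v], piece))
```

A return value of `None` means that the path is not a restricted approximate matching of s for any number of edit operations, while a return value of 0 corresponds to an exact matching.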

Consider a path p that matches (exactly, approximately or restricted approximately) the query string s. If position i, \(1 \le i \le |s|\), in s and the j-th position, \(1 \le j \le |\sigma (u)|\), of the label of vertex u in p match (possibly after an edit operation), we say that position i is mapped in \(\sigma (u)[j]\); if \(|\sigma (u)|=1\), by slightly abusing the notation, we say that position i is mapped in u.

Next, we define the first combinatorial problem we are interested in.

Problem 1

String to Graph Approximate Matching

Input: A labeled graph \(G=(V,E, \sigma )\) and a query string s, both on alphabet \(\varSigma \).

Output: An approximate matching p of s that requires the minimum number of edit operations.

We define now the variant of the problem, called String to Graph Restricted Approximate Matching, where edit operations are allowed only on the labels of the labeled graph.

Problem 2

String to Graph Restricted Approximate Matching

Input: A labeled graph \(G=(V,E, \sigma )\) and a query string s, both on alphabet \(\varSigma \).

Output: A restricted approximate matching p of s that requires the minimum number of edit operations.

Consider a labeled graph \(G=(V,E,\sigma )\) and a query string s over \(\varSigma \). If there exists a path p in G which is a restricted approximate matching of s, we say that p is compatible with s. Notice that the definition of compatibility does not put any bound on the number of edit operations of graph labels and that no edit operation is allowed on the query string. In this paper, we introduce a decision problem, called String to Graph Compatibility Matching, related to String to Graph Restricted Approximate Matching, that asks whether there exists a path in \(G=(V,E,\sigma )\) compatible with s.

Problem 3

String to Graph Compatibility Matching

Input: A labeled graph \(G=(V,E, \sigma )\), a query string s, both on alphabet \(\varSigma \).

Output: Does there exist a path in G that is compatible with s?

3 Hardness of String to Graph Compatibility Matching

In this section we consider the computational complexity of String to Graph Compatibility Matching and we prove that the problem is indeed NP-complete and it is unlikely to admit a polynomial kernel. This result, as discussed in Theorem 3, is not only interesting to characterize the complexity of String to Graph Compatibility Matching, but also to give insights into the approximation complexity of String to Graph Restricted Approximate Matching.

We start by proving that String to Graph Compatibility Matching is NP-complete when the labels of the graph have length one, via a reduction from the \(h\)-Path problem. The reduction is inspired by that in [3] to prove the NP-hardness of String to Graph Restricted Approximate Matching. Then we modify the reduction so that it holds also for binary alphabet. We recall the definition of \(h\)-Path, which is known to be NP-complete [8].

Problem 4

\(h\)-Path

Input: A directed graph \(G_L=(V_L,E_L)\).

Output: Does there exist a simple path in \(G_L\) of length h?

3.1 Graph Labels of Length One

Given a graph \(G_L=(V_L,E_L)\), with \(V_L=\{ v_1^l, \dots , v_n^l \}\), which is an instance of \(h\)-Path, we define an instance of String to Graph Compatibility Matching consisting of a labeled graph \(G=(V,E, \sigma )\) and a query string s.

First, define the alphabet \(\varSigma \) as follows: \( \varSigma = \{ x_i : 1 \le i \le n \} \cup \{ y_i : 1 \le i \le h \}.\)

The labeled graph \(G=(V,E, \sigma )\) is defined as follows:

$$ V = \{ v_i : v_i^l \in V_L, 1 \le i \le n \}, E = \{ (v_i,v_j) : (v_i^l, v_j^l) \in E_L \}. $$

The labelling function \(\sigma : V \rightarrow \varSigma ^*\) of the graph vertices is defined as follows: \(\sigma (v_i) = x_i \text {, for each } i \text { with } 1 \le i \le n.\)

Finally, we define the query string \(s = y_1 y_2 \dots y_h\).
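The construction can be sketched in Python as follows. This is only an illustration under our own encoding choices: vertices are the integers 1..n, symbols \(x_i\) and \(y_j\) are the tuples `("x", i)` and `("y", j)`, and the brute-force helpers `has_compatible_walk` and `has_simple_path` (exponential, meant for tiny graphs only) are hypothetical names introduced to compare the two instances.

```python
from itertools import product

def reduce_h_path(n, edges_l, h):
    """Build (V, E, sigma, s) from an h-Path instance on vertices 1..n."""
    vertices = list(range(1, n + 1))
    edges = list(edges_l)
    sigma = {i: [("x", i)] for i in vertices}    # sigma(v_i) = x_i
    s = [("y", j) for j in range(1, h + 1)]      # s = y_1 y_2 ... y_h
    return vertices, edges, sigma, s

def walks(vertices, edges, length):
    """Enumerate all walks with `length` vertices (brute force)."""
    edge_set = set(edges)
    for w in product(vertices, repeat=length):
        if all((a, b) in edge_set for a, b in zip(w, w[1:])):
            yield w

def has_compatible_walk(vertices, edges, s):
    """Labels have length one and may be rewritten freely, but each vertex
    keeps a single label, so repeated visits must ask for the same symbol."""
    def compatible(w):
        wanted = {}
        return all(wanted.setdefault(v, sym) == sym for v, sym in zip(w, s))
    return any(compatible(w) for w in walks(vertices, edges, len(s)))

def has_simple_path(vertices, edges, h):
    """Simple path with h vertices, found by brute force."""
    return any(len(set(w)) == h for w in walks(vertices, edges, h))
```

Since the symbols of s are pairwise distinct, a walk is compatible with s exactly when no vertex is repeated, which is the observation used in the correctness proof.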

The following lemma allows us to prove the hardness of String to Graph Compatibility Matching.

Lemma 1

Let \(G_L=(V_L, E_L)\) be a graph instance of \(h\)-Path and let \((G=(V,E,\sigma ),s)\) be the corresponding instance of String to Graph Compatibility Matching. There exists a simple path of length h in \(G_L\) if and only if there exists a path in G compatible with s.

Proof

Consider a simple path \(v_{i_1}^l v_{i_2}^l \dots v_{i_h}^l\) in \(G_L\). Then consider the corresponding path \(v_{i_1} v_{i_2} \dots v_{i_h}\) in G and edit the symbol of each vertex \(v_{i_j}\), with \(1 \le j \le h\), so that it is associated with symbol \(y_j\). It follows that the edited path matches s. Then \(v_{i_1} v_{i_2} \dots v_{i_h}\) is a path of G compatible with s.

Consider a path \(p= v_{i_1} v_{i_2} \dots v_{i_h}\) in G compatible with s. Notice that p must be a simple path, since s consists of h distinct symbols. As a consequence, the corresponding path \(v_{i_1}^l v_{i_2}^l \dots v_{i_h}^l\) in \(G_L\) is a simple path of length h.    \(\square \)

Lemma 1 and the NP-completeness of \(h\)-Path [8] allow us to prove the following result.

Theorem 1

String to Graph Compatibility Matching is NP-complete even when the labels of the graph have length one.

Notice that the reduction we have described is also a Polynomial Parameter Transformation [5] from \(h\)-Path parameterized by h to String to Graph Compatibility Matching parameterized by |s|, as \(|s|=h\). Since \(h\)-Path when parameterized by h does not admit a polynomial kernel unless \(NP \subseteq coNP/Poly\) [4], the reduction leads to the following result.

Corollary 1

The String to Graph Compatibility Matching problem parameterized by |s| does not admit a polynomial kernel unless \(NP \subseteq coNP/Poly\) even when the labels of the graph have length one.

3.2 Binary Alphabet

Next, we show that the String to Graph Compatibility Matching problem is NP-complete even on a binary alphabet. The reduction is similar to the reduction of Sect. 3.1, except for the definition of the query string s and the labeling \(\sigma : V \rightarrow \varSigma ^*\) of the labeled graph.

Given a graph \(G_L=(V_L,E_L)\), with \(V_L=\{ v_1^l, \dots , v_n^l \}\), that is an instance of \(h\)-Path, we define a corresponding instance \((G=(V,E, \sigma ),s)\) of String to Graph Compatibility Matching. The alphabet is binary, hence \(\varSigma = \{ 0,1 \}\). Next, we define the labeled graph \(G=(V,E, \sigma )\). The sets V of vertices and E of edges are defined as in Sect. 3.1. For each \(v_i \in V\), with \(1 \le i \le n\), \(\sigma (v_i) = 0^{h}\), namely it is a string consisting of h occurrences of symbol 0.

The construction of the query string s requires the introduction of strings \(s_i\), with \(1 \le i \le h\), having length h and defined as follows:

$$ s_i[i] = 1; \qquad \quad s_i[j] = 0, \text { with } 1 \le j \le h \text { and } j \ne i. $$

Finally, s is defined as the concatenation of \(s_1\), \(s_2\), \(\dots \), \(s_h\), that is \(s = s_1\ s_2\ \dots s_h\).
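The binary construction can be illustrated with a short Python sketch. It assumes that the query string is the concatenation of the h blocks \(s_i\) defined above (one block per vertex of the sought path); the function name `binary_instance` and the integer encoding of the vertices are ours.

```python
def binary_instance(n, edges_l, h):
    """Build the binary-alphabet instance from an h-Path instance on vertices 1..n."""
    sigma = {i: "0" * h for i in range(1, n + 1)}          # sigma(v_i) = 0^h
    # s_i has a single 1 in position i (1-based) and 0 elsewhere
    blocks = ["0" * (i - 1) + "1" + "0" * (h - i) for i in range(1, h + 1)]
    s = "".join(blocks)
    return sigma, list(edges_l), s, blocks
```

For instance, with h = 3 the blocks are 100, 010 and 001. Every block has the same length as every vertex label, so each block is mapped to exactly one vertex of a compatible path, and the blocks are pairwise distinct, so no vertex can be reused.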

Next, we prove the correctness of the reduction.

Lemma 2

Let \(G_L=(V_L, E_L)\) be a graph instance of \(h\)-Path and let \((G=(V,E,\sigma ),s)\) be the corresponding instance of String to Graph Compatibility Matching on binary alphabet. There exists a simple path of length h in \(G_L\) if and only if there exists a path compatible with s in G.

Proof

Consider a simple path \(v_{i_1}^l v_{i_2}^l \dots v_{i_h}^l\) in \(G_L\). Then consider the corresponding path \(v_{i_1} v_{i_2} \dots v_{i_h}\) in G and edit the label of each vertex \(v_{i_j}\), with \(1 \le j \le h\), so that it is associated with string \(s_j\). Then the resulting string is an exact match of s, hence \(v_{i_1} v_{i_2} \dots v_{i_h}\) is a path compatible with s.

Consider a path \(p = v_{i_1} v_{i_2} \dots v_{i_h}\) in G that is compatible with s. Since \(\sigma (p)\) must match s after some symbol substitutions and, by construction, \(|\sigma (v_j)| = |s_l|\), for each \(1 \le j \le n\) and \(1 \le l \le h\), it follows that the positions of \(s_l\), \(1 \le l \le h\), are mapped to the positions of \(\sigma (v_{i_t})\), for some t with \(1 \le t \le h\). Moreover, since \(s_l \ne s_q\), with \(l \ne q\), all the vertices in p are distinct and p is a simple path in G of length h. As a consequence the corresponding path \(v_{i_1}^l v_{i_2}^l \dots v_{i_h}^l\) in \(G_L\) is a simple path of length h, thus concluding the proof.    \(\square \)

Thus, based on Lemma 2, we can prove the following result.

Theorem 2

String to Graph Compatibility Matching is NP-complete even on binary alphabet.

The results of Theorems 1 and 2 have a consequence not only on the complexity of String to Graph Compatibility Matching, but also on the approximation of String to Graph Restricted Approximate Matching.

Theorem 3

The String to Graph Restricted Approximate Matching problem cannot be approximated within any factor in polynomial time, unless P = NP, even when the labels of the graph have length one or when the alphabet is binary.

Proof

The NP-completeness of String to Graph Compatibility Matching implies that, given an instance \((G=(V,E,\sigma ), s)\), even deciding whether there exists a feasible solution of String to Graph Restricted Approximate Matching, with any number of edit operations in G, is NP-complete. Hence if there exists a polynomial-time approximation algorithm \(\mathcal {A}\) for String to Graph Restricted Approximate Matching, with some approximation factor \(\alpha \), it follows that \(\mathcal {A}\) can be used to decide the String to Graph Compatibility Matching problem: if \(\mathcal {A}\) returns an approximate solution for String to Graph Restricted Approximate Matching with input (G, s), then there exists a path in G compatible with s; if \(\mathcal {A}\) does not return an approximate solution for String to Graph Restricted Approximate Matching with input (G, s), then there is no path in G compatible with s. Since String to Graph Compatibility Matching is NP-complete when the labels of the graph have length one (by Theorem 1) and on a binary alphabet (by Theorem 2), there does not exist a polynomial-time approximation algorithm with any approximation factor for String to Graph Restricted Approximate Matching when the graph labels have length one or when the alphabet is binary, unless P = NP.    \(\square \)

4 Hardness of Parameterization

In this section, we consider the parameterized complexity of String to Graph Restricted Approximate Matching and String to Graph Approximate Matching. The reduction we present allows us to prove that String to Graph Restricted Approximate Matching and String to Graph Approximate Matching, when parameterized by the number of edit operations, are W[2]-hard for a labeled graph having distance one from a DAG. Moreover, the same reduction will allow us to prove that String to Graph Approximate Matching is not approximable within factor \(\varOmega (\log (|V|))\) and \(\varOmega (\log (|s|))\).

We prove these results by presenting a reduction from the Minimum Set Cover problem that is both parameterized [6, 14] and approximation preserving [19]. We recall here the definition of Minimum Set Cover.

Problem 5

Minimum Set Cover

Input: A collection \(C=\{S_1, \dots , S_m\}\) of sets over a universe \(U=\{ u_1, \dots , u_n\}\).

Output: A subcollection \(C'\) of C of minimum cardinality such that for each \(u_i \in U\), with \(1 \le i \le n\), there exists a set in \(C'\) containing \(u_i\).

First, we focus on String to Graph Restricted Approximate Matching, then we show that the same reduction can be applied to String to Graph Approximate Matching.

Given an instance (U, C) of Minimum Set Cover, in the following we define an instance \((G=(V,E, \sigma ),s)\) of String to Graph Restricted Approximate Matching (see Fig. 1 for an example). We start by defining the alphabet \(\varSigma \):

$$ \varSigma = \{ x_i: 0 \le i \le m \} \cup \{ y_i: 1 \le i \le n \} \cup \{ z \}. $$

Then, we define the labeled graph \(G=(V,E, \sigma )\):

$$ V = \{v_i : 0 \le i \le m\} \cup \{ v_{i,j}: 1 \le i \le m, 1 \le j \le |S_i| \} $$
$$\begin{aligned} E= & {} \{(v_0 ,v_i) : 1 \le i \le m \} \cup \{ (v_i, v_{i,j}): 1 \le i \le m, 1 \le j \le |S_i| \} \\&\cup \,\{ (v_{i,j},v_0): 1 \le i \le m, 1 \le j \le |S_i| \}. \end{aligned}$$

Now, we define the labeling \(\sigma \) of the vertices of G:

  • \(\sigma (v_i)= x_i\), \(0 \le i \le m\)

  • \(\sigma (v_{i,l})= y_j\), \(1 \le i \le m\), \(1 \le l \le |S_i|\) and \(1 \le j \le n\), where the l-th element of \(S_i\) is \(u_j\) (based on some ordering of the elements in \(S_i\))

The query string s is defined as follows: \(s = x_0\ z\ y_1\ x_0\ z\ y_2 \dots x_0\ z\ y_n\).
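The construction above can be sketched in Python as follows. The encoding is our own assumption: vertex \(v_i\) is the tuple `("v", i)`, vertex \(v_{i,l}\) is `("v", i, l)`, symbols \(x_i\) and \(y_j\) are `("x", i)` and `("y", j)`, and z is `("z",)`. The helpers `path_from_cover` and `edited_vertices` are hypothetical names that illustrate how a cover translates into a path starting in \(v_0\) whose only edited labels belong to the vertices \(v_i\) of the chosen sets.

```python
Z = ("z",)

def set_cover_instance(universe, sets):
    """universe: list [u_1, ..., u_n]; sets: list of lists [S_1, ..., S_m]."""
    m = len(sets)
    index = {u: j for j, u in enumerate(universe, start=1)}
    sigma = {("v", i): ("x", i) for i in range(m + 1)}      # sigma(v_i) = x_i
    edges = [(("v", 0), ("v", i)) for i in range(1, m + 1)]
    for i, S in enumerate(sets, start=1):
        for l, u in enumerate(S, start=1):
            w = ("v", i, l)
            sigma[w] = ("y", index[u])     # the l-th element of S_i is u_j
            edges.append((("v", i), w))
            edges.append((w, ("v", 0)))
    s = []
    for j in range(1, len(universe) + 1):
        s += [("x", 0), Z, ("y", j)]       # s = x_0 z y_1 x_0 z y_2 ...
    return sigma, edges, s

def path_from_cover(universe, sets, cover):
    """cover: 1-based indices of the chosen sets (must cover the universe)."""
    path = []
    for u in universe:
        i = next(i for i in cover if u in sets[i - 1])
        l = sets[i - 1].index(u) + 1
        path += [("v", 0), ("v", i), ("v", i, l)]
    return path

def edited_vertices(sigma, path, s):
    """Vertices of the path whose label differs from the symbol mapped to them."""
    return {v for v, sym in zip(path, s) if sigma[v] != sym}
```

On the instance \(S_1=\{ u_1, u_3, u_4\}\), \(S_2 = \{ u_2, u_3\}\), \(S_3 = \{u_2, u_4\}\), the cover \(\{S_1, S_2\}\) yields a path in which only the labels of \(v_1\) and \(v_2\) are edited (to z), that is, two edit operations.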

Fig. 1. A labeled graph G and a query string s associated with the following instance of Minimum Set Cover: \(U=\{u_1, u_2, u_3, u_4\}\); \(S_1=\{ u_1, u_3, u_4\}\), \(S_2 = \{ u_2, u_3\}\), \(S_3 = \{u_2, u_4\}\). Inside each vertex we represent its label.

First, we prove that the labeled graph G has distance one from a DAG, that is, by removing a single vertex of G (namely, \(v_0\)), we obtain a DAG.

Lemma 3

Let (C, U) be an instance of Minimum Set Cover and let \((G=(V,E,\sigma ),s)\) be the corresponding instance of String to Graph Restricted Approximate Matching. Then, G has distance one from a DAG.

Next, we present the main result to prove the correctness of the reduction.

Lemma 4

Let (C, U) be an instance of Minimum Set Cover and let \((G=(V,E,\sigma ),s)\) be the corresponding instance of String to Graph Restricted Approximate Matching. There exists a cover \(C'\) of U of cardinality \(h < n\) if and only if there exists a solution of String to Graph Restricted Approximate Matching that requires h edit operations.

Proof

We present only one direction of the proof. Consider a path p in G such that p is a restricted approximate matching of s requiring at most h edit operations of the labels of vertices in p. First, we prove some properties of G. If \(v_0\) is removed from G, then the resulting graph \(G'\) contains paths consisting of at most 2 vertices. Since \(|s|= 3n\), there is no path in \(G'\) that can be a restricted approximate matching of s. This implies that at least one position of s is mapped in \(v_0\).

Now, assume that the first vertex of p is not \(v_0\). Assume that the first position of s is mapped in \(v_i\), for some i with \(1 \le i \le m\). By construction, \(p = v_i\ v_{i,j}\ v_0\ v_l\ v_{l,t}\ v_0 \dots \), since \(N^+(v_i) = \{ v_{i,j}: 1 \le j \le |S_i| \}\), \(N^+(v_{i,j}) = \{ v_0 \}\) and \(N^+(v_0) = \{ v_i: 1 \le i \le m \}\). Then each occurrence of a symbol \(y_q\), \(1 \le q \le n\), in s is mapped in \(v_0\), while the symbol associated with \(v_0\) can be at most one of \(y_1 ,\dots , y_n\); thus there is no path in G that starts with a vertex \(v_i\) and that is a restricted approximate matching of s.

Assume that the first vertex of p is some vertex \(v_{i,j}\), with \(1 \le i \le m\) and \(1 \le j \le |S_i|\). By construction, \(p = v_{i,j}\ v_0\ v_l\ v_{l,t}\ v_0 \dots \). Hence each position of s containing z is mapped in vertex \(v_0\), while each position of s containing \(y_t\), \(1 \le t \le n\), is mapped in a vertex \(v_q\), with \(1 \le q \le m\). This last mapping requires \(n > h\) edit operations of labels of vertices of G, violating the hypothesis that at most \(h < n\) edit operations are applied.

We can conclude that if p is a restricted approximate matching of s requiring \(h < n\) edit operations, then \(v_0\) must be the first vertex of p. It follows that each label of a vertex \(v_i\), \(1 \le i \le m\), in path p must be edited to z. Consider the case that position t of s, \(1 \le t \le |s|\), where \(s[t] = y_q\), \(1 \le q \le n\), is mapped to some vertex \(v_{i,j}\), with \(1 \le i \le m\) and \(1 \le j \le |S_i|\), such that \(\sigma (v_{i,j}) \ne y_q\), and that hence the label of \(v_{i,j}\) is edited to \(y_q\). Let \(v_a\), with \(1 \le a \le m\), be the vertex that precedes \(v_{i,j}\) in p. Then, we can modify p, so that the number of edit operations is not increased, by replacing \(v_a\) with a vertex \(v_b\), with \(1 \le b \le m\), and \(v_{i,j}\) with \(v_{b,l}\), with \(1 \le l \le |S_b|\), so that \(\sigma (v_{b,l}) = y_q\), and by editing the label of \(v_b\) (if it is not already edited) to z. This implies that the only vertices of p whose labels are edited are vertices \(v_i\), \(1 \le i \le m\).

Now, we can define a solution \(C'\) of Minimum Set Cover consisting of h sets as follows: \( C' = \{ S_i: \text { the label of vertex } v_i \text { in } p \text { is edited to } z , 1 \le i \le m \}. \) Since at most h labels of vertices of p are edited (to z), it follows that at most h sets belong to \(C'\). Furthermore, since each vertex with label \(y_j\), \(1 \le j \le n\), is connected to a vertex \(v_i\) in p, \(1 \le i \le m\), by construction it follows that each element of U belongs to some set in \(C'\).    \(\square \)

Based on Lemma 3 and on Lemma 4, we can prove the following result.

Theorem 4

The String to Graph Restricted Approximate Matching problem is W[2]-hard when parameterized by the number of edit operations, even when the input graph has distance one from a DAG.

Proof

Notice that, by Lemma 3, G has distance one from a DAG. The W[2]-hardness of String to Graph Restricted Approximate Matching follows from Lemma 4 and from the W[2]-hardness of Minimum Set Cover [16].    \(\square \)

Next, we show that the same reduction allows us to prove the W[2]-hardness and the inapproximability of String to Graph Approximate Matching. Essentially, we will prove that we can avoid edit operations of the query string.

Theorem 5

The String to Graph Approximate Matching problem is W[2]-hard when parameterized by the number of edit operations, even when the input graph has distance one from a DAG. Moreover, String to Graph Approximate Matching cannot be approximated within factor \(\varOmega (\log (|V|))\) and \(\varOmega (\log (|s|))\), unless \(P=NP\), even when the input graph has distance one from a DAG.

5 String to Graph Compatibility Matching Parameterized by |s|

We present a fixed-parameter algorithm for String to Graph Compatibility Matching when parameterized by |s|. We consider the case where each vertex of G is labeled with exactly one symbol (notice that in this case, by Theorem 1, String to Graph Compatibility Matching is NP-complete and, by Corollary 1, String to Graph Compatibility Matching parameterized by |s| does not admit a polynomial kernel unless \(NP \subseteq coNP/Poly\)).

We start by proving an easy property of an instance of String to Graph Compatibility Matching.

Lemma 5

\(|\varSigma | \le |s|\).

The fixed-parameter algorithm is based on the color-coding technique [2] and on dynamic programming. Consider a path p in G that is compatible with s and the set V(p) of vertices that induces p, where \(|V(p)| = k\). It holds \(k \le |s|\), since each position of s is mapped in at least one vertex of p.

We consider a coloring of V with a set of colors \(\{c_1 , \dots , c_k\}\), where, given a vertex \(v \in V\), we denote by c(v) the color assigned to v. Based on color-coding (see Definition 1), we assume that the coloring is colorful, that is each vertex of V(p) is assigned a distinct color in \(\{c_1 , \dots , c_k\}\).

Now, each color \(c_i\), with \(1 \le i \le k\), is associated by a function r: \(\{ c_1, \dots , c_k\} \rightarrow \) \(\varSigma \), with a symbol in \( \varSigma \), that represents the fact that the vertices of p that are colored by \(c_i\), with \(1 \le i \le k\), must match a position of s containing symbol \(r(c_i)\). In this case we say that p satisfies r. The algorithm iterates over the possible colorings of graph G based on a family of perfect hash functions and over the possible functions r.

Now, given a coloring of G and a function r, define a function \(M_{r}[i,v]\), with \(1 \le i \le |s|\) and \(v \in V\), as follows. \(M_{r}[i,v]\) is equal to 1 if there exists a path p of G that is compatible with s[1, i] and such that (1) position i of s is mapped in v, and (2) p satisfies r; else \(M_{r}[i,v]=0\). Notice that, since position i of s[1, i] is mapped in v, it follows that v is the last vertex of p. Next, we describe the recurrence to compute \(M_{r}[i,v]\). For \(i \ge 2\), if \(r(c(v)) \ne s[i]\), then \(M_{r}[i,v] = 0\); if \(r(c(v)) = s[i]\), then:

$$ M_{r}[i,v] = \bigvee _{u \in V:(u,v) \in E} M_{r}[i-1,u] $$

In the base case, it holds \(M_{r}[1,v] = 1\) if and only if \(r(c(v))= s[1]\), else \(M_{r}[1,v] = 0\). Next, we prove the correctness of the recurrence.
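The recurrence can be sketched in Python for a fixed coloring and a fixed function r. This is a minimal illustration, not the authors' implementation: the function name `compatibility_dp`, the dictionary encodings of c and r, and the 0-based indexing of s (so that `M[i-1][v]` plays the role of \(M_{r}[i,v]\)) are our assumptions, and s is assumed to be non-empty.

```python
def compatibility_dp(vertices, edges, s, color, r):
    """Fill the table of the recurrence for one coloring and one function r.

    color: dict vertex -> color; r: dict color -> symbol.
    Returns a list M where M[i][v] is True iff some path ending in v is
    compatible with s[0..i] and satisfies r (0-based positions).
    """
    pred = {v: [] for v in vertices}          # in-neighbours N^-(v)
    for u, v in edges:
        pred[v].append(u)
    # base case: position 1 of s is mapped in v iff r(c(v)) = s[1]
    M = [{v: r[color[v]] == s[0] for v in vertices}]
    for i in range(1, len(s)):
        prev = M[-1]
        M.append({v: r[color[v]] == s[i] and any(prev[u] for u in pred[v])
                  for v in vertices})
    return M
```

A path compatible with the whole query string and satisfying r exists exactly when some entry of the last row, `M[len(s) - 1]`, is true. Each row is computed in time proportional to the number of vertices plus the number of edges.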

Lemma 6

\(M_{r}[i,v]\) is equal to 1 if and only if there exists a path p of G that is compatible with s[1, i] and such that (1) position i of s is mapped in v, and (2) p satisfies r.

In order to compute a colorful coloring of G, we consider a perfect family of hash functions for the set of vertices of G.

Definition 1

Let \(G=(V,E, \sigma )\) be a labeled graph and let \(C=\{c_1, \dots , c_k \}\) be a set of colors. A family F of hash functions from V to C is called perfect if for each subset \(V' \subseteq V\), with \(|V'|=k\), there exists a function \(f \in F\) such that for each \(x,y \in V'\), with \(x \ne y\), \(f(x)=c_i\), \(f(y)=c_j\), with \(1 \le i,j \le k\) and \(i \ne j\).

It has been shown in [2] that a perfect family F of hash functions from V to C, having size \( 2^{O(k)}O(\log |V| )\), can be computed in time \(2^{O(k)} O( |V| \log |V|)\). From Lemma 6 and by using a perfect family of hash functions to color the vertices in G, we can prove the main result of this section.
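The overall scheme can be sketched as follows. To keep the example short, the sketch replaces the perfect family of hash functions with independent random colorings, which is the standard randomized variant of color-coding rather than the derandomized construction used above; consequently a negative answer is only correct with high probability, while a positive answer is always correct. The name `is_compatible_fpt`, the parameters `trials` and `seed`, the use of |s| colors (an upper bound on |V(p)|), and the restriction of r to the symbols occurring in s are all assumptions of this sketch. Labels are assumed to have length one, so they do not appear in the input.

```python
import random
from itertools import product

def is_compatible_fpt(vertices, edges, s, trials=200, seed=0):
    """Randomized color-coding sketch for compatibility with length-one labels."""
    rng = random.Random(seed)
    k = len(s)                                 # upper bound on |V(p)|
    alphabet = sorted(set(s))                  # only symbols of s can be matched
    pred = {v: [] for v in vertices}
    for u, v in edges:
        pred[v].append(u)
    for _ in range(trials):
        color = {v: rng.randrange(k) for v in vertices}
        # r maps color j to symbols[j]; enumerate every such function
        for symbols in product(alphabet, repeat=k):
            # alive = vertices in which the current position of s can be mapped
            alive = {v for v in vertices if symbols[color[v]] == s[0]}
            for ch in s[1:]:
                alive = {v for v in vertices
                         if symbols[color[v]] == ch
                         and any(u in alive for u in pred[v])}
            if alive:
                return True
    return False
```

For example, on a single vertex with a self-loop the query aa is compatible (the label is edited to a once), whereas ab is not, since the same vertex would need two different labels.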

Theorem 6

The String to Graph Compatibility Matching problem can be decided in time \(2^{O(|s|)} O(|s|^{|s|+1} |V|^2 \log |V|)\).

6 Conclusion

In this contribution we have presented results on the tractability of the approximate matching of a query string to a labeled graph. There are several open questions related to variants of this problem. It will be interesting to further investigate the approximability of String to Graph Approximate Matching, since it can be trivially approximated within factor |s| in polynomial time, while it cannot be approximated within factor \(\varOmega (\log (|s|))\), unless P = NP. Another interesting open question is to investigate the parameterized complexity of String to Graph Approximate Matching when the edit operations are not restricted to symbol substitutions, but include symbol insertions and deletions.