Redefining the Graph Edit Distance

Serratosa, Francesc

doi:10.1007/s42979-021-00792-5

Original Research
Open access
Published: 31 August 2021

Volume 2, article number 438, (2021)

SN Computer Science

Francesc Serratosa ORCID: orcid.org/0000-0001-6112-5913¹

This article has been updated

Abstract

Graph edit distance has been used since 1983 to compare objects in machine learning when these objects are represented by attributed graphs instead of vectors. In these cases, the graph edit distance is usually applied to deduce a distance between attributed graphs. This distance is defined as the minimum cost of the edit operations (deletion, insertion and substitution of nodes and edges) needed to transform one graph into another. Until now, it has been stated that, to make the graph edit distance a metric, the distance properties [(1) non-negativity, (2) symmetry, (3) identity and (4) triangle inequality] have to be imposed on the edit operations involved in its computation. In this paper, we show that there is no need to impose the triangle inequality on each edit operation. This is an important finding since, in pattern recognition applications, the classification ratio is usually maximized at combinations of edit costs (for the deletion, insertion and substitution of nodes and edges) for which the triangle inequality is not fulfilled.

Introduction

Attributed graphs have been used to model several kinds of problems for more than four decades [1,2,3]. This is because they can flexibly define an element through its structural and semantic information. Several reviews have been published over these years [4,5,6,7]. Figure 1 shows several examples of representing an element through an attributed graph. Applications range from optical character recognition to chemical compound toxicity analysis. Other examples are image processing, text analysis and geo-localization. If elements in pattern recognition are modelled through attributed graphs, a distance between them is needed, since comparing elements is an essential task in pattern recognition. One of the most widely used distances between attributed graphs is the graph edit distance [8,9,10,11].

Graph edit distance depends on some parameters, which have to be tuned to maximize an objective function related to the task to be carried out. In the earliest presentations of the graph edit distance in 1983 [1, 2], the graph edit distance was considered a metric subject to certain properties of these parameters. A more contemporary book published in 2015 [12] also suggests the same properties. That book analyses some pattern recognition applications and several algorithms that compute the graph edit distance in an optimal or sub-optimal manner.

As an example of how important it is to define the graph edit distance as a metric, we could mention graph databases structured by metric-trees [13]. In this case, if the function used to compare attributed graphs is not a metric, some attributed graphs could not be retrieved from the database although they had been properly inserted. Thus, the system behaves as if these graphs had never been inserted into the database.

Another example is the family of pattern recognition methods that compact a set of attributed graphs considered to belong to one class into a single representative graph [14,15,16]. The aim of this type of representation is twofold. First, it reduces the runtime of deducing the class of an element to be classified. This is because, in these methods, only one comparison is needed between the unclassified graph and each class. In contrast, in other methods, such as the k-nearest neighbours [17], the number of comparisons needed is the number of graphs in the set. Second, it increases the representational power of the set. This is because, by compacting the general features of the set, the local discrepancies between its elements tend to be softened. In this case, if the function used to compare attributed graphs is not a metric, the representative graph is not properly defined (the representative is constructed through several comparisons between the graphs in the set) and thus the quality of the pattern recognition task is reduced.

For this reason, in the previous examples, the graph edit distance parameters are set such that the graph edit distance becomes a metric. If we could somehow relax some of the distance properties while keeping the graph edit distance a metric, we could extend the parameters' domain and thus set them at a value where the query precision becomes higher (in the first example) and the representative power of the set also becomes higher (in the second example). Note that we are aware that the dissimilarity between patterns in other pattern recognition applications can be defined as non-metric. Finally, the configuration of the graph edit distance parameters for some matching algorithms is analysed in [18]. That paper demonstrates that only some graph matching algorithms return a metric even when some distance properties are not fulfilled. Thus, it concludes that the selection of the graph matching algorithm is decisive not only from the runtime point of view but also to ensure that the graph edit distance deduced between the graphs in the application is a metric.

To summarize, in this paper, we show that some properties usually imposed on the graph edit distance [1, 2, 12] can be relaxed while the graph edit distance remains a metric. This relaxation is central in several pattern recognition applications, because all these applications have an objective function to be maximized (or minimized), for instance, the recognition ratio or the classification accuracy, among others. Several machine learning methods have been presented [19,20,21,22,23] to automatically deduce the graph edit distance parameters and, frequently, the objective function of these methods is maximized where these properties do not hold. Thanks to the findings of this paper, we now know that the automatically learned parameters still make the graph edit distance a metric. The experimental section shows the classification ratio for several graph edit distance parameters and also shows that the maximum classification ratio is achieved at parameters for which we can now ensure that the graph edit distance is a metric.

Note that several algorithms have been presented to compute the graph edit distance. This is because its computation has been demonstrated to be NP-complete [24]. Basically, there are algorithms that return the exact distance at an exponential computational cost [25] and others that return an approximation of this value at a polynomial computational cost [26,27,28]. Using different edit costs does not affect the computational cost, which depends only on the algorithm itself. For instance, the computational cost of [25] is exponential, the computational cost of [26,27,28] is cubic and the computational cost of [29] is linear with respect to the number of nodes, as can be deduced from [30, 31]. For this reason, the experimental section of this paper does not show a comparison between these methods but only shows that the edit costs that return the highest classification ratios do not fulfil the triangle inequality. For a specific comparison of graph matching methods, we refer to [32].

Graph Edit Distance

A distance is a function that holds the following properties [33]:

$$\begin{aligned} 1)\;{\text{dist}}\left( {x,y} \right) & \ge 0\quad {\text{(non-negativity)}}, \\ 2)\;{\text{dist}}\left( {x,y} \right) & = {\text{dist}}\left( {y,x} \right)\quad {\text{(symmetry)}}, \\ 3)\;{\text{dist}}\left( {x,y} \right) & = 0 \Leftrightarrow x = y\quad {\text{(identity of indiscernible elements)}}, \\ 4)\;{\text{dist}}\left( {x,y} \right) & \le {\text{dist}}\left( {x,z} \right) + {\text{dist}}\left( {z,y} \right)\quad {\text{(triangle inequality)}}, \\ \end{aligned}$$

(1)

where $x$, $y$ and $z$ are elements in a specific domain. The distance takes values in $[0,\infty)$.
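The four properties in Eq. 1 can be checked numerically on a finite sample of elements. The following is a minimal sketch, not part of the original paper: the function name `metric_violations`, the sample points and the two example distances are illustrative assumptions. Passing such a check on a sample does not prove that a function is a metric, but a single failure disproves it.

```python
from itertools import product


def metric_violations(points, dist, tol=1e-12):
    """Return the names of the properties of Eq. 1 that fail on `points`."""
    failed = set()
    for x, y in product(points, repeat=2):
        d = dist(x, y)
        if d < -tol:
            failed.add("non-negativity")
        if abs(d - dist(y, x)) > tol:
            failed.add("symmetry")
        # dist(x, y) = 0 must hold exactly when x = y
        if (abs(d) <= tol) != (x == y):
            failed.add("identity")
    for x, y, z in product(points, repeat=3):
        if dist(x, y) > dist(x, z) + dist(z, y) + tol:
            failed.add("triangle inequality")
    return failed


# Hypothetical sample: real numbers compared with two candidate functions.
points = [0.0, 1.0, 2.5, 4.0]
absolute = lambda x, y: abs(x - y)    # a metric
squared = lambda x, y: (x - y) ** 2   # violates the triangle inequality

print(metric_violations(points, absolute))
print(metric_violations(points, squared))
```

On this sample the absolute difference passes all four checks, while the squared difference fails only the triangle inequality (for example, 16 > 1 + 9 for the elements 0, 1 and 4).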

An attributed graph $G = \left( {V,E} \right)$ is composed of a set of $n$ nodes $V = \left\{ {v_{1} , \ldots ,v_{n} } \right\}$ and a set of $m$ edges $E = \left\{ {e_{1,2} , \ldots ,e_{t,k} } \right\}.$ Moreover, there is a function on the nodes, $\gamma _{{\text{v}}}$, and a function on the edges, $\gamma _{{\text{e}}}$, which assign attributes to these nodes and edges. The graph edit distance, GED, between two attributed graphs $G$ and $G^{'}$ is defined as,

$${\text{GED}}\left( {G,G^{\prime}} \right) = \mathop {\min }\limits_{{\forall \left( {e_{1} ,{ }e_{2} ,{ } \ldots e_{k} } \right) \in E\left( {G,G^{\prime}} \right)}} \left\{ {\mathop {{\text{CED}}\left( {G,G^{\prime}} \right)}\limits_{{\left( {e_{1} ,{ }e_{2} ,{ } \ldots e_{k} } \right)}} } \right\}$$

(2)

where $E\left( {G,G{^{\prime}}} \right)$ denotes the set of all edit paths $\left( {e_{1} ,{ }e_{2} ,{ } \ldots e_{k} } \right)$ that transform $G$ into $G^{'}$ and the cost of the edit path is ${\text{CED}}$,

$$\begin{aligned} \mathop {{\text{CED}}\left( {G,G^{\prime}} \right)}\limits_{{\left( {e_{1} ,{ }e_{2} ,{ } \ldots e_{k} } \right)}} & = \mathop \sum \limits_{{\forall e_{t} \in {\text{vs}}}} C_{{{\text{vs}}}} \left( {e_{t} } \right) + \mathop \sum \limits_{{\forall e_{t} \in {\text{es}}}} C_{{{\text{es}}}} \left( {e_{t} } \right) + \mathop \sum \limits_{{\forall e_{t} \in {\text{vd}}}} C_{{{\text{vd}}}} \left( {e_{t} } \right) \\ & \quad + \mathop \sum \limits_{{\forall e_{t} \in {\text{ed}}}} C_{{{\text{ed}}}} \left( {e_{t} } \right) + \mathop \sum \limits_{{\forall e_{t} \in {\text{vi}}}} C_{{{\text{vi}}}} \left( {e_{t} } \right) + \mathop \sum \limits_{{\forall e_{t} \in {\text{ei}}}} C_{{{\text{ei}}}} \left( {e_{t} } \right). \\ \end{aligned}$$

(3)

The sets of node substitutions, deletions and insertions are represented by ${\text{vs}}$, ${\text{vd}}$ and ${\text{vi}}$, respectively. Moreover, the costs of these edit operations are represented by $C_{{{\text{vs}}}}$, $C_{{{\text{vd}}}}$ and $C_{{{\text{vi}}}}$. Edit operations and costs on edges have similar definitions: ${\text{es}}$, ${\text{ed}}$, ${\text{ei}}$, $C_{{{\text{es}}}}$, $C_{{{\text{ed}}}}$ and $C_{{{\text{ei}}}}$.

Figure 2 shows a toy example of a graph edit distance between two simple graphs, each composed of only two nodes and an edge between them. In this case, the transformation between both graphs is carried out by five edit operations: ${\text{ed}}$, ${\text{vd}}$, ${\text{vi}}$, ${\text{ei}}$ and ${\text{vs}}$. Note that several edit paths transform one graph into the other, and they generate different costs (Eq. 3). The edit path considered in the graph edit distance is the one that achieves the minimum cost given the previously defined $C_{{{\text{vs}}}}$, $C_{{{\text{vd}}}}$, $C_{{{\text{vi}}}}$, $C_{{{\text{es}}}}$, $C_{{{\text{ed}}}}$ and $C_{{{\text{ei}}}}$ (Eq. 2).

The edit operations $e_{t}$ depend on the attributes of the nodes or of the edges. For this reason, we define, for the rest of the paper, the following relations:

If $e_{t} \in {\text{vs}}$, then $e_{t} = \left[ {v_{a} ,v_{i}^{\prime} } \right]$, where $v_{a} \in G$ and $v_{i}^{\prime} \in G^{\prime}$.
If $e_{t} \in {\text{es}}$, then $e_{t} = \left[ {e_{ab} ,e_{ij}^{\prime} } \right]$, where $e_{ab} \in G$ and $e_{ij}^{\prime} \in G^{\prime}$.
If $e_{t} \in {\text{vd}}$, then $e_{t} = \left[ {v_{a} } \right]$, where $v_{a} \in G$.
If $e_{t} \in {\text{ed}}$, then $e_{t} = \left[ {e_{ab} } \right]$, where $e_{ab} \in G$.
If $e_{t} \in {\text{vi}}$, then $e_{t} = \left[ {v_{i}^{\prime} } \right]$, where $v_{i}^{\prime} \in G^{\prime}$.
If $e_{t} \in {\text{ei}}$, then $e_{t} = \left[ {e_{ij}^{\prime} } \right]$, where $e_{ij}^{\prime} \in G^{\prime}$.
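To make Eqs. 2 and 3 concrete, the sketch below represents an edit path as a list of typed operations and sums their costs. It is an illustration under stated assumptions rather than the paper's implementation: the names `EditOp`, `make_costs` and `ced`, the node attributes (1, 2 and 5), the absolute-difference substitution cost and the constant insertion and deletion cost `k` are all hypothetical. The two candidate paths transform a two-node graph with one edge into another two-node graph with one edge, either by substituting everything or by the five operations vs, ed, vd, vi and ei.

```python
from collections import namedtuple

# kind is one of "vs", "es", "vd", "ed", "vi", "ei";
# src and dst hold the attributes involved (None when absent).
EditOp = namedtuple("EditOp", ["kind", "src", "dst"])


def make_costs(k):
    """Hypothetical cost functions: constant k for insertions and deletions."""
    return {
        "vs": lambda a, b: abs(a - b),  # node substitution
        "es": lambda a, b: 0.0,         # edges carry no attributes here
        "vd": lambda a, b: k,
        "ed": lambda a, b: k,
        "vi": lambda a, b: k,
        "ei": lambda a, b: k,
    }


def ced(edit_path, costs):
    """Cost of one edit path (Eq. 3): the sum of the costs of its operations."""
    return sum(costs[op.kind](op.src, op.dst) for op in edit_path)


# Path A: substitute both nodes and the edge.
path_a = [EditOp("vs", 1.0, 1.0), EditOp("vs", 2.0, 5.0), EditOp("es", None, None)]
# Path B: substitute one node; delete the edge and the other node, then insert new ones.
path_b = [
    EditOp("vs", 1.0, 1.0),
    EditOp("ed", None, None),
    EditOp("vd", 2.0, None),
    EditOp("vi", None, 5.0),
    EditOp("ei", None, None),
]

for k in (1.0, 0.5):
    costs = make_costs(k)
    # Minimum over these two candidates only, not over all edit paths (Eq. 2).
    print(k, ced(path_a, costs), ced(path_b, costs),
          min(ced(path_a, costs), ced(path_b, costs)))
```

With `k = 1.0` the substitution path is cheaper (3 against 4), whereas with `k = 0.5` the path that deletes and inserts is cheaper (2 against 3), which shows how the edit costs decide which edit path defines the distance.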

The distance properties (Eq. 1) are taken into consideration in the first definition of the graph edit distance [1] through the edit operations (Lemma 5 in [1]), and they are restated in Sect. 2.1.1 of [12]. The authors propose that if the edit operations fulfil the metric properties (Eq. 1) in their domain, then the graph edit distance is clearly a metric. This is because the graph edit distance is a linear function of the edit operations. In [1, 12], the distance properties are written as the following three points. Note that the nomenclature has been slightly changed with regard to [1]. Besides, we write ${C}_{\cdot }\left(\cdot \right)$ instead of ${C}_{\cdot }\left(\left[\cdot \right]\right)$.

(1)

A. $C_{{{\text{vs}}}} \left( {v_{a} ,v_{i}^{\prime} } \right) > 0$ if $\gamma_{v} \left( {v_{a} } \right) \ne \gamma_{v}^{\prime} \left( {v_{i}^{\prime} } \right)$, and $C_{{{\text{vs}}}} \left( {v_{a} ,v_{i}^{\prime} } \right) = 0$ otherwise.
B. $C_{{{\text{es}}}} \left( {e_{ab} ,e_{ij}^{\prime} } \right) > 0$ if $\gamma_{e} \left( {e_{ab} } \right) \ne \gamma_{e}^{\prime} \left( {e_{ij}^{\prime} } \right)$, and $C_{{{\text{es}}}} \left( {e_{ab} ,e_{ij}^{\prime} } \right) = 0$ otherwise.
C. $C_{{{\text{vd}}}} \left( {v_{a} } \right) > 0$.
D. $C_{{{\text{ed}}}} \left( {e_{ab} } \right) > 0$.
E. $C_{{{\text{vi}}}} \left( {v_{i}^{\prime} } \right) > 0$.
F. $C_{{{\text{ei}}}} \left( {e_{ij}^{\prime} } \right) > 0$.

(2)

A. $C_{{{\text{vs}}}} \left( {v_{a} ,v_{i}^{\prime} } \right) = C_{{{\text{vs}}}} \left( {v_{i}^{\prime} ,v_{a} } \right)$.
B. $C_{{{\text{es}}}} \left( {e_{ab} ,e_{ij}^{\prime} } \right) = C_{{{\text{es}}}} \left( {e_{ij}^{\prime} ,e_{ab} } \right)$.
C. $C_{{{\text{vd}}}} \left( {v_{a} } \right) = C_{{{\text{vi}}}} \left( {v_{i}^{\prime} } \right)$.
D. $C_{{{\text{ed}}}} \left( {e_{ab} } \right) = C_{{{\text{ei}}}} \left( {e_{ij}^{\prime} } \right)$.

(3)

A. $C_{{{\text{vs}}}} \left( {v_{a} ,v_{i}^{\prime} } \right) \le C_{{{\text{vs}}}} \left( {v_{a} ,v_{j}^{\prime\prime} } \right) + C_{{{\text{vs}}}} \left( {v_{j}^{\prime\prime} ,v_{i}^{\prime} } \right)$.
B. $C_{{{\text{es}}}} \left( {e_{ab} ,e_{ij}^{\prime} } \right) \le C_{{{\text{es}}}} \left( {e_{ab} ,e_{kl}^{\prime\prime} } \right) + C_{{{\text{es}}}} \left( {e_{kl}^{\prime\prime} ,e_{ij}^{\prime} } \right)$.
C. $C_{{{\text{vs}}}} \left( {v_{a} ,v_{i}^{\prime} } \right) \le C_{{{\text{vd}}}} \left( {v_{a} } \right) + C_{{{\text{vi}}}} \left( {v_{i}^{\prime} } \right)$.
D. $C_{{{\text{es}}}} \left( {e_{ab} ,e_{ij}^{\prime} } \right) \le C_{{{\text{ed}}}} \left( {e_{ab} } \right) + C_{{{\text{ei}}}} \left( {e_{ij}^{\prime} } \right)$.
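As an illustration of how points 2.C, 3.A and 3.C differ, the following sketch checks them on a small sample of node attributes. The function name `check_node_cost_points`, the two-dimensional attributes and the constant cost `k` are assumptions made for this example; the substitution cost is the Euclidean distance, as in the experimental section of the paper.

```python
import math
from itertools import product


def check_node_cost_points(attributes, c_vs, c_vd, c_vi, eps=1e-9):
    """Report whether points 2.C, 3.A and 3.C hold on a sample of attributes."""
    pairs = list(product(attributes, repeat=2))
    triples = list(product(attributes, repeat=3))
    return {
        # 2.C: deletion and insertion costs coincide
        "2.C": all(abs(c_vd(a) - c_vi(b)) <= eps for a, b in pairs),
        # 3.A: triangle inequality among substitutions
        "3.A": all(c_vs(a, b) <= c_vs(a, c) + c_vs(c, b) + eps for a, b, c in triples),
        # 3.C: a substitution never costs more than a deletion plus an insertion
        "3.C": all(c_vs(a, b) <= c_vd(a) + c_vi(b) + eps for a, b in pairs),
    }


# Hypothetical node attributes; the largest substitution cost is 10.
attributes = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]


def report(k):
    """Check the three points for constant deletion and insertion costs k."""
    return check_node_cost_points(attributes, math.dist, lambda a: k, lambda b: k)


print(report(2.0))  # 2k = 4 < 10, so point 3.C fails
print(report(5.0))  # 2k = 10, so point 3.C holds
```

Only point 3.C depends on the value of `k`; points 2.C and 3.A hold for both settings, which is exactly the situation addressed by the theorem that follows.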

Theorem 1, stated below, demonstrates that points 3.C and 3.D (the ones that relate the triangle inequality to the deletion, insertion and substitution costs) do not need to be imposed to properly define the graph edit distance as a metric. We also comment on how this finding affects the string and tree edit distances.

Theorem 1

The ${\text{GED}}$ is a metric if points 1.A,…,1.F, 2.A,…,2.D, 3.A and 3.B are fulfilled (points 3.C and 3.D do not need to be fulfilled).

Proof

It is trivial to see that non-negativity, ${\text{GED}}\left( {G,G^{\prime}} \right) \ge 0$; the identity of indiscernible elements, ${\text{GED}}\left( {G,G^{\prime}} \right) = 0 \Leftrightarrow G = G^{\prime}$; and symmetry, ${\text{GED}}\left( {G,G^{\prime}} \right) = {\text{GED}}\left( {G^{\prime},G} \right)$, are fulfilled by the graph edit distance when points 1 and 2 are fulfilled.

Thus, we have to prove the triangle inequality, which means that ${\text{GED}}\left( {G,G^{\prime}} \right) \le {\text{GED}}\left( {G,G^{\prime\prime}} \right) + {\text{GED}}\left( {G^{\prime\prime},G^{\prime}} \right)$ holds. We assume that $e_{{G,G^{\prime}}} \in E\left( {G,G^{\prime}} \right)$, $e_{{G,G^{\prime\prime}}} \in E\left( {G,G^{\prime\prime}} \right)$ and $e_{{G^{\prime\prime},G^{\prime}}} \in E\left( {G^{\prime\prime},G^{\prime}} \right)$ are three edit paths that achieve the minimum cost ${\text{CED}}$ in $E\left( {G,G^{\prime}} \right)$, $E\left( {G,G^{\prime\prime}} \right)$ and $E\left( {G^{\prime\prime},G^{\prime}} \right)$, respectively. In other words, they are the ones used to compute ${\text{GED}}\left( {G,G^{\prime}} \right)$, ${\text{GED}}\left( {G,G^{\prime\prime}} \right)$ and ${\text{GED}}\left( {G^{\prime\prime},G^{\prime}} \right)$. We define an edit path $\hat{e}_{{G,G^{\prime}}}$ that transforms $G$ into $G^{\prime}$ as the concatenation of $e_{{G,G^{\prime\prime}}}$ and $e_{{G^{\prime\prime},G^{\prime}}}$. Writing the edit path as a subscript of ${\text{CED}}$, the cost of $\hat{e}_{{G,G^{\prime}}}$ is, by construction, the sum of the costs of the two concatenated edit paths: ${\text{CED}}_{{\hat{e}_{{G,G^{\prime}}} }} \left( {G,G^{\prime}} \right) = {\text{CED}}_{{e_{{G,G^{\prime\prime}}} }} \left( {G,G^{\prime\prime}} \right) + {\text{CED}}_{{e_{{G^{\prime\prime},G^{\prime}}} }} \left( {G^{\prime\prime},G^{\prime}} \right) = {\text{GED}}\left( {G,G^{\prime\prime}} \right) + {\text{GED}}\left( {G^{\prime\prime},G^{\prime}} \right)$. Moreover, since $\hat{e}_{{G,G^{\prime}}} \in E\left( {G,G^{\prime}} \right)$ and ${\text{GED}}\left( {G,G^{\prime}} \right)$ is the minimum cost over all edit paths in $E\left( {G,G^{\prime}} \right)$ (Eq. 2), it holds that ${\text{GED}}\left( {G,G^{\prime}} \right) \le {\text{CED}}_{{\hat{e}_{{G,G^{\prime}}} }} \left( {G,G^{\prime}} \right)$. Therefore, ${\text{GED}}\left( {G,G^{\prime}} \right) \le {\text{CED}}_{{\hat{e}_{{G,G^{\prime}}} }} \left( {G,G^{\prime}} \right) = {\text{GED}}\left( {G,G^{\prime\prime}} \right) + {\text{GED}}\left( {G^{\prime\prime},G^{\prime}} \right)$. □
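Theorem 1 can be sanity-checked numerically on very small graphs. The sketch below is an illustration, not the algorithm used in the paper: it computes an exact distance by enumerating every node mapping between two graphs, which is only feasible for a handful of nodes. The graph encoding, the integer node attributes, the absolute-difference substitution cost, the unattributed edges and the constant cost `K` for every deletion and insertion are all assumptions made for the example.

```python
from itertools import product


def ged(g1, g2, k):
    """Exact GED between tiny graphs by enumerating all node mappings.

    A graph is (attrs, edges): a list of integer node attributes and a set of
    frozenset({i, j}) undirected edges. Node substitution costs |a - b|; every
    node or edge deletion or insertion costs the constant k.
    """
    attrs1, edges1 = g1
    attrs2, edges2 = g2
    best = None
    targets = [None] + list(range(len(attrs2)))  # None means "delete this node"
    for mapping in product(targets, repeat=len(attrs1)):
        used = [m for m in mapping if m is not None]
        if len(used) != len(set(used)):
            continue  # two nodes share a target: not a valid mapping
        cost = sum(k if m is None else abs(attrs1[i] - attrs2[m])
                   for i, m in enumerate(mapping))
        cost += k * (len(attrs2) - len(used))  # unmatched nodes of g2 are inserted
        matched = 0
        for edge in edges1:
            i, j = tuple(edge)
            if (mapping[i] is not None and mapping[j] is not None
                    and frozenset({mapping[i], mapping[j]}) in edges2):
                matched += 1  # edge substitution, cost 0 (no edge attributes)
        cost += k * (len(edges1) - matched) + k * (len(edges2) - matched)
        if best is None or cost < best:
            best = cost
    return best


def edges(*pairs):
    return {frozenset(p) for p in pairs}


# Hypothetical toy graphs.
graphs = [
    ([0, 10], edges((0, 1))),
    ([0, 3], edges((0, 1))),
    ([10], edges()),
    ([0, 5, 10], edges((0, 1), (1, 2))),
    ([7, 7], edges()),
]
K = 1  # 2K = 2 is smaller than the largest substitution cost (10): point 3.C fails

triangle_ok = all(ged(a, b, K) <= ged(a, c, K) + ged(c, b, K)
                  for a, b, c in product(graphs, repeat=3))
symmetric = all(ged(a, b, K) == ged(b, a, K) for a, b in product(graphs, repeat=2))
print(triangle_ok, symmetric)
```

Although point 3.C is violated for these costs, the distances computed on the toy graphs are expected to remain symmetric and to satisfy the triangle inequality. For two single-node graphs with attributes 0 and 10, for instance, the distance is 2 (one deletion plus one insertion) rather than 10 (one substitution).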

Strings and trees can be modelled as graphs. In the case of strings, each element in the string is represented as a node in the graph, and the order of the string elements is represented through edges that connect the string elements placed side by side. In the case of trees, nodes in the tree are represented as nodes in the graph, and the parent-child relation is represented as an edge in the graph from the parent node to the child node. For this reason, Theorem 1 can be applied to strings and trees and thus, we have demonstrated that the tree edit distance and the string edit distance do not need to fulfil the triangle inequality between their edit operations to be considered metrics.
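For strings, the same observation can be illustrated with the classical dynamic-programming edit distance. In the sketch below, which uses hypothetical names and cost values, a substitution costs 3 while a deletion or an insertion costs 1, so a substitution is more expensive than a deletion followed by an insertion and the counterpart of point 3.C does not hold.

```python
from itertools import product


def weighted_edit_distance(s, t, c_sub, c_indel):
    """String edit distance with constant substitution and insertion/deletion costs."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * c_indel  # delete the first i symbols of s
    for j in range(1, cols):
        d[0][j] = j * c_indel  # insert the first j symbols of t
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if s[i - 1] == t[j - 1] else c_sub
            d[i][j] = min(d[i - 1][j - 1] + sub,   # substitution or match
                          d[i - 1][j] + c_indel,   # deletion
                          d[i][j - 1] + c_indel)   # insertion
    return d[len(s)][len(t)]


# Hypothetical sample of strings and costs.
words = ["kitten", "sitting", "mitten", "sit", ""]
C_SUB, C_INDEL = 3, 1

triangle_ok = all(
    weighted_edit_distance(a, b, C_SUB, C_INDEL)
    <= weighted_edit_distance(a, c, C_SUB, C_INDEL)
    + weighted_edit_distance(c, b, C_SUB, C_INDEL)
    for a, b, c in product(words, repeat=3)
)
print(weighted_edit_distance("kitten", "sitting", C_SUB, C_INDEL), triangle_ok)
```

With these costs the distance between "kitten" and "sitting" is 5, obtained with deletions and insertions only, and the triangle inequality is expected to hold for every triple of the sample.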

Practical Experiment

The aim of the paper is to show that the triangle inequality applied to the edit operations is not needed for the graph edit distance to be a metric. Thus, the paper could be presented without a practical evaluation. Nevertheless, we include this evaluation to show that the optimal configuration of the parameters is usually located where this triangle inequality is not fulfilled, and therefore the demonstration is worthwhile. In this section, we show that the highest classification ratios appear at values of the edit costs such that the triangle inequality between the deletion and insertion operations and the substitution operation is not fulfilled. Yet, these values were rarely applied, since the graph edit distance was not considered a metric there. We observe that the algorithms designed to learn the edit operations, such as [34], do not intrinsically take into consideration the triangle inequality properties, although they do consider the other distance properties. Thanks to Theorem 1, we know that the graph edit distance continues to be a metric, and therefore it is worth setting the edit costs at the values that maximize the recognition ratio (even if the triangle inequality on the edit operations is not fulfilled).

The following five public graph databases have been used: Coil-Rag, Grec, Letter-low, Letter-med and Letter-high. These databases are explained in [32, 35], and their most important features are summarized in Table 1: the number of graphs and classes, the average number of nodes and edges per graph and the number of attributes on nodes.

Table 1 Main features of the testing and learning sets of the databases

Table 2 shows the minimum, maximum, mean and standard deviation of the node substitution costs of these databases. Given that the insertion and deletion costs are constants, we have to ensure that these constants are set such that $C_{{{\text{vd}}}} + C_{{{\text{vi}}}} \ge {\text{max}}\left( {C_{{{\text{vs}}}} } \right)$, to fulfil the triangle inequality on the edit operations. Hence, the minimum value of $C_{{{\text{vd}}}}$ and $C_{{{\text{vi}}}}$ would have to be $C_{{{\text{vd}}}} = C_{{{\text{vi}}}} = 0.5 \max \left( {C_{{{\text{vs}}}} } \right)$. The last column shows $0.5 \max \left( {C_{{{\text{vs}}}} } \right)$ for each database.
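The threshold $0.5 \max \left( C_{\text{vs}} \right)$ can be computed directly from a collection of node attributes. The sketch below is only an illustration: the function name `min_constant_for_triangle` and the sample attributes are hypothetical, the substitution cost is assumed to be the Euclidean distance, and the maximum is taken over all pairs of the given attributes, which may differ from how the statistics in Table 2 were gathered.

```python
import math
from itertools import combinations


def min_constant_for_triangle(attributes):
    """Smallest constant K with K + K >= max C_vs, for a Euclidean C_vs."""
    max_c_vs = max(math.dist(a, b) for a, b in combinations(attributes, 2))
    return 0.5 * max_c_vs


# Hypothetical two-dimensional node attributes; the largest distance is 5.
attributes = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
threshold = min_constant_for_triangle(attributes)
print(threshold)

# Any constant below the threshold violates the triangle inequality on the
# edit operations for at least one pair of attributes.
k = 1.0
print(any(math.dist(a, b) > 2 * k for a, b in combinations(attributes, 2)))
```

For this sample the threshold is 2.5, so a constant such as `k = 1.0` lies in the region where the triangle inequality between substitution and deletion plus insertion does not hold.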

Table 2 Different statistics of the node substitution costs $C_{{{\text{vs}}}}$

Given these databases, we have computed the classification ratio as follows. We used the 1-nearest neighbour classification algorithm [17] for classification purposes. Moreover, we used the fast bipartite graph matching to deduce the graph edit distance between graphs [26, 27]. The edit costs are set as follows: $C_{{{\text{vs}}}}$ is the Euclidean distance between node attributes. $C_{{{\text{es}}}}$ is not defined (these databases do not have attributes on the edges). Finally, we impose that the insertion and deletion costs on nodes and edges have the same value, the constant $K_{{\text{v}}}$: $K_{{\text{v}}} = C_{{{\text{vd}}}} = C_{{{\text{ed}}}} = C_{{{\text{vi}}}} = C_{{{\text{ei}}}}$. We first deduce the class of each graph in the test set. Then, the classification ratio is computed as the number of graphs in the test set that are properly classified, divided by the number of graphs in the test set. To obtain the class of a graph in the test set, the graph edit distance between this graph and all the graphs in the learning set has to be computed. The class assigned to the graph in the test set is the class of the graph in the learning set that returns the minimum distance.
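The classification procedure described above can be sketched as follows. This is a generic 1-nearest-neighbour evaluation under stated assumptions, not the code used in the experiments: the function name `classification_ratio` is hypothetical, and real numbers with their absolute difference stand in for the attributed graphs and the graph edit distance so that the example stays self-contained.

```python
import math


def classification_ratio(test_set, learning_set, distance):
    """1-nearest-neighbour classification ratio.

    Both sets are lists of (item, label) pairs; `distance` compares two items.
    """
    correct = 0
    for item, label in test_set:
        # The predicted class is the class of the closest learning item.
        nearest = min(learning_set, key=lambda pair: distance(item, pair[0]))
        correct += nearest[1] == label
    return correct / len(test_set)


# Hypothetical stand-in data: numbers instead of graphs.
learning_set = [(0.0, "a"), (1.0, "a"), (10.0, "b")]
test_set = [(0.4, "a"), (9.0, "b"), (6.0, "a")]
ratio = classification_ratio(test_set, learning_set, lambda x, y: abs(x - y))
print(ratio)
```

In this toy setting two of the three test items are classified correctly. In the experiments, `distance` would be the graph edit distance computed with a given value of $K_{\text{v}}$, and the ratio would be recomputed for each value of $K_{\text{v}}$ in the range under study.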

Figure 3 shows the classification ratio on the vertical axis and the constant $K_{v}$ on the horizontal axis. Each value in the plot represents the classification ratio when the graph edit distance has been computed considering $K_{{\text{v}}}$ as the deletion and insertion cost on nodes and edges. The values of $K_{{\text{v}}}$ range from the minimum value of the substitution costs (first column in Table 2) to the maximum value of the substitution costs (second column in Table 2). Moreover, the arrow shows the range of values for which the triangle inequality between edit costs is fulfilled. It ranges from the value in the last column of Table 2 to infinity. Computing each plot took approximately two hours (MacMini, I5, Matlab). We observe that the classification ratio is maximized at insertion and deletion costs that are much smaller than half of the maximum value of the substitution cost (the last column of Table 2), which is the point below which the triangle inequality is no longer fulfilled. Therefore, we conclude that it is worth knowing that the graph edit distance is a metric in the whole range of $K_{v}$, although the triangle inequality on the edit operations does not hold in the lower part of the range of $K_{{\text{v}}}$.

Discussion and Conclusion

Considering the practical experiment, we could deduce that setting the insertion and deletion costs (represented as $K_{{\text{v}}}$) to half of the mean substitution cost is a good option, since this is the point where the classification ratio is maximized. Nevertheless, in the past, this setting was not used because the graph edit distance was not considered to be a metric at this point. This paper relaxes the properties needed for the graph edit distance to be a metric and, for this reason, we can confirm that the graph edit distance is still a metric at this point. Thus, it is worth using these costs.

Three algorithms have been presented to automatically deduce the edit costs [19,20,21]. Given the topic of this paper, we observe that the edit costs they find automatically (the values that maximize the classification ratio) do not fulfil the triangle inequality. Nevertheless, we can now confirm that the graph edit distance is a metric at the insertion and deletion costs optimized by those learning algorithms. Clearly, this paper does not state that, in all applications, the classification ratio is maximized at a point where the triangle inequality between edit operations is not fulfilled. We only state that, in these cases, we can consider using the learned parameters, since we now know that the graph edit distance is a metric. Finally, note that the learning methods impose, by construction, the non-negativity, the symmetry and the identity of the edit operations.

As future work, we want to analyse how relaxing other distance properties, specifically the symmetry property, influences the results. Note that, in classification scenarios, we want to know the distance between a graph of the test set and a graph of the learning set. Since the symmetric distance, that is, the distance between a graph of the learning set and a graph of the test set, is never computed, it could be worth not fulfilling the symmetry property at the edit operation level. This means that the insertion costs do not have to be the same as the deletion costs on nodes or edges. To do so, we are running experiments similar to the ones presented in Fig. 3 but taking values for the insertion cost that are different from the deletion cost, that is, $K_{{{\text{vd}}}} = C_{{{\text{vd}}}} \ne K_{{{\text{ve}}}} = C_{{{\text{ed}}}}$.

Change history

17 November 2021
The open access funding note was missing from the original publication. It has now been added to the Funding section.

References

1. Bunke H, Allermann G. Inexact graph matching for structural pattern recognition. Pattern Recogn Lett. 1983;1(4):245–53.
2. Sanfeliu A, Fu KS. A Distance measure between attributed relational graphs for pattern recognition. IEEE Trans Syst Man Cybern. 1983;13(3):353–62.
3. Sanfeliu A, Alquézar R, Andrade J, Climent J, Serratosa F, Vergés J. Graph-based representations and techniques for image processing and image analysis. Pattern Recogn. 2002;35(3):639–50.
4. Conte D, Foggia P, Sansone C, Vento M. Thirty years of graph matching in pattern recognition. Int J Pattern Recognit Artif Intell. 2004;18(3):265–98.
5. Vento M. A long trip in the charming world of graphs for pattern recognition. Pattern Recognit. 2015;48:291–301.
6. Livi L, Rizzi A. The graph matching problem. Pattern Anal Appl. 2013;16(3):253–83.
7. Foggia P, Percannella G, Vento M. Graph matching and learning in Pattern Recognition in the last 10 years. Int J Pattern Recognit Artif Intell. 2014;28(1):1450001 (40 pages).
8. Solé A, Serratosa F, Sanfeliu A. On the graph edit distance cost: properties and applications. Int J Pattern Recognit Artif Intell. 2012;26(5):1260004 (21 pages).
9. Serratosa F, Cortés X. Graph Edit Distance: moving from global to local structure to solve the graph-matching problem. Pattern Recogn Lett. 2015;65:204–10.
10. Gao X, Xiao B, Tao D, Li X. A survey of graph edit distance. Pattern Anal Appl. 2010;13(1):113–29.
11. Serratosa F. A general model to define the substitution, insertion and deletion graph edit costs based on an embedded space. Pattern Recogn Lett. 2020;138:115–22.
12. Riesen K. Structural pattern recognition with graph edit distance. Approximation algorithms and applications. Springer; 2015.
13. Serratosa F, Cortés X, Solé-Ribalta A. Component retrieval based on a database of graphs for hand-written electronic-scheme digitalisation. Expert Syst Appl. 2013;40:2493–502.
14. Sanfeliu A, Serratosa F, Alquézar R. Second-order random graphs for modelling sets of attributed graphs and their application to object learning and recognition. Int J Pattern Recognit Artif Intell. 2004;18(3):375–96.
15. Serratosa F, Alquézar R, Sanfeliu A. Function-Described Graphs for modelling objects represented by attributed graphs. Pattern Recogn. 2003;36(3):781–98.
16. Serratosa F, Alquézar R, Sanfeliu A. Synthesis of function-described graphs and clustering of attributed graphs. Int J Pattern Recognit Artif Intell. 2002;16(6):621–55.
17. Cover TM, Hart PE. Nearest neighbours pattern classification. IEEE Trans Inf Theory. 1967;13(1):21–7.
18. Serratosa F. Graph edit distance: restrictions to be a metric. Pattern Recogn. 2019;90:250–6.
19. Cortés X, Serratosa F. Learning graph matching substitution weights based on the ground truth node correspondence. Int J Pattern Recognit Artif Intell. 2016;30(2):1650005 (22 pages).
20. Cortés X, Serratosa F. Learning graph-matching edit-costs based on the optimality of the oracle’s node correspondences. Pattern Recogn Lett. 2015;56:22–9.
21. Algabli S, Serratosa F. Embedding the node-to-node mappings to learn the Graph edit distance parameters. Pattern Recogn Lett. 2018;112:353–60.
22. Santacruz P, Serratosa F. Learning the graph edit costs based on a learning model applied to sub-optimal graph matching. Neural Process Lett. 2020;51:881–904.
23. Conte D, Serratosa F. Interactive online learning for graph matching using active strategies. Knowl Based Syst. 2020;105:106275.
24. Garey M, Johnson D. Computers and intractability: a guide to the theory of NP-completeness. Siam Rev. 1979;24:90.
25. Ferrer M, Serratosa F, Riesen K. Improving Bipartite graph matching by assessing the assignment confidence. Pattern Recogn Lett. 2015;65:29–36.
26. Serratosa F. Fast computation of bipartite graph matching. Pattern Recogn Lett. 2014;45:244–50.
27. Serratosa F. Speeding up Fast bipartite Graph Matching through a new cost matrix. Int J Pattern Recognit Artif Intell. 2015;29(2):1550010 (17 pages).
28. Serratosa F. Computation of graph edit distance: reasoning about optimality and speed-up. Image Vis Comput. 2015;40:38–48.
29. Santacruz P, Serratosa F. Error-tolerant graph matching in linear computational cost using an initial small partial matching. Pattern Recognit Lett. 2020;134:10–9.
30. Hart P, Nilsson N, Raphael B. A formal basis for the heuristic determination of minimum cost paths. Trans Syst Sci Cybern. 1968;4(2):100–7.
31. Kuhn HW. The Hungarian method for the assignment problem. Nav Res Log Q. 1955;2:83–97.
32. Moreno-Garcia C, Cortés X, Serratosa F. A graph repository for learning error-tolerant graph matching. Syn Struct Pattern Recognit SSPR2016 LNCS. 2016;10029:519–29.
33. Arkhangel’skii AV, Pontryagin LS. General Topology I: basic concepts and constructions dimension theory, encyclopaedia of mathematical sciences. Springer; 1990.
34. Neuhaus M, Bunke H. Self-organizing maps for learning the edit costs in graph matching. IEEE Trans Syst Man Cyber Part B (Cybernetics). 2005;35(3):503–14.
35. Riesen K, Bunke H. IAM graph database repository for graph based pattern recognition and machine learning. In: Structural Syntactic and Statistical Pattern Recognition. Lecture Notes in Computer Science book series (LNCS, volume 5342); 2008, p. 287–97.

Funding

Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

Author information

Authors and Affiliations

Universitat Rovira i Virgili, Tarragona, Catalonia, Spain
Francesc Serratosa

Corresponding author

Correspondence to Francesc Serratosa.

Ethics declarations

Conflict of Interest

On behalf of all authors, the author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Serratosa, F. Redefining the Graph Edit Distance. SN COMPUT. SCI. 2, 438 (2021). https://doi.org/10.1007/s42979-021-00792-5

Received: 17 December 2020
Accepted: 22 July 2021
Published: 31 August 2021
DOI: https://doi.org/10.1007/s42979-021-00792-5
