Abstract
Because of its flexibility, intuitiveness, and expressivity, the graph edit distance (GED) is one of the most widely used distance measures for labeled graphs. Since exactly computing GED is NP-hard, over the past years, various heuristics have been proposed. They use techniques such as transformations to the linear sum assignment problem with error correction, local search, and linear programming to approximate GED via upper or lower bounds. In this paper, we provide a systematic overview of the most important heuristics. Moreover, we empirically evaluate all compared heuristics within an integrated implementation.
Notes
As BRANCH-CONST was proposed before BRANCH and BRANCH-FAST, it is in fact more correct to say that BRANCH and BRANCH-FAST generalize BRANCH-CONST to arbitrary edit costs. For the sake of simplicity, we here change the order of presentation.
In the original publications, this technique is suggested for the LSAPE instance produced by BP (cf. Sect. 5.2.2). It can, however, be employed in combination with the LSAPE instances produced by any instantiation of LSAPE-GED.
In [58], SA is presented as a technique for improving the upper bound computed by the LSAPE-GED instantiation BP. Since SA can be used with any instantiation of LSAPE-GED, we here present a more general version.
To be precise, we tested 19 algorithms that compute lower bounds and 173 algorithms that compute upper bounds. The reason for this is that the extensions of the paradigms LSAPE-GED and LS-GED only affect the upper bounds.
References
Abu-Aisheh, Z., Gaüzère, B., Bougleux, S., Ramel, J.Y., Brun, L., Raveaux, R., Héroux, P., Adam, S.: Graph edit distance contest 2016: results and future challenges. Pattern Recognit. Lett. 100, 96–103 (2017). https://doi.org/10.1016/j.patrec.2017.10.007
Abu-Aisheh, Z., Raveaux, R., Ramel, J.: A graph database repository and performance evaluation metrics for graph edit distance. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 138–147 (2015). https://doi.org/10.1007/978-3-319-18224-7_14
Babai, L.: Graph isomorphism in quasipolynomial time [extended abstract]. In: Wichs, D., Mansour, Y. (eds.) STOC 2016. ACM, New York, pp. 684–697 (2016). https://doi.org/10.1145/2897518.2897542
Blumenthal, D.B., Bougleux, S., Gamper, J., Brun, L.: Ring based approximation of graph edit distance. In: Bai, X., Hancock, E., Ho, T., Wilson, R., Biggio, B., Robles-Kelly, A. (eds.) S+SSPR 2018, LNCS, vol. 11004. Springer, Cham, pp. 293–303 (2018). https://doi.org/10.1007/978-3-319-97785-0_28
Blumenthal, D.B., Bougleux, S., Gamper, J., Brun, L.: Upper bounding GED via transformations to LSAPE based on rings and machine learning (2019)
Blumenthal, D.B., Bougleux, S., Gamper, J., Brun, L.: GEDLIB: a C++ library for graph edit distance computation. In: Conte, D., Ramel, J.Y., Foggia, P. (eds.) GbRPR 2019, LNCS, vol. 11510. Springer, Cham, pp. 14–24 (2019)
Blumenthal, D.B., Daller, E., Bougleux, S., Brun, L., Gamper, J.: Quasimetric graph edit distance as a compact quadratic assignment problem. In: ICPR 2018. IEEE Computer Society, pp. 934–939 (2018). https://doi.org/10.1109/ICPR.2018.8546055
Blumenthal, D.B., Gamper, J.: Correcting and speeding-up bounds for non-uniform graph edit distance. In: ICDE 2017. IEEE Computer Society, pp. 131–134 (2017). https://doi.org/10.1109/ICDE.2017.57
Blumenthal, D.B., Gamper, J.: Improved lower bounds for graph edit distance. IEEE Trans. Knowl. Data Eng. 30(3), 503–516 (2018). https://doi.org/10.1109/TKDE.2017.2772243
Blumenthal, D.B., Gamper, J.: On the exact computation of the graph edit distance. Pattern Recognit. Lett. (2018). https://doi.org/10.1016/j.patrec.2018.05.002
Bonacich, P.: Power and centrality: a family of measures. Am. J. Sociol. 92(5), 1170–1182 (1987). https://doi.org/10.1086/228631
Boria, N., Blumenthal, D.B., Bougleux, S., Brun, L.: Improved local search for graph edit distance (2019). Submitted. arXiv:1907.02929
Boria, N., Bougleux, S., Brun, L.: Approximating GED using a stochastic generator and multistart IPFP. In: Bai, X., Hancock, E.R., Ho, T.K., Wilson, R.C., Biggio, B., Robles-Kelly, A. (eds.) S+SSPR 2018. Springer, Cham, pp. 460–469 (2018). https://doi.org/10.1007/978-3-319-97785-0_44
Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., Vento, M.: Graph edit distance as a quadratic assignment problem. Pattern Recognit. Lett. 87, 38–46 (2017). https://doi.org/10.1016/j.patrec.2016.10.001
Bougleux, S., Gaüzère, B., Blumenthal, D.B., Brun, L.: Fast linear sum assignment with error-correction and no cost constraints. Pattern Recognit. Lett. (2018). https://doi.org/10.1016/j.patrec.2018.03.032
Bougleux, S., Gaüzère, B., Brun, L.: Graph edit distance as a quadratic program. In: ICPR 2016. IEEE Computer Society, pp. 1701–1706 (2016). https://doi.org/10.1109/ICPR.2016.7899881
Bougleux, S., Gaüzère, B., Brun, L.: A Hungarian algorithm for error-correcting graph matching. In: Foggia, P., Liu, C., Vento, M. (eds.) GbRPR 2017, LNCS, vol. 10310. Springer, Cham, pp. 118–127 (2017). https://doi.org/10.1007/978-3-319-58961-9_11
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. 30(1–7), 107–117 (1998). https://doi.org/10.1016/S0169-7552(98)00110-X
Brun, L., Foggia, P., Vento, M.: Trends in graph-based representations for pattern recognition. Pattern Recognit. Lett. (2018). https://doi.org/10.1016/j.patrec.2018.03.016
Bunke, H., Allermann, G.: Inexact graph matching for structural pattern recognition. Pattern Recognit. Lett. 1(4), 245–253 (1983). https://doi.org/10.1016/0167-8655(83)90033-8
Carletti, V., Gaüzère, B., Brun, L., Vento, M.: Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 188–197 (2015). https://doi.org/10.1007/978-3-319-18224-7_19
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27 (2011). https://doi.org/10.1145/1961189.1961199
Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching in pattern recognition. Int. J. Pattern Recognit. Artif. Intell. 18(3), 265–298 (2004). https://doi.org/10.1142/S0218001404003228
Cordella, L.P., Foggia, P., Sansone, C., Vento, M.: A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26(10), 1367–1372 (2004). https://doi.org/10.1109/TPAMI.2004.75
Cortés, X., Serratosa, F., Moreno-García, C.F.: On the influence of node centralities on graph edit distance for graph classification. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 231–241 (2015). https://doi.org/10.1007/978-3-319-18224-7_23
Daller, É., Bougleux, S., Gaüzère, B., Brun, L.: Approximate graph edit distance by several local searches in parallel. In: Fred, A., di Baja, G.S., Marsico, M.D. (eds.) ICPRAM 2018. SciTePress, pp. 149–158 (2018). https://doi.org/10.5220/0006599901490158
Ferrer, M., Serratosa, F., Riesen, K.: A first step towards exact graph edit distance using bipartite graph matching. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 77–86 (2015). https://doi.org/10.1007/978-3-319-18224-7_8
Fischer, A., Suen, C.Y., Frinken, V., Riesen, K., Bunke, H.: Approximation of graph edit distance based on Hausdorff matching. Pattern Recognit. 48(2), 331–343 (2015). https://doi.org/10.1016/j.patcog.2014.07.015
Foggia, P., Percannella, G., Vento, M.: Graph matching and learning in pattern recognition in the last 10 years. Int. J. Pattern Recognit. Artif. Intell. 28(1), 1450001:1–1450001:40 (2014). https://doi.org/10.1142/S0218001414500013
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Nav. Res. Logist. Q. 3(1–2), 95–110 (1956). https://doi.org/10.1002/nav.3800030109
Gao, X., Xiao, B., Tao, D., Li, X.: A survey of graph edit distance. Pattern Anal. Appl. 13(1), 113–129 (2010). https://doi.org/10.1007/s10044-008-0141-y
Gaüzère, B., Bougleux, S., Riesen, K., Brun, L.: Approximate graph edit distance guided by bipartite matching of bags of walks. In: Fränti, P., Brown, G., Loog, M., Escolano, F., Pelillo, M. (eds.) S+SSPR 2014, LNCS, vol. 8621. Springer, Cham, pp. 73–82 (2014). https://doi.org/10.1007/978-3-662-44415-3_8
Guennebaud, G., Jacob, B., et al.: Eigen v3 (2010). http://eigen.tuxfamily.org. Accessed 5 July 2019
Gurobi Optimization LLC: Gurobi Optimizer Reference Manual. http://www.gurobi.com. Accessed 5 July 2019
Henry, E.R.: Classification and Uses of Finger Prints. Routledge, London (1900)
Justice, D., Hero, A.: A binary linear programming formulation of the graph edit distance. IEEE Trans. Pattern Anal. Mach. Intell. 28(8), 1200–1214 (2006). https://doi.org/10.1109/TPAMI.2006.152
Karmarkar, N.: A new polynomial-time algorithm for linear programming. Combinatorica 4(4), 373–396 (1984). https://doi.org/10.1007/BF02579150
Kuhn, H.W.: The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 2(1–2), 83–97 (1955). https://doi.org/10.1002/nav.3800020109
Le Digabel, S.: Algorithm 909: NOMAD: nonlinear optimization with the MADS algorithm. ACM Trans. Math. Softw. 37(4), 44:1–44:15 (2011). https://doi.org/10.1145/1916461.1916468
Le Gall, F.: Powers of tensors and fast matrix multiplication. In: Nabeshima, K., Nagasaka, K., Winkler, F., Szántó, Á. (eds.) ISSAC 2014. ACM, pp. 296–303 (2014). https://doi.org/10.1145/2608628.2608664
Lee, L., Lumsdaine, A., Siek, J.: The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley Longman, Boston (2002)
Leordeanu, M., Hebert, M., Sukthankar, R.: An integer projected fixed point method for graph matching and MAP inference. In: Bengio, Y., Schuurmans, D., Lafferty, J.D., Williams, C.K.I., Culotta, A. (eds.) NIPS 2009. Curran Associates, pp. 1114–1122 (2009)
Lerouge, J., Abu-Aisheh, Z., Raveaux, R., Héroux, P., Adam, S.: Exact graph edit distance computation using a binary linear program. In: Robles-Kelly, A., Loog, M., Biggio, B., Escolano, F., Wilson, R. (eds.) S+SSPR 2016, LNCS, vol. 10029. Springer, Cham, pp. 485–495 (2016). https://doi.org/10.1007/978-3-319-49055-7_43
Lerouge, J., Abu-Aisheh, Z., Raveaux, R., Héroux, P., Adam, S.: New binary linear programming formulation to compute the graph edit distance. Pattern Recognit. 72, 254–265 (2017). https://doi.org/10.1016/j.patcog.2017.07.029
Lin, C.L.: Hardness of approximating graph transformation problem. In: Du, D.Z., Zhang, X.S. (eds.) Algorithms and Computation, LNCS, vol. 834. Springer, Berlin, pp. 74–82 (1994). https://doi.org/10.1007/3-540-58325-4_168
Munkres, J.: Algorithms for the assignment and transportation problems. SIAM J. Appl. Math. 5(1), 32–38 (1957). https://doi.org/10.1137/0105003
Nissen, S.: Implementation of a Fast Artificial Neural Network Library (FANN). Technical report, Department of Computer Science, University of Copenhagen (2003). http://fann.sourceforge.net/report/
Ozdemir, E., Gunduz-Demir, C.: A hybrid classification model for digital pathology using structural and statistical pattern recognition. IEEE Trans. Med. Imaging 32(2), 474–483 (2013). https://doi.org/10.1109/TMI.2012.2230186
Riesen, K.: Structural Pattern Recognition with Graph Edit Distance. Advances in Computer Vision and Pattern Recognition. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-27252-8
Riesen, K., Bunke, H.: IAM graph database repository for graph based pattern recognition and machine learning. In: da Vitoria Lobo, N., Kasparis, T., Roli, F., Kwok, J.T., Georgiopoulos, M., Anagnostopoulos, G.C., Loog, M. (eds.) S+SSPR 2008, LNCS, vol. 5342. Springer, Berlin, pp. 287–297 (2008). https://doi.org/10.1007/978-3-540-89689-0_33
Riesen, K., Bunke, H.: Approximate graph edit distance computation by means of bipartite graph matching. Image Vis. Comput. 27(7), 950–959 (2009). https://doi.org/10.1016/j.imavis.2008.04.004
Riesen, K., Bunke, H.: Graph Classification and Clustering Based on Vector Space Embedding. Series in Machine Perception and Artificial Intelligence, vol. 77. World Scientific, Singapore (2010). https://doi.org/10.1142/7731
Riesen, K., Bunke, H., Fischer, A.: Improving graph edit distance approximation by centrality measures. In: ICPR 2014. IEEE Computer Society, pp. 3910–3914 (2014). https://doi.org/10.1109/ICPR.2014.671
Riesen, K., Ferrer, M.: Predicting the correctness of node assignments in bipartite graph matching. Pattern Recognit. Lett. 69, 8–14 (2016). https://doi.org/10.1016/j.patrec.2015.10.007
Riesen, K., Ferrer, M., Fischer, A., Bunke, H.: Approximation of graph edit distance in quadratic time. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 3–12 (2015). https://doi.org/10.1007/978-3-319-18224-7_1
Riesen, K., Fischer, A., Bunke, H.: Combining bipartite graph matching and beam search for graph edit distance approximation. In: Gayar, N.E., Schwenker, F., Suen, C. (eds.) ANNPR 2014, LNCS, vol. 8774. Springer, Cham, pp. 117–128 (2014). https://doi.org/10.1007/978-3-319-11656-3_11
Riesen, K., Fischer, A., Bunke, H.: Computing upper and lower bounds of graph edit distance in cubic time. In: Gayar, N.E., Schwenker, F., Suen, C. (eds.) ANNPR 2014, LNCS, vol. 8774. Springer, Heidelberg, pp. 129–140 (2014). https://doi.org/10.1007/978-3-319-11656-3
Riesen, K., Fischer, A., Bunke, H.: Improved graph edit distance approximation with simulated annealing. In: Foggia, P., Liu, C., Vento, M. (eds.) GbRPR 2017, LNCS, vol. 10310. Springer, Cham, pp. 222–231 (2017). https://doi.org/10.1007/978-3-319-58961-9_20
Sanfeliu, A., Fu, K.S.: A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cybern. 13(3), 353–362 (1983). https://doi.org/10.1109/TSMC.1983.6313167
Schomburg, I., Chang, A., Ebeling, C., Gremse, M., Heldt, C., Huhn, G., Schomburg, D.: BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res. 32(Database–Issue), 431–433 (2004). https://doi.org/10.1093/nar/gkh081
Stauffer, M., Fischer, A., Riesen, K.: A novel graph database for handwritten word images. In: Robles-Kelly, A., Loog, M., Biggio, B., Escolano, F., Wilson, R. (eds.) S+SSPR 2016, LNCS, vol. 10029. Springer, Cham, pp. 553–563 (2016). https://doi.org/10.1007/978-3-319-49055-7_49
Stauffer, M., Tschachtli, T., Fischer, A., Riesen, K.: A survey on applications of bipartite graph edit distance. In: Foggia, P., Liu, C., Vento, M. (eds.) GbRPR 2017, LNCS, vol. 10310. Springer, Cham, pp. 242–252 (2017). https://doi.org/10.1007/978-3-319-58961-9_22
Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13(4), 354–356 (1969). https://doi.org/10.1007/BF02165411
Uno, T.: Algorithms for enumerating all perfect, maximum and maximal matchings in bipartite graphs. In: Leong, H.W., Imai, H., Jain, S. (eds.) ISAAC 1997, LNCS, vol. 1350. Springer, Berlin, pp. 92–101 (1997). https://doi.org/10.1007/3-540-63890-3_11
Uno, T.: A fast algorithm for enumerating bipartite perfect matchings. In: Eades, P., Takaoka, T. (eds.) ISAAC 2001, LNCS, vol. 2223. Springer, Berlin, pp. 367–379 (2001). https://doi.org/10.1007/3-540-45678-3_32
Vento, M.: A long trip in the charming world of graphs for pattern recognition. Pattern Recognit. 48(2), 291–301 (2015). https://doi.org/10.1016/j.patcog.2014.01.002
Wang, X., Ding, X., Tung, A.K.H., Ying, S., Jin, H.: An efficient graph indexing method. In: Kementsietsidis, A., Salles, M.A.V. (eds.) ICDE 2012. IEEE Computer Society, pp. 210–221 (2012). https://doi.org/10.1109/ICDE.2012.28
Zeng, Z., Tung, A.K.H., Wang, J., Feng, J., Zhou, L.: Comparing stars: on approximating graph edit distance. PVLDB 2(1), 25–36 (2009). https://doi.org/10.14778/1687627.1687631
Zhao, X., Xiao, C., Lin, X., Zhang, W., Wang, Y.: Efficient structure similarity searches: a partition-based approach. VLDB J. 27(1), 53–78 (2018). https://doi.org/10.1007/s00778-017-0487-0
Zheng, W., Zou, L., Lian, X., Wang, D., Zhao, D.: Graph similarity search with edit distance constraint in large graph databases. In: He, Q., Iyengar, A., Nejdl, W., Pei, J., Rastogi, R. (eds.) CIKM 2013. ACM, pp. 1595–1600 (2013). https://doi.org/10.1145/2505515.2505723
Zheng, W., Zou, L., Lian, X., Wang, D., Zhao, D.: Efficient graph similarity search over large graph databases. IEEE Trans. Knowl. Data Eng. 27(4), 964–978 (2015). https://doi.org/10.1109/TKDE.2014.2349924
Appendices
Datasets and edit cost functions
- The datasets aids and muta: Graphs contained in aids and muta represent molecular compounds. The molecules represented by the graphs contained in aids are divided into the class of molecules that do and the class of molecules that do not exhibit activity against HIV. Similarly, the molecules represented by the graphs contained in muta are divided into the class of molecules that do and the class of molecules that do not cause genetic mutation. The nodes of the graphs contained in aids and muta are labeled with chemical symbols, and their edges are labeled with a valence (either 1 or 2). Node edit costs are defined as \(c_V(\alpha ,\alpha ^\prime ):=5.5\cdot \delta _{\alpha \ne \alpha ^\prime }\), \(c_V(\alpha ,\varepsilon ):=2.75\), and \(c_V(\varepsilon ,\alpha ^\prime ):=2.75\), for all \((\alpha ,\alpha ^\prime )\in \varSigma _V\times \varSigma _V\). Edge edit costs are defined as \(c_E(\beta ,\beta ^\prime ):=1.65\cdot \delta _{\beta \ne \beta ^\prime }\), \(c_E(\beta ,\varepsilon ):=0.825\), and \(c_E(\varepsilon ,\beta ^\prime ):=0.825\), for all \((\beta ,\beta ^\prime )\in \varSigma _E\times \varSigma _E\).
- The dataset protein: Graphs contained in protein represent proteins which are annotated with their EC classes (EC1, EC2, EC3, EC4, EC5, and EC6) [60]. Nodes are labeled with tuples (t, s), where t is the node’s type (helix, sheet, or loop) and s is its amino acid sequence. Nodes are connected via structural or sequential edges or both, i.e., edges \((u_i,u_j)\) are labeled with tuples \((t_1,t_2)\), where \(t_1\) is the type of the first edge connecting \(u_i\) and \(u_j\) and \(t_2\) is the type of the second edge connecting \(u_i\) and \(u_j\) (possibly null). Node edit costs are defined as \(c_V(\alpha ,\alpha ^\prime ):=16.5\cdot \delta _{\alpha .t\ne \alpha ^\prime .t}+0.75\cdot \delta _{\alpha .t=\alpha ^\prime .t}\cdot \mathrm{LD}(\alpha .s,\alpha ^\prime .s)\), \(c_V(\alpha ,\varepsilon ):=8.25\), and \(c_V(\varepsilon ,\alpha ^\prime ):=8.25\), for all \((\alpha ,\alpha ^\prime )\in \varSigma _V\times \varSigma _V\), where \(\mathrm{LD}(\cdot ,\cdot )\) is Levenshtein’s string edit distance. Edge edit costs are defined as \(c_E(\beta ,\beta ^\prime ):=0.25\cdot \mathrm{LSAPE} ({\mathbf{C}} ^{\beta ,\beta ^\prime })\), \(c_E(\beta ,\varepsilon ):=0.25\cdot f(\beta )\), and \(c_E(\varepsilon ,\beta ^\prime ):=0.25\cdot f(\beta ^\prime )\), for all \((\beta ,\beta ^\prime )\in \varSigma _E\times \varSigma _E\), where \(f(\beta ):=1+\delta _{\beta .t_2\ne {\texttt {null}}}\) and \({\mathbf{C}} ^{\beta ,\beta ^\prime }\in {\mathbb {R}} ^{(f(\beta )+1)\times (f(\beta ^\prime )+1)}\) is constructed as \(c^{\beta ,\beta ^\prime }_{r,s}:=2\cdot \delta _{\beta .t_r\ne \beta ^\prime .t_s}\) and \(c^{\beta ,\beta ^\prime }_{r,f(\beta ^\prime )+1}:=c^{\beta ,\beta ^\prime }_{f(\beta )+1,s}:=1\), for all \((r,s)\in [f(\beta )]\times [f(\beta ^\prime )]\).
- The dataset letter (h): Graphs contained in letter (h) represent highly distorted drawings of the capital letters A, E, F, H, I, K, L, M, N, T, V, W, X, Y, and Z. Nodes are labeled with two-dimensional Euclidean coordinates. Edges are unlabeled. Node edit costs are defined as \(c_V(\alpha ,\alpha ^\prime ):=0.75\cdot \Vert \alpha -\alpha ^\prime \Vert \), \(c_V(\alpha ,\varepsilon ):=0.675\), and \(c_V(\varepsilon ,\alpha ^\prime ):=0.675\), for all \((\alpha ,\alpha ^\prime )\in \varSigma _V\times \varSigma _V\), where \(\Vert \cdot \Vert \) is the Euclidean norm. The edge edit costs \(c_E\) are defined as \(c_E(1,\varepsilon ):=c_E(\varepsilon ,1):=0.425\).
- The dataset grec: Graphs contained in grec represent 22 different symbols from electronic and architectural drawings. Nodes are labeled with tuples (t, x, y), where t equals one of four node types and (x, y) is a two-dimensional Euclidean coordinate. Nodes are connected via line or arc edges or both, i.e., edges \((u_i,u_j)\) are labeled with tuples \((t_1,t_2)\), where \(t_1\) is the type of the first edge connecting \(u_i\) and \(u_j\) and \(t_2\) is the type of the second edge connecting \(u_i\) and \(u_j\) (possibly null). Node edit costs are defined as \(c_V(\alpha ,\alpha ^\prime ):=0.5\cdot \Vert \alpha .(x,y)-\alpha ^\prime .(x,y)\Vert \cdot \delta _{\alpha .t=\alpha ^\prime .t}+90\cdot \delta _{\alpha .t\ne \alpha ^\prime .t}\), \(c_V(\alpha ,\varepsilon ):=45\), and \(c_V(\varepsilon ,\alpha ^\prime ):=45\), for all \((\alpha ,\alpha ^\prime )\in \varSigma _V\times \varSigma _V\). Edge edit costs are defined as \(c_E(\beta ,\beta ^\prime ):=0.5\cdot \mathrm{LSAPE} ({\mathbf{C}} ^{\beta ,\beta ^\prime })\), \(c_E(\beta ,\varepsilon ):=0.5\cdot f(\beta )\), and \(c_E(\varepsilon ,\beta ^\prime ):=0.5\cdot f(\beta ^\prime )\), for all \((\beta ,\beta ^\prime )\in \varSigma _E\times \varSigma _E\), where \(f(\beta ):=1+\delta _{\beta .t_2\ne {\texttt {null}}}\) and \({\mathbf{C}} ^{\beta ,\beta ^\prime }\in {\mathbb {R}} ^{(f(\beta )+1)\times (f(\beta ^\prime )+1)}\) is constructed as \(c^{\beta ,\beta ^\prime }_{r,s}:=30\cdot \delta _{\beta .t_r\ne \beta ^\prime .t_s}\) and \(c^{\beta ,\beta ^\prime }_{r,f(\beta ^\prime )+1}:=c^{\beta ,\beta ^\prime }_{f(\beta )+1,s}:=15\), for all \((r,s)\in [f(\beta )]\times [f(\beta ^\prime )]\).
- The dataset fp: Graphs contained in fp represent fingerprint images which are annotated with their classes (arch, left loop, right loop, and whorl) from the Galton-Henry classification system [35]. Nodes are unlabeled and edges are labeled with an orientation \(\beta \in {\mathbb {R}} \) with \(-\pi /2<\beta \le \pi /2\). Node edit costs are defined as \(c_V(1,\varepsilon ):=c_V(\varepsilon ,1):=0.525\). Edge edit costs are defined as \(c_E(\beta ,\beta ^\prime ):=0.5\cdot \min \{|\beta -\beta ^\prime |,\pi -|\beta -\beta ^\prime |\}\), \(c_E(\beta ,\varepsilon ):=0.375\), and \(c_E(\varepsilon ,\beta ^\prime ):=0.375\), for all \((\beta ,\beta ^\prime )\in \varSigma _E\times \varSigma _E\).
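The cost functions for protein and grec are the least obvious ones in the list above, since their edge costs are themselves defined via a small LSAPE instance. The following Python sketch illustrates how they could be evaluated. It is not taken from the paper or from GEDLIB: all function names are hypothetical, labels are assumed to be Python tuples with None standing for null (and a bare None standing for \(\varepsilon \)), and the LSAPE instance is solved by brute-force enumeration, which is adequate only because each label carries at most two edge types.

```python
from itertools import combinations, permutations


def levenshtein(s, t):
    """Classic dynamic-programming string edit distance with unit costs."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # substitute a by b
        prev = curr
    return prev[-1]


def protein_node_cost(alpha, alpha_prime):
    """Node labels are (type, sequence) tuples; None stands for epsilon (assumption)."""
    if alpha is None or alpha_prime is None:
        return 8.25  # node deletion or insertion
    (t1, s1), (t2, s2) = alpha, alpha_prime
    if t1 != t2:
        return 16.5
    return 0.75 * levenshtein(s1, s2)


def lsape_bruteforce(sub, n, m, indel):
    """Cheapest error-correcting assignment between n rows and m columns.

    Enumerates every partial matching; each unmatched row or column pays indel.
    """
    best = (n + m) * indel  # match nothing: delete all rows, insert all columns
    for k in range(1, min(n, m) + 1):
        for rows in combinations(range(n), k):
            for cols in permutations(range(m), k):
                matched = sum(sub[r][s] for r, s in zip(rows, cols))
                best = min(best, matched + (n + m - 2 * k) * indel)
    return best


def double_edge_cost(beta, beta_prime, scale, mismatch, indel):
    """Edge labels are (t1, t2) tuples, t2 possibly None; a bare None is epsilon."""
    a = [t for t in (beta or ()) if t is not None]
    b = [t for t in (beta_prime or ()) if t is not None]
    sub = [[mismatch * (x != y) for y in b] for x in a]
    return scale * lsape_bruteforce(sub, len(a), len(b), indel)
```

Under these assumptions, the protein edge costs correspond to double_edge_cost(beta, beta_prime, 0.25, 2.0, 1.0) and the grec edge costs to double_edge_cost(beta, beta_prime, 0.5, 30.0, 15.0). Deleting an edge with f(β) types then costs scale · indel · f(β), which agrees with the definitions of \(c_E(\beta ,\varepsilon )\) given above.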
Visualization of experiments via dominance graphs
See Figures 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, and 28.
Figures 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, and 28 visualize the transitive reductions of the dominance graphs induced by \(\succ _{\textit{LB}} \) (Figs. 17, 18, 19, 20, 21, 22) and \(\succ _{\textit{UB}} \) (Figs. 23, 24, 25, 26, 27, 28) and hence provide more detailed views on the results of the experiments reported in Sects. 9.5 and 9.6. In the dominance graphs, instantiations of LSAPE-GED are displayed black on white, instantiations of LP-GED are displayed black on light gray, instantiations of LS-GED are displayed white on dark gray, and miscellaneous heuristics are displayed white on black. For all algorithms instantiating LSAPE-GED, we display the configuration \((K,\gamma )\) of the extensions MULTI-SOL and CENTRALITIES in addition to the name of the heuristic. Similarly, for all algorithms instantiating LS-GED, we display the configuration \((K,\rho ,L,\eta )\) of the extensions MULTI-START and RANDPOST. Recall that instantiations of LSAPE-GED are run without extensions if and only if \((K,\gamma )=(1,0)\) and that instantiations of LS-GED are run without extensions if and only if \((K,\rho ,L,\eta )=(1,1,0,0)\) (cf. Sect. 9.2 for more details). For Pareto optimal algorithms, we also show the test metrics \(d_{{\textit{LB}} |{\textit{UB}}}\), t, and \(c_{{\textit{LB}} |{\textit{UB}}}\), and the joint score \(s_{{\textit{LB}} |{\textit{UB}}}\).
As the extensions MULTI-SOL and CENTRALITIES of LSAPE-GED improve the computed upper bounds at the price of increased runtimes but have no effect on the obtained lower bounds, for all instantiations of LSAPE-GED, we only show the baseline configurations \((K,\gamma )=(1,0)\) in the dominance graphs induced by \(\succ _{\textit{LB}} \). In the dominance graphs induced by \(\succ _{\textit{UB}} \), for each heuristic H, we only display those configurations that are Pareto optimal (i.e., maximal w.r.t. \(\succ _{\textit{UB}} \)) or have a maximal joint score \(s_{\textit{UB}} \) among all tested configurations of H.
In the transitive reduction of the dominance graphs induced by \(\succ _{\textit{LB}} \), we draw an arc from \({{\texttt {ALG}}} _1\) to \({{\texttt {ALG}}} _2\) if and only if \({{\texttt {ALG}}} _1\succ _{\textit{LB}} {{\texttt {ALG}}} _2\) and there is no algorithm \({{\texttt {ALG}}} _3\) such that \({{\texttt {ALG}}} _1\succ _{\textit{LB}} {{\texttt {ALG}}} _3\succ _{\textit{LB}} {{\texttt {ALG}}} _2\). Arcs are blue if, additionally, \({{\texttt {ALG}}} _1\) yielded a tighter lower bound than \({{\texttt {ALG}}} _2\), red if \({{\texttt {ALG}}} _1\) was faster than \({{\texttt {ALG}}} _2\), and green if \({{\texttt {ALG}}} _1\) had a better classification coefficient than \({{\texttt {ALG}}} _2\). Multicolored arcs indicate that several of these relations hold. The graphs are oriented from left to right, such that an algorithm is Pareto optimal if and only if it appears in the leftmost layer. The colored labels \(d^\star _{{\textit{LB}}}\), \(t^\star _{{\textit{LB}}}\), and \(c^\star _{{\textit{LB}}}\) highlight those Pareto optimal algorithms that, respectively, yielded the tightest lower bound, exhibited the best runtime behavior among all heuristics that compute lower bounds, or gave the best lower bound classification coefficient. The dominance graphs induced by \(\succ _{\textit{UB}} \) are constructed analogously.
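The construction can be sketched in a few lines of Python for the upper bound case. The sketch rests on assumptions that are not part of this appendix: the dominance relation itself is defined in Sect. 9 and is approximated here by ordinary Pareto dominance (no worse in all three metrics, strictly better in at least one), smaller values of d and t and larger values of c are taken to be better, and all names are hypothetical. The metrics of IPFP (40, 1, 0, 0) are those reported for fp in Example 3, whereas the values for IBP-BEAM and for the third algorithm ALG3 are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    d: float  # tightness metric of the bound (assumed: smaller is better)
    t: float  # average runtime in seconds (smaller is better)
    c: float  # classification coefficient (assumed: larger is better)


def better_flags(a, b):
    """Arc colors: which metrics of a are strictly better than those of b."""
    return {"blue": a.d < b.d, "red": a.t < b.t, "green": a.c > b.c}


def dominates(a, b):
    """Assumed dominance: no worse everywhere, strictly better somewhere."""
    no_worse = a.d <= b.d and a.t <= b.t and a.c >= b.c
    return no_worse and any(better_flags(a, b).values())


def dominance_graph(results):
    """Return the arcs of the transitive reduction and the Pareto optimal names."""
    arcs = {}
    for a in results:
        for b in results:
            shortcut = any(dominates(a, m) and dominates(m, b) for m in results)
            if dominates(a, b) and not shortcut:
                arcs[(a.name, b.name)] = [
                    color for color, on in better_flags(a, b).items() if on
                ]
    dominated = {target for (_, target) in arcs}
    pareto = [r.name for r in results if r.name not in dominated]
    return arcs, pareto


results = [
    Result("IPFP (40,1,0,0)", d=3.08, t=1.63e-2, c=0.11),       # values from Example 3
    Result("IBP-BEAM (40,0.5,1,0)", d=3.20, t=2.0e-2, c=0.11),  # hypothetical values
    Result("ALG3", d=3.50, t=5.0e-2, c=0.10),                   # hypothetical algorithm
]
arcs, pareto = dominance_graph(results)
```

With these inputs, the arc from IPFP to IBP-BEAM is colored blue and red, the arc from IBP-BEAM to ALG3 carries all three colors, and the direct arc from IPFP to ALG3 is omitted because IBP-BEAM lies between the two. Only IPFP ends up in the leftmost layer.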
Example 3
Figure 16 exemplifies the visualizations of the dominance graphs induced by \(\succ _{\textit{LB}} \) and \(\succ _{\textit{UB}} \). It shows a snapshot of the dominance graph induced by \(\succ _{\textit{UB}} \) on the dataset fp shown in Fig. 24. We see that IPFP run with the configuration \((K,\rho ,L,\eta )=(40,1,0,0)\) was Pareto optimal on fp. The blue label \(d^\star _{{\textit{UB}}}\) indicates that IPFP (40, 1, 0, 0) computed the tightest average upper bound on fp; the green label \(c^\star _{{\textit{UB}}}\) tells us that it also yielded the best upper bound classification coefficient. Furthermore, we see that \(t(\texttt {IPFP} \,(40,1,0,0))=1.63\cdot 10^{-2}\hbox {s}\), \(d_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=3.08\), \(c_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=0.11\), and \(s_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=0.67\).
The blue–red arc from IPFP (40, 1, 0, 0) to IBP-BEAM (40, 0.5, 1, 0) tells us that, on fp, IPFP with \((K,\rho ,L,\eta )=(40,1,0,0)\) dominated IBP-BEAM with \((K,\rho ,L,\eta )=(40,0.5,1,0)\). More precisely, we have \(t(\texttt {IBP-BEAM} \,(40,0.5,1,0))>t(\texttt {IPFP} \,(40,1,0,0))=1.63\cdot 10^{-2}\hbox {s}\), \(d_{\textit{UB}} (\texttt {IBP-BEAM} \,(40,0.5,1,0))>d_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=3.08\), and \(c_{\textit{UB}} (\texttt {IBP-BEAM} \,(40,0.5,1,0))=c_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=0.11\). As IPFP and IBP-BEAM instantiate LS-GED, they are shown white on dark gray.
Cite this article
Blumenthal, D.B., Boria, N., Gamper, J. et al. Comparing heuristics for graph edit distance computation. The VLDB Journal 29, 419–458 (2020). https://doi.org/10.1007/s00778-019-00544-1
DOI: https://doi.org/10.1007/s00778-019-00544-1