Comparing heuristics for graph edit distance computation

Blumenthal, David B.; Boria, Nicolas; Gamper, Johann; Bougleux, Sébastien; Brun, Luc

doi:10.1007/s00778-019-00544-1

Special Issue Paper
Published: 15 July 2019

Volume 29, pages 419–458 (2020)
The VLDB Journal

Abstract

Because of its flexibility, intuitiveness, and expressivity, the graph edit distance (GED) is one of the most widely used distance measures for labeled graphs. Since exactly computing GED is NP-hard, over the past years, various heuristics have been proposed. They use techniques such as transformations to the linear sum assignment problem with error correction, local search, and linear programming to approximate GED via upper or lower bounds. In this paper, we provide a systematic overview of the most important heuristics. Moreover, we empirically evaluate all compared heuristics within an integrated implementation.

Notes

As BRANCH-CONST was proposed before BRANCH and BRANCH-FAST, it is in fact more correct to say that BRANCH and BRANCH-FAST generalize BRANCH-CONST to arbitrary edit costs. For the sake of simplicity, we here change the order of presentation.
In the original publications, this technique is suggested for the LSAPE instance produced by BP (cf. Sect. 5.2.2). It can, however, be employed in combination with the LSAPE instances produced by any instantiation of LSAPE-GED.
In [58], SA is presented as a technique for improving the upper bound computed by the LSAPE-GED instantiation BP. Since SA can be used with any instantiation of LSAPE-GED, we here present a more general version.
To be precise, we tested 19 algorithms that compute lower bounds and 173 algorithms that compute upper bounds. The reason for this is that the extensions of the paradigms LSAPE-GED and LS-GED only affect the upper bounds.

References

Abu-Aisheh, Z., Gaüzère, B., Bougleux, S., Ramel, J.Y., Brun, L., Raveaux, R., Héroux, P., Adam, S.: Graph edit distance contest 2016: results and future challenges. Pattern Recognit. Lett. 100, 96–103 (2017). https://doi.org/10.1016/j.patrec.2017.10.007
Abu-Aisheh, Z., Raveaux, R., Ramel, J.: A graph database repository and performance evaluation metrics for graph edit distance. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 138–147 (2015). https://doi.org/10.1007/978-3-319-18224-7_14
Babai, L.: Graph isomorphism in quasipolynomial time [extended abstract]. In: Wichs, D., Mansour, Y. (eds.) STOC 2016. ACM, New York, pp. 684–697 (2016). https://doi.org/10.1145/2897518.2897542
Blumenthal, D.B., Bougleux, S., Gamper, J., Brun, L.: Ring based approximation of graph edit distance. In: Bai, X., Hancock, E., Ho, T., Wilson, R., Biggio, B., Robles-Kelly, A. (eds.) S+SSPR 2018, LNCS, vol. 11004. Springer, Cham, pp. 293–303 (2018). https://doi.org/10.1007/978-3-319-97785-0_28
Blumenthal, D.B., Bougleux, S., Gamper, J., Brun, L.: Upper bounding GED via transformations to LSAPE based on rings and machine learning (2019)
Blumenthal, D.B., Bougleux, S., Gamper, J., Brun, L.: GEDLIB: a C++ library for graph edit distance computation. In: Conte, D., Ramel, J.Y., Foggia, P. (eds.) Graph-Based Representations in Pattern Recognition. GbRPR 2019. Lecture Notes in Computer Science, vol. 11510, pp. 14–24. Springer, Cham (2019)
Blumenthal, D.B., Daller, E., Bougleux, S., Brun, L., Gamper, J.: Quasimetric graph edit distance as a compact quadratic assignment problem. In: ICPR 2018. IEEE Computer Society, pp. 934–939 (2018). https://doi.org/10.1109/ICPR.2018.8546055
Blumenthal, D.B., Gamper, J.: Correcting and speeding-up bounds for non-uniform graph edit distance. In: ICDE 2017. IEEE Computer Society, pp. 131–134 (2017). https://doi.org/10.1109/ICDE.2017.57
Blumenthal, D.B., Gamper, J.: Improved lower bounds for graph edit distance. IEEE Trans. Knowl. Data Eng. 30(3), 503–516 (2018). https://doi.org/10.1109/TKDE.2017.2772243
Blumenthal, D.B., Gamper, J.: On the exact computation of the graph edit distance. Pattern Recognit. Lett. (2018). https://doi.org/10.1016/j.patrec.2018.05.002
Bonacich, P.: Power and centrality: a family of measures. Am. J. Sociol. 92(5), 1170–1182 (1987). https://doi.org/10.1086/228631
Boria, N., Blumenthal, D.B., Bougleux, S., Brun, L.: Improved local search for graph edit distance (2019). Submitted. arXiv:1907.02929
Boria, N., Bougleux, S., Brun, L.: Approximating GED using a stochastic generator and multistart IPFP. In: Bai, X., Hancock, E.R., Ho, T.K., Wilson, R.C., Biggio, B., Robles-Kelly, A. (eds.) S+SSPR 2018. Springer, Cham, pp. 460–469 (2018). https://doi.org/10.1007/978-3-319-97785-0_44
Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., Vento, M.: Graph edit distance as a quadratic assignment problem. Pattern Recognit. Lett. 87, 38–46 (2017). https://doi.org/10.1016/j.patrec.2016.10.001
Bougleux, S., Gaüzère, B., Blumenthal, D.B., Brun, L.: Fast linear sum assignment with error-correction and no cost constraints. Pattern Recognit. Lett. (2018). https://doi.org/10.1016/j.patrec.2018.03.032
Bougleux, S., Gaüzère, B., Brun, L.: Graph edit distance as a quadratic program. In: ICPR 2016. IEEE Computer Society, pp. 1701–1706 (2016). https://doi.org/10.1109/ICPR.2016.7899881
Bougleux, S., Gaüzère, B., Brun, L.: A Hungarian algorithm for error-correcting graph matching. In: Foggia, P., Liu, C., Vento, M. (eds.) GbRPR 2017, LNCS, vol. 10310. Springer, Cham, pp. 118–127 (2017). https://doi.org/10.1007/978-3-319-58961-9_11
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. 30(1–7), 107–117 (1998). https://doi.org/10.1016/S0169-7552(98)00110-X
Brun, L., Foggia, P., Vento, M.: Trends in graph-based representations for pattern recognition. Pattern Recognit. Lett. (2018). https://doi.org/10.1016/j.patrec.2018.03.016
Bunke, H., Allermann, G.: Inexact graph matching for structural pattern recognition. Pattern Recognit. Lett. 1(4), 245–253 (1983). https://doi.org/10.1016/0167-8655(83)90033-8
Carletti, V., Gaüzère, B., Brun, L., Vento, M.: Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 188–197 (2015). https://doi.org/10.1007/978-3-319-18224-7_19
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27 (2011). https://doi.org/10.1145/1961189.1961199
Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching in pattern recognition. Int. J. Pattern Recognit. Artif. Intell. 18(3), 265–298 (2004). https://doi.org/10.1142/S0218001404003228
Cordella, L.P., Foggia, P., Sansone, C., Vento, M.: A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26(10), 1367–1372 (2004). https://doi.org/10.1109/TPAMI.2004.75
Cortés, X., Serratosa, F., Moreno-García, C.F.: On the influence of node centralities on graph edit distance for graph classification. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 231–241 (2015). https://doi.org/10.1007/978-3-319-18224-7_23
Daller, É., Bougleux, S., Gaüzère, B., Brun, L.: Approximate graph edit distance by several local searches in parallel. In: Fred, A., di Baja, G.S., Marsico, M.D. (eds.) ICPRAM 2018. SciTePress, pp. 149–158 (2018). https://doi.org/10.5220/0006599901490158
Ferrer, M., Serratosa, F., Riesen, K.: A first step towards exact graph edit distance using bipartite graph matching. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 77–86 (2015). https://doi.org/10.1007/978-3-319-18224-7_8
Fischer, A., Suen, C.Y., Frinken, V., Riesen, K., Bunke, H.: Approximation of graph edit distance based on Hausdorff matching. Pattern Recognit. 48(2), 331–343 (2015). https://doi.org/10.1016/j.patcog.2014.07.015
Foggia, P., Percannella, G., Vento, M.: Graph matching and learning in pattern recognition in the last 10 years. Int. J. Pattern Recognit. Artif. Intell. 28(1), 1450001:1–1450001:40 (2014). https://doi.org/10.1142/S0218001414500013
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Nav. Res. Logist. Q. 3(1–2), 95–110 (1956). https://doi.org/10.1002/nav.3800030109
Gao, X., Xiao, B., Tao, D., Li, X.: A survey of graph edit distance. Pattern Anal. Appl. 13(1), 113–129 (2010). https://doi.org/10.1007/s10044-008-0141-y
Gaüzère, B., Bougleux, S., Riesen, K., Brun, L.: Approximate graph edit distance guided by bipartite matching of bags of walks. In: Fränti, P., Brown, G., Loog, M., Escolano, F., Pelillo, M. (eds.) S+SSPR 2014, LNCS, vol. 8621. Springer, Cham, pp. 73–82 (2014). https://doi.org/10.1007/978-3-662-44415-3_8
Guennebaud, G., Jacob, B., et al.: Eigen v3 (2010). http://eigen.tuxfamily.org. Accessed 5 July 2019
Gurobi Optimization LLC: Gurobi Optimizer Reference Manual. http://www.gurobi.com. Accessed 5 July 2019
Henry, E.R.: Classification and Uses of Finger Prints. Routledge, London (1900)
Justice, D., Hero, A.: A binary linear programming formulation of the graph edit distance. IEEE Trans. Pattern Anal. Mach. Intell. 28(8), 1200–1214 (2006). https://doi.org/10.1109/TPAMI.2006.152
Karmarkar, N.: A new polynomial-time algorithm for linear programming. Combinatorica 4(4), 373–396 (1984). https://doi.org/10.1007/BF02579150
Kuhn, H.W.: The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 2(1–2), 83–97 (1955). https://doi.org/10.1002/nav.3800020109
Le Digabel, S.: Algorithm 909: NOMAD: nonlinear optimization with the MADS algorithm. ACM Trans. Math. Softw. 37(4), 44:1–44:15 (2011). https://doi.org/10.1145/1916461.1916468
Le Gall, F.: Powers of tensors and fast matrix multiplication. In: Nabeshima, K., Nagasaka, K., Winkler, F., Szántó, Á. (eds.) ISSAC 2014. ACM, pp. 296–303 (2014). https://doi.org/10.1145/2608628.2608664
Lee, L., Lumsdaine, A., Siek, J.: The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley Longman, Boston (2002)
Leordeanu, M., Hebert, M., Sukthankar, R.: An integer projected fixed point method for graph matching and MAP inference. In: Bengio, Y., Schuurmans, D., Lafferty, J.D., Williams, C.K.I., Culotta, A. (eds.) NIPS 2009. Curran Associates, pp. 1114–1122 (2009)
Lerouge, J., Abu-Aisheh, Z., Raveaux, R., Héroux, P., Adam, S.: Exact graph edit distance computation using a binary linear program. In: Robles-Kelly, A., Loog, M., Biggio, B., Escolano, F., Wilson, R. (eds.) S+SSPR 2016, LNCS, vol. 10029. Springer, Cham, pp. 485–495 (2016). https://doi.org/10.1007/978-3-319-49055-7_43
Lerouge, J., Abu-Aisheh, Z., Raveaux, R., Héroux, P., Adam, S.: New binary linear programming formulation to compute the graph edit distance. Pattern Recognit. 72, 254–265 (2017). https://doi.org/10.1016/j.patcog.2017.07.029
Lin, C.L.: Hardness of approximating graph transformation problem. In: Du, D.Z., Zhang, X.S. (eds.) Algorithms and Computation, LNCS, vol. 834. Springer, Berlin, pp. 74–82 (1994). https://doi.org/10.1007/3-540-58325-4_168
Munkres, J.: Algorithms for the assignment and transportation problems. SIAM J. Appl. Math. 5(1), 32–38 (1957). https://doi.org/10.1137/0105003
Nissen, S.: Implementation of a Fast Artificial Neural Network Library (FANN). Technical report, Department of Computer Science, University of Copenhagen (2003). http://fann.sourceforge.net/report/
Ozdemir, E., Gunduz-Demir, C.: A hybrid classification model for digital pathology using structural and statistical pattern recognition. IEEE Trans. Med. Imaging 32(2), 474–483 (2013). https://doi.org/10.1109/TMI.2012.2230186
Riesen, K.: Structural Pattern Recognition with Graph Edit Distance. Advances in Computer Vision and Pattern Recognition. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-27252-8
Riesen, K., Bunke, H.: IAM graph database repository for graph based pattern recognition and machine learning. In: da Vitoria Lobo, N., Kasparis, T., Roli, F., Kwok, J.T., Georgiopoulos, M., Anagnostopoulos, G.C., Loog, M. (eds.) S+SSPR 2008, LNCS, vol. 5342. Springer, Berlin, pp. 287–297 (2008). https://doi.org/10.1007/978-3-540-89689-0_33
Riesen, K., Bunke, H.: Approximate graph edit distance computation by means of bipartite graph matching. Image Vis. Comput. 27(7), 950–959 (2009). https://doi.org/10.1016/j.imavis.2008.04.004
Riesen, K., Bunke, H.: Graph Classification and Clustering Based on Vector Space Embedding. Series in Machine Perception and Artificial Intelligence, vol. 77. World Scientific, Singapore (2010). https://doi.org/10.1142/7731
Riesen, K., Bunke, H., Fischer, A.: Improving graph edit distance approximation by centrality measures. In: ICPR 2014. IEEE Computer Society, pp. 3910–3914 (2014). https://doi.org/10.1109/ICPR.2014.671
Riesen, K., Ferrer, M.: Predicting the correctness of node assignments in bipartite graph matching. Pattern Recognit. Lett. 69, 8–14 (2016). https://doi.org/10.1016/j.patrec.2015.10.007
Riesen, K., Ferrer, M., Fischer, A., Bunke, H.: Approximation of graph edit distance in quadratic time. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 3–12 (2015). https://doi.org/10.1007/978-3-319-18224-7_1
Riesen, K., Fischer, A., Bunke, H.: Combining bipartite graph matching and beam search for graph edit distance approximation. In: Gayar, N.E., Schwenker, F., Suen, C. (eds.) ANNPR 2014, LNCS, vol. 8774. Springer, Cham, pp. 117–128 (2014). https://doi.org/10.1007/978-3-319-11656-3_11
Riesen, K., Fischer, A., Bunke, H.: Computing upper and lower bounds of graph edit distance in cubic time. In: Gayar, N.E., Schwenker, F., Suen, C. (eds.) ANNPR 2014, LNCS, vol. 8774. Springer, Heidelberg, pp. 129–140 (2014). https://doi.org/10.1007/978-3-319-11656-3
Riesen, K., Fischer, A., Bunke, H.: Improved graph edit distance approximation with simulated annealing. In: Foggia, P., Liu, C., Vento, M. (eds.) GbRPR 2017, LNCS, vol. 10310. Springer, Cham, pp. 222–231 (2017). https://doi.org/10.1007/978-3-319-58961-9_20
Sanfeliu, A., Fu, K.S.: A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cybern. 13(3), 353–362 (1983). https://doi.org/10.1109/TSMC.1983.6313167
Schomburg, I., Chang, A., Ebeling, C., Gremse, M., Heldt, C., Huhn, G., Schomburg, D.: BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res. 32(Database–Issue), 431–433 (2004). https://doi.org/10.1093/nar/gkh081
Stauffer, M., Fischer, A., Riesen, K.: A novel graph database for handwritten word images. In: Robles-Kelly, A., Loog, M., Biggio, B., Escolano, F., Wilson, R. (eds.) S+SSPR 2016, LNCS, vol. 10029. Springer, Cham, pp. 553–563 (2016). https://doi.org/10.1007/978-3-319-49055-7_49
Stauffer, M., Tschachtli, T., Fischer, A., Riesen, K.: A survey on applications of bipartite graph edit distance. In: Foggia, P., Liu, C., Vento, M. (eds.) GbRPR 2017, LNCS, vol. 10310. Springer, Cham, pp. 242–252 (2017). https://doi.org/10.1007/978-3-319-58961-9_22
Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13(4), 354–356 (1969). https://doi.org/10.1007/BF02165411
Uno, T.: Algorithms for enumerating all perfect, maximum and maximal matchings in bipartite graphs. In: Leong, H.W., Imai, H., Jain, S. (eds.) ISAAC 1997, LNCS, vol. 1350. Springer, Berlin, pp. 92–101 (1997). https://doi.org/10.1007/3-540-63890-3_11
Uno, T.: A fast algorithm for enumerating bipartite perfect matchings. In: Eades, P., Takaoka, T. (eds.) ISAAC 2001, LNCS, vol. 2223. Springer, Berlin, pp. 367–379 (2001). https://doi.org/10.1007/3-540-45678-3_32
Vento, M.: A long trip in the charming world of graphs for pattern recognition. Pattern Recognit. 48(2), 291–301 (2015). https://doi.org/10.1016/j.patcog.2014.01.002
Wang, X., Ding, X., Tung, A.K.H., Ying, S., Jin, H.: An efficient graph indexing method. In: Kementsietsidis, A., Salles, M.A.V. (eds.) ICDE 2012. IEEE Computer Society, pp. 210–221 (2012). https://doi.org/10.1109/ICDE.2012.28
Zeng, Z., Tung, A.K.H., Wang, J., Feng, J., Zhou, L.: Comparing stars: on approximating graph edit distance. PVLDB 2(1), 25–36 (2009). https://doi.org/10.14778/1687627.1687631
Zhao, X., Xiao, C., Lin, X., Zhang, W., Wang, Y.: Efficient structure similarity searches: a partition-based approach. VLDB J. 27(1), 53–78 (2018). https://doi.org/10.1007/s00778-017-0487-0
Zheng, W., Zou, L., Lian, X., Wang, D., Zhao, D.: Graph similarity search with edit distance constraint in large graph databases. In: He, Q., Iyengar, A., Nejdl, W., Pei, J., Rastogi, R. (eds.) CIKM 2013. ACM, pp. 1595–1600 (2013). https://doi.org/10.1145/2505515.2505723
Zheng, W., Zou, L., Lian, X., Wang, D., Zhao, D.: Efficient graph similarity search over large graph databases. IEEE Trans. Knowl. Data Eng. 27(4), 964–978 (2015). https://doi.org/10.1109/TKDE.2014.2349924

Author information

Authors and Affiliations

Faculty of Computer Science, Free University of Bozen-Bolzano, Bolzano, Italy
David B. Blumenthal & Johann Gamper
Normandie Université, GREYC, ENSICAEN, UNICAEN, Caen, France
Nicolas Boria, Sébastien Bougleux & Luc Brun

Corresponding author

Correspondence to David B. Blumenthal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Datasets and edit cost functions

The datasets aids and muta: Graphs contained in aids and muta represent molecular compounds. The molecules represented by the graphs contained in aids are divided into the class of molecules that do and the class of molecules that do not exhibit activity against HIV. Similarly, the molecules represented by the graphs contained in muta are divided into the class of molecules that do and the class of molecules that do not cause genetic mutation. The nodes of the graphs contained in aids and muta are labeled with chemical symbols, and their edges are labeled with a valence (either 1 or 2). Node edit costs are defined as \(c_V(\alpha ,\alpha ^\prime ):=5.5\cdot \delta _{\alpha \ne \alpha ^\prime }\), \(c_V(\alpha ,\varepsilon ):=2.75\), and \(c_V(\varepsilon ,\alpha ^\prime ):=2.75\), for all \((\alpha ,\alpha ^\prime )\in \varSigma _V\times \varSigma _V\). Edge edit costs are defined as \(c_E(\beta ,\beta ^\prime ):=1.65\cdot \delta _{\beta \ne \beta ^\prime }\), \(c_E(\beta ,\varepsilon ):=0.825\), and \(c_E(\varepsilon ,\beta ^\prime ):=0.825\), for all \((\beta ,\beta ^\prime )\in \varSigma _E\times \varSigma _E\).
The dataset protein: Graphs contained in protein represent proteins which are annotated with their EC classes (EC1, EC2, EC3, EC4, EC5, and EC6) [60]. Nodes are labeled with tuples (t, s), where t is the node’s type (helix, sheet, or loop) and s is its amino acid sequence. Nodes are connected via structural or sequential edges or both, i.e., edges \((u_i,u_j)\) are labeled with tuples \((t_1,t_2)\), where \(t_1\) is the type of the first edge connecting \(u_i\) and \(u_j\) and \(t_2\) is the type of the second edge connecting \(u_i\) and \(u_j\) (possibly null). Node edit costs are defined as \(c_V(\alpha ,\alpha ^\prime ):=16.5\cdot \delta _{\alpha .t\ne \alpha ^\prime .t}+0.75\cdot \delta _{\alpha .t=\alpha ^\prime .t}\cdot \mathrm{LD}(\alpha .s,\alpha ^\prime .s)\), \(c_V(\alpha ,\varepsilon ):=8.25\), and \(c_V(\varepsilon ,\alpha ^\prime ):=8.25\), for all \((\alpha ,\alpha ^\prime )\in \varSigma _V\times \varSigma _V\), where \(\mathrm{LD}(\cdot ,\cdot )\) is Levenshtein’s string edit distance. Edge edit costs are defined as \(c_E(\beta ,\beta ^\prime ):=0.25\cdot \mathrm{LSAPE} ({\mathbf{C}} ^{\beta ,\beta ^\prime })\), \(c_E(\beta ,\varepsilon ):=0.25\cdot f(\beta )\), and \(c_E(\varepsilon ,\beta ^\prime ):=0.25\cdot f(\beta ^\prime )\), for all \((\beta ,\beta ^\prime )\in \varSigma _E\times \varSigma _E\), where \(f(\beta ):=1+\delta _{\beta .t_2\ne {\texttt {null}}}\) and \({\mathbf{C}} ^{\beta ,\beta ^\prime }\in {\mathbb {R}} ^{(f(\beta )+1)\times (f(\beta ^\prime )+1)}\) is constructed as \(c^{\beta ,\beta ^\prime }_{r,s}:=2\cdot \delta _{\beta .t_r\ne \beta ^\prime .t_s}\) and \(c^{\beta ,\beta ^\prime }_{r,f(\beta ^\prime )+1}:=c^{\beta ,\beta ^\prime }_{f(\beta )+1,s}:=1\), for all \((r,s)\in [f(\beta )]\times [f(\beta ^\prime )]\).
The dataset letter (h): Graphs contained in letter (h) represent highly distorted drawings of the capital letters A, E, F, H, I, K, L, M, N, T, V, W, X, Y, and Z. Nodes are labeled with two-dimensional Euclidean coordinates. Edges are unlabeled. Node edit costs are defined as \(c_V(\alpha ,\alpha ^\prime ):=0.75\cdot \left\Vert \alpha -\alpha ^\prime \right\Vert \), \(c_V(\alpha ,\varepsilon ):=0.675\), and \(c_V(\varepsilon ,\alpha ^\prime ):=0.675\), for all \((\alpha ,\alpha ^\prime )\in \varSigma _V\times \varSigma _V\), where \(\left\Vert \cdot \right\Vert \) is the Euclidean norm. The edge edit costs \(c_E\) are defined as \(c_E(1,\varepsilon ):=c_E(\varepsilon ,1):=0.425\).
The dataset grec: Graphs contained in grec represent 22 different symbols from electronic and architectural drawings. Nodes are labeled with tuples (t, x, y), where t equals one of four node types and (x, y) is a two-dimensional Euclidean coordinate. Nodes are connected via line or arc edges or both, i.e., edges \((u_i,u_j)\) are labeled with tuples \((t_1,t_2)\), where \(t_1\) is the type of the first edge connecting \(u_i\) and \(u_j\) and \(t_2\) is the type of the second edge connecting \(u_i\) and \(u_j\) (possibly null). Node edit costs are defined as \(c_V(\alpha ,\alpha ^\prime ):=0.5\cdot \left\Vert \alpha .(x,y)-\alpha ^\prime .(x,y)\right\Vert \cdot \delta _{\alpha .t=\alpha ^\prime .t}+90\cdot \delta _{\alpha .t\ne \alpha ^\prime .t}\), \(c_V(\alpha ,\varepsilon ):=45\), and \(c_V(\varepsilon ,\alpha ^\prime ):=45\), for all \((\alpha ,\alpha ^\prime )\in \varSigma _V\times \varSigma _V\). Edge edit costs are defined as \(c_E(\beta ,\beta ^\prime ):=0.5\cdot \mathrm{LSAPE} ({\mathbf{C}} ^{\beta ,\beta ^\prime })\), \(c_E(\beta ,\varepsilon ):=0.5\cdot f(\beta )\), and \(c_E(\varepsilon ,\beta ^\prime ):=0.5\cdot f(\beta ^\prime )\), for all \((\beta ,\beta ^\prime )\in \varSigma _E\times \varSigma _E\), where \(f(\beta ):=1+\delta _{\beta .t_2\ne {\texttt {null}}}\) and \({\mathbf{C}} ^{\beta ,\beta ^\prime }\in {\mathbb {R}} ^{(f(\beta )+1)\times (f(\beta ^\prime )+1)}\) is constructed as \(c^{\beta ,\beta ^\prime }_{r,s}:=30\cdot \delta _{\beta .t_r\ne \beta ^\prime .t_s}\) and \(c^{\beta ,\beta ^\prime }_{r,f(\beta ^\prime )+1}:=c^{\beta ,\beta ^\prime }_{f(\beta )+1,s}:=15\), for all \((r,s)\in [f(\beta )]\times [f(\beta ^\prime )]\).
The dataset fp: Graphs contained in fp represent fingerprint images which are annotated with their classes (arch, left loop, right loop, and whorl) from the Galton-Henry classification system [35]. Nodes are unlabeled and edges are labeled with an orientation \(\beta \in {\mathbb {R}} \) with \(-\pi /2<\beta \le \pi /2\). Node edit costs are defined as \(c_V(1,\varepsilon ):=c_V(\varepsilon ,1):=0.525\). Edge edit costs are defined as \(c_E(\beta ,\beta ^\prime ):=0.5\cdot \min \{|\beta -\beta ^\prime |,\pi -|\beta -\beta ^\prime |\}\), \(c_E(\beta ,\varepsilon ):=0.375\), and \(c_E(\varepsilon ,\beta ^\prime ):=0.375\), for all \((\beta ,\beta ^\prime )\in \varSigma _E\times \varSigma _E\).
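
To make the definitions above concrete, the following Python sketch implements some of the listed edit cost functions: the constant costs for aids and muta, the node costs for letter (h), the edge costs for fp, and the LSAPE-based edge costs for protein and grec. It is an illustration under stated assumptions, not the implementation used in the experiments. All function names are hypothetical, the dummy label \(\varepsilon\) is encoded as `None`, a missing second edge type is also encoded as `None`, the cost of substituting \(\varepsilon\) by \(\varepsilon\) is assumed to be 0, and the tiny LSAPE instances (at most two rows and two columns) are solved by plain enumeration rather than by a dedicated solver. The node costs for protein and grec are omitted.

```python
import math

EPS = None  # assumed encoding of the dummy label epsilon (insertion/deletion)


def uniform_cost(a, b, sub, indel):
    """Constant costs: `sub` for a label mismatch, `indel` for insertion/deletion."""
    if a is EPS and b is EPS:
        return 0.0
    if a is EPS or b is EPS:
        return indel
    return sub if a != b else 0.0


def aids_node_cost(a, b):  # also used for muta
    return uniform_cost(a, b, sub=5.5, indel=2.75)


def aids_edge_cost(a, b):  # also used for muta
    return uniform_cost(a, b, sub=1.65, indel=0.825)


def letter_node_cost(a, b):
    """letter (h): node labels are (x, y) coordinates."""
    if a is EPS and b is EPS:
        return 0.0
    if a is EPS or b is EPS:
        return 0.675
    return 0.75 * math.dist(a, b)


def fp_edge_cost(a, b):
    """fp: edge labels are orientations in (-pi/2, pi/2]."""
    if a is EPS and b is EPS:
        return 0.0
    if a is EPS or b is EPS:
        return 0.375
    diff = abs(a - b)
    return 0.5 * min(diff, math.pi - diff)


def lsape(sub, dele, ins):
    """Brute-force LSAPE: every row is substituted by an unused column or deleted,
    every unused column is inserted. Only suitable for very small instances."""
    n, m = len(dele), len(ins)
    best = math.inf

    def rec(r, used, cost):
        nonlocal best
        if r == n:
            best = min(best, cost + sum(ins[s] for s in range(m) if s not in used))
            return
        rec(r + 1, used, cost + dele[r])  # delete row r
        for s in range(m):
            if s not in used:  # substitute row r by column s
                rec(r + 1, used | {s}, cost + sub[r][s])

    rec(0, frozenset(), 0.0)
    return best


def typed_edge_cost(b1, b2, scale, mismatch, indel):
    """Edge labels are tuples (t1, t2) with t2 possibly None; f(label) = number of types."""
    t1 = [] if b1 is EPS else [t for t in b1 if t is not None]
    t2 = [] if b2 is EPS else [t for t in b2 if t is not None]
    if b1 is EPS or b2 is EPS:
        return scale * (len(t1) + len(t2))  # scale * f(label) of the non-dummy label
    sub = [[mismatch * (r != s) for s in t2] for r in t1]
    return scale * lsape(sub, [indel] * len(t1), [indel] * len(t2))


def protein_edge_cost(b1, b2):
    return typed_edge_cost(b1, b2, scale=0.25, mismatch=2, indel=1)


def grec_edge_cost(b1, b2):
    return typed_edge_cost(b1, b2, scale=0.5, mismatch=30, indel=15)
```

For instance, under these assumptions, substituting a protein edge of a single type by an edge of a different single type costs \(0.25\cdot 2=0.5\), whether the type is substituted directly or deleted and re-inserted.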

Visualization of experiments via dominance graphs

See Figures 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, and 28.

Figures 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, and 28 visualize the transitive reductions of the dominance graphs induced by \(\succ _{\textit{LB}} \) (Figs. 17, 18, 19, 20, 21, 22) and \(\succ _{\textit{UB}} \) (Figs. 23, 24, 25, 26, 27, 28) and hence provide more detailed views on the results of the experiments reported in Sects. 9.5 and 9.6. In the dominance graphs, instantiations of LSAPE-GED are displayed black on white, instantiations of LP-GED are displayed black on light gray, instantiations of LS-GED are displayed white on dark gray, and miscellaneous heuristics are displayed white on black. For all algorithms instantiating LSAPE-GED, we display the configuration \((K,\gamma )\) of the extensions MULTI-SOL and CENTRALITIES in addition to the name of the heuristic. Similarly, for all algorithms instantiating LS-GED, we display the configuration \((K,\rho ,L,\eta )\) of the extensions MULTI-START and RANDPOST. Recall that instantiations of LSAPE-GED are run without extensions just in case \((K,\gamma )=(1,0)\) and that instantiations of LS-GED are run without extensions just in case \((K,\rho ,L,\eta )=(1,1,0,0)\) (cf. Sect. 9.2 for more details). For Pareto optimal algorithms, we also show the test metrics \(d_{{\textit{LB}} |{\textit{UB}}}\), t, and \(c_{{\textit{LB}} |{\textit{UB}}}\), and the joint score \(s_{{\textit{LB}} |{\textit{UB}}}\).

As the extensions MULTI-SOL and CENTRALITIES of LSAPE-GED improve the computed upper bounds at the price of increased runtimes but have no effect on the obtained lower bounds, for all instantiations of LSAPE-GED, we only show the baseline configurations \((K,\gamma )=(1,0)\) in the dominance graphs induced by \(\succ _{\textit{LB}} \). In the dominance graphs induced by \(\succ _{\textit{UB}} \), for each heuristic H, we only display those configurations that are Pareto optimal (i.e., maximal w.r.t. \(\succ _{\textit{UB}} \)) or have a maximal joint score \(s_{\textit{UB}} \) among all tested configurations of H.

In the transitive reduction of the dominance graphs induced by \(\succ _{\textit{LB}} \), we draw an arc from \({{\texttt {ALG}}} _1\) to \({{\texttt {ALG}}} _2\) just in case \({{\texttt {ALG}}} _1\succ _{\textit{LB}} {{\texttt {ALG}}} _2\) and there is no algorithm \({{\texttt {ALG}}} _3\) such that \({{\texttt {ALG}}} _1\succ _{\textit{LB}} {{\texttt {ALG}}} _3\succ _{\textit{LB}} {{\texttt {ALG}}} _2\). Arcs are blue if, additionally, \({{\texttt {ALG}}} _1\) yielded a tighter lower bound than \({{\texttt {ALG}}} _2\), red if \({{\texttt {ALG}}} _1\) was faster than \({{\texttt {ALG}}} _2\), and green if \({{\texttt {ALG}}} _1\) had a better classification coefficient than \({{\texttt {ALG}}} _2\). Multicolored arcs indicate that several of these relations hold. The graphs are oriented from left to right, such that an algorithm is Pareto optimal just in case it appears in the leftmost layer. The colored labels \(d^\star _{{\textit{LB}}}\), \(t^\star _{{\textit{LB}}}\), and \(c^\star _{{\textit{LB}}}\) highlight those Pareto optimal algorithms that, respectively, yielded the tightest lower bound, exhibited the best runtime behavior among all heuristics that compute lower bounds, or gave the best lower bound classification coefficient. The dominance graphs induced by \(\succ _{\textit{UB}} \) are constructed analogously.
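
The construction described above can be sketched in a few lines of Python. The sketch rests on assumptions that are not stated in this appendix: the dominance relation is taken to be ordinary Pareto dominance on the three test metrics (no worse in all of them and strictly better in at least one), a smaller deviation `d` is taken to mean a tighter bound, a smaller `t` a faster algorithm, and a larger `c` a better classification coefficient. The class `Result`, the function names, and the numbers in `TOY` are hypothetical and do not correspond to any measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    d: float  # deviation of the bound; smaller is assumed to mean tighter
    t: float  # runtime in seconds; smaller is better
    c: float  # classification coefficient; larger is assumed to be better


def dominates(a, b):
    """Assumed Pareto dominance: a is no worse than b everywhere and better somewhere."""
    no_worse = a.d <= b.d and a.t <= b.t and a.c >= b.c
    better = a.d < b.d or a.t < b.t or a.c > b.c
    return no_worse and better


def pareto_optimal(results):
    """Names of the algorithms that no other algorithm dominates (leftmost layer)."""
    return [a.name for a in results if not any(dominates(b, a) for b in results)]


def transitive_reduction(results):
    """Arcs (a, b, colors) with a dominating b and no algorithm strictly in between."""
    arcs = []
    for a in results:
        for b in results:
            if not dominates(a, b):
                continue
            if any(dominates(a, x) and dominates(x, b) for x in results):
                continue  # implied by a longer path, so the arc is dropped
            colors = []
            if a.d < b.d:
                colors.append("blue")   # tighter bound
            if a.t < b.t:
                colors.append("red")    # faster
            if a.c > b.c:
                colors.append("green")  # better classification coefficient
            arcs.append((a.name, b.name, colors))
    return arcs


# Made-up toy values for illustration only.
TOY = [
    Result("A", d=1.0, t=1.0, c=0.5),
    Result("B", d=2.0, t=1.0, c=0.5),
    Result("C", d=3.0, t=2.0, c=0.5),
    Result("D", d=0.5, t=5.0, c=0.4),
]

if __name__ == "__main__":
    print(pareto_optimal(TOY))
    print(transitive_reduction(TOY))
```

In the toy data, A dominates both B and C, but the arc from A to C is omitted because B lies between them; D is incomparable to the other three and therefore also Pareto optimal.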

Example 3

Figure 16 exemplifies the visualizations of the dominance graphs induced by \(\succ _{\textit{LB}} \) and \(\succ _{\textit{UB}} \). It shows a snapshot of the dominance graph induced by \(\succ _{\textit{UB}} \) on the dataset fp shown in Fig. 24. We see that IPFP run with the configuration \((K,\rho ,L,\eta )=(40,1,0,0)\) was Pareto optimal on fp. The blue label \(d^\star _{{\textit{UB}}}\) indicates that IPFP (40, 1, 0, 0) computed the tightest average upper bound on fp; the green label \(c^\star _{{\textit{UB}}}\) tells us that it also yielded the best upper bound classification coefficient. Furthermore, we see that \(t(\texttt {IPFP} \,(40,1,0,0))=1.63\cdot 10^{-2}\hbox {s}\), \(d_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=3.08\), \(c_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=0.11\), and \(s_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=0.67\).

The blue–red arc from IPFP (40, 1, 0, 0) to IBP-BEAM (40, 0.5, 1, 0) tells us that, on fp, IPFP with \((K,\rho ,L,\eta )=(40,1,0,0)\) dominated IBP-BEAM with \((K,\rho ,L,\eta )=(40,0.5,1,0)\). More precisely, we have \(t(\texttt {IBP-BEAM} \,(40,0.5,1,0))>t(\texttt {IPFP} \,(40,1,0,0))=1.63\cdot 10^{-2}\hbox {s}\), \(d_{\textit{UB}} (\texttt {IBP-BEAM} \,(40,0.5,1,0))>d_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=3.08\), and \(c_{\textit{UB}} (\texttt {IBP-BEAM} \,(40,0.5,1,0))=c_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=0.11\). As IPFP and IBP-BEAM instantiate LS-GED, they are shown white on dark gray.

About this article

Cite this article

Blumenthal, D.B., Boria, N., Gamper, J. et al. Comparing heuristics for graph edit distance computation. The VLDB Journal 29, 419–458 (2020). https://doi.org/10.1007/s00778-019-00544-1

Received: 31 December 2018
Revised: 16 May 2019
Accepted: 27 May 2019
Published: 15 July 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s00778-019-00544-1
