Comparing heuristics for graph edit distance computation

  • Special Issue Paper
  • The VLDB Journal

Abstract

Because of its flexibility, intuitiveness, and expressivity, the graph edit distance (GED) is one of the most widely used distance measures for labeled graphs. Since exactly computing GED is NP-hard, over the past years, various heuristics have been proposed. They use techniques such as transformations to the linear sum assignment problem with error correction, local search, and linear programming to approximate GED via upper or lower bounds. In this paper, we provide a systematic overview of the most important heuristics. Moreover, we empirically evaluate all compared heuristics within an integrated implementation.

Notes

  1. As BRANCH-CONST was proposed before BRANCH and BRANCH-FAST, it is in fact more correct to say that BRANCH and BRANCH-FAST generalize BRANCH-CONST to arbitrary edit costs. For the sake of simplicity, we here change the order of presentation.

  2. In the original publications, this technique is suggested for the LSAPE instance produced by BP (cf. Sect. 5.2.2). It can, however, be employed in combination with the LSAPE instances produced by any instantiation of LSAPE-GED.

  3. In [58], SA is presented as a technique for improving the upper bound computed by the LSAPE-GED instantiation BP. Since SA can be used with any instantiation of LSAPE-GED, we here present a more general version.

  4. To be precise, we tested 19 algorithms that compute lower bounds and 173 algorithms that compute upper bounds. The reason for this is that the extensions of the paradigms LSAPE-GED and LS-GED only affect the upper bounds.

References

  1. Abu-Aisheh, Z., Gaüzère, B., Bougleux, S., Ramel, J.Y., Brun, L., Raveaux, R., Héroux, P., Adam, S.: Graph edit distance contest 2016: results and future challenges. Pattern Recognit. Lett. 100, 96–103 (2017). https://doi.org/10.1016/j.patrec.2017.10.007

  2. Abu-Aisheh, Z., Raveaux, R., Ramel, J.: A graph database repository and performance evaluation metrics for graph edit distance. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 138–147 (2015). https://doi.org/10.1007/978-3-319-18224-7_14

  3. Babai, L.: Graph isomorphism in quasipolynomial time [extended abstract]. In: Wichs, D., Mansour, Y. (eds.) STOC 2016. ACM, New York, pp. 684–697 (2016). https://doi.org/10.1145/2897518.2897542

  4. Blumenthal, D.B., Bougleux, S., Gamper, J., Brun, L.: Ring based approximation of graph edit distance. In: Bai, X., Hancock, E., Ho, T., Wilson, R., Biggio, B., Robles-Kelly, A. (eds.) S+SSPR 2018, LNCS, vol. 11004. Springer, Cham, pp. 293–303 (2018). https://doi.org/10.1007/978-3-319-97785-0_28

  5. Blumenthal, D.B., Bougleux, S., Gamper, J., Brun, L.: Upper bounding GED via transformations to LSAPE based on rings and machine learning (2019)

  6. Blumenthal, D.B., Bougleux, S., Gamper, J., Brun, L.: GEDLIB: a C++ library for graph edit distance computation. In: Conte, D., Ramel, J.Y., Foggia, P. (eds.) Graph-Based Representations in Pattern Recognition. GbRPR 2019. Lecture Notes in Computer Science, vol. 11510, pp. 14–24. Springer, Cham (2019)

  7. Blumenthal, D.B., Daller, E., Bougleux, S., Brun, L., Gamper, J.: Quasimetric graph edit distance as a compact quadratic assignment problem. In: ICPR 2018. IEEE Computer Society, pp. 934–939 (2018). https://doi.org/10.1109/ICPR.2018.8546055

  8. Blumenthal, D.B., Gamper, J.: Correcting and speeding-up bounds for non-uniform graph edit distance. In: ICDE 2017. IEEE Computer Society, pp. 131–134 (2017). https://doi.org/10.1109/ICDE.2017.57

  9. Blumenthal, D.B., Gamper, J.: Improved lower bounds for graph edit distance. IEEE Trans. Knowl. Data Eng. 30(3), 503–516 (2018). https://doi.org/10.1109/TKDE.2017.2772243

  10. Blumenthal, D.B., Gamper, J.: On the exact computation of the graph edit distance. Pattern Recognit. Lett. (2018). https://doi.org/10.1016/j.patrec.2018.05.002

  11. Bonacich, P.: Power and centrality: a family of measures. Am. J. Sociol. 92(5), 1170–1182 (1987). https://doi.org/10.1086/228631

  12. Boria, N., Blumenthal, D.B., Bougleux, S., Brun, L.: Improved local search for graph edit distance (2019). Submitted. arXiv:1907.02929

  13. Boria, N., Bougleux, S., Brun, L.: Approximating GED using a stochastic generator and multistart IPFP. In: Bai, X., Hancock, E.R., Ho, T.K., Wilson, R.C., Biggio, B., Robles-Kelly, A. (eds.) S+SSPR 2018. Springer, Cham, pp. 460–469 (2018). https://doi.org/10.1007/978-3-319-97785-0_44

  14. Bougleux, S., Brun, L., Carletti, V., Foggia, P., Gaüzère, B., Vento, M.: Graph edit distance as a quadratic assignment problem. Pattern Recognit. Lett. 87, 38–46 (2017). https://doi.org/10.1016/j.patrec.2016.10.001

  15. Bougleux, S., Gaüzère, B., Blumenthal, D.B., Brun, L.: Fast linear sum assignment with error-correction and no cost constraints. Pattern Recognit. Lett. (2018). https://doi.org/10.1016/j.patrec.2018.03.032

  16. Bougleux, S., Gaüzère, B., Brun, L.: Graph edit distance as a quadratic program. In: ICPR 2016. IEEE Computer Society, pp. 1701–1706 (2016). https://doi.org/10.1109/ICPR.2016.7899881

  17. Bougleux, S., Gaüzère, B., Brun, L.: A Hungarian algorithm for error-correcting graph matching. In: Foggia, P., Liu, C., Vento, M. (eds.) GbRPR 2017, LNCS, vol. 10310. Springer, Cham, pp. 118–127 (2017). https://doi.org/10.1007/978-3-319-58961-9_11

  18. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. 30(1–7), 107–117 (1998). https://doi.org/10.1016/S0169-7552(98)00110-X

  19. Brun, L., Foggia, P., Vento, M.: Trends in graph-based representations for pattern recognition. Pattern Recognit. Lett. (2018). https://doi.org/10.1016/j.patrec.2018.03.016

  20. Bunke, H., Allermann, G.: Inexact graph matching for structural pattern recognition. Pattern Recognit. Lett. 1(4), 245–253 (1983). https://doi.org/10.1016/0167-8655(83)90033-8

  21. Carletti, V., Gaüzère, B., Brun, L., Vento, M.: Approximate graph edit distance computation combining bipartite matching and exact neighborhood substructure distance. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 188–197 (2015). https://doi.org/10.1007/978-3-319-18224-7_19

  22. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27 (2011). https://doi.org/10.1145/1961189.1961199

  23. Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching in pattern recognition. Int. J. Pattern Recognit. Artif. Intell. 18(3), 265–298 (2004). https://doi.org/10.1142/S0218001404003228

  24. Cordella, L.P., Foggia, P., Sansone, C., Vento, M.: A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26(10), 1367–1372 (2004). https://doi.org/10.1109/TPAMI.2004.75

  25. Cortés, X., Serratosa, F., Moreno-García, C.F.: On the influence of node centralities on graph edit distance for graph classification. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 231–241 (2015). https://doi.org/10.1007/978-3-319-18224-7_23

  26. Daller, É., Bougleux, S., Gaüzère, B., Brun, L.: Approximate graph edit distance by several local searches in parallel. In: Fred, A., di Baja, G.S., Marsico, M.D. (eds.) ICPRAM 2018. SciTePress, pp. 149–158 (2018). https://doi.org/10.5220/0006599901490158

  27. Ferrer, M., Serratosa, F., Riesen, K.: A first step towards exact graph edit distance using bipartite graph matching. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 77–86 (2015). https://doi.org/10.1007/978-3-319-18224-7_8

  28. Fischer, A., Suen, C.Y., Frinken, V., Riesen, K., Bunke, H.: Approximation of graph edit distance based on Hausdorff matching. Pattern Recognit. 48(2), 331–343 (2015). https://doi.org/10.1016/j.patcog.2014.07.015

  29. Foggia, P., Percannella, G., Vento, M.: Graph matching and learning in pattern recognition in the last 10 years. Int. J. Pattern Recognit. Artif. Intell. 28(1), 1450001:1–1450001:40 (2014). https://doi.org/10.1142/S0218001414500013

  30. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Nav. Res. Logist. Q. 3(1–2), 95–110 (1956). https://doi.org/10.1002/nav.3800030109

  31. Gao, X., Xiao, B., Tao, D., Li, X.: A survey of graph edit distance. Pattern Anal. Appl. 13(1), 113–129 (2010). https://doi.org/10.1007/s10044-008-0141-y

  32. Gaüzère, B., Bougleux, S., Riesen, K., Brun, L.: Approximate graph edit distance guided by bipartite matching of bags of walks. In: Fränti, P., Brown, G., Loog, M., Escolano, F., Pelillo, M. (eds.) S+SSPR 2014, LNCS, vol. 8621. Springer, Cham, pp. 73–82 (2014). https://doi.org/10.1007/978-3-662-44415-3_8

  33. Guennebaud, G., Jacob, B., et al.: Eigen v3 (2010). http://eigen.tuxfamily.org. Accessed 5 July 2019

  34. Gurobi Optimization LLC: Gurobi Optimizer Reference Manual. http://www.gurobi.com. Accessed 5 July 2019

  35. Henry, E.R.: Classification and Uses of Finger Prints. Routledge, London (1900)

  36. Justice, D., Hero, A.: A binary linear programming formulation of the graph edit distance. IEEE Trans. Pattern Anal. Mach. Intell. 28(8), 1200–1214 (2006). https://doi.org/10.1109/TPAMI.2006.152

  37. Karmarkar, N.: A new polynomial-time algorithm for linear programming. Combinatorica 4(4), 373–396 (1984). https://doi.org/10.1007/BF02579150

  38. Kuhn, H.W.: The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 2(1–2), 83–97 (1955). https://doi.org/10.1002/nav.3800020109

  39. Le Digabel, S.: Algorithm 909: NOMAD: nonlinear optimization with the MADS algorithm. ACM Trans. Math. Softw. 37(4), 44:1–44:15 (2011). https://doi.org/10.1145/1916461.1916468

  40. Le Gall, F.: Powers of tensors and fast matrix multiplication. In: Nabeshima, K., Nagasaka, K., Winkler, F., Szántó, Á. (eds.) ISSAC 2014. ACM, pp. 296–303 (2014). https://doi.org/10.1145/2608628.2608664

  41. Lee, L., Lumsdaine, A., Siek, J.: The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley Longman, Boston (2002)

  42. Leordeanu, M., Hebert, M., Sukthankar, R.: An integer projected fixed point method for graph matching and MAP inference. In: Bengio, Y., Schuurmans, D., Lafferty, J.D., Williams, C.K.I., Culotta, A. (eds.) NIPS 2009. Curran Associates, pp. 1114–1122 (2009)

  43. Lerouge, J., Abu-Aisheh, Z., Raveaux, R., Héroux, P., Adam, S.: Exact graph edit distance computation using a binary linear program. In: Robles-Kelly, A., Loog, M., Biggio, B., Escolano, F., Wilson, R. (eds.) S+SSPR 2016, LNCS, vol. 10029. Springer, Cham, pp. 485–495 (2016). https://doi.org/10.1007/978-3-319-49055-7_43

  44. Lerouge, J., Abu-Aisheh, Z., Raveaux, R., Héroux, P., Adam, S.: New binary linear programming formulation to compute the graph edit distance. Pattern Recognit. 72, 254–265 (2017). https://doi.org/10.1016/j.patcog.2017.07.029

  45. Lin, C.L.: Hardness of approximating graph transformation problem. In: Du, D.Z., Zhang, X.S. (eds.) Algorithms and Computation, LNCS, vol. 834. Springer, Berlin, pp. 74–82 (1994). https://doi.org/10.1007/3-540-58325-4_168

  46. Munkres, J.: Algorithms for the assignment and transportation problems. SIAM J. Appl. Math. 5(1), 32–38 (1957). https://doi.org/10.1137/0105003

  47. Nissen, S.: Implementation of a Fast Artificial Neural Network Library (FANN). Technical report, Department of Computer Science, University of Copenhagen (2003). http://fann.sourceforge.net/report/

  48. Ozdemir, E., Gunduz-Demir, C.: A hybrid classification model for digital pathology using structural and statistical pattern recognition. IEEE Trans. Med. Imaging 32(2), 474–483 (2013). https://doi.org/10.1109/TMI.2012.2230186

  49. Riesen, K.: Structural Pattern Recognition with Graph Edit Distance. Advances in Computer Vision and Pattern Recognition. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-27252-8

  50. Riesen, K., Bunke, H.: IAM graph database repository for graph based pattern recognition and machine learning. In: da Vitoria Lobo, N., Kasparis, T., Roli, F., Kwok, J.T., Georgiopoulos, M., Anagnostopoulos, G.C., Loog, M. (eds.) S+SSPR 2008, LNCS, vol. 5342. Springer, Berlin, pp. 287–297 (2008). https://doi.org/10.1007/978-3-540-89689-0_33

  51. Riesen, K., Bunke, H.: Approximate graph edit distance computation by means of bipartite graph matching. Image Vis. Comput. 27(7), 950–959 (2009). https://doi.org/10.1016/j.imavis.2008.04.004

  52. Riesen, K., Bunke, H.: Graph Classification and Clustering Based on Vector Space Embedding. Series in Machine Perception and Artificial Intelligence, vol. 77. World Scientific, Singapore (2010). https://doi.org/10.1142/7731

  53. Riesen, K., Bunke, H., Fischer, A.: Improving graph edit distance approximation by centrality measures. In: ICPR 2014. IEEE Computer Society, pp. 3910–3914 (2014). https://doi.org/10.1109/ICPR.2014.671

  54. Riesen, K., Ferrer, M.: Predicting the correctness of node assignments in bipartite graph matching. Pattern Recognit. Lett. 69, 8–14 (2016). https://doi.org/10.1016/j.patrec.2015.10.007

  55. Riesen, K., Ferrer, M., Fischer, A., Bunke, H.: Approximation of graph edit distance in quadratic time. In: Liu, C., Luo, B., Kropatsch, W.G., Cheng, J. (eds.) GbRPR 2015, LNCS, vol. 9069. Springer, Cham, pp. 3–12 (2015). https://doi.org/10.1007/978-3-319-18224-7_1

  56. Riesen, K., Fischer, A., Bunke, H.: Combining bipartite graph matching and beam search for graph edit distance approximation. In: Gayar, N.E., Schwenker, F., Suen, C. (eds.) ANNPR 2014, LNCS, vol. 8774. Springer, Cham, pp. 117–128 (2014). https://doi.org/10.1007/978-3-319-11656-3_11

  57. Riesen, K., Fischer, A., Bunke, H.: Computing upper and lower bounds of graph edit distance in cubic time. In: Gayar, N.E., Schwenker, F., Suen, C. (eds.) ANNPR 2014, LNCS, vol. 8774. Springer, Heidelberg, pp. 129–140 (2014). https://doi.org/10.1007/978-3-319-11656-3

  58. Riesen, K., Fischer, A., Bunke, H.: Improved graph edit distance approximation with simulated annealing. In: Foggia, P., Liu, C., Vento, M. (eds.) GbRPR 2017, LNCS, vol. 10310. Springer, Cham, pp. 222–231 (2017). https://doi.org/10.1007/978-3-319-58961-9_20

  59. Sanfeliu, A., Fu, K.S.: A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cybern. 13(3), 353–362 (1983). https://doi.org/10.1109/TSMC.1983.6313167

  60. Schomburg, I., Chang, A., Ebeling, C., Gremse, M., Heldt, C., Huhn, G., Schomburg, D.: BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res. 32(Database–Issue), 431–433 (2004). https://doi.org/10.1093/nar/gkh081

  61. Stauffer, M., Fischer, A., Riesen, K.: A novel graph database for handwritten word images. In: Robles-Kelly, A., Loog, M., Biggio, B., Escolano, F., Wilson, R. (eds.) S+SSPR 2016, LNCS, vol. 10029. Springer, Cham, pp. 553–563 (2016). https://doi.org/10.1007/978-3-319-49055-7_49

  62. Stauffer, M., Tschachtli, T., Fischer, A., Riesen, K.: A survey on applications of bipartite graph edit distance. In: Foggia, P., Liu, C., Vento, M. (eds.) GbRPR 2017, LNCS, vol. 10310. Springer, Cham, pp. 242–252 (2017). https://doi.org/10.1007/978-3-319-58961-9_22

  63. Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13(4), 354–356 (1969). https://doi.org/10.1007/BF02165411

  64. Uno, T.: Algorithms for enumerating all perfect, maximum and maximal matchings in bipartite graphs. In: Leong, H.W., Imai, H., Jain, S. (eds.) ISAAC 1997, LNCS, vol. 1350. Springer, Berlin, pp. 92–101 (1997). https://doi.org/10.1007/3-540-63890-3_11

  65. Uno, T.: A fast algorithm for enumerating bipartite perfect matchings. In: Eades, P., Takaoka, T. (eds.) ISAAC 2001, LNCS, vol. 2223. Springer, Berlin, pp. 367–379 (2001). https://doi.org/10.1007/3-540-45678-3_32

  66. Vento, M.: A long trip in the charming world of graphs for pattern recognition. Pattern Recognit. 48(2), 291–301 (2015). https://doi.org/10.1016/j.patcog.2014.01.002

  67. Wang, X., Ding, X., Tung, A.K.H., Ying, S., Jin, H.: An efficient graph indexing method. In: Kementsietsidis, A., Salles, M.A.V. (eds.) ICDE 2012. IEEE Computer Society, pp. 210–221 (2012). https://doi.org/10.1109/ICDE.2012.28

  68. Zeng, Z., Tung, A.K.H., Wang, J., Feng, J., Zhou, L.: Comparing stars: on approximating graph edit distance. PVLDB 2(1), 25–36 (2009). https://doi.org/10.14778/1687627.1687631

  69. Zhao, X., Xiao, C., Lin, X., Zhang, W., Wang, Y.: Efficient structure similarity searches: a partition-based approach. VLDB J. 27(1), 53–78 (2018). https://doi.org/10.1007/s00778-017-0487-0

  70. Zheng, W., Zou, L., Lian, X., Wang, D., Zhao, D.: Graph similarity search with edit distance constraint in large graph databases. In: He, Q., Iyengar, A., Nejdl, W., Pei, J., Rastogi, R. (eds.) CIKM 2013. ACM, pp. 1595–1600 (2013). https://doi.org/10.1145/2505515.2505723

  71. Zheng, W., Zou, L., Lian, X., Wang, D., Zhao, D.: Efficient graph similarity search over large graph databases. IEEE Trans. Knowl. Data Eng. 27(4), 964–978 (2015). https://doi.org/10.1109/TKDE.2014.2349924

Author information

Corresponding author

Correspondence to David B. Blumenthal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Datasets and edit cost functions

  • The datasets aids and muta: Graphs contained in aids and muta represent molecular compounds. The molecules represented by the graphs contained in aids are divided into the class of molecules that do and the class of molecules that do not exhibit activity against HIV. Similarly, the molecules represented by the graphs contained in muta are divided into the class of molecules that do and the class of molecules that do not cause genetic mutation. The nodes of the graphs contained in aids and muta are labeled with chemical symbols, and their edges are labeled with a valence (either 1 or 2). Node edit costs are defined as \(c_V(\alpha ,\alpha ^\prime ):=5.5\cdot \delta _{\alpha \ne \alpha ^\prime }\), \(c_V(\alpha ,\varepsilon ):=2.75\), and \(c_V(\varepsilon ,\alpha ^\prime ):=2.75\), for all \((\alpha ,\alpha ^\prime )\in \varSigma _V\times \varSigma _V\). Edge edit costs are defined as \(c_E(\beta ,\beta ^\prime ):=1.65\cdot \delta _{\beta \ne \beta ^\prime }\), \(c_E(\beta ,\varepsilon ):=0.825\), and \(c_E(\varepsilon ,\beta ^\prime ):=0.825\), for all \((\beta ,\beta ^\prime )\in \varSigma _E\times \varSigma _E\).

  • The dataset protein: Graphs contained in protein represent proteins which are annotated with their EC classes (EC1, EC2, EC3, EC4, EC5, and EC6) [60]. Nodes are labeled with tuples \((t,s)\), where t is the node’s type (helix, sheet, or loop) and s is its amino acid sequence. Nodes are connected via structural or sequential edges or both, i.e., edges \((u_i,u_j)\) are labeled with tuples \((t_1,t_2)\), where \(t_1\) is the type of the first edge connecting \(u_i\) and \(u_j\) and \(t_2\) is the type of the second edge connecting \(u_i\) and \(u_j\) (possibly null). Node edit costs are defined as \(c_V(\alpha ,\alpha ^\prime ):=16.5\cdot \delta _{\alpha .t\ne \alpha ^\prime .t}+0.75\cdot \delta _{\alpha .t=\alpha ^\prime .t}\cdot \mathrm{LD}(\alpha .s,\alpha ^\prime .s)\), \(c_V(\alpha ,\varepsilon ):=8.25\), and \(c_V(\varepsilon ,\alpha ^\prime ):=8.25\), for all \((\alpha ,\alpha ^\prime )\in \varSigma _V\times \varSigma _V\), where \(\mathrm{LD}(\cdot ,\cdot )\) is Levenshtein’s string edit distance. Edge edit costs are defined as \(c_E(\beta ,\beta ^\prime ):=0.25\cdot \mathrm{LSAPE} ({\mathbf{C}} ^{\beta ,\beta ^\prime })\), \(c_E(\beta ,\varepsilon ):=0.25\cdot f(\beta )\), and \(c_E(\varepsilon ,\beta ^\prime ):=0.25\cdot f(\beta ^\prime )\), for all \((\beta ,\beta ^\prime )\in \varSigma _E\times \varSigma _E\), where \(f(\beta ):=1+\delta _{\beta .t_2\ne {\texttt {null}}}\) and \({\mathbf{C}} ^{\beta ,\beta ^\prime }\in {\mathbb {R}} ^{(f(\beta )+1)\times (f(\beta ^\prime )+1)}\) is constructed as \(c^{\beta ,\beta ^\prime }_{r,s}:=2\cdot \delta _{\beta .t_r\ne \beta ^\prime .t_s}\) and \(c^{\beta ,\beta ^\prime }_{r,f(\beta ^\prime )+1}:=c^{\beta ,\beta ^\prime }_{f(\beta )+1,s}:=1\), for all \((r,s)\in [f(\beta )]\times [f(\beta ^\prime )]\).

  • The dataset letter (h): Graphs contained in letter (h) represent highly distorted drawings of the capital letters A, E, F, H, I, K, L, M, N, T, V, W, X, Y, and Z. Nodes are labeled with two-dimensional Euclidean coordinates. Edges are unlabeled. Node edit costs are defined as \(c_V(\alpha ,\alpha ^\prime ):=0.75\cdot \left||\alpha -\alpha ^\prime \right||\), \(c_V(\alpha ,\varepsilon ):=0.675\), and \(c_V(\varepsilon ,\alpha ^\prime ):=0.675\), for all \((\alpha ,\alpha ^\prime )\in \varSigma _V\times \varSigma _V\), where \(\left||\cdot \right||\) is the Euclidean norm. The edge edit costs \(c_E\) are defined as \(c_E(1,\varepsilon ):=c_E(\varepsilon ,1):=0.425\).

  • The dataset grec: Graphs contained in grec represent 22 different symbols from electronic and architectural drawings. Nodes are labeled with tuples \((t,x,y)\), where t equals one of four node types and \((x,y)\) is a two-dimensional Euclidean coordinate. Nodes are connected via line or arc edges or both, i.e., edges \((u_i,u_j)\) are labeled with tuples \((t_1,t_2)\), where \(t_1\) is the type of the first edge connecting \(u_i\) and \(u_j\) and \(t_2\) is the type of the second edge connecting \(u_i\) and \(u_j\) (possibly null). Node edit costs are defined as \(c_V(\alpha ,\alpha ^\prime ):=0.5\cdot \left||\alpha .(x,y)-\alpha ^\prime .(x,y)\right||\cdot \delta _{\alpha .t=\alpha ^\prime .t}+90\cdot \delta _{\alpha .t\ne \alpha ^\prime .t}\), \(c_V(\alpha ,\varepsilon ):=45\), and \(c_V(\varepsilon ,\alpha ^\prime ):=45\), for all \((\alpha ,\alpha ^\prime )\in \varSigma _V\times \varSigma _V\). Edge edit costs are defined as \(c_E(\beta ,\beta ^\prime ):=0.5\cdot \mathrm{LSAPE} ({\mathbf{C}} ^{\beta ,\beta ^\prime })\), \(c_E(\beta ,\varepsilon ):=0.5\cdot f(\beta )\), and \(c_E(\varepsilon ,\beta ^\prime ):=0.5\cdot f(\beta ^\prime )\), for all \((\beta ,\beta ^\prime )\in \varSigma _E\times \varSigma _E\), where \(f(\beta ):=1+\delta _{\beta .t_2\ne {\texttt {null}}}\) and \({\mathbf{C}} ^{\beta ,\beta ^\prime }\in {\mathbb {R}} ^{(f(\beta )+1)\times (f(\beta ^\prime )+1)}\) is constructed as \(c^{\beta ,\beta ^\prime }_{r,s}:=30\cdot \delta _{\beta .t_r\ne \beta ^\prime .t_s}\) and \(c^{\beta ,\beta ^\prime }_{r,f(\beta ^\prime )+1}:=c^{\beta ,\beta ^\prime }_{f(\beta )+1,s}:=15\), for all \((r,s)\in [f(\beta )]\times [f(\beta ^\prime )]\).

  • The dataset fp: Graphs contained in fp represent fingerprint images which are annotated with their classes (arch, left loop, right loop, and whorl) from the Galton-Henry classification system [35]. Nodes are unlabeled and edges are labeled with an orientation \(\beta \in {\mathbb {R}} \) with \(-\pi /2<\beta \le \pi /2\). Node edit costs are defined as \(c_V(1,\varepsilon ):=c_V(\varepsilon ,1):=0.525\). Edge edit costs are defined as \(c_E(\beta ,\beta ^\prime ):=0.5\cdot \min \{|\beta -\beta ^\prime |,\pi -|\beta -\beta ^\prime |\}\), \(c_E(\beta ,\varepsilon ):=0.375\), and \(c_E(\varepsilon ,\beta ^\prime ):=0.375\), for all \((\beta ,\beta ^\prime )\in \varSigma _E\times \varSigma _E\).
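The edit cost definitions above can be illustrated with a minimal Python sketch. It is not taken from the paper or from GEDLIB: all function names are hypothetical, edge labels are assumed to be encoded as tuples `(t1, t2)` with `None` standing for null, and LSAPE is solved by brute-force enumeration, which is only reasonable here because the matrices \({\mathbf{C}} ^{\beta ,\beta ^\prime }\) have at most \(3\times 3\) entries.

```python
import math
from itertools import product

def aids_node_sub_cost(a, b):
    """aids/muta: substituting chemical symbol a by b costs 5.5 if they differ."""
    return 5.5 if a != b else 0.0

def letter_node_sub_cost(p, q):
    """letter (h): 0.75 times the Euclidean distance between the coordinates."""
    return 0.75 * math.dist(p, q)

def fp_edge_sub_cost(beta, beta_prime):
    """fp: half of the angular difference between two orientations."""
    diff = abs(beta - beta_prime)
    return 0.5 * min(diff, math.pi - diff)

def lsape_brute_force(C):
    """Optimal cost of the LSAPE instance C by enumeration (tiny matrices only).

    C has shape (n+1) x (m+1); the last column holds row deletion costs and
    the last row holds column insertion costs.
    """
    n, m = len(C) - 1, len(C[0]) - 1
    best = math.inf
    # Each row r is mapped to a column s < m (substitution) or to m (deletion).
    for assign in product(range(m + 1), repeat=n):
        used = [s for s in assign if s < m]
        if len(used) != len(set(used)):
            continue  # two rows mapped to the same column: not a valid solution
        cost = sum(C[r][s] for r, s in enumerate(assign))
        # Columns that received no row are inserted.
        cost += sum(C[n][s] for s in range(m) if s not in used)
        best = min(best, cost)
    return best

def protein_edge_sub_cost(beta, beta_prime):
    """protein: 0.25 * LSAPE(C), with C built from the edge type tuples."""
    ts = [t for t in beta if t is not None]          # f(beta) entries
    ts_p = [t for t in beta_prime if t is not None]  # f(beta') entries
    C = [[2.0 * (t != tp) for tp in ts_p] + [1.0] for t in ts]
    C.append([1.0] * len(ts_p) + [0.0])
    return 0.25 * lsape_brute_force(C)

# A structural edge versus a structural-plus-sequential edge: the structural
# types are matched at cost 0 and the sequential type is inserted at cost 1.
print(protein_edge_sub_cost(("structural", None), ("structural", "sequential")))  # 0.25
```

The grec edge costs follow the same pattern with the constants 30, 15, and 0.5 in place of 2, 1, and 0.25.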

Visualization of experiments via dominance graphs

See Figures 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, and 28.

Figures 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, and 28 visualize the transitive reductions of the dominance graphs induced by \(\succ _{\textit{LB}} \) (Figs. 17, 18, 19, 20, 21, 22) and \(\succ _{\textit{UB}} \) (Figs. 23, 24, 25, 26, 27, 28) and hence provide more detailed views on the results of the experiments reported in Sects. 9.5 and 9.6. In the dominance graphs, instantiations of LSAPE-GED are displayed black on white, instantiations of LP-GED are displayed black on light gray, instantiations of LS-GED are displayed white on dark gray, and miscellaneous heuristics are displayed white on black. For all algorithms instantiating LSAPE-GED, we display the configuration \((K,\gamma )\) of the extensions MULTI-SOL and CENTRALITIES in addition to the name of the heuristic. Similarly, for all algorithms instantiating LS-GED, we display the configuration \((K,\rho ,L,\eta )\) of the extensions MULTI-START and RANDPOST. Recall that instantiations of LSAPE-GED are run without extensions just in case \((K,\gamma )=(1,0)\) and that instantiations of LS-GED are run without extensions just in case \((K,\rho ,L,\eta )=(1,1,0,0)\) (cf. Sect. 9.2 for more details). For Pareto optimal algorithms, we also show the test metrics \(d_{{\textit{LB}} |{\textit{UB}}}\), t, and \(c_{{\textit{LB}} |{\textit{UB}}}\), and the joint score \(s_{{\textit{LB}} |{\textit{UB}}}\).

As the extensions MULTI-SOL and CENTRALITIES of LSAPE-GED improve the computed upper bounds at the price of increased runtimes but have no effect on the obtained lower bounds, for all instantiations of LSAPE-GED, we only show the baseline configurations \((K,\gamma )=(1,0)\) in the dominance graphs induced by \(\succ _{\textit{LB}} \). In the dominance graphs induced by \(\succ _{\textit{UB}} \), for each heuristic H, we only display those configurations that are Pareto optimal (i.e., maximal w.r.t. \(\succ _{\textit{UB}} \)) or have a maximal joint score \(s_{\textit{UB}} \) among all tested configurations of H.

In the transitive reduction of the dominance graphs induced by \(\succ _{\textit{LB}} \), we draw an arc from \({{\texttt {ALG}}} _1\) to \({{\texttt {ALG}}} _2\) just in case \({{\texttt {ALG}}} _1\succ _{\textit{LB}} {{\texttt {ALG}}} _2\) and there is no algorithm \({{\texttt {ALG}}} _3\) such that \({{\texttt {ALG}}} _1\succ _{\textit{LB}} {{\texttt {ALG}}} _3\succ _{\textit{LB}} {{\texttt {ALG}}} _2\). Arcs are blue if, additionally, \({{\texttt {ALG}}} _1\) yielded a tighter lower bound than \({{\texttt {ALG}}} _2\), red if \({{\texttt {ALG}}} _1\) was faster than \({{\texttt {ALG}}} _2\), and green if \({{\texttt {ALG}}} _1\) had a better classification coefficient than \({{\texttt {ALG}}} _2\). Multicolored arcs indicate that several of these relations hold. The graphs are oriented from left to right, such that an algorithm is Pareto optimal just in case it appears in the leftmost layer. The colored labels \(d^\star _{{\textit{LB}}}\), \(t^\star _{{\textit{LB}}}\), and \(c^\star _{{\textit{LB}}}\) highlight those Pareto optimal algorithms that, respectively, yielded the tightest lower bound, exhibited the best runtime behavior among all heuristics that compute lower bounds, or gave the best lower bound classification coefficient. The dominance graphs induced by \(\succ _{\textit{UB}} \) are constructed analogously.
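The construction described above can be sketched in a few lines of Python for the upper bound case. The sketch rests on assumptions that this appendix does not state: \(\succ _{\textit{UB}} \) is taken to be ordinary Pareto dominance over the three test metrics (the precise definition is given in the main text), smaller values of \(d_{\textit{UB}} \) and t are treated as better, and larger values of \(c_{\textit{UB}} \) are treated as better. The class `Result`, the function names, and the three sample algorithms A, B, and C with their metric values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Result:
    name: str
    d: float  # average upper bound (assumed: smaller is tighter)
    t: float  # average runtime in seconds (smaller is faster)
    c: float  # classification coefficient (assumed: larger is better)

def dominates(a, b):
    """Assumed Pareto dominance: a is nowhere worse and somewhere better than b."""
    no_worse = a.d <= b.d and a.t <= b.t and a.c >= b.c
    better = a.d < b.d or a.t < b.t or a.c > b.c
    return no_worse and better

def arc_colors(a, b):
    """Colors of the arc a -> b, following the convention described in the text."""
    colors = []
    if a.d < b.d:
        colors.append("blue")   # tighter bound
    if a.t < b.t:
        colors.append("red")    # faster
    if a.c > b.c:
        colors.append("green")  # better classification coefficient
    return tuple(colors)

def reduced_arcs(results):
    """Arcs of the transitive reduction: keep a -> b only if no x lies in between."""
    arcs = []
    for a in results:
        for b in results:
            if not dominates(a, b):
                continue
            if any(dominates(a, x) and dominates(x, b) for x in results):
                continue
            arcs.append((a.name, b.name, arc_colors(a, b)))
    return arcs

def pareto_optimal(results):
    """Algorithms that no other algorithm dominates (the leftmost layer)."""
    return [a.name for a in results if not any(dominates(b, a) for b in results)]

# Hypothetical metric values, chosen only to illustrate the construction.
sample = [
    Result("A", d=3.0, t=0.01, c=0.2),
    Result("B", d=3.5, t=0.05, c=0.2),
    Result("C", d=4.0, t=0.10, c=0.1),
]
print(pareto_optimal(sample))  # ['A']
for arc in reduced_arcs(sample):
    print(arc)
```

In this sample, A dominates both B and C, but the arc from A to C is dropped because B lies between them. The lower bound case would only differ in the direction of the comparison on d.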

Example 3

Figure 16 exemplifies the visualizations of the dominance graphs induced by \(\succ _{\textit{LB}} \) and \(\succ _{\textit{UB}} \). It shows a snapshot of the dominance graph induced by \(\succ _{\textit{UB}} \) on the dataset fp shown in Fig. 24. We see that IPFP run with the configuration \((K,\rho ,L,\eta )=(40,1,0,0)\) was Pareto optimal on fp. The blue label \(d^\star _{{\textit{UB}}}\) indicates that IPFP (40, 1, 0, 0) computed the tightest average upper bound on fp; the green label \(c^\star _{{\textit{UB}}}\) tells us that it also yielded the best upper bound classification coefficient. Furthermore, we see that \(t(\texttt {IPFP} \,(40,1,0,0))=1.63\cdot 10^{-2}\hbox {s}\), \(d_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=3.08\), \(c_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=0.11\), and \(s_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=0.67\).

The blue–red arc from IPFP (40, 1, 0, 0) to IBP-BEAM (40, 0.5, 1, 0) tells us that, on fp, IPFP with \((K,\rho ,L,\eta )=(40,1,0,0)\) dominated IBP-BEAM with \((K,\rho ,L,\eta )=(40,0.5,1,0)\). More precisely, we have \(t(\texttt {IBP-BEAM} \,(40,0.5,1,0))>t(\texttt {IPFP} \,(40,1,0,0))=1.63\cdot 10^{-2}\hbox {s}\), \(d_{\textit{UB}} (\texttt {IBP-BEAM} \,(40,0.5,1,0))>d_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=3.08\), and \(c_{\textit{UB}} (\texttt {IBP-BEAM} \,(40,0.5,1,0))=c_{\textit{UB}} (\texttt {IPFP} \,(40,1,0,0))=0.11\). As IPFP and IBP-BEAM instantiate LS-GED, they are shown white on dark gray.

Cite this article

Blumenthal, D.B., Boria, N., Gamper, J. et al. Comparing heuristics for graph edit distance computation. The VLDB Journal 29, 419–458 (2020). https://doi.org/10.1007/s00778-019-00544-1

  • DOI: https://doi.org/10.1007/s00778-019-00544-1
