Fast Similarity Search for Graphs by Edit Distance

  • NEW MEANS OF CYBERNETICS, INFORMATICS, COMPUTER ENGINEERING, AND SYSTEMS ANALYSIS
Cybernetics and Systems Analysis

Abstract

This survey article considers index structures for fast similarity search for objects represented by trees and graphs. Edit distance is used as a measure of similarity. The execution of exact similarity search queries is considered. Algorithms based on the filter-and-refine strategy using inverted indexing are mainly presented. Algorithms for exact calculation of the graph edit distance and its lower and upper bounds are also considered.
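The filter-and-refine strategy mentioned above can be illustrated with a minimal sketch. It is not taken from the article: it assumes unit edit costs, undirected graphs with labeled vertices and unlabeled edges, and a linear scan in place of an inverted index. The names `label_lower_bound`, `exact_ged`, and `similarity_search` are hypothetical. The filter uses a simple label-multiset lower bound; the refine step computes the exact distance by brute force, which is only feasible for very small graphs.

```python
from collections import Counter
from itertools import permutations

# Assumed representation (illustrative only): a graph is (labels, edges),
# where labels is a list of vertex labels and edges is a set of
# frozenset({u, v}) pairs over vertex indices (undirected, unlabeled).

def label_lower_bound(g1, g2):
    """Cheap lower bound on the unit-cost graph edit distance.

    Vertex part: vertices that cannot be matched by label must be
    relabeled, inserted, or deleted. Edge part: the difference in edge
    counts must be covered by edge insertions or deletions.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    common = sum((Counter(labels1) & Counter(labels2)).values())
    vertex_part = max(len(labels1), len(labels2)) - common
    edge_part = abs(len(edges1) - len(edges2))
    return vertex_part + edge_part

def exact_ged(g1, g2):
    """Exact unit-cost graph edit distance by exhaustive vertex mapping.

    Exponential in the number of vertices; intended only for tiny graphs.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    n = max(len(labels1), len(labels2))
    # Pad the smaller graph with dummy vertices (label None); mapping a
    # real vertex to a dummy one models a vertex insertion or deletion.
    a = list(labels1) + [None] * (n - len(labels1))
    b = list(labels2) + [None] * (n - len(labels2))
    best = None
    for perm in permutations(range(n)):
        # Vertex cost: every mismatched label pair costs one operation.
        cost = sum(1 for i in range(n) if a[i] != b[perm[i]])
        # Edge cost: edges present on one side only after the mapping.
        mapped = {frozenset(perm[u] for u in e) for e in edges1}
        cost += len(mapped ^ edges2)
        if best is None or cost < best:
            best = cost
    return best

def similarity_search(database, query, tau):
    """Return indices of database graphs within edit distance tau of query."""
    # Filter: discard graphs whose lower bound already exceeds tau.
    candidates = [i for i, g in enumerate(database)
                  if label_lower_bound(g, query) <= tau]
    # Refine: verify the surviving candidates with the exact distance.
    return [i for i in candidates if exact_ged(database[i], query) <= tau]

path = {frozenset({0, 1}), frozenset({1, 2})}
query = (["C", "C", "O"], path)
database = [
    (["C", "C", "O"], path),                        # identical to the query
    (["C", "C", "N"], path),                        # one relabeling away
    (["N", "N", "N", "N"], set()),                  # pruned by the filter
    (["C", "C", "O"], path | {frozenset({0, 2})}),  # one extra edge
    (["C", "O", "C"], path),                        # passes filter, fails refine
]
print(similarity_search(database, query, tau=1))
```

Because the filter relies on a lower bound, it never discards a true answer, so the result remains exact; a tighter bound only reduces the number of candidates passed to the expensive verification step.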

Author information

Correspondence to D. A. Rachkovskij.

Additional information

Translated from Kibernetika i Sistemnyi Analiz, No. 6, November–December, 2019, pp. 178–194.

Cite this article

Rachkovskij, D.A. Fast Similarity Search for Graphs by Edit Distance. Cybern Syst Anal 55, 1039–1051 (2019). https://doi.org/10.1007/s10559-019-00213-9
