Cybernetics and Systems Analysis, Volume 55, Issue 6, pp 1039–1051

Fast Similarity Search for Graphs by Edit Distance

  • D. A. Rachkovskij
NEW MEANS OF CYBERNETICS, INFORMATICS, COMPUTER ENGINEERING, AND SYSTEMS ANALYSIS

Abstract

This survey article considers index structures for fast similarity search for objects represented by trees and graphs. Edit distance is used as a measure of similarity. The execution of exact similarity search queries is considered. Algorithms based on the filter-and-refine strategy using inverted indexing are mainly presented. Algorithms for exact calculation of the graph edit distance and its lower and upper bounds are also considered.
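
The filter-and-refine strategy with an inverted index can be illustrated with a minimal sketch. It is not taken from the article: it assumes unit edit costs, undirected unlabeled edges, and a simple label-count lower bound as the filter, and it refines candidates with a brute-force exact distance that is only practical for very small graphs. All names (`build_index`, `lower_bounds`, `exact_ged`, `range_search`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Assumed representation: a graph is (labels, edges), where labels maps a
# vertex id to its label and edges is a set of frozenset({u, v}).
# Vertex ids must not be None (None is used as padding below).

def build_index(database):
    """Inverted index: label -> {graph id: number of vertices with that label}."""
    index = defaultdict(dict)
    for gid, (labels, _edges) in database.items():
        for label, count in Counter(labels.values()).items():
            index[label][gid] = count
    return index

def lower_bounds(query, database, index):
    """Label-count lower bound on the unit-cost edit distance to each graph."""
    q_labels, q_edges = query
    overlap = Counter()  # size of the label multiset intersection per graph
    for label, q_count in Counter(q_labels.values()).items():
        for gid, g_count in index.get(label, {}).items():
            overlap[gid] += min(q_count, g_count)
    bounds = {}
    for gid, (labels, edges) in database.items():
        vertex_part = max(len(q_labels), len(labels)) - overlap[gid]
        edge_part = abs(len(q_edges) - len(edges))
        bounds[gid] = vertex_part + edge_part
    return bounds

def exact_ged(g1, g2):
    """Brute-force unit-cost edit distance over all vertex mappings."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    n = max(len(labels1), len(labels2))
    v1 = list(labels1) + [None] * (n - len(labels1))  # None = padding vertex
    v2 = list(labels2) + [None] * (n - len(labels2))
    best = None
    for perm in permutations(v2):
        cost = 0
        mapping = {}
        for a, b in zip(v1, perm):
            if a is None or b is None:
                cost += 1  # vertex insertion or deletion
            elif labels1[a] != labels2[b]:
                cost += 1  # vertex relabeling
            if a is not None:
                mapping[a] = b
        mapped = set()
        for edge in edges1:
            u, v = tuple(edge)
            if mapping[u] is not None and mapping[v] is not None:
                mapped.add(frozenset({mapping[u], mapping[v]}))
        kept = len(mapped & edges2)
        cost += (len(edges1) - kept) + (len(edges2) - kept)  # edge edits
        if best is None or cost < best:
            best = cost
    return best

def range_search(query, database, index, tau):
    """Return (candidates after filtering, graphs within distance tau)."""
    bounds = lower_bounds(query, database, index)
    candidates = [gid for gid, lb in bounds.items() if lb <= tau]  # filter
    results = [gid for gid in candidates
               if exact_ged(query, database[gid]) <= tau]          # refine
    return candidates, results

def edge(u, v):
    return frozenset({u, v})

database = {
    "path": ({1: "C", 2: "C", 3: "O"}, {edge(1, 2), edge(2, 3)}),
    "tri": ({1: "C", 2: "C", 3: "O"}, {edge(1, 2), edge(2, 3), edge(1, 3)}),
    "other": ({1: "N", 2: "N", 3: "N", 4: "N"},
              {edge(1, 2), edge(2, 3), edge(3, 4)}),
}
query = ({1: "C", 2: "C", 3: "N"}, {edge(1, 2), edge(2, 3)})
index = build_index(database)
candidates, results = range_search(query, database, index, tau=1)
```

Because the filter is a true lower bound under these assumptions, discarding a graph whose bound exceeds the threshold never loses an answer, so the search stays exact while the expensive distance is computed only for the surviving candidates.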

Keywords

similarity search; graph edit distance; nearest neighbor; index structure; inverted indexing

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. International Research and Training Center for Information Technologies and Systems, NAS of Ukraine and MES of Ukraine, Kyiv, Ukraine
