The VLDB Journal, Volume 22, Issue 3, pp 345–368

High efficiency and quality: large graphs matching

  • Yuanyuan Zhu
  • Lu Qin
  • Jeffrey Xu Yu
  • Yiping Ke
  • Xuemin Lin
Regular Paper

Abstract

Graph matching plays an essential role in many real applications. In this paper, we study how to match two large graphs by maximizing the number of matched edges, a problem known as maximum common subgraph matching, which is NP-hard. Exact matching methods cannot handle a graph with more than 30 nodes, while approximate matching methods can produce matchings of very poor quality. We propose a novel two-step approach that can efficiently match two large graphs with thousands of nodes while achieving high matching quality. In the first step, we propose an anchor-selection/expansion approach to compute a good initial matching. In the second step, we propose a new approach to refine the initial matching. We show the optimality of our refinement and discuss how to randomly refine the matching with different combinations. We further show how to extend our solution to handle labeled graphs. We conducted extensive experiments on real and synthetic datasets and report our findings in this paper.
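To make the objective concrete, the following minimal sketch counts the matched edges under a given node mapping and, for very small graphs only, finds the best mapping by exhaustive search. It is an illustration of the problem statement, not the two-step method proposed in the paper. The function names `matched_edges` and `brute_force_best` and the edge-list representation are assumptions made for this example, and the graphs are assumed to be undirected and unlabeled.

```python
from itertools import permutations


def matched_edges(edges1, edges2, mapping):
    """Count edges (u, v) of graph 1 whose images under `mapping`
    form an edge of graph 2. Graphs are treated as undirected."""
    edge_set2 = {frozenset(e) for e in edges2}
    return sum(
        1
        for u, v in edges1
        if u in mapping
        and v in mapping
        and frozenset((mapping[u], mapping[v])) in edge_set2
    )


def brute_force_best(nodes1, edges1, nodes2, edges2):
    """Exhaustively try every injective mapping of nodes1 into nodes2.

    Assumes len(nodes1) <= len(nodes2). The number of candidate
    mappings grows factorially, so this is only usable on tiny inputs.
    """
    best_score, best_map = -1, None
    for image in permutations(nodes2, len(nodes1)):
        mapping = dict(zip(nodes1, image))
        score = matched_edges(edges1, edges2, mapping)
        if score > best_score:
            best_score, best_map = score, mapping
    return best_score, best_map


# Hypothetical toy input: a triangle matched against a 4-node path.
triangle_nodes = ["a", "b", "c"]
triangle_edges = [("a", "b"), ("b", "c"), ("c", "a")]
path_nodes = [1, 2, 3, 4]
path_edges = [(1, 2), (2, 3), (3, 4)]

score, mapping = brute_force_best(
    triangle_nodes, triangle_edges, path_nodes, path_edges
)
print(score, mapping)
```

The exhaustive search examines every injective mapping, which is why exact approaches stop scaling at small node counts and heuristic or refinement-based methods are needed for graphs with thousands of nodes.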

Keywords

  • Graph matching
  • Maximum common subgraph
  • Vertex cover

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Yuanyuan Zhu (1)
  • Lu Qin (1)
  • Jeffrey Xu Yu (1)
  • Yiping Ke (1)
  • Xuemin Lin (2, 3)

  1. The Chinese University of Hong Kong, Sha Tin, Hong Kong, China
  2. University of New South Wales, Sydney, Australia
  3. NICTA, Sydney, Australia
