Abstract
Graph matching plays an essential role in many real applications. In this paper, we study how to match two large graphs by maximizing the number of matched edges, a problem known as maximum common subgraph matching, which is NP-hard. Exact matching algorithms cannot handle a graph with more than 30 nodes, and approximate matching algorithms can produce matchings of very poor quality. We propose a novel two-step approach that can efficiently match two large graphs with thousands of nodes while achieving high matching quality. In the first step, we propose an anchor-selection/expansion approach to compute a good initial matching. In the second step, we propose a new approach to refine the initial matching. We show the optimality of our refinement and discuss how to randomly refine the matching with different combinations. We further show how to extend our solution to handle labeled graphs. We conducted extensive testing using real and synthetic datasets and report our findings in this paper.
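To make the objective concrete, the following minimal sketch counts the matched edges induced by a given node mapping. It is an illustration only, not the paper's algorithm: the function name `count_matched_edges`, the edge-list representation, and the assumption that both graphs are undirected and the mapping is one-to-one are all hypothetical choices made for this example.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def count_matched_edges(
    edges_g1: Iterable[Edge],
    edges_g2: Iterable[Edge],
    mapping: Dict[Hashable, Hashable],
) -> int:
    """Count edges (u, v) of G1 whose image (mapping[u], mapping[v]) is an edge of G2.

    Assumptions (not from the paper): graphs are undirected, and `mapping`
    is a one-to-one partial map from nodes of G1 to nodes of G2.
    """
    # Store G2 edges as unordered pairs so that (a, b) and (b, a) compare equal.
    g2_edges = {frozenset(e) for e in edges_g2}
    matched = 0
    for u, v in edges_g1:
        # Edges with an unmapped endpoint cannot be matched.
        if u in mapping and v in mapping:
            if frozenset((mapping[u], mapping[v])) in g2_edges:
                matched += 1
    return matched


# G1 is a triangle, G2 is a path on three nodes.
g1 = [("a", "b"), ("b", "c"), ("a", "c")]
g2 = [(1, 2), (2, 3)]
score = count_matched_edges(g1, g2, {"a": 1, "b": 2, "c": 3})
```

Under this mapping, the edges (a, b) and (b, c) land on edges of the path while (a, c) does not, so the score is 2. A matching algorithm for this problem searches over mappings to make this count as large as possible.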
Notes
The conference version of this work was reported in [38].
We cannot vary degrees for real datasets like PN.
References
Abu-Khzam, F.N., Samatova, N.F., Rizk, M.A., Langston, M.A.: The maximum common subgraph problem: faster solutions via vertex cover. In: AICCSA, pp. 367–373 (2007)
Almohamad, H.A., Duffuaa, S.O.: A linear programming approach for the weighted graph matching problem. IEEE Trans. Pattern Anal. Mach. Intell. 15(5), 522–525 (1993)
Arora, S., Safra, S.: Approximating clique is NP-complete. In: Proceedings of the 33rd IEEE Symposium on Foundations of Computer Science, pp. 2–13 (1992)
Bai, X., Yu, H., Hancock, E.: Graph matching using spectral embedding and alignment. In: Proceedings of International Conference on Pattern Recognition, pp. 398–401 (2004)
Barabási, A., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509 (1999)
Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003)
Bernard, M., Richard, N., Paquereau, J.: Functional brain imaging by eeg graph-matching. In: 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’05), pp. 5309–5312 (2005)
Blondel, V., Gajardo, A., Heymans, M., Senellart, P., Van Dooren, P.: A measure of similarity between graph vertices: Applications to synonym extraction and web searching. Siam Rev. 46(4), 647–666 (2004)
Bonchi, F., Esfandiar, P., Gleich, D.F., Greif, C., Lakshmanan, L.V.S.: Fast matrix computations for pair-wise and column-wise commute times and katz scores. CoRR abs/1104.3791 (2011)
Caelli, T., Kosinov, S.: An eigenspace projection clustering method for inexact graph matching. IEEE Trans. Pattern Anal. Mach. Intell. 26(4), 515–519 (2004)
Caelli, T., Kosinov, S.: Inexact graph matching using eigen-subspace projection clustering. Int. J. Pattern Recognit. Artif. Intell. 18(3), 329–354 (2004)
Chevalier, F., Domenger, J.P., Benois-Pineau, J., Delest, M.: Retrieval of objects in video by similarity based on graph matching. Pattern Recogn. Lett. 28(8), 939–949 (2007)
Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching in pattern recognition. IJPRAI 18(3), 265–298 (2004)
Foster, K.C., Muth, S.Q., Potterat, J.J., Rothenberg, R.B.: A faster katz status score algorithm. Comput. Math. Organ. Theory 7(4), 275–285 (2001)
Jouili, S., Tabbone, S.: Graph matching based on node signatures. In: GbRPR, pp. 154–163 (2009)
Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM 46(5), 604–632 (1999)
Knossow, D., Sharma, A., Mateus, D., Horaud, R.: Inexact matching of large and sparse graphs using laplacian eigenvectors. In: Proceedings of the 7th IAPR-TC-15 International Workshop on Graph-Based Representations in Pattern Recognition, p. 153. Springer (2009)
Koch, I.: Enumerating all connected maximal common subgraphs in two graphs. Theor. Comput. Sci. 250(1–2), 1–30 (2001)
Krissinel, E., Henrick, K.: Common subgraph isomorphism detection by backtracking search. Softw. Practice Experience 34(6), 591–607 (2004)
Lee, W., Duin, R.: An inexact graph comparison approach in joint eigenspace. In: Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, p. 44. Springer (2008)
McGregor, J.: Backtrack search algorithms and the maximal common subgraph problem. Softw. Practice Experience 12(1), 23–34 (1982)
Melnik, S., Garcia-Molina, H., Rahm, E.: Similarity flooding: a versatile graph matching algorithm and its application to schema matching. In: ICDE, pp. 117–128 (2002)
Newman, M.E.J.: Power laws, pareto distributions and zipf’s law. Contemp. Phys. 46, 323–351 (2005)
Ogata, H., Fujibuchi, W., Goto, S., Kanehisa, M.: A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters. Nucleic Acids Res. 28(20), 4021–4028 (2000)
Qiu, H., Hancock, E.: Graph matching and clustering using spectral partitions. Pattern Recognit. 39(1), 22–34 (2006)
Raymond, J., Gardiner, E., Willett, P.: Rascal: Calculation of graph similarity using maximum common edge subgraphs. Comput. J. 45(6), 631 (2002)
Riesen, K., Jiang, X., Bunke, H.: Exact and inexact graph matching: methodology and applications. In: Managing and Mining Graph Data (Chapter 7) (2010)
Singh, R., Xu, J., Berger, B.: Pairwise global alignment of protein interaction networks by matching neighborhood topology. In: Research in Computational Molecular Biology, pp. 16–31. Springer (2007)
Suters, W., Abu-Khzam, F., Zhang, Y., Symons, C., Samatova, N., Langston, M.: A new approach and faster exact methods for the maximum common subgraph problem. Comput. Comb. 717–727 (2005)
Tong, H., Faloutsos, C., Pan, J.-Y.: Random walk with restart: fast solutions and applications. Knowl. Inf. Syst. 14(3), 327–346 (2008)
Ullmann, J.: An algorithm for subgraph isomorphism. J. ACM (JACM) 23(1), 42 (1976)
Umeyama, S.: An eigendecomposition approach to weighted graph matching problems. IEEE Trans. Pattern Anal. Mach. Intell. 10(5), 695–703 (1988)
Watts, D., Strogatz, S.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 440–442 (1998)
Xiao, B., Hancock, E., Wilson, R.: A generative model for graph matching and embedding. Comput. Vis. Image Underst. 113(7), 777–789 (2009)
Xu, L., King, I.: A PCA approach for fast retrieval of structural patterns in attributed graphs. IEEE Trans. Syst. Man Cybern. B Cybern. 31(5), 812–817 (2001)
Zaslavskiy, M., Bach, F., Vert, J.: A path following algorithm for the graph matching problem. IEEE Trans. Pattern Anal. Mach. Intell. 31(12), 2227–2242 (2009)
Zaslavskiy, M., Bach, F., Vert, J.: Global alignment of protein-protein interaction networks by graph matching methods. Bioinformatics 25(12), i259 (2009)
Zhu, Y., Qin, L., Yu, J.X., Ke, Y., Lin, X.: High efficiency and quality: large graphs matching. In: CIKM (2011)
Acknowledgments
The work was supported by the Research Grants Council of the Hong Kong SAR, China (419109), ARC Discovery Grants (ARCDP0987557, ARCDP110102937, ARCDP120104168), and NSFC61021004.
About this article
Cite this article
Zhu, Y., Qin, L., Yu, J.X. et al. High efficiency and quality: large graphs matching. The VLDB Journal 22, 345–368 (2013). https://doi.org/10.1007/s00778-012-0292-8
DOI: https://doi.org/10.1007/s00778-012-0292-8