Graph matching beyond perfectly-overlapping Erdős–Rényi random graphs

Graph matching is a fruitful area in terms of both algorithms and theory. Given two graphs G_1 = (V_1, E_1) and G_2 = (V_2, E_2), where V_1 and V_2 are the same or largely overlap under an unknown permutation π*, graph matching seeks to recover the correct mapping π*. In this paper, we exploit the degree information, which was previously used only in matching noiseless graphs and perfectly-overlapping Erdős–Rényi random graphs.
We are concerned with graph matching of partially-overlapping graphs and stochastic block models, which are more useful in tackling real-life problems. We propose the edge exploited degree profile graph matching method and two refined variations. We conduct a thorough analysis of our proposed methods' performance in a range of challenging scenarios, including a coauthorship data set and a zebrafish neuron activity data set. Our methods are shown to be numerically superior to the state-of-the-art methods. The algorithms are implemented in the R (A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, 2020) package GMPro (GMPro: graph matching with degree profiles, 2020).


Introduction
Graph matching has been an active area of research for decades. The research on graph matching can be traced back to at least the 1970s (e.g. Ullmann, 1976), and has been variously referred to as "graph matching", "network alignment" and "graph isomorphism". In this paper, we do not distinguish these terms, nor the terms "graph" and "network", or "node" and "vertex". Mathematically, the graph matching problem can be loosely stated as follows. Given two graphs G_1 = (V_1, E_1) and G_2 = (V_2, E_2), it is assumed that V_1 and V_2 are the same or largely overlap under an unknown permutation π*. Graph matching is to seek the mapping π* between the vertex sets V_1 and V_2. A correct matching would help augment the connectivity information between the vertices, and hence improve the graph analysis. In recent years, due to the advancements in collecting, storing and processing large volumes of data, graph matching is going through a renaissance, with a surge of work in different application areas. For instance, Narayanan and Shmatikov (2009) aimed at acquiring information from an anonymous graph of Twitter with the graph of Flickr as auxiliary information; Kazemi et al. (2016) sought the alignment of protein–protein interaction networks in order to uncover the relationships between different species; Haghighi et al. (2005) constructed graphs based on text relationships and developed a system for deciding whether a given sentence can be inferred from a text by matching graphs.
Graph matching is an extremely fruitful research area. In the following, we review the existing literature from three different aspects, based on which we characterize the main interest of this paper.
In terms of methodology, broadly speaking, graph matching algorithms can be categorized into two schools: exact matching and inexact matching. Exact graph matching focuses on deterministic graphs. It seeks a perfect matching, which is NP-hard in most cases, with exceptions for some special graph structures, for instance planar graphs (e.g. Eppstein, 2002). When we move from deterministic graphs to random graphs, it is challenging and unnatural to seek a perfect matching. The inexact matching approaches are considered in this case. Existing methods designed for inexact matching include tree search types of methods (e.g. Sanfeliu and Fu, 1983), continuous optimization types of methods (e.g. Liu et al., 2012) and spectral-based convex relaxation types of methods. Due to the demand for computational feasibility when dealing with large-scale datasets, the spectral-based methods have been, arguably, the most popular. To be more specific, spectral-based methods include spectral matching (e.g. Leordeanu and Hebert, 2005), semidefinite programming approaches (e.g. Schellewald and Schnörr, 2005) and doubly-stochastic relaxation methods (e.g. Gold and Rangarajan, 1996). For comprehensive reviews, we refer to Conte et al. (2004), Foggia et al. (2014) and Yan et al. (2016).
In terms of the underlying models, despite the large number of algorithms proposed over the years, the majority of the efforts are on the Erdős–Rényi random graphs (Erdös and Rényi, 1959), which are fundamental yet idealized. Beyond the Erdős–Rényi random graphs, Patsolic et al. (2017) studied graph matching in a random dot product graph (Young and Scheinerman, 2007) framework. Li and Campbell (2016) are concerned with community matching in a multi-layer graph, where the matching resolution is at the community level, not at the individual level. The study in this area is usually complicated by misclustered vertices.
In terms of the proportion of overlapping vertices in the two graphs, some of the existing works consider situations where the two graphs have identical vertex sets, while others consider situations where the difference between the two vertex sets is nonempty. In the sequel, we will refer to these two situations as perfectly-overlapping and partially-overlapping. Work on the latter includes the following: Pedarsani and Grossglauser (2011) studied the privacy of anonymized networks; Kazemi et al. (2015) defined a cost function for structural mismatch under a particular alignment and established a threshold for perfect matchability; and Patsolic et al. (2017) provided a vector of probabilities of possible matchings.
We now specify the problem we are concerned with in this paper.
(1) We intend to exploit the degree information and extend the degree profile method, which shares a connection with the doubly-stochastic relaxation methods and which has been previously studied in Czajka and Pandurangan (2008) and Mossel and Ross (2017) for deterministic graphs and in Ding et al. (2018) for perfectly-overlapping Erdős–Rényi random graphs. (2) We consider network models with community structures, including stochastic block models, arguably the most popular network models for both theoretical and practical studies. (3) We tackle partially-overlapping graphs, i.e. the two graphs to be matched do not have identical vertex sets. Our contributions are listed below.
• We formally describe a partially-overlapping correlated Bernoulli networks model in Definition 2. Further, we explore the degree profile graph matching method for the newly defined partially-overlapping Erdős–Rényi random graphs and also for stochastic block model graphs. To the best of our knowledge, this is the first work to exploit a degree profile-type method in stochastic block model graph matching problems.
• We propose the edge exploited (EE) degree profile graph matching method. In addition, we propose refined EE algorithms, including pre-processing and post-processing steps. These proposed methods are demonstrated to outperform the state-of-the-art methods when the graphs are partially overlapping.
• The degree profile core of our methods enables us to conduct graph matching in the sparse regime, where the spectral-based methods usually fail.
The rest of this paper is organized as follows. In Section 2, we present our proposed methods. We begin by reviewing a state-of-the-art degree profile method, extend it to handle partially-overlapping scenarios, and finally tackle stochastic block model graph matching. Our proposed methods are supported by extensive numerical evidence on both simulated and real datasets in Sections 3-4. The paper is concluded with discussions in Section 5.

Methodology
In this section, we first state the partially-overlapping correlated Bernoulli models in Section 2.1 and introduce the degree profile graph matching method in Section 2.2. In Section 2.3, we propose the core edge exploited (EE) graph matching method, with its refinements in Section 2.4. Stochastic block model graph matching is tackled in Section 2.5.

Correlated Bernoulli networks
The degree profile method was pioneered in Czajka and Pandurangan (2008) and Mossel and Ross (2017) for graph matching of two identical graphs generated from Erdős–Rényi random graph models. This method was further studied in Ding et al. (2018) and extended to correlated Erdős–Rényi random graphs. The key to degree profile graph matching is to assign each vertex an empirical distribution of its neighbours' degrees, and to match vertices by measuring the distance between each pair of empirical distributions. We first set up the models in this section.
Definition 1 (Bernoulli networks G(n, Θ)). A network with vertex set {1, . . ., n} is a Bernoulli network G(n, Θ_{n×n}) if its associated adjacency matrix A ∈ R^{n×n} is defined by

A_{ij} = 1 if vertices i and j are connected by an edge, and A_{ij} = 0 otherwise,

where {A_{ij}, i < j} are independent Bernoulli random variables with E(A) = Θ.
Definition 1 includes the Erdős–Rényi random graphs, where all the off-diagonal entries of Θ are equal; stochastic block models, where Θ possesses a block structure; degree corrected block models, where degree heterogeneity is added; and random dot product graphs, where latent positions are assumed. Note that in Definition 1, it is assumed that the matrices A and Θ are symmetric with diagonal entries being zero. In fact, the definition, along with the methods proposed later in this paper, can be relaxed to more general cases. However, in this paper, we focus on Definition 1 and defer more general cases to Section 5.
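To make Definition 1 concrete, the following is a minimal sketch of sampling an adjacency matrix from G(n, Θ) (in Python rather than the paper's R package GMPro; the function name is ours, not from the package):

```python
import numpy as np

def sample_bernoulli_network(theta, rng=None):
    """Sample an adjacency matrix A from G(n, Theta) of Definition 1.

    theta: symmetric n x n matrix of edge probabilities (diagonal ignored).
    """
    rng = np.random.default_rng(rng)
    n = theta.shape[0]
    # Draw the upper triangle {A_ij, i < j} independently, then symmetrize.
    upper = np.triu(rng.random((n, n)) < theta, k=1)
    A = (upper | upper.T).astype(int)
    np.fill_diagonal(A, 0)
    return A

# An Erdos-Renyi graph is the special case with constant off-diagonals;
# a stochastic block model would instead use a block-constant theta.
n = 6
theta_er = np.full((n, n), 0.3)
A = sample_bernoulli_network(theta_er, rng=0)
```

A two-block stochastic block model is obtained by the same sampler with Θ block-constant, and a degree corrected block model by multiplying in vertex-specific degree parameters.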
Definition 2 (Partially-overlapping correlated Bernoulli networks). Let G be the adjacency matrix of a given graph and s, ρ ∈ [0, 1] be the overlapping and correlation parameters, respectively. Construct a matrix A by independently keeping each row (and the corresponding column) of G with probability s and removing it with probability 1 − s. Further, thin the surviving edges by letting A_{ij} = G_{ij} Z_{ij}, where the Z_{ij} ∼ Bernoulli(ρ) are independent, i < j. The graph with adjacency matrix A is called a child graph of G. Relabel the vertices of G according to a latent permutation π* and then repeat the sampling process independently to obtain another child graph B. A and B are partially-overlapping correlated Bernoulli networks.
We list a few cases to better understand Definition 2. When s = ρ = 1, A and B are exactly the same up to a permutation (isomorphic graphs, e.g. Scheinerman and Ullman, 2011). If we fix s = 1 only, and let G be a realization of G(n, Θ) in Definition 1, where Θ has constant off-diagonal entries, then both A and B have all n vertices. In this case, A is in fact an adjacency matrix of G(n, ρΘ), and B can be seen as another conditionally independent copy generated from the same parent G. Hence, this case coincides with the perfectly-overlapping correlated Erdős–Rényi random graphs, which have been studied extensively in the existing literature, including Lyzinski et al. (2014) and Ding et al. (2018), among others. When ρ = 1, both A and B are subgraphs of G.
Compared to the perfectly-overlapping correlated Erdős–Rényi random graphs, Definition 2 characterizes a more general model, but inherits the key features that (i) A and B have identical marginal distributions, and (ii) the corresponding entries of A and B are correlated with correlation ρ. In practice, the underlying G is usually unknown, but A and B can be obtained from different studies. For instance, one may obtain a fully-known Amazon users network and an anonymized eBay users network, while the underlying true network is unknown. Due to the anonymity, it is only reasonable to assume the users largely overlap in these two networks, but not perfectly.
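The sampling process of Definition 2 can be sketched as follows. This is an illustrative Python translation (function and variable names are ours, not from GMPro): vertex subsampling at rate s, followed by edge thinning at rate ρ, repeated independently for each child graph.

```python
import numpy as np

def child_graph(G, s, rho, rng=None):
    """Sample one child graph of the parent adjacency matrix G (Definition 2):
    keep each vertex independently with probability s, then keep each
    surviving edge independently with probability rho."""
    rng = np.random.default_rng(rng)
    n = G.shape[0]
    kept = np.flatnonzero(rng.random(n) < s)          # surviving vertices
    A = G[np.ix_(kept, kept)].copy()
    mask = np.triu(rng.random(A.shape) < rho, k=1)    # symmetric edge thinning
    A = A * (mask | mask.T)
    return A, kept

# Parent graph G, then two independent child graphs; B is drawn from a
# relabelled copy of G, mimicking the latent permutation pi*.
rng = np.random.default_rng(1)
n = 8
G = np.triu(rng.random((n, n)) < 0.5, k=1)
G = (G | G.T).astype(int)
A, kept_a = child_graph(G, s=0.9, rho=0.95, rng=2)
perm = rng.permutation(n)
B, kept_b = child_graph(G[np.ix_(perm, perm)], s=0.9, rho=0.95, rng=3)
```

Setting s = ρ = 1 makes both children copies of G up to the relabelling, recovering the isomorphic case discussed above.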
The goal of this paper is to match the vertices of A and B. Since an exact matching between A and B may not exist, we seek the best matching between the largest overlapping subgraphs of A and B. Mathematically, for networks A and B with n_A and n_B vertices, respectively, we seek a permutation π defined through

argmax_{Π ∈ Π_m} ⟨A, Π B Π^⊤⟩,

where m = max(n_A, n_B), the smaller adjacency matrix is padded with zero rows and columns to size m × m, Π_m ranges over all m × m permutation matrices, and ⟨·, ·⟩ denotes the matrix inner product.

Degree profile graph matching
Generally speaking, degree profile graph matching methods exploit the degree information of all the neighbours to construct an empirical distribution for each vertex, and then match the vertices by comparing the similarity between these empirical distributions. In this section, we first detail the definition of the degree profile and then explain the simplest form of the degree profile method in Algorithm 2.
Definition 3 (Degree profile). Let A ∈ R^{n×n} be an adjacency matrix. For any i ∈ {1, . . ., n}, let a_i = Σ_{j=1}^{n} A_{ij} and N_A(i) = {j : A_{ij} = 1} be the degree and the neighbourhood of i, respectively. Further, for each k ∈ N_A(i), denote by a_k^{(i)} the degree of i's neighbour k, that is, a_k^{(i)} = a_k. The degree profile of vertex i in A is defined to be the empirical distribution

μ_i(·) = a_i^{-1} Σ_{k ∈ N_A(i)} δ_{a_k^{(i)}}(·),

and is denoted as DP(A, i).
The degree profile defined in Definition 3 is the second term of the iterated degree sequence. A necessary and sufficient condition for fractional isomorphism is that the two graphs have identical iterated degree sequences. See Scheinerman and Ullman (2011) for more details.
In Ding et al. (2018), a similar definition is studied for the perfectly-overlapping Erdős–Rényi random graphs. There, the degree profile is a normalized empirical distribution of neighbours' degrees, excluding edges between neighbours when counting degrees and standardizing the degrees so that they are mean-zero, variance-one random variables. This normalization is for theoretical simplicity when dealing with the behaviour of the empirical distributions.
With the degree profiles of all vertices in A and B, our next step is to introduce a distance (or similarity) between each pair (i, j), i ∈ A and j ∈ B, and construct an n_A × n_B distance matrix W (or similarity matrix), where W_{ij} denotes the distance (or similarity) between the degree profiles of i and j. Intuitively, if i ∈ A and j ∈ B are a true pair, that is to say π*(i) = j, then the distance W_{ij} is small (or the similarity W_{ij} is large), and otherwise large (small). For each i ∈ A, hence, we seek the mapping

π(i) = argmin_{j ∈ B} W_{ij}.

The mapping π may not be a permutation, since multiple vertices in A might be mapped to the same vertex j in B. Hence, the final graph matching is output by applying a maximal bipartite matching algorithm to this mapping π. Every vertex in B is matched to at most one vertex in A, and there is no guarantee that all the vertices in A are matched to vertices in B. It is ensured that there exists no other bipartite matching which can match more vertices. In this paper, this is done by the R (R Core Team, 2020) package igraph (Csardi and Nepusz, 2006), which uses the push-relabel algorithm introduced in Cherkassky et al. (1998).
With the above analysis, we detail the, arguably, simplest degree profile algorithm, which involves a subroutine calculating the distance matrix in Algorithm 1 and implements the degree profile graph matching in Algorithm 2.
In Algorithm 1, the distance used is W1, the 1-Wasserstein distance (e.g. Villani, 2009). In fact, any distance or similarity measure can be used here. We choose W1 to demonstrate numerical results and it will be used throughout this paper. In the refined Algorithms 4 and 5 in Section 2.4, the similarity measure used is the number of common neighbours of i and j according to the prior information, which delivers satisfactory results.
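As an illustration of the degree-profile distance, the following Python sketch (ours, not the GMPro implementation; it uses the unnormalized degrees of Definition 3 rather than the standardized version of Ding et al., 2018) computes the degree profiles and the pairwise 1-Wasserstein distance matrix of Algorithm 1:

```python
import numpy as np

def w1_distance(x, y):
    """1-Wasserstein distance between the empirical distributions of x and y,
    computed as the integral of |F_x - F_y| over the real line."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    pts = np.union1d(x, y)                       # merged support points
    cdf_x = np.searchsorted(x, pts, side="right") / len(x)
    cdf_y = np.searchsorted(y, pts, side="right") / len(y)
    return float(np.sum(np.abs(cdf_x - cdf_y)[:-1] * np.diff(pts)))

def degree_profiles(A):
    """DP(A, i): the degrees of i's neighbours, for every vertex i."""
    deg = A.sum(axis=1)
    return [deg[A[i] == 1] for i in range(A.shape[0])]

def dp_distance_matrix(A, B):
    """W_ij = W1 distance between DP(A, i) and DP(B, j)."""
    pa, pb = degree_profiles(A), degree_profiles(B)
    W = np.full((len(pa), len(pb)), np.inf)
    for i, x in enumerate(pa):
        for j, y in enumerate(pb):
            if len(x) and len(y):                # isolated vertices carry no profile
                W[i, j] = w1_distance(x, y)
    return W
```

A full run of Algorithm 2 would then take π(i) = argmin_j W_ij and resolve collisions with a maximum bipartite matching, as described above.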
Algorithm 2 is similar to Algorithm 1 proposed in Ding et al. (2018). A difference is that Algorithm 2 seeks the pair with minimal distance for each vertex, while Ding et al. (2018) consider the n pairs with smallest distances among all possible combinations. The two approaches deliver the same result if the two networks are perfectly overlapping, yet Algorithm 2 can also provide possible matchings for vertices without counterparts. More discussion on this is available later.
The theoretical properties of degree profile graph matching have been extensively studied in Ding et al. (2018). The main advantage of degree profile graph matching over its competitors is the ability to conduct polynomial-time graph matching in a sparse regime, where the spectral-type methods would fail. The main challenge in deriving the theoretical properties of the output of Algorithm 2 is to carefully control the fact that the degree profiles are linear combinations of correlated random variables, and this is out of the scope of this paper.

Edge exploited methods for partially-overlapping graphs
The state-of-the-art degree profile graph matching methodology is restricted to the cases where a bijection exists between the vertex sets of the two graphs. This, however, is by no means realistic in many real-life problems. It is, therefore, of great interest to extend Algorithm 2 to handle partially-overlapping networks.
Algorithm 3 Edge exploited degree profile graph matching. EE(A, B, d)

There are two differences between Algorithms 2 and 3. First, in Algorithm 3, we introduce an additional parameter d, which is in fact taken to be 1 in Algorithm 2. In Algorithm 2, the matrix Z is an adjacency matrix of a bipartite graph, where each vertex in A is connected to one and only one vertex in B. In the edge exploited version, Algorithm 3, we allow for d edges for each vertex in A. A demonstration is depicted in Figure 1 with d = 3. Instead of matching each vertex in A to an individual vertex in B, we match A to a hypergraph built upon B with hyper-edges of size at most d = 3. Second, the final output of Algorithm 2 comes from a maximum bipartite graph matching algorithm, and it does not allow for matching one vertex to a collection of vertices. To overcome this, we omit the maximum bipartite graph matching step in Algorithm 3 and output the matching matrix Z directly. To see why it is necessary to consider matching a node with more than one node, we match two partially-overlapping graphs A and B from Definition 2, with Θ_{ij} = 0.1; Figure 2 exhibits the ranks of the true pairs' distances, and many true matchings are far from being the ones with the smallest distances. With the introduction of the parameter d, in terms of correctly matched pairs, Algorithm 3 of course improves substantially over Algorithm 2, which we will elaborate on in Section 3. The rationale is that, in reality, adopting Algorithm 3 narrows the matching down to a small set of candidates. If the requirements on accuracy are not at the individual level, then instead of matching each vertex to at most one vertex, one would pay the price of an increased matching size in order to find the correct matching. This is common in advertising, for instance. This also shares similarity with Fishkind et al. (2012), where the output is a probability distribution attached to each vertex representing the probability of potential matches. The output of Algorithm 3 can be regarded as a uniform distribution over d potential matches.
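The edge exploited step itself is simple. A sketch of the candidate selection in Algorithm 3 (ours, not the GMPro implementation; it assumes the distance matrix W from Algorithm 1 has already been computed) reads:

```python
import numpy as np

def edge_exploited_match(W, d):
    """Algorithm 3's matching matrix Z (sketch): connect each vertex i of A
    to the d vertices of B whose degree profiles are closest to i's."""
    n_a, n_b = W.shape
    Z = np.zeros((n_a, n_b), dtype=int)
    for i in range(n_a):
        Z[i, np.argsort(W[i])[:d]] = 1   # d nearest candidates for vertex i
    return Z
```

Setting d = 1 recovers the one-edge-per-vertex bipartite matrix used in Algorithm 2, before its maximum bipartite matching step.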

Refinement
The key components of the degree profile graph matching algorithms for Bernoulli networks are discussed in Algorithms 2 and 3. In practice, Algorithm 2 suffers from a small matching size and an unsatisfactory recovery rate, while Algorithm 3 can only provide a matching set for each vertex. It is natural to ask whether any additional steps can help refine the matching result. In this subsection, we discuss two refinement algorithms, focusing on preprocessing and post-processing, respectively.

Preprocessing
In practice, the high degree vertices have many neighbours and hence enjoy ample information for a successful matching. A natural idea is to first find such high degree vertices and their counterparts in the other graph, and then extend the matchings of the high degree vertices to matchings of all vertices. This can be done by finding vertices with degrees larger than a pre-specified threshold.
Based on this idea, we propose the seeded edge exploited graph matching algorithm in Algorithm 4. We first establish a collection of matches π_0 for the high degree vertices (degrees at least τ_1), the distances of which are the smallest among the pairs in consideration (distances at most τ_2). The set of these vertices is called the seeds set, S. Next, we calculate the similarity between i ∈ A and j ∈ B using W_{ij} = Σ_{k ∈ S} A_{ik} B_{j π_0(k)}, the number of common neighbours of i and j based on π_0. We then turn the similarity matrix W into a bipartite adjacency matrix using the threshold τ_3. With the maximum bipartite matching, we find a one-to-one correspondence π_1 between A and B, so that the number of common neighbours is maximized. Finally, we calculate the similarity again based on π_1, and find the matching set for each vertex as the set of vertices with the largest similarity. Details can be found in Algorithm 4.
In Algorithm 4, there are three thresholds to find a proper original matching. In practice, we conduct a grid search to determine {τ_1, τ_2, τ_3}. For the two graphs, we calculate the degrees of all vertices and select 7 candidates for τ_1, which correspond to the i-th quantiles of the vertices' degrees, i ∈ {0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8}. Possible τ_2 is chosen from the j-th, j ∈ {0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5}, quantiles of the minimum distances between vertices in the two graphs. The best combination of τ_1 and τ_2 is supposed to give the largest collection of seeds.

Algorithm 4 Edge exploited degree profile graph matching with preprocessing.

Having obtained the seeds, the numbers of common neighbours are calculated between all pairs of vertices and denoted as U_{ik}. The parameter τ_3 is defined as the (n−1)/n-th quantile of {U_{ik}}, which guarantees that the number of nonzero elements in U is approximately n. If all possible combinations provide empty seed sets, Algorithm 3 is summoned instead.
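The seed-based similarity W_ij = Σ_{k∈S} A_ik B_{jπ_0(k)} reduces to a matrix product over the seed columns. A sketch (ours; here π_0 is represented as a Python dict mapping seed vertices of A to their matches in B, an assumption of this illustration):

```python
import numpy as np

def common_neighbour_similarity(A, B, pi0):
    """W_ij = sum over seeds k of A_ik * B_{j, pi0(k)}: the number of common
    neighbours of i in A and j in B, counted through the seed matching pi0."""
    seeds = np.array(sorted(pi0))
    images = np.array([pi0[k] for k in seeds])
    return A[:, seeds] @ B[:, images].T

# Tiny example: vertex 0 of A and vertex 0 of B share both seeded neighbours.
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
B = A.copy()
W = common_neighbour_similarity(A, B, {1: 1, 2: 2})
```

Thresholding this W at τ_3 gives the bipartite adjacency matrix fed to the maximum bipartite matching step of Algorithm 4.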
The concept of seeded graph matching is used in other ways in the literature. For instance, in Lyzinski et al. (2014) and Lyzinski et al. (2015), the seeds refer to some known vertex correspondences, and seeded graph matching utilizes these known partial matchings by including them as constraints in the optimization. In Ding et al. (2018), the seeded degree profile graph matching starts with no known partial matchings and aims to refine Algorithm 1 in relatively dense graphs. We would like to emphasize that the relatively dense regime studied there is still too sparse for spectral-based graph matching methods to perform well.
Algorithm 4 can be regarded as an edge exploited version of Algorithms 2 and 3 in Ding et al. (2018). As we have mentioned, the main task of this paper is to move beyond the perfectly-overlapping Erdős–Rényi random graphs; therefore, in Algorithm 4, we adopt an edge exploited version. To motivate the preprocessing step, we alter the settings in Figure 2 slightly, by increasing the overlapping parameter s from 0.9 to 0.99, which results in an easier problem. In Figure 3, we again exhibit the true ranks. Different from Figure 2, we can see that in this easier setting, almost all the true matchings are the ones with smallest distances. A preprocessing step will then return a set of true matchings.

Post processing
The way the seeds set is produced in Algorithm 4 sheds light on the post-processing step. With any preliminary graph matching result π_t (this can be from either Algorithm 2 or Algorithm 3), we can define the similarity between i ∈ A and j ∈ B as

W_{ij} = Σ_k A_{ik} B_{j π_t(k)},

which is the number of common neighbours of i and j according to the matching π_t. Based on the similarity matrix, we use maximum bipartite matching to maximize the number of common neighbours over the matched vertices. Now rewrite the matching π_t as Π_t, where Π_t is an n_A × n_B matrix with (Π_t)_{ij} = 1 if j ∈ π_t(i) and 0 otherwise. Given Π_t, the post-processing step seeks a refinement Π_{t+1} satisfying

Π_{t+1} ∈ argmax_Π ⟨Π, A Π_t B⟩,

where Π ranges over matchings of the same type as Π_t. The intuition is to refine the result iteratively by optimizing this quadratic assignment problem. Details are collected in Algorithm 5.
Algorithm 5 Edge exploited degree profile graph matching with post processing.
In addition to the graph matching output Π_0, we also output a convergence indicator vector FLAG. In practice, we have observed that the true matches usually reach convergence and stay the same after a few iterations, while the false matches may keep changing across iterations. Instead of giving guidance on the choice of n_rep, we report the convergence indicator for each matching as a reference for the certainty about that matching. The default value for τ is n_rep/10, which means that the matchings staying the same for the final 10% of the iterations are regarded as "converged".
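One iteration of the refinement Π_{t+1} ∈ argmax_Π ⟨Π, AΠ_tB⟩ is a linear assignment problem. The sketch below (ours, not Algorithm 5 itself; it uses SciPy's `linear_sum_assignment` as a stand-in for the maximum bipartite matching and assumes equal sizes n_A = n_B) also tracks the per-vertex convergence flag described above:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def refine(A, B, Pi, n_rep=20):
    """Iterate the post-processing step: each Pi_{t+1} maximizes the total
    number of common neighbours <Pi, A Pi_t B> under the current matching.
    Returns the final matching and a per-vertex convergence flag (sketch)."""
    history = []
    for _ in range(n_rep):
        S = A @ Pi @ B                    # S_ij: common neighbours of i and j
        rows, cols = linear_sum_assignment(S, maximize=True)
        Pi = np.zeros_like(Pi)
        Pi[rows, cols] = 1
        history.append(cols)
    # A vertex is flagged as converged if its match was stable over the last
    # 10% of the iterations (the default tau = n_rep / 10 in the text).
    tail = max(1, n_rep // 10)
    last = np.stack(history[-tail:])
    flag = (last == last[-1]).all(axis=0)
    return Pi, flag

# Toy run on a graph matched against itself, starting from the identity.
rng = np.random.default_rng(0)
n = 10
G = np.triu(rng.random((n, n)) < 0.4, k=1)
A = B = (G | G.T).astype(int)
Pi, flag = refine(A, B, np.eye(n, dtype=int))
```

For partially-overlapping graphs, the rectangular Π_t of the text would require a padded or partial assignment in place of the square one used here.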
The post-processing algorithm in Algorithm 5 is inspired by the iterative clean-up procedure proposed in Ding et al. (2018). Algorithm 5 is shown to be numerically superior in more challenging settings and provides more information to improve the matching accuracy.

Graph matching in community-structured networks
Since most of the theoretically-justified graph matching algorithms are designed for perfectly-overlapping Erdős–Rényi random graphs, including the degree profile graph matching, a natural question when we move beyond this setting is whether to conduct graph matching directly on, say, stochastic block models, or to conduct community detection first and then match the graphs.
Before we investigate this problem, we first state the community detection algorithm we adopt in this paper. The spectral clustering on ratios-of-eigenvectors (SCORE) was proposed in Jin (2015) and is detailed below for completeness.
Algorithm 6 Spectral clustering on ratios-of-eigenvectors. SCORE(A, K)

Algorithm 7 Degree profile graph matching with community detection. OUTPUT: {Π_µ, µ ∈ S_K}

In Section 3.2, we conduct a systematic investigation of the following two approaches: (1) first applying Algorithm 6, then applying a graph matching algorithm within communities; (2) directly applying a graph matching algorithm.
There are various community detection methods, even within the category of spectral-based methods. For the methods we have applied, there are no obvious differences between the results based on Algorithm 6 and those based on other spectral clustering methods.
As for the first approach, we further detail two algorithms, listed in Algorithms 7 and 8. In Algorithm 7, we first apply Algorithm 6 and then use a certain graph matching method to match the communities. Note that S_K is the collection of all possible permutations on {1, . . ., K}. We write Algorithm 7 in a generic and, in fact, incomplete way: the output of Algorithm 7 contains K! matching results. Algorithm 8 can be regarded as a post-processing version of Algorithm 7, using the post-processing method introduced in Algorithm 5. The quantity Eval in Algorithm 8 is short for evaluation, which is algorithm-specific. For instance, if the graph matching algorithm used is chosen to be Algorithm 2 or Algorithm 5, then Eval can be taken as the number of matched vertices or the number of converged vertices, respectively.

Algorithm 8 Edge exploited degree profile graph matching with community detection.
We now come back to the choice between approaches (1) and (2). The evaluation is twofold: the theoretical limits, and the violation of the theoretical guarantees of the graph matching methods.
We first resort to the theoretical limits of Algorithms 2 and 6. It has been established (see e.g. Rohe et al., 2011, Theorem 2.2) that, to ensure the misclustered nodes constitute a vanishing fraction of all the nodes, the entries of Θ defined in Definition 1 need to be at least of order log^{−1/2}(n), which is a much stronger condition than the ones required by Algorithm 2. For instance, in order to achieve a perfect matching with high probability in two perfectly-overlapping correlated Erdős–Rényi random graphs, the required lower bound on the Erdős–Rényi parameter is of order log^2(n)/n. That is to say, in terms of the order of ‖Θ‖_∞, the limit of approach (1) is at least log^{−1/2}(n), while that of approach (2) is log^2(n)/n. However, we should bear in mind that the rate log^2(n)/n is established for Erdős–Rényi random graphs but not for stochastic block models.
In terms of the violation of the theoretical guarantees provided in Ding et al. (2018), we first state the rationale behind approach (1). Since Algorithm 2 is only theoretically justified for correlated Erdős–Rényi random graphs, it might be helpful to conduct community detection first, to reduce a stochastic block model graph matching problem to a partially-overlapping Erdős–Rényi one. In fact, we cannot guarantee that, with probability tending to 1, there is no misclustered vertex. This means that even in a regime where the community detection is consistent, the resulting communities may still contain misclustered vertices. The matching conducted in approach (1) is therefore a graph matching over partially-overlapping graphs.

Simulation analysis
In this section, we conduct a thorough simulation analysis of the numerical performances of the algorithms proposed in Section 2. For notational simplicity, we refer to Algorithms 2, 3, 4 and 5 as DP (degree profile), EE (edge exploited version), EE-pre (EE with preprocessing) and EE-post (EE with post-processing), respectively. We will see that our proposed methods perform well in challenging situations, for partially-overlapping graphs and for stochastic block models.
As for the tuning parameters used in the algorithms, we let d ∈ {10, 30}, where d is the tuning parameter for the edge exploited step in Algorithms 3, 4 and 5. The tuning parameters required in Algorithm 4 are generated automatically based on the grid search method introduced in Section 2.4.1.
The methods we adopt are DP, EE, EE-pre and EE-post. The performances are evaluated by the recovery rate over all nodes that have counterparts in the other graph. Note that, since we allow for partially-overlapping graphs, not every node has a counterpart in the other graph, and the number of overlapping nodes is usually smaller than the number of nodes in a single graph.
The results are collected in Figure 4. Since our methods are asymmetric in the two graphs, we present the recovery rates for each graph separately. Despite the asymmetry, the differences between the graphs are negligible.
The three parameter settings are in decreasing order of difficulty. In Setting 1, where (ρ, s) = (0.9, 0.95), the EE algorithms are the best in terms of the recovery rate. This is not a surprise, since they are the only ones allowing for matching one node to multiple nodes. Even so, the recovery rate is still below one half. In Setting 3, where (ρ, s) = (1, 1), the two graphs are identical, and all algorithms behave well. In Setting 2, where (ρ, s) = (0.95, 0.98), we can see that EE-post clearly outperforms all the other methods, even though the EE algorithms allow for multiple matchings while EE-post only allows for single matchings.
Among all settings, EE-post with d = 30 performs similarly to or worse than EE-post with d = 10, suggesting that a small tuning parameter d suffices for successful results. Besides the recovery rate, the EE-post algorithm also reports a convergence indicator FLAG. Interestingly, if we roughly regard the iterations with FLAG i > n/2 as iterations in which EE-post succeeds, then EE-post has a recovery rate of around 0.9 in all the successful iterations, and approximately 0 in the others. The convergence indicator thus provides supporting information to decide whether the matching is reliable or not.

Correlated stochastic block models
In this subsection, we consider graph matching in correlated stochastic block models. Different from Section 3.1, we only consider perfectly-overlapping graphs here. In Section 2.5, we discussed that the different theoretical limits for community detection and graph matching may induce misclustered nodes, and hence partially-overlapping graphs to match.
The simulation settings involve the following parameters: (i) the network size n = 1000, (ii) the number of communities K = 2, (iii) the within-community probability q ∈ {0.10, 0.05} and the between-community probability q/2, and (iv) the probability of keeping an edge from the parent graph ρ ∈ {0.95, 0.93, 0.9}. Each setting is repeated 10 times.
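The setting above can be sketched in code (an assumed implementation of the sampling scheme; the function name and interface are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def correlated_sbm_pair(n, q, rho):
    """Two equal-sized communities, within-community probability q,
    between-community probability q/2; each parent edge is kept
    independently with probability rho in each child graph."""
    labels = np.repeat([0, 1], n // 2)
    # Edge-probability matrix: q within a community, q/2 across.
    P = np.where(labels[:, None] == labels[None, :], q, q / 2)
    upper = np.triu(rng.random((n, n)) < P, 1)
    parent = upper | upper.T                     # symmetric parent graph
    def child():
        m = np.triu(rng.random((n, n)) < rho, 1)
        return parent & (m | m.T)                # edge subsampling
    return child(), child(), labels
```

Since both children are subsampled from one parent, their shared edges induce the correlation that the matching algorithms exploit.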
As for the tuning parameters used in the algorithms, we let d ∈ {10, 50}, where d is the tuning parameter for the edge exploited step in Algorithms 3, 4 and 5. The tuning parameters required in Algorithm 4 are generated automatically based on the grid search method we introduced in Section 2.4.1.
In this scenario, we compare results from six different methods: (i) Algorithm 2, (ii) Algorithm 5, (iii) Algorithm 7 with Algorithm 2, (iv) Algorithm 7 with Algorithm 5, (v) Algorithm 8 with Algorithm 2 and (vi) Algorithm 8 with Algorithm 5. In Section 2.5 we mentioned that the outputs of Algorithms 7 and 8 are not necessarily unique. In (iii) and (v), we choose the permutation of the communities that returns more matched pairs. In (iv) and (vi), we report the ones with the larger number of converged matchings. The measurements we adopt here are similar to those in Section 3.1, except that in this section we do not report the results for graphs A and B separately. Since we let s = 1 and the algorithms we evaluate report at most one matching, the recovery results for graphs A and B are identical.
Setting 1 is the most difficult one. For this setting, the EE-post methods can actually achieve almost perfect recovery for relatively sparse graphs (q = 0.05, right-column panels). Another interesting observation in Setting 1 is that EE-post with d = 10 performs better on the sparse graphs, while EE-post with d = 50 performs better on the dense graphs. This suggests choosing a small d for sparse graphs in practice. In Setting 3, all algorithms perform well except (i) and (iii), both of which are based on Algorithm 2.
In order to answer the question of whether one should conduct community detection before matching two stochastic block models, recall that algorithm (ii) matches the stochastic block models directly, (iv) conducts EE-post on the estimated communities, and (vi) conducts EE-post on the estimated communities first and then on the whole graph. The settings comparable for this matter are Settings 1 and 2. We can see that on the denser graphs (left-column panels), conducting EE-post on both the estimated communities and the whole graph performs best. On the sparser graphs (right-column panels), matching the graphs directly performs best. This is to some extent expected, since the success of community detection relies on more stringent density requirements than the degree profile algorithms do.

Real data
In this section, we conduct analysis on two real datasets and focus on the performance of Algorithm 2, Algorithm 3 and Algorithm 5.

Coauthor dataset
In this section, we analyse the coauthorship dataset originally studied in Ji and Jin (2016), to find co-authorship patterns among statisticians according to publications in the Annals of Statistics (AoS), Biometrika, the Journal of the American Statistical Association (JASA) and the Journal of the Royal Statistical Society, Series B (JRSSB), during the years 2003-2012. For any three distinct journals A, B and C chosen from the four journals above, we construct two networks. One network is formed by the authors who published papers in A and/or B, namely A ∪ B. The other network is formed by the authors who published papers in A and/or C, namely A ∪ C. In A ∪ B (A ∪ C), a node is an author who has published in A ∪ B (A ∪ C), and an edge indicates that the corresponding two authors have at least one coauthored paper published in A ∪ B (A ∪ C). This construction provides partially-overlapping networks.
Since the networks contain isolated nodes and pairs, which provide little information for graph matching, we preprocess the two networks as follows. An author is kept only when they have common coauthors in both A ∪ B and A ∪ C. We then extract the giant components of the two networks respectively. The resulting giant components are the final networks we work on. Note that the sizes of the giant components are about half of those of the original networks, and the final two networks have different sizes.
As for EE and EE-post, we let d = 5 and n rep = 50. In EE-post, we let τ = 5, meaning that we regard a matching as a "converged matching" when it stays the same for at least the last 5 iterations. We introduce this new notion in the real data analysis, since our methods perform well without this additional criterion on simulated data. The detailed network sizes are exhibited in Table 1. Note that, no matter which combination of journals is considered, the corresponding pair of networks is partially overlapping. In fact, the numbers of overlapping nodes are much smaller than the network sizes.
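The "converged matching" criterion with τ = 5 can be sketched as follows (the list-of-dicts representation of the matching history is our assumption, not the paper's data structure):

```python
def converged_nodes(history, tau=5):
    """Return the nodes whose matching stayed the same for the last
    tau iterations, i.e. the 'converged matchings'.

    history: list of dicts, one per iteration, each mapping nodes of
    graph A to their current matches in graph B.
    """
    last = history[-tau:]                        # the final tau iterations
    final = last[-1]
    return {u for u in final
            if all(it.get(u) == final[u] for it in last)}
```

Restricting the reported matching to these nodes yields the EE+(conv) metric used below.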
In Figure 6, we depict the recovery rates using five different metrics. DP(all) is the recovery rate of Algorithm 2 over all nodes, and it is smaller than DP(mat), the recovery rate of Algorithm 2 over all matched nodes. Since DP(mat) is always at least DP(all), in Figure 6 we stack the difference between the two on top of DP(all). The larger the difference, the smaller the ratio of matched nodes. As for Algorithm 5, we also consider two metrics: EE+(all), the recovery rate in terms of all nodes, and EE+(conv), the recovery rate in terms of converged nodes. In Figure 6, we likewise stack the difference between EE+(conv) and EE+(all) on top of EE+(all).
A fair comparison is between EE+(conv) and DP(mat), and between EE+(all) and DP(all). As we can see, EE-post consistently and substantially outperforms DP in all aspects. In particular, EE+(conv) shows an even more prominent improvement, which suggests that, for real data where the underlying truth is unknown and matching accuracy is of concern, the converged matchings of EE-post can serve as a reliable matching.
To provide more insight into the EE-post methods, we examine three specific authors in the coauthor dataset. We use the datasets AoS ∪ Biometrika and AoS ∪ JASA for illustration.
• Converged and correctly matched. An example of this category is Author 60, who has in total three coauthors in the datasets concerned, all of whom appear in AoS. This implies that Author 60 has a neighbourhood of the same size in AoS ∪ Biometrika and AoS ∪ JASA. In addition, at least one of these three neighbours is correctly matched. These two facts provide ample information for graph matching, and result in Author 60 being a converged node in the EE-post algorithm and being correctly matched.
• Converged but wrongly matched. An example of this category is Author 222, who has no coauthors in AoS, four in JASA and four in Biometrika. The intersection of their JASA and Biometrika collaborator sets is of size three. When matching Author 222, the interference signal comes from Author 655, who shares two coauthors with Author 222 and who is wrongly matched to Author 222. The relatively large number of coauthors leads to convergence, while the interference signal results in a wrong match.
• Correctly matched but not converged. An example of this category is Author 115, who has five coauthors in the datasets concerned, only one of whom is correctly matched. As a result, the matching of Author 115 is not stable across iterations, but one of the possible matchings is correct thanks to the relatively large number of neighbours. This example also sheds light on the rationale for adopting EE with d > 1 when one can afford to store multiple matchings.

Zebrafish dataset
In this section, we analyse a zebrafish neuronal activity dataset. This dataset was originally acquired and processed in Prevedel et al. (2014) and is a time series of whole-brain zebrafish neuronal activity. We follow the preprocessing routine conducted in Lyzinski et al. (2017) and extract a slice of the neuronal activity network, which is in fact the sample correlation matrix over a small window of time. This can be regarded as the adjacency matrix of a weighted undirected network with 5105 nodes. The further analysis conducted in this section is based on thresholding the entries of this correlation matrix R to provide adjacency matrices in {0, 1}^{5105×5105}. We conduct two sets of simulations based on this dataset: one matches graphs generated from two different thresholds, and the other is based on the same threshold. To be specific, in the different-thresholds setting, we first use a threshold t_1 ∈ {0.5, 0.6, 0.7} to produce a matrix A_1 ∈ {0, 1}^{5105×5105}, by letting (A_1)_{ij} = 1{R_{ij} ≥ t_1}, and use t_2 = t_1 + 0.1 to produce B_1 ∈ {0, 1}^{5105×5105}. From each of A_1 and B_1, we then extract the leading principal submatrices A_2, B_2 ∈ {0, 1}^{m×m}, m ∈ {100, 300, 1000}. Finally, for each node in A_2 (B_2), we independently keep it with probability s ∈ {0.95, 0.97} to produce A_3 (B_3), and output A (B) by deleting isolated nodes. When matching A and B, we also randomly permute the nodes in B to increase the difficulty. In the same-threshold setting, we let A_1 = B_1 using the same threshold t ∈ {0.5, 0.6, 0.7}, and follow the rest of the procedure as in the different-thresholds scenario. It is worth mentioning that in the different-thresholds scenario, the higher-threshold graph is a subgraph of the lower-threshold graph; in the same-threshold scenario, we have ρ = 1.
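The graph construction described above can be sketched as follows (a minimal, hypothetical implementation; the function and argument names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def threshold_graph(R, t, m, s):
    """One arm of the construction: threshold the correlation matrix R
    at t, take the leading m x m principal submatrix, keep each node
    independently with probability s, and delete isolated nodes."""
    A = (R >= t).astype(int)
    np.fill_diagonal(A, 0)                     # no self-loops
    A = A[:m, :m]                              # leading principal submatrix
    keep = rng.random(m) < s                   # node subsampling
    A = A[np.ix_(keep, keep)]
    deg = A.sum(axis=1)
    return A[np.ix_(deg > 0, deg > 0)]         # drop isolated nodes

def make_pair(R, t1, t2, m, s):
    """Different-thresholds setting: A uses t1, B uses t2 = t1 + 0.1;
    the same-threshold setting simply passes t2 = t1."""
    return threshold_graph(R, t1, m, s), threshold_graph(R, t2, m, s)
```

Because 1{R_ij ≥ t_2} implies 1{R_ij ≥ t_1} when t_2 > t_1, the higher-threshold graph is indeed a subgraph of the lower-threshold one, as noted above.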
Each combination of the parameters mentioned above is repeated 10 times.In particular, in the same threshold setting, the repetitions are conducted by permuting the nodes 10 times.
The numerical results are depicted in Figures 7 and 8, for the different- and same-thresholds settings, respectively. As for Algorithm 2, we calculate the recovery rates over all nodes, DP(all), and over matched nodes, DP(mat), separately. Since DP(mat) is always larger than DP(all), we stack the difference between these two rates on top of DP(all) in the figures. As for Algorithm 5, we calculate the recovery rates over all nodes, EE+(all), and over converged nodes, EE+(conv), separately. For the same reason as stated for Algorithm 2, we stack the two bars into one in each panel of the figures.
Generally speaking, as the thresholds increase, all performances deteriorate, since the networks become sparser and the matching problems harder. The two scenarios, different and same thresholds, convey very similar information, and in most cases all methods perform slightly better in the same-threshold scenario. It is interesting to see that EE achieves almost full recovery in most settings, even though this is based on real data. Since the convergence rates of EE-post are high across all settings, the recovery rates of EE-post under the two metrics are comparable. Overall, EE and EE-post outperform DP. We would also like to point out that, as the network size increases, EE and EE-post improve their performance, while DP deteriorates. This further suggests that in practice, EE-type methods are preferable to the original DP algorithm.

Discussions
In this paper, we investigated extensions of the degree profile graph matching method, which was originally designed for perfectly-overlapping Erdős-Rényi random graphs. The extensions include matching partially-overlapping graphs and stochastic block model graphs. We proposed the edge exploited graph matching algorithm and its variants, and conducted thorough numerical experiments to evaluate their performance.
In Definition 1, we focused on simple graphs, i.e. graphs with no multiple edges between any given pair of nodes. The extension to multi-edge networks is straightforward, since all our current methods are based on counting edges. Other possible extensions include graph matching on directed graphs and the associated theoretical guarantees. We leave these for future work.
Figure 1: A cartoon of the edge exploited matching

Figure 3: Histogram of the ranks of true pairs. Most true pairs have the smallest distance and will be chosen as seeds.

Figure 6: Results in Section 4.1. Each panel corresponds to a Journal A, as indicated at the top-left corner of the panel. Every three consecutive bars in a panel correspond to a different choice of Journals B and C, as indicated at the top of the bars. The metrics are: DP(mat), the recovery rate of Algorithm 2 over all matched pairs; DP(all), the recovery rate of Algorithm 2 over all nodes; EE, the recovery rate of Algorithm 3 in terms of all nodes; EE+(all), the recovery rate of Algorithm 5 over all nodes; EE+(conv), the recovery rate of Algorithm 5 over all converged pairs.

Figure 7: Results in Section 4.2, different-thresholds scenarios. Each bar indicates the mean and standard error of a certain metric. The size m and the correlation parameter ρ are indicated in each panel. In each panel, every three consecutive bars represent metric values for a threshold value, indicated at the top. The metrics are: DP(mat), the recovery rate of Algorithm 2 over all matched pairs; DP(all), the recovery rate of Algorithm 2 over all nodes; EE, the recovery rate of Algorithm 3 in terms of all nodes; EE+(all), the recovery rate of Algorithm 5 over all nodes; EE+(conv), the recovery rate of Algorithm 5 over all converged pairs.

Figure 8: Results in Section 4.2, same-thresholds scenarios. Each bar indicates the mean and standard error of a certain metric. The size m and the correlation parameter ρ are indicated in each panel. In each panel, every three consecutive bars represent metric values for a threshold value, indicated at the top. The metrics are: DP(mat), the recovery rate of Algorithm 2 over all matched pairs; DP(all), the recovery rate of Algorithm 2 over all nodes; EE, the recovery rate of Algorithm 3 in terms of all nodes; EE+(all), the recovery rate of Algorithm 5 over all nodes; EE+(conv), the recovery rate of Algorithm 5 over all converged pairs.

Table 1: Sizes of the datasets studied in Section 4.1. Size A ∪ B: the size of the giant component in the processed A ∪ B; Size A ∪ C: the size of the giant component in the processed A ∪ C; Size Overlap: the number of overlapping nodes of the two networks.