# Efficient random graph matching via degree profiles

## Abstract

Random graph matching refers to recovering the underlying vertex correspondence between two random graphs with correlated edges; a prominent example is when the two random graphs are given by Erdős-Rényi graphs $$G(n,\frac{d}{n})$$. This can be viewed as an average-case and noisy version of the graph isomorphism problem. Under this model, the maximum likelihood estimator is equivalent to solving the intractable quadratic assignment problem. This work develops an $${\widetilde{O}}(n d^2+n^2)$$-time algorithm which perfectly recovers the true vertex correspondence with high probability, provided that the average degree is at least $$d = \varOmega (\log ^2 n)$$ and the two graphs differ in at most a $$\delta = O( \log ^{-2}(n) )$$ fraction of edges. For dense graphs and sparse graphs, this can be improved to $$\delta = O( \log ^{-2/3}(n) )$$ and $$\delta = O( \log ^{-2}(d) )$$ respectively, both in polynomial time. The methodology is based on appropriately chosen distance statistics of the degree profiles (empirical distribution of the degrees of neighbors). Before this work, the best known results achieved $$\delta =O(1)$$ and $$n^{o(1)} \le d \le n^c$$ for some constant $$c$$ with an $$n^{O(\log n)}$$-time algorithm, and $$\delta ={{\widetilde{O}}}((d/n)^4)$$ and $$d = {\widetilde{\varOmega }}(n^{4/5})$$ with a polynomial-time algorithm.

1. To ensure the Bernoulli parameter in (2) is well-defined, we need to assume $$q(1-s) \le 1-q$$, or equivalently $$s \ge 2-1/q$$. Similarly, to ensure the edge probability in the parent graph $$p=q/s \le 1$$, we need to assume $$s \ge q$$.

2. Throughout the paper, we use standard big O notation, e.g., for any sequences $$\{a_n\}$$ and $$\{b_n\}$$, $$a_n=\varTheta (b_n)$$ (or $$a_n \asymp b_n$$) if $$1/c\le a_n/ b_n \le c$$ holds for all n for some absolute constant $$c>0$$; $$a_n =\varOmega (b_n)$$ and $$b_n = O(a_n)$$ (or $$a_n \gtrsim b_n$$ and $$b_n \lesssim a_n$$) if $$a_n/b_n \ge c$$. We use big $${\widetilde{O}}$$ notation to hide logarithmic factors.

3. Achievability and converse bounds for more general correlated Erdős-Rényi random graph models are also available in [13, 14].

4. To be precise, all but two elements (namely, $$A_{ik}$$ and $$B_{ki}$$) are independent. This can be easily dealt with by excluding those two from the empirical distribution, which, by the triangle inequality, changes the distance statistic by at most $$\frac{1}{n}$$.

5. Alternatively, outdegrees can be computed via the number of common neighbors by squaring the adjacency matrix using fast matrix multiplication.
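
As an illustration of the constraints in note 1, the following Python sketch checks the two conditions and samples a correlated pair of graphs. Only the conditions $$q(1-s) \le 1-q$$ and $$s \ge q$$ and the relation $$p=q/s$$ come from the note; the subsampling construction (each parent edge kept independently with probability s in each graph) and the names `check_parameters` and `sample_correlated_pair` are assumptions made for illustration.

```python
import random


def check_parameters(q: float, s: float) -> bool:
    """Return True if (q, s) satisfies both constraints from note 1."""
    if not (0.0 < q <= 1.0 and 0.0 < s <= 1.0):
        return False
    bernoulli_ok = q * (1.0 - s) <= 1.0 - q   # equivalent to s >= 2 - 1/q
    parent_ok = s >= q                         # parent edge probability q/s <= 1
    return bernoulli_ok and parent_ok


def sample_correlated_pair(n, q, s, rng):
    """Sample 0/1 adjacency matrices A, B from a parent graph G(n, q/s).

    Assumed construction: each parent edge is kept independently with
    probability s in A and, independently, with probability s in B.
    """
    if not check_parameters(q, s):
        raise ValueError("need s >= q and q(1 - s) <= 1 - q")
    p = q / s
    A = [[0] * n for _ in range(n)]
    B = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:               # edge present in the parent graph
                if rng.random() < s:
                    A[i][j] = A[j][i] = 1
                if rng.random() < s:
                    B[i][j] = B[j][i] = 1
    return A, B
```

Under this assumed construction, an edge appears in both graphs with probability $$(q/s) s^2 = qs$$, which matches the parameter of the binomial counts used in Appendix B.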

## References

1. Aflalo, Y., Bronstein, A., Kimmel, R.: On convex relaxation of graph isomorphism. Proc. Nat. Acad. Sci. 112(10), 2942–2947 (2015)

2. Alon, N., Spencer, J.H.: The probabilistic method, 3rd edn. Wiley, New Jersey (2008)

3. Babai, L., Erdős, P., Selkow, S.M.: Random graph isomorphism. SIAM J. Comput. 9(3), 628–635 (1980)

4. Barak, B., Chou, C.N., Lei, Z., Schramm, T., Sheng, Y.: (Nearly) efficient algorithms for the graph matching problem on correlated random graphs. arXiv preprint arXiv:1805.02349 (2018)

5. del Barrio, E., Giné, E., Matrán, C.: Central limit theorems for the Wasserstein distance between the empirical and the true distributions. Ann. Prob. 27, 1009–1071 (1999)

6. Berend, D., Kontorovich, A.: A sharp estimate of the binomial mean absolute deviation with applications. Stat. Prob. Lett. 83(4), 1254–1259 (2013)

7. Bollobás, B.: Distinguishing vertices of random graphs. In: North-Holland Mathematics Studies vol. 62, pp. 33–49 (1982)

8. Bollobás, B.: Random Graphs, 2nd edn. Cambridge Studies in Advanced Mathematics. Cambridge University Press, New York (2001)

9. Bordenave, C., Lelarge, M., Massoulié, L.: Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs. In: 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pp. 1347–1357 (2015). ArXiv arXiv:1501.06087

10. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)

11. Burkard, R.E., Cela, E., Pardalos, P.M., Pitsoulis, L.S.: The quadratic assignment problem. In: Handbook of Combinatorial Optimization, pp. 1713–1809. Springer, Berlin (1998)

12. Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching in pattern recognition. Int. J. Pattern Recognit. Artif. Intell. 18(03), 265–298 (2004)

13. Cullina, D., Kiyavash, N.: Improved achievability and converse bounds for Erdős-Rényi graph matching. In: Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, pp. 63–72. ACM (2016)

14. Cullina, D., Kiyavash, N.: Exact alignment recovery for correlated Erdős-Rényi graphs. arXiv preprint arXiv:1711.06783 (2017)

15. Cullina, D., Kiyavash, N., Mittal, P., Poor, H.V.: Partial recovery of Erdős-Rényi graph alignment via $$k$$-core alignment. arXiv preprint arXiv:1809.03553 (2018)

16. Czajka, T., Pandurangan, G.: Improved random graph isomorphism. J. Discrete Algorithms 6(1), 85–92 (2008)

17. Dai, O.E., Cullina, D., Kiyavash, N., Grossglauser, M.: On the performance of a canonical labeling for matching correlated Erdős-Rényi graphs. arXiv preprint arXiv:1804.09758 (2018)

18. David, H., Nagaraja, H.: Order Statistics, 3rd edn. Wiley, New Jersey (2003)

19. Dym, N., Maron, H., Lipman, Y.: DS++: a flexible, scalable and provably tight relaxation for matching problems. ACM Trans. Graphics (TOG) 36(6), 184 (2017)

20. Feizi, S., Quon, G., Recamonde-Mendoza, M., Medard, M., Kellis, M., Jadbabaie, A.: Spectral alignment of graphs. arXiv preprint arXiv:1602.04181 (2016)

21. Fiori, M., Sapiro, G.: On spectral properties for graph matching and graph isomorphism problems. Inf. Inference J. IMA 4(1), 63–76 (2015)

22. Fishkind, D.E., Adali, S., Patsolic, G.H., Meng, L., Singh, D., Lyzinski, V., Priebe, C.E.: Seeded graph matching. Pattern Recogn. 87, 203–215 (2019)

23. Ford, L.R., Fulkerson, D.R.: Maximal flow through a network. Can. J. Math. 8(3), 399–404 (1956)

24. Haghighi, A.D., Ng, A.Y., Manning, C.D.: Robust textual inference via graph matching. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 387–394. Association for Computational Linguistics (2005)

25. Hopcroft, J.E., Karp, R.M.: An $$n^{5/2}$$ algorithm for maximum matchings in bipartite graphs. SIAM J. Comput. 2(4), 225–231 (1973)

26. Kaas, R., Buhrman, J.M.: Mean, median and mode in binomial distributions. Stat. Neerl. 34(1), 13–18 (1980)

27. Kazemi, E., Hassani, H., Grossglauser, M., Modarres, H.P.: Proper: global protein interaction network alignment through percolation matching. BMC Bioinform. 17(1), 527 (2016)

28. Kazemi, E., Hassani, S.H., Grossglauser, M.: Growing a graph matching from a handful of seeds. Proc. VLDB Endow. 8(10), 1010–1021 (2015)

29. Kezurer, I., Kovalsky, S.Z., Basri, R., Lipman, Y.: Tight relaxation of quadratic matching. In: Computer Graphics Forum, vol. 34, pp. 115–128. Wiley Online Library (2015)

30. Korula, N., Lattanzi, S.: An efficient reconciliation algorithm for social networks. Proc. VLDB Endow. 7(5), 377–388 (2014)

31. Li, W.V., Shao, Q.M.: Gaussian processes: inequalities, small ball probabilities and applications. Handbook of Statistics 19, 533–597 (2001)

32. Livi, L., Rizzi, A.: The graph matching problem. Pattern Anal. Appl. 16(3), 253–283 (2013)

33. Lubars, J., Srikant, R.: Correcting the output of approximate graph matching algorithms. In: IEEE INFOCOM 2018-IEEE Conference on Computer Communications, pp. 1745–1753. IEEE (2018)

34. Lyzinski, V., Fishkind, D., Fiori, M., Vogelstein, J., Priebe, C., Sapiro, G.: Graph matching: relax at your own risk. IEEE Trans. Pattern Anal. Mach. Intell. 38(1), 60–73 (2016)

35. Lyzinski, V., Fishkind, D.E., Priebe, C.E.: Seeded graph matching for correlated Erdős-Rényi graphs. J. Mach. Learn. Res. 15, 3513 (2013)

36. Makarychev, K., Manokaran, R., Sviridenko, M.: Maximum quadratic assignment problem: Reduction from maximum label cover and lp-based approximation algorithm. In: International Colloquium on Automata, Languages, and Programming pp. 594–604 (2010)

37. Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York (2005)

38. Mossel, E., Ross, N.: Shotgun assembly of labeled graphs. IEEE Trans. Netw. Sci. Eng. 6(2), 145–157 (2019)

39. Mossel, E., Xu, J.: Seeded graph matching via large neighborhood statistics. To appear in 2019 ACM-SIAM Symposium on Discrete Algorithms (SODA), arXiv preprint arXiv:1807.10262 (2018)

40. Nadarajah, S., Kotz, S.: Exact distribution of the max/min of two Gaussian random variables. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 16(2), 210–212 (2008)

41. Narayanan, A., Shmatikov, V.: Robust de-anonymization of large sparse datasets. In: Security and Privacy, 2008. SP 2008. IEEE Symposium on, pp. 111–125. IEEE (2008)

42. Narayanan, A., Shmatikov, V.: De-anonymizing social networks. In: Security and Privacy, 2009 30th IEEE Symposium on, pp. 173–187. IEEE (2009)

43. Okamoto, M.: Some inequalities relating to the partial sum of binomial probabilities. Ann. Inst. Stat. Math. 10(1), 29–35 (1959). https://doi.org/10.1007/BF02883985

44. Onaran, E., Villar, S.: Projected power iteration for network alignment. arXiv preprint arXiv:1707.04929 (2017)

45. Pardalos, P.M., Rendl, F., Wolkowicz, H.: The quadratic assignment problem: a survey and recent developments. In: Proceedings of the DIMACS Workshop on Quadratic Assignment Problems, volume 16 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pp. 1–42. American Mathematical Society (1994)

46. Pedarsani, P., Grossglauser, M.: On the privacy of anonymized networks. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1243 (2011)

47. Petrov, V.V.: Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Oxford Science Publications, Clarendon Press, Oxford, United Kingdom (1995)

48. Slashdot social network (2009). https://snap.stanford.edu/data/soc-Slashdot0902.html

49. Scheinerman, E.R., Ullman, D.H.: Fractional Graph Theory: a Rational Approach to the Theory of Graphs. Dover, Illinois (1997)

50. Schellewald, C., Schnörr, C.: Probabilistic subgraph matching based on convex relaxation. In: EMMCVPR, vol. 5, pp. 171–186. Springer, Berlin (2005)

51. Shirani, F., Garg, S., Erkip, E.: Seeded graph matching: Efficient algorithms and theoretical guarantees. arXiv preprint arXiv:1711.10360 (2017)

52. Shorack, G.R., Wellner, J.A.: Empirical Processes with Applications to Statistics. Wiley, New Jersey (1986)

53. Singh, R., Xu, J., Berger, B.: Global alignment of multiple protein interaction networks with application to functional orthology detection. Proc. Nat. Acad. Sci. 105(35), 12763–12768 (2008)

54. Wright, E.M.: Graphs on unlabelled nodes with a given number of edges. Acta Mathematica 126(1), 1–9 (1971)

55. Yartseva, L., Grossglauser, M.: On the performance of percolation graph matching. In: Proceedings of the First ACM Conference on Online Social Networks, pp. 119–130. ACM (2013)

56. Zhao, Q., Karisch, S.E., Rendl, F., Wolkowicz, H.: Semidefinite programming relaxations for the quadratic assignment problem. J. Comb. Opt. 2(1), 71–109 (1998)

57. Zubkov, A.M., Serov, A.A.: A complete proof of universal inequalities for the distribution function of the binomial law. Theory Prob. Its Appl. 57(3), 539–544 (2013)

## Author information

### Corresponding author

Correspondence to Yihong Wu.

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

J. Ding is supported in part by the NSF Grant DMS-1757479 and an Alfred Sloan fellowship. Z. Ma is supported in part by an NSF CAREER award DMS-1352060 and an Alfred Sloan fellowship. Y. Wu is supported in part by the NSF Grant CCF-1527105, an NSF CAREER award CCF-1651588, and an Alfred Sloan fellowship. J. Xu is supported by the NSF Grants CCF-1850743, CCF-1856424, and IIS-1932630.

J. Ding and Y. Wu would like to thank the Centre de Recherches Mathématiques at the Université de Montréal, where some of the work was carried out during the Workshop on Combinatorial Statistics. Y. Wu is also grateful to David Pollard for helpful discussions on small ball probability. J. Xu would like to thank Nadav Dym and Shahar Kovalsky for pointing out the connections between fractional isomorphism and iterated degree sequences. The authors are grateful to the anonymous referees for helpful comments and corrections.

## Appendices

### Appendix A Auxiliary results

Recall the following tail bounds for a binomial random variable $$X\sim \mathrm{Binom}(n,p)$$ [37, Theorems 4.4, 4.5]:

\begin{aligned} {\mathbb {P}}\left\{ X \ge (1+t) np \right\}&\le e^{-\frac{t^2}{3} np}, \quad 0 \le t \le 1 \nonumber \\ {\mathbb {P}}\left\{ X \le (1-t) np \right\}&\le e^{-\frac{t^2}{2} np}, \quad 0 \le t \le 1 \end{aligned}
(165)

and

\begin{aligned} {\mathbb {P}}\left\{ X \ge R \right\} \le 2^{-R}, \quad R \ge 6np. \end{aligned}
(166)
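
As a numerical sanity check (not part of the original argument), the sketch below compares exact binomial tails with the bounds (165) and (166) at integer thresholds. The helper names and the tolerance `tol` are illustrative choices, and the check only covers the parameter values passed in.

```python
from math import comb, exp


def pmf(n, p, k):
    """P{X = k} for X ~ Binom(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(n, p, k):
    """Exact P{X >= k} for an integer threshold k."""
    return sum(pmf(n, p, j) for j in range(k, n + 1))


def lower_tail(n, p, k):
    """Exact P{X <= k} for an integer threshold k."""
    return sum(pmf(n, p, j) for j in range(0, k + 1))


def bounds_hold(n, p, tol=1e-12):
    """Check (165) and (166) at every integer threshold k in 0..n."""
    mean = n * p
    for k in range(n + 1):
        t_up = k / mean - 1.0            # k = (1 + t) np
        if 0.0 <= t_up <= 1.0:
            if upper_tail(n, p, k) > exp(-t_up**2 * mean / 3) + tol:
                return False
        t_low = 1.0 - k / mean           # k = (1 - t) np
        if 0.0 <= t_low <= 1.0:
            if lower_tail(n, p, k) > exp(-t_low**2 * mean / 2) + tol:
                return False
        if k >= 6 * mean:                # (166) with R = k
            if upper_tail(n, p, k) > 2.0 ** (-k) + tol:
                return False
    return True
```

For example, `bounds_hold(100, 0.02)` exercises both (165) and (166), since thresholds $$R \ge 12$$ satisfy $$R \ge 6np$$ there.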

### Theorem 5

Let $$X \sim \mathrm{Binom}(n,p)$$. It holds that

\begin{aligned} {\mathbb {P}}\left\{ X \le n t \right\}&\le \exp \left( - n \left( \sqrt{p} - \sqrt{t} \right) ^2\right) , \quad \forall 0 \le t \le p \end{aligned}
(167)
\begin{aligned} {\mathbb {P}}\left\{ X \ge n t \right\}&\le \exp \left( - 2n \left( \sqrt{t} - \sqrt{p} \right) ^2\right) , \quad \forall p \le t \le 1. \end{aligned}
(168)
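
The two bounds of Theorem 5 can be compared with exact tail probabilities in the same spirit. The following sketch evaluates (167) and (168) at thresholds $$t = k/n$$ for integer k; the function names and the tolerance are assumptions for illustration, and passing the check for a few parameter values is of course no substitute for the theorem.

```python
from math import comb, exp, sqrt


def pmf(n, p, k):
    """P{X = k} for X ~ Binom(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def theorem5_holds(n, p, tol=1e-12):
    """Check (167) and (168) at t = k/n for every integer k in 0..n."""
    for k in range(n + 1):
        t = k / n
        if t <= p:
            # (167): lower tail P{X <= nt}
            lower = sum(pmf(n, p, j) for j in range(0, k + 1))
            if lower > exp(-n * (sqrt(p) - sqrt(t)) ** 2) + tol:
                return False
        if t >= p:
            # (168): upper tail P{X >= nt}
            upper = sum(pmf(n, p, j) for j in range(k, n + 1))
            if upper > exp(-2 * n * (sqrt(t) - sqrt(p)) ** 2) + tol:
                return False
    return True
```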

### Appendix B Analysis for seeded graph matching

In this section we analyze Algorithm 3 for seeded graph matching. Note that when Algorithm 3 is used as a subroutine in Algorithm 2, the seed set S is obtained from Algorithm 1 based on matching degree profiles, which can potentially depend on the edges between the non-seeded vertices. To deal with this dependency, the following lemma gives a sufficient condition for the seeded graph matching subroutine (Algorithm 3) to succeed, even if the seed set is chosen adversarially:

### Lemma 18

(Seeded graph matching) Assume $$n\ge 4$$, $$s \ge 30 q$$, and

\begin{aligned} n (qs)^2 \ge 2^{11} \times 3 \log ^2 n. \end{aligned}
(169)

If the number of seeds satisfies $$m \ge \frac{96 \log n}{q s}$$, then with probability at least $$1 - 5n^{-1}$$, the following holds: for any $$\pi _0:S \rightarrow T$$ with $$|S|=m$$ that coincides with the true permutation $$\pi ^*$$ on the seed set S (i.e., $$\pi _0 = \pi ^*|_S$$), Algorithm 3 with $$\pi _0$$ as the seed set and threshold $$\kappa =\frac{1}{2} mqs$$ outputs $${{\widehat{\pi }}} = \pi ^*$$.
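
To get a feel for the scale of the assumptions in Lemma 18, the sketch below evaluates them for given $$(n, q, s)$$ and returns the smallest admissible number of seeds together with the corresponding threshold. The function name `lemma18_setup` and the example values are hypothetical; natural logarithms are assumed.

```python
from math import ceil, log


def lemma18_setup(n, q, s):
    """Return (assumptions_hold, m_min, kappa) for Lemma 18.

    m_min is the smallest integer m with m >= 96 log n / (q s), and
    kappa = m_min * q * s / 2 is the threshold for that seed count.
    """
    qs = q * s
    assumptions_hold = (
        n >= 4
        and s >= 30 * q
        and n * qs**2 >= 2**11 * 3 * log(n) ** 2   # condition (169)
    )
    m_min = ceil(96 * log(n) / qs)
    kappa = 0.5 * m_min * qs
    return assumptions_hold, m_min, kappa


# Example values chosen only for illustration.
ok, m_min, kappa = lemma18_setup(10**10, 0.03, 0.9)
```

With these example values, condition (169) holds, whereas for `n = 1000` and the same `q`, `s` it fails, which reflects how large n must be for the constants in the lemma to apply.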

We start by analyzing the first stage of Algorithm 3, which upgrades a partial (but correct) permutation $$\pi _0: S \rightarrow T$$ to a full permutation $$\pi _1:[n] \rightarrow [n]$$ with at most $$O(\log n/(qs))$$ errors, even if the seed set S is adversarially chosen.

### Lemma 19

Assume $$n\ge 2$$, $$m q s \ge 96 \log n$$, and $$s \ge 12q$$. Recall the threshold $$\kappa =\frac{1}{2} mqs$$ in Algorithm 3. Then with probability at least $$1- 2n^{ -m }$$, the following holds in Algorithm 3: for any partial permutation $$\pi _0: S \rightarrow T$$ such that $$\pi _0 = \pi ^*|_S$$ and $$|S|=m$$, $$\pi _1$$ is guaranteed to have at most $$\frac{192\log n}{qs}$$ errors with respect to $$\pi ^*$$, i.e., $$|\{i\in [n]: \pi _1(i) \ne \pi ^*(i) \}| \le \frac{192\log n}{qs}$$.

### Proof (Proof of Lemma 19)

Without loss of generality, we assume $$\pi ^*$$ is the identity permutation.

Fix a seed set S of cardinality m. Since $$\pi _0 = \pi ^*|_S$$, it follows that

\begin{aligned} n_{ik} = \sum _{j \in S} A_{ij} B_{k \pi _0(j)} =\sum _{j \in S} A_{ij} B_{k \pi ^*(j)}. \end{aligned}

Recall that according to the definition of the weights in (35), we have

\begin{aligned} w(\pi ^*) = \sum _{i \in S^c} {{\mathbf {1}}_{\left\{ {n_{ii} \ge \kappa }\right\} }}. \end{aligned}
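
The counts $$n_{ik}$$ and the weight $$w(\pi )$$ can be made concrete on a toy example. The sketch below follows the two displayed formulas; Algorithm 3 itself is not reproduced in this appendix, so the function names, the data layout (0/1 adjacency matrices, dictionaries for permutations), and the toy graph are assumptions for illustration.

```python
def seed_counts(A, B, pi0):
    """Return n[i][k] = sum_{j in S} A[i][j] * B[k][pi0[j]] for non-seed i, k.

    A, B are 0/1 adjacency matrices (lists of lists); pi0 maps each seed
    j in S to its partner pi0[j] in the second graph.
    """
    n = len(A)
    rows = [i for i in range(n) if i not in pi0]          # non-seeds of graph A
    matched = set(pi0.values())
    cols = [k for k in range(n) if k not in matched]      # non-seeds of graph B
    return {
        i: {k: sum(A[i][j] * B[k][pi0[j]] for j in pi0) for k in cols}
        for i in rows
    }


def weight(counts, pi, kappa):
    """w(pi) = number of non-seed vertices i with n[i][pi[i]] >= kappa."""
    return sum(1 for i in counts if counts[i][pi[i]] >= kappa)


# Toy example: both graphs have edges 0-2, 1-2, 0-3; seeds S = {0, 1}.
edges = [(0, 2), (1, 2), (0, 3)]
A = [[0] * 4 for _ in range(4)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
B = [row[:] for row in A]
counts = seed_counts(A, B, {0: 0, 1: 1})
identity = {2: 2, 3: 3}
swapped = {2: 3, 3: 2}
```

In this toy example vertex 2 is adjacent to both seeds, so with threshold `kappa = 2` the identity extension has weight 1 while the swapped extension has weight 0.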

First, we show that

\begin{aligned} {\mathbb {P}}\left\{ w(\pi ^*) \le n -m- \frac{ 32 \log n}{qs} \right\} \le \exp \left( - 2 m \log n \right) . \end{aligned}
(170)

Indeed, for $$i \in S^c$$ we have $$n_{i i} {{\mathop {\sim }\limits ^{\text {i.i.d.}}}}\mathrm{Binom}(m, qs)$$. It follows from the Chernoff bound (165) that

\begin{aligned} {\mathbb {P}}\left\{ n_{ii} \le \kappa \right\} = {\mathbb {P}}\left\{ n_{ii} \le \frac{1}{2} mqs \right\} \le \exp \left( - \frac{1}{8} m q s \right) . \end{aligned}

Therefore,

\begin{aligned} (n-m)-w(\pi ^*) = \sum _{i \in S^c} {{\mathbf {1}}_{\left\{ {n_{ii} < \kappa }\right\} }} \overset{s.t.}{\le } \mathrm{Binom}\left( n-m, \exp \left( - \frac{1}{8} m q s \right) \right) . \end{aligned}

Using the following fact (which follows from a simple union bound)

\begin{aligned} {\mathbb {P}}\left\{ \mathrm{Binom}\left( n, p \right) \ge t \right\} \le \left( {\begin{array}{c}n\\ t\end{array}}\right) p^t, \end{aligned}
(171)

we get that

\begin{aligned} {\mathbb {P}}\left\{ (n-m)-w(\pi ^*) \ge t \right\}\le & {} \left( {\begin{array}{c}n-m\\ t\end{array}}\right) \exp \left( - \frac{t}{8} m q s \right) \le n^t \exp \left( - \frac{t}{8} m q s \right) \\\le & {} \exp \left( - \frac{t}{16} m q s \right) , \end{aligned}

where the last inequality holds due to the assumption that $$mqs \ge 16 \log n$$. Setting $$t=\frac{ 32 \log n}{qs}$$, we arrive at the desired (170).
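
The union-bound fact (171) used above is easy to check numerically. The sketch below compares the exact upper tail of a binomial with $$\binom{n}{t} p^t$$ for every integer t; the function names and the tolerance are illustrative assumptions.

```python
from math import comb


def upper_tail(n, p, t):
    """Exact P{Binom(n, p) >= t} for an integer t."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))


def union_bound(n, p, t):
    """Right-hand side of (171): (n choose t) * p^t."""
    return comb(n, t) * p**t


def check_171(n, p, tol=1e-12):
    """Check (171) for every integer t in 0..n."""
    return all(
        upper_tail(n, p, t) <= union_bound(n, p, t) + tol for t in range(n + 1)
    )
```

The bound is tight at $$t=n$$, where both sides equal $$p^n$$, and it can exceed 1 for small t, so it is informative only when $$np$$ is small relative to t, as in the proof.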

Next, fix any permutation $$\pi$$ with $$\pi |_S = \pi _0$$ that has $$\ell$$ non-fixed points. Since by assumption $$\pi _0=\pi ^* |_S$$ and $$\pi ^*$$ is the identity permutation, it follows that $$\pi (i)=i$$ for all $$i \in S$$. Let $$F = \{i \in S^c: \pi (i) = i\}$$ denote the set of fixed points in $$S^c$$. Then $$|F|=n-m-\ell$$ and $$|S^c\backslash F|=\ell$$. Thus

\begin{aligned} w(\pi ) = \sum _{i \in F} {{\mathbf {1}}_{\left\{ {n_{ii} \ge \kappa }\right\} }} + \sum _{ i \in S^c \backslash F} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }} \le n-m-\ell + \sum _{ i \in S^c\backslash F} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }}. \end{aligned}

Note that for each $$i \in S^c \backslash F$$, $$n_{i\pi (i)} \sim \mathrm{Binom}(m, q^2)$$. Since by assumption $$s \ge 12q$$, it follows that $$\kappa = mqs/2 \ge 6 m q^2$$. Hence, the Chernoff bound (166) yields that for each $$i \in S^c \backslash F$$,

\begin{aligned} {\mathbb {P}}\left\{ n_{i \pi (i)} \ge \kappa \right\} \le 2^{- m q s/2 } \le \exp \left( - \frac{1}{4} m q s \right) . \end{aligned}

Note that $$\{n_{i\pi (i)}: i \in S^c \backslash F\}$$ are not mutually independent. For instance, $$n_{i \pi (i)}$$ and $$n_{\pi (i), \pi (\pi (i))}$$ are dependent. To deal with this dependency issue, we construct a subset $${{\mathcal {I}}}\subset S^c \backslash F$$ with $$|{{\mathcal {I}}}| \ge \ell /3$$ such that $$\{n_{i\pi (i)}: i \in {{\mathcal {I}}}\}$$ are mutually independent. In particular, consider the canonical cycle decomposition of permutation $$\pi |_{S^c \backslash F}$$. Let $${{\mathcal {C}}}_1, \ldots , {{\mathcal {C}}}_{a}$$ denote the cycles. Since $$\pi$$ has no fixed point in $$S^c \backslash F$$, each cycle $${{\mathcal {C}}}_i$$ has length $$\ell _i \ge 2$$. Let $$\varGamma$$ denote the graph formed by the union of these cycles. Each cycle $${{\mathcal {C}}}_i$$ has an independent set $${{\mathcal {I}}}_i$$ of size $$\lfloor \ell _i /2 \rfloor \ge \ell _i/3$$. Let $${{\mathcal {I}}}= \cup _{i=1}^a {{\mathcal {I}}}_i$$. Then $${{\mathcal {I}}}$$ is an independent set in $$\varGamma$$ and $$|{{\mathcal {I}}}| \ge \sum _{i=1}^a \ell _i/3=\ell /3$$. Since $${{\mathcal {I}}}$$ is an independent set, it follows that $$\{i, \pi (i)\} \cap \{j, \pi (j)\} =\emptyset$$ for all $$i \ne j \in {{\mathcal {I}}}$$, so $$\{n_{i\pi (i)}: i \in {{\mathcal {I}}}\}$$ are mutually independent. Hence,

\begin{aligned} \sum _{ i \in {{\mathcal {I}}}} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }} \overset{s.t.}{\le } \mathrm{Binom}\left( |{{\mathcal {I}}}| , \exp \left( - \frac{1}{4} m q s \right) \right) . \end{aligned}
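
The construction of the independent set $${{\mathcal {I}}}$$ from the cycle decomposition can be sketched as follows. Taking every other vertex along each cycle is one possible choice, assumed here for illustration; the function names and the example permutation are likewise hypothetical.

```python
def cycles_of(pi):
    """Cycle decomposition of a permutation given as a dict on its non-fixed points."""
    seen = set()
    cycles = []
    for start in pi:
        if start in seen:
            continue
        cycle = []
        v = start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = pi[v]
        cycles.append(cycle)
    return cycles


def independent_set(pi):
    """Pick floor(len/2) pairwise non-adjacent vertices from each cycle of pi."""
    chosen = []
    for cycle in cycles_of(pi):
        # positions 0, 2, ..., 2*(floor(len/2) - 1) are pairwise non-adjacent
        chosen.extend(cycle[j] for j in range(0, 2 * (len(cycle) // 2), 2))
    return chosen


# Example: cycles (2 3), (4 5 6), (7 8 9 10 11), so 10 non-fixed points.
pi = {2: 3, 3: 2, 4: 5, 5: 6, 6: 4, 7: 8, 8: 9, 9: 10, 10: 11, 11: 7}
chosen = independent_set(pi)
pairs = [{i, pi[i]} for i in chosen]
```

In the example, four vertices are selected out of ten non-fixed points, which meets the $$\ell /3$$ lower bound, and the pairs $$\{i, \pi (i)\}$$ are pairwise disjoint.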

Note that

\begin{aligned} w(\pi ) \le n-m-\ell + \sum _{ i \in S^c\backslash F} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }} \le n-m-|{{\mathcal {I}}}| + \sum _{i \in {{\mathcal {I}}}} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }}. \end{aligned}

Using (171) again, we have

\begin{aligned}&{\mathbb {P}}\left\{ w(\pi ) \ge n -m- \frac{ 32 \log n}{qs} \right\} \\&\quad \le {\mathbb {P}}\left\{ \sum _{ i \in {{\mathcal {I}}}} {{\mathbf {1}}_{\left\{ {n_{i\pi (i)} \ge \kappa }\right\} }} \ge |{{\mathcal {I}}}| -\frac{ 32 \log n}{qs} \right\} \\&\quad \le \left( {\begin{array}{c}|{{\mathcal {I}}}| \\ |{{\mathcal {I}}}| - \frac{ 32 \log n}{qs}\end{array}}\right) \exp \left( - \frac{1}{4} m q s \left( |{{\mathcal {I}}}| - \frac{ 32 \log n}{qs}\right) \right) \\&\quad \le 2^{ \ell } \exp \left( - \frac{1}{4} m q s \left( \frac{\ell }{3} - \frac{ 32 \log n}{qs}\right) \right) \le 2^{\ell } \exp \left( - \frac{1}{24} m q s \ell \right) , \end{aligned}

where the last inequality holds provided $$\ell qs\ge 192 \log n$$. Let $$\varPi _\ell$$ denote the set of permutations $$\pi$$ that have $$\ell$$ non-fixed points and satisfy $$\pi |_S = \pi _0$$. Then $$|\varPi _\ell | \le \left( {\begin{array}{c}n-m\\ \ell \end{array}}\right) \ell ! \le n^\ell$$. By the union bound, we have that for any $$\ell \ge \frac{ 192 \log n}{qs}$$,

\begin{aligned} {\mathbb {P}}\left\{ \max _{\pi \in \varPi _\ell } w(\pi ) \ge n -m- \frac{ 32 \log n}{qs} \right\} \le (2n)^{\ell } \exp \left( - \frac{1}{24} m q s \ell \right) \le \exp \left( - \frac{1}{48} m q s \ell \right) , \end{aligned}

where the last inequality holds due to the assumption that $$mqs \ge 96 \log n$$ and $$n \ge 2$$. Applying the union bound again over $$\ell$$, we get that

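\begin{aligned} {\mathbb {P}}\left\{ \max _{\ell \ge \frac{ 192 \log n}{qs}} \max _{\pi \in \varPi _\ell } w(\pi ) \ge n -m- \frac{ 32 \log n}{qs} \right\}&\le \sum _{\ell \ge \frac{ 192 \log n}{qs}} \exp \left( - \frac{1}{48} m q s \ell \right) \\&\le \frac{ \exp \left( - 4 m \log n \right) }{ 1 - \exp \left( - \frac{1}{48} m q s \right) } \\&\le \exp \left( - 2 m \log n \right) , \end{aligned}

where the second inequality sums the geometric series with ratio $$\exp \left( - \frac{1}{48} m q s \right)$$, and the last inequality holds because $$mqs \ge 96 \log n$$ and $$n \ge 2$$ give $$\exp \left( - \frac{1}{48} m q s \right) \le n^{-2} \le \frac{1}{4}$$, while $$m \log n \ge \log 2$$ gives $$\frac{4}{3} \exp \left( - 4 m \log n \right) \le \exp \left( - 2 m \log n \right)$$.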

Combining the last displayed equation with (170) we get that with probability at least $$1- 2 n^{-2m}$$, $$\pi _1$$ has at most $$192\log n/(qs)$$ errors with respect to $$\pi ^*$$.
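
The combination step can be spelled out as follows, assuming (as the definition of the weights in (35) suggests, although Algorithm 3 is not reproduced in this appendix) that $$\pi _1$$ maximizes $$w(\pi )$$ over permutations $$\pi$$ with $$\pi |_S = \pi _0$$. The symbols $$t_0$$ and $$\ell _0$$ are shorthand introduced only for this display.

```latex
\begin{aligned}
&t_0 = \frac{32 \log n}{qs}, \qquad \ell_0 = \frac{192 \log n}{qs}, \\
&{\mathcal{E}} = \Big\{ w(\pi^*) > n-m-t_0 \Big\} \cap
  \Big\{ \max_{\ell \ge \ell_0} \max_{\pi \in \varPi_\ell} w(\pi) < n-m-t_0 \Big\}, \\
&{\mathbb{P}}\{{\mathcal{E}}^c\} \le \exp(-2m\log n) + \exp(-2m\log n) = 2n^{-2m}, \\
&\text{on } {\mathcal{E}}: \quad w(\pi_1) \ge w(\pi^*) > n-m-t_0
  > \max_{\ell \ge \ell_0} \max_{\pi \in \varPi_\ell} w(\pi)
  \;\Longrightarrow\; \pi_1 \notin \bigcup_{\ell \ge \ell_0} \varPi_\ell .
\end{aligned}
```

Since $$\pi ^*$$ is the identity, the non-fixed points of $$\pi _1$$ are exactly its errors, so on this event $$\pi _1$$ has fewer than $$\ell _0$$ errors.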

Finally, applying a simple union bound over all the $$\left( {\begin{array}{c}n\\ m\end{array}}\right) \le n^m$$ possible choices of seed set S with $$|S|=m$$, we complete the proof. $$\square$$

The second stage of Algorithm 3 upgrades an almost exact full permutation $$\pi _1:[n] \rightarrow [n]$$ to an exact full permutation $${\widehat{\pi }}: [n] \rightarrow [n]$$. The following lemma provides a worst-case guarantee even if $$\pi _1$$ is adversarially chosen.

### Lemma 20

Let $$0 \le \ell \le n$$. Assume that $$(\ell -1) qs \ge 12 nq^2 +2$$ and $$(\ell -1) q s \ge 16 \max \{ 1, n-\ell \} \log n$$. Then with probability at least $$1-3n^{-1}$$, the following holds for Algorithm 3: for any $$\pi _1$$ with at most $$n-\ell$$ errors with respect to the true permutation $$\pi ^*$$, we have $${\widehat{\pi }}=\pi ^*$$.

### Proof

Without loss of generality, we assume $$\pi ^*$$ is the identity permutation.

We first fix a permutation $$\pi _1$$ which has at least $$\ell$$ fixed points. Let $$F \subset [n]$$ denote the set of fixed points of $$\pi _1$$. Then $$|F| \ge \ell$$. Recall that

\begin{aligned} w_{ik} = \sum _{j \in [n]} A_{ij} B_{k \pi _1(j)}. \end{aligned}

Then for $$i=k$$,

\begin{aligned} w_{ii} \ge \sum _{j \in F \setminus \{i\} } A_{ij} B_{i j} \overset{s.t.}{\ge } \mathrm{Binom}( |F| -1, qs ). \end{aligned}

Similarly, for $$i \ne k$$, note that $$A_{ij} B_{k \pi _1(j)} =0$$ if $$j=i$$ or $$j=\pi _1^{-1}(k)$$. Thus, $$w_{ik} = \sum _{j \in [n]\backslash \{i,\pi _1^{-1}(k)\} }A_{ij} B_{k \pi _1(j)}.$$ Moreover, $$A_{ij} B_{k \pi _1(j)} {{\mathop {\sim }\limits ^{\text {i.i.d.}}}}\mathrm{Bern}(q^2)$$ for all $$j \in [n]\backslash \{i,\pi _1^{-1}(k) , k \}$$. Therefore,

\begin{aligned} w_{ik} \le \sum _{j \in [n]\backslash \{i,\pi _1^{-1}(k), k \} } A_{ij} B_{k \pi _1(j)} +1 \overset{s.t.}{\le } \mathrm{Binom}( n-2 , q^2 ) + 1. \end{aligned}

It follows from the Chernoff bound (165) that

\begin{aligned} {\mathbb {P}}\left\{ w_{ii} \le \frac{1}{2} (\ell -1) qs \right\}\le & {} {\mathbb {P}}\left\{ \mathrm{Binom}\left( |F| -1 , qs \right) \le \frac{1}{2} (\ell -1) qs \right\} \\\le & {} \exp \left( - \frac{1}{8} (\ell -1) q s \right) . \end{aligned}

Thus, by the union bound,

\begin{aligned} {\mathbb {P}}\left\{ \min _{i \in [n] } w_{ii} \le \frac{1}{2} (\ell -1) qs \right\} \le n \exp \left( - \frac{1}{8} (\ell -1) q s \right) \le \exp \left( - \frac{1}{16} (\ell -1) q s \right) , \end{aligned}

where the last inequality holds due to the assumption that $$(\ell -1) q s \ge 16 \log n$$. Moreover, since by assumption $$(\ell -1) qs /2 -1 \ge 6 n q^2$$, it follows from the Chernoff bound (166) that for any $$i \ne k$$,

\begin{aligned} {\mathbb {P}}\left\{ w_{ik} \ge \frac{1}{2} (\ell -1) qs \right\}\le & {} {\mathbb {P}}\left\{ \mathrm{Binom}(n-2, q^2) \ge \frac{1}{2} (\ell -1) qs -1 \right\} \\\le & {} 2^{ - (\ell -1) qs /2 +1 } \le 2\exp \left( -\frac{1}{4} (\ell -1) qs \right) . \end{aligned}

Thus, by the union bound again,

\begin{aligned} {\mathbb {P}}\left\{ \max _{i \ne k } w_{ik} \ge \frac{1}{2} (\ell -1) qs \right\} \le 2n^2 \exp \left( - \frac{1}{4} (\ell -1) q s \right) \le 2 \exp \left( - \frac{1}{8} (\ell -1) q s \right) . \end{aligned}

In conclusion, for a fixed permutation $$\pi _1$$ with at least $$\ell$$ fixed points, with probability at least $$1-3\exp \left( - \frac{1}{8} (\ell -1) q s \right)$$,

\begin{aligned} \min _{i \in [n] } w_{ii} > \max _{i \ne k } w_{ik}, \end{aligned}

and hence $${\widehat{\pi }} = \pi ^*$$.
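
The separation condition $$\min _i w_{ii} > \max _{i \ne k} w_{ik}$$ can be illustrated on a small deterministic example. The sketch below computes $$w_{ik}$$ from the displayed formula and tests the condition; how Algorithm 3 turns the weights into $${\widehat{\pi }}$$ is not shown in this appendix, so only the condition itself is checked. The function names and the 5-cycle example are assumptions for illustration.

```python
def second_stage_weights(A, B, pi1):
    """W[i][k] = sum_j A[i][j] * B[k][pi1[j]] for 0/1 adjacency matrices A, B."""
    n = len(A)
    return [
        [sum(A[i][j] * B[k][pi1[j]] for j in range(n)) for k in range(n)]
        for i in range(n)
    ]


def diagonal_dominates(W):
    """True if min_i W[i][i] > max_{i != k} W[i][k]."""
    n = len(W)
    min_diag = min(W[i][i] for i in range(n))
    max_off = max(W[i][k] for i in range(n) for k in range(n) if i != k)
    return min_diag > max_off


# Example: both graphs are the 5-cycle, and the true permutation is the identity.
n = 5
A = [[1 if (i - j) % n in (1, n - 1) else 0 for j in range(n)] for i in range(n)]
B = [row[:] for row in A]
W_exact = second_stage_weights(A, B, list(range(n)))      # pi1 has no errors
W_wrong = second_stage_weights(A, B, [1, 0, 2, 3, 4])     # pi1 swaps 0 and 1
```

With an error-free $$\pi _1$$, each diagonal entry equals the degree 2 and each off-diagonal entry is a common-neighbor count of at most 1, so the condition holds. In this tiny example a single swap in $$\pi _1$$ already breaks it, which is why the lemma needs $$(\ell -1) qs$$ to be large.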

Finally, applying a simple union bound over all the $$\left( {\begin{array}{c}n\\ n-\ell \end{array}}\right) (n-\ell )! \le n^{n-\ell }$$ possible choices of permutation $$\pi _1$$ with at least $$\ell$$ fixed points, we get that even if $$\pi _1$$ is adversarially chosen, $${\widehat{\pi }} = \pi ^*$$ with probability at least

\begin{aligned} 1- 3 n^{n-\ell } \exp \left( - \frac{1}{8} (\ell -1) q s \right) \ge 1- 3 \exp \left( - \frac{1}{16} (\ell -1) q s \right) \ge 1-3n^{-1}, \end{aligned}

where the first inequality holds due to $$(\ell -1) qs \ge 16(n-\ell ) \log n$$ and the last inequality holds due to $$(\ell -1) qs \ge 16 \log n$$. $$\square$$

We now prove Lemma 18:

### Proof (Proof of Lemma 18)

In view of Lemma 19, we get that with probability at least $$1- 2n^{ -m }$$, $$\pi _1$$ is guaranteed to have at most $$192 \log n/(qs)$$ errors with respect to $$\pi ^*$$, even if $$\pi _0$$, or equivalently the seed set S, is adversarially chosen.

We next apply Lemma 20 with $$\ell = n- 192 \log n/(qs)$$. In view of the assumption $$n (qs)^2 \ge 2^{11} \times 3 \log ^2 n$$ and $$n \ge 4$$, we have $$(\ell -1) \ge n/2$$. Thus $$(\ell -1) qs \ge n qs /2 \ge 16 \log n$$, and $$(\ell -1) qs \ge nq s /2 \ge 12 nq^2+2$$ in view of $$s \ge 30 q$$ and $$nqs \ge 20$$. Moreover, $$(\ell -1) qs \ge n qs /2 \ge 2^{10} \times 3 \log ^2 n / (qs) = 16(n-\ell ) \log n$$. Therefore, all assumptions of Lemma 20 are satisfied. It follows from Lemma 20 that with probability at least $$1-3n^{-1}$$, $${\widehat{\pi }}=\pi ^*$$, even if $$\pi _1$$ is adversarially chosen.
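
The arithmetic in this step can be checked numerically for concrete parameters. The sketch below evaluates the two assumptions of Lemma 20 at $$\ell = n- 192 \log n/(qs)$$; the function names and the example values are hypothetical, and natural logarithms are assumed.

```python
from math import log


def ell_from_lemma19(n, q, s):
    """ell = n - 192 log n / (q s), the choice made in the proof of Lemma 18."""
    return n - 192 * log(n) / (q * s)


def lemma20_assumptions(n, q, s, ell):
    """Check (ell-1) q s >= 12 n q^2 + 2 and (ell-1) q s >= 16 max{1, n-ell} log n."""
    lhs = (ell - 1) * q * s
    return lhs >= 12 * n * q * q + 2 and lhs >= 16 * max(1, n - ell) * log(n)


# Example values chosen so that the assumptions of Lemma 18 hold.
n, q, s = 10**10, 0.03, 0.9
ell = ell_from_lemma19(n, q, s)
```

For these values $$\ell - 1 \ge n/2$$ and both assumptions of Lemma 20 hold; for a small graph such as `n = 1000` with the same `q` and `s`, the quantity `ell` is negative and the assumptions fail.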

In conclusion, we get that with probability at least $$1-5n^{-1}$$, Algorithm 3 with $$\pi _0$$ as the seed set outputs $${{\widehat{\pi }}} = \pi ^*$$. $$\square$$
