Privacy and efficiency guaranteed social subgraph matching

Huang, Kai; Hu, Haibo; Zhou, Shuigeng; Guan, Jihong; Ye, Qingqing; Zhou, Xiaofang

doi:10.1007/s00778-021-00706-0

Regular Paper
Published: 11 November 2021

Volume 31, pages 581–602, (2022)
The VLDB Journal

Kai Huang¹ (ORCID: orcid.org/0000-0001-9857-654X),
Haibo Hu²,³,
Shuigeng Zhou¹,
Jihong Guan⁴,
Qingqing Ye²,³ &
Xiaofang Zhou⁵

Abstract

Due to the increasing cost of data storage and computation, more and more graphs (e.g., web graphs, social networks) are outsourced and analyzed in the cloud. However, there is growing concern about the privacy of these outsourced graphs at the hands of untrusted cloud providers. Unfortunately, simple label anonymization cannot protect nodes from being re-identified by an adversary who knows the graph structure. To address this issue, existing works adopt the k-automorphism model, which constructs $(k-1)$ symmetric vertices for each vertex. This model has two disadvantages. First, it significantly enlarges the graphs, which makes graph mining tasks such as subgraph matching extremely inefficient and sometimes infeasible even in the cloud. Second, it cannot protect the privacy of the attributes in each node. In this paper, we propose a new privacy model, (k, t)-privacy, that combines the k-automorphism model for graph structure with the t-closeness privacy model for node label generalization. Besides a stronger privacy guarantee, the paper also optimizes the matching efficiency by (1) an approximate label generalization algorithm TOGGLE with a $(1+\epsilon )$ approximation ratio and (2) a new subgraph matching algorithm PGP on succinct k-automorphic graphs that does not decompose the query graph.

Notes

https://aws.amazon.com/compliance/hipaa-compliance/.
https://www.oracle.com/database/graph/.
One can argue that differential privacy [11] is more stringent than t-closeness as the former is defined regardless of the underlying dataset or a priori knowledge. However, it is infeasible in subgraph matching where exact matchings are desirable.
We will directly use the notation $(v_1,v_2,\ldots v_i,\ldots v_j\ldots v_n)$ to denote the uniform distribution where each value is equally likely. For sorted numerical values $(v_1,v_2,\ldots v_i,\ldots v_j\ldots v_n)$, the ground distance of $v_i$ and $v_j$ is $\frac{|i-j|}{n-1}$ [10].
When n is small, we can enumerate all feasible subsets and relax the constraint $y_{i,j} \in \{0,1\}$ to $y_{i,j} \in [0,1]$. Then, we apply the Simplex method to solve this linear programming problem. Finally, we employ the Branch-and-Bound method to obtain the integer solution [27].

References

1. Bi, F., Chang, L., Lin, X., Qin, L., Zhang, W.: Efficient subgraph matching by postponing cartesian products. In: SIGMOD, pp. 1199–1214 (2016)
2. Low, Y., Gonzalez, J.E., Kyrola, A., Bickson, D., Guestrin, C.E., Hellerstein, J.: Graphlab: a new framework for parallel machine learning. arXiv preprint arXiv:1408.2041 (2014)
3. Chang, Z., Zou, L., Li, F.: Privacy preserving subgraph matching on large graphs in cloud. In: SIGMOD, pp. 199–213 (2016)
4. Cao, N., Yang, Z., Wang, C., Ren, K., Lou, W.: Privacy-preserving query over encrypted graph-structured data in cloud computing. In: ICDCS, pp. 393–402 (2011)
5. Hu, H., Xu, J., Chen, Q. et al.: Authenticating location-based services without compromising location privacy. In: SIGMOD, pp. 301–312 (2012)
6. Xu, J., Yi, P., Choi, B. et al.: Privacy-preserving reachability query services for massive networks. In: CIKM, pp. 145–154 (2016)
7. Available at: https://www.oracle.com/a/tech/docs/sg-oow2019-using-graph-analysis-and-fraud-detection-in-fintech-industry.pdf
8. Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(05), 557–570 (2002)
9. Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: l-diversity: privacy beyond k-anonymity. In: ICDE, p. 24 (2006)
10. Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: ICDE, pp. 106–115 (2007)
11. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: TCC, pp. 265–284 (2006)
12. Yuan, M., Chen, L., Philip, S.Y., Yu, T.: Protecting sensitive labels in social network data anonymization. TKDE 25(3), 633–647 (2013)
13. Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: SIGMOD, pp. 93–106 (2008)
14. Tai, C.-H., Tseng, P.-J., Philip, S.Y., Chen, M.-S.: Identity protection in sequential releases of dynamic networks. TKDE 26(3), 635–651 (2014)
15. Zhou, B., Pei, J.: Preserving privacy in social networks against neighborhood attacks. In: ICDE, pp. 506–515 (2008)
16. Hay, M., Miklau, G., Jensen, D., Towsley, D., Weis, P.: Resisting structural re-identification in anonymized social networks. PVLDB 1(1), 102–114 (2008)
17. Zou, L., Chen, L., Özsu, M.T.: K-automorphism: a general framework for privacy preserving network publication. PVLDB 2(1), 946–957 (2009)
18. Cheng, J., Fu, A.W.-c., Liu, J.: K-isomorphism: privacy preserving network publication against structural attacks. In: SIGMOD, pp. 459–470 (2010)
19. Wu, W., Xiao, Y., Wang, W., He, Z., Wang, Z.: K-symmetry model for identity anonymization in social networks. In: EDBT, pp. 111–122 (2010)
20. Gao, J., et al.: A privacy-preserving framework for subgraph pattern matching in cloud. In: DASFAA, pp. 307–322 (2018)
21. Barnhart, C., Johnson, E.L., Nemhauser, G.L., Savelsbergh, M.W., Vance, P.H.: Branch-and-price: column generation for solving huge integer programs. Oper. Res. 46(3), 316–329 (1998)
22. Li, X.-Y., Zhang, C., Jung, T., Qian, J., Chen, L.: Graph-based privacy-preserving data publication. In: INFOCOM, pp. 1–9 (2016)
23. Hajian, S., Domingo-Ferrer, J., Farràs, O.: Generalization-based privacy preservation and discrimination prevention in data publishing and mining. DMKD 28(5–6), 1158–1188 (2014)
24. Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image retrieval. IJCV 40(2), 99–121 (2000)
25. Karypis, G., Kumar, V.: Analysis of multilevel graph partitioning. In: ICS, p. 29 (1995)
26. He, H., Singh, A.K.: Graphs-at-a-time: query language and access methods for graph databases. In: SIGMOD, pp. 405–418 (2008)
27. Lawler, E.L., Wood, D.E.: Branch-and-bound methods: a survey. Oper. Res. 14(4), 699–719 (1966)
28. ILOG, I.: Cplex optimizer. https://www.ibm.com/cn-zh/marketplace/ibm-ilog-cplex (2012)
29. Du, B., Zhang, S., Cao, N., Tong, H.: First: fast interactive attributed subgraph matching. In: SIGKDD. ACM, pp. 1447–1456 (2017)
30. Qiao, M., Zhang, H., Cheng, H.: Subgraph matching: on compression and computation. PVLDB 11(2), 176–188 (2017)
31. Yang, Z., Fu, A.W.-C., Liu, R.: Diversified top-k subgraph querying in a large graph. In: SIGMOD, pp. 1167–1182 (2016)
32. Han, W.-S., Lee, J., Lee, J.-H.: Turboiso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In: SIGMOD, pp. 337–348 (2013)
33. Zhu, G., Lin, X., Zhu, K., Zhang, W., Yu, J.X.: Treespan: efficiently computing similarity all-matching. In: SIGMOD, pp. 529–540 (2012)
34. Hay, M., Li, C., Miklau, G., Jensen, D.: Accurate estimation of the degree distribution of private networks. In: ICDM, pp. 169–178 (2009)
35. Karwa, V., Raskhodnikova, S., Smith, A., Yaroslavtsev, G.: Private analysis of graph structure. PVLDB 4(11), 1146–1157 (2011)
36. Zhang, J., Cormode, G., Procopiuc, C.M., Srivastava, D., Xiao, X.: Private release of graph statistics using ladder functions. In: SIGMOD, pp. 731–745 (2015)
37. Ye, Q., Hu, H., Au, M.H., Meng, X., Xiao, X.: LF-GDPR: Graph metric estimation with local differential privacy. In: TKDE (2020). https://doi.org/10.1109/TKDE.2020.3047124
38. Jiang, H., Pei, J., Yu, D. et al.: Applications of differential privacy in social network analysis: a survey. TKDE (2021)
39. Ding, X., Sheng, S., Zhou, S. et al.: Differentially Private Triangle Counting in Large Graphs. TKDE (2021)
40. Chen, S., Zhou, S.: Recursive mechanism: Towards node differential privacy and unrestricted joins. In: SIGMOD, pp. 653–664 (2013)
41. Kasiviswanathan, S.P., Nissim, K., Raskhodnikova, S., Smith, A.: Analyzing graphs with node differential privacy. In: TCC, pp. 457–476 (2013)
42. Day, W.Y., Li, N., Lyu, M.: Publishing graph degree distribution with node differential privacy. In: SIGMOD, pp. 123–138 (2016)
43. Wang, Q., Zhang, Y., Lu, X., et al.: Real-time and spatio-temporal crowd-sourced social network data publishing with differential privacy. TDSC 15(4), 591–606 (2016)
44. Jorgensen, Z., Yu, T., Cormode, G.: Publishing attributed social graphs with formal privacy guarantees. In: SIGMOD, pp. 107–122 (2016)
45. Zheleva, E., Getoor, L.: Preserving the privacy of sensitive relationships in graph data. In: International Workshop on Privacy, Security, and Trust in KDD, pp. 153–171 (2007)
46. Campan, A., Truta, T.M.: Data and structural k-anonymity in social networks. In: International Workshop on Privacy, Security, and Trust in KDD, pp. 33–54 (2008)
47. Bhagat, S., Cormode, G., Krishnamurthy, B., Srivastava, D.: Class-based graph anonymization for social network data. PVLDB 2(1), 766–777 (2009)
48. Fan, Z., Choi, B., Xu, J., Bhowmick, S.S.: Asymmetric structure-preserving subgraph queries for large graphs. In: ICDE, pp. 339–350 (2015)
49. Gao, J., Yu, J.X., Jin, R., Zhou, J., Wang, T., Yang, D.: Neighborhood-privacy protected shortest distance computing in cloud. In: SIGMOD, pp. 409–420 (2011)
50. Xie, D., Li, G., Yao, B., Wei, X., Xiao, X., Gao, Y., Guo, M.: Practical private shortest path computation based on oblivious storage. In: ICDE, pp. 361–372 (2016)
51. Ma, J., Yao, B., Gao, X., et al.: Top-k critical vertices query on shortest path. TKDE 30(10), 1999–2012 (2018)
52. Shen, M., Ma, B., Zhu, L., et al.: Cloud-based approximate constrained shortest distance queries over encrypted graphs with privacy protection. TIFS 13(4), 940–953 (2017)
53. Ding, X., Wang, C., Choo, K.K.R., et al.: A novel privacy preserving framework for large scale graph data publishing. TKDE 33(2), 331–343 (2019)
54. Jiang, J., Yi, P., Choi, B., et al.: Privacy-preserving reachability query services for massive networks. In: CIKM, pp. 145–154 (2016)
55. Yang, S., Tang, S., Zhang, X.: Privacy-preserving k nearest neighbor query with authentication on road networks. JPDC 134, 25–36 (2019)
56. Liang, H., Yuan, H.: On the complexity of t-closeness anonymization and related problems. In: DASFAA, pp. 331–345 (2013)
57. Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB 1(1), 364–375 (2008)
58. Garey, M.R., Johnson, D.S.: Computers and intractability. Freeman San Francisco, vol. 174 (1979)
59. Schrenk, S., Finke, G., Cung, V.-D.: Two classical transportation problems revisited: pure constant fixed charges and the paradox. Math. Comput. Model. 54(9–10), 2306–2315 (2011)
60. Žerovnik, J.: Heuristics for np-hard optimization problems-simpler is better!? Logist. Sustain. Transp. 6(1), 1–10 (2015)
61. Nayak, K., Wang, X.S., Ioannidis, S., Weinsberg, U., Taft, N., Shi, E.: Graphsc: Parallel secure computation made easy. In: S&P, pp. 377–394 (2015)

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62072390, U1936205, U1636205, 61572413, 62072125) and the Research Grants Council, Hong Kong SAR, China (Grant Nos. 15238116, 15222118, 15218919, 15203120).

Author information

Authors and Affiliations

Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, China
Kai Huang & Shuigeng Zhou
Department of Electronic and Information Engineering, Hong Kong Polytechnic University, Kowloon, Hong Kong
Haibo Hu & Qingqing Ye
Hong Kong and PolyU Shenzhen Research Institute, Shenzhen, China
Haibo Hu & Qingqing Ye
Department of Computer Science and Technology, Tongji University, Shanghai, China
Jihong Guan
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
Xiaofang Zhou

Corresponding author

Correspondence to Kai Huang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A Proofs

In this section, we present the formal proofs of theorems and lemmas.

A.1 Proof of Theorem 1

Given a data graph G, the graph outsourcing problem with t-closeness and k-automorphism is to compute an outsourced graph $G'$ whose labels are required to satisfy t-closeness. Since t-closeness anonymization is a known NP-Hard problem [56] and can be reduced to our graph outsourcing problem, the graph outsourcing problem with t-closeness and k-automorphism is NP-Hard. In addition, our subgraph matching problem is NP-Hard since it involves subgraph isomorphism testing, which is a classical NP-Hard problem [32, 57, 58]. Overall, both the graph outsourcing problem with (k, t)-privacy and the subgraph matching problem on outsourced graphs are NP-Hard.

A.2 Proof of Lemma 1

Let $l=(l_1,l_2,\ldots ,l_n)$ be ordered labels and $(1/n,1/n,\ldots ,1/n)$ their distribution masses. We define the $\alpha $-th alignment group (denoted by $g_{\alpha }$) as m consecutive labels in l, i.e., $g_{\alpha } = (l_{(\alpha -1)m+1}, l_{(\alpha -1)m+2}, \ldots , l_{(\alpha -1)m+\beta },\ldots ,l_{\alpha m})$ (Fig. 21). In addition, let the feasible column $y_j$ be ordered labels $(e_1,\ldots ,e_\alpha , \ldots ,e_{n/m})$ with evenly distributed mass $(m/n,m/n,\ldots ,m/n)$. Since labels are ordered, according to [10], the minimal workload of $EMD(l,y_j)$ can be achieved by satisfying all elements of l sequentially, i.e., by sequentially moving distribution mass from $y_j$ to l. In particular, as depicted in Fig. 21, $e_\alpha $ should transport $\frac{1}{n}$ distribution mass to each label in $g_{\alpha }$. In short, each element $e_\alpha $ in $y_j$ should be “aligned” with the $\alpha $-th alignment group, i.e., it transports $\frac{1}{n}$ distribution mass to each element of that group.
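The alignment described in this proof can be made concrete with a small sketch. It assumes that m divides n and uses hypothetical helper names (`alignment_group`, `aligned_plan`, `marginals`) that do not come from the paper. It builds the plan in which each $e_\alpha $ sends $1/n$ to every label of $g_\alpha $ and checks that the plan is a valid transport: every $e_\alpha $ ships exactly $m/n$ and every label receives exactly $1/n$.

```python
from fractions import Fraction


def alignment_group(alpha: int, m: int) -> list[int]:
    """Return the 1-based positions in l that form the alpha-th alignment group."""
    start = (alpha - 1) * m
    return [start + beta for beta in range(1, m + 1)]


def aligned_plan(n: int, m: int) -> dict[tuple[int, int], Fraction]:
    """Map (alpha, position in l) -> mass moved from e_alpha to that label."""
    if n % m != 0:
        raise ValueError("this sketch assumes that m divides n")
    return {
        (alpha, pos): Fraction(1, n)
        for alpha in range(1, n // m + 1)
        for pos in alignment_group(alpha, m)
    }


def marginals(plan: dict[tuple[int, int], Fraction]):
    """Sum the plan per source element e_alpha and per target label."""
    supply: dict[int, Fraction] = {}
    demand: dict[int, Fraction] = {}
    for (alpha, pos), mass in plan.items():
        supply[alpha] = supply.get(alpha, Fraction(0)) + mass
        demand[pos] = demand.get(pos, Fraction(0)) + mass
    return supply, demand


supply, demand = marginals(aligned_plan(n=12, m=3))
print(supply)  # mass shipped by each e_alpha (m/n each)
print(demand)  # mass received by each label (1/n each)
```

Exact rational arithmetic is used so that the marginal checks are equalities rather than floating-point comparisons.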

A.3 Proof of Lemma 2

According to Lemma 1, each element $e_\alpha $ in $y_j$ is aligned with the $\alpha $-th alignment group (i.e., $g_\alpha $). In addition, observe that the subscripts of the elements in alignment group $\alpha $ are $(\alpha -1)m+1, (\alpha -1)m+2,\ldots ,(\alpha -1)m+\beta ,\ldots ,(\alpha -1)m+m$, respectively, and the ground distance between $e_\alpha $ and the $\beta $-th element in alignment group $\alpha $ is $\frac{|i-(\alpha -1)m-\beta |}{n-1}$, where i is the position of $e_\alpha $ in l. Therefore, the total ground distance between $e_\alpha $ and the $\alpha $-th alignment group is

$$\begin{aligned} Dist(e_\alpha ,g_\alpha ) = \frac{1}{n-1}\sum _{\beta =1}^{m}\Big | i-(\alpha -1)m - \beta \Big | \end{aligned}$$

where i is the position of $e_\alpha $ in l. To estimate its range, three cases should be considered:

(1) If $i \le (\alpha -1)m+1$,

$$\begin{aligned} \begin{aligned} Dist(e_\alpha ,g_\alpha )&= \frac{1}{n-1} \left( (\alpha -1)m^2 + \frac{(1+m)m}{2} - im \right) \\&= \frac{2n^2\alpha - 2(n^2/m)i - 2n^2 + (n/m+n)n}{2(n-1)(n/m)^2}, \end{aligned} \end{aligned}$$

and $Dist(e_\alpha ,g_\alpha ) \in \left[ \frac{n^2 - n^2/m}{2(n-1)(n/m)^2}, \frac{2n^3/m - 2n^3/m^2 - n^2 + n^2/m}{2(n-1)(n/m)^2} \right]$.

(2) If $(\alpha -1)m+1 \le i \le (\alpha -1)m+m$, write $i = (\alpha -1)m+\beta $. Then

$$\begin{aligned} \begin{aligned} Dist(e_\alpha ,g_\alpha )&= \frac{1}{n-1}\left( \sum _{\beta _1=1}^{\beta }(\beta -\beta _1)+\sum _{\beta _2=1}^{m-\beta }\beta _2 \right) \\&=\frac{2\beta ^2 - 2(1+m)\beta + m + m^2}{2(n-1)}. \end{aligned} \end{aligned}$$

If m is odd, $Dist(e_\alpha ,g_\alpha ) \in \left[ \frac{n^2 - n^2/m^2}{4(n-1)(n/m)^2}, \frac{n^2 - n^2/m}{2(n-1)(n/m)^2} \right]$; otherwise, $Dist(e_\alpha ,g_\alpha ) \in \left[ \frac{n^2}{4(n-1)(n/m)^2}, \frac{n^2 - n^2/m}{2(n-1)(n/m)^2} \right]$.

(3) If $i \ge (\alpha -1)m+m$,

$$\begin{aligned} \begin{aligned} Dist(e_\alpha ,g_\alpha )&= \frac{1}{n-1}\left( im - (\alpha -1)m^2 - \frac{(1+m)m}{2} \right) \\&= \frac{2(n^2/m)i - 2n^2\alpha + 2n^2 -(n/m+n)n}{2(n-1)(n/m)^2}, \end{aligned} \end{aligned}$$

and $Dist(e_\alpha ,g_\alpha ) \in \left[ \frac{n^2 - n^2/m}{2(n-1)(n/m)^2}, \frac{2n^3/m - 2n^3/m^2 - n^2 + n^2/m}{2(n-1)(n/m)^2} \right]$.

Therefore, for all $i \in [(\alpha -1)m+1, (\alpha -1)m+m ]$, $Dist(e_\alpha ,g_\alpha ) \in \left[ \frac{n^2 - n^2/m^2}{4(n-1)(n/m)^2}, \frac{n^2 - n^2/m}{2(n-1)(n/m)^2} \right]$ if m is odd, and $Dist(e_\alpha ,g_\alpha ) \in \left[ \frac{n^2}{4(n-1)(n/m)^2}, \frac{n^2 - n^2/m}{2(n-1)(n/m)^2} \right]$ if m is even.
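The case analysis can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the paper: the function names are hypothetical, m is assumed to divide n, and the in-group case is evaluated from the two partial sums $\beta (\beta -1)/2$ and $(m-\beta )(m-\beta +1)/2$. It compares the direct sum with the per-case expressions and reports the range over positions inside the group, which simplifies to $[\frac{m^2-1}{4(n-1)}, \frac{m(m-1)}{2(n-1)}]$ for odd m and $[\frac{m^2}{4(n-1)}, \frac{m(m-1)}{2(n-1)}]$ for even m.

```python
from fractions import Fraction


def dist_direct(i: int, alpha: int, m: int, n: int) -> Fraction:
    """Sum of ground distances from position i to every label of g_alpha."""
    offset = (alpha - 1) * m
    return Fraction(sum(abs(i - offset - beta) for beta in range(1, m + 1)), n - 1)


def dist_by_case(i: int, alpha: int, m: int, n: int) -> Fraction:
    """The same quantity, evaluated with the three case formulas."""
    first, last = (alpha - 1) * m + 1, alpha * m
    tri = Fraction((1 + m) * m, 2)
    if i <= first:                       # case (1): i at or left of the group
        total = (alpha - 1) * m * m + tri - i * m
    elif i >= last:                      # case (3): i at or right of the group
        total = i * m - (alpha - 1) * m * m - tri
    else:                                # case (2): i strictly inside the group
        beta = i - (alpha - 1) * m
        total = Fraction(beta * (beta - 1), 2) + Fraction((m - beta) * (m - beta + 1), 2)
    return total / (n - 1)


def in_group_range(alpha: int, m: int, n: int) -> tuple[Fraction, Fraction]:
    """Minimum and maximum of Dist over positions i inside g_alpha."""
    values = [dist_direct(i, alpha, m, n)
              for i in range((alpha - 1) * m + 1, alpha * m + 1)]
    return min(values), max(values)


print(in_group_range(alpha=2, m=3, n=12))  # odd m
print(in_group_range(alpha=2, m=4, n=12))  # even m
```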

A.4 Proof of Theorem 2

Lemma 2 proved that for all $i \in [(\alpha -1)m+1, (\alpha -1)m+m ]$, if m is odd, $Dist(e_\alpha ,g_\alpha ) \in \left[ \frac{n^2 - n^2/m^2}{4(n-1)(n/m)^2}, \frac{n^2 - n^2/m}{2(n-1)(n/m)^2} \right]$; otherwise, $Dist(e_\alpha ,g_\alpha ) \in \left[ \frac{n^2}{4(n-1)(n/m)^2}, \frac{n^2 - n^2/m}{2(n-1)(n/m)^2} \right]$. Each element $e_\alpha $ of a column $y_j$ generated by the initial solution is selected from the i-th position of l, where $i \in [(\alpha -1)m+1,(\alpha -1)m+m ]$. In addition, Lemma 1 showed that each element $e_\alpha $ in $y_j$ transports 1/n distribution mass to each element in the $\alpha $-th alignment group. Based on these two observations, we derive that $EMD(l,y_j) \le \sum _{\alpha =1}^{n/m} \frac{n^2 - n^2/m}{2(n-1)(n/m)^2}\times \frac{1}{n} = \frac{mn^2-n^2}{2(n-1)n^2} = \frac{m-1}{2(n-1)}$. Therefore, when $t \ge \frac{m-1}{2(n-1)}$, the column $y_j$ satisfies t-closeness. Similarly, each column generated in the subproblem also satisfies t-closeness. In the same way, one can prove that the EMD between l and any other column is bounded by $\frac{2mn-2n-m^2+m}{2(n-1)m}$.
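As a sanity check on the bound $\frac{m-1}{2(n-1)}$, the following sketch enumerates every column that picks one position per alignment group and computes its EMD to the uniform label distribution. It relies on the standard closed form of the EMD under the ordered ground distance $|i-j|/(n-1)$ (the sum of absolute cumulative differences divided by $n-1$); the function names and the brute-force enumeration are illustrative assumptions and are not the paper's algorithm.

```python
from fractions import Fraction
from itertools import product


def ordered_emd(p: list[Fraction], q: list[Fraction]) -> Fraction:
    """EMD between two distributions over the same n ordered values."""
    carried = Fraction(0)   # mass that still has to cross to the next value
    work = Fraction(0)
    for p_i, q_i in zip(p, q):
        carried += p_i - q_i
        work += abs(carried)
    return work / (len(p) - 1)


def column_masses(positions, n: int, m: int) -> list[Fraction]:
    """Distribution of a column: mass m/n on each chosen 1-based position."""
    q = [Fraction(0)] * n
    for pos in positions:
        q[pos - 1] = Fraction(m, n)
    return q


def worst_column_emd(n: int, m: int) -> Fraction:
    """Largest EMD over all columns with one element per alignment group."""
    uniform = [Fraction(1, n)] * n
    groups = [range((alpha - 1) * m + 1, alpha * m + 1)
              for alpha in range(1, n // m + 1)]
    return max(ordered_emd(uniform, column_masses(choice, n, m))
               for choice in product(*groups))


n, m = 12, 3
print(worst_column_emd(n, m), Fraction(m - 1, 2 * (n - 1)))
```

The enumeration grows as $m^{n/m}$, so it is only meant for small n and m.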

A.5 Proof of Lemma 3

Let $l=(l_1,l_2,\ldots ,l_n)$ be n labels ordered by their values, and let a be the normalization constant of the label frequencies, $a \approx \frac{1}{\ln (n)+0.5772+1/(2n)}$ (the reciprocal of the n-th harmonic number, where 0.5772 is the Euler–Mascheroni constant). The frequencies of l can be represented by $(\frac{a}{1}, \frac{a}{2},\ldots , \frac{a}{n})$, since the frequencies of labels roughly obey Zipf’s law [3]. When $t \ge \frac{m-1}{2(n-1)}$, sub-optimal TOGGLE generates the initial partition $\{y_j|j\in [1,m]\}$, where $y_{i,j}=1$ if i is in $\{j,j+m,\ldots ,j+(n/m-1)m\}$ and $y_{i,j}=0$ otherwise. Let the sum of label frequencies of $y_j$ be

$$\begin{aligned} s_j = \frac{a}{j}+ \frac{a}{j+m}+\ldots + \frac{a}{j+(n/m-1)m}, \end{aligned}$$

the cost of $y_j$ is then $s^2_j$. Therefore, the total cost of the initial solution is $s^2_1+s^2_2+\ldots +s^2_m$, where $s_1+s_2+\ldots +s_m =1$ and $s_1> s_2> \ldots >s_m$. Since

$$\begin{aligned} \begin{aligned} s_1&= \frac{a}{1}+ \frac{a}{1+m}+\ldots + \frac{a}{1+(n/m-1)m} \\&\le \frac{a}{1}+ \frac{a}{m}+\ldots + \frac{a}{(n/m-1)m} \\&\le \frac{1}{m} + \frac{(m^2-m+2)a}{m^2+m}, \end{aligned} \end{aligned}$$

we can derive that $\frac{1}{m} + \frac{(m^2-m+2)a}{m^2+m} \ge s_1>s_2>\ldots >s_m$, and

$$\begin{aligned} \begin{aligned} \sum _{i=1}^{m}s_i^2&= \left( \sum _{i=1}^{m}{s_i} \right) ^2-s_1\left( \sum _{i \ne 1}s_i \right) -s_2\left( \sum _{i \ne 2}s_i \right) -\ldots -s_m\left( \sum _{i \ne m}s_i \right) \\&\le \frac{1}{m} + \frac{(m^2-m+2)a}{m^2+m}. \end{aligned} \end{aligned}$$

Therefore, the model cost of the initial solution under the t-closeness constraint is at most $ \frac{1}{m} + \frac{(m^2-m+2)a}{m^2+m}$. To estimate the approximation ratio $R_1$ of our model cost to the exact model cost, we first relax the t-closeness constraint to find the minimum model cost. Formally, for any $\{X_i\}$ subject to $\sum _{i=1}^{m}X_i=1$, we need to estimate the lower bound of $\sum _{i=1}^{m}X_i^2$. According to the Cauchy–Schwarz inequality, for $X_i,Y_i \in \mathbb {R}$, $\big (\sum _{i=1}^{m}X_iY_i\big )^2 \le \big (\sum _{i=1}^{m}X_i^2\big )\big (\sum _{i=1}^{m}Y_i^2\big )$. Letting $Y_i=1$, we derive that $\frac{1}{m} \le \sum _{i=1}^{m}X_i^2$. Therefore, the minimum model cost is no less than $\frac{1}{m}$, and the approximation ratio satisfies $R_1 \le \frac{ 1/m + (m^2-m+2)a/(m^2+m)}{1/m} = 1 + \frac{(m^2-m+2)a}{m+1}$. The approximation is good since the ratio is approximately linear in $m \cdot a$.
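The quantities in this proof are easy to evaluate for a concrete instance. The sketch below is a numerical illustration only, with hypothetical function names and an arbitrarily chosen instance (n = 100, m = 5). It takes a as the exact constant that makes the Zipf frequencies a/i sum to one, builds the column sums $s_j$ of the initial partition, and compares the resulting cost with the Cauchy–Schwarz lower bound 1/m and with the ratio bound $1 + \frac{(m^2-m+2)a}{m+1}$.

```python
import math


def zipf_constant(n: int) -> float:
    """Constant a such that the frequencies a/1, a/2, ..., a/n sum to one."""
    return 1.0 / sum(1.0 / i for i in range(1, n + 1))


def initial_partition_sums(n: int, m: int) -> list[float]:
    """s_j for j = 1..m, where column y_j holds positions j, j+m, j+2m, ..."""
    if n % m != 0:
        raise ValueError("this sketch assumes that m divides n")
    a = zipf_constant(n)
    freq = [a / i for i in range(1, n + 1)]
    return [sum(freq[j - 1::m]) for j in range(1, m + 1)]


def model_cost(sums: list[float]) -> float:
    """Total cost s_1^2 + ... + s_m^2 of a partition."""
    return sum(s * s for s in sums)


n, m = 100, 5   # illustrative instance, not taken from the paper
a = zipf_constant(n)
sums = initial_partition_sums(n, m)
cost = model_cost(sums)

print("a =", a, "approximation:", 1.0 / (math.log(n) + 0.5772 + 1.0 / (2 * n)))
print("cost =", cost, "lower bound 1/m =", 1.0 / m)
print("ratio =", cost * m, "bound on R_1 =", 1 + (m * m - m + 2) * a / (m + 1))
```

Since every $s_j \le s_1$ and the $s_j$ sum to one, the cost is at most $s_1$, which is the step that links the bound on $s_1$ to the bound on the total cost.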

A.6 Proof of Lemma 4

From Lemma 3, we observe that the sum of label frequencies of $y_j$ is $s_j = \frac{a}{j}+ \frac{a}{j+m}+\ldots + \frac{a}{j+(n/m-1)m}$, where $s_1+s_2+\ldots +s_m =1$ and $s_1> s_2> \ldots >s_m$. If we denote the first dual solution of the master problem as $\mu = [s^2_1,s^2_2,\ldots ,s^2_m,0,0,\ldots ,0]$, the objective values of the original subproblem and the reduced problem can be formulated as $J_2=\min (c(y_j)-\mu y_j)$ s.t. $EMD(l,y_j)\le t$, and $J_{2}' = \min (c(y_j)-\mu y_j)$ s.t. the QKP constraints, respectively. We can then derive that

$$\begin{aligned} \frac{J_2}{J_{2}'}&\le \frac{\sum _{i=1}^{n}s^2_iy_{i,j}- (\sum _{i=1}^{n} \frac{ay_{i,j}}{i})^2 }{ s^2_1 - (\frac{a}{1}+\sum _{i=2}^{n/m}\frac{a}{im})^2 } \\&\le \frac{\sum _{i=1}^{n}s^2_iy_{i,j} }{ s^2_1 - (\frac{a}{1}+\sum _{i=2}^{n/m}\frac{a}{im})^2 } \le \frac{\sum _{i=1}^{n}s^2_iy_{i,j} }{ 2s_1 \times (\frac{a}{1}+\sum _{i=2}^{n/m}\frac{a}{im})} \\&\le \frac{\sum _{i=1}^{n}s^2_iy_{i,j} }{ 2s_1 \times s_m } \le \frac{\sum _{i=1}^{n/m}s^2_i }{ 2s_1 \times s_m } = \frac{\sum _{i=1}^{n/m}s_i\times s_1 }{ 2s_1 \times s_m } \\&= \frac{1}{2} \big ( \frac{s_1}{s_1}\frac{s_1}{s_m} + \frac{s_2}{s_1}\frac{s_2}{s_m} +\ldots + \frac{s_{n/m}}{s_1}\frac{s_{n/m}}{s_m}\big )\\&\le \frac{1}{2} \big ( \frac{s_1}{s_1}\frac{s_1}{s_m} + \frac{s_2}{s_1}\frac{s_1}{s_m} +\ldots + \frac{s_{n/m}}{s_1}\frac{s_1}{s_m}\big )\\&\le \frac{1}{2} \big ( \frac{s_1}{s_1} + \frac{s_2}{s_1} +\ldots + \frac{s_{n/m}}{s_1}\big ) \frac{s_1}{s_m} \\&\le \frac{1}{2} \big ( \frac{s_1+s_2+\ldots +s_{n/m}}{s_1}\big ) \frac{s_1}{s_m} \le \frac{1}{2} \frac{1}{s_1}\frac{s_1}{s_m} \le \frac{m}{4a}. \end{aligned}$$

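Therefore, the approximation ratio of $J_{2}'$ to $J_2$ is no less than $4a/m$, where a is the normalization constant of the Zipf label frequencies $(\frac{a}{1}, \frac{a}{2},\ldots , \frac{a}{n})$ used in the proof of Lemma 3.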

A.7 Proof of Theorem 3

Let the optimal solution to the original problem be $\textsc {opt}$ and the initial solution be $R_1\times \textsc {opt}$. If the first reduced cost of the column generation method is $J_2$, then $\textsc {opt}= R_1\times \textsc {opt} - \gamma \times J_2$, where $\gamma \ge 1$ and $\gamma = (R_1-1)\textsc {opt}/J_2$. Similarly, if the first reduced cost of the sub-optimal method is $J_{2}'$, we can derive the objective value $X = R_1\times \textsc {opt} - \gamma ' \times J_{2}'$.

On the basis of Lemmas 3 and 4, we can prove that if $\gamma \le \gamma '$, the approximation ratio is $\textsc {opt}/X \ge \textsc {opt}/(R_1\times \textsc {opt} - \gamma \times J_{2}') = \textsc {opt}/(R_1\times \textsc {opt} - ((R_1-1)\textsc {opt}/J_2) \times J_{2}') \ge 1/(1+ (m^3-5m^2+6m-8)a/(m^2+m))$. Otherwise, $\textsc {opt}/X = \textsc {opt}/(R_1\times \textsc {opt} - \gamma ' \times J_{2}')\ge \textsc {opt}/(R_1\times \textsc {opt}) \ge 1/(1 + (m^2-m+2)a/(m+1))$. Therefore, $\textsc {opt} \le X \le (1+ (m^3-5m^2+6m-8)a/(m^2+m))\textsc {opt}$ in the first case, or $\textsc {opt} \le X \le (1 + (m^2-m+2)a/(m+1))\textsc {opt} \approx (1+0.2m)\textsc {opt}$ in the second.

About this article

Cite this article

Huang, K., Hu, H., Zhou, S. et al. Privacy and efficiency guaranteed social subgraph matching. The VLDB Journal 31, 581–602 (2022). https://doi.org/10.1007/s00778-021-00706-0

Received: 12 November 2020
Revised: 20 July 2021
Accepted: 29 September 2021
Published: 11 November 2021
Issue Date: May 2022
DOI: https://doi.org/10.1007/s00778-021-00706-0
