Abstract
Uncertain graphs, a form of uncertain data, have recently attracted considerable attention because they can represent the inherent uncertainty in collected data. They pose challenges to conventional data processing techniques and open new research directions. Going in the reverse direction, this paper focuses on the problem of anonymizing a deterministic graph by converting it into an uncertain form. The paper first analyzes drawbacks of a recent uncertainty-based anonymization scheme and then proposes Maximum Variance, a novel approach that provides a better tradeoff between privacy and utility. Toward a fair comparison between graph anonymization schemes, the second contribution of this paper is a quantifying framework that assesses the privacy and utility scores of typical schemes in a unified space. Extensive experiments on three large real graphs show the effectiveness and efficiency of Maximum Variance.
A Proof of Theorems
A.1 Proof of Theorem 1
Proof
We prove the result by induction.
When \(k=1\), we have two cases of \(G_1\): \(E_{G_1}=\{e_1\}\) and \(E_{G_1}=\emptyset \). For both cases, \(Var[D(\mathcal {G}_1,G_1)] = p_1(1-p_1)\), i.e. independent of \(G_1\).
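The base case can be checked by hand. The following worked computation is a sketch that assumes \(D\) counts the edges on which the sampled graph and \(G_1\) differ, which is consistent with the sums used in the induction step. For \(E_{G_1}=\{e_1\}\), \(D=0\) with probability \(p_1\) and \(D=1\) with probability \(1-p_1\), so \(E[D]=1-p_1\) and

\[Var[D(\mathcal {G}_1,G_1)] = p_1\,(0-(1-p_1))^2 + (1-p_1)\,(1-(1-p_1))^2 = p_1(1-p_1)^2 + (1-p_1)p_1^2 = p_1(1-p_1).\]

For \(E_{G_1}=\emptyset \), the roles are swapped: \(D=1\) with probability \(p_1\) and \(D=0\) with probability \(1-p_1\), so \(E[D]=p_1\) and

\[Var[D(\mathcal {G}_1,G_1)] = p_1\,(1-p_1)^2 + (1-p_1)\,(0-p_1)^2 = p_1(1-p_1).\]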
Assume that the result holds for up to \(k-1\) edges, i.e. \(Var[D(\mathcal {G}_{k-1},G_{k-1})] = \sum _{i=1}^{k-1} p_i(1-p_i)\) for all \(G_{k-1} \sqsubseteq \mathcal {G}_{k-1}\). We need to prove that it also holds for \(k\) edges. We use the subscript notations \(\mathcal {G}_k, G_k\) for the case of \(k\) edges and consider two cases of \(G_k\): \(e_k \in G_k\) and \(e_k \notin G_k\).
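Case 1 (\(e_k \in G_k\)). We use the shortened notations \(D_k\) for \(D(G'_k,G_k)\) and \(E[D_k]\) for \(E[D(\mathcal {G}_k,G_k)]\), and likewise with subscript \(k-1\). Splitting the definition of the variance according to whether the sampled graph \(G'_k\) contains \(e_k\), the formula for \(Var[D(\mathcal {G}_k, G_k)]\) is

\[Var[D(\mathcal {G}_k, G_k)] = \sum _{G'_k \sqsubseteq \mathcal {G}_k} Pr(G'_k)[D_k - E[D_k]]^2 = \sum _{G'_k \sqsubseteq \mathcal {G}_k,\ e_k \in G'_k} Pr(G'_k)[D_k - E[D_k]]^2 + \sum _{G'_k \sqsubseteq \mathcal {G}_k,\ e_k \notin G'_k} Pr(G'_k)[D_k - E[D_k]]^2.\]

The first sum is \(\sum _{G'_{k-1} \sqsubseteq \mathcal {G}_{k-1}} p_k Pr(G'_{k-1})[D_{k-1} - E[D_{k-1}] - (1-p_k)]^2\).
The second sum is \(\sum _{G'_{k-1} \sqsubseteq \mathcal {G}_{k-1}} (1-p_k) Pr(G'_{k-1})[D_{k-1} - E[D_{k-1}] + p_k]^2\).
Expanding the squares, the cross terms vanish because \(\sum _{G'_{k-1} \sqsubseteq \mathcal {G}_{k-1}} Pr(G'_{k-1})(D_{k-1} - E[D_{k-1}]) = 0\), and the constant terms contribute \(p_k(1-p_k)^2 + (1-p_k)p_k^2 = p_k(1-p_k)\). Hence \(Var[D(\mathcal {G}_k,G_k)] = Var[D(\mathcal {G}_{k-1},G_{k-1})] + p_k(1-p_k) = \sum _{i=1}^{k} p_i(1-p_i)\).
Case 2 (\(e_k \notin G_k\)) is similar to Case 1. \(\square \)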
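The statement proved above can also be checked numerically on a small instance. The sketch below is illustrative and not part of the original paper: it assumes that edges are sampled independently with probabilities \(p_i\) and that \(D\) counts the edges on which a sampled graph and the reference graph differ. The function name `edit_distance_variance` and the 0/1 encoding of edge sets are hypothetical choices made for this example.

```python
from itertools import product


def edit_distance_variance(probs, reference):
    """Exact variance of D(G', reference) over all possible worlds.

    probs[i] is the probability that edge i is present (assumed independent).
    reference[i] is 1 if edge i belongs to the reference graph, else 0.
    D is assumed to be the number of edges on which G' and reference differ.
    """
    mean = 0.0
    second_moment = 0.0
    # Enumerate every possible world as a 0/1 vector over the k edges.
    for world in product((0, 1), repeat=len(probs)):
        pr = 1.0
        for present, p in zip(world, probs):
            pr *= p if present else 1.0 - p
        d = sum(a != b for a, b in zip(world, reference))
        mean += pr * d
        second_moment += pr * d * d
    return second_moment - mean * mean


if __name__ == "__main__":
    probs = [0.9, 0.5, 0.2, 0.7]
    expected = sum(p * (1.0 - p) for p in probs)
    # The variance should be the same for every choice of reference graph.
    for reference in product((0, 1), repeat=len(probs)):
        assert abs(edit_distance_variance(probs, reference) - expected) < 1e-12
    print("variance is independent of the reference graph")
```

The enumeration is exponential in the number of edges, so this is only a sanity check for small \(k\) and not a way to compute the variance on real graphs, where the closed form \(\sum _{i=1}^{k} p_i(1-p_i)\) is used directly.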
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nguyen, H.H., Imine, A., Rusinowitch, M. (2015). A Maximum Variance Approach for Graph Anonymization. In: Cuppens, F., Garcia-Alfaro, J., Zincir Heywood, N., Fong, P. (eds) Foundations and Practice of Security. FPS 2014. Lecture Notes in Computer Science(), vol 8930. Springer, Cham. https://doi.org/10.1007/978-3-319-17040-4_4
DOI: https://doi.org/10.1007/978-3-319-17040-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17039-8
Online ISBN: 978-3-319-17040-4
eBook Packages: Computer Science (R0)