A Maximum Variance Approach for Graph Anonymization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 8930)

Abstract

Uncertain graphs, a form of uncertain data, have recently attracted a lot of attention because they can represent the inherent uncertainty in collected data. Uncertain graphs pose challenges to conventional data processing techniques and open new research directions. Going in the reverse direction, this paper focuses on the problem of anonymizing a deterministic graph by converting it into an uncertain form. The paper first analyzes drawbacks of a recent uncertainty-based anonymization scheme and then proposes Maximum Variance, a novel approach that provides a better tradeoff between privacy and utility. To enable a fair comparison between graph anonymization schemes, the second contribution of this paper is a quantifying framework that assesses the privacy and utility scores of typical schemes in a unified space. Extensive experiments on three large real graphs show the effectiveness and efficiency of Maximum Variance.

Notes

  1. http://glaros.dtc.umn.edu/gkhome/views/metis

  2. http://mosek.com/

  3. http://snap.stanford.edu/data/index.html

Author information

Corresponding author

Correspondence to Hiep H. Nguyen.

A Proof of Theorems

A.1 Proof of Theorem 1

Proof

We prove the result by induction.

When \(k=1\), there are two cases for \(G_1\): \(E_{G_1}=\{e_1\}\) and \(E_{G_1}=\emptyset \). In both cases, \(Var[D(\mathcal {G}_1,G_1)] = p_1(1-p_1)\), i.e. the variance is independent of \(G_1\).

Assume that the result holds for up to \(k-1\) edges, i.e. \(Var[D(\mathcal {G}_{k-1},G_{k-1})] = \sum _{i=1}^{k-1} p_i(1-p_i)\) for all \(G_{k-1} \sqsubseteq \mathcal {G}_{k-1}\). We need to prove that it also holds for \(k\) edges. We use the subscript notation \(\mathcal {G}_k, G_k\) for the case of \(k\) edges and consider two cases for \(G_k\): \(e_k \in G_k\) and \(e_k \notin G_k\).

Case 1. The formula for \(Var[D(\mathcal {G}_k, G_k)]\) is

$$\begin{aligned} Var[D(\mathcal {G}_k,G_k)]&= \sum _{G'_k \sqsubseteq \mathcal {G}_k} Pr(G'_k) [D(G'_k,G_k) - E[D(\mathcal {G}_k,G_k)]]^2 \\&= \sum _{e_k \in G'_k} Pr(G'_k) [D(G'_k,G_k) - E[D_k]]^2 + \sum _{e_k \notin G'_k} Pr(G'_k) [D(G'_k,G_k) - E[D_k]]^2 \end{aligned}$$

The first sum is \(\sum _{G'_{k-1} \sqsubseteq \mathcal {G}_{k-1}} p_k Pr(G'_{k-1})[D_{k-1} - E[D_{k-1}] - (1-p_k)]^2\).

The second sum is \(\sum _{G'_{k-1} \sqsubseteq \mathcal {G}_{k-1}} (1-p_k) Pr(G'_{k-1})[D_{k-1} - E[D_{k-1}] + p_k]^2\).

Here we use shortened notations \(D_k\) for \(D(G'_k,G_k)\) and \(E[D_k]\) for \(E[D(\mathcal {G}_k,G_k)]\).

By simple algebra, we have \(Var[D(\mathcal {G}_k,G_k)] = Var[D(\mathcal {G}_{k-1},G_{k-1})] + p_k(1-p_k) = \sum _{i=1}^{k} p_i(1-p_i)\).
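
The algebra behind this step can be spelled out as follows. This is a worked sketch; the shorthand \(X = D_{k-1} - E[D_{k-1}]\) is introduced here for illustration only and is not notation from the paper. Adding the summands of the two sums for a fixed \(G'_{k-1}\) and expanding the squares gives

$$\begin{aligned} p_k [X - (1-p_k)]^2 + (1-p_k) [X + p_k]^2&= X^2 - 2p_k(1-p_k)X + 2p_k(1-p_k)X + p_k(1-p_k)^2 + (1-p_k)p_k^2 \\&= X^2 + p_k(1-p_k) \end{aligned}$$

Summing over all \(G'_{k-1} \sqsubseteq \mathcal {G}_{k-1}\) with weights \(Pr(G'_{k-1})\), the \(X^2\) term yields \(Var[D(\mathcal {G}_{k-1},G_{k-1})]\) and the constant term yields \(p_k(1-p_k)\), because the probabilities \(Pr(G'_{k-1})\) sum to 1.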

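Case 2. The argument is similar to Case 1: here \(D_k = D_{k-1}+1\) when \(e_k \in G'_k\), and \(E[D_k] = E[D_{k-1}] + p_k\), which leads to the same sum.    \(\square \)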
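
The statement proved above can also be checked numerically on a small example. The following is a minimal sketch, not the authors' code. It assumes that \(D\) is the edge edit distance (the number of edges on which two possible worlds differ), which is consistent with the case analysis in the proof, and that edges are sampled independently. The function names and the probabilities are hypothetical and chosen only for illustration.

```python
from itertools import product


def world_probability(world, probs):
    """Probability of a possible world when edge i is present with probability probs[i]."""
    pr = 1.0
    for present, p in zip(world, probs):
        pr *= p if present else (1.0 - p)
    return pr


def edit_distance(world_a, world_b):
    """Number of edges on which two possible worlds differ (assumed definition of D)."""
    return sum(a != b for a, b in zip(world_a, world_b))


def distance_variance(probs, reference):
    """Variance of the distance to `reference`, by enumerating all 2^k possible worlds."""
    worlds = list(product([0, 1], repeat=len(probs)))
    mean = sum(world_probability(w, probs) * edit_distance(w, reference) for w in worlds)
    return sum(
        world_probability(w, probs) * (edit_distance(w, reference) - mean) ** 2
        for w in worlds
    )


# Illustrative edge probabilities (hypothetical values, not taken from the paper).
probs = [0.9, 0.5, 0.2, 0.7]
expected = sum(p * (1.0 - p) for p in probs)

# The variance should be the same for every reference world G.
variances = [distance_variance(probs, ref) for ref in product([0, 1], repeat=len(probs))]
assert all(abs(v - expected) < 1e-9 for v in variances)
```

Exhaustive enumeration is exponential in the number of edges, so this check is only practical for a handful of edges. It is meant to illustrate the theorem, not to replace the proof.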

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, H.H., Imine, A., Rusinowitch, M. (2015). A Maximum Variance Approach for Graph Anonymization. In: Cuppens, F., Garcia-Alfaro, J., Zincir Heywood, N., Fong, P. (eds) Foundations and Practice of Security. FPS 2014. Lecture Notes in Computer Science, vol 8930. Springer, Cham. https://doi.org/10.1007/978-3-319-17040-4_4

  • DOI: https://doi.org/10.1007/978-3-319-17040-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17039-8

  • Online ISBN: 978-3-319-17040-4

  • eBook Packages: Computer Science (R0)
