International Conference on Principles of Security and Trust

POST 2012: Principles of Security and Trust, pp. 229–248

Provable De-anonymization of Large Datasets with Sparse Dimensions

Anupam Datta, Divya Sharma & Arunesh Sinha

Conference paper

Part of the Lecture Notes in Computer Science book series (LNCS, volume 7215)

Abstract

There is a significant body of empirical work on statistical de-anonymization attacks against databases containing micro-data about individuals, e.g., their preferences, movie ratings, or transaction data. Our goal is to analytically explain why such attacks work. Specifically, we analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. We prove theorems characterizing mathematical properties of the database and the auxiliary information available to the adversary that enable two classes of privacy attacks. In the first attack, the adversary successfully identifies the individual about whom she possesses auxiliary information (an isolation attack). In the second attack, the adversary learns additional information about the individual, although she may not be able to uniquely identify him (an information amplification attack). We demonstrate the applicability of the analytical results by empirically verifying that the mathematical properties assumed of the database are actually true for a significant fraction of the records in the Netflix movie ratings database, which contains ratings from about 500,000 users.
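The algorithm analyzed here is a variant of the Narayanan-Shmatikov algorithm. As published by Narayanan and Shmatikov, that algorithm scores every record against the adversary's auxiliary information, weighting each attribute by its rarity (roughly 1/log|supp(i)|, where supp(i) is the set of records with a non-null value for attribute i), and accepts the top-scoring record only if its lead over the runner-up is large relative to the standard deviation of all scores (the "eccentricity" test). The Python below is a minimal sketch of that general idea, not the variant or the theorems of this paper. The threshold similarity function `sim`, the use of log(1 + |supp(i)|) to avoid dividing by zero, the threshold `phi`, and all names and toy data are illustrative assumptions.

```python
import math
import statistics
from typing import Dict, Optional

# A record maps an attribute (e.g. a movie id) to a value (e.g. a rating).
# Attributes the individual never rated are simply absent (sparse record).
Record = Dict[str, float]


def supports(db: Dict[str, Record]) -> Dict[str, int]:
    """Count |supp(i)|: how many records have a value for each attribute i."""
    counts: Dict[str, int] = {}
    for rec in db.values():
        for attr in rec:
            counts[attr] = counts.get(attr, 0) + 1
    return counts


def sim(a: float, b: float, tolerance: float = 1.0) -> float:
    """Hypothetical similarity: 1 if the values are within `tolerance`, else 0."""
    return 1.0 if abs(a - b) <= tolerance else 0.0


def score(aux: Record, rec: Record, supp: Dict[str, int]) -> float:
    """Rarity-weighted similarity between auxiliary info and one record."""
    total = 0.0
    for attr, value in aux.items():
        if attr in rec:
            # Assumption: log(1 + |supp|) instead of log|supp|, so an
            # attribute held by a single record does not divide by zero.
            total += sim(value, rec[attr]) / math.log(1 + supp[attr])
    return total


def best_match(aux: Record, db: Dict[str, Record], phi: float = 1.5) -> Optional[str]:
    """Return the top-scoring record id, or None if it does not stand out."""
    supp = supports(db)
    scores = {rid: score(aux, rec, supp) for rid, rec in db.items()}
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        return next(iter(scores), None)
    sigma = statistics.pstdev(ranked)
    if sigma == 0:
        return None  # all records score the same: nothing to isolate
    # Eccentricity: lead of the best score over the second best, in std devs.
    eccentricity = (ranked[0] - ranked[1]) / sigma
    if eccentricity < phi:
        return None
    return max(scores, key=scores.get)


# Toy data (made up for illustration): movie id -> rating.
db = {
    "u1": {"m1": 5, "m2": 3, "m9": 1},
    "u2": {"m1": 5, "m2": 4},
    "u3": {"m1": 4, "m3": 2},
    "u4": {"m1": 1, "m2": 2},
}

print(best_match({"m1": 5, "m9": 1}, db))  # → u1
print(best_match({"m1": 5}, db))           # → None
```

In the toy data, auxiliary information that includes a rarely rated movie (`m9`) makes one record stand out, while knowing only a widely rated movie (`m1`) leaves several records tied, so the sketch declines to name a match. This mirrors, informally, why sparse, rare attributes matter for isolation.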

Keywords

  • Privacy
  • Database
  • De-anonymization


Author information

Authors and Affiliations

  1. Carnegie Mellon University, USA

    Anupam Datta, Divya Sharma & Arunesh Sinha

Editor information

Editors and Affiliations

  1. Dipartimento di Informatica, Università di Pisa, Largo Bruno Pontecorvo, 3, 56127, Pisa, Italy

    Pierpaolo Degano

  2. Computer Science, Worcester Polytechnic Institute, 100 Institute Road, 01609, Worcester, MA, USA

    Joshua D. Guttman

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Datta, A., Sharma, D., Sinha, A. (2012). Provable De-anonymization of Large Datasets with Sparse Dimensions. In: Degano, P., Guttman, J.D. (eds) Principles of Security and Trust. POST 2012. Lecture Notes in Computer Science, vol 7215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28641-4_13

  • DOI: https://doi.org/10.1007/978-3-642-28641-4_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28640-7

  • Online ISBN: 978-3-642-28641-4

  • eBook Packages: Computer Science, Computer Science (R0)
