Some Basics on Privacy Techniques, Anonymization and their Big Data Challenges

Mathematics in Computer Science

Abstract

With progress in information and communication technologies, new opportunities and technologies for statistical analysis, knowledge discovery, data mining, and many other research areas have emerged, together with new challenges for privacy and data protection. Nowadays many personal records are kept in computerized databases: personal data is collected and stored in census databases, medical databases, and employee databases, among others. There has always been an asymmetry between the benefits of computerized databases and the rights of individual data subjects. Some data protection principles can be derived from the legal framework. In this survey, we present some basic cryptographic and non-cryptographic techniques that may be used to enhance privacy. We focus mainly on anonymization in databases and networks, discuss some differences and interactions between the well-known models of k-anonymity and differential privacy, and finally present some challenges to privacy that come from big data analytics.



Author information

Corresponding author

Correspondence to Julián Salas.

Additional information

Julián Salas: With the support of a UOC postdoctoral fellowship.

About this article

Cite this article

Salas, J., Domingo-Ferrer, J.: Some Basics on Privacy Techniques, Anonymization and their Big Data Challenges. Math. Comput. Sci. 12, 263–274 (2018). https://doi.org/10.1007/s11786-018-0344-6

  • DOI: https://doi.org/10.1007/s11786-018-0344-6
