Abstract
Advances in information and communication technologies have opened new opportunities for statistical analysis, knowledge discovery, data mining, and many other research areas, together with new challenges for privacy and data protection. Nowadays, many personal records are collected and kept in computerized databases, such as census, medical, and employee databases. There has always been an asymmetry between the benefits of computerized databases and the rights of individual data subjects. Some data protection principles can be derived from the legal framework. In this survey, we present some basic cryptographic and non-cryptographic techniques that may be used for enhancing privacy. We focus mainly on anonymization in databases and networks, discuss some differences and interactions between the well-known models of k-anonymity and differential privacy, and finally present some challenges to privacy that come from big data analytics.
Additional information
Julián Salas: With the support of a UOC postdoctoral fellowship.
About this article
Cite this article
Salas, J., Domingo-Ferrer, J. Some Basics on Privacy Techniques, Anonymization and their Big Data Challenges. Math. Comput. Sci. 12, 263–274 (2018). https://doi.org/10.1007/s11786-018-0344-6
DOI: https://doi.org/10.1007/s11786-018-0344-6
Keywords
- Big data privacy
- Privacy enhancing technologies
- Privacy by design
- Statistical disclosure control
- k-anonymity
- Differential privacy