Database Privacy

Domingo-Ferrer, Josep; Sánchez, David; Hajian, Sara

doi:10.1007/978-3-319-08470-1_2

Part of the book series: Computer Communications and Networks (CCN)

Abstract

Demand for open data is growing among data analysts, companies, and the general public. Yet when databases slated for public release contain information on individual respondents (e.g., responses to polls, census information, or healthcare records), they must be released in a way that preserves the privacy of those respondents: it should be de facto impossible to relate the published data to specific individuals. To achieve this goal, the Statistical Disclosure Control (SDC) discipline has proposed a plethora of privacy protection methods, known under a variety of names such as SDC methods, anonymization methods, or sanitization methods. This chapter provides an overview of the issues in database privacy, a survey of the best-known SDC methods, a discussion of the related data privacy/utility trade-offs, and a description of privacy models proposed by the computer science community in recent years. Some relevant freeware packages are also identified.

Acknowledgments and Disclaimer

This work was partly supported by the Government of Catalonia under grant 2014 SGR 537, by the Spanish Government through projects TIN2011-27076-C03-01 “CO-PRIVACY” and TIN2014-57364-C2-R “SmartGlacis”, and by the European Commission under H2020 project “CLARUS”. J. Domingo-Ferrer is partially supported as an ICREA Acadèmia researcher by the Government of Catalonia. The authors are with the UNESCO Chair in Data Privacy, but they are solely responsible for the views expressed in this chapter, which do not necessarily reflect the position of UNESCO nor commit that organization.

Author information

Authors and Affiliations

Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili, UNESCO Chair in Data Privacy, Av. Països Catalans 26, 43007, Tarragona, Catalonia, Spain
Josep Domingo-Ferrer, David Sánchez & Sara Hajian

Corresponding author

Correspondence to Josep Domingo-Ferrer.

Editor information

Editors and Affiliations

University of Kentucky, Lexington, Kentucky, USA
Sherali Zeadally
Zayed University, Dubai, United Arab Emirates
Mohamad Badra

About this chapter

Cite this chapter

Domingo-Ferrer, J., Sánchez, D., Hajian, S. (2015). Database Privacy. In: Zeadally, S., Badra, M. (eds) Privacy in a Digital, Networked World. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-08470-1_2

DOI: https://doi.org/10.1007/978-3-319-08470-1_2
Published: 14 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08469-5
Online ISBN: 978-3-319-08470-1
eBook Packages: Computer Science, Computer Science (R0)