Database Privacy

Chapter in: Privacy in a Digital, Networked World

Abstract

Open data is increasingly demanded by data analysts, companies, and the general public. Yet, when databases to be publicly released contain information on individual respondents (e.g., responses to polls, census information, healthcare records), they must be released in a way that preserves the privacy of these respondents: it should be de facto impossible to relate the published data to specific individuals. To achieve this goal, the Statistical Disclosure Control (SDC) discipline has proposed a plethora of privacy protection methods, known under a variety of names such as SDC methods, anonymization methods, or sanitization methods. This chapter provides an overview of the issues in database privacy, a survey of the best-known SDC methods, a discussion of the related data privacy/utility trade-offs, and a description of privacy models proposed by the computer science community in recent years. Some relevant freeware packages are also identified.
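To make the abstract's notions of a privacy model and an SDC method concrete, the following Python sketch checks k-anonymity (every combination of quasi-identifier values must be shared by at least k records) and applies a simple fixed-size univariate microaggregation heuristic, which replaces each value of a numeric attribute by the mean of a group of at least k records. The record layout, the field names (`age`, `zip`, `diagnosis`) and the function names are hypothetical, and the grouping rule is a simple heuristic rather than an optimal or multivariate microaggregation algorithm.

```python
from collections import Counter
from statistics import mean
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]  # hypothetical record layout: attribute name -> value


def is_k_anonymous(records: Sequence[Record],
                   quasi_identifiers: Sequence[str], k: int) -> bool:
    """Return True if every combination of quasi-identifier values
    is shared by at least k records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    group_sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in group_sizes.values())


def microaggregate(records: Sequence[Record], attr: str, k: int) -> List[Record]:
    """Fixed-size univariate microaggregation (simple heuristic, not optimal).

    Records are sorted by the numeric attribute `attr` and cut into
    consecutive groups of k; the last group absorbs the remainder, so every
    group has between k and 2k - 1 members. Each value of `attr` is replaced
    by its group mean. Returns copies of the records in their original order.
    """
    n = len(records)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of records")
    order = sorted(range(n), key=lambda i: records[i][attr])
    num_groups = n // k
    masked = [dict(r) for r in records]  # never modify the input records
    for g in range(num_groups):
        start = g * k
        end = n if g == num_groups - 1 else start + k
        members = order[start:end]
        centroid = mean(records[i][attr] for i in members)
        for i in members:
            masked[i][attr] = centroid
    return masked


if __name__ == "__main__":
    # Toy microdata; "age" and "zip" are treated as quasi-identifiers.
    data = [
        {"age": 23, "zip": "08001", "diagnosis": "flu"},
        {"age": 25, "zip": "08001", "diagnosis": "asthma"},
        {"age": 41, "zip": "08002", "diagnosis": "flu"},
        {"age": 44, "zip": "08002", "diagnosis": "diabetes"},
    ]
    qis = ["age", "zip"]
    print(is_k_anonymous(data, qis, 2))    # False: every age is unique
    masked = microaggregate(data, "age", 2)
    print([r["age"] for r in masked])      # ages replaced by group means
    print(is_k_anonymous(masked, qis, 2))  # True
```

Microaggregating a single attribute guarantees k-anonymity only when that attribute is the sole quasi-identifier or, as in this toy data, when the resulting groups happen to coincide with those of the other quasi-identifiers. In general, all quasi-identifiers must be microaggregated jointly. The averaging also illustrates the privacy/utility trade-off: larger k gives larger groups and stronger protection, at the cost of more information loss.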

Acknowledgments and Disclaimer

This work was partly supported by the Government of Catalonia under grant 2014 SGR 537, by the Spanish Government through projects TIN2011-27076-C03-01 “CO-PRIVACY” and TIN2014-57364-C2-R “SmartGlacis”, and by the European Commission under H2020 project “CLARUS”. J. Domingo-Ferrer is partially supported as an ICREA Acadèmia researcher by the Government of Catalonia. The authors are with the UNESCO Chair in Data Privacy, but they are solely responsible for the views expressed in this chapter, which do not necessarily reflect the position of UNESCO and do not commit that organization.

Author information

Correspondence to Josep Domingo-Ferrer.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Domingo-Ferrer, J., Sánchez, D., Hajian, S. (2015). Database Privacy. In: Zeadally, S., Badra, M. (eds) Privacy in a Digital, Networked World. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-08470-1_2

  • DOI: https://doi.org/10.1007/978-3-319-08470-1_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08469-5

  • Online ISBN: 978-3-319-08470-1

  • eBook Packages: Computer Science; Computer Science (R0)
