Ordered Data Set Vectorization for Linear Regression on Data Privacy

  • Pau Medrano-Gracia
  • Jordi Pont-Tuset
  • Jordi Nin
  • Victor Muntés-Mulero
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4617)


Many situations require publishing data without revealing the confidential information it contains. Among the data protection methods proposed in the literature, those based on linear regression are widely used for numerical data. The main objective of these methods is to minimize both the disclosure risk (DR) and the information loss (IL). However, most of these techniques protect the non-confidential attributes based on the values of the confidential attributes in the data set. When these two sets of attributes are strongly correlated, an intruder is more likely to be able to reveal confidential data, which makes these methods unsuitable for many typical scenarios. In this paper we propose a new family of methods, called LiROP-k methods, that are based on linear regression but avoid the problems derived from the correlation between attributes in the data set. We propose the vectorization, sorting and partitioning of all values of the attributes to be protected in the data set, which breaks the semantics of these attributes inside the record. We present two protection methods: a synthetic one, called LiROPs-k, and a perturbative one, called LiROPp-k. We show that, when the attributes in the data set are highly correlated, our methods present lower DR than other protection methods based on linear regression.
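
The abstract names three preprocessing steps (vectorization, sorting, partitioning) followed by linear regression, but gives no further detail. The following is a minimal sketch of how such a pipeline could look, not the authors' LiROPs-k or LiROPp-k algorithm. It assumes, purely for illustration, that each of the k partitions is fitted with a straight line of sorted value against rank and that the fitted values replace the originals. The names fit_line and sorted_block_regression are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = a*x + b through the points (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def sorted_block_regression(records, k):
    """Hypothetical sketch: flatten, sort, split into k blocks, fit a line per block.

    Assumption (not from the paper): the regression is value against rank.
    """
    n_cols = len(records[0])
    # Vectorization: put every value of every attribute into one vector.
    flat = [v for row in records for v in row]
    n = len(flat)
    # Sorting: order[r] is the position in `flat` of the value with rank r.
    order = sorted(range(n), key=flat.__getitem__)
    out = [0.0] * n
    # Partitioning: k contiguous blocks of (almost) equal size over the ranks.
    for b in range(k):
        lo, hi = b * n // k, (b + 1) * n // k
        ranks = list(range(lo, hi))
        values = [flat[order[r]] for r in ranks]
        if len(ranks) < 2:
            fitted = values  # too few points to fit a line; leave unchanged
        else:
            a, c = fit_line(ranks, values)
            fitted = [a * r + c for r in ranks]
        # Write each fitted value back to the cell its original came from.
        for r, v in zip(ranks, fitted):
            out[order[r]] = v
    return [out[i:i + n_cols] for i in range(0, n, n_cols)]
```

Because the values are pooled across attributes before sorting, a block can mix values from different attributes and records, which is one way to read the abstract's remark that the semantics of the attributes inside the record is broken. In this sketch a larger k fits each block more closely, so the output stays nearer to the original values.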


Keywords: Privacy in statistical databases; Privacy preserving data mining; Statistical disclosure risk; Linear regression masking methods




Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Pau Medrano-Gracia (1)
  • Jordi Pont-Tuset (1)
  • Jordi Nin (2)
  • Victor Muntés-Mulero (1)

  1. DAMA-UPC, Computer Architecture Dept., Universitat Politècnica de Catalunya, Campus Nord UPC, C/Jordi Girona 1-3, 08034 Barcelona (Catalonia, Spain)
  2. IIIA, Artificial Intelligence Research Institute, CSIC, Spanish National Research Council, Campus UAB s/n, 08193 Bellaterra (Catalonia, Spain)
