Balancing Quality and Confidentiality for Multivariate Tabular Data

Abstract

Absolute cell deviation has been used as a proxy for preserving data quality in statistical disclosure limitation for tabular data. However, users’ primary interest is that the analytical properties of the data are for the most part preserved, meaning that the values of key statistics are nearly unchanged. Moreover, important relationships within (additivity) and between (correlation) the published tables should also be unaffected. Previous work demonstrated how to preserve additivity, mean and variance for univariate tabular data. In this paper, we bridge the gap between statistics and mathematical programming to propose nonlinear and linear models based on constraint satisfaction to preserve additivity and covariance, correlation, and regression coefficient between data tables. Linear models are superior to nonlinear models owing to their simplicity, flexibility and computational speed. Simulations demonstrate that the models perform well in terms of preserving key statistics with reasonable accuracy.
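
As a rough illustration of why some of these statistics lend themselves to linear constraints, the sketch below perturbs one table's cell values so that the cell total (a simple case of additivity), the mean, and the covariance with a second table are unchanged. It is not the paper's model: the data, the helper name `project_onto_constraints`, and the choice of a least-squares projection are all assumptions made for illustration. The idea is that a perturbation d leaves the total and the covariance with y unchanged whenever sum(d) = 0 and sum(d * y) = 0, both of which are linear in d.

```python
import numpy as np

def project_onto_constraints(noise, y):
    """Return the part of `noise` satisfying sum(d) = 0 and sum(d * y) = 0.

    Hypothetical helper: removes the component of the noise that lies in
    the span of the all-ones vector and y, via a least-squares fit.
    """
    A = np.vstack([np.ones_like(y), y])          # constraint rows: A @ d = 0
    coeffs, *_ = np.linalg.lstsq(A.T, noise, rcond=None)
    return noise - A.T @ coeffs

# Made-up cell values for two related tables (illustrative only).
x = np.array([12.0, 7.0, 30.0, 18.0, 25.0, 9.0])
y = np.array([5.0, 3.0, 14.0, 8.0, 11.0, 4.0])

rng = np.random.default_rng(0)
d = project_onto_constraints(rng.normal(scale=1.0, size=x.size), y)
x_pert = x + d                                   # perturbed version of x

def slope(u, v):
    """Least-squares slope of u regressed on v: cov(u, v) / var(v)."""
    return np.cov(u, v)[0, 1] / np.var(v, ddof=1)

print("total:     ", x.sum(), x_pert.sum())
print("covariance:", np.cov(x, y)[0, 1], np.cov(x_pert, y)[0, 1])
print("slope:     ", slope(x, y), slope(x_pert, y))
print("variance:  ", np.var(x, ddof=1), np.var(x_pert, ddof=1))
```

In this sketch the total, mean, covariance and regression slope of x on y agree up to floating-point error, while the variance of x (and hence the correlation) generally shifts slightly, because variance is quadratic in the perturbation and is not captured by the two linear constraints used here.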