Source Data Perturbation and consistent sets of safe tables
When tables are generated from a data file, the release of those tables should not reveal overly detailed information about individual respondents. Disclosure of individual respondents in the microdata file can be prevented by applying disclosure control methods at the table level (cell suppression or cell perturbation), but this may create inconsistencies with other tables based on the same data file. Alternatively, disclosure control methods can be applied at the microdata level, but these methods may change the data permanently and do not account for specific table properties. These problems can be circumvented by assigning a single, fixed weight factor to each respondent (record) in the microdata file. Normally this weight factor equals 1 for each record and is not explicitly included in the microdata file. Upon tabulation, each contribution of a respondent is multiplied by that respondent's weight factor. This approach is called Source Data Perturbation (SDP) because the data are perturbed at the microdata level, not at the table level. It should be noted, however, that the original microdata are not changed; only a weight variable is added. The weight factors can be chosen in accordance with the statistical disclosure control (SDC) paradigm, i.e. such that the tables generated from the microdata are safe and the information loss is minimized. The paper indicates how this can be done. Moreover, it is shown that the SDP approach is well suited to data warehouses, as the weights can conveniently be stored in the fact tables. The data can then still be accessed, sliced and diced up to a certain level of detail, and the tables generated from the data warehouse are mutually consistent and safe.
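The weighted tabulation described above can be illustrated with a minimal Python sketch. It is not the paper's procedure for choosing weights: the records, the field names (`region`, `sector`, `turnover`, `weight`) and the weight values are hypothetical and hand-picked for illustration. The sketch only shows that, once each record carries one fixed weight, every table built from the same records agrees on its shared totals.

```python
from collections import defaultdict

# Hypothetical microdata. The original values are left untouched;
# only a "weight" field is added (1.0 means "not perturbed").
RECORDS = [
    {"region": "North", "sector": "A", "turnover": 100.0, "weight": 1.0},
    {"region": "North", "sector": "B", "turnover": 900.0, "weight": 0.875},
    {"region": "South", "sector": "A", "turnover": 50.0, "weight": 1.0},
    {"region": "South", "sector": "B", "turnover": 200.0, "weight": 1.125},
]


def tabulate(records, dims, value="turnover", weight="weight"):
    """Sum weight * value over the records, grouped by the given dimensions."""
    table = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        table[key] += rec[weight] * rec[value]
    return dict(table)


# Three tables generated from the same weighted records.
by_region_sector = tabulate(RECORDS, ["region", "sector"])
by_region = tabulate(RECORDS, ["region"])
by_sector = tabulate(RECORDS, ["sector"])


def margin(table, position):
    """Collapse a detailed table onto one of its dimensions."""
    out = defaultdict(float)
    for key, total in table.items():
        out[(key[position],)] += total
    return dict(out)


# The margins of the detailed table match the separately built tables.
consistent = (
    margin(by_region_sector, 0) == by_region
    and margin(by_region_sector, 1) == by_sector
)
print(by_region_sector)
print(by_region)
print(by_sector)
print("consistent:", consistent)
```

Because the perturbation lives in the record-level weight rather than in any single table, consistency across tables follows from the tabulation itself. Whether the resulting cells are actually safe depends on how the weights are chosen, which this sketch does not address.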