Abstract
Releasing business microdata is a challenging problem for many statistical agencies. Businesses with distinctive continuous characteristics, such as extremely high income, can easily be identified, yet such businesses are normally included in surveys that represent the population. To provide data users with useful statistics while maintaining confidentiality, some statistical agencies have developed online tools that allow users to specify and request tables created from microdata. These tools release only perturbed cell values, generated by automatic output perturbation algorithms, in order to protect each underlying observation against various attacks. One example is the differencing attack, in which a user compares two tables that differ only by one business to recover that business's value. One such perturbation algorithm was proposed by Thompson et al. (2013). That algorithm focuses largely on reducing disclosure risk and gives little attention to data utility. As a result, it has limitations, including a limited scope of applicable cells and uncontrolled utility loss. In this paper we introduce a new algorithm for generating perturbed cell values. Compared with the algorithm of Thompson et al., the new algorithm allows more control over utility loss, can achieve better utility-disclosure trade-offs in many cases, and is conjectured to be applicable to a wider scope of cells.
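To make the differencing attack concrete, here is a minimal Python sketch under stated assumptions. It is neither the algorithm of Thompson et al. (2013) nor the new algorithm proposed in this paper. Instead, it uses a generic noise-multiplication scheme in which each business's contribution to a cell total is multiplied by a factor drawn from [1 - scale, 1 + scale]. The microdata values, the helper names (true_total, noise_factor, perturbed_total), the setting scale = 0.1 and the choice to seed the noise by business id are all hypothetical.

```python
import random

# Hypothetical microdata: business id -> income (illustrative values only).
# Business "D" has an extremely high income and is easy to single out.
MICRODATA = {"A": 120.0, "B": 95.0, "C": 110.0, "D": 5000.0}


def true_total(ids):
    """Unperturbed cell value: total income of the selected businesses."""
    return sum(MICRODATA[i] for i in ids)


def noise_factor(unit_id, scale):
    """Multiplicative noise factor in [1 - scale, 1 + scale] for one unit.

    Assumption: seeding by unit id keeps the factor fixed across requests,
    so asking for the same cell twice cannot be used to average the noise away.
    """
    return random.Random(unit_id).uniform(1 - scale, 1 + scale)


def perturbed_total(ids, scale=0.1):
    """Released cell value: each contribution is multiplied by its unit's noise factor."""
    return sum(MICRODATA[i] * noise_factor(i, scale) for i in ids)


all_units = ["A", "B", "C", "D"]
without_d = ["A", "B", "C"]

# Differencing attack on exact totals: recovers D's income exactly.
exact_diff = true_total(all_units) - true_total(without_d)

# Same attack on perturbed totals: recovers only D's income times its noise factor.
noisy_diff = perturbed_total(all_units) - perturbed_total(without_d)

print(exact_diff)  # 5000.0
print(round(noisy_diff, 2), round(abs(noisy_diff - 5000.0) / 5000.0, 3))
```

Under these assumptions, differencing the two exact totals returns business D's income exactly, while differencing the perturbed totals returns it only to within 10%. A larger scale lowers disclosure risk but also increases the error in every released cell, which is the utility-disclosure trade-off described above.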
References
Blakemore, M.: The potential and perils of remote access. In: Doyle, P., Lane, J., Zayatz, L., Theeuwes, J. (eds.) Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 315–340. Elsevier Science B. V., Amsterdam (2001)
Chipperfield, J.O., O’Keefe, C.M.: Disclosure-protected inference using generalised linear models. Int. Stat. Rev. 82(3), 371–391 (2014). doi:10.1111/insr.12054
Chipperfield, J., Newman, J., Thompson, G., Ma, Y., Lin, Y.-X.: Prospects for Protecting Aggregate Business Microdata via a Remote Server (2016). (working paper)
Defays, D., Nanopoulos, P.: Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of Symposium 92 on Design and Analysis of Longitudinal Surveys, pp. 195–204. Statistics Canada (1993)
Domingo-Ferrer, J., Torra, V.: Disclosure protection methods and information loss for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 91–110. North-Holland, Amsterdam (2001)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006)
Hawala, S., Zayatz, L., Rowland, S.: American FactFinder: disclosure limitation for the advanced query system. J. Official Stat. 20(1), 115–124 (2004)
Kim, J.J., Winkler, W.E.: Masking microdata files. In: Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 114–119 (1995)
Klein, M., Mathew, T., Sinha, B.: Noise multiplication for statistical disclosure control of extreme values in log-normal regression samples. J. Priv. Confidentiality 6, 77–125 (2014)
Lin, Y.X., Wise, P.: Estimation of regression parameters from noise multiplied data. J. Priv. Confidentiality 4, 61–94 (2012)
Lucero, J., Singh, L., Zayatz, L.: Recent Work on the Microdata Analysis System at the Census Bureau. Research Report Series (Statistics #2009-09) (2009)
Moore, R.: Controlled Data Swapping Techniques for Masking Public Use Microdata Sets. U.S. Bureau of the Census, Washington, DC (1996). http://www.census.gov/srd/papers.pdf.rr96-4.pdf
Nissim, K., Raskhodnikova, S., Smith, A.: Smooth sensitivity and sampling in private data analysis. In: Johnson, D.S., Feige, U. (eds.) 39th ACM Symposium on Theory of Computing, STOC 2007, pp. 75–84. ACM (2007)
O’Keefe, C.M., Chipperfield, J.O.: A summary of attack methods and confidentiality protection measures for fully automated remote analysis systems. Int. Stat. Rev. 81(3), 1–30 (2013). doi:10.1111/insr.12021
Reiter, J.: New approaches to data dissemination: a glimpse into the future (?). Chance 17, 12–16 (2004)
Rubin, D.B.: Discussion: statistical disclosure limitation. J. Official Stat. 9, 461–468 (1993)
Salazar-González, J.-J.: A unified mathematical programming framework for different statistical disclosure limitation methods. Oper. Res. 53(5), 819–829 (2005)
Sarathy, R., Muralidhar, K.: Evaluating Laplace noise addition to satisfy differential privacy for numeric data. Trans. Data Priv. 4, 1–17 (2011)
Soria-Comas, J., Domingo-Ferrer, J.: Optimal data-independent noise for differential privacy. Inf. Sci. 250, 200–214 (2013)
Thompson, G., Broadfoot, S., Elazar, D.: Methodology for the automatic confidentialisation of statistical outputs from remote servers at the Australian Bureau of Statistics. In: UNECE Work Session on Statistical Data Confidentiality, Ottawa, 28–30 October 2013
Yancey, W.E., Winkler, W.E., Creecy, R.H.: Disclosure risk assessment in perturbative microdata protection. In: Domingo-Ferrer, J. (ed.) Inference Control in Statistical Databases. LNCS, vol. 2316, pp. 135–152. Springer, Heidelberg (2002)
Acknowledgements
We sincerely thank Sybille McKeown and all the others in the Australian Bureau of Statistics for their valuable feedback on the paper.
Additional information
Disclaimer: Views expressed in this paper are those of the author(s) and do not necessarily represent those of the Australian Bureau of Statistics. Where quoted or used, they should be attributed clearly to the author.
Appendix
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ma, Y., Lin, YX., Chipperfield, J., Newman, J., Leaver, V. (2016). A New Algorithm for Protecting Aggregate Business Microdata via a Remote System. In: Domingo-Ferrer, J., Pejić-Bach, M. (eds) Privacy in Statistical Databases. PSD 2016. Lecture Notes in Computer Science, vol 9867. Springer, Cham. https://doi.org/10.1007/978-3-319-45381-1_16
DOI: https://doi.org/10.1007/978-3-319-45381-1_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45380-4
Online ISBN: 978-3-319-45381-1
eBook Packages: Computer Science, Computer Science (R0)