A New Algorithm for Protecting Aggregate Business Microdata via a Remote System

  • Conference paper
Privacy in Statistical Databases (PSD 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9867)

Abstract

Releasing business microdata is a challenging problem for many statistical agencies. Businesses with distinctive continuous characteristics, such as extremely high income, can easily be identified, yet such businesses are normally included in surveys representing the population. To provide data users with useful statistics while maintaining confidentiality, some statistical agencies have developed online tools that allow users to specify and request tables created from microdata. These tools release only perturbed cell values, generated by automatic output perturbation algorithms, in order to protect each underlying observation against various attacks, such as differencing attacks. One such perturbation algorithm was proposed by Thompson et al. (2013). That algorithm focuses largely on reducing disclosure risk and pays little attention to data utility. As a result, it has limitations, including a limited scope of applicable cells and uncontrolled utility loss. In this paper we introduce a new algorithm for generating perturbed cell values. In comparison, the new algorithm allows more control over utility loss, can achieve better utility-disclosure tradeoffs in many cases, and is conjectured to be applicable to a wider scope of cells.
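
To make the differencing attack mentioned in the abstract concrete, the following minimal sketch shows how subtracting two exact table totals exposes a single business, and how independently perturbing each released total blurs that difference. The data, the function names and the multiplicative uniform noise are illustrative assumptions; this is neither the Thompson et al. (2013) algorithm nor the new algorithm introduced in the paper.

```python
import random

# Hypothetical incomes for five businesses; "E" is an extreme value.
incomes = {"A": 120.0, "B": 95.0, "C": 140.0, "D": 110.0, "E": 9800.0}


def true_total(names):
    """Exact cell total, as a system without output perturbation would release."""
    return sum(incomes[n] for n in names)


def perturbed_total(names, rng, magnitude=0.1):
    """Cell total multiplied by noise drawn uniformly from
    [1 - magnitude, 1 + magnitude].

    The noise distribution is an assumption made for illustration only.
    """
    return true_total(names) * rng.uniform(1.0 - magnitude, 1.0 + magnitude)


everyone = list(incomes)
all_but_e = [n for n in everyone if n != "E"]

# Differencing attack on exact totals: the difference is exactly E's income.
exact_diff = true_total(everyone) - true_total(all_but_e)

# The same attack on perturbed totals only yields a noisy estimate.
rng = random.Random(0)
noisy_diff = perturbed_total(everyone, rng) - perturbed_total(all_but_e, rng)

print(f"exact difference: {exact_diff:.1f}")
print(f"noisy difference: {noisy_diff:.1f}")
```

With an extreme contributor such as E, the noisy difference can still land fairly close to the true value, which illustrates why the amount of perturbation has to be weighed against the utility of the released totals.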

References

  • Blakemore, M.: The potential and perils of remote access. In: Doyle, P., Lane, J., Zayatz, L., Theeuwes, J. (eds.) Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 315–340. Elsevier Science B. V., Amsterdam (2001)

  • Chipperfield, J.O., O’Keefe, C.M.: Disclosure-protected inference using generalised linear models. Int. Stat. Rev. 82(3), 371–391 (2014). doi:10.1111/insr.12054

  • Chipperfield, J., Newman, J., Thompson, G., Ma, Y., Lin, Y.-X.: Prospects for Protecting Aggregate Business Microdata via a Remote Server (2016). (working paper)

  • Defays, D., Nanopoulos, P.: Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of 92 Symposium on Design and Analysis of Longitudinal Surveys, pp. 195–204. Statistics Canada (1993)

  • Domingo-Ferrer, J., Torra, V.: Disclosure protection methods and information loss for microdata. In: Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L. (eds.) Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 91–110. North-Holland, Amsterdam (2001)

  • Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006)

  • Hawala, S., Zayatz, L., Rowland, S.: American FactFinder: disclosure limitation for the advanced query system. J. Official Stat. 20(1), 115–124 (2004)

  • Kim, J.J., Winkler, W.E.: Masking microdata files. In: Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 114–119 (1995)

  • Klein, M., Mathew, T., Sinha, B.: Noise multiplication for statistical disclosure control of extreme values in log-normal regression samples. J. Priv. Confidentiality 6, 77–125 (2014)

  • Lin, Y.X., Wise, P.: Estimation of regression parameters from noise multiplied data. J. Priv. Confidentiality 4, 61–94 (2012)

  • Lucero, J., Singh, L., Zayatz, L.: Recent Work on the Microdata Analysis System at the Census Bureau. Research Report Series (Statistics #2009-09) (2009)

  • Moore, R.: Controlled Data Swapping Techniques for Masking Public Use Microdata Sets. U.S. Bureau of the Census, Washington, DC (1996). http://www.census.gov/srd/papers.pdf.rr96-4.pdf

  • Nissim, K., Raskhodnikova, S., Smith, A.: Smooth sensitivity and sampling in private data analysis. In: Johnson, D.S., Feige, U. (eds.) 39th ACM Symposium on Theory of Computing-STOC 2007, pp. 75–84. ACM (2007)

  • O’Keefe, C.M., Chipperfield, J.O.: A summary of attack methods and confidentiality protection measures for fully automated remote analysis systems. Int. Stat. Rev. 81(3), 1–30 (2013). doi:10.1111/insr.12021

  • Reiter, J.: New approaches to data dissemination: a glimpse into the future (?). Chance 17, 12–16 (2004)

  • Rubin, D.B.: Discussion: statistical disclosure limitation. J. Official Stat. 9, 461–468 (1993)

  • Salazar-González, J.-J.: A unified mathematical programming framework for different statistical disclosure limitation methods. Oper. Res. 53(5), 819–829 (2005)

  • Sarathy, R., Muralidhar, K.: Evaluating Laplace noise addition to satisfy differential privacy for numeric data. Trans. Data Priv. 4, 1–17 (2011)

  • Soria-Comas, J., Domingo-Ferrer, J.: Optimal data-independent noise for differential privacy. Inf. Sci. 250, 200–214 (2013)

  • Thompson, G., Broadfoot, S., Elazar, D.: Methodology for the automatic confidentialisation of statistical outputs from remote servers at the Australian Bureau of Statistics. In: UNECE Work Session on Statistical Data Confidentiality, Ottawa, pp. 28–30, October 2013

  • Yancey, W.E., Winkler, W.E., Creecy, R.H.: Disclosure risk assessment in perturbative microdata protection. In: Domingo-Ferrer, J. (ed.) Inference Control in Statistical Databases. LNCS, vol. 2316, pp. 135–152. Springer, Heidelberg (2002)

Acknowledgements

We sincerely thank Sybille McKeown and all the others at the Australian Bureau of Statistics for their valuable feedback on the paper.

Corresponding author

Correspondence to Yue Ma.

Additional information

Disclaimer: Views expressed in this paper are those of the author(s) and do not necessarily represent those of the Australian Bureau of Statistics. Where quoted or used, they should be attributed clearly to the author.

Appendix

Table 1. Magnitude values that guarantee 15% disclosure risk given \(\alpha = 0.11\) and minimise the average utility loss for different distributions of top contributor values.
Table 2. Probability expressions for Table 3, where \(m_1 = y_1 w_1\) and \(\lambda_1 = \hat{s}_{-1}\beta\).
Table 3. Disclosure risk of perturbed estimates generated by the new algorithm.
Fig. 1. Utility-disclosure plots for Simulation 1 with different \(\alpha\) values. The box plot represents results generated by the Thompson et al. algorithm and the dotted plot represents results generated by the new algorithm.

Fig. 2. Utility-disclosure plots for Simulation 2 with different \(\alpha\) values.

Fig. 3. Utility-disclosure plots for Simulation 3 with different \(\alpha\) values.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ma, Y., Lin, Y.-X., Chipperfield, J., Newman, J., Leaver, V. (2016). A New Algorithm for Protecting Aggregate Business Microdata via a Remote System. In: Domingo-Ferrer, J., Pejić-Bach, M. (eds) Privacy in Statistical Databases. PSD 2016. Lecture Notes in Computer Science, vol 9867. Springer, Cham. https://doi.org/10.1007/978-3-319-45381-1_16

  • DOI: https://doi.org/10.1007/978-3-319-45381-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45380-4

  • Online ISBN: 978-3-319-45381-1

  • eBook Packages: Computer Science, Computer Science (R0)
