Using Partially Synthetic Data to Replace Suppression in the Business Dynamics Statistics: Early Results

  • Javier Miranda
  • Lars Vilhuber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8744)

Abstract

The Business Dynamics Statistics is a product of the U.S. Census Bureau that provides measures of business openings and closings, and job creation and destruction, by a variety of cross-classifications (firm and establishment age and size, industrial sector, and geography). Sensitive data are currently protected through suppression. However, as additional tabulations are being developed, at ever more detailed geographic levels, the number of suppressions increases dramatically. This paper explores the option of providing public-use data that are analytically valid and without suppressions, by leveraging synthetic data to replace observations in sensitive cells.

Keywords

synthetic data; statistical disclosure limitation; time-series; local labor markets; gross job flows; confidentiality protection

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Javier Miranda (1)
  • Lars Vilhuber (2)
  1. U.S. Bureau of the Census, Washington, USA
  2. Cornell University, Ithaca, USA
