Population Research and Policy Review, Volume 27, Issue 6, pp 639–665

Providing Spatial Data for Secondary Analysis: Issues and Current Practices Relating to Confidentiality

  • Myron P. Gutmann
  • Kristine Witkowski
  • Corey Colyer
  • JoAnne McFarland O’Rourke
  • James McNally

Abstract

Spatially explicit data pose a series of opportunities and challenges for all the actors involved in providing data for long-term preservation and secondary analysis: the data producer, the data archive, and the data user. We report on the opportunities and challenges facing each of these three players, and then summarize current thinking about how best to prepare, archive, disseminate, and use social science data that carry spatially explicit identifying information. The core issue running through the paper is the risk of disclosing respondents' identities: if we know where they live, where they work, or where they own property, it is possible to find out who they are. Those who collect, archive, and use such data need to be aware of these disclosure risks and become familiar with best practices for avoiding disclosures that would harm respondents.
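The abstract's central point, that knowing where respondents live can reveal who they are, can be made concrete with a small sketch. The Python below is a hypothetical illustration, not the authors' method: it rounds (latitude, longitude) pairs to a chosen number of decimal places, counts the records in each resulting cell, and flags cells holding fewer than k records, the kind of small or unique cell that makes re-identification easier. The function name `small_cells`, the threshold `k`, and the sample points are all assumptions made for illustration.

```python
from collections import Counter


def small_cells(records, precision, k):
    """Return location cells that contain fewer than k records.

    records   -- iterable of (latitude, longitude) pairs (hypothetical data)
    precision -- decimal places kept when rounding; fewer places = larger cells
    k         -- assumed minimum cell size below which a cell is flagged
    """
    # Snap each point to a grid cell by rounding its coordinates.
    cells = Counter(
        (round(lat, precision), round(lon, precision)) for lat, lon in records
    )
    # Keep only the cells whose record count falls below the threshold.
    return {cell: n for cell, n in cells.items() if n < k}


# Hypothetical respondent locations: three close together, one isolated.
points = [
    (42.2808, -83.7430),
    (42.2812, -83.7433),
    (42.2815, -83.7429),
    (39.6295, -79.9559),
]

print(len(small_cells(points, precision=4, k=3)))  # 4: every point is unique
print(small_cells(points, precision=1, k=3))       # {(39.6, -80.0): 1}
```

With four decimal places every hypothetical point occupies its own cell, so all four are flagged; coarsening to one decimal place pools the three nearby points and leaves only the isolated one exposed. Coarsening location is only one possible protection strategy, and it trades away spatial detail that analysts may need.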

Keywords

Archives; Confidentiality; Data; Disclosure; Location

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Myron P. Gutmann (1)
  • Kristine Witkowski (1)
  • Corey Colyer (2)
  • JoAnne McFarland O’Rourke (1)
  • James McNally (1)
  1. Inter-University Consortium for Political and Social Research, Institute for Social Research, University of Michigan, Ann Arbor, USA
  2. Department of Sociology and Anthropology, West Virginia University, Morgantown, USA
