Working toward effective anonymization for surveillance data: innovation at South Africa’s Agincourt Health and Socio-Demographic Surveillance Site

Abstract

Linking people and places is essential for population-health-environment research. Yet this data integration requires geographic coding such that information reflecting individuals or households can appropriately be connected with characteristics of their proximate environments. However, offering access to such geocoding greatly increases the risk of respondent identification and, therefore, holds the potential to breach confidentiality. In response, a variety of “geographic masking” techniques have been developed to introduce error into geographic coding and thereby reduce the likelihood of identification. We report findings from analyses of the error introduced by several masking techniques applied to data from the Agincourt Health and Socio-Demographic Surveillance System in rural South Africa. Using a vegetation index, the Normalized Difference Vegetation Index (NDVI), at the household scale, we compare the “true” NDVI values with those calculated after masking. We also examine the tradeoffs between accuracy and protecting respondent privacy. The exploration suggests that in this study setting and for NDVI, geomasking approaches that use buffers and account for population density produce the most accurate results. However, the exploration also clearly demonstrates the tradeoff between accuracy and privacy, with more accuracy resulting in a higher level of potential respondent identification. It is important to note that these analyses illustrate a process that should characterize spatially informed research but within which particular decisions must be shaped by the research setting and objectives. In the long run, we aim to provide insight into masking’s potential and perils to facilitate population-environment-health research.
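To make the idea of introducing error into geographic coding concrete, the following is a minimal sketch of one generic masking approach: random displacement within an annulus, often called donut masking. It is not presented as any of the specific approaches evaluated in the article. The function names `ndvi` and `donut_mask`, the radii, and the coordinates are all hypothetical, and coordinates are assumed to be in a projected system measured in metres. The NDVI helper uses the standard definition, (NIR − Red) / (NIR + Red).

```python
import math
import random


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: report 0 for an empty signal instead of raising
    return (nir - red) / total


def donut_mask(x: float, y: float, r_min: float, r_max: float,
               rng: random.Random) -> tuple[float, float]:
    """Displace (x, y) by a random bearing and a distance in [r_min, r_max].

    Coordinates are assumed to be projected (metres), not latitude/longitude.
    """
    if not 0 <= r_min <= r_max:
        raise ValueError("require 0 <= r_min <= r_max")
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Sampling the squared radius uniformly spreads points evenly over the annulus area.
    dist = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    return x + dist * math.cos(angle), y + dist * math.sin(angle)


# Hypothetical household location and masking radii, for illustration only.
rng = random.Random(42)
true_xy = (1000.0, 2000.0)
masked_xy = donut_mask(*true_xy, r_min=50.0, r_max=250.0, rng=rng)
displacement = math.dist(true_xy, masked_xy)
```

In an accuracy assessment of the kind described above, an environmental value such as NDVI would be extracted at both `true_xy` and `masked_xy` and the two compared. A density-aware variant would, under this sketch, simply choose `r_min` and `r_max` per household according to local household density.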

Notes

  1. The NDVI values and Quality Assessment (QA) files were obtained from the Land Satellites Data System (LSDS) Science Research and Development (LSRD) repository provided by the US Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center (LSRD, 2018).

  2. Calculations were also made with 1 km buffers with no substantial differences in overarching conclusions.

  3. We also examined the impact of removing boundary constraints, which increases the displacement distance for households by an average of 14%. However, the displacement distances varied substantially across villages. For example, villages with lower household density did not see large gains in displacement when masking methods that account for household density were used. Also, household displacement distances were, on average, lower when using k-anonymity methods. Overall, this suggests that household density constrains displacement distances more than the village boundary does.
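The link between household density and achievable anonymity in Note 3 can be illustrated with a small sketch. It uses one common operationalization of spatial k-anonymity, which is an assumption here and not necessarily the definition used in the article: for each household, k is the number of households whose true location lies within the displacement distance of that household's masked location. The function name `spatial_k` and the coordinates are hypothetical.

```python
import math

Point = tuple[float, float]


def spatial_k(true_points: list[Point], masked_points: list[Point]) -> list[int]:
    """For each household i, count the households whose true location lies
    within the displacement distance of i's masked location (i included)."""
    ks = []
    for true_pt, masked_pt in zip(true_points, masked_points):
        radius = math.dist(true_pt, masked_pt)
        k = sum(1 for other in true_points
                if math.dist(other, masked_pt) <= radius)
        ks.append(k)
    return ks


# Hypothetical projected coordinates (metres): three clustered households
# and one isolated household.
households = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (500.0, 0.0)]
masked = [(15.0, 0.0), (10.0, 5.0), (20.0, 0.0), (520.0, 0.0)]
print(spatial_k(households, masked))  # prints [3, 1, 1, 1]
```

In this toy example the isolated household keeps k = 1 even after a 20 m displacement, whereas a 15 m displacement within the cluster reaches k = 3. Sparse areas therefore need larger displacements to reach the same k, which is consistent with the note's observation that density matters for displacement distances.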

References

  • Abowd, J. M., & Schmutte, I. M. (2019). An economic analysis of privacy protection and statistical accuracy as social choices. American Economic Review, 109(1), 171–202.

  • Allshouse, W. B., Fitch, M. K., Hampton, K. H., Gesink, D. C., Doherty, I. A., Leone, P. A., Serre, M. L., & Miller, W. C. (2010). Geomasking sensitive health data and privacy protection: An evaluation using an E911 database. Geocarto International, 25(6), 443–452.

  • Anane-Sarpong, E. (2016). Application of ethical principles to research using public health data in the Global South: Perspectives from Africa. Developing World Bioethics.

  • Armstrong, M. P., Rushton, G., & Zimmerman, D. L. (1999). Geographically masking health data to preserve confidentiality. Statistics in Medicine, 18(5), 497–525.

  • Byers, E., Gidden, M., Leclère, D., Balkovic, J., Burek, P., Ebi, K., & Johnson, N. (2018). Global exposure and vulnerability to multi-sector development and climate change hotspots. Environmental Research Letters, 13(5), 055012.

  • Cassa, C. A., Wieland, S. C., & Mandl, K. D. (2008). Re-identification of home addresses from spatial locations anonymized by Gaussian skew. International Journal of Health Geographics, 7(1), 1–9.

  • Collinson, M. A. (2010). Striving against adversity: The dynamics of migration, health and poverty in rural South Africa. Global Health Action, 3(1), 5080.

  • Elkies, N., Fink, G., & Bärnighausen, T. (2015). “Scrambling” geo-referenced data to protect privacy induces bias in distance estimation. Population and Environment, 37(1), 83–98.

  • Foody, G. M., Cutler, M. E., Mcmorrow, J., Pelz, D., Tangki, H., Boyd, D. S., & Douglas, I. (2001). Mapping the biomass of Bornean tropical rain forest from remotely sensed data. Global Ecology & Biogeography, 10(4), 379–387. http://www.jstor.org/stable/2665383

  • Giannecchini, M., Twine, W., & Vogel, C. (2007). Land-cover change and human–environment interactions in a rural cultural landscape in South Africa. Geographical Journal, 173(1), 26–42.

  • Grace, K., Nagle, N. N., Burgert-Brucker, C. R., Rutzick, S., Van Riper, D. C., Dontamsetti, T., & Croft, T. (2019). Integrating environmental context into DHS analysis while protecting participant confidentiality: A new remote sensing method. Population and Development Review, 45(1), 197.

  • Hunter, L. M., Twine, W., & Patterson, L. (2007). “Locusts are now our beef”: Adult mortality and household dietary use of local environmental resources in rural South Africa. Scandinavian Journal of Public Health, 35(69_suppl), 165–174.

  • INDEPTH Network. (2017). About us. http://www.indepth-network.org/about-us

  • Leyk, S., Maclaurin, G. J., Hunter, L. M., Nawrotzki, R., Twine, W., Collinson, M., & Erasmus, B. (2012). Spatially and temporally varying associations between temporary outmigration and natural resource availability in resource-dependent rural communities in South Africa: A modeling framework. Applied Geography, 34, 559–568.

  • Lu, Y., Yorke, C., & Zhan, F. B. (2012). Considering risk locations when defining perturbation zones for geomasking. Cartographica: The International Journal for Geographic Information and Geovisualization, 47(3), 168–178.

  • LSRD. (2018). Land Satellites Data System (LSDS) Science Research and Development (LSRD) Repository. Sioux Falls, SD: U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center. https://espa.cr.usgs.gov

  • Matsika, R., Erasmus, B. F. N., & Twine, W. C. (2013). Double jeopardy: The dichotomy of fuelwood use in rural South Africa. Energy Policy, 52, 716–725.

  • Mutanga, O., & Skidmore, A. K. (2004). Narrow band vegetation indices overcome the saturation problem in biomass estimation. International Journal of Remote Sensing, 25(19), 3999–4014.

  • NASA. (2000). Measuring Vegetation (NDVI & EVI).

  • Olsson, L., Opondo, M., Tschakert, P., Agrawal, A., & Eriksen, S. E. (2014). Livelihoods and poverty. In: Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part A: Global and Sectoral Aspects. Contribution of Working Group II to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Field, C.B., V.R. Barros, D.J. Dokken, K.J. Mach, M.D. Mastrandrea, T.E. Bilir, M. Chatterjee, K.L. Ebi, Y.O. Estrada, R.C. Genova, B. Girma, E.S. Kissel, A.N. Levy, S. MacCracken, P.R. Mastrandrea, and L.L. White (Eds.), Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 793–832.

  • Paumgarten, F., & Shackleton, C. M. (2011). The role of non-timber forest products in household coping strategies in South Africa: the influence of household wealth and gender. Population and Environment, 33(1), 108.

  • Roerink, G. J., Menenti, M., Soepboer, W., & Su, Z. (2003). Assessment of climate impact on vegetation dynamics by using remote sensing. Physics and Chemistry of the Earth, 28(1–3), 103–109.

  • Ruggles, S., Fitch, C., Magnuson, D., & Schroeder, J. (2019). Differential privacy and census data: Implications for social and economic research. AEA Papers and Proceedings, 109, 403–408.

  • Sumner, D., Christie, M. E., & Boulakia, S. (2017). Conservation agriculture and gendered livelihoods in Northwestern Cambodia: Decision-making, space and access. Agriculture and Human Values, 34(2), 347–362.

  • Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5), 557–570.

  • Tlou, B., Sartorius, B., & Tanser, F. (2017). Space-time patterns in maternal and mother mortality in a rural South African population with high HIV prevalence (2000–2014): Results from a population-based cohort. BMC Public Health, 17(1), 543.

  • Tucker, C. J. (1979). Red and photographic infrared linear combinations for monitoring vegetation. Remote Sensing of Environment, 8.

  • Wang, J., & Rich, P. M. (2008). Relations between NDVI, grassland production, and crop yield in the Central Great Plains. Geocarto International.

  • Wang, H., & Reiter, J. P. (2012). Multiple imputation for sharing precise geographies in public use data. The Annals of Applied Statistics, 6(1), 229–252.

  • Warren, J. L., Perez-Heydrich, C., Burgert, C. R., & Emch, M. E. (2016). Influence of demographic and health survey point displacements on distance-based analyses. Spatial Demography, 4(2), 155–173.

  • Wessels, K. J., Prince, S. D., Frost, P. E., & Van Zyl, D. (2004). Assessing the effects of human-induced land degradation in the former homelands of Northern South Africa with a 1 km AVHRR NDVI time-series. Remote Sensing of Environment, 91(1), 47–67.

  • Wisely, S. M., Alexander, K., & Cassidy, L. (2018). Linking ecosystem services to livelihoods in southern Africa. Ecosystem Services, 30, 339–341.

  • Zandbergen, P. A. (2014). Ensuring confidentiality of geocoded health data: Assessing geographic masking strategies for individual-level data. Advances in Medicine, 1–14.

  • Zhou, F. D., & Louis, T. A. (2010). A smoothing approach for masking spatial data. The Annals of Applied Statistics, 4(3), 1451–1475.

Acknowledgements

Early versions of this manuscript were presented at the 2019 Annual Meeting of the Population Association of America, as well as the 2019 Conference on Demographic Responses to Changes in the Natural Environment. The latter was supported in part by an R13 grant from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (#HD096853). We thank those who provided useful feedback at each venue.

Funding

Funding for this research was provided by the University of Colorado Boulder’s Research and Innovation Office. This research has also benefited from research, administrative, and computing support provided by the University of Colorado Population Center (Project 2P2CHD066613-06), funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development. The content is solely the responsibility of the authors and does not necessarily represent the official views of the CUPC, NIH, or CU Boulder. Indirect support was provided by the Wellcome Trust (Agincourt Unit, grant 085477/Z/08/Z) through its support of the MRC/Wits Rural Public Health and Health Transitions Research Unit. Indirect support was also provided by the University of Colorado Boulder’s Earth Lab, supported by CIRES and the Grand Challenge Initiative at CU Boulder. The authors would like to thank the communities, respondents, field staff, and management of the Agincourt Unit for their respective contributions to the production of the data used in this study.

Author information

Corresponding author

Correspondence to Lori M. Hunter.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 3 Estimated coefficients and significance of difference between true vs. displaced median NDVI estimates, nine illustrative geomasking approaches
Table 4 Estimated coefficients and significance of difference between true and displaced sum NDVI estimates divided by the number of households, nine geomasking approaches
Table 5 Estimated distance and actual k-anonymity between NDVI estimates provided by seven geomasking approaches relative to the NDVI estimate for true household locations

About this article

Cite this article

Hunter, L.M., Talbot, C., Twine, W. et al. Working toward effective anonymization for surveillance data: innovation at South Africa’s Agincourt Health and Socio-Demographic Surveillance Site. Popul Environ 42, 445–476 (2021). https://doi.org/10.1007/s11111-020-00372-4

  • DOI: https://doi.org/10.1007/s11111-020-00372-4

Keywords

  • Anonymity
  • Confidentiality
  • Geomasking
  • Geographic Masking
  • Jittering
  • Spatial Error
  • Agincourt
  • South Africa