Abstract
Linking people and places is essential for population-health-environment research. Yet, this data integration requires geographic coding such that information reflecting individuals or households can appropriately be connected with characteristics of their proximate environments. However, offering access to such geocoding greatly increases the risk of respondent identification and, therefore, holds the potential to breach confidentiality. In response, a variety of “geographic masking” techniques have been developed to introduce error into geographic coding and thereby reduce the likelihood of identification. We report findings from analyses of the error introduced by several masking techniques applied to data from the Agincourt Health and Socio-Demographic Surveillance System in rural South Africa. Using a vegetation index (Normalized Difference Vegetation Index (NDVI)) at the household scale, comparisons are made between the “true” NDVI values and those calculated after masking. We also examine the tradeoffs between accuracy and protecting respondent privacy. The exploration suggests that in this study setting and for NDVI, geomasking approaches that use buffers and account for population density produce the most accurate results. However, the exploration also clearly demonstrates the tradeoff between accuracy and privacy, with more accuracy resulting in a higher level of potential respondent identification. It is important to note that these analyses illustrate a process that should characterize spatially informed research but within which particular decisions must be shaped by the research setting and objectives. In the long run, we aim to provide insight into masking’s potential and perils to facilitate population-environment-health research.
Notes
The NDVI Values and Quality Assessment (QA) files were obtained from the Land Satellites Data System (LSDS) Science Research and Development (LSRD) repository provided by the US Geological Survey (USGS) Earth Resources Observation and Science (EROS) center (LSRD, 2018).
Calculations were also made with 1 km buffers with no substantial differences in overarching conclusions.
We also examined the impact of removing boundary constraints, which increases the displacement distance for households by an average of 14%. However, displacement distances varied substantially across villages. For example, villages with lower household density did not see large gains in displacement relative to masking methods that account for household density. Household displacement distances were also, on average, lower when using k-anonymity methods. Overall, this suggests that household density constrains displacement distances more than the village boundary does.
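The interplay between household density and displacement distance described in this note can be illustrated with a minimal sketch. It is not the masking procedure used in the study: it assumes projected coordinates in meters, sets each household's maximum displacement to the distance to its k-th nearest neighbor (a simple k-anonymity-style rule), and ignores village boundaries. The function names and the example coordinates are hypothetical.

```python
import numpy as np


def kth_neighbor_distance(points, k):
    """Distance from each point to its k-th nearest neighbor (excluding itself)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # After sorting each row, index 0 is the point itself (distance 0).
    return np.sort(dists, axis=1)[:, k]


def density_adaptive_mask(points, k, rng):
    """Displace each point by a random distance up to its k-th neighbor distance."""
    radius = kth_neighbor_distance(points, k)
    angle = rng.uniform(0.0, 2.0 * np.pi, size=len(points))
    # The square root spreads displaced points uniformly over the disc area.
    dist = radius * np.sqrt(rng.uniform(0.0, 1.0, size=len(points)))
    offsets = np.column_stack((dist * np.cos(angle), dist * np.sin(angle)))
    return points + offsets, radius


# Hypothetical households: three clustered (dense) and two isolated (sparse).
households = np.array(
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0], [20.0, 10.0]]
)
masked, max_radius = density_adaptive_mask(
    households, k=2, rng=np.random.default_rng(0)
)
moved = np.linalg.norm(masked - households, axis=1)
print("maximum allowed displacement:", max_radius)
print("realized displacement:", moved)
```

Under this rule, clustered households receive small maximum radii and isolated households receive large ones, which is one way density can limit displacement more tightly than an outer boundary would.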
References
Abowd, J. M., & Schmutte, I. M. (2019). An economic analysis of privacy protection and statistical accuracy as social choices. American Economic Review, 109(1), 171–202.
Allshouse, W. B., Fitch, M. K., Hampton, K. H., Gesink, D. C., Doherty, I. A., Leone, P. A., Serre, M. L., & Miller, W. C. (2010). Geomasking sensitive health data and privacy protection: An evaluation using an E911 database. Geocarto International, 25(6), 443–452.
Anane‐Sarpong, E. (2016). Application of ethical principles to research using public health data in the Global South: Perspectives from Africa. Developing World Bioethics.
Armstrong, M. P., Rushton, G., & Zimmerman, D. L. (1999). Geographically masking health data to preserve confidentiality. Statistics in Medicine, 18(5), 497–525.
Byers, E., Gidden, M., Leclère, D., Balkovic, J., Burek, P., Ebi, K., & Johnson, N. (2018). Global exposure and vulnerability to multi-sector development and climate change hotspots. Environmental Research Letters, 13(5), 055012.
Cassa, C. A., Wieland, S. C., & Mandl, K. D. (2008). Re-identification of home addresses from spatial locations anonymized by Gaussian skew. International Journal of Health Geographics, 7(1), 1–9.
Collinson, M. A. (2010). Striving against adversity: The dynamics of migration, health and poverty in rural South Africa. Global Health Action, 3(1), 5080.
Elkies, N., Fink, G., & Bärnighausen, T. (2015). “Scrambling” geo-referenced data to protect privacy induces bias in distance estimation. Population and Environment, 37(1), 83–98.
Foody, G. M., Cutler, M. E., Mcmorrow, J., Pelz, D., Tangki, H., Boyd, D. S., & Douglas, I. (2001). Mapping the biomass of Bornean tropical rain forest from remotely sensed data. Global Ecology & Biogeography, 10(4), 379–387.
Giannecchini, M., Twine, W., & Vogel, C. (2007). Land-cover change and human–environment interactions in a rural cultural landscape in South Africa. Geographical Journal, 173(1), 26–42.
Grace, K., Nagle, N. N., Burgert-Brucker, C. R., Rutzick, S., Van Riper, D. C., Dontamsetti, T., & Croft, T. (2019). Integrating environmental context into DHS analysis while protecting participant confidentiality: A new remote sensing method. Population and Development Review, 45(1), 197.
Hunter, L. M., Twine, W., & Patterson, L. (2007). “Locusts are now our beef”: Adult mortality and household dietary use of local environmental resources in rural South Africa. Scandinavian Journal of Public Health, 35(69_suppl), 165–174.
INDEPTH Network. (2017). “About Us” http://www.indepth-network.org/about-us.
Leyk, S., Maclaurin, G. J., Hunter, L. M., Nawrotzki, R., Twine, W., Collinson, M., & Erasmus, B. (2012). Spatially and temporally varying associations between temporary outmigration and natural resource availability in resource-dependent rural communities in South Africa: A modeling framework. Applied Geography, 34(2012), 559–568.
Lu, Y., Yorke, C., & Zhan, F. B. (2012). Considering risk locations when defining perturbation zones for geomasking. Cartographica: The International Journal for Geographic Information and Geovisualization, 47(3), 168–178.
LSRD. (2018). Land Satellites Data System (LSDS) Science Research and Development (LSRD) Repository. Sioux Falls, SD. U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center. https://espa.cr.usgs.gov.
Matsika, R., Erasmus, B. F. N., & Twine, W. C. (2013). Double jeopardy: The dichotomy of fuelwood use in rural South Africa. Energy Policy, 52, 716–725.
Mutanga, O., & Skidmore, A. K. (2004). Narrow band vegetation indices overcome the saturation problem in biomass estimation. International Journal of Remote Sensing, 25(19), 3999–4014.
NASA. (2000). Measuring Vegetation (NDVI & EVI).
Olsson, L., Opondo, M., Tschakert, P., Agrawal, A., & Eriksen, S. E. (2014). Livelihoods and poverty. In: Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part A: Global and Sectoral Aspects. Contribution of Working Group II to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Field, C.B., V.R. Barros, D.J. Dokken, K.J. Mach, M.D. Mastrandrea, T.E. Bilir, M. Chatterjee, K.L. Ebi, Y.O. Estrada, R.C. Genova, B. Girma, E.S. Kissel, A.N. Levy, S. MacCracken, P.R. Mastrandrea, and L.L. White (Eds.), Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 793–832.
Paumgarten, F., & Shackleton, C. M. (2011). The role of non-timber forest products in household coping strategies in South Africa: the influence of household wealth and gender. Population and Environment, 33(1), 108.
Roerink, G. J., Menenti, M., Soepboer, W., & Su, Z. (2003). Assessment of climate impact on vegetation dynamics by using remote sensing. Physics and Chemistry of the Earth, 28(1–3), 103–109.
Ruggles, S., Fitch, C., Magnuson, D., & Schroeder, J. (2019). Differential privacy and census data: Implications for social and economic research. AEA Papers and Proceedings, 109, 403–408.
Sumner, D., Christie, M. E., & Boulakia, S. (2017). Conservation agriculture and gendered livelihoods in Northwestern Cambodia: Decision-making, space and access. Agriculture and Human Values, 34(2), 347–362.
Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5), 557–570.
Tlou, B., Sartorius, B., & Tanser, F. (2017). Space-time patterns in maternal and mother mortality in a rural South African population with high HIV prevalence (2000–2014): Results from a population-based cohort. BMC Public Health, 17(1), 543.
Tucker, C. J. (1979). Red and Photographic Infrared Linear Combinations for Monitoring Vegetation. Vol 8.
Wang, J., & Rich, P. M. (2008). Relations between NDVI, Grassland Production, and Crop Yield in the Central Great Plains. Geocarto International.
Wang, H., & Reiter, J. P. (2012). Multiple imputation for sharing precise geographies in public use data. The Annals of Applied Statistics, 6(1), 229–252.
Warren, J. L., Perez-Heydrich, C., Burgert, C. R., & Emch, M. E. (2016). Influence of demographic and health survey point displacements on distance-based analyses. Spatial Demography, 4(2), 155–173.
Wessels, K. J., Prince, S. D., Frost, P. E., & Van Zyl, D. (2004). Assessing the effects of human-induced land degradation in the former homelands of Northern South Africa with a 1 km AVHRR NDVI time-series. Remote Sensing of Environment, 91(1), 47–67.
Wisely, S. M., Alexander, K., & Cassidy, L. (2018). Linking ecosystem services to livelihoods in southern Africa. Ecosystem Services, 30, 339–341.
Zandbergen, P. A. (2014). Ensuring confidentiality of geocoded health data: Assessing geographic masking strategies for individual-level data. Advances in Medicine, 1–14.
Zhou, F. D., & Louis, T. A. (2010). A smoothing approach for masking spatial data. The Annals of Applied Statistics, 4(3), 1451–1475.
Acknowledgements
Early versions of this manuscript were presented at the 2019 Annual Meeting of the Population Association of America, as well as the 2019 Conference on Demographic Responses to Changes in the Natural Environment. The latter was supported in part by an R13 grant from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (#HD096853). We thank those who provided useful feedback at each venue.
Funding
Funding for this research was provided by the University of Colorado Boulder’s Research and Innovation Office. This research has also benefited from research, administrative, and computing support provided by the University of Colorado Population Center (Project 2P2CHD066613-06), funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development. The content is solely the responsibility of the authors and does not necessarily represent the official views of the CUPC, NIH or CU Boulder. Indirect support was provided by the Wellcome Trust (Agincourt Unit, grant 085477/Z/08/Z) through its support of the MRC/Wits Rural Public Health and Health Transitions Research Unit. Indirect support was also provided by the University of Colorado Boulder's Earth Lab supported by CIRES and the Grand Challenge Initiative at CU Boulder. The authors would like to thank the communities, respondents, field staff, and management of the Agincourt Unit for their respective contributions to the production of the data used in this study.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Cite this article
Hunter, L.M., Talbot, C., Twine, W. et al. Working toward effective anonymization for surveillance data: innovation at South Africa’s Agincourt Health and Socio-Demographic Surveillance Site. Popul Environ 42, 445–476 (2021). https://doi.org/10.1007/s11111-020-00372-4
DOI: https://doi.org/10.1007/s11111-020-00372-4