Abstract
Research on the retail food environment (RFE) depends on the availability and accuracy of data. However, discrepancies among RFE datasets may introduce imprecision when measuring associations with health outcomes. In this research, we present a two-tier hierarchical point of interest (POI) matching framework to compare and triangulate food outlets across multiple geospatial data sources. Matching used two parameters: the geodesic distance between businesses and the similarity of business names, measured by Levenshtein distance (LD) and Double Metaphone (DM). A sensitivity analysis was conducted to determine the thresholds for these parameters. Tier 1 matching used stricter parameters to generate high-confidence matched POIs, whereas Tier 2 used relaxed parameters and applied a weighted multi-attribute model to the previously unmatched records. Our case study in San Diego County, California used government, commercial, and crowdsourced data and returned 20.2% matched records from Tier 1 and 18.6% matched records from Tier 2. Our manual validation shows a 100% matching rate for Tier 1 and up to 30.6% for Tier 2. Matched and unmatched records from Tier 1 were further analyzed for spatial patterns and categorical differences. Our hierarchical POI matching framework generated high-confidence food POIs by conflating datasets and identified some food POIs that are unique to specific data sources. Triangulating RFE data can reduce uncertain and invalid POI listings when the food environment is represented using multiple data sources. Studies investigating associations between the food environment and health outcomes may benefit from the improved quality of RFE data.
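The two-tier idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the thresholds (`TIER1_MAX_DIST_M`, `TIER1_MIN_NAME_SIM`, `TIER2_MAX_DIST_M`, `TIER2_MIN_SCORE`), the equal weights, and all function names are hypothetical values chosen for illustration, whereas the study derived its thresholds through sensitivity analysis. The haversine formula stands in for geodesic distance, name similarity uses Levenshtein distance only (Double Metaphone is omitted because the standard library has no implementation), and the Tier 2 score is a simplified stand-in for a weighted multi-attribute model.

```python
import math

# Hypothetical thresholds for illustration only; not the values used in the study.
TIER1_MAX_DIST_M = 50.0
TIER1_MIN_NAME_SIM = 0.9
TIER2_MAX_DIST_M = 150.0
TIER2_MIN_SCORE = 0.6


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth (approximates geodesic distance)."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                    # deletion
                            curr[j - 1] + 1,                # insertion
                            prev[j - 1] + (ch_a != ch_b)))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(a, b):
    """Levenshtein distance normalised to [0, 1]; 1.0 means identical after lowercasing."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_tier(poi_a, poi_b):
    """Return 1 for a strict Tier 1 match, 2 for a relaxed Tier 2 match, 0 for no match."""
    dist = haversine_m(poi_a["lat"], poi_a["lon"], poi_b["lat"], poi_b["lon"])
    sim = name_similarity(poi_a["name"], poi_b["name"])
    if dist <= TIER1_MAX_DIST_M and sim >= TIER1_MIN_NAME_SIM:
        return 1
    if dist <= TIER2_MAX_DIST_M:
        # Simplified weighted score with equal (assumed) weights on proximity and name.
        score = 0.5 * (1.0 - dist / TIER2_MAX_DIST_M) + 0.5 * sim
        if score >= TIER2_MIN_SCORE:
            return 2
    return 0


if __name__ == "__main__":
    gov = {"name": "Joe's Tacos", "lat": 32.7157, "lon": -117.1611}
    yelp = {"name": "Joes Tacos", "lat": 32.7158, "lon": -117.1611}
    print(match_tier(gov, yelp))
```

In this sketch a pair is tested against the strict Tier 1 rule first and falls through to the relaxed Tier 2 score only if it fails, which mirrors the hierarchical ordering described above. A full pipeline would also need to resolve one-to-many candidates and compare records across more than two sources.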
Data Availability
The data cannot be transferred by the authors upon request. Yelp data are owned by Yelp.com and can be accessed through the official API. Esri data are part of the ArcGIS Business Analyst product, and access requires the corresponding licenses. Government data are owned by the San Diego Department of Environmental Health and require paperwork to access.
Notes
Yelp business category list: https://blog.yelp.com/businesses/yelp_category_list/
North American Industry Classification System (NAICS): https://www.census.gov/naics/
Acknowledgements
This research was funded by the NIH National Cancer Institute grant R01 CA228147, and by National Science Foundation grant 1561112.
Cite this article
Cao, Y., Yang, JA., Nara, A. et al. Designing and Evaluating a Hierarchical Framework for Matching Food Outlets across Multi-sourced Geospatial Datasets: a Case Study of San Diego County. J Urban Health 101, 155–169 (2024). https://doi.org/10.1007/s11524-023-00817-9
DOI: https://doi.org/10.1007/s11524-023-00817-9