Designing and Evaluating a Hierarchical Framework for Matching Food Outlets across Multi-sourced Geospatial Datasets: a Case Study of San Diego County

  • Original Article
  • Journal of Urban Health

Abstract

Research on the retail food environment (RFE) relies on data availability and accuracy. However, discrepancies in RFE datasets may lead to imprecision when measuring associations with health outcomes. In this research, we present a two-tier hierarchical point of interest (POI) matching framework to compare and triangulate food outlets across multiple geospatial data sources. Two matching parameters were used: the geodesic distance between businesses and the similarity of business names according to Levenshtein distance (LD) and Double Metaphone (DM). Sensitivity analysis was conducted to determine the thresholds of the matching parameters. Tier 1 matching used more restrictive parameters to generate high-confidence matched POIs, whereas Tier 2 used relaxed matching parameters and applied a weighted multi-attribute model to the previously unmatched records. Our case study in San Diego County, California, used government, commercial, and crowdsourced data and returned 20.2% matched records from Tier 1 and 18.6% from Tier 2. Our manual validation shows a 100% matching rate for Tier 1 and up to 30.6% for Tier 2. Matched and unmatched records from Tier 1 were further analyzed for spatial patterns and categorical differences. Our hierarchical POI matching framework generated high-confidence food POIs by conflating datasets and identified some food POIs that are unique to specific data sources. Triangulating RFE data can reduce uncertain and invalid POI listings when representing the food environment using multiple data sources. Studies investigating associations between the food environment and health outcomes may benefit from improved quality of RFE data.
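
To make the two-tier idea concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes each POI is a dictionary with hypothetical `name`, `lat`, and `lon` keys, approximates geodesic distance with the haversine (great-circle) formula, and scores name similarity with normalized Levenshtein distance only; Double Metaphone is omitted for brevity. All thresholds and weights (`TIER1`, `TIER2`) are illustrative placeholders, since the abstract does not report the values selected by the sensitivity analysis.

```python
import math

# Hypothetical thresholds and weights, chosen only for illustration.
TIER1 = {"max_dist_m": 50.0, "min_sim": 0.9}
TIER2 = {"max_dist_m": 200.0, "min_sim": 0.6,
         "w_name": 0.6, "w_dist": 0.4, "min_score": 0.6}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (spherical approximation of geodesic distance)."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def name_similarity(a, b):
    """Levenshtein distance normalized to [0, 1]; 1.0 means identical names."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_tier(poi_a, poi_b):
    """Return 1 for a strict match, 2 for a relaxed weighted match, else None."""
    d = haversine_m(poi_a["lat"], poi_a["lon"], poi_b["lat"], poi_b["lon"])
    s = name_similarity(poi_a["name"], poi_b["name"])
    # Tier 1: both parameters must pass the restrictive thresholds.
    if d <= TIER1["max_dist_m"] and s >= TIER1["min_sim"]:
        return 1
    # Tier 2: relaxed thresholds plus a weighted combination of both attributes.
    if d <= TIER2["max_dist_m"] and s >= TIER2["min_sim"]:
        dist_score = 1.0 - d / TIER2["max_dist_m"]
        score = TIER2["w_name"] * s + TIER2["w_dist"] * dist_score
        if score >= TIER2["min_score"]:
            return 2
    return None
```

In this sketch, two records with the same name that lie about 100 m apart fail the Tier 1 distance threshold but still qualify as a Tier 2 match, which mirrors the intent of applying relaxed parameters only to records left unmatched by the stricter tier.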

Data Availability

The data cannot be transferred by the authors upon request. Yelp data are owned by Yelp.com and can be accessed through the official API. Esri data are part of the ArcGIS Business Analyst product, and the corresponding licenses are required for access. Government data are owned by the San Diego Department of Environmental Health and require paperwork to access.

Notes

  1. Yelp Fusion: https://www.yelp.com/developers/documentation/v3/get_started

  2. Yelp business category list: https://blog.yelp.com/businesses/yelp_category_list/

  3. North American Industry Classification System (NAICS): https://www.census.gov/naics/

Acknowledgements

This research was funded by the NIH National Cancer Institute grant R01 CA228147, and by National Science Foundation grant 1561112.

Author information

Corresponding author

Correspondence to Yanjia Cao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 31 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cao, Y., Yang, JA., Nara, A. et al. Designing and Evaluating a Hierarchical Framework for Matching Food Outlets across Multi-sourced Geospatial Datasets: a Case Study of San Diego County. J Urban Health 101, 155–169 (2024). https://doi.org/10.1007/s11524-023-00817-9

  • DOI: https://doi.org/10.1007/s11524-023-00817-9
