1 Introduction

The food retail industry has reached a stage where energy-dense, nutrient-poor foods are ubiquitous. The pervasiveness of unhealthy foods, which are reportedly linked to rising obesity and obesity-related comorbidities, has affected many communities in developed countries, particularly the United States (Tillotson, 2004). A key driver in this obesogenic context is the lack of a supportive community food environment, defined as the immediate neighborhood where food sources are provided (Jia, 2021). The community food environment is therefore vital to maintaining healthy eating and reducing the burden of diet-related chronic disease. To further explore the health effects of the community food environment, a growing body of literature has attempted to establish associations between measures of food access within pre-defined analysis units (e.g., census tracts, ZIP codes) and the socioeconomic status (SES) characterizing those units (Yang et al., 2021; Jia et al., 2021; Xin et al., 2021; Zhou et al., 2021; Li et al., 2021).

Notably, the statistical correlations between food accessibility metrics and residential SES variables are not consistent. For example, fast-food restaurants were found to be more prevalent in economically deprived areas (Wang et al., 2019); however, this correlation was not observed in all studies (Jia et al., 2020), implying that the relationship is not geographically homogeneous. Meanwhile, the association between food availability and diet is far from conclusive (Mei et al., 2021). To date, few studies have attempted to scrutinize why such contradictions exist (Shannon, Reese, Ghosh, Widener, & Block, 2021). While individual-based behavioral and economic factors can influence food acquisition and subsequently dietary outcomes, a commonly overlooked factor is that studies often rely on a single or limited set of geospatial models, which fail to consider the uncertainties of study scales and analysis units in defining the community food environment and may therefore produce scale-specific findings (Shannon et al., 2021). This issue is known as the modifiable areal unit problem (MAUP) (Fotheringham & Wong, 1991). The MAUP refers to the uncertainties in the geographic support and spatial scale used to conduct spatial analysis, which can lead to serious statistical bias (Fotheringham & Wong, 1991). This bias arises from two uncertain spatial attributes in the formation of metrics: the study scale (i.e., the extent of the study area) and the analysis unit (i.e., the smallest unit by which measurements are aggregated). A change in either attribute will affect the consistency of the spatial pattern. For example, the significance level of the correlations between food accessibility and obesity was shown to change when the analysis unit changed (Fan et al., 2014; Jia, Xue, Cheng, & Wang, 2019), which is a typical MAUP issue.
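The zoning side of the MAUP can be illustrated with a minimal sketch. The eight base cells, their outlet counts, and their low-income household counts below are invented for illustration and deliberately contrived to exaggerate the effect; the function names are likewise assumptions rather than part of any reviewed study. The same cell-level data are aggregated into four analysis units in two different ways, and the Pearson correlation is recomputed each time.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def aggregate(values, zones):
    """Sum cell-level values within each zone (a tuple of cell indices)."""
    return [sum(values[i] for i in zone) for zone in zones]


# Hypothetical 2 x 4 grid of base cells, indexed row by row:
#   0 1 2 3
#   4 5 6 7
outlets = [0, 2, 4, 6, 8, 2, 2, 0]      # made-up fast-food outlet counts
low_income = [0, 1, 3, 2, 1, 4, 0, 1]   # made-up low-income household counts

# Two alternative zonings of the same cells into four contiguous units.
side_by_side = [(0, 1), (2, 3), (4, 5), (6, 7)]  # horizontal neighbors merged
stacked = [(0, 4), (1, 5), (2, 6), (3, 7)]       # vertical neighbors merged

r_cells = pearson(outlets, low_income)
r_side = pearson(aggregate(outlets, side_by_side),
                 aggregate(low_income, side_by_side))
r_stacked = pearson(aggregate(outlets, stacked),
                    aggregate(low_income, stacked))

print(f"cell level:   {r_cells:+.3f}")
print(f"side-by-side: {r_side:+.3f}")
print(f"stacked:      {r_stacked:+.3f}")
```

With these contrived numbers the cell-level correlation is about +0.21, whereas the two zonings yield +1 and -1, respectively. Real data would rarely behave this extremely, but the sketch shows how the sign of an association can depend on the choice of analysis unit alone.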

In this paper, we have conducted a systematic review of existing community food literature that has evaluated the MAUP in terms of using different study scales or analysis units. By pivoting on the MAUP and its related scale issues, we urge that future community food environment and urban informatics research should appropriately define the study scale, choose the analysis unit, and justify the geospatial model to better address health disparity-oriented questions.

2 Modeling community food environment

Studies on the community food environment typically take two different approaches: the place-based approach focusing on food outlets (e.g., modeling store distribution) and the people-based approach focusing on food consumers (e.g., modeling food foraging activities) (Coveney & O’Dwyer, 2009). Traditional place-based approaches to studying the community food environment have employed in situ assessments (e.g., market basket analysis) with foci on food availability, price, variety, and quality (McKinnon, Reedy, Morrissette, Lytle, & Yaroch, 2009). The rise of geospatial technologies, particularly geographic information systems (GIS), has enabled the collection, mapping, and analysis of food locations at an expanded geographical scale (Jia et al., 2017; Jia et al., 2019). Most GIS-based spatial measures lean towards health disparities, discussing the spatial relationships between food outlets and community health. Because there is no consensus on the precise delineation of communities (Jia et al., 2019), an administrative unit (e.g., county, census tract) is often used as the analysis unit. Across the literature, there are generally six geospatial models for defining community food accessibility (Table 1).

Table 1 Geospatial models of the community food environment based on the spatial relationship between food outlets (i.e., triangles) and analysis units (i.e., shaded polygons). The table is extended from Chen (2017)

In Table 1, the first three measures can be regarded as container-based measures. They are the most widely adopted measures because they are simple to implement in GIS. Their popularity can also be attributed to the limitations of the buffer-based measures: (1) there is a lack of agreement regarding the appropriate buffer threshold in different community food environments (Charreire et al., 2010); and (2) the buffer-based measures cannot be analyzed in conjunction with area-based SES or health outcome variables in any straightforward manner and may require further down- or up-scaling. A third family of geospatial models comprises gravity-based measures, which consider that food accessibility is affected not only by the supply (e.g., food outlets) but also by the demand (e.g., food consumers), and is further moderated by the distance decay between supply and demand. A recent development in gravity-based measures is the two-step floating catchment area (2SFCA) method, which uses a two-step search procedure to identify the food supply-demand ratio within an analysis unit (Chen, 2019; Dai & Wang, 2011).
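The two-step search of the 2SFCA method can be sketched as follows. This is a minimal illustration of the basic form with a binary distance threshold and straight-line distances, not the specific implementation used in the cited studies; the tract and store coordinates, populations, capacities, and the 1-km threshold are all hypothetical.

```python
from math import hypot

# Hypothetical demand locations: (name, x_km, y_km, population)
tracts = [("A", 0.0, 0.0, 100), ("B", 1.0, 0.0, 300), ("C", 5.0, 0.0, 200)]
# Hypothetical supply locations: (name, x_km, y_km, capacity)
stores = [("s1", 0.5, 0.0, 4), ("s2", 4.5, 0.0, 1), ("s3", 1.5, 0.0, 3)]


def two_step_fca(tracts, stores, threshold_km):
    """Return a dict mapping each tract name to its 2SFCA accessibility score."""
    # Step 1: for each store, divide its capacity by the total population
    # of all tracts that fall within the catchment threshold.
    ratios = {}
    for s_name, sx, sy, capacity in stores:
        demand = sum(
            pop for _, tx, ty, pop in tracts
            if hypot(tx - sx, ty - sy) <= threshold_km
        )
        ratios[s_name] = capacity / demand if demand > 0 else 0.0

    # Step 2: for each tract, sum the supply-demand ratios of all stores
    # that fall within the same threshold.
    access = {}
    for t_name, tx, ty, _ in tracts:
        access[t_name] = sum(
            ratios[s_name] for s_name, sx, sy, _ in stores
            if hypot(tx - sx, ty - sy) <= threshold_km
        )
    return access


access = two_step_fca(tracts, stores, threshold_km=1.0)
for name, score in access.items():
    print(name, round(score, 4))
```

In this toy setting tract B scores highest because two stores lie within its catchment, even though it shares one of them with tract A. Because catchments are centered on locations rather than bounded by unit borders, stores just outside a tract still contribute to its score.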

While these geospatial models are not standardized for community assessment across countries, federal agencies in the United States have widely adopted container-based measures. The three primary sources of measurement are the United States Department of Agriculture (USDA) Economic Research Service (ERS) Food Access Research Atlas (Rhone et al., 2017), the USDA ERS Food Environment Atlas (USDA, 2019), and the Centers for Disease Control and Prevention (CDC) Modified Retail Food Environment Index (mRFEI) (CDC, 2014), as shown in Table 2.

Table 2 Comparison of three food accessibility metrics developed by USDA and CDC

Comparative studies identified that about 71% to 76.5% of the “food desert” census tracts (i.e., areas without affordable, healthy food provisioning) were categorized consistently between the USDA Food Access Research Atlas measure and the CDC mRFEI measure (Liese, Hibbert, Ma, Bell, & Battersby, 2014; Santorelli & Okeke, 2017). The incongruity in the designation of “food deserts” is likely caused by discrepancies in data sources and geospatial models. We further argue that this lack of consistency could also be induced by the MAUP.

While the MAUP was an issue well defined in the early 1990s (Fotheringham & Wong, 1991), literature about the MAUP extends back as far as the 1930s (Gehlke & Biehl, 1934). More recently, the MAUP has been discussed in modeling healthcare access (Apparicio et al., 2017; Bryant Jr. & Delamater, 2019), but it has yet to be well recognized in community food research. Many existing container-based food accessibility measures using one analysis unit at a single study scale ignore the impact of the MAUP. As a result, the MAUP can lead to biased statistical relationships—when attempting to capture a mix of intervening factors that affect community food access, the correlations with SES variables would likely be inconsistent at different study scales or using different analysis units (Fleischhacker et al., 2011).

3 Methods of searching and filtering literature

To further examine past literature about the MAUP in community food environmental research, a systematic review was conducted following the reporting standard of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Moher, Liberati, Tetzlaff, & Altman, 2010). The literature selection criteria are: (1) focusing on community food, while those with the food environment being one of the built environmental variables were excluded; (2) employing at least one of the geospatial models in Table 1; (3) evaluating food accessibility across different study scales, using different analysis units, or employing more than one geospatial model; (4) being peer-reviewed original research; (5) being published in or before 2020; and (6) being published in English.

A keyword search was performed on PubMed and Scopus. All possible combinations of two groups of keywords relating to community food and the MAUP (see details in Additional file 1) were employed in the title or abstract search. All articles in the preliminary search results were compiled, and duplicates were removed. The remaining abstracts were screened against the literature selection criteria, and the full texts of relevant articles were further scrutinized. In addition, a snowball method based on the reference lists of the identified articles was adopted to enrich the literature. The search and filtering process eventually identified 19 studies as relevant to the topical area. The flowchart of the literature search and selection is shown in Fig. 1. The list of included studies is given in Table 3.
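The pairing of the two keyword groups and the removal of duplicates can be sketched as below. The actual keywords are listed in Additional file 1 and are not reproduced here, so the terms, the record fields, and the function names in this sketch are placeholders only; field restrictions such as title or abstract would need to be expressed in each database's own query syntax.

```python
from itertools import product

# Placeholder keyword groups; the real lists are in Additional file 1.
food_terms = ["food environment", "food access", "food desert"]
maup_terms = ["modifiable areal unit problem", "MAUP", "spatial scale"]


def build_queries(group_a, group_b):
    """Pair every term in group_a with every term in group_b."""
    return [f'"{a}" AND "{b}"' for a, b in product(group_a, group_b)]


def deduplicate(records):
    """Keep the first record for each title, ignoring case and extra spaces."""
    seen = set()
    unique = []
    for record in records:
        key = " ".join(record["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


queries = build_queries(food_terms, maup_terms)
print(len(queries), "queries, e.g.", queries[0])

# Hypothetical search hits returned by two databases.
hits = [
    {"title": "Food deserts and scale", "source": "PubMed"},
    {"title": "food deserts  and Scale", "source": "Scopus"},
    {"title": "Buffers and food access", "source": "Scopus"},
]
print(len(deduplicate(hits)), "unique records")
```

Matching on normalized titles is only one possible deduplication rule; matching on a persistent identifier, where available, would be more robust.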

Fig. 1 Flowchart of literature search and identification

Table 3 List of community food environmental research relating to the MAUP

4 Results

4.1 Study characteristics

We systematically reviewed the 19 studies with respect to the study scale, food store category, food data source, food data year, analysis unit, geospatial model, and model comparison (Table 3). The publication years ranged from 1997 to 2018. Although keywords about the study area were not included in the literature search, all of the studies took place in developed countries, including the United States (n = 15), Canada (n = 3), and Australia (n = 1). There were diverse categories of food stores, with the majority being mixed types (n = 11), followed by unhealthy stores (n = 4) and healthy stores (n = 4). Most of the studies (n = 12) did not specify the criterion for categorizing the store’s health positioning. Among those with a clear definition, the following criteria were used: North American Industry Classification System (NAICS) (n = 2), Standard Industrial Classification (SIC) (n = 2), Nutrition Environment Measures Survey in Stores (NEMS-S) or Nutrition Environment Measures Survey in Restaurants (NEMS-R) (n = 2), Retail Food Environment Index (RFEI) (n = 1), and the definition in past literature (n = 1). One study employed multiple criteria to cross-validate store types (Minaker et al., 2014). There were a variety of food data sources. Commercial databases or services, such as InfoUSA (n = 3) and Dun & Bradstreet (n = 2), were the most commonly used data sources.

Methodologically, most metrics were defined within an analysis unit (e.g., census tract, ZIP code) with aggregate spatial attributes of food stores (n = 10). Evaluations were also performed around locations of the school/workplace/household (n = 5) or the food store (n = 2). A common approach was to create a buffer distance around the food store location and then aggregate the buffer areas within an administrative unit, such as block group (Jiao, 2016), census tract (Larsen & Gilliland, 2008), the dissemination area (Luan, Minaker, & Law, 2016), or the local government area (Murphy, Koohsari, Badland, & Giles-Corti, 2017). It is noteworthy that two studies took a hybrid approach to food environment assessment: Fan et al. (2014) evaluated food accessibility based on block groups, census tracts, ZIP code zones, as well as a 1-km circular buffer of household addresses; Jia et al. (2019) combined two distance measures (i.e., 800-m circular buffer and 800-m network buffer) with two analysis units (i.e., school location and ZIP code zone), generating four food accessibility metrics for the entire United States.
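The difference between a container-based count and a circular-buffer count can be made concrete with a short sketch. The coordinates (in meters, assumed to be in a projected coordinate system), the ZIP code labels, and the function names are hypothetical; the 800-m radius follows the buffer distance mentioned above, and a network buffer would additionally require street network data that is not modeled here.

```python
from math import hypot

# Hypothetical food outlets with projected coordinates in meters.
outlets = [
    {"id": "o1", "x": 300.0, "y": 400.0, "zip": "Z1"},
    {"id": "o2", "x": 800.0, "y": 0.0, "zip": "Z2"},
    {"id": "o3", "x": 600.0, "y": 700.0, "zip": "Z1"},
    {"id": "o4", "x": -1000.0, "y": 0.0, "zip": "Z1"},
]
# Hypothetical school located in ZIP code zone Z1.
school = {"x": 0.0, "y": 0.0, "zip": "Z1"}


def container_count(outlets, unit_id):
    """Count outlets whose assigned analysis unit equals unit_id."""
    return sum(1 for o in outlets if o["zip"] == unit_id)


def buffer_count(outlets, x, y, radius_m):
    """Count outlets within a straight-line distance radius_m of (x, y)."""
    return sum(1 for o in outlets if hypot(o["x"] - x, o["y"] - y) <= radius_m)


in_zip = container_count(outlets, school["zip"])
in_buffer = buffer_count(outlets, school["x"], school["y"], radius_m=800.0)
print("container (ZIP) count:", in_zip)
print("800-m buffer count:", in_buffer)
```

Here the ZIP-based count includes two outlets that lie beyond 800 m of the school and omits a nearby outlet assigned to the neighboring zone, so the two metrics describe different food environments for the same location.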

4.2 Studies with different analysis units or scale variables

The effect of the MAUP is introduced by a change in the study scale or the analysis unit. Studies included in this review were conducted at various study scales, including nationwide (n = 1), state or province (n = 5), county (n = 4), city or municipality (n = 6), multi-unit region (n = 2), and a single ZIP code zone (n = 1). However, none of them covered more than one study scale. All studies changed either the analysis unit (n = 3), a scale-related variable in the geospatial model (e.g., buffer distance; n = 15), or both (n = 1).

Correlations with SES variables, obesity-related health outcomes, or established food accessibility metrics (e.g., USDA Food Access Research Atlas) were employed to analyze the consistency among different food accessibility metrics. Three studies compared the correlation results using different analysis units. We found that (1) metrics based on ZIP code zones showed the weakest correlations with obesity-related health indicators, compared with those based on an 800-m circular buffer of schools (Jia et al., 2019), census tracts (Fan et al., 2014), or a 1-km circular buffer of household locations (Fan et al., 2014); and (2) the direction of the association between SES and food accessibility differed between two types of analysis units: census tract and block group (Barnes et al., 2016).

Many studies changed scale-related variables (e.g., buffer distance) in the geospatial model as a way to test the validity of the correlation with health indicators. The findings were relatively mixed: (1) some studies found no significant correlation between food accessibility and obesity-related health outcomes regardless of the buffer distance used in the geospatial model (Baek, Sanchez-Vaznaugh, & Sánchez, 2016; Luan et al., 2016; Murphy et al., 2017). However, the association became significant when the correlation analysis was applied to selected areas only, such as the central city (Baek et al., 2016) or low SES areas (Murphy et al., 2017); (2) a circular buffer of households was more likely to reveal correlations between food accessibility and obesity than a network buffer (Jia et al., 2019) or container-based measures (Fan et al., 2014); (3) the buffer distance should be limited to a certain size, as no correlation between food accessibility and food consumption was found beyond a 0.5-km buffer of household locations (Ollberding et al., 2012).

5 Discussion

5.1 Key findings of this review

This review identifies that all MAUP-related community food environmental studies were focused on developed countries, particularly the United States. The majority of studies employed a “container” model by aggregating attributes of food stores within an analysis unit (e.g., census tract, ZIP code); a “buffer” model with different buffer distances was also employed. While there has been no definite criterion to choose the best analysis unit or buffer distance, we identify these key findings: (1) ZIP code is not recommended as an analysis unit for modeling food accessibility, as it did not have significant correlations with health indicators (Fan et al., 2014; Jia et al., 2019); (2) using a circular buffer of less than 0.5 km around household locations was most likely to reveal health correlations, compared to network buffers or container-based measures (Fan et al., 2014; Jia et al., 2019), and using the 0.5-mile gravity-based measure had a better consistency with the USDA low-access measure (Chen, 2017); (3) to reveal health effects of the community food environment, it is recommended to focus on selected regions or partitions of a study area with similar SES, such as the central city or low SES areas (Baek et al., 2016; Murphy et al., 2017); (4) while it is impossible to completely remedy the MAUP, we suggest that any community food environmental study utilizing a single statistical unit or a distance measure should discuss the existence of the MAUP, for example by evaluating the sensitivity of the model to a change in the unit or the distance measure.
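A sensitivity check of the kind suggested in finding (4) could look like the following sketch, which recomputes a buffer-based metric and its correlation with a health indicator for several buffer distances. The household positions, body mass index (BMI) values, outlet positions, and candidate distances are invented and contrived for illustration, and positions are simplified to a single axis in meters; none of this reproduces data from the reviewed studies.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical households along one road: (position_m, BMI)
households = [(0, 30.0), (1000, 24.0), (2000, 26.0), (3000, 25.0)]
# Hypothetical outlet positions along the same road, in meters.
outlets = [100, 200, 300, 1500, 1600, 2900]


def buffer_counts(households, outlets, distance_m):
    """Number of outlets within distance_m of each household."""
    return [
        sum(1 for o in outlets if abs(o - position) <= distance_m)
        for position, _ in households
    ]


bmi = [value for _, value in households]
r_by_distance = {}
for distance_m in (500, 1000, 2000):
    counts = buffer_counts(households, outlets, distance_m)
    r_by_distance[distance_m] = pearson(counts, bmi)
    print(distance_m, counts, round(r_by_distance[distance_m], 3))
```

In this toy example the correlation is strongly positive at 500 m, weakly negative at 1000 m, and zero at 2000 m. Reporting such a table alongside the main result lets readers judge how much a conclusion depends on the chosen distance.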

5.2 MAUP-related issues

As an extension of the review, three MAUP-related issues in community food environmental research are discussed. The first issue is the edge effect (Chen, 2017; Van Meter et al., 2010), also known as the boundary effect (Bharti, Xia, Bjornstad, & Grenfell, 2008; Griffith, 1983). The edge effect arises because any spatial assessment based on analysis units will affect data quality, especially for units with small counts (Elliott & Wartenberg, 2004). In particular, container-based measures ignore the fact that food items can be procured beyond the given boundary (Sadler, Gilliland, & Arku, 2011). The effect can be significant for the fast-food industry: fast-food restaurants are strategically located near urban arterial roads to minimize access barriers and cater to drive-through customers (Hurvitz, Moudon, Rehm, Streichert, & Drewnowski, 2009). Most often, these arterial roads are the divides of administrative units, such as census tracts. Thus, although many census tracts do not have fast-food restaurants, fast food could still be procured from adjacent tracts. The edge effect further affects the quality of data on food stamp redemptions, as a credited SNAP store may be cross-listed under two ZIP codes (Chen, 2019; Major, Delmelle, & Delmelle, 2018; Shannon, 2014). While there is no simple solution to address the issue (Caspi et al., 2012; Charreire et al., 2010), one way to moderate the edge effect is the 2SFCA method (Chen, 2019; Dai & Wang, 2011).

A second related issue is the modifiable temporal unit problem (MTUP) (Cheng & Adepeju, 2014). Analogous to the MAUP, the uncertainty of the temporal scale exists in the aggregation and the segmentation of the food data over an extended period of investigation. Thus, using retail datasets derived from different years or aggregating them into different time periods will very likely produce inconsistent statistical results, as the observations become different in data collection or data aggregation. As pointed out in a review article (Fleischhacker et al., 2011), out of 18 studies that estimated correlations between SES variables and community food access, 11 exhibited a minimum difference of 3 years between the time that food environment data was gathered and the time that SES data was collected. The MTUP can only be addressed when the spatiotemporal scales and granularities of both the foodscape and SES variables are precisely defined.

The last issue relevant to the MAUP concerns the uncertain geographic context problem (UGCoP) (Kwan, 2012, 2018) and the selective daily mobility bias (SDMB) (Plue, Jewett, & Widener, 2020). Unlike the place-based food environmental measures, these two issues concern people-based food activity measures, which consider how people’s environmental exposure and daily mobility patterns dictate their health behaviors and health outcomes. The UGCoP arises because food procurement is largely influenced by contextual attributes (e.g., food culture), as well as by the spatial scope and temporal duration over which these attributes shape individual health behaviors, including food foraging behaviors (Chen & Kwan, 2015). The SDMB refers to the confounding effect that both environmental exposure and individual preferences could shape people’s daily mobility patterns and health outcomes. Thus, using place-based food environmental measures does not suffice to represent individual food activities and dietary behaviors (Glanz et al., 2005). Further corroborating evidence is that only 14.4% of food shoppers patronized stores in their residential census tracts (Giskes, Van Lenthe, Brug, Mackenbach, & Turrell, 2007). To advance community food research, recent food access studies have largely shifted the focus to the individual level, attempting to elaborate on the space-time dynamics of how individual travelers procure food daily. These assessments of individual food exposure made use of location-aware geospatial technologies, such as travel diaries (Bono & Finn, 2017; Ravensbergen, Buliung, Wilson, & Faulkner, 2016; Shannon & Christian, 2017), Global Positioning System (GPS) enabled devices (Chaix et al., 2013; Christian, 2012; Shearer et al., 2015; Wang & Kwan, 2018), and social media data (Chen, Zhao, & Yang, 2022). These studies suggest that exploring food access disparity from an individual perspective (e.g., financial constraints, individual mobility, food preference, and nutrition education) and exploring individuals’ daily mobility patterns (Kestens, Lebel, Daniel, Thériault, & Pampalon, 2010; Shannon, 2016) are as important as place-based food environmental modeling. To this end, substantiating the UGCoP and the SDMB in these people-based food activity measures and examining the existence of the MAUP in these measures are worthy of future research.

5.3 Limitations

This review paper is subject to limitations. First, the primary focus of the review is urban food environments, and rural food environments are less covered in the paper. Because of the dispersed distribution of food stores and relatively large administrative units in rural areas, rural food access takes a largely different form (Bono & Finn, 2017). Geospatial modeling of rural food environments is less likely to be affected by the MAUP, as accessibility measures are less sensitive to the change of distance variables in rural areas (Chen & Jia, 2019). Second, because all reviewed studies target cases in developed countries, the conclusions cannot be generalized for studying community food environments in other world regions. For example, urban food access in East Asian countries (e.g., China, South Korea, and Japan) is reliant on mixed transportation modes, including private automobiles, public transit, and walking, and thus the distance threshold characterizing “low access” would be rather complex (Zhang et al., 2019). This void in research calls for developing new and robust food accessibility indices (e.g., those incorporating modal split) that can be adapted for modeling community food environments in developing and under-developed countries.

6 Conclusions

This systematic review summarizes geospatial models and existing literature in community food environmental research relating to the MAUP. In addition to identifying the problem, the article provides actionable strategies to improve the scientific rigor of future research. These strategies include using a small distance threshold in the geospatial model (Ollberding et al., 2012), targeting a geographically homogeneous study area (Baek et al., 2016), selecting subgroups stratified by SES variables (Murphy et al., 2017), and testing the sensitivity of the model to a change in the statistical unit or the distance measure. By highlighting the MAUP, this paper could have policy implications: given that modeling food accessibility provides support for policy intervention and planning initiatives, using different metrics may lead to different interpretations of health disparities and could thus misinform policy decisions. Therefore, any assessment of the spatial patterns in the community food environment that may potentially lead to a policy change should consider the effects of the MAUP.