Introduction

Obesity is a form of malnutrition and a major public health problem that has drastically increased worldwide in the past three decades and affects around 650 million adults globally (WHO, 2021). Reducing obesity prevalence has become one of the main public health initiatives, and it is implicit in several Sustainable Development Goals, including 2.2 End all forms of malnutrition, 3.4 Reduce premature mortality from non-communicable diseases and 3.8 Achieve universal health coverage (WHO, 2021).

Mexico has one of the highest rates of obesity in both children and adults worldwide (WHO, 2022). Literature from Mexico suggests that obesity is associated with multiple factors including age, gender, socio-economic status, physical activity, and diet (Neri-Sánchez et al., 2019; Quezada & Lozada-Tequeanes, 2015; Hernández-Cordero et al., 2017; Kolovos et al., 2021). Survey estimates at the regional level also indicate that obesity is not evenly distributed across the Mexican territory and thus has a spatial pattern. However, Mexico is a large and diverse country and, as with other phenomena such as poverty, inequality and morbidity, aggregate geographical estimates are too coarse to give a comprehensive picture of the distribution of obesity. The study of obesity across smaller geographic areas can help to understand where the burden of obesity is highest and then to target policy efforts to decrease it.

Estimates of obesity in Mexico are produced from the National Health and Nutrition Survey (ENSANUT for its Spanish acronym), which is the largest and only public data source for anthropometric and dietary data for the Mexican population. The ENSANUT 2021, like most health and nutrition surveys, has limited geographical coverage as it is only representative at national, regional, and rural/urban levels. Other versions, like the ENSANUT 2018, are representative for the 32 states but lack public information on the municipal identification codes, which limits modelling possibilities. Hence, there is a need for reliable data at a smaller scale to gain an understanding of the geographical distribution of obesity in Mexico so that policy makers and researchers have access to more detailed data.

Small Area Estimation (SAE) is a field in social statistics that comprises a series of methods to model available data in order to produce figures for smaller areas such as counties, municipalities, census tracts and postcodes (Rao & Molina, 2015). In public health research, SAE has been increasingly used to model existing survey data to produce prevalence rates for obesity and other health markers (Cadwell et al., 2010; Ouma et al., 2021). Most SAE studies of obesity have been conducted in the USA to estimate prevalence in counties or census tracts (Tabano et al., 2017; Zhao et al., 2021; Mills et al., 2020; Zhang et al., 2011; D’Agostino-McGowan et al., 2013; Zgodic et al., 2021; Kramer et al., 2016; Koh et al., 2018; Li et al., 2009). Only two studies were found to have used SAE to calculate obesity prevalence at a smaller geographical scale in Europe (UK and Switzerland) (Amies-Cull et al., 2022; Panczak et al., 2016). These studies tend to rely on variations of two of the most widely used and reliable approaches in SAE: the empirical best linear unbiased predictor (EBLUP) and hierarchical Bayesian (HB) estimators (Rao & Molina, 2015). In public health, initiatives like the Global Burden of Disease (GBD), led by the Institute for Health Metrics and Evaluation (IHME), mainly rely on methods that are employed in SAE (i.e. special cases of hierarchical modelling) (Murray et al., 2020). However, their primary target is to correct the survey-design estimators (i.e., representative rates obtained from the surveys) to make them comparable across units and time points. Unlike SAE, the GBD rarely employs Census microdata to produce obesity estimates for units that are not the target of the survey or are simply not in the sample.

The use of SAE to study obesity in low- and middle-income countries seems to be restricted to a single study in Rio de Janeiro (Cataife, 2014). In Mexico, although there is an emergent interest in the spatial analysis of key public health indicators at municipal level (Baptista, 2024), the use of SAE is not as comprehensive as in other countries and has relied on methods that are more error prone (like the World Bank method) than the estimators developed in recent years such as EBLUP or HB (Guadarrama et al., 2016). The most notable and systematic exercise has been conducted since 2010 by the National Council for the Evaluation of Social Development Policy (CONEVAL) to estimate poverty at municipal and census tract level. These estimates are drawn from different types of common regression models with no random effects (i.e. synthetic models, which result in high systematic and random error) and from Empirical Bayesian estimators. Health-related SAE is even rarer in Mexico: 2010 Census data and 2012 survey data were used with HB to model stunting at municipal level (Nájera & Catalán, 2019); and in 2020 the National Institute of Statistics and Geography (INEGI) used area-level models with random effects to estimate rates of obesity, hypertension, and diabetes by combining the Health and Nutrition Survey (ENSANUT 2018) and the National Population Census 2020. Although these methods are more robust than the synthetic estimators, they do not fully exploit the hierarchical structure of the data, in that they mainly rely on the information available at the domain of analysis (e.g. municipal level). Finally, the GBD also has estimates for the 32 Mexican states but no estimates for smaller areas (Murray et al., 2020).

The aim of this paper is to implement some of the most recent advancements in SAE and Bayesian computation to produce reliable estimates of obesity at the municipal level in Mexico in 2020. This approach better accommodates the hierarchical structure of the Mexican data, is computationally feasible and leads to less estimation error (see the methods section).

Data and methods

Data and ethics

This paper uses two secondary data sources: microdata from the Mexican National Health and Nutrition Survey 2021 (ENSANUT 2021) (National Institute of Public Health, INSP) and microdata from the Mexican National Population Census 2020 (National Institute of Statistics and Geography, INEGI) (INEGI, 2020a; INSP, 2021). As per national and institutional regulations, all the data collection and publication protocols of both INSP and INEGI should be approved by their own ethics committee. Both the ENSANUT and Census respondents are informed about the aims of the data collection process and must give their verbal consent before beginning the interview. Personal details are protected in accordance with the national legislation. This paper had no contact or personal interactions with any of the participants.

ENSANUT (2021) was used to obtain anthropometric measures; it is a cross-sectional, probabilistic, multi-stage, stratified survey that is representative at national and regional level (INSP, 2021; Romero-Martínez et al., 2021).

A sample of 9,287 adults aged \(\ge 20\) was considered, from which weight, height and waist circumference were recorded using an international protocol described elsewhere (INSP-INEGI, 2019). Based on these measurements, Body Mass Index (BMI) was calculated and classified into the four categories proposed by the World Health Organization: underweight (\(<18.5\) \(kg/{m}^{2}\)), normal BMI (18.5–24.9 \(kg/{m}^{2}\)), overweight (25.0–29.9 \(kg/{m}^{2}\)) and obesity (\(\ge 30\) \(kg/{m}^{2}\)). On the basis of previous studies in Mexico (Barquera et al., 2020), this study estimates BMI by only considering those adults with heights between 1.3 and 2.0 m and BMI estimates between 10 and 58 \(kg/{m}^{2}\); those outside these ranges (n = 262) and pregnant women (n = 77) were excluded. Once BMI was calculated, only those adults who had a BMI of \(\ge 30\) \(kg/{m}^{2}\) were included for subsequent estimations.
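The classification and exclusion rules described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function names are hypothetical, and the half-open category boundaries (for example, treating any BMI below 25.0 as normal) are an assumption made to close the gaps between the published ranges.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index in kg/m^2."""
    return weight_kg / height_m ** 2


def who_category(bmi_value):
    """WHO category; half-open intervals are an assumption of this sketch."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal"
    if bmi_value < 30.0:
        return "overweight"
    return "obesity"


def is_valid_record(height_m, bmi_value, pregnant=False):
    """Plausibility filter: height 1.3-2.0 m, BMI 10-58, not pregnant."""
    return (not pregnant) and 1.3 <= height_m <= 2.0 and 10 <= bmi_value <= 58


# Hypothetical adult: 95 kg and 1.70 m
value = bmi(95, 1.70)
print(round(value, 1), who_category(value), is_valid_record(1.70, value))
```

Records failing `is_valid_record` correspond to the exclusions reported in the text.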

The census data were derived from a representative sample from the National Population Census 2020 and constituted a sample of 9,388,430 that is representative, at municipal level, of the 82,781,506 adults in the country in 2020 according to the Census. All the necessary variables for the model (see below) were harmonised across both sources, i.e. the same categories were constructed for the predictors in the model.

Methods

Of several approaches to the production of small-area estimates of social and public health indicators (Rao & Molina, 2015), indirect or model-based estimation has become more popular in recent years owing to its accuracy and its capacity to use the increased availability of auxiliary data such as census, private or administrative data. The empirical best linear unbiased predictor (EBLUP) for linear models, known more generally as empirical Bayes (EB) prediction, and hierarchical Bayesian (HB) modelling are two of the most popular model-based alternatives. These two alternatives are equivalent and lead to very similar results under specific circumstances, i.e. under uninformative priors that assign equal probability to all values of the parameter of interest, such as a regression slope (Guadarrama et al., 2016). However, HB is a more general (it offers the possibility of working with a number of distributions), flexible (parameters and spatial effects), and powerful (unlike maximum likelihood, it is computationally feasible for very complex models) estimator than EB thanks to recent breakthroughs in Bayesian computation such as the No-U-Turn Sampler (NUTS), which relies on Hamiltonian Monte Carlo (Hoffman & Gelman, 2014). This study used HB because it is better equipped to deal with large samples and high-dimensional models (i.e., large data sets with many parameters) (Betancourt & Girolami, 2015). The ENSANUT 2021 has a relatively small sample size and Maximum Likelihood could be used to fit the models. However, estimation errors could be significantly reduced by fitting more complex models (random intercepts and slopes), albeit with an increase in computational challenges. Furthermore, the Bayesian workflow mirrors well the different aspects involved in small-area estimation in that it provides more clarity in model calibration, specification, and evaluation (Gelman et al., 2020).

This paper relies on an HB estimator in combination with the NUTS to produce municipal-level estimates of obesity. Drawing upon Rao and Molina (2015), the procedure consisted of the following four main steps. First, a common set of predictors (see below) of obesity was harmonised in both the survey and census data. Auxiliary data on mortality due to cardiovascular disease at both municipal and state levels were also considered along with aggregated data on salaries from the population census for both levels. Second, several hierarchical Bayesian models with Bernoulli distribution were fitted to the ENSANUT 2021 data (see final model below).

The third step was the selection of the best predictive model among the possible candidates. The chosen model fulfilled three criteria. 1) The model convergence was satisfactory under Bayesian criteria: effective sample sizes for all posterior distributions (\(>300\) bulk and tail samples) and acceptable R-hat values, \(\hat{R}<1.01\), for all the relevant parameters (Vehtari et al., 2021). These statistics mean that there is a sufficient number of samples to make robust inferences about each parameter and that the composition of the samples is unlikely to lead to biases. 2) High predictive accuracy of the regional design estimate of obesity, i.e. all the survey- (design-) based rates at the regional level had to be reproduced by the model. 3) High cross-validation results, i.e., a very high degree of in-sample and out-of-sample predictive accuracy (Vehtari et al., 2017). The chosen model predicted accurately 100% of all cases in the sample.
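To make the convergence criterion concrete, the following Python sketch computes a classical split R-hat from a matrix of posterior draws. It is an illustration under stated assumptions rather than the diagnostic used in the study: Vehtari et al. (2021) recommend a rank-normalised variant, which this simplified version omits, and the function name and simulated chains are hypothetical.

```python
import numpy as np


def split_rhat(draws):
    """Classical split R-hat for one parameter.

    draws: array of shape (n_chains, n_draws). Each chain is split in
    half so that within-chain drift also inflates the statistic.
    """
    draws = np.asarray(draws, dtype=float)
    half = draws.shape[1] // 2
    halves = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n = halves.shape[1]
    within = halves.var(axis=1, ddof=1).mean()    # W: mean within-chain variance
    between = n * halves.mean(axis=1).var(ddof=1)  # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n
    return float(np.sqrt(var_plus / within))


# Simulated chains (hypothetical): four well-mixed chains, then the same
# chains with two of them shifted so that they disagree with the others.
rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])
print(split_rhat(mixed), split_rhat(stuck))
```

Values close to 1 indicate that the chains agree with one another; the shifted chains produce a value well above the 1.01 threshold.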

The final, three-level model had random intercepts (states and municipalities) and slopes at state-level (A diagram of the model can be found in the supplementary material):

$$\left\{\begin{array}{l}{y}_{ijk}|{p}_{ijk}\stackrel{ind}{\sim }Bernoulli\left({p}_{ijk}\right),\\ logit\left({p}_{ijk}\right)={\mathbf{x}}_{ijk}^{T}\beta +{\mathbf{z}}_{jk}^{T}\eta +{\mathbf{m}}_{k}^{T}\mu +{v}_{j}+{v}_{k},\hspace{0.17em}{\sigma }_{e}^{2},\hspace{0.17em}\\ j=1,...,n,i=1,...,m,k=1,...,o\\ {v}_{j}|{\sigma }_{v}^{2}\stackrel{ind}{\sim }\mathcal{N}\left(0,{\sigma }_{v}^{2}\right),\hspace{0.17em} j=1,..,n\\ {v}_{k}|{\sigma }_{u}^{2}\stackrel{ind}{\sim }\mathcal{N}\left(0,{\sigma }_{u}^{2}\right),\hspace{0.17em} k=1,..,o\end{array}\right.$$

where \({y}_{ijk}\) is the obesity status predicted by the \(\beta\) parameters of the level-1 variables (gender, age, educational attainment, rural/urban, adults’ average educational attainment in the household, computer in the household, fridge in the household, internet in the household, household size, indigenous condition), by the \(\eta\) parameters of the level-2 variable (average salary in the municipality), and by the \(\mu\) parameters of the level-3 variable. The other municipal-level variables (mortality from cardiovascular diseases) did not have effects different from zero and, according to cross-validation, did not improve the prediction from the model. \({v}_{j}\) are the random intercepts at municipal level and \({v}_{k}\) at state level. The random slopes at state level \(k\) were age, sex, and educational attainment.

All the models were fitted using the following priors: \(\text{Normal}\left(0,10\right)\) for the parameters of the predictors and \(\text{Normal}\left(0,5\right)\) for the random effects (states and municipalities). These priors are weak in the sense that the effect of a binary variable is very rarely that high or low, i.e. it is very unlikely that the effect of gender or ethnicity is above two on the log scale. The chosen prior still assigns small probabilities to effects far above that, up to about 15. Similarly, the variance of random intercepts and slopes is often rather small for models with Bernoulli distributions, where most of the variance is accounted for by the individual-level factors.
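As a rough illustration of how wide such a prior is, the sketch below computes the prior mass that a zero-centred normal distribution places beyond a given absolute effect size. It assumes that the second argument of the prior is a standard deviation, as in Stan's parameterisation; the text does not state this explicitly, and the function name is hypothetical.

```python
import math


def normal_tail_prob(threshold, sd):
    """P(|X| > threshold) for X ~ Normal(0, sd), with sd a standard deviation."""
    return math.erfc(threshold / (sd * math.sqrt(2.0)))


# Prior mass beyond selected absolute effect sizes on the log-odds scale,
# assuming Normal(0, 10) is parameterised by its standard deviation.
for threshold in (2, 15):
    print(threshold, round(normal_tail_prob(threshold, sd=10), 3))
```

Under that assumption, most of the prior mass lies beyond an absolute effect of two, so the prior places little constraint on the range in which effects of binary predictors typically fall.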

The fourth step applied the HB estimator to the Census data. This procedure consisted of the following steps, with some variations between municipalities in the survey sample and out-of-sample units. The posterior distributions from the survey model were extracted (slopes, intercepts, random intercepts and slopes) and then applied to the Census data variables for municipalities in the sample. For out-of-sample municipalities, application of the random intercepts at the municipal level was not possible. This procedure generates a posterior distribution of the probability of obesity for each person aged 20+ in the sample of the Census. In the final step, following Rao and Molina (2015), the individual posteriors were aggregated to compute a posterior distribution (e.g. prevalence of obesity) for each municipality. The prevalence, the standard error and the coefficient of variation of the estimation for each municipality can be found in the supplementary material.
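The prediction and aggregation steps can be sketched as follows. This is a simplified illustration and not the study's implementation: it includes fixed effects and random intercepts only (the state-level random slopes are omitted), it assumes that out-of-sample municipalities receive a municipal random intercept of zero, and all names, array shapes and the simulated inputs are hypothetical.

```python
import numpy as np


def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))


def municipal_posterior(X, muni_idx, state_idx, weights, beta, v_state, v_muni):
    """Posterior draws of prevalence for each municipality.

    X: (n_people, n_pred) census predictors; muni_idx, state_idx: integer codes
    per person; weights: census sampling weights; beta: (n_draws, n_pred);
    v_state: (n_draws, n_states); v_muni: (n_draws, n_muni), with columns of
    zeros for out-of-sample municipalities (an assumption of this sketch).
    """
    eta = beta @ X.T + v_state[:, state_idx] + v_muni[:, muni_idx]
    p = inv_logit(eta)                                # (n_draws, n_people)
    n_people, n_muni = X.shape[0], v_muni.shape[1]
    membership = np.zeros((n_people, n_muni))
    membership[np.arange(n_people), muni_idx] = weights
    return (p @ membership) / membership.sum(axis=0)  # weighted mean per draw


def summarise(prevalence_draws):
    """Point estimate, posterior standard error and coefficient of variation."""
    mean = prevalence_draws.mean(axis=0)
    se = prevalence_draws.std(axis=0, ddof=1)
    return mean, se, se / mean


# Simulated inputs: six people, two municipalities, one state, 200 draws.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(6), rng.integers(0, 2, size=6)])
muni_idx = np.array([0, 0, 0, 1, 1, 1])
state_idx = np.zeros(6, dtype=int)
weights = np.array([1.0, 2.0, 1.0, 1.0, 1.0, 3.0])
beta = rng.normal(0.0, 0.3, size=(200, 2))
v_state = rng.normal(0.0, 0.2, size=(200, 1))
v_muni = np.column_stack([rng.normal(0.0, 0.2, size=200), np.zeros(200)])
draws = municipal_posterior(X, muni_idx, state_idx, weights, beta, v_state, v_muni)
print(summarise(draws))
```

Because the aggregation is carried out separately for every posterior draw, the spread of the resulting municipal prevalences carries the model uncertainty through to the standard error and coefficient of variation.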

The hierarchical Bayesian models were estimated in RStan (Carpenter et al., 2017) with the following auxiliary packages: survey (Lumley, 2020), brms (Bürkner, 2017), ggplot2 (Wickham, 2016), cmdstanr (Gabry & Češnovar, 2020), and tidybayes (Kay, 2022). Moran’s I was computed using the R package rgeoda (Anselin & Rey, 2022).

The estimated prevalence for all municipalities was used to formally assess spatial patterns via global and local Moran’s I (Anselin, 1995). The global Moran’s I judges whether there is a non-random pattern in the distribution of the phenomenon of interest, and the local Moran explores whether there are statistically identifiable clusters of municipalities with high/low prevalence; both used queen spatial weights. These weights take into consideration the neighbours of each municipality together with their common vertices, which is useful for the smaller municipalities that look more like a regular polygon. The results of the Moran’s I were mapped out to display the areas in Mexico with a high spatial correlation of obesity.
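For intuition, the global Moran’s I with queen contiguity can be sketched on a regular grid as shown below. The study computed it with the R package rgeoda on the municipal polygons; this NumPy version, its function names and the toy grid of prevalences are illustrative assumptions only, and binary (not row-standardised) weights are used for simplicity.

```python
import numpy as np


def queen_weights_grid(nrows, ncols):
    """Binary queen-contiguity matrix for a regular grid: cells sharing an
    edge or a corner (vertex) are neighbours."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < nrows and 0 <= cc < ncols:
                        W[r * ncols + c, rr * ncols + cc] = 1.0
    return W


def morans_i(x, W):
    """Global Moran's I for values x under spatial weights W."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    return float(len(z) / W.sum() * (z @ W @ z) / (z @ z))


# Toy 4 x 4 grid: high prevalence in the two western columns, low elsewhere.
prevalence = np.array([[0.40, 0.40, 0.20, 0.20]] * 4).ravel()
W = queen_weights_grid(4, 4)
print(morans_i(prevalence, W))
```

A positive value indicates that neighbouring units tend to have similar prevalence, which is the kind of clustering that the local statistic then locates on the map.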

Results

Model estimates validation

The leave-one-out cross validation (loo-cv) statistics of the hierarchical model (Table 1) indicate that prediction of out-of-sample values is likely to be accurate. Hence, it is reasonable to apply the posterior distributions of the parameters to the Census data.

Table 1 Cross validation results (loo-cv). Three-level hierarchical Bayesian model

For each of the nine regions in the ENSANUT 2021, the obesity prevalence predicted by HB was an acceptable approximation to the design (survey) estimate, with a mean squared error of 0.31% (Fig. 1).

Fig. 1

Comparison of the HB Estimator with the survey estimator for regional-level prevalence of obesity. Confidence intervals for the design-based estimator (survey)

Model correction of obesity estimates

At state level, the ENSANUT 2021 is expected to have a substantial degree of error as it was not designed to produce representative estimates at this level. The survey estimates seem to severely under- or over-estimate obesity prevalence across all 32 Mexican states, and the HB estimator appears to offer a useful correction (Fig. 2).

Fig. 2

Comparison of the HB Estimator with the survey estimator for state-level prevalence of obesity

Geographical distribution of obesity at municipal level

At municipal level, obesity tends to be more prevalent in the north of Mexico than on either the southwest coast or the Yucatán peninsula in the southeast (Fig. 3). Prevalence was highest of all in the municipalities of the north-west, although three of the main city capitals in the northwest (Hermosillo, Monterrey and Chihuahua) had lower prevalence rates relative to other municipalities in the north.

Fig. 3

Rates of obesity at municipal level, Mexico 2020. Hierarchical Bayesian estimator

There is a concentration of municipalities with the lowest rates along the southern border of Mexico, particularly in the state of Chiapas. This area is mainly rural, which accords with the perception that obesity seems to be a predominantly urban phenomenon. On the Yucatán peninsula (southeast), two of the municipalities with the fastest urban growth in Mexico (Cancún and Playa del Carmen) show very high obesity rates.

In the largest metropolitan area in Mexico (Mexico City) the less developed urban municipalities in the south and east tend to show higher obesity rates. In contrast, the more affluent municipalities (Benito Juárez, Coyoacán) have lower rates. Hence, within an urban area there may be an uneven distribution of obesity.

Figure 4 provides a formal assessment of the obesity spatial concentration patterns by means of Local Moran’s I computation. The red areas are clusters of municipalities with high prevalence rates of obesity. The blue areas, in contrast, are areas of low prevalence of obesity. Again, the municipalities in the north are characterised by high prevalence but surround two areas with low prevalence. Central west Mexico shows no concentration of either high or low obesity prevalence. Only in some municipalities in Jalisco is it possible to identify a cluster of low obesity rates. Eastern Mexico (La Huasteca) has a high concentration of municipalities with low obesity rates. The municipalities in Chiapas near the southern border have low obesity rates. This is in stark contrast to the cluster of municipalities on the Yucatán Peninsula. However, the metropolitan area of Mérida shows no clustering.

Fig. 4

Local Moran’s I of obesity. Mexican municipalities 2020

Discussion

This use of a hierarchical Bayesian estimator to produce municipal-level estimates of obesity in Mexico in 2020 has shown a clear spatial distribution pattern of high and low obesity prevalence. The concentration of obesity in northern municipalities is consistent with previous regional studies, where the northern region showed higher prevalence of obesity than the southern regions (Barquera et al., 2009, 2020; Barquera et al., 2013). However, the more detailed estimates presented in this study show that this overall pattern is more complex, and that some urban municipalities in the north, in Mexico City and in the southeast have a lower prevalence of obesity. Further exploration of the factors that lower obesity in these obesogenic regions might be needed, including access to healthier foods and to facilities for physical activity, greater health awareness and cultural norms with regard to body weight.

Socioeconomic status can influence the incidence of obesity (Clément et al., 2021; Hernández-Cordero et al., 2017; Quezada & Lozada-Tequeanes, 2015). The present study has shown a higher prevalence of obesity across most of the more affluent municipalities, mainly in the north of Mexico, with the exception of some state capitals. However, the results also show that historically poor municipalities on the Yucatán Peninsula in the southeast, characterized by recent and rapid development of large-scale tourism, are an example of the way recent investment and urban development have drastically changed the local contexts. Mexico is experiencing a nutrition transition and is at the stage where obesity is still more prevalent where socioeconomic status is high but concurrently has started to affect some lower-status municipalities (Pérez-Ferrer et al., 2018). Hence, the spatial association of undernutrition, for example, and obesity is likely to become less general and clear over time in Mexico, where municipalities might experience a mixture of obesity and undernutrition.

Lack of physical activity is another factor that has been associated with obesity in the recent literature (Kolovos et al., 2021). Those living in urban areas and with the highest socioeconomic status are more likely to fail to meet physical activity recommendations (Medina et al., 2021). No regional differences have been found across Mexico in the prevalence of physical inactivity (Medina et al., 2013). Spatial analyses could help to explore more in-depth the geographic distribution of physical inactivity in Mexico and may determine whether this is associated with the spatial distribution of obesity found in the present study.

These findings raise questions about the extent to which context may contribute to the individual likelihood of obesity and overweight. In this regard, the literature has shown that the food environment might be a contributing factor (Pineda et al., 2021; Wilkins et al., 2019). The population-based density of outlets of unhealthy foods such as fast food, convenience foods, sweets and desserts, and sugar-sweetened beverages is higher in municipalities in the north and southwest than in other regions (Ortega-Avila, 2022). This might indicate that individuals residing in these regions are more exposed to an unhealthy food environment than people in the central and southern regions.

Limitations

This paper has several limitations. The estimates are based on survey data from 2021 and census data from 2020. The estimation at municipal level is based on the known population distributions from the census, where the time reference is 2020 and not 2021. Another limitation of the study is that for some research questions, the unit of analysis (municipality) might be too broad. For example, the study of obesity in large metropolitan areas requires smaller units of analysis such as census tracts. A further limitation is that the validation with real data could be improved if the sample size of the ENSANUT were larger. For example, it would be possible to compare the state-level design estimate with the prediction from the HB estimator.

Although the study draws on a well-known indirect small-area estimator, the accuracy of the results is model dependent. The objective selection of the final model was based on leave-one-out cross-validation, and even though this criterion suggests that the final model is very likely to make accurate out-of-sample predictions, there is no full certainty that the posterior distributions obtained from the survey will make error-free predictions when using the census data. This is why, following small-area estimation standards, the paper reports both the point and variance estimates of the prevalence rates of obesity for each municipality. Hence, these must be seen and used as the most likely values given the selected model, which can always be improved with better predictions and with newer data from further rounds of the ENSANUT.

Conclusion

This study generates new and updated information about the spatial and local distribution of obesity in Mexico since it complements previous estimations that are based on more limited statistical approaches that do not fully exploit the hierarchical structure of the available data (INEGI, 2020b). The implementation and exploitation of the advantages of hierarchical Bayesian approaches is fairly recent in the study of obesity and practically new in developing countries. Hence, this study makes a methodological contribution that adds to the reliability of the estimates at local level. These estimates can thus be used by researchers interested in correlating obesity with different socio-economic, demographic, and health-centred factors (See supplementary material). The estimation process can be easily reproduced for other key variables such as undernutrition, diabetes, and hypertension, and even for smaller areas provided the data allow this.