An Open Source Spatiotemporal Model for Simulating Obesity Prevalence
Obesity may be the single most challenging example of a condition with causes and consequences at multiple levels and with multiple feedback loops among influencing factors. New approaches to modeling obesity prevalence are needed to fully understand the complex relationships between obesity and demographic, socio-economic, and environmental factors.
We describe in this paper a computer simulation project that focuses on the causes of obesity-related health disparities. In particular, our project adopts the susceptible, infected, and recovered (SIR) framework and the categorization of population into normal, overweight, obese, and extremely obese subpopulations. This project is important to public health because the fully developed computer application provides a new, more comprehensive, decision support tool for policy makers than most existing applications. The implementation of policies that effectively combat obesity would improve the health and well-being of a high percentage of the population, including both adults and children. It will also greatly reduce associated economic costs to society such as health care expenses and loss of productivity.
Because it is built entirely with open source tools, our computer application is cross-platform, which lowers the cost of disseminating it in research and education. Free access to the source code allows a broader community to incorporate additional advances and to adapt the tool to research questions with specific goals, thus facilitating collaboration across disciplines.
Keywords: Open source, Spatiotemporal model, Obesity
Obesity is an exceedingly complex public health problem with hypothesized causes at multiple interacting levels that are embedded in the very structure of society [1, 2]. This complexity appears to be the reason that most one-dimensional preventive or therapeutic interventions have not been very successful. For example, the Foresight causal map prepared by UK Government Office illustrates the inherent complexity of obesity as a public health problem . The Foresight map was built around energy balance and mammalian physiology, but the model rapidly expanded to include individual and collective physical activity, the built environment, individual and collective psychology, industrial food production, and population food consumption. Even with the expanded list of variables, obesogenic policy determinants of the relevant environments were excluded which seems to limit the validity of that approach. Obesity, per se, is only a small part of a larger public health problem that includes obesogenic policy, environments, and population characteristics. These population characteristics include unhealthy dietary habits and sedentary behavior, a high prevalence of obesity, high obesity-related morbidity and mortality, and high rates of diabetes or cardiovascular diseases among historically disadvantaged groups. Thus the obesity problem includes long-standing area disparities in health. Addressing these disparities, their spatio-temporal components, and their determinants requires new approaches.
Obesity prevalence has been predicted by using statistical models and simple dynamic models. However, these models predicted only the size of the obese population as a whole, without further dividing the population into levels of obesity. Such models over-generalized the movements of subpopulations between different levels of obesity. In addition, the simple models in the current literature (e.g., [5, 6]) often project future trends of the obese population at a geographic scale that is too coarse to reveal area disparities. Finally, to accommodate a statistical or simple dynamic modeling structure, most models omit important factors such as the death and birth rates of the population and, more importantly, lump the normal weight, overweight, obese, and extremely obese subpopulations into one.
As such, the results of statistical analysis and predictions have limited practical use in assisting policy-making process by public health districts when designing and implementing more geographically- and temporally-focused intervention programs. Auchincloss and Roux  pointed out the weaknesses of traditional epidemiologic approaches when dealing with complex multilevel data with spatio-temporal components. They noted that traditional regression-based approaches to analyzing multi-level exposures and health disparities are limited by a variety of assumptions. These assumptions include the requirements that realizations of each independent variable do not influence one another, and that there are no feedback loops to address the interactions among variables. These requirements do not fit well with the complex realities of obesogenic policy, environments, and population characteristics where dependencies and feedback loops are common.
Obesity may be the single most challenging example of a condition with causes and consequences at multiple levels and with multiple feedback loops among the causes. New approaches are needed. The principal research question of our work is: can we develop a prototype for a comprehensive simulation mechanism for estimating obesity prevalence and obesity-related disease or disparities that (1) addresses obesogenic policy, environments, and population characteristics; and (2) is calibrated against obesity-related morbidity and mortality?
Obesity studies have been, and continue to be, challenged by the need to handle temporal trends in geographic patterns and the spatial dynamics of health development. There is a pressing need for effective and efficient methods to represent and examine the coupled space-time attributes of obesity phenomena in a comparative context. Because obesity is a multi-dimensional and multi-scale phenomenon, obesity studies highlight the role of geography and the growing emphasis on space among public health practitioners. As discussed above, a space-time perspective has become increasingly relevant to our understanding of public health dynamics. To this end, we argue that an open source solution is needed to systematically integrate space and time, so that advances in this direction can be shared and promoted. Though rich conceptual frameworks have highlighted the complexity of obesity dynamics, the gap between empirical studies and theories has been widening. Hence, the most crucial step is to systematically understand obesity dynamics data in their theoretical and policy context. The availability of code and tools to support space-time data analysis is therefore vital to the adoption of such a perspective in obesity studies.
An Open Source Approach to Obesity Simulations
The prevalence of obesity among adults and children in the United States has increased dramatically in recent decades. This is a public health issue because obesity contributes to many other chronic health conditions, such as hypertension, cardiovascular disease, and type II diabetes. Increasing obesity prevalence in a region affects the life expectancy and quality of life of its residents. It also increases social costs in many ways.
The basic cause of obesity is the imbalance between the amount of energy taken in through eating and drinking and the amount of energy expended through metabolism and physical activity . To offset excessive energy intake, increased physical activity is encouraged as a way to keep energy in balance. However, energy imbalances appear to be encouraged by features of the physical, social, and economic environments. Lee et al.  found that the density of fitness centers and non-fresh food outlets are related to the prevalence of obesity, and that an analysis of smaller geographic units provides more details regarding area disparities in health than analyses carried out with larger geographic units.
Most obesity studies that have looked at the food environment have concentrated on the hypothesized effect of non-fresh food (fast food, packaged food, pre-processed food, etc.) consumption on people’s diet and public health. With today’s fast-paced lifestyles and intensive marketing, non-fresh food outlets have become an important part of people’s daily diet because of convenience, price, distance, and cultural factors. The literature in this area suggests a positive correlation between regular consumption of non-fresh food and the prevalence of obesity unless physical activity is performed regularly. That is, the more frequently one eats from non-fresh food outlets over time, the higher the chance of being obese.
A study on non-fresh food consumption and obesity among Michigan adults suggested that regular fast food consumption was higher among younger adults and men. In that study, the prevalence of obesity increased consistently with the frequency of visits to non-fresh food outlets, from 24% of those going less than once a week to 33% of those going three or more times per week. The predominant reason for choosing fast food was convenience. Another study found that youths 11–18 years old ate at non-fresh food outlets an average of twice per week, which also points to the alarming possibility of increasing obesity rates among young people.
Non-fresh food consumption has been found to be highly correlated with the prevalence of obesity. Reasons that may affect the consumption of non-fresh food are the price of the food, the walking or driving distance, and various cultural, behavioral, or environmental factors [8, 15]. In addition, marketing campaigns of non-fresh food outlets could play a significant role in the consumption of unhealthy food . If marketed well, non-fresh food outlets can attract a significant number of customers, which can later lead to increases of overweight and obese people. Most often non-fresh food outlets are unhealthy because of the way foods are cooked and the high calories per “serving”. The increased supply of non-fresh food outlets has a significant impact on obesity. Frequently eating at non-fresh food outlets is becoming an important issue in the public health literature because of the apparent health effects.
Physical activity and the distribution of fitness centers can have a significant impact on the prevalence of obesity if exercise is taken regularly . Over the last few years, there have been studies focused on the relationship between the built environment and physical activity . However, there were no other studies besides Lee et al.  that examine the relationship between distances from fitness centers and obesity rates by using small geographical units such as tracts or block group. The proximity of fitness centers could change the prevalence of overweight and obesity in some neighborhoods. A relevant study in New Zealand neighborhoods found evidence of a relationship between beach access and body mass index (BMI) and physical activities . Several other studies reported a positive association between the recreational environment and physical activity for both adults and children [18, 19]. Going to recreational centers regularly increased physical activity; therefore, lower rates of obesity and overweight can be expected in neighborhoods with sufficient access to fitness centers. Mobley et al.  found there is a lower average BMI in areas with more fitness centers. In addition, Boehmer et al.  reported that having fewer fitness centers within close proximity was associated with higher likelihood of obesity among women but not men.
Furthermore, being obese was found to be significantly associated with a perceived absence of sidewalks, unpleasant communities, a lack of interesting sites, and the presence of garbage. Several studies show that people tend to visit fitness centers more frequently when the distance between home and facilities decreases. For long-term health benefits, people should focus on improving fitness by increasing physical activity rather than relying only on diet for weight control. It should be noted, however, that although going to fitness centers may be a critical behavior, multiple factors may discourage or encourage it, such as the price of membership, geographical distance, and the time required to find a parking space.
Our review of the literature in obesity suggests that a comprehensive computation model of obesity-related disparities with extensive calibration is possible. Some basic components of the model have been developed, but key components of a comprehensive model have been omitted from prior work. Calibration is also insufficient. As far as we know, no one has developed a comprehensive model of obesity and related area disparities with extensive calibration against obesity-related morbidity and mortality. Our innovative project has scientific merit because of the breadth of the proposed model and the possible calibration of the simulation against hard outcomes including obesity-related morbidity and mortality. A strength of our approach is that it may be possible to use a multi-year sample of geocoded individual inpatient discharge data from all hospitals in a representative urban-suburban county (such as Summit County, Ohio) where the simulation will be anchored as well as a corresponding sample of geocoded death certificates, US Census data, and geocoded environmental data from Summit County Public Health, the Ohio Department of Health, and other sources. Use of real world geocoded individual health outcome data in this research project will provide more robust tests of a given modeling strategy in nearly all circumstances.
Various attempts at obesity simulation have been discussed in the obesity literature. In their review of obesity simulations, Levy et al. list two agent-based models (ABMs) and seven Markov models. Burke and Heiland’s ABM looks at the obesity epidemic in terms of food prices and social norms, while the Hammond and Epstein ABM looks at obesity in terms of the physiology of dieting and socially influenced weight changes. More recently, Auchincloss et al. modeled residential segregation, income disparities, and diet quality, while Yang et al. modeled disparities and walking behaviors in an urban setting. While these obesity simulations achieved the objective of estimating obesity prevalence in some ways, they all fell short of allowing a more detailed classification of the population (e.g., grouping populations into normal weight, overweight, obese, and extremely obese) and of allowing movements between subpopulations. Furthermore, the geographic units of these simulations are mostly too large to be of practical use in assisting policy-making processes for intervention programs.
Overall, many of the analyses we reviewed showed that obesity ratios are indeed affected by educational attainment, income level, and unemployment level (see reviews in ). In addition, obesity ratios show the expected relationships with the densities of fitness centers and non-fresh food outlets. While such relationships are all statistically significant, it is important to explore in more detail where inside the county they can be expected to be stronger or weaker. With that knowledge, area disparities in health can be taken into account when making policies to promote health and when allocating funding to different areas in the county, leading to more effective outcomes at the neighborhood level.
In terms of implementing a software tool for simulating obesity prevalence, we argue that both space and time are critical components. A spatial turn in socioeconomic theory has been noted in many disciplines, encompassing both social and physical phenomena [29, 30, 31]. This intellectual and technological change has yielded important insights in the physical sciences, social sciences, and humanities, with an explosion of interest across disciplines. Over the past several decades, considerable effort has gone into the development and implementation of spatial statistical analysis packages, which continues to be an active area of research. Meanwhile, spatial public health analysis is increasingly supported by advanced analytical methods for space-time data analysis and data visualization. Interactive spatial data analysis has motivated, if not directly provoked, new questions about spatial public health theories. The current research therefore implements these methodological advances in an open source environment for exploring data that have both temporal and spatial dimensions, in support of the notion that space and time cannot be meaningfully separated.
The fast growth of spatial public health analysis is increasingly seen as attributable to the availability of spatio-temporal datasets. By contrast, most public health geographers have been slow to adopt and implement new spatially explicit methods of data analysis due to the lack of extensible software packages, which becomes a major impediment to promoting spatial thinking in public health studies.
ABM is not new to public health inequality studies, but an open source solution would better support the scientific investigation and management of data sets, including their description, representation, analysis, visualization, and simulation. Additionally, comparative space-time analysis opens up a broader way of thinking about the role of space at different stages and thus helps identify research gaps and opportunities for more in-depth study.
Obesity Prevalence Simulator: A Case Study of Summit County, Ohio
Timely and rigorous analysis of obesity will open up a rich empirical context for the social sciences and policy interventions. The Obesity Prevalence Simulator (ObPSim) was developed in the Python programming language with funding provided by the Summit County Public Health District of Summit County, Ohio. Python is a versatile language that is free to acquire, install, and use. Python is also cross-platform, which means that a Python script written on one operating system can generally be run on others. In addition, many libraries that process GIS and other forms of data have been developed and are freely available. This allows existing code to be improved and updated easily. The open source environment offers a straightforward way of benefiting the wider community.
While Lee et al.  used Summit County, Ohio as a case study because of the availability of key data and the project’s funding, their findings may be applicable to many other geographic locations since demographic and socio-economic profiles in this area are very close to the national average in the US.
The objective of the study reported here is to model the multiple known parameters associated with changes in body mass index (BMI) classes and to establish the conditions under which obesity prevalence will plateau. Following Thomas et al. [4], we adopt a system of differential equations that predicts population-wide obesity prevalence trends. The equation system is complex but logical and practical. Interested readers can find the full set of equations in Thomas et al. [4].
The model considers both social and non-social influences on weight gain, incorporates other known parameters affecting obesity trends, and allows for country specific population growth. With 2011 data from American Community Survey (Census Bureau, 2011) and the 2008–2013 BMI data from the Bureau of Motor Vehicles, Summit County has 452 census block groups with a wide spectrum of obesity ratios (ranging from 16 per 1000 population to 549 per 1000 population) and overweight ratios (ranging from 32 per 1000 population to 541 per 1000 population).
Normal weight (S_T),
Overweight (1_T),
Obese (2_T),
Extremely Obese (3_T),
Exposed (E_T, or S_T ➔ 1_T), and
Recovered (R_T, or 1_T ➔ S_T).
α1(1_T ➔ 2_T),
α2(2_T ➔ 3_T),
β1(3_T ➔ 2_T),
β2(2_T ➔ 1_T),
ϒ1(S_T ➔ 1_T), and
ϒ2(1_T ➔ S_T).
Total population at time 0 (TotalPopulation) = S_T + 1_T + 2_T + 3_T + E_T + R_T
The exposed subpopulation (E_T) consists of individuals who are exposed to social or non-social influences that lead to weight gain; these individuals will eventually become overweight.
The recovered subpopulation (R_T) consists of individuals who lose weight under social or non-social influences.
Social interactions between compartments are governed by the law of mass action and are modeled by multiplying the population numbers in each class.
Estimated subpopulations at time 1 can be derived as solutions for α1, α2, β1, β2, ϒ1, and ϒ2 from a set of differential equations, as shown in Thomas et al.
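The transitions listed above can be illustrated with a minimal Python sketch. It is not the equation system of Thomas et al. and not the ObPSim source code: it assumes simple linear flows between the four weight classes, treats the exposed and recovered groups as flows (S_T ➔ 1_T and 1_T ➔ S_T) rather than as separate compartments, ignores births and deaths, and adds one mass-action term for social influence. The names `Compartments`, `euler_step`, `social`, and `dt`, as well as the numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compartments:
    s: float   # normal weight, S_T
    c1: float  # overweight, 1_T
    c2: float  # obese, 2_T
    c3: float  # extremely obese, 3_T

    def total(self) -> float:
        return self.s + self.c1 + self.c2 + self.c3


def euler_step(pop, alpha1, alpha2, beta1, beta2, gamma1, gamma2,
               social=0.0, dt=1.0):
    """Advance the compartments by one explicit Euler step (assumed scheme)."""
    n = pop.total()
    # Mass action: contacts between normal-weight and overweight people,
    # scaled by the total population so the term stays a rate per person.
    contact = social * pop.s * pop.c1 / n if n > 0 else 0.0
    s_to_1 = (gamma1 * pop.s + contact) * dt   # S_T -> 1_T
    one_to_s = gamma2 * pop.c1 * dt            # 1_T -> S_T
    one_to_2 = alpha1 * pop.c1 * dt            # 1_T -> 2_T
    two_to_3 = alpha2 * pop.c2 * dt            # 2_T -> 3_T
    three_to_2 = beta1 * pop.c3 * dt           # 3_T -> 2_T
    two_to_1 = beta2 * pop.c2 * dt             # 2_T -> 1_T
    return Compartments(
        s=pop.s - s_to_1 + one_to_s,
        c1=pop.c1 + s_to_1 - one_to_s - one_to_2 + two_to_1,
        c2=pop.c2 + one_to_2 - two_to_1 - two_to_3 + three_to_2,
        c3=pop.c3 + two_to_3 - three_to_2,
    )


# Placeholder values for one hypothetical neighborhood of 1000 people.
start = Compartments(s=430.0, c1=320.0, c2=220.0, c3=30.0)
after_one_year = euler_step(start, alpha1=0.08, alpha2=0.014, beta1=0.05,
                            beta2=0.03, gamma1=0.14, gamma2=0.033, social=0.4)
```

Because every outflow from one class is an inflow to another, this simplified step keeps the total population constant; a fuller model would add birth and death terms.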
For the purpose of modeling and simulations, initial values for model parameters are estimated from publications in the obesity literature:
The probability of being born in an obesogenic environment is set to 0.55, the proportion of females of reproductive age who are overweight or obese, based on Balcan et al.
The birth rate is set to 0.0144, based on Jacobson et al. (2007).
Baseline prevalence rates are set to 0.32 for overweight, 0.22 for obese, and 0.03 for extremely obese, based on Flegal et al.
Social influence is set to 0.4 for the overweight subpopulation and 0.2 for the obese subpopulation; both values are based on fitting to initial trends as discussed in Flegal et al.
The spontaneous rates of weight gain to each class are set to 0.05 (exposed), 0.14 (overweight), 0.08 (obese), and 0.014 (extremely obese), also based on Flegal et al.
The rates of weight loss are set to 0.05 (extremely obese to obese), 0.03 (obese to overweight), and 0.033 (overweight to normal weight), also based on Flegal et al.
The rate of weight regainers transitioning from normal weight to overweight is set to 0.04, also based on Flegal et al.
The death rate of the obese and extremely obese populations is set to vary between 16.5 and 22 per 1000 population, as suggested by Oizumi.
S_T: the number of people in each neighborhood who are in the normal weight range (BMI <= 25)
1_T: the number of people in each neighborhood who are considered overweight (25 < BMI <= 30)
2_T: the number of people in each neighborhood who are considered obese (30 < BMI <= 40)
3_T: the number of people in each neighborhood who are considered extremely obese (BMI > 40)
E_T: the number of people in each neighborhood who are exposed to the possibility of changing from normal weight to overweight
R_T: the number of people in each neighborhood who may lose weight and so return from overweight to normal weight.
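The BMI-based classes defined above can be expressed as a short Python sketch that counts the people of one neighborhood by class. It assumes that the overweight class starts just above a BMI of 25, so that the classes do not overlap. The function names `bmi_class` and `count_by_class` and the sample BMI values are hypothetical and are not taken from ObPSim.

```python
from collections import Counter

CLASS_LABELS = ("S_T", "1_T", "2_T", "3_T")


def bmi_class(bmi: float) -> str:
    """Map a BMI value to one of the four weight classes."""
    if bmi <= 25:
        return "S_T"   # normal weight
    if bmi <= 30:
        return "1_T"   # overweight
    if bmi <= 40:
        return "2_T"   # obese
    return "3_T"       # extremely obese


def count_by_class(bmis):
    """Count the people in one neighborhood by weight class."""
    counts = Counter(bmi_class(b) for b in bmis)
    # Report every class, including those with no members.
    return {label: counts.get(label, 0) for label in CLASS_LABELS}


# Made-up BMI records for a single neighborhood.
neighborhood = count_by_class([22.0, 25.0, 27.5, 30.0, 33.1, 41.0])
```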
It should be noted that estimations for E_T and R_T with the above regression are provided here purely for the purpose of demonstrating the usage of ObPSim. Additional studies and analysis may be needed in order to derive better or more precise estimates.
between S_T and the density of non-fresh food outlets in each neighborhood for estimating E_T and
between 1_T and the distance to the nearest fitness centers from the neighborhood center for estimating R_T.
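The text does not give the regression specification, so the following Python sketch only illustrates the general idea under stated assumptions: ordinary least squares with a single predictor (for example, the density of non-fresh food outlets) and a single response (for example, the share of the normal-weight population treated as exposed). The function names `fit_line` and `predict`, the choice of response variable, and all data values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Evaluate the fitted line at a new predictor value."""
    return intercept + slope * x_new


# Made-up data: outlet density per neighborhood and an assumed exposed share.
outlet_density = [0.0, 1.0, 2.0, 3.0]
exposed_share = [0.02, 0.04, 0.06, 0.08]
intercept, slope = fit_line(outlet_density, exposed_share)
estimate = predict(intercept, slope, 2.5)
```

The same two functions could be reused for the second relationship, with distance to the nearest fitness center as the predictor.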
A simulation control panel, entitled Simulation, shows the simulated year, the model parameters, and the Update button.
Observe the spatial distribution of obesity prevalence at any given year.
Observe the changes in each neighborhood’s obesity prevalence over time.
Observe the spatio-temporal patterns by neighborhoods by changing one or more parameter values.
Each round of simulation will generate an output file.
The concept of exploratory space-time data analysis is strongly associated with visualization because graphical presentation enables the analyst to explore the structure of the data set with an open mind and gain new insights. Shneiderman [37] argues that exploratory data analysis can be generalized as a three-step process: “overview first, zoom and filter and then details-on-demand”. More importantly, this process should be iterative, and the methods implemented in the current research address that challenge. To explain the observed patterns and trends, follow-up research is needed on collecting determinants of economic growth.
The last, but most important, step in using ObPSim to investigate spatio-temporal changes in obesity prevalence is the calibration of the model. If (and when) actual data are available for simulated years, it is possible to run the simulations retroactively for a target year and then calibrate the model parameters against the actual data. For example, one can first simulate obesity prevalence in 2012 by using 2000 data and then calibrate the model with actual 2012 data. Such calibration would help to derive a set of parameter values that brings the simulated results closest to the actual values in 2012. Understandably, the calibration process can be tedious and repetitive. It is, however, a necessary step in ensuring that simulations are meaningful and applicable.
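The calibration idea described above can be sketched in Python as a simple grid search. This is an illustration only, not the procedure used in ObPSim: it assumes a toy one-compartment prevalence model, a single parameter to calibrate, and a squared-error criterion. The names `simulate_prevalence` and `calibrate_gain_rate`, the candidate values, and the "observed" value (generated synthetically here so that the example is self-checking) are all hypothetical.

```python
def simulate_prevalence(p0, gain_rate, loss_rate, years):
    """Toy model: yearly update of the obese share of a population."""
    prevalence = p0
    for _ in range(years):
        prevalence += gain_rate * (1.0 - prevalence) - loss_rate * prevalence
    return prevalence


def calibrate_gain_rate(p0, observed, years, loss_rate, candidates):
    """Pick the candidate rate whose simulated value is closest to observed."""
    def squared_error(rate):
        simulated = simulate_prevalence(p0, rate, loss_rate, years)
        return (simulated - observed) ** 2
    return min(candidates, key=squared_error)


# Simulate 12 years forward (e.g. 2000 to 2012) and compare with a target.
candidates = [0.01, 0.02, 0.03, 0.04, 0.05]
synthetic_observed = simulate_prevalence(0.22, 0.03, 0.03, 12)
best = calibrate_gain_rate(0.22, synthetic_observed, 12, 0.03, candidates)
```

With real data, the observed value would come from the actual target-year records, and several parameters would typically be searched jointly.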
This paper explores the potential for the new open source tool to function in obesity studies. In other words, the current work is mainly from an exploratory perspective, which can motivate scholars to design a series of analysis questions and formulate new hypotheses from theoretical and policy perspectives. This space-time work provides an important contribution to the current literature, which lacks in comparative space-time studies. Although this comparative study stems from the analysis of obesity dynamics, it broadly aims to analyze the role of geography and location in public health phenomena. In addition, the methods are built in open source environments and thus easily extensible and customizable.
Obesity is an exceedingly complex public health problem with hypothesized causes at multiple interacting levels that are embedded in the very structure of society. This complexity appears to be the reason that one-dimensional preventive or therapeutic interventions are not very successful. Traditional epidemiologic approaches fail to address complex, multilevel data with spatial components. Their simplifying assumptions do not fit well with the complex realities of obesogenic policy, environments, and population characteristics, where dependencies and feedback loops are common. Hence, the reported research extends traditional regression-based approaches to multi-level exposures through a system of differential equations. This project also integrates spatial components, the influence among realizations of each independent variable, and feedback loops between outcomes and independent variables.
Given this, new approaches are needed to fully understand the complexities associated with obesity. ObPSim developed in this project is a new, more comprehensive, decision support tool for policy makers. The implementation of policies that effectively combat obesity would improve the health and well-being of a high percentage of the population, including both adults and children, as well as greatly reducing associated economic costs to society such as obesity-related health care expenses and loss of productivity. Based on the susceptible, infected, and recovered (SIR) framework, ObPSim is featured by categorizing the population into subpopulations of normal weight, overweight, obese, and extremely obese. Furthermore, ObPSim allows population to be moved between subpopulations. Such movements can be defined by any reasoning from the various physical environments, food environment, built environment, and socio-economic environments of the neighborhoods.
Beyond categorizing a population into subpopulations and allowing people to move between them, ObPSim also allows users to set a suite of model parameters for estimating future obesity prevalence. These parameters affect how the estimates are calculated. Because they can be defined by local conditions, they also allow the simulations to be executed with spatial variation. Finally, ObPSim provides a means of studying obesity prevalence at a very fine geographic scale. By using census block groups as neighborhoods, ObPSim goes beyond the conventional approach of studying obesity prevalence at the scale of census tracts. The additional detail revealed by using smaller geographic units allows us to better understand the spatial patterns and processes of obesity prevalence.
Beyond the scope of this project, studies that compare how simulated obesity prevalence levels react to different values of the model’s parameters would be valuable. By holding all parameters fixed except one, changes in the estimated obesity prevalence patterns can be related to changes in that particular parameter. If desired, multiple parameters can be allowed to change simultaneously so that their combined effect on obesity prevalence can be observed. This paper thus demonstrates one way to interface public health analysis with the open source revolution, as part of the burgeoning efforts at cross-fertilization between the two fast-growing communities.
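A one-parameter-at-a-time comparison of this kind could be organized as in the following Python sketch. It is not part of ObPSim: the helper `one_at_a_time`, the stand-in model `toy_prevalence`, and the baseline values are hypothetical, and the stand-in model is a deliberately simple one-compartment update chosen only so that the example is self-contained.

```python
def one_at_a_time(model, baseline, name, values):
    """Vary one parameter while holding all others at their baseline."""
    results = []
    for value in values:
        params = dict(baseline)   # copy so the baseline is left untouched
        params[name] = value
        results.append((value, model(**params)))
    return results


def toy_prevalence(p0, gain_rate, loss_rate, years):
    """Stand-in model: yearly update of the obese share of a population."""
    prevalence = p0
    for _ in range(years):
        prevalence += gain_rate * (1.0 - prevalence) - loss_rate * prevalence
    return prevalence


baseline = {"p0": 0.22, "gain_rate": 0.03, "loss_rate": 0.03, "years": 10}
sweep = one_at_a_time(toy_prevalence, baseline, "gain_rate",
                      [0.01, 0.03, 0.05])
```

Each entry of `sweep` pairs a tested parameter value with the resulting simulated prevalence, which can then be mapped or plotted by neighborhood.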
The ObPSim package is entirely open source, which can promote collaboration among researchers who want to improve current functions or add extensions to address specific research questions. Based on the strength of scientific visualization techniques, this paper stresses the need to study the space-time dimension underlying obesity data sets. Finally, a new interactive tool is suggested and demonstrated as providing an explanatory framework for space-time data. On this basis, the sincere hope here is that this dialogue between public health scholars and geographers will embrace the real world challenges of inequality issues.
This work is partially supported by the National Science Foundation under Grant No. 1416509, project titled “Spatiotemporal Modeling of Human Dynamics Across Social Media and Social Networks”. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.
- 3. Butland B, Jebb S, Kopelman P, McPherson K, Thomas S, Mardell J, Parry V (2012) Tackling obesities: future choices: project report, 2nd edn. Foresight, United Kingdom Government Office for Science, London
- 4. Thomas DM, Weedermann M, Fuemmeler BF, Martin CK, Dhurandhar NV, Bredlau C, Heymsfield SB, Ravussin E, Bouchard C (2013) Dynamic model predicting overweight, obesity, and extreme obesity prevalence trends. Obesity. doi:10.1002/oby.20520
- 8. Anderson B, Rafferty AP, Lyon-Callo S, Fussman C, Imes G (2011) Fast-food consumption and obesity among Michigan adults. Prev Chronic Dis 8(4):A71
- 21. Boehmer TK, Hoehner CM, Despande AD, Brennan Ramirez LK, Brownson RC (2007) Perceived and observed neighborhood indicators of obesity among urban adults. Int J Obes (Lond) 97(3):486–492
- 23. Lee CD, Blair SN, Jackson AS (1999) Cardiorespiratory fitness, body composition, and all-cause and cardiovascular disease mortality in men. Am J Clin Nutr 69:373–380
- 26. Hammond R, Epstein J (2007) Exploring price-independent mechanisms in the obesity epidemic. Center on Social and Economic Dynamics Working Paper
- 29. Goodchild MF, Glennon A (2008) Representation and computation of geographic dynamics. In: Hornsby KS, Yuan M (eds) Understanding dynamics of geographic domains. CRC, Boca Raton, FL, pp 13–30
- 37. Shneiderman B (1996) The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings of the 1996 IEEE Symposium on Visual Languages. IEEE, pp 336–343