Our research demonstrated the importance of public data sharing using a standardized approach for extracting, aligning, and integrating spatiotemporal data. Our curated dataset enabled the creation of dynamic maps to investigate travelling waves and persistent clusters of infection rates and conflict events during the Yemeni Civil War in 2016–2019. Movie 2 illustrated a persistent cluster of infection rates in Sana’a and Sana’a City, supported by moderate-to-strong positive autocorrelation values for a ~20-week lag period. This movie also allowed us to identify a possible travelling wave of infection from this cluster to surrounding governorates in May–August of 2017.
We found a shorter lag period of moderate-to-strong positive autocorrelation values nationally for conflict events, reflecting greater variability of conflict events in most governorates. However, Al-Hudaydah demonstrated persistent correlations across a ~40-week lag period. Movie 3 showed that Al-Hudaydah’s conflict events were greatest from June of 2018 through December of 2019. This increase coincided with assaults on Al-Hudaydah by pro-government forces, backed by the Saudi-led coalition, in June–November of 2018. These forces installed a blockade in Al-Hudaydah port, which continued to restrict both humanitarian medical supplies and food aid to the entire country through the end of our study period.
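As an illustration of the lag analysis discussed above, the following minimal sketch computes the sample autocorrelation of a weekly series and counts how many consecutive weekly lags stay above a cut-off. It is not the estimator used in this study: the function names, the 0.5 threshold standing in for "moderate-to-strong", and the 52-week maximum lag are all assumptions chosen for the example.

```python
from statistics import fmean


def autocorrelation(series, lag):
    """Sample autocorrelation of a weekly series at a given lag (in weeks)."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


def persistence_window(series, threshold=0.5, max_lag=52):
    """Count consecutive lags, starting at lag 1, with autocorrelation >= threshold.

    The threshold and max_lag defaults are illustrative assumptions.
    """
    weeks = 0
    for lag in range(1, min(max_lag, len(series) - 1) + 1):
        if autocorrelation(series, lag) < threshold:
            break
        weeks = lag
    return weeks
```

Applied to a governorate's weekly rate series, `persistence_window` would return the length of the lag period over which the chosen threshold holds; comparing that length across governorates mirrors the contrast drawn above between national conflict events and Al-Hudaydah.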
Our findings illustrate the possibility of conducting early outbreak warnings if timely surveillance data are available and accessible. These efforts can help to develop humanitarian assistance strategies amidst ongoing public health emergencies. The cholera epidemic’s origin and persistence within Sana’a and Sana’a City suggest the importance of monitoring these governorates as markers for future outbreaks. Evidence of travelling waves from this epicentre necessitates the strengthening of health and environmental infrastructure and the implementation of preventive infection mitigation strategies in surrounding governorates. These measures will reduce the likelihood of a national epidemic. Movie 3 also demonstrates the direct effect of armed conflict in specific governorates and, therefore, the challenges to implementing public health interventions. The persistent cluster of conflict events within Al-Hudaydah illustrates the extent of these challenges in complex emergency settings.
We encourage researchers to replicate dynamic mapping techniques for other data streams, such as environment- or nutrition-related variables, provided these are properly aligned and integrated with our dataset. This will allow data modellers to explore spatiotemporal associations between conflict-, environment-, or nutrition-related factors and cholera rates using our curated dataset. Many recent studies have explored factors associated with cholera transmission dynamics using granular remote sensing, climate-related spatial data, and conflict information [14, 45,46,47,48,49,50,51]. By improving data collection and processing capabilities, public health professionals will be well equipped with the data and tools to embrace a new era of precision health that prioritizes the sharing of granular temporal and spatial information and the creation of high-quality data visualizations that capture complex spatiotemporal patterns of disease outbreaks. We recommend that future research use modelling approaches that properly account for complex, non-linear, and spatially autocorrelated relationships between these variables.
Modern surveillance systems must improve to reflect both how internal data curators collect, store, monitor, and manage data and how external data users extract, process, and analyse these data. Such systems can offer near- and real-time forecasts, long-term trend analyses, and outbreak modelling to develop early outbreak warnings and inform timely aid resource deployment. Disease surveillance systems should ensure data transparency and longevity by developing strong protocols for metadata standardization. Improving data quality and availability requires greater prioritization of information management within and across national and international public health, environmental, and humanitarian emergency agencies and organizations. A lack of coordination in data collection and sharing reduces the availability of granular temporal and spatial data for public use. In turn, this forces efforts and decision-making to occur at coarser spatial and temporal scales, reducing the efficacy and refinement of public health and humanitarian interventions.
From Week 46 of 2016 to Week 12 of 2017, we distributed national daily average cholera infections across governorates according to relative population estimates. We made this approximation by assuming that cholera outbreaks followed specific population transmission dynamics with higher incidence in more densely populated locations. Additionally, we reported missing weeks only if all days within that week had a missing estimate. Though sensitive to underestimation, this approach maximized the utility of available surveillance data for conducting time series analyses in the absence of additional information to estimate weekly rates. We stress the need for national public health agencies, international health organizations, and the global health community at large to dedicate more resources and funds to implement thorough infectious disease outbreak investigations worldwide.
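The two rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used for the dataset: the function names and the example governorate labels are hypothetical, and summing the observed days of a partially reported week is an assumed reading of the weekly aggregation (it is consistent with the noted sensitivity to underestimation).

```python
def allocate_by_population(national_count, populations):
    """Split a national count across governorates in proportion to population.

    `populations` maps a governorate name to its population estimate.
    """
    total = sum(populations.values())
    if total <= 0:
        raise ValueError("total population must be positive")
    return {name: national_count * pop / total for name, pop in populations.items()}


def weekly_total(daily_counts):
    """Sum one week's daily counts, where None marks a missing day.

    The week is reported as missing (None) only if every day is missing;
    otherwise the observed days are summed, which may underestimate the total.
    """
    observed = [count for count in daily_counts if count is not None]
    if not observed:
        return None
    return sum(observed)
```

For example, `allocate_by_population(100, {"A": 3, "B": 1})` assigns 75 infections to the hypothetical governorate A and 25 to B, and a week with two reported days and five missing days is still returned as a (partial) total rather than as missing.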
ACLED methodological codebooks noted that fatalities were not easily verified and prone to manipulation by armed groups. Even so, these estimates provided the most accurate and reliable approximation of all-cause conflict fatalities during the Yemeni Civil War (neither civilian nor bystander casualties were reported). Furthermore, ACLED and YDP validated all fatality and conflict event information using a combination of health reports, news articles, field surveys, and media stories [31, 32]. While population estimates fluctuate dramatically during conflict due to rapid internal displacement and external migration, we lacked sufficient temporally granular displacement and migration data to improve our adjusted population rate calculations.
We used a 2017 population estimate calculated as the average of the WHO EMRO epidemiological bulletins (from which we extracted cholera infections), Yemeni Central Statistical Organization (in-country reporting), and the International Organization for Migration’s Displacement Tracking Matrix (monitoring migration during humanitarian emergencies) reports [38, 40, 41]. Together, these estimates provided the best approximation for governorate-level population. We prorated weekly population estimates using a low-fertility and moderate-mortality birth rate estimate and ACLED fatalities to favour under-reporting of infection rates during the Yemeni Civil War. We found no alternative calculation technique for describing population estimation during conflict events or humanitarian emergencies.
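To make the arithmetic concrete, the sketch below averages several population estimates, steps the population forward week by week, and converts case counts to rates. It is illustrative only: the weekly update rule (add prorated births, subtract reported fatalities), the per-1,000 annual birth rate parameter, the 52-week year, and the per-10,000 rate base are assumptions for the example, not the exact formulas used in this study.

```python
from statistics import fmean


def average_population(estimates):
    """Average governorate population estimates from several sources."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    return fmean(estimates)


def prorate_weekly(start_population, annual_births_per_1000, weekly_fatalities):
    """Step a population forward one week at a time (assumed update rule).

    Each week adds 1/52 of the annual births implied by the current
    population and subtracts that week's reported fatalities.
    """
    populations = []
    current = float(start_population)
    for deaths in weekly_fatalities:
        births = current * annual_births_per_1000 / 1000 / 52
        current = current + births - deaths
        populations.append(current)
    return populations


def rate_per(cases, population, per=10_000):
    """Infection rate per `per` persons (the base of 10,000 is an assumption)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * per / population
```

With a hypothetical starting population of 52,000, an annual birth rate of 10 per 1,000, and 4 fatalities in the first week, `prorate_weekly` returns 52,006 for that week; dividing weekly infections by these prorated values gives the adjusted rate.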
We used various data sources to harmonize this dataset including health reports, news articles, and field surveys. We believe these reports provided accessible, usable, and timely documentation of information related to Yemeni cholera infections and conflict-related outcomes in 2016–2019. While encouraged by this harmonization process, researchers must recognize that all estimates were only as accurate as the reports from which data were extracted. We strove for clarity and transparency in the applied methods, yet we acknowledge that metadata on the pre-processing of publicly available data from international organizations was extremely limited. Where possible, we compiled the metadata or raw text files for the extracted records used to create this dataset and uploaded files to our figshare repository.
The essence of an informative dynamic map is strong data structure and a rigorous process of compilation. In this study, we demonstrated that the curation of comprehensive global health repositories enabled the creation of dynamic maps for tracking, recognizing, and visualizing complex spatiotemporal processes. The standardization and harmonization of reporting publicly available data ensures the longevity of data usability even as the platforms used to store, analyse, and communicate data change over time. Curated global health datasets and web-based dashboards with built-in dynamic mapping tools improve the reporting and understanding of associations between diseases and manmade or natural risk factors. Both dynamic maps and the process of data extraction, aggregation, and alignment emphasize the importance of long-term surveillance data collection in usable time series data formats. Only with these tools can surveillance records and dynamic mapping be used effectively and efficiently to plan for and respond to complex emergencies with medical, fiscal, and humanitarian supplies and aid resources.
Our data curation techniques can be applied to updated cholera data when they become publicly available for the Yemeni outbreak. Researchers can also apply our data extraction, alignment, and compilation techniques to other infectious disease outbreaks worldwide, as many WHO globally monitored infections share similar epidemiological bulletin reporting formats. We encourage researchers to harmonize and integrate more global public health data streams within this dataset, especially those from the WHO and World Food Programme (WFP) [58,59,60,61,62,63]. Research has documented how environmental risk factors amplify cholera infection rates at the district level, though few studies have explored these factors in combination with other conflict- or nutrition-related risk factors. While key informant interviews have noted increased cholera morbidity and mortality in individuals with poor nutrition status, we found no studies investigating temporal relationships between cholera infection rates and risk factors related to food access, purchasing power, or insecurity [2, 3, 64]. Future research must be translational; we encourage researchers to standardize and harmonize data at granular spatial scales to inform and empower local actors to promote public health programming under constrained resource circumstances.
We reported confirmed cholera infections as weekly time series in two ways: with and without interpolating missing data. We urge the global health community to standardize the reporting of missing data with respect to the reasons for, quantity of, and locations of gaps within a time series. Metadata reports can include information describing the completeness of time series data over time and by geographic location. These reports inform data users how interpolation techniques impact time series analyses and forecasts. Our attempts at interpolating missing data demonstrated the difficulties and ambiguity of using publicly reported time-referenced data when no metadata or standardized reporting protocol exists. These concerns and difficulties also occurred when extracting and aligning conflict-related time series data.
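The kind of completeness metadata recommended above, together with one possible gap-filling step, can be sketched as follows. This is a minimal example under stated assumptions: linear interpolation is used purely for illustration (the interpolation method applied to the dataset is not specified here), and the function names and report fields are hypothetical.

```python
def interpolate_linear(series):
    """Fill interior gaps (None) by linear interpolation between known weeks.

    Leading and trailing gaps are left as None because there is no
    observation on one side to interpolate from.
    """
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        step = series[right] - series[left]
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left) / span
    return filled


def completeness_report(series):
    """Summarize how many weeks are missing and where they occur."""
    n_weeks = len(series)
    missing = [i for i, value in enumerate(series) if value is None]
    return {
        "n_weeks": n_weeks,
        "n_missing": len(missing),
        "missing_weeks": missing,
        "fraction_complete": (n_weeks - len(missing)) / n_weeks if n_weeks else 0.0,
    }
```

Publishing a report like this alongside each governorate's series, ideally with the reason for each gap, would let data users judge how much of an interpolated series rests on observed values.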