Study area
Senegal is located in the western part of Africa, between latitudes 12° and 17°N and longitudes 11° and 18°W. The country is divided into 14 regions and has a total area of 196,190 km².
The Demographic and Health Survey (DHS) data
The DHS Senegal is a national household survey that provides information for monitoring the population and health situation in Senegal. The sample is based on a stratified two-stage cluster design and is drawn to be representative at the national, regional, and residence levels [15]. The survey was conducted by trained enumerators using structured questionnaires. Interviewers visited only preselected households, and no replacement of the preselected households was allowed.
We obtained the dataset of the DHS Senegal 2019, which was conducted from April to December 2019 and includes 4,538 households from 341 clusters across the country. In the DHS survey, census Enumeration Areas (EAs) generally become the survey clusters [16]. In urban areas, an EA can be a city block or apartment building, while in rural areas it is typically a village or group of villages [16]. To protect the confidentiality of respondents, cluster locations are masked in two steps. First, a cluster is assigned the geo-coordinates of the center of the sampled EA, a type of aggregation [16]. Second, the georeferenced coordinates of cluster locations are displaced by 0–2 km in urban areas and 0–5 km in rural areas, with 1% displaced by 0–10 km [16]. The displacement uses a random direction and a random distance.
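The displacement rule described above can be illustrated with a minimal sketch. It is not the DHS Program's own procedure: the uniform draw of distance, the application of the 1% extended range to rural clusters only, and the flat-earth offset formula are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random

# Maximum displacement radii in km, as stated in the text.
URBAN_MAX_KM = 2.0
RURAL_MAX_KM = 5.0
EXTENDED_MAX_KM = 10.0      # assumed here to apply to 1% of rural clusters
EARTH_RADIUS_KM = 6371.0


def displace(lat, lon, is_urban, rng):
    """Shift a point by a random distance in a random direction."""
    if is_urban:
        max_km = URBAN_MAX_KM
    elif rng.random() < 0.01:
        max_km = EXTENDED_MAX_KM
    else:
        max_km = RURAL_MAX_KM
    distance_km = rng.uniform(0.0, max_km)      # assumption: uniform distance
    bearing = rng.uniform(0.0, 2.0 * math.pi)   # random direction
    # Small-distance approximation: convert km offsets to degree offsets.
    dlat = distance_km * math.cos(bearing) / EARTH_RADIUS_KM
    dlon = distance_km * math.sin(bearing) / (
        EARTH_RADIUS_KM * math.cos(math.radians(lat))
    )
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Because the true cluster location can lie anywhere within the maximum displacement radius of the reported point, any exposure assigned to a cluster has to tolerate positional error of this size.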
In this study, we extracted the eligible samples from the DHS database, comprising 4,220 children under age 5 living in 214 survey clusters (Fig. 1). The symptoms of ARI were reported by mothers, who were asked whether their children had had short, rapid breathing that was chest-related and/or difficult breathing that was chest-related in the 2 weeks preceding the survey. The presence of symptoms was assigned the value 1 and their absence the value 0. We also identified individual-level potential confounders [2, 3, 17, 18] available in the DHS Senegal 2019 dataset: child’s age (months), area of residence (rural vs. urban), wealth index (rich, middle, or poor), and maternal education level (low vs. high). The wealth index was originally classified as poorest, poorer, middle, richer, and richest, and we re-classified it as poor, middle, and rich. The maternal education level was categorized as low if a mother did not receive any education or attended only preschool or primary school, and as high if a mother attended secondary school or higher.
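The outcome coding and re-classifications described above can be sketched as follows. The record keys and category labels are hypothetical and do not correspond to actual DHS variable names; the grouping of poorest and poorer into poor, and of richer and richest into rich, is the natural reading of the text and is assumed here.

```python
# Assumed mapping from the five original wealth categories to three.
WEALTH_RECODE = {
    "poorest": "poor",
    "poorer": "poor",
    "middle": "middle",
    "richer": "rich",
    "richest": "rich",
}

# Education levels treated as "low" according to the text.
LOW_EDUCATION = {"none", "preschool", "primary"}


def recode_child(record):
    """Return the analysis variables for one child record (hypothetical keys)."""
    # ARI symptoms: either chest-related symptom in the past 2 weeks.
    has_symptoms = (
        record["short_rapid_breathing_chest"]
        or record["difficult_breathing_chest"]
    )
    return {
        "ari": 1 if has_symptoms else 0,
        "wealth": WEALTH_RECODE[record["wealth_index"]],
        "maternal_education": (
            "low" if record["mother_education"] in LOW_EDUCATION else "high"
        ),
    }
```

For example, a child with chest-related difficult breathing, a "poorer" household, and a mother with primary schooling would be coded as `ari = 1`, `wealth = "poor"`, and `maternal_education = "low"`.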
Spatial distribution of NO2
The tropospheric NO2 concentrations were collected from the Sentinel-5 Precursor (Sentinel-5P) satellite using the Google Earth Engine API. The Sentinel-5P is operated and managed by the European Commission under the “Copernicus” program and has a spatial resolution of 7 × 3.5 km2. The satellite operates in a sun-synchronous orbit at an altitude of 824 km with an orbital cycle of 16 days. It carries the TROPOspheric Monitoring Instrument (TROPOMI), which provides near-global coverage of air pollution caused by NO2 and other pollutants such as O3, SO2, CO, CH4, CH2O, and aerosols [19]. Fig. 2 shows the spatial distribution of mean tropospheric NO2 concentrations over Senegal between April and December 2019.
In this study, we used the tropospheric NO2 concentrations collected from the Sentinel-5P as an approximation of ground-level NO2 concentrations across the country. The two quantities do not match exactly: a ground-based validation of Sentinel-5P found that the tropospheric NO2 had a negative median bias of 23% for low ground-level NO2 and 37% for high ground-level NO2 [20]. For extremely polluted sites, a negative median bias of over 50% was observed [20]. However, another study that compared satellite-based TROPOMI NO2 products with ground-based observations found a high correlation (r = 0.68) [21]. Based on the conclusions of these validation studies, it appeared reasonable to use the satellite-based tropospheric NO2 concentrations for this study.
To construct the exposure data, we matched the satellite-detected tropospheric NO2 concentrations with the cluster locations provided in the DHS dataset, both spatially and temporally. Data collection and spatial matching were carried out using the Google Earth Engine API. First, we constructed a buffer zone of 2 km around each reported DHS urban cluster location and 5 km around each reported DHS rural cluster location, and calculated the mean tropospheric NO2 level within each buffer zone. Second, the exposure data were temporally linked to each DHS respondent using the individual interview date. Because mothers reported whether a child had ARI symptoms in the 2 weeks preceding the survey, we calculated the mean tropospheric NO2 value over the 2 weeks preceding the interview date for each respondent.
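The matching logic described above can be sketched in plain Python, independent of the Google Earth Engine API. This is an illustration only: the function names are hypothetical, the input is assumed to be a series of daily buffer-mean NO2 values that has already been extracted, and treating the window as the 14 days before the interview date (excluding the interview date itself) is an assumption.

```python
from datetime import date, timedelta

# Buffer radii in metres, taken from the text (2 km urban, 5 km rural).
URBAN_BUFFER_M = 2_000
RURAL_BUFFER_M = 5_000


def buffer_radius_m(is_urban):
    """Radius of the buffer zone drawn around a reported cluster location."""
    return URBAN_BUFFER_M if is_urban else RURAL_BUFFER_M


def exposure_window(interview_date, days=14):
    """Half-open window [start, end) covering the days before the interview."""
    return interview_date - timedelta(days=days), interview_date


def mean_exposure(daily_no2, interview_date, days=14):
    """Mean of the available daily buffer-mean values inside the window.

    daily_no2 maps a date to a value, or to None when no valid
    satellite retrieval exists for that day.
    """
    start, end = exposure_window(interview_date, days)
    values = [
        v for d, v in daily_no2.items()
        if start <= d < end and v is not None
    ]
    return sum(values) / len(values) if values else None
```

Days without a valid retrieval are skipped rather than treated as zero, so a respondent with no valid observation in the window receives `None` instead of a misleadingly low exposure.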
Meteorological data
We identified mean temperature and relative humidity as additional confounders and collected them from another dataset, the Global Forecast System (GFS) 384-h predicted atmosphere data. GFS is a weather forecast model produced by the National Centers for Environmental Prediction. The 384-h forecasts, with a 3-h forecast interval, are made at 6-h temporal resolution [22]. For temperature, the values of temperature 2 m above ground were extracted, and for relative humidity, the values of relative humidity 2 m above ground were extracted, both using the Google Earth Engine API. As with the NO2 concentrations, temperature and relative humidity values were matched with the DHS respondents spatially and temporally.
Data analysis
First, we examined the sample characteristics stratified by whether children had ARI symptoms. The chi-square test was used for categorical variables and the two-sample t-test for continuous variables.
We constructed unadjusted and adjusted binary logistic regression models to estimate the crude and adjusted Odds Ratios (ORs) for the prevalence of ARI symptoms, using NO2 concentration as a continuous variable entered in units of 10 mol/m2. The ORs therefore represent the change in the odds of a child having ARI symptoms per 10 mol/m2 increase in NO2 concentration. Additionally, we used a categorical variable based on quartiles of NO2 levels to confirm the association. Confounders included child’s age, area of residence, wealth index, maternal education level, mean temperature, and mean relative humidity. Confounders were chosen based on previous literature identifying potential risk factors for ARIs in children under age 5. We also checked for multicollinearity among the independent variables included in the adjusted logistic regression model by computing the Variance Inflation Factor (VIF).
In the adjusted logistic regression model, we examined the non-linearity of the weather variables, temperature and relative humidity, because previous studies have found that these variables often show non-linear associations with health outcomes [23]. We therefore added temperature and relative humidity as polynomial terms with different degrees of freedom in the regression model, and conducted likelihood-ratio tests to compare the goodness-of-fit of the non-linear specifications. As a result, a linear term was selected for temperature and a non-linear term with two degrees of freedom was selected for relative humidity.
For the sensitivity analysis, we repeated the analysis using the same models with and without influential points of NO2 concentration, defined as values exceeding the 95th and 99th percentiles. To do so, we fitted unadjusted and adjusted models that excluded NO2 concentrations exceeding the 95th and 99th percentiles.
Statistical significance was assessed using p-values and 95% Confidence Intervals (CIs). All statistical analyses were carried out with R. This analysis was reviewed by the University of California, Berkeley Institutional Review Board and was determined not to be human subjects research because the study was based on a de-identified and anonymous dataset available for secondary analyses.
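As a minimal sketch of how these quantities relate, the helper below converts a logistic regression coefficient and its standard error into an OR with a Wald-type 95% CI for a chosen exposure increment, and a second helper computes a VIF from the R² obtained by regressing one predictor on the others. The study itself was carried out in R; this Python illustration, its function names, and the use of a Wald interval are assumptions and are not taken from the source.

```python
import math


def odds_ratio(beta, se, increment=1.0, z=1.96):
    """OR and Wald 95% CI for an `increment`-unit increase in exposure.

    beta and se are the log-odds coefficient and its standard error
    per single unit of the exposure variable.
    """
    point = math.exp(beta * increment)
    lower = math.exp((beta - z * se) * increment)
    upper = math.exp((beta + z * se) * increment)
    return point, lower, upper


def vif(r_squared):
    """Variance Inflation Factor for one predictor.

    r_squared is the R^2 from regressing that predictor on all
    other predictors in the model.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)
```

If the exposure variable is rescaled before fitting so that one unit equals the reporting increment, `increment` stays at 1 and the exponentiated coefficient is the reported OR directly. A 95% CI that excludes 1 corresponds to significance at the 5% level.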
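A likelihood-ratio test between two nested models can be sketched as follows. The helper takes the maximized log-likelihoods of the reduced and full models and the difference in their number of parameters; the closed-form chi-square tail probabilities used here cover only 1 and 2 degrees of freedom, which is enough to compare, for example, a linear term against a two-degree polynomial. The function names are hypothetical and the sketch is not the authors' R code.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """Return (statistic, p_value) for nested models.

    The statistic is 2 * (loglik_full - loglik_reduced), compared against
    a chi-square distribution with df_diff degrees of freedom.
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return statistic, chi2_sf(statistic, df_diff)
```

A small p-value indicates that the additional polynomial term improves the fit beyond what chance would explain, which supports keeping the non-linear term; otherwise the simpler specification is retained.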
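The exclusion step can be sketched as a percentile cut-off applied to the exposure variable before refitting the models. The percentile definition used below (linear interpolation between order statistics) and the record key `no2` are assumptions; other percentile definitions give slightly different cut-offs.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def exclude_above(records, q, key="no2"):
    """Keep records whose exposure does not exceed the q-th percentile."""
    cutoff = percentile([r[key] for r in records], q)
    return [r for r in records if r[key] <= cutoff]


# Usage: build the two trimmed datasets for the sensitivity models.
# trimmed_95 = exclude_above(records, 95)
# trimmed_99 = exclude_above(records, 99)
```

Refitting the same unadjusted and adjusted models on each trimmed dataset shows whether the estimated association is driven by a small number of extreme exposure values.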