Study Region
We analyzed the distribution of monkeypox virus in Sankuru district, Kasai-Oriental Province, DRC, where the most comprehensive surveillance data are currently available. Sankuru is located in the Cuvette Centrale of the Congo River, site of one of the world’s largest rainforest tracts, between the Sankuru and Lomami Rivers (Fig. 1; Bwangoy et al., 2010). The habitat is composed of closed-canopy, semi-deciduous and evergreen forest (Fig. 2; Arino et al., 2008). Sankuru has a total area of 105,378 km² and a human population of approximately one million (Vijayaraj et al., 2007). The economy of rural Sankuru is largely based on agriculture and hunting, resulting in close contact between people and wildlife. The majority of settlements are surrounded by agricultural areas and are 3 to 5 km from the forest (Jezek and Fenner, 1988). Most protein is obtained from bushmeat, principally monkeys and rodents (Colyn et al., 1987).
Human Monkeypox Samples
Active surveillance was conducted from November 2005 to December 2007 in collaboration with the DRC Ministry of Health and local health workers. Trained field teams monitored the district’s Health Zones and examined suspected cases of human monkeypox, defined as fever followed by papulovesicular rash. For each suspected case, at least two (and in most instances, three) types of data were collected. First, biological samples (scabs, vesicle fluid, and blood) were collected from suspected cases during a physical exam. Second, an extensive questionnaire was used to collect clinical and epidemiological data, as well as information about animal exposure (data not shown). Third, for most suspected cases, we recorded the latitude and longitude of the infected person’s residence as Geographic Information Systems (GIS) data. Our analysis does not distinguish human-to-human from wildlife-to-human transmission. Samples were stored at −20°C and sent to the Bundeswehr Institute of Microbiology and the United States Army Medical Research Institute of Infectious Diseases (USAMRIID) for laboratory diagnosis following established protocols, which are published elsewhere (Rimoin et al., 2007).
Human monkeypox cases were confirmed at 156 geographic sites in Sankuru. The number of confirmed cases was greater than 156, but our GIS model required aggregating the data to the 1 km² scale. In some instances, this resulted in distinct cases being treated as a single geographic site for modeling purposes. (This study analyzes the occurrence or nonoccurrence of monkeypox, rather than the number of monkeypox cases per site.) Additionally, a small number of suspected cases lacked GIS data and therefore could not be assigned point coordinates. Thus, our analysis included 201 confirmed human monkeypox cases at 156 geographic sites.
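The aggregation step described above can be illustrated with a minimal sketch. It assumes that case locations have already been projected to planar coordinates in metres and that grid cells are aligned to the coordinate origin. The function names and example coordinates are hypothetical and are not taken from the study.

```python
import math

CELL_SIZE_M = 1000.0  # 1 km grid cells; coordinates assumed projected, in metres


def cell_of(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a point."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def occupied_cells(cases):
    """Collapse case locations to the set of grid cells holding at least one case."""
    return {cell_of(x, y) for x, y in cases}


# Hypothetical case coordinates (metres); the first two fall in the same cell.
example_cases = [(1200.0, 3400.0), (1850.0, 3990.0), (5010.0, 720.0)]
example_sites = occupied_cells(example_cases)
print(len(example_cases), "cases at", len(example_sites), "sites")
```

Because the result is a set of cells, several cases in one cell count as a single occurrence site, which matches the occurrence or nonoccurrence framing of the analysis.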
Environmental Variables
We used four types of environmental factors to predict cases of human monkeypox: climate (n = 19 variables), vegetation (n = 7), human population density, and habitat suitability for rodent reservoirs of monkeypox (n = 4). The climate variables consisted of temperature and precipitation, including both the annual mean and measures of seasonal variation (Supplementary Material Table 4, and Supplementary Material Section 1.1 and references therein). Our climate data were interpolated from weather station records. We confirmed that the number of stations did not bias the predictions by verifying that our climate data correlate with satellite-based precipitation estimates, which do not depend on weather stations on the ground (Pearson’s r = 0.74, P < 2.2 × 10⁻¹⁶). The vegetation variables represent tree density and the roughness of the canopy (Supplementary Material Table 5, and Supplementary Material Section 1.3). Sections 1.2 and 1.4 of the Supplementary Material describe the population density data and reservoir models.
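The comparison between interpolated and satellite-based precipitation rests on Pearson’s correlation coefficient. The following is a minimal sketch of that statistic, assuming the two data sets have been sampled at the same locations. The function name is hypothetical, and the sketch does not reproduce the significance test reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

In use, `xs` would hold the interpolated precipitation values and `ys` the satellite estimates at the same sites.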
Data Reduction
Our overall goal was to construct a regression model for predicting the probability of human monkeypox at each 1 km² site in Sankuru. However, the 31 environmental variables were highly correlated, which can lead to inaccurate maximum likelihood estimates of the parameters of a regression model (Aguilera et al., 2006). To address this, we converted the 31 original variables to uncorrelated principal components (hereafter “PCs,” see Fig. 3; Wilks, 1995). We decided how many PCs to retain as candidate variables for the regression analysis in two ways: by calculating the number of components that cumulatively explained at least 70% of the variance in the original variables, and by constructing a scree plot (Everitt, 2005). Both approaches gave the same result.
In particular, the PC analysis simplified the original 31 predictor variables into three uncorrelated variables that collectively explained 71.2% of the variance in the original 31 (Table 1; Supplementary Material Figs. 1 and 2). (Utilizing additional PCs that explained more than 99% of the variance did not improve the fit of the model to our data [χ² = 24.77, df = 16, P > 0.05], but such an approach has been useful in other niche modeling studies [Peterson et al., 2008].) We interpret the first PC, which explained 45% of the variance in the 31 original variables, as precipitation in lowlands that contain habitat for the two terrestrial African dormouse species. This is because the first PC assigns positive weights to lower elevation sites that have high precipitation, as well as a high probability of being in the dormouse’s ecological niche (since the habitat suitability estimates for the two dormice are roughly collinear, this PC is a measure of both species). Exploratory data analysis indicated that analyzing each dormouse separately yielded similar results (see Supplementary Material Section 1.5). The second PC (variance explained: 14.8%) represents forest density and habitat suitability for arboreal rope squirrels. Thus, PC1 and PC2 represent both habitat suitability for monkeypox reservoirs and other ecological variables, such as climate and vegetation. The third PC is primarily a measure of temperature (variance explained: 11.4%). None of the PCs that we retained assigned a large positive weight to the giant pouched rat, another terrestrial species. Thus, the subsequent analysis of the first three PCs only considers two taxa, dormice and rope squirrels. However, this does not appear to be a serious shortcoming because the pouched rat is less important for explaining human monkeypox cases than the rope squirrel (see “Ecological Determinants of Human Monkeypox Cases” in the Results section).
Although PC1 explained the most variance in the original environmental variables (45%), PC1 did not emerge as the most important predictor of monkeypox occurrences (see below).
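The cumulative-variance rule used to choose the number of PCs can be sketched as follows. The percentages are the three values reported above for the retained PCs. The function name is hypothetical, and the sketch covers only the threshold rule, not the scree plot or the PC analysis itself.

```python
def components_needed(explained_pct, threshold_pct=70.0):
    """Smallest number of leading PCs whose cumulative variance meets the threshold.

    explained_pct lists the percentage of variance explained by each PC,
    ordered from the first PC onward.
    """
    cumulative = 0.0
    for count, pct in enumerate(explained_pct, start=1):
        cumulative += pct
        if cumulative >= threshold_pct:
            return count
    raise ValueError("threshold not reached by the supplied components")


# Variance explained by PC1, PC2 and PC3, as reported in the text.
reported_pct = [45.0, 14.8, 11.4]
n_retained = components_needed(reported_pct)
print(n_retained)
```

With these values, the first two PCs together fall short of 70%, so the rule retains all three.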
Table 1. Variables Used to Predict Human Monkeypox
Statistical Models for Predicting Human Monkeypox
To ensure that our results were robust to the assumptions underlying our statistical analyses, two distinct statistical models were used to estimate the probability of human monkeypox occurrence throughout Sankuru: logistic regression and Maxent. In both models, the dependent variable was the occurrence (see “Human Monkeypox Samples” in Methods section) or nonoccurrence of human monkeypox. Following established modeling practices (Elith et al., 2006), we selected 10,000 sites at random throughout Sankuru to serve as pseudo-absences (i.e., assumed nonoccurrences). In both models, the independent variables were the PCs retained during data reduction (see below).
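The pseudo-absence step can be sketched as a random draw of grid cells without replacement. This sketch assumes one cell index per km² of the district’s reported area, which only approximates the real raster. It does not exclude occurrence cells from the draw, since the text does not say whether the authors did. All names are hypothetical.

```python
import random

N_CELLS = 105_378        # assumption: one 1 km2 cell per km2 of reported area
N_PSEUDO_ABSENCES = 10_000


def sample_pseudo_absences(n_cells=N_CELLS, n_samples=N_PSEUDO_ABSENCES, seed=0):
    """Draw distinct cell indices uniformly at random to act as pseudo-absences."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(range(n_cells), n_samples)


def build_response(occurrence_cells, pseudo_absence_cells):
    """Pair each cell with the binary response: 1 = occurrence, 0 = pseudo-absence."""
    rows = [(cell, 1) for cell in occurrence_cells]
    rows += [(cell, 0) for cell in pseudo_absence_cells]
    return rows


pseudo_absences = sample_pseudo_absences()
```

The resulting rows would form the dependent variable for either model, once joined to the predictor values for each cell.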
In the initial multivariate logistic regression model (Jewell, 2004), the independent variables were the three PCs (Table 1). Next, we tested whether using fewer than three independent variables would result in a more parsimonious model. We identified the best model using stepwise variable selection, which is a hybrid of forward and backward selection (Montgomery et al., 2006). A distinct approach, selecting the model that minimized Akaike’s information criterion (AIC), yielded the same results. We assessed the importance of the PCs by computing the Akaike weight for each one, which ranks the PCs by how important they are for predicting monkeypox.
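One common way to obtain Akaike weights is to convert the AIC of each candidate model into a relative likelihood, normalize across models, and then sum the weights of the models that contain a given variable. The sketch below follows that convention, which may differ from the authors’ exact calculation. The predictor names and AIC values are invented for illustration and are not results from this study.

```python
import math


def akaike_weights(aic_by_model):
    """Map each model to its Akaike weight, computed from AIC differences."""
    best = min(aic_by_model.values())
    # Relative likelihood of each model given the best (lowest-AIC) model.
    rel = {m: math.exp(-0.5 * (aic - best)) for m, aic in aic_by_model.items()}
    total = sum(rel.values())
    return {m: value / total for m, value in rel.items()}


def variable_importance(weights):
    """Sum the Akaike weights of every model that includes each variable."""
    importance = {}
    for model, weight in weights.items():
        for variable in model:
            importance[variable] = importance.get(variable, 0.0) + weight
    return importance


# Hypothetical models (tuples of predictor names) and made-up AIC values.
example_aic = {
    ("X1",): 1650.0,
    ("X2",): 1600.0,
    ("X1", "X2"): 1601.5,
    ("X1", "X2", "X3"): 1603.0,
}
example_weights = akaike_weights(example_aic)
example_importance = variable_importance(example_weights)
```

The model with the lowest AIC receives the largest weight, and a variable that appears in most of the well-supported models receives a summed weight close to one.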
Since logistic regression assumes that the dependent variable represents occurrences and true absences but our data consisted of monkeypox occurrences and pseudo-absences, we repeated the analysis using Maxent, which only requires occurrence and pseudo-absence data (Phillips and Dudik, 2008). We used logistic regression, in addition to Maxent, because the latter does not provide hypothesis tests to assess the significance of the predictor variables or χ² statistics that can be used for stepwise selection. Thus, we compared a method that may be more familiar to epidemiologists, and that has more developed tests of variable importance (logistic regression), to a newer ecological niche modeling technique with limited tests of variable importance (Maxent). A potential benefit of using newer ecological niche modeling methods is that they can accommodate more complex patterns in the response of a reservoir to heterogeneities in the landscape (reviewed in Peterson, 2006).