Lymphatic filariasis (LF) is a mosquito-borne disease caused by the filarial worms, Wuchereria bancrofti, Brugia malayi and B. timori. Since the launch of the Global Programme to Eliminate Lymphatic Filariasis in 2000 an estimated total of 975 million people, including 198 million in sub-Saharan Africa, have benefitted from mass drug administration (MDA) programmes that deliver antifilarial drugs [1]. As a result of these efforts, it is estimated that the global prevalence of infection has decreased by 30 % and 18.2 million cases of LF morbidity have been averted [1]. At a country level, an increasing number of national LF control programmes have completed five or more rounds of MDA and, as a consequence, are conducting transmission assessment surveys (TAS) to determine whether prevalence of LF is <1 % and MDA can stop [2]. After stopping MDA, programmes will still need to conduct surveillance to ensure transmission has not re-emerged (i.e. there has been no recrudescence). This can either be achieved by periodic surveys, for example, by repeating a TAS 2–3 years after stopping MDA, or through screening of routine blood samples [3]. For the process of verification, countries will also need to conduct surveillance in areas judged to be non-endemic at the start of the programme. To help reduce costs, surveillance can be stratified according to the risk of recrudescence. This risk may be predicted from analysis of the historical, pre-intervention transmission levels, vector type and capacity and environmental and demographic factors known to influence the intrinsic sensitivity (receptivity) of transmission [46]. Recent work at the country level highlights the environmental, socio-demographic, and intervention drivers of LF and how this information can be used to stratify areas according to likelihood of transmission being interrupted or persisting [79]. Other work has used Bayesian geostatistical modelling to predict the distribution of LF at country [10] and continental [11] scales.

In this paper, we use a combination of Bayesian geostatistical and mathematical modelling to develop maps of the prevalence and transmission intensity of bancroftian filariasis in sub-Saharan Africa (SSA) prior to large-scale control. We use these maps to inform a stratified approach to LF surveillance. The work builds on recent work to develop a global atlas of LF infection [12], conducted as part of the Global Atlas of Helminth Infections project ( [13].


Data sources and data selection

Data on the prevalence of LF infection were identified from searches of the formal and informal literature and direct communication with LF control programmes. Details of the search strategies, inclusion criteria, data abstraction and geolocation procedures are provided by Cano et al. [12]. In brief, only population-based survey data based on random sampling were included, whereas data from non-random data, including surveys from hospitals, prisons, mental institutions or military facilities, were excluded. In the current analysis, only surveys conducted prior to the implementation of countrywide, population-based MDA were included. Infection prevalence was defined as either (i) the proportion of surveyed individuals with detectable microfilaraemia or (ii) the proportion of surveyed individuals with detectable antigenaemia. A table with a brief description of the surveys compiled and used in this work is provided as supplemental information (see Additional file 1).

Microfilaraemia (mf) was estimated from examination of thick blood films collected during night blood surveys to detect the presence of microfilariae using microscopy [14, 15]. It is generally assumed that the sensitivity of methods for detecting microfilaraemia is influenced by the volume of blood sampled and previous authors have adjusted prevalence estimates according to blood volume [16]. We therefore investigated this issue further but found that although estimates of infection prevalence varied according to blood volume, we were unable to derive consistent adjustment factors and therefore did not adjust prevalence estimates according to blood volume (see Additional file 1). It is also known that concentration and counting chamber methods have greater sensitivity [17] and therefore where estimates were derived using both concentration methods and thick blood smears (n = 10 surveys), we used only the thick smear data.

Wuchereria bancrofti antigenaemia was typically estimated using an immuno-chromatographic card test (ICT) [14, 18]. These tests are more sensitive than mf detection and can be conducted on blood collected at any time of the day and therefore since 2000 have been the diagnostic method of choice for mapping the distribution of LF caused by W. bancrofti. Recent work has highlighted potential cross-reactivity of the ICT test with Loa loa [19], therefore we excluded ICT-based surveys conducted in areas of L. loa transmission (n = 314), as defined by Zouré et al. [20].

Data derived from parasitological blood examinations during day time (n = 71), ELISA (n = 70), clinical examinations (n = 24) and other molecular or antibody-based tests (i.e. PCR, IFI, new rapid test) (n = 17) were excluded because of the lack of comparable data. Finally, we excluded data from Egypt since this country is no longer considered endemic for LF and has a different transmission ecology from SSA [21, 22]. Figure 1 summarises the survey data selection and Fig. 2 shows the geographical distribution of survey data by diagnostic method.

Fig. 1
figure 1

Selection of data for inclusion into the modelling, based on sample size, diagnostic method. NA: not available; EWH: eye worm history (presumptive of loiasis)

Fig. 2
figure 2

The spatial distribution of data on the prevalence of microfilaraemia (mf) (n = 1217 surveys) and antigenaemia, based on immuno-chromatographic card test (ICT) data (n = 3197 surveys). Countries defined as non-endemic by the World Organization are shown. Also shown are the spatial limits of transmission as previously defined by Cano et al. [12]

Covariate variables

There are a number of climatic, environmental and demographic factors that influence the transmission and epidemiology of LF, including temperature, humidity, elevation and population density [12, 2325]. We generated a suite of climatic and environmental covariates from surfaces interpolated from meteorological stations or remote sensing imagery, including estimates of mean, minimum and maximum temperature and precipitation at 1 km2 resolution [26], averaged long-term estimates of land surface temperature [27], elevation at 1 km2 resolution [28], and aridity index [28], averaged enhanced vegetation index for the period 2000 to 2012 [27], and land cover data [29]. Estimates of population density were obtained from the Gridded Population of the World [30] and the United Nations Environment Programme [31], which we used to classify areas as urban, peri-urban or rural areas, based on the assumption that urban extents (UE) have a population densities ≥1000 persons/km2, peri-urban >250 persons/km2 within a 15 km distance from the UE edge, and rural <250 persons/km2 and/or >15 km from the UE edge. From these datasets of population density, population growth rate for the period 1960 to 2010, as a measure of the change rate in population over this period, was also calculated. A detailed description of each covariate and source is provided in the supplemental file (Additional file 2: Table S3). Each covariate grid was resampled to a 10 km spatial resolution using bilinear interpolation for continuous surface and a majority approach for categorical data [32]. Covariate data were extracted to survey locations using ArcGIS Desktop v10.2 (Environmental Systems Research Institute Inc., Redlands CA, USA).

Vector distribution maps

LF is transmitted mainly by mosquito species belonging to the Anopheles, Culex, and Mansonia genera and to lesser extent Aedes, Coquillettidia and Ochlerotatus genera [33, 34]. Anopheles mosquitoes are the main vectors of LF through much of west and central Africa and inland east Africa, whereas Culex species are the main vectors in coastal east and southern Africa [33]. Studies have shown that the survival, parasite uptake and development of infective L3 stages and overall transmission potential varies by vector species [3537]. In order to capture these species differences, we developed maps of the distribution of each dominant LF vector species: Anopheles, Culex and Mansonia.

Maps of the distribution of mosquitoes belonging to the An. gambiae complex and An. funestus complex were obtained from the Malaria Atlas Project project ( [38, 39] and a binary map displaying the distribution of each complex was created. Maps of Culex and Mansonia mosquitoes were obtained from the VectorMap project (, which collates collection records of major vector insects and uses maximum entropy ecological niche modelling to develop occurrence maps [40, 41]. The maps present the probability of occurrence of Cx. quinquefasciatus, Cx. pipiens, Cx. univittatus, and Mansonia Africana, and we used a 90 % probability threshold to indicate dominance of a particular species according to a set of presence records obtained from the Global Biodiversity Information Facility database [42]. Each of the species-specific binary maps were combined to produce gridded maps of mosquito distribution by genus which, in turn, were combined into a single map of the different LF vector species at a 5 km spatial resolution (Fig. 3). The developed maps of Anopheles, Culex and Mansonia genera were subsequently incorporated as covariates in the goestatistical and mathematical modelling.

Fig. 3
figure 3

Predicted occurrence of the major potential vectors of lymphatic filariasis: a Anopheles, b Culex, c Mansonia, and d overlap of species

Geostatistical modelling approach

The prevalence of mf and prevalence of antigenaemia were modelled separately due to a poor correlation between outcomes and the inability to predict one from another, as detailed elsewhere [43]. Bayesian geostatistical models, which included both fixed and random effects, were used to predict the spatial distribution of each outcome. Fixed effects quantify the effects of the covariates on LF infections, whereas random effects account for unexplained spatial variation whose structure can eventually help identify anomalous areas of high or low risk that can be further investigated.

The ICT-based model accounted for environmental and demographic covariates and spatial random effects as data spanned only 15 years and no temporal trend was observed in the data. From ICT-based data, we fitted a model in which conditional on the true prevalence P(x i ) at location x i , i = 1, …, n, the number of positive results Y i out of N i individuals sampled at location x i , follows a binomial distribution Y i |P (x i ) ~ Binomial(N i ,P (x i )). The log-odds of P(x i ) is modelled as logit(P(x i )) = z i β + S(x i ) + u i , where z i  = (1, z i1, …, z ip ) denotes the vector of the intercept and the p covariates considered, and β = (β0, β1, . . ., βp) is the coefficient vector. S(x i ) is a zero-mean Gaussian process with Matérn covariance function that represents any residual spatial variation which is not explained by included covariates. The Mátern covariance function is expressed as

$$ \mathrm{C}\mathrm{o}\mathrm{v}\left(S\left({x}_i\right),S\left({x}_j\right)\right) = \frac{\upsigma^2}{2^{\upsilon -1}\Gamma \left(\upsilon \right)}{\left(k\parallel {x}_i-{\mathrm{x}}_{\mathrm{j}}\parallel \right)}^{\upsilon }{K}_{\upsilon}\left(k\parallel {x}_i-{\mathrm{x}}_{\mathrm{j}}\parallel \right), $$

where Κ υ is the modified Bessel function of second kind and order υ > 0. The integer value of υ determines the mean square differentiability of the underlying process. υ is usually fixed since typically it is poorly identified in applications. σ 2 is the marginal variance and k > 0 is a scaling parameter related to the range ρ, the distance at which the S(x i ) and S(x j ) are almost independent. In particular, we used the empirically derived definition \( \rho = \sqrt{8\upupsilon}/k \), with ρ corresponding to the distance at which the spatial correlation is approximately 0.1. We also include in the model a spatially unstructured random effect, u i , that captures the effects of unmeasured characteristics that affectall members in the survey. u i are assumed to be independent zero-mean Gaussian distributed with precision τ u .

The model for mf prevalence incorporated a temporal fixed effect to capture changes in mf prevalence over time and a spatio-temporal random effect under the assumption that fitted temporal correlations exist only with the preceding year [44]. Here, the number of positive results Y it out of N it people sampled at location x i , i = 1, …, n and time t = 1, 2, 3, follows a binomial distribution

$$ {Y}_{it}\Big|P\left({x}_i,t\right)\sim Binomial\left({N}_{it},P\left({x}_i,t\right)\right), $$
$$ logit\left(P\left({x}_i,t\right)\right)={z}_{it}\beta +\xi \left({x}_i,t\right)+{u}_{it}, $$

Here ξ(x i , t) denotes the between-location-time random effect at location x i and time t.

$$ \xi \left({x}_i,t\right)=a\xi \left({x}_i,t-1\right)+w\left({x}_i,t\right), $$

where |a| <1, and ξ(x i , 1) follows the stationary distribution of a first-order autoregressive process, namely N(0, σ 2w /(1 − a 2)). W(x i , t) is a spatial correlation term. Each w(x i , t) follows a zero-mean Gaussian distribution. The w(x i , t) is temporally independent but spatially dependent at each time t, with Matérn covariance function. u it denotes the unstructured random effect and are assumed to be independent zero-mean Gaussian distributed with precision τ u .

Variable selection and model development

We followed a model selection procedure to identify an optimal suite of covariates to include in the fixed effects part of the geostatistical models. In order to reduce any potential collinearity and confounding effects, we first grouped the variables and use a formal model selection criterion to select one variable within each of the groups (Additional file 2: Table S4). Continuous variables with an absolute value of correlation coefficient higher than 0.8, were part of the same group. Land cover categories formed another group. Within each group, we investigated the relationship between infection prevalence and each potential explanatory variable by fitting univariate generalized linear models relating the logit of infection to each of the variables (Additional file 2: Table S5). We compared the univariate models in terms of the Akaike Information Criterion (AIC) and selected the variables which had the lowest AIC in the univariate analysis. AIC is defined as

$$ \mathrm{A}\mathrm{I}\mathrm{C} = -2l\left(\widehat{\theta}\right) + 2\mathrm{k}, $$

where \( l\left(\widehat{\theta}\right) \) is the maximum log-likelihood function and k is the number of parameters. After that, we explored further simplification of the model by backward elimination of selected variables until it was no longer possible to reduce AIC by elimination of any of the remaining variables.

The age group (either below or above 15 years of age) of individuals studied on the prevalence surveys was considered at every step of the modelling process. In addition, population density was considered for the periods at which the surveys were undertaken. Since ICT-based data were available for surveys conducted from 1990 onwards, estimates of population density at 2000 were used. For mf data, in order to take into account temporal changes in population density, gridded maps of population density at 1960, 1980 and 2000 were used accordingly with the period of survey (1950–1969, 1970–1989, 1990–onwards).

The selection of covariates at this stage in the overall model-fitting process ignores spatial and temporal correlation, and as a result is likely to over-state the statistical significance of covariate effects. In the final stage of the model-fitting process we re-assess the covariate effects and their significance within a spatial mixed model for the ICT data and a spatiotemporal mixed model for the Mf data that take into account the spatial and spatiotemporal correlation, respectively.

Model validation

We assessed the predictive ability of the model using a leave-one-out cross-validation procedure. In this approach, a single observation is retained as the validation data, and the spatial model is fitted to the remaining data. Then, the observation in the validation data is predicted using the fitted model. The validation data needs to spatially represent the whole region where the prevalence is predicted. Therefore, instead of repeating the cross-validation procedure using each observation once as the validation data, we used each of the locations of a spatially representative sample of the prediction surface. To obtain a valid data set, 20 % of the observations were sampled without replacement where each observation had a probability of selection proportional to the area of the Thiessen polygon surrounding its location, that is, the area closest to the location relative to the surrounding points.

The performance of the model was assessed by comparing the observed and the predicted prevalences at each location, and by calculating the coverage probabilities of 95 % confidence interval (CI), that is, the proportion of times that the observed prevalence rates are within 95 % CIs. Specifically, we computed the correlation between the predicted and the observed prevalences, the mean prediction error (ME) defined as

$$ \mathrm{ME}=\frac{1}{\mathrm{m}}{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{m}}}\left({\mathrm{p}}^{*}\left({\mathrm{x}}_{\mathrm{i}}\right)-\mathrm{p}\left({\mathrm{x}}_{\mathrm{i}}\right)\right), $$

and the mean absolute error (MAE) defined as

$$ \mathrm{M}\mathrm{A}\mathrm{E}=\frac{1}{\mathrm{m}}{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{m}}}\left|{\mathrm{p}}^{*}\left({\mathrm{x}}_{\mathrm{i}}\right)-\mathrm{p}\left({\mathrm{x}}_{\mathrm{i}}\right)\right|, $$

where m is the number of observations in the validation data, p*(xi) is the predicted prevalence, and p(xi) is the observed prevalence at location xi, i = 1, …, m.

Implementation and spatial prediction

The models were fitted using the Integrated Nested Laplace Approximation (INLA) [45, 46] and the Stochastic Partial Differential Equation (SPDE) [47] approaches. INLA is a computationally less-intensive alternative to MCMC designed to perform fast Bayesian inference on a large class of latent Gaussian models. By using a combination of analytical approximation and numerical integration, INLA allows to fit models represented as follows,

$$ \left\langle {Y}_i|S,\theta \right\rangle \sim p\left\langle {Y}_i|{\eta}_i,\theta \right\rangle, $$
$$ {\eta}_i={\displaystyle \sum_j{c}_{ij}}{S}_j, $$
$$ \left\langle S|\theta \right\rangle \sim N\left(0,Q{\left(\theta \right)}^{-1}\right), $$
$$ \left(\theta \right)\sim p\left(\theta \right), $$

where Y denotes the observation variable that assume independence conditional on some underlying latent Gaussian field S and a vector of hyperparameters θ, and η is a linear predictor based on known covariate values c ij . The analysis of spatial point process data is possible by combining INLA and the SPDE approach. This consists in representing the continuously indexed Gaussian field S(x), as a discretely indexed Gaussian Markov random field (GMRF) by means of a basis function representation defined on a triangulation of the domain of interest. Thus,

$$ S(x)={\displaystyle \sum_{g=1}^G{\psi}_g}(x){S}_g, $$

where G is the number of vertices in the triangulation, ψ g (⋅) are piecewise polynomial basis functions on each triangle, and {S g } are zero-mean Gaussian distributed weights. The INLA and SPDE approaches are easily applied thanks to their implementation in the R-INLA software package [5].

We assigned a flat improper prior for the intercept, \( {\beta}_0\sim N\left(0,{\tau}_{\beta_0}^{-1}\right) \) with \( {\tau}_{\beta_0}=0 \), and independent vague Gaussian priors with fixed precision for all other components of the fixed effects, β i  ∼ N(0, 1/0.001), i = 1, … p. The smoothness parameter ν was considered fixed to 1 implying a continuous domain Markov field. In the spatial model, the vector of weights S = (S 1,..., S G )′ is assigned a Gaussian distribution, S ∼ N(0, Q − 1) where Q is a sparse precision matrix depending on the Matérn covariance function parameter κ and variance σ 2. In the spatiotemporal model, for ξ = (ξ 1, …, ξ n ) ', we use a Gaussian prior with zero mean and precision matrix depending on the autocorrelation hyperparameter a, k and σ 2w . To complete the models, we assign τ u a vague Gamma prior.

The model output provided the posterior distribution for each of the model parameters. We summarized these distributions to obtain the posterior mean and 95 % CI of the fixed effects and hyperparameters. Predictions of infection prevalence were provided at a spatial resolution of 10 km. Maps showing the predicted LF prevalence as well as its uncertainty were generated by summarizing the posterior distributions of the prevalence obtained at each of the prediction locations. Specifically, the predicted prevalence was represented as the posterior mean, and its uncertainty was represented using the 95 % CI. Predictions were standardized to provide estimates for entire communities (children and adults combined) and for the period 2000-onwards.

Mathematical modelling of the intensity of LF transmission

The prevalence of LF infection provides a useful metric to guide the planning of control, but provides limited insight to the dynamics of transmission. Instead the intensity of infection transmission is best quantified by the basic reproduction number, R0, which for macroparasites such as LF can be defined as the average number of female offspring per adult female worm surviving to reproduction (in the absence of density-dependence i.e. phenomena such as host immunity or worm mating probability, which accelerate or curtail the production of parasite life stages) [48]. Existing analytic expressions for R0 for LF, and helminths with similar natural history such as onchocerciasis, are defined in terms of average worm burden or microfilarial load in a community [4951], yet the majority of LF studies only present data on the prevalence of infection. In the framework published by Gambhir et al. a simple, differential equation transmission model of LF is used to estimate R0 for a given prevalence value [6]. This model assumes various density-dependent functions which alter the rates of transition between state variables in the model (for example, the rate of parasite establishment and rate of parasite fecundity). The model parameter prior distributions were defined using data from the literature and parameter posteriors were found by fitting to baseline mf prevalence data from low, moderate and high transmission settings in Tanzania [52] – settings which are representative of much of SSA.

For a setting with a particular endemic prevalence, there is an underlying bite rate and force of infection which leads to this endemic equilibrium. This parameter contributes to R0 (as explained in detail below), but can only be calculated indirectly through the method described in detail by Gambhir et al. [53] and outlined here.

The first stage in estimating R0 is to calculate the effective reproduction number, Reff in terms of the constant terms in the mathematical model and the density-dependent functions of the state variables. The effective reproductive number, Reff, is equal to the basic reproductive number when a parasite is newly introduced to a population, but as parasites become established in the population, density dependent effects, such a density dependent fecundity, can both limit and facilitate transmission, increasing or decreasing Reff. When the system is at equilibrium, the effective reproductive number is one (the number of parasites is neither increasing nor decreasing) and the parasite load in the population is at its equilibrium value. Therefore, we want to calculate the value of the biting rate parameters for which this expression equals 1 for a particular population parasite load. The expression for Reff is an implicit expression which cannot be solved analytically, and therefore has to be solved numerically for particular settings using the following method. When the function Reff is plotted against a population state variable (such as mf prevalence or community mf load), the resulting plot increases with parasite density in the population for low levels of parasite load, but for large levels of parasite load in the host population density dependent processes decrease Reff, leading to a “humped” function (Fig. 1 of Gambhir et al. [53]), i.e. as mf load (or prevalence) is varied, the function rises or falls corresponding to increasing or decreasing LF transmission. This humped function has a maximum. When this peak value of Reff is more than 1, the effective reproduction number intersects the Reff = 1 line twice, meaning that there are two equilibria [53]. The higher of these is the stable endemic equilibrium, and the lower is an unstable extinction ‘breakpoint’ [54]. Therefore in the absence of treatment, we assume that the higher point represents LF prevalence at stable endemic equilibrium (i.e. pre-intervention). Specifically, in order to fit the Reff function to the mf prevalence data, so that its upper equilibrium corresponds to the observed endemic prevalence value, we alter the mosquito-human biting rate parameter – the rate at which humans are bitten by mosquitoes – until this correspondence occurs. The model parameter values denoting a given endemic prevalence value, are used in Gambhir et al., equation 6 [53] to estimate the value of R 0 :

$$ {R}_0=\frac{\lambda {f}_1(0){f}_2(0)\alpha {f}_3(0)\beta {f}_4(0)}{\mu \delta \sigma } $$

where λ, α and β are the ‘immigration’ rates of each of the life stages (adult worms, microfilariae, and L3 infective larvae respectively; λ is a composite parameter, incorporating the mosquito-human biting rate which is estimated using the methodology describe above; μ, δ and σ are the death rates of each of the life stages respectively; and f 1(I), f 2(W), f 3(W), f 4(M) represent the modifying density-dependent functions acting on each of the respective parasite life-stage intensities (W, the number of adult worms per definitive host, M, the mf load per host, refer to parasite life stages within the definitive host population, whereas the L3 infective larval load per mosquito (larvae develop through L1 and L2 stages but only become infective once they reach the L3 stage), L, refers to the parasite life-stage within the vector host. The host immunity variable, I, increases in magnitude at a rate equal to the adult worm burden but can also decay over time, allowing the model to capture the waning of immunity). Note that for R 0, host immunity levels and worm burdens are assumed to be zero as this expression calculates the number of offspring in a wholly susceptible population. The above mathematical model was initially fitted separately for settings where either Anopheles or Culex species are the predominant vector, based on the develop vector distribution maps (Fig. 3). However, the relationship between the mf prevalence and R0 is practically identical for Anopheles or Culex species, as the model assumes similar mosquito life expectancy for each species, and therefore we present a single relationship between prevalence of mf and R0. Finally, the threshold mf prevalence value above which LF transmission can persist, which is higher than the prevalence at which R 0 = 1 due to the density dependencies assumed in the modelling framework [53], was calculated.


Collated survey data

The literature search identified 1224 surveys that provide data on the prevalence of mf and 3519 surveys that provide suitable data on the prevalence of antigenaemia, as estimated by using an ICT (Fig. 2). Most of the mf data arose from surveys conducted between 1950 and1969 or from 1990 onwards, whereas the ICT data were mainly collected from 2000 onwards (Table 1). Figure 2 maps the spatial distribution of the included mf and ICT data and shows that the majority of data were available from east and west Africa.

Table 1 Summary of data on the prevalence of microfilaraemia (mf) and prevalence of antigenaemia, based on immuno-chromatographic card test (ICT) by region and time period. Median and inter-quartile range (IQR) are presented

Vector distribution maps

Figure 3 presents maps of the predicted distributions of Anopheles, Culex, and Mansonia species. Anopheles mosquitoes of gambiae and funestus complexes are widely distributed across sub-Saharan Africa (Fig. 3a). In contrast, M. africana and Culex mosquitoes show a more limited and distinct distribution: Culex mosquitoes occur in eastern Africa, east coast of Madagascar and in restricted areas of west Africa (Fig. 3b), whereas M. africana occurs in west and middle Africa, and coastal areas of east Africa and Madagascar (Fig. 3c). A wider presence of Culex mosquitoes is predicted in Nigeria, north-east of Cameroon and north of Angola. Areas where the distribution of each of the major LF mosquito vectors occur across west Africa and coastal areas of east Africa (Fig. 3d).

Spatial prediction of LF prevalence

In the spatial model for ICT prevalence, the model selection process excluded land surface temperature, slope and distance to water bodies, while in the spatiotemporal model for mf prevalence no covariates were excluded during variable selection, as based on the AIC criterion (Additional file 2: Table S5). The predicted prevalence of LF, based on mf and ICT-based data, are presented in Fig. 4a and b, respectively, along with estimates of 2.5 and 97.5 % quantiles. Overall, predicted mf prevalence is lower than predicted ICT prevalence across sub-Saharan Africa, not exceeding the 5 % threshold in most endemic areas. Only a few small pockets of mf prevalence higher than 30–40 % are predicted in east Africa, at the north of coastal Tanzania and the southeast coast of Madagascar. Broader areas of high mf prevalence are predicted in west Africa; south of Mali (Sikasso region), large central areas of Benin and north-west of Ghana (bordering with Benin) and at the south of Abuja, the federal capital of Nigeria.

Fig. 4
figure 4

Predicted geographical distribution of the prevalence of a microfilaraemia and b antigenaemia, based on a Bayesian geostatistical modelling approach for the period 1990–onwards and before the implementation of large-scale interventions. Point estimates (based on posterior median) together with lower (2.5 %) and upper (97.5 %) percentiles are presented

The ICT-based model indicate large areas of high prevalence (above 40 %) are predicted along the coast of east Africa, from southern Kenya to central regions of Mozambique, and the east coast of Madagascar. High prevalences of antigenaemia are predicted across large areas of west Africa and pockets at the north-west of the Democratic Republic of Congo (DRC) and central Congo. In general, lower (<5 %) estimates of ICT prevalence are predicted in inland areas of east Africa (Ethiopia, Rwanda, Burundi, central and south of Uganda) and middle Africa (southern and central DRC).

Validation of the spatiotemporal model for mf prevalence was based on 86 locations and yielded a correlation coefficient between predicted and observed values of 0.69 and a mean error of 1.89 % - this indicates that, on average, the model predictions overestimate the observed prevalence by 1.89 %. The mean absolute error, which illustrates the average magnitude of the prediction errors, was 4.64 % and the percentage of locations from the sample with observed prevalence falling in the 95 % CI was 70.93 % (Table 2). Validation of the spatial model for ICT-based prevalence was based on 639 locations and revealed a correlation coefficient of 0.86, mean error of 0.06 %, and mean absolute error of 6.51 %. The percentage of locations from the sample with observed prevalence falling in the 95 % CI was 52.83 % - this low percentage was due to the fact that intervals do not cover the actual values which are observed in locations where prevalence is 0. The coverage percentage increased to 77.49 % if we approximated by 0 the interval lower limits which are lower than 0.01 %.

Table 2 Validation statistics for spatio-temporal model for mf prevalence and spatial model for ICT-based prevalence

Predictive distribution of R0

The relationship between the prevalence of mf and R0 is presented in Fig. 5. Figure 6 shows a predicted map of R0 based on mf prevalence. The estimated R0 values vary from 2.7 to 30 across sub-Saharan Africa. The areas of low and moderate intensity of LF transmission (R0 value lower than 10) predominate in all endemic areas and the highest intensity is predicted in restricted areas of west Africa (Nigeria, Ghana and Burkina Faso), along the coast of east Africa, from north-east of Kenya to southern Mozambique and larger areas in central and eastern Madagascar.

Fig. 5
figure 5

The modelled relationship between prevalence of microfilaraemia (mf) and the reproduction number (R0). Also shown is the distribution of observed mf prevalence (1990–onwards) included in the mapping analysis (n = 434 surveys)

Fig. 6
figure 6

Geographical distribution of lymphatic filariasis basic reproductive number (a) and uncertainty on R0 estimates (b) prior to the large-scale implementation of interventions. Uncertainty was calculated as the range of the 95 % confidence interval in predicted R0 estimates for each pixel and rescaling to a 0–1 scale


Here we present maps of the distribution of LF microfilaraemia and antigenaemia across SSA under a pre-intervention scenario, based on Bayesian geostatistical modelling. We also present the first attempt to map geographical variation in R0, which reflects differential risks of recrudescence and can be used to inform the design of post-MDA surveillance activities.

The geostatistical maps presented here indicate lower and more focal patterns of mf compared patterns of antigenaemia. The maps also suggest a more restricted distribution of mf compared to previous mapping work by Slater and Michael [11] which suggested higher levels of infection overall and the occurrence of transmission where LF has never or only occasionally been reported. For example, the Slater and Michael model predict mf prevalences above 75 % in Sudan and South Sudan-Ethiopia border, whereas our predicted prevalence <10 % and recent surveys using ICTs found low levels of antigenaemia [5557]. Our predicted distribution of antigenaemia is also consistent with previous country-level models [8, 5860]. Uncertainty in predictions based on mf prevalence data is greatest for central Africa, primarily driven by a scarcity of mf surveys conducted since 1990. This lack of contemporary data is compensated in part by the inclusion of older data and a temporal effect in the final model.

Here we also present the first composite map of the distribution of dominant LF vector species in SSA. Anopheles mosquitoes, mostly those species belonging to the gambiae and funestus complexes, are considered the major LF vector in SSA, particularly in rural settings where LF is more prevalent [33]. Mosquitoes of the Cx. pipiens complex, especially Cx. quinquefasciatus, are suggested to be important vector of LF in eastern Africa, especially in urban and peri-urban settings [61, 62]. In west Africa, however, it is widely accepted that Anopheles species are the main vectors since Culex mosquitoes, despite being ubiquitous and potential vectors in the region [63], are considered to transmit LF poorly [6466]. Interestingly, the areas where at least two vector genera occur coincide with areas showing the highest estimates of antigenaemia and microfilaraemia. This apparent concordance makes us think that a more efficient and higher intensity of LF transmission is expected where multiple potential LF vectors occur.

Strengths and challenges in LF mapping

The presented work has a number of strengths and limitations. Our use of Bayesian model-based geostatistics provides a flexible and robust approach for spatial modelling that takes into account the spatial dependence structure of the data and provides a formal expression of uncertainty in the prediction estimates [67]. Traditionally, Bayesian predictive inference has been implemented via MCMC methods which make inference tractable for complex models but present problems in terms of convergence and computational time [68]. We overcome these issues by using the INLA approach [46] which is a computationally effective alternative to MCMC designed for latent Gaussian models. By using a combination of analytical approximation and numerical integration, INLA produces fast and accurate approximations to posterior distributions. The analysis of spatial point data is possible by combining INLA and SPDE approaches [47], whereby a continuously indexed Gaussian field is represented as a discretely indexed Gaussian Markov random field (GMRF) using a basis function representation defined on a triangulation of the domain of interest.

Intensity of LF transmission is influenced by a complexity of factors, yet our models incorporated only environmental and demographic drivers of LF transmission in pre-intervention scenarios. The models do not incorporate socioeconomic status [6972] or the coverage of LF interventions, including MDA and vector control, which will influence transmission, especially once interventions have been scaled up. The role of vector control measures in modifying patterns of LF transmission has recently been highlighted in The Gambia where the large-scale distribution of bed nets for malaria control has, in part, contributed to the elimination of LF in the country [73]. In coastal Kenya, large-scale distribution of ITNs for malaria control has been proposed to sustain reduction of LF infection levels, even after MDA is interrupted [74]. An increasing number of studies have demonstrated the impact of repeated rounds of albendazole and ivermectin alter patterns of transmission [75] and over time dramatically reduce transmission [76, 77]. A methodological challenge for incorporating intervention-related factors into future models is that much of the available data are presented at an area-level. Bayesian hierarchical models have been used to address this problem by providing a natural way to combine data from different sources taking into account the different uncertainties [7880]. However, such models have three important limitations. First, they rely on MCMC methods for Bayesian inference which are computationally intensive and may become unfeasible for large datasets. Second, few approaches for temporally misaligned data exist and they do not account for correlation in spatial random effects over time. Third, correlation studies use the predictions from several models as covariates in regression models and as such predictions contain error as the predicted values do not equal the true exposures. These important issues indicate that there is considerable scope to develop new model-based methods for the analysis of misaligned data.

The vector distribution maps are also not without their limitations. First, different methodologies were used for different genera: maps for Culex and Mansonia were constructed by using maximum entropy ecological niche (MaxEnt) modelling [40]; and boosted regression tree (BRT) modelling was used to modelling environmental suitability for Anopheles mosquitoes [38]. Second, there is a lack of data for some areas of central and northeast Africa, introducing imprecision into maps.

Finally, the presented map of R0 is based on a mathematical model of transmission dynamics [53]. The relationship between prevalence and R0 is non-linear and there are very high values of R0 for moderate to high levels of prevalence, suggesting great challenges for control. However, it should be remembered that these values are for a population when there is no immunity, unlike endemic communities, and that we do not yet have a reliable estimate for how long this immunity will last. Therefore the high values of R0 may not represent such a large challenge for control in the short term. For recrudescence or re-emergence however, they may represent a higher risk. The model uses prevalence to estimate transmission intensity because of the availability of prevalence data. Alternative measures of transmission intensity include annual transmission potential and annual biting rates [81], but these are very rarely measured. Ideally, the model would have been parameterised with data from a range of transmission settings, but there are a lack of detailed data and we recognise that transmission dynamics will vary in settings other than Tanzania from where data were used to parameterise the model.

Mapping LF to guide control and surveillance

The developed map of R0 can be used to identify areas of higher transmission intensity where additional control efforts may be required to achieve the elimination of LF infection (i.e. implementing vector control or extending MDA rounds). Our model predicts a marked geographical heterogeneity in the intensity of transmission in SSA. Current MDA strategies do not account for this spatial heterogeneity, and therefore coverage targets and implementation schemes are common for all endemic areas [3]. The precise level and duration of treatments to achieve LF elimination in different endemic regions remains unknown [82], such that it is difficult to decide when to stop ongoing MDA interventions [51]. Indeed, although some model predictions suggest that LF transmission can be interrupted by annual MDA alone, it is not clear that this can be achieved everywhere with 4–6 yearly rounds of MDA [83].

In recent years, some experts have suggested that current control programmes based on standard MDA schemes should give rise to more tailored control interventions adapted to the local specificities of LF transmission [6, 14, 84]. Our maps provide programme managers with a thorough picture of the initial level of LF endemicity and potential geographical variation in the intensity of transmission prior to the implementation of large-scale interventions. The R0 maps also provide insight into (i) the risk of persistent transmission despite repeated MDA and highlight that pockets of residual transmission may occur in settings where overall transmission has declined [77], and (ii) the possibility of recrudescence, especially where transmission is highly heterogeneous [85]. Careful use of the developed maps can guide decision makers and ultimately programme managers first to define the most appropriate control strategy and adapt it to local conditions of transmission and second to identify areas which may require special attention during monitoring and surveillance stages. How to integrate an understanding of the spatial heterogeneity of LF into the design of optimal LF surveillance is the subject of ongoing work.


The transmission of LF is spatially heterogeneous and it is essential that intervention, monitoring and surveillance strategies take this variation into account. The developed maps of mf, antigenaemia and R0 are intended to contribute to this effort. In particular, our maps can help guide control programmes in future LF surveillance activities by identifying areas of higher intensity of transmission that may require enhanced and tailored interventions and by setting a benchmark for evaluating the impact of scaling up of LF interventions on the pathway to LF elimination. Country maps displaying the results of this modelling work will be available at the website of the Global Atlas of Helminth Infection project (