Introduction

The geography of disease mapping in Africa dates back to 1951, following publication of an atlas of diseases in Africa after the Second World War (Simmons et al. 1951). Attempts to eradicate malaria in Africa started in the mid-1950s during the Global Malaria Eradication Programme (GMEP) era. In the 1950s and 1960s, many African colonial governments developed crude national-level maps of malaria based on ecological zones and seasonality as part of GMEP planning, for example, in Kenya (Butler 1959), Madagascar (Joncour 1956), Senegal (Lariviere et al. 1961), and Uganda (Mccrae 1975). The failure of the GMEP in Africa led to the resurgence of malaria through the 1970s and 1980s. Efforts to describe the geographical extent of malaria in Africa were resurrected in the 1990s. In 1996, the Mapping Malaria Risk in Africa/Atlas du Risque de la Malaria en Afrique (MARA/ARMA), a collaboration between African research institutes, started to assemble data on malaria prevalence in sub-Saharan Africa (SSA) for use in malaria cartography (Snow et al. 1996; Le Sueur et al. 1997). Advances in computation and Geographic Information Systems (GIS) between the mid-1990s and 2000s have aided the development of robust malaria cartography, including statistical description at national and sub-regional levels independently (Craig et al. 2007; Gemperli et al. 2006; Kazembe 2007; Noor et al. 2008, 2009, 2013b, 2014), and through the inception of the Malaria Atlas Project in the mid-2000s (Hay et al. 2004, 2008; Snow et al. 2005; Hay and Snow 2006; Guerra et al. 2007; Snow 2014).

Prevalence and incidence are two common indices that are now used frequently in the mapping of malaria (Macdonald 1950, 1957; Ray and Beljaev 1984). These indices provide epidemiological evidence of the spatial distribution of disease in the population. Prevalence represents the number of cases or infections at a given time (a cross-sectional measure), while incidence represents the number of new cases arising over a specified period in the population (a dynamic measure) (Fig. 3.1). Prevalence is usually stated as a rate (i.e. per fixed number in the population), while incidence is commonly expressed as the number of cases per 1000 population per year (Swaroop et al. 1966; Pull 1972). There are many reasons for describing the geography of these two metrics. Maps are useful tools for visualising the extent of a public health problem and for planning interventions. Maps can also be used as measurement tools to assess the impact of public health investments, providing evidence on the success or failure of health interventions (Hay et al. 2013).

Fig. 3.1 Differences between prevalence and incidence of disease. Each horizontal red bar represents a case, with the length of the bar illustrating the duration of illness, e.g., fever
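The distinction between the two indices can be made concrete with a small calculation. The following is a minimal sketch on entirely hypothetical illness episodes; the function names, dates and population size are assumptions made for illustration, and the incidence calculation uses the whole population as the denominator rather than person-time at risk.

```python
from datetime import date

# Hypothetical illness episodes (onset, recovery) in a population of 200 people.
# All values are illustrative and do not come from any survey.
episodes = [
    (date(2011, 12, 20), date(2012, 1, 10)),  # onset before the observation year
    (date(2012, 3, 1), date(2012, 3, 15)),
    (date(2012, 6, 25), date(2012, 7, 5)),
    (date(2012, 6, 28), date(2012, 7, 20)),
]
population = 200


def point_prevalence(episodes, population, on_day):
    """Proportion of the population that is ill on a single day (cross-sectional)."""
    ill = sum(1 for onset, recovery in episodes if onset <= on_day <= recovery)
    return ill / population


def incidence_per_1000(episodes, population, start, end):
    """New cases with onset between start and end, per 1000 population (dynamic)."""
    new_cases = sum(1 for onset, _ in episodes if start <= onset <= end)
    return 1000 * new_cases / population


prev = point_prevalence(episodes, population, date(2012, 7, 1))
inc = incidence_per_1000(episodes, population, date(2012, 1, 1), date(2012, 12, 31))
print(f"Point prevalence on 1 July 2012: {prev:.3f}")
print(f"Incidence in 2012: {inc:.1f} per 1000 population per year")
```

The episode that began in December 2011 can contribute to prevalence while it lasts, but it is not counted as a new case in 2012, which is the difference the figure illustrates.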

Since the 1990s, with advances in computation and software, maps of malaria prevalence and incidence have become increasingly available at global and national scales. These maps, however, are presented with varying degrees of precision due to the wide variety of approaches used in their production. Variation in the cartographic description of prevalence and incidence in sub-Saharan Africa is also driven by the quality and quantity of data available. For malaria, countries with good surveillance systems utilise routine data without the requirement for modelling, e.g. Comoros and Sao Tome and Principe (Alegana et al. 2020). However, where routine data are of poor quality, modelling is required to adjust for the use of health services, inconsistent data reporting, and climatic drivers of transmission. As a result, complex statistical modelling schemes have been developed for mapping disease (Diggle et al. 1998; Giorgi et al. 2015).

Model-based geostatistical (MBG) approaches combined with environmental variables (predictors) that support dynamic transmission and incidence are now commonly used to produce gridded estimates at fine spatial resolution (Diggle et al. 1998). The advantage of MBG methods is the ability to harness the spatial and temporal dependencies in the observed data and environmental predictors. MBG also estimates the uncertainty associated with the predicted maps, which are often defined in space and time. In practice, the class of generalised linear mixed models is used to connect the observed data to environmental predictors (Dalrymple et al. 2015; Alegana et al. 2016). The precision and accuracy of predictions can be evaluated via internal model parameters, via exceedance probabilities (Giorgi et al. 2018), and by comparison with out-of-sample data. One source of uncertainty in passive surveillance systems, such as Health Management Information Systems (HMIS), is variation in the use of the health sector by the population.
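As an illustration of how an exceedance probability summarises predictive uncertainty, the sketch below computes, for each map cell, the share of posterior samples that lie above a chosen threshold. The samples are simulated from arbitrary gamma distributions purely for demonstration, and the function name and the threshold of 5 cases per 1000 are assumptions rather than values taken from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated stand-in for posterior samples of incidence (cases per 1000) in three
# grid cells: 4000 draws per cell, with means of roughly 2, 5 and 20.
samples = rng.gamma(shape=[2.0, 5.0, 20.0], scale=1.0, size=(4000, 3))


def exceedance_probability(samples, threshold):
    """Fraction of posterior draws above the threshold, computed per grid cell."""
    return (samples > threshold).mean(axis=0)


ep = exceedance_probability(samples, threshold=5.0)
for cell, p in enumerate(ep):
    print(f"cell {cell}: P(incidence > 5 per 1000) = {p:.2f}")
```

A cell whose exceedance probability is close to 0 or 1 can be classified with confidence, whereas values near 0.5 flag locations where the data do not settle the question.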

Several further issues impact our ability to describe the geography of disease burden. Firstly, as prevalence declines, increasingly large samples are required at the community level. Disease biomarkers are included in surveys conducted every 3–4 years. The precision of various biomarkers in these community surveys therefore varies and may not always be optimal for monitoring and evaluation (Alegana et al. 2017). For malaria, as prevalence declines in sub-Saharan Africa, surveillance through a combination of active case detection (ACD) and passive case detection (PCD) is now part of the Global Technical Strategy (WHO 2015). This method is currently used in Swaziland and a few other countries in southern Africa (Hsiang et al. 2012; Dlamini et al. 2018). Reactive case detection (RACD) is also used during epidemics (Sturrock et al. 2013). In practice, ACD is labour intensive and is hampered by the high costs of tracking cases in the population. PCD should ideally complement the ACD approach. However, most data from PCD are unreliable and incomplete (Githinji et al. 2017). Moreover, some case data reported through this system are based on clinical examination rather than parasitology. With declining burden, the ability to identify symptomatic and asymptomatic infections is critical for control and pre-elimination programmes.

This chapter reports on data and methodological advances in disease mapping, along with the advantages of using routine data. This contribution has important implications for future research on malaria, in line with a declining burden in traditionally high-burden countries as well as in low-transmission settings. Furthermore, we emphasise that routine surveillance remains the foundation for gathering evidence, tracking progress, identifying areas for rapid response and promoting the use of data for decision making.

Disease Cartography from Routine Surveillance Systems

Role of Surveillance for Geographies of Disease

Surveillance started in the 1950s as part of the GMEP and was used as a means of preventing the re-emergence of disease (World Health Organization 2012a). According to the WHO, surveillance included the identification of infections, investigation, elimination of transmission and prevention as well as cure. Surveillance is a recommended intervention for tracking disease burden and for targeting interventions. Two broad areas are concerned with determining the incidence of disease: the identification of cases (PCD) and the elimination of the identified cases.

Introduction to Using PCD for Mapping

Innovative approaches now exist to harness PCD and, thus, complement ACD, which has not yet been adopted by much of sub-Saharan Africa. Firstly, to properly utilise routine data, there is a need to establish the denominator, i.e. the population covered by the health system (the catchment population). Methods now exist for capturing the febrile population using the healthcare system and combining this with fine spatial resolution population maps (Tatem 2014) to estimate catchment populations. Secondly, alongside improvements in HMIS data, for example, through District Health Information Systems (DHIS 2) (Karuri et al. 2014; Dehnavieh et al. 2018), statistical techniques can be used to model the spatial and temporal variability in incidence while at the same time accounting for the rate of health facility utilisation and incompleteness (Alegana et al. 2013). Such approaches incorporate ecological or environmental drivers to predict risk in receptive areas while at the same time quantifying the uncertainty associated with disease predictions (Noor et al. 2012, 2013a).

Overcoming Barriers in Mapping Using HMIS Data

Health facility data serve as indicators of the disease epidemiology amongst the populations they serve. As surveillance centres, health facilities are better barometers of changing disease landscapes than modelled snapshots of prevalence.

Despite methodological advances, HMIS data have been previously ignored for burden estimation because of incomplete reporting and variation in the population using public health sectors across sub-Saharan Africa (Battle et al. 2016; Alegana et al. 2018). This implies that cases recorded at the health facility often indicate only the ‘tip of the iceberg’ of the actual burden. This variation in utilisation potentially introduces a bias in the estimation of disease burden. In addition, weak health systems in relation to the quality and quantity of data have in the past contributed to a general lack of confidence in the use of health facility data in sub-Saharan Africa.

HMIS, however, remain an important source of data for future disease mapping for several reasons. Firstly, the spatial distribution of health facilities is usually congruent with the population distribution (Fig. 3.2). Secondly, health facility case data are often collected in an ongoing manner (e.g. daily, weekly and monthly) (Mueller et al. 2011). The implication is that the data are likely to have a temporal signal useful in identifying the seasonal dynamics of the disease. Thirdly, the catchment population of a health facility often encompasses several villages, communities and sometimes the whole administrative region (e.g. district). This implies a wider geographic coverage for a single health facility in an HMIS than for a single village in a cross-sectional prevalence survey.

Fig. 3.2 (a) Population density map of sub-Saharan Africa, (b) the spatial distribution of health facilities superimposed on the population map (Maina et al. 2019), (c) an illustration of outpatient malaria cases from administrative areas in Namibia showing variation in the number and seasonal patterns (Alegana et al. 2013). These seasonal average trends have not been adjusted based on total facility reporting rate. In this case, the Namibia case reporting rate was greater than 90% at the regional level

The Geography of the Denominator for Burden Estimation

Disease estimation based on health facility data requires a definition of the denominator (the febrile population within the health facility catchment population). Thus, using health facility data for mapping incidence requires an adjustment for healthcare use, both in the public and private sectors. Utilisation has in the past been estimated from household surveys by quantifying the probability of public or private sector use (Stekelenburg et al. 2005; Noor et al. 2006). Such an approach is potentially beneficial in identifying the population not covered by healthcare systems. In a first stage, access has been characterised in GIS by defining a distance metric (or travel time) (Apparicio and Seguin 2006; Noor et al. 2006; Apparicio et al. 2008). In a second stage, the reported rates of use at the community level are modelled as a function of travel time or distance to define a utilisation probability index (Alegana et al. 2012) (Fig. 3.3). The probabilistic estimate is useful for burden estimation because patients located far from a health facility are less likely to be treated in a formal care setting. It is then possible to estimate a population coverage indicator as well as hard catchment boundaries (e.g. for probability >40%). This approach has been used in Namibia to zone catchment areas and estimate the age-structured catchment population (Alegana et al. 2016).

Fig. 3.3 (a) An example of a malaria landscape showing transmission aspects (mosquito habitats). Often environmental suitability drives transmission, and the location of a hotspot could be far from the nearest health facility. (b) A representation of the probability of seeking treatment at health facilities. Often the probability of use within the health facility catchment area reduces with geographic distance as well as with other socio-demographic factors. (Adapted from Alegana et al. 2016)
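The two-stage idea described above, in which utilisation decays with travel time and the result is used to weight the population, can be sketched as follows. This is a minimal illustration only: the logistic form, its coefficients, the grid-cell values and the function name are all assumptions, whereas in practice the decay curve would be fitted to household survey data on treatment seeking.

```python
import numpy as np


def utilisation_probability(travel_time_min, intercept=2.0, slope=-0.03):
    """Assumed logistic decay of treatment seeking with travel time (minutes).

    The coefficients are placeholders; they are not estimates from any survey.
    """
    return 1.0 / (1.0 + np.exp(-(intercept + slope * travel_time_min)))


# Hypothetical grid cells around a single health facility.
travel_time = np.array([5.0, 30.0, 60.0, 120.0, 240.0])  # minutes to the facility
population = np.array([800, 1200, 600, 400, 150])         # people per cell

p_use = utilisation_probability(travel_time)

# Probability-weighted ("soft") catchment population.
catchment_population = float(np.sum(population * p_use))

# "Hard" catchment: cells where the probability of use exceeds 40%.
in_hard_catchment = p_use > 0.40
coverage = population[in_hard_catchment].sum() / population.sum()

print(np.round(p_use, 3))
print(f"Weighted catchment population: {catchment_population:.0f}")
print(f"Share of population inside the hard catchment: {coverage:.2f}")
```

The weighted total serves as a denominator for incidence, while the thresholded cells give a boundary that can be drawn on a map.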

Environmental Drivers of Geographical Risk

Disease burden mapping generally requires a statistical model with a suitable combination of environmental variables (covariates) to predict incidence or prevalence. Several covariates have been shown to drive disease dynamics and transmission. It is important to select a biologically plausible set of covariates related to disease based on some criterion to achieve parsimony. This is because using many covariates may result in over-fitting or introduce multicollinearity (Babyak 2004). Thus, preliminary selection of a set of covariates that best describes the response is a widely accepted exercise in statistical modelling of burden (Murtaugh 2009).
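One common screening device for multicollinearity is the variance inflation factor (VIF), which measures how well each covariate is explained by the others. The sketch below is a generic illustration on simulated covariates, not the selection procedure used in the cited studies; the variable names and the simulated relationship between rainfall and vegetation are assumptions.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF of each column of X, from regressing it on all the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        # Design matrix: an intercept plus every covariate except column j.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r_squared = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


rng = np.random.default_rng(0)
n = 500
rain = rng.normal(size=n)
ndvi = rain + 0.1 * rng.normal(size=n)  # built to be nearly collinear with rain
temp = rng.normal(size=n)               # simulated independently of the others

vif = variance_inflation_factors(np.column_stack([rain, ndvi, temp]))
print(dict(zip(["rain", "ndvi", "temp"], np.round(vif, 1))))
```

Large values for the first two covariates indicate that they carry nearly the same information, so a parsimonious model would usually retain only one of them.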

For malaria mapping, environmental variables affect the development and survival of the malaria parasite as well as the malaria vector (Molineaux et al. 1988). Examples include the monthly rate of precipitation, temperature, vegetation cover, aridity and urbanisation (Craig et al. 1999; Guerra et al. 2008). Figure 3.4 shows an example of satellite remotely sensed covariates plotted against PCD data in Eritrea. These include precipitation, minimum, maximum and mean temperature, the normalised difference vegetation index (NDVI) and the enhanced vegetation index (EVI). The vegetation indices are derived from Moderate Resolution Imaging Spectroradiometer (MODIS) sensor imagery, produced after removing heavy aerosols through atmospheric correction, eliminating shadows and clouds and correcting to bidirectional reflectance (Huete et al. 2002). The mean monthly gridded temperature estimates were downloaded from the WorldClim repository at approximately 1 km spatial resolution (0.00833° × 0.00833°). These gridded estimates were produced from long-term climate observations for the period 1950–2000, interpolated using smoothing spline algorithms. Precipitation data were obtained from the Tropical Rainfall Measuring Mission sensor (TRMM 3B43 product), which combines ground observations and satellite sensor data to generate gridded rainfall estimates at approximately 0.25° × 0.25° spatial resolution (Huffman et al. 2007). TRMM 3B43 is a gridded monthly mean product of precipitation rate in mm h−1.

Fig. 3.4 Seasonal monthly plot of the observed malaria cases (green bars) from 2010 to 2012 with the dark grey representing P. falciparum malaria cases and light grey the P. vivax cases. The magnitude for the cases is shown on the primary vertical axis. Covariates (secondary vertical axis) are plotted as dashed lines
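Because the precipitation product described above reports a mean rate in mm h−1, a unit conversion is needed before it can be compared with monthly case counts as a rainfall total. The helper below is a small sketch of that arithmetic; the function name and the example rates are hypothetical.

```python
import calendar


def monthly_total_mm(rate_mm_per_hour, year, month):
    """Convert a monthly mean precipitation rate (mm per hour) to a monthly total (mm)."""
    days_in_month = calendar.monthrange(year, month)[1]
    return rate_mm_per_hour * 24 * days_in_month


# Hypothetical mean rates for two months.
feb_2012 = monthly_total_mm(0.25, 2012, 2)  # leap-year February, 29 days
jan_2011 = monthly_total_mm(0.5, 2011, 1)   # 31 days
print(feb_2012, jan_2011)
```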

Example of Mapping PCD from HMIS

Spatial regression models are common in disease mapping (Bernardinelli et al. 1997; Clements et al. 2006; Schrödle and Held 2010, 2011). Two common approaches are the smoothing of disease rates in space using small-area estimation methods, such as conditional autoregressive (CAR) models, and interpolation via geostatistical approaches (Banerjee et al. 2004). The CAR framework involves spatial smoothing between administrative areas (e.g. districts) (Besag et al. 1991). The level of smoothing is controlled via modelling parameters. A suitable smoothing approach should take into consideration the arrangement of spatial units to yield optimal spatial variation. A general problem with this approach, however, is that the statistical outputs change when the shape or size of the geographic unit changes, which is known as the modifiable areal unit problem (MAUP). Hierarchical modelling aims to mitigate some aspects of the MAUP.
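To show how a CAR prior encodes smoothing between neighbouring areas, the sketch below builds the precision matrix of a proper CAR model for four hypothetical districts arranged in a line and evaluates the conditional mean of one district's effect given its neighbours. The proper CAR variant, the parameter values (rho, tau) and the adjacency structure are assumptions chosen for illustration and are not necessarily the specification used in the cited work.

```python
import numpy as np

# Hypothetical adjacency matrix for four districts arranged in a line: 0-1-2-3.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
D = np.diag(W.sum(axis=1))  # diagonal matrix holding each district's neighbour count


def car_precision(W, D, rho, tau):
    """Precision matrix of a proper CAR prior: tau * (D - rho * W)."""
    return tau * (D - rho * W)


def conditional_mean(phi, W, rho, i):
    """Mean of district i's effect given the others: rho times the neighbour average."""
    return rho * (W[i] @ phi) / W[i].sum()


Q = car_precision(W, D, rho=0.9, tau=1.0)
phi = np.array([0.2, -0.1, 0.4, 0.0])  # illustrative spatial random effects
m1 = conditional_mean(phi, W, rho=0.9, i=1)
print(Q)
print(f"Conditional mean for district 1: {m1:.2f}")
```

Here rho controls how strongly each district is pulled towards its neighbours and tau controls the overall precision, which corresponds to the smoothing parameters mentioned in the text. Redrawing district boundaries changes W, which is one way in which the MAUP enters such models.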

The data in Fig. 3.4 are used here to illustrate a hierarchical Bayesian model applied to smooth monthly malaria incidence at the district level. The numerator is the sum of cases recorded at the facility, including both cases confirmed through parasitological diagnosis and cases diagnosed clinically. The denominator was derived from the population-weighted catchments representing all-age febrile case risk at each health facility. For the numerator, clinically diagnosed cases are adjusted using the slide positivity rate at the facility. For the denominator, adjustment is necessary for reporting rate and health facility use (Fig. 3.5). This modelling example was conducted using facility-level data. Thus, a facility-level random effect was incorporated to allow for variation between facilities at the district level, as well as a seasonal trend. Such an approach improves smoothing and estimation. To deal with incomplete reporting, months with missing data were set to 'NA'. Random effects were incorporated at the district and regional levels. Non-linear parametric smoothing functions were used for the covariates rather than an assumption of linearity (constant) (Fahrmeir and Knorr-Held 2000) (Fig. 3.6).

Fig. 3.5 Schematic diagram showing the general modelling framework. The Test Positivity Rate (TPR) is defined based on the testing proportion at the health facility while the estimation of the catchment population is based on geographic access modelling

Fig. 3.6 An example of monthly maps of the incidence of P. falciparum per 1000 population in Eritrea using a Bayesian spatio-temporal Poisson model. Districts with low risk are classified as <5 cases per 1000 population and those with moderate risk as >5 cases per 1000 population. Data were from a 3-year time-series (2010–2012) of malaria cases from the HMIS
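Before any smoothing is applied, the numerator and denominator adjustments described in this section amount to simple arithmetic for each facility and month. The sketch below shows one plausible way to combine them into a crude incidence rate; the function names, the input values and the choice to scale the catchment population by the reporting rate are assumptions, and the calculation is not a substitute for the Bayesian model itself.

```python
def adjusted_cases(confirmed, clinical, slide_positivity_rate):
    """Confirmed cases plus the share of clinical cases expected to be true malaria."""
    return confirmed + clinical * slide_positivity_rate


def adjusted_incidence_per_1000(confirmed, clinical, slide_positivity_rate,
                                catchment_population, reporting_rate):
    """Crude incidence per 1000 after adjusting the numerator and the denominator.

    catchment_population is assumed to be already weighted by facility use;
    reporting_rate is the fraction of expected reports actually received.
    """
    cases = adjusted_cases(confirmed, clinical, slide_positivity_rate)
    effective_population = catchment_population * reporting_rate
    return 1000 * cases / effective_population


# Hypothetical monthly figures for one facility.
rate = adjusted_incidence_per_1000(
    confirmed=40, clinical=100, slide_positivity_rate=0.2,
    catchment_population=12000, reporting_rate=0.75,
)
# Threshold of 5 per 1000 follows the risk classes used in Fig. 3.6.
risk_class = "low" if rate < 5 else "moderate"
print(f"{rate:.2f} cases per 1000 ({risk_class} risk)")
```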

Discussion

Routine Surveillance for Mapping Disease Burden

This chapter aimed to highlight issues around the use of routine surveillance data in SSA, the associated challenges, and examples of methods deployed to map these data. HMIS coordinate the routine acquisition of data from health facilities (public and private) and the compilation of these data (e.g. cases) through the district, regional and national levels (Abouzahr et al. 2007; Boerma and Stansfield 2007). Such data form an integral part of healthcare delivery and are useful for planning, resource allocation and disease monitoring. In reality, however, HMIS data are often incomplete in many African countries, as outlined in this chapter, and the utilisation of health facilities is not uniform. Some of the factors contributing to low facility utilisation include the availability of health services, financial factors, geographic access and waiting times at facilities (Breman 2001). Studies carried out in Kenya suggested cost, distance and opening times as some of the main factors influencing choice and decisions to seek treatment in either the public or private sector (Chuma et al. 2010), which in turn affects the data on cases recorded at the health facility and within HMIS. Therefore, specific methods for producing disease cartography from surveillance data are necessary to smooth estimates of incidence and to adjust for sporadic reporting and utilisation by the population. These were demonstrated in this chapter, alongside accounting for environmental variables when estimating incidence. Mapping disease incidence is important to the various national health programmes for resource allocation and provides useful insights for carrying out targeted surveillance.

Geographies of Disease Burden in Low-Transmission Settings

The declining prevalence of disease presents several challenges. With low transmission, the disease tends to cluster in specific population ‘hotspots’ (Bousema et al. 2012, 2016). Traditional household surveys become challenging to implement because of the requirement for large samples, and cross-sectional surveys fail to detect short-term changes in disease prevalence (at small temporal scales). This is because cases vary temporally, being susceptible to changes in climate, ecology and population movements (Erbach-Schoenberg et al. 2016). The cartographic challenge is then to identify hotspots of transmission at fine spatial resolution based on aggregated case data observed passively or combined with active case detection. Approaches that map disease from cases aggregated at the district level and predict spatio-temporal maps at a fine spatial resolution can be used (Alegana et al. 2016) (Fig. 3.7). Such approaches improve the ability to characterise hotspots at fine spatial resolution and can be used to target resources to specific local populations. This targeting can be cost-effective where the population distribution is sparse and further surveillance can be limited to specific local areas.

Fig. 3.7 Example of a fine-resolution map of incidence for Namibia based on the data in Fig. 3.2c. Map produced only for the endemic northern regions of Namibia

Challenges and Opportunities for Cartography for Elimination

Progress in identifying symptomatic cases within the population has important implications for asymptomatic case detection. Both proactive case detection (PACD) and RACD will benefit from the improved mapping of passively detected cases at fine spatial resolution. Improvement in routine data quality is likely to enhance malaria cartography. A different challenge exists in areas where multiple malaria parasites co-infect (Cotter et al. 2013). Most of the approaches outlined for disease cartography focus on one parasite species. Nevertheless, there has been some progress in mapping other malaria parasites on the continent, such as P. vivax (Battle et al. 2019), and there is increasing evidence on the distribution of P. vivax (Twohig et al. 2019). More effective approaches need to be developed for mapping co-infections (Commons et al. 2019). The challenges posed by P. vivax are considerable due to its biological characteristics (Mueller et al. 2009). P. vivax exhibits a dormant liver stage responsible for most relapses, which occur weeks or months after an initial attack (White 2011). This complicates the detection of asymptomatic P. falciparum and P. vivax co-infections within the population and the application of suitable cartographic approaches to them.

Conclusion

The last decade has seen a transformation in Health Management Information System (HMIS) data in Africa. Two key advances are, firstly, the digitisation of data through DHIS2 and, secondly, the ability to define malaria-specific morbidity presenting to health facilities through the Test. Treat. Track (T3) initiative (World Health Organization 2012b). The potential benefit of this transformation cannot be overstated, since the data represent the entirety of the presenting cases in national public health systems in participating African countries. Moreover, most African nations now have operational digital and georeferenced HMIS, meaning that the ensemble of HMIS represents a powerful lens through which to assess the health of the people of Africa as a whole. The data are subject to some biases, most notably that the public health system is only a part of the full health system, albeit a major part, and that under-utilisation of the health system can occur at alarming rates, particularly in rural areas. Nevertheless, Bayesian statistical approaches have been developed by the authors that allow for suppression of these biases when mapping disease incidence through space and time. With appropriate Bayesian statistical handling, including the use of environmental covariates, HMIS data have great potential for monitoring the health of Africa and for targeting interventions in both space and time. They have particular utility in low-endemicity or pre-elimination settings, where the prevalence of disease is low and clustered in hotspots. In such settings, active case detection is extremely inefficient to the point of being unusable, and passive case detection, as afforded by the HMIS, can be invaluable for detecting residual or emerging hotspots. We hope that this chapter will lead to greater awareness of the potential of African HMIS and of the space-time statistical techniques that allow their proper and principled use.