Background

Assessments of malaria control measures that target human-biting adult mosquitoes require clear understanding of how effectively they protect individual end users and how gaps in personal protection arise [1,2,3,4]. Improved understanding of behavioural interactions between humans and mosquitoes, especially where and when they overlap in time and space, is critical to estimating the impact of personal protection measures such as insecticide-treated nets (ITNs) and identification of where and when supplementary vector control tools are needed [1,2,3,4]. Similar issues are relevant to personal protection against a variety of other vector-borne pathogens, especially arboviruses such as dengue, Zika and chikungunya carried by mosquito species like Aedes aegypti that often bite people while they are active outdoors [5].

While methods for weighting estimates of human exposure to bites from Anopheles mosquitoes according to where people spend their time have been available for decades [6], they remain underutilized [1,2,3,4, 7]. Crude indoor and outdoor biting rates are still commonly misinterpreted as being indicative of human-vector contact patterns, without any adjustment for the influence of human behaviour or the personal protection effects of ITNs [1,2,3,4, 7]. Fortunately, these deficits in current common practice may be readily addressed by relatively straightforward adjustments to data collection and analysis practice [4, 7]. More accurate representations of exposure patterns can be achieved by supplementing mosquito biting activity data with complementary surveys of a small set of human behavioural variables that capture the following over the course of the night: (i) the distribution of human populations indoors and outdoors, (ii) whether they are awake or asleep and (iii) if and when they use an ITN [4, 7]. Important examples of useful indicators that can be calculated with such behaviour-weighted approaches include the proportion of vector bites occurring indoors for an unprotected individual (\({\pi }_{I}\)) and the proportion of vector bites occurring while asleep for an unprotected individual (\({\pi }_{S}\)), as well as derived terms like the proportion of all vector bites directly prevented by ITN use or the proportion occurring indoors despite ITN use [4, 7]. If surveyed, estimated and interpreted consistently, these indicators can greatly improve understanding of how malaria transmission persists despite high coverage of prevention measures such as ITNs, how exposure patterns may change as supplementary vector control tools are introduced, and the potential impacts of these new tools [1,2,3,4, 7].

The most widely used approaches for calculating behaviour-adjusted estimates of human exposure distribution deliberately use probabilistic or soft classification to allow for the considerable variability between individual people in terms of where and how they spend their time [4, 7]. The probability that any given individual will be indoors or outdoors during a given time increment is estimated as the proportion recorded as doing so through questionnaire or observational surveys of the human population [4, 7]. These probabilities are then used to weight entomological measurements of human exposure to mosquito bites occurring indoors and outdoors, yielding nuanced and representative distributions of mean exposure to biting mosquitoes across entire human populations or population groups [4]. However, these weighted averages rely on aggregating individual-level data to obtain mean human population distributions across the relevant behavioural classes for each time increment. These summary outcomes are therefore quite complex functions of the disaggregated data, so they do not match the standard binomial or count distributions to which routine off-the-shelf statistical tools may be confidently applied. Consequently, testing for variation between individuals, much of which may be associated with epidemiologically important covariates such as age, sex, occupation and housing [7,8,9,10,11,12,13,14], requires advanced Bayesian techniques that are beyond the reach of most field entomologists and epidemiologists.
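For orientation, the soft classification weighting described above can be sketched in a few lines of Python. This is a minimal illustration rather than the exact formulation used in the cited studies [4, 7]: it assumes that biting rates are measured indoors and outdoors for each time increment and that the surveyed proportion of people indoors is available for the same increments. The function name and argument names are hypothetical.

```python
def weighted_indoor_proportion(bites_indoors, bites_outdoors, prop_indoors):
    """Behaviour-weighted proportion of exposure that occurs indoors.

    Each argument is a sequence with one value per time increment
    (for example, per hour of the night):
      bites_indoors  - biting rate measured indoors
      bites_outdoors - biting rate measured outdoors
      prop_indoors   - surveyed proportion of people indoors (0 to 1)
    """
    if not (len(bites_indoors) == len(bites_outdoors) == len(prop_indoors)):
        raise ValueError("all inputs must cover the same time increments")

    # Indoor biting is weighted by the proportion of people indoors,
    # outdoor biting by the proportion of people outdoors.
    indoor = sum(b * s for b, s in zip(bites_indoors, prop_indoors))
    outdoor = sum(b * (1.0 - s) for b, s in zip(bites_outdoors, prop_indoors))

    total = indoor + outdoor
    if total == 0:
        raise ValueError("no exposure recorded in any time increment")
    return indoor / total


# Hypothetical data for four time increments (not taken from any study).
example = weighted_indoor_proportion(
    bites_indoors=[2, 10, 12, 3],
    bites_outdoors=[4, 10, 10, 5],
    prop_indoors=[0.2, 0.9, 1.0, 0.5],
)
```

Because the result is a ratio of two weighted sums rather than a count of successes out of a fixed number of trials, it illustrates why such estimates do not fit a simple binomial analysis.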

Fortunately, the proportions of exposure to mosquito bites that occur indoors or while asleep can also be estimated in a more simplified binomial fashion, based on hard classification of human location at a given time increment, as being either completely indoors or completely outdoors [15]. Such clear-cut assignment of humans to either location compartment then allows each mosquito caught attacking a person to be simplistically and unambiguously assigned to one of three categories, on the basis that it was either: (i) caught at a time and place when most people may practically protect themselves by using an ITN, (ii) caught at a time and place when people cannot practically use an ITN or (iii) caught at a time and place assumed to be irrelevant to normal exposure patterns because the majority of people are elsewhere (Fig. 1). This simplification is obviously cruder and less precise because it disregards many valid observations of mosquito-human interactions that occur in the evenings and mornings, when some people are asleep indoors while others are awake outdoors.

Fig. 1. A schematic illustration of how the proportions of human exposure to mosquitoes occurring indoors may be estimated as a simple binomial indicator based on hard classification of where human individuals and populations spend their time [16,17,18,19,20,21]

However, it does offer the advantage of allowing convenient analysis with standard logistic regression methods, to statistically estimate confidence intervals around means, compare vector species and human population groups and assess the influence of individual behaviour on exposure patterns and malaria risk [16,17,18,19,20,21] (Additional file 1: Poster S1). Also, because such hard classification techniques allow these key indicators to be calculated in simple binomial form, standard sample size estimation techniques may be readily used to estimate necessary minimum experimental scales and data collection targets required for field studies. While crude hard classification understandably appears to provide less precise estimates, no consistent trend toward under- or overestimation was obvious in a multi-country study encompassing ten different mosquito populations from across Africa [9]. Furthermore, hard and soft classification approaches have different advantages and disadvantages, so both may be included for complementary purposes in a given report [9].
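As a minimal sketch of what binomial form makes possible, the snippet below computes a confidence interval for a single proportion on the logit scale. This is the interval that an intercept-only logistic regression would yield. It assumes that every mosquito is an independent observation, so it ignores clustering by village, night or catcher, and a real analysis would normally use a full regression model in statistical software. The function name and the counts in the example are hypothetical.

```python
import math
from statistics import NormalDist


def logit_confidence_interval(successes, trials, level=0.95):
    """Return (lower, estimate, upper) for a binomial proportion.

    successes - e.g. mosquitoes classified as indoor exposure
    trials    - all mosquitoes classified as relevant exposure
    The interval is built on the logit scale and back-transformed,
    so it always stays between 0 and 1.
    """
    failures = trials - successes
    if successes <= 0 or failures <= 0:
        raise ValueError("need at least one observation in each category")

    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    logit = math.log(successes / failures)
    # Standard error of the log odds for independent binomial data.
    se = math.sqrt(1.0 / successes + 1.0 / failures)

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return expit(logit - z * se), successes / trials, expit(logit + z * se)


# Hypothetical counts: 45 of 52 relevant catches classified as indoor.
lower, estimate, upper = logit_confidence_interval(45, 52)
```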

Here, we present a brief explanation and discussion of relatively simple methodological options for applying such hard classification techniques to address common objectives in epidemiological entomology [22, 23]. We place particular emphasis on approaches that are accessible to numerate entomologists familiar with widely available, off-the-shelf analytical tools.

Simplified binomial estimation of human exposure distributions based on hard classification of where and when mosquitoes are caught

Simplified binomial estimation of the proportion of exposure of unprotected individuals to mosquito bites that occurs indoors requires hard classification of human behaviours (Box 1), assuming the night is split into distinct periods, during which all exposure is assumed to occur either entirely indoors or entirely outdoors (Fig. 1). Here, we rely on examples relevant to malaria transmission by predominantly nocturnal vectors, so we use the term night to inclusively denote all times at which such Anopheles mosquitoes are active, even if that includes daylight hours before dusk and after dawn. For more diurnal vectors of other pathogens such as dengue, chikungunya and Zika, the term day may be more appropriate and can include all 24 h of the daily cycle [15] whenever relevant.

The calculations described in Box 1 have already featured in several published applications [15,16,17,18,19,20,21] that readers may draw on as illustrative examples when adapting these techniques to their own needs. In the first of these examples [15], it was demonstrated that most human exposure to Anopheles funestus and An. quadriannulatus in southeastern Zambia occurred indoors in the absence of a protective ITN (\({\pi }_{I,u}\) ≥ 0.97), because these mosquito species are most active at times of the night when humans are indoors (\({P}_{FL,I}\) ≥ 0.90). Note, however, that if one only accounts for the behaviour of the mosquitoes, as observed attacking participants in human landing catches who artificially spend equal amounts of time indoors and outdoors for the purpose of such experiments, these two mosquito populations had no apparent preference for feeding indoors (\({P}_{I}\) ≈ 0.5) [15]. Subsequent pooled analysis of similar data from distinct sites scattered across Africa revealed similar patterns [16], confirming that the most important vectors on the continent are not innately endophagic in the strict sense but rather highly nocturnal. Therefore, it was concluded that high proportions of human exposure to malaria transmission have historically occurred indoors because that is where most people sleep at night [1, 3, 23, 25]. Further applications of such simplified binary formats for behavioural interaction indicators include demonstrating statistically significant changes in human exposure distributions following scale-up of ITNs [17, 18, 21].
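The distinction between these three indicators can be illustrated with a short Python sketch. The definitions coded here are assumptions based on the verbal descriptions in the text, not a transcription of Box 1: \({P}_{I}\) is taken as the proportion of all catches made indoors, \({P}_{FL,I}\) as the proportion of all catches made during hours hard-classified as indoor hours, and \({\pi }_{I,u}\) as indoor catches during indoor hours divided by the sum of those catches and outdoor catches during outdoor hours. The function name, the choice of indoor hours and all counts are hypothetical.

```python
def hard_classified_indicators(indoor_catch, outdoor_catch, indoor_hours):
    """Binomial indicators from hourly human landing catch totals.

    indoor_catch, outdoor_catch - dicts mapping hour (0-23) to the number
                                  of mosquitoes caught at that station
    indoor_hours - set of hours hard-classified as times when people
                   are indoors; every other hour counts as outdoors
    """
    hours = set(indoor_catch) | set(outdoor_catch)
    i_total = sum(indoor_catch.get(h, 0) for h in hours)
    o_total = sum(outdoor_catch.get(h, 0) for h in hours)
    i_in = sum(indoor_catch.get(h, 0) for h in hours if h in indoor_hours)
    o_in = sum(outdoor_catch.get(h, 0) for h in hours if h in indoor_hours)
    o_out = o_total - o_in  # outdoor catches while people are outdoors

    if i_total + o_total == 0 or i_in + o_out == 0:
        raise ValueError("not enough catches to compute the indicators")

    return {
        # Mosquito behaviour only: share of all catches made indoors.
        "P_I": i_total / (i_total + o_total),
        # Share of all biting activity falling within indoor hours.
        "P_FL_I": (i_in + o_in) / (i_total + o_total),
        # Exposure of an unprotected person: catches made where people
        # are absent (indoors during outdoor hours, outdoors during
        # indoor hours) are disregarded.
        "pi_I_u": i_in / (i_in + o_out),
    }


# Hypothetical catches, with 21:00 to 05:00 classified as indoor hours.
night_indoors = set(range(21, 24)) | set(range(0, 5))
result = hard_classified_indicators(
    indoor_catch={19: 1, 20: 2, 22: 20, 2: 25, 5: 2},
    outdoor_catch={19: 2, 20: 2, 22: 19, 2: 24, 5: 3},
    indoor_hours=night_indoors,
)
```

In this made-up example, half of all catches are made indoors, yet most exposure of an unprotected person is classified as indoor exposure because biting is concentrated in the hours when people are indoors. This is the same qualitative pattern as described above for the Zambian data.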

Beyond comparing vector species and human population groups, these simplified binary behavioural interaction indicators (Box 1) can also be used to assess the influence of variable, often idiosyncratic individual behaviours on exposure patterns and malaria risk [16,17,18,19,20,21]. Whenever possible, human behaviour data should therefore be collected in a disaggregated format that is linkable to individual human study participants, so that the epidemiological importance of differences between individuals and population subgroups can be formally assessed [4] with statistical contrasts using standard logistic regression models and off-the-shelf software [19, 20]. For ease of application to readers' own data sets, a Microsoft Excel® template for calculating individual-level estimates of \({\pi }_{I}\) and \({\pi }_{S}\) is provided and populated with a sample of anonymized data from previous large-scale cross-sectional surveys [20] for illustrative purposes (Additional file 2: Dataset S1). The additional insight that may be obtained from disaggregated data with such individual-level calculations is exemplified by comparing the in-depth epidemiological analyses reported along with the original version of this template [20] with the much broader, population-wide mean overview obtained through the preceding entomological report [26].

Simplified sample size calculations for studies surveying the proportions of human exposure to malaria vectors occurring indoors or while asleep

Because these hard classification techniques allow the proportions of human biting exposure occurring indoors (\({\pi }_{I,u}\)) and while asleep (\({\pi }_{S,u}\)) to be calculated in simple binomial form, it is also possible to apply standard sample size estimation techniques [27] to plan necessary minimum experimental scales and data collection targets for field studies (Fig. 2). To enable sample size estimation with the most intuitive and readily accessible statistical tools, here an example is provided (Box 2) that adapts well-established simple deterministic statistical models, originally formulated for cluster-randomized trials with disease infection prevalence as the primary binomial outcome [27]. Of course, more advanced, simulation-based stochastic approaches may also be applied to sample size calculations for surveying these same binary indicators [28], and these more intricate techniques may be more rigorous and appropriate for teams with sufficient analytical capacity. However, relevant analytical capacity remains underdeveloped for malaria generally, and entomology in particular, in endemic tropical countries [3]. Therefore, the simple but widely accepted deterministic models [27] and calculation tools (Additional file 3: Dataset S2) used in Box 2 may represent an accessible and practical alternative for teams at an earlier stage of analytical capacity development.
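To make the deterministic approach concrete, the sketch below implements a widely used formula for comparing two proportions measured across clusters, generally attributed to Hayes and Bennett. Whether Box 2 uses exactly this form is an assumption. In the formula, \(c\) is the number of clusters per comparison group, \(n\) is the number of mosquitoes caught per cluster, \({\pi }_{0}\) and \({\pi }_{1}\) are the two proportions to be distinguished and \(k\) is the between-cluster coefficient of variation: \(c = 1 + {({z}_{\alpha /2} + {z}_{\beta })}^{2}[{\pi }_{0}(1 - {\pi }_{0})/n + {\pi }_{1}(1 - {\pi }_{1})/n + {k}^{2}({\pi }_{0}^{2} + {\pi }_{1}^{2})]/{({\pi }_{0} - {\pi }_{1})}^{2}\). The function name and all parameter values in the example are hypothetical and are not those of Box 2.

```python
import math
from statistics import NormalDist


def clusters_per_group(pi0, pi1, n_per_cluster, k, alpha=0.05, power=0.80):
    """Clusters needed per comparison group to distinguish pi0 from pi1.

    pi0, pi1      - the two proportions to be compared
    n_per_cluster - number of mosquitoes caught per cluster
    k             - between-cluster coefficient of variation
    alpha, power  - two-sided significance level and desired power
    """
    if pi0 == pi1:
        raise ValueError("pi0 and pi1 must differ")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)

    # Binomial sampling variance within clusters ...
    within = (pi0 * (1 - pi0) + pi1 * (1 - pi1)) / n_per_cluster
    # ... plus variance arising from true differences between clusters.
    between = k ** 2 * (pi0 ** 2 + pi1 ** 2)

    c = 1.0 + (z_alpha + z_beta) ** 2 * (within + between) / (pi0 - pi1) ** 2
    return math.ceil(c)


# Hypothetical scenario: detect a shift from 0.90 to 0.80 with k = 0.1.
small_catch = clusters_per_group(0.90, 0.80, n_per_cluster=50, k=0.1)
large_catch = clusters_per_group(0.90, 0.80, n_per_cluster=500, k=0.1)
```

Under these assumed inputs, a tenfold increase in mosquitoes per cluster reduces the required number of clusters only modestly, because the between-cluster term does not shrink as catches grow.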

Fig. 2. The predicted relationship between the number of mosquitoes caught per cluster and the number of village-scale population clusters required to achieve 80% power if all other assumed parameters are exactly as described in Box 2. Note that in the field of epidemiology, the simple term cluster usually refers to a geographically distinct, but often demographically defined, unit of observation that may be considered independent in the statistical sense [27]

Note that while many investigators may not need to implement surveys across such extensive scales, they should nevertheless allow for the fact that some natural variation of \({\pi }_{I,u}\) and \({\pi }_{S,u}\) is likely to occur among different sub-villages, villages or other geographic subunits/population clusters far enough apart to be considered independent units of observation. Furthermore, such natural spatial variation may be further exaggerated by natural seasonality, which may also be of interest in its own right. Therefore, sample size calculations should always allow for clustering and covariance within clusters, as in the example provided in Box 2. Note that it may not be logistically feasible to survey all different study locations simultaneously, especially on scales of entire districts or countries. Rolling cross-sectional designs that survey clusters sequentially (e.g. [14, 31]), rather than at the same time, may therefore be necessary to survey and resurvey the large numbers of distinct locations sometimes required to achieve sufficient power (Fig. 2).

Conclusions

Simplified binomial estimates of the proportions of exposure to mosquito bites that occur indoors or while asleep, based on hard classification of human location at a given time, allow convenient statistical analysis to estimate confidence intervals around means, compare vector species and human population groups, and assess the influence of individual behaviour on exposure patterns and malaria risk. Also, because such hard classification techniques allow these key indicators to be calculated in simple binomial form, standard sample size estimation techniques may be readily used to plan a priori the necessary experimental scales and data collection targets required for field studies. Sample size calculations for field studies should allow for natural geographic variation and seasonality, taking advantage of rolling cross-sectional designs to survey and resurvey large numbers of separate study locations in a logistically feasible manner.