Introduction

While the global burden of disease has been shifting from communicable to non-communicable diseases and injuries (Murray et al. 2012), infectious diseases continue to cause persistent (HIV/AIDS; tuberculosis; malaria) and episodic (Ebola in West Africa) public health threats. In both cases, the environment is critical to the risks to human health and wellness posed by infectious agents (Lash et al. 2008; Weiss and McMichael 2004), environmental toxins (Ard 2015), access to healthcare (Wang and Luo 2005; Kwan 2013), social disadvantage (Wilkinson 1994), and stress (Shankardass 2012). There is a pressing need to develop the theories, tools and datasets that support investigating links between the environment and health.

A number of converging trends point towards a greater role for geography in studies of environment and human health. Firstly, the emergence and re-emergence of infectious pathogens over the last few decades have re-emphasized the importance of local exposures and risks, whether due to environmental or communicable transmission processes that result from complex assemblages of local and global-scale processes. Secondly, spatially explicit models of infectious diseases have been developed at a range of spatial and temporal scales, from household-level models of infectious disease spread (Riley 2007) to global models of the factors driving the spatial distribution of zoonotic pathogens (Jones et al. 2008). These models require spatial datasets describing every aspect of disease risk, from wind patterns to health-seeking behaviours. The use of geographic information systems (GIS) is critical to the building and operation of spatially explicit disease risk models. Finally, spatial modelling has been improved by the concurrent rise in sources of geographical data to support such modelling efforts across all scales (though not all diseases). The importance of the ‘landscape’ in the emerging, zoonotic, and infectious disease literatures is widely recognized, both from a substantive perspective, due to greater interdependencies between populations (Khan et al. 2009), incursion into natural areas (e.g., Field et al. 2001), intensification of livestock production (Gerber et al. 2005; Graham et al. 2008), and wildlife trade (Karesh et al. 2005), and from more practical considerations, due to the abundant data and technologies now available (e.g., Vitolo et al. 2015). We focus on landscape epidemiology (LE), which is concerned with the interaction between features of the landscape (and their composition and configuration) and disease and risks of disease. This is distinct from the broader domain of spatial epidemiology, which includes studies of health services and health systems.

Despite the growing adoption of geographic tools and methodologies, their deployment in infectious disease epidemiology, modelling, and control at the implementation level has been mostly piecemeal and study-specific. Lambin et al. (2010) suggested ten propositions which help to situate relationships between landscape change and infection risk, identifying key themes such as connectivity, landscape configuration and composition, the importance of examining processes and patterns at multiple spatial and temporal scales, and human behaviours. While these propositions provide a valuable guide for researchers, they offer little guidance on using that knowledge in applied settings. Most research into landscape and disease risks assumes that a better understanding of infection risk will ultimately ‘trickle down’ to effective uses of that enhanced knowledge. Recent advances in understanding how knowledge is adopted in decision-support contexts, and how knowledge is translated to specific communities of users in the application of health research and technologies, have shed light on the types of knowledge and expertise required to make this transaction possible (Straus et al. 2011). The simple fact that knowledge exists does not necessarily mean that it will be pushed to development and application (Estabrooks et al. 2006). In this paper we take a different approach, by focusing on three core knowledge use contexts that broadly encompass the majority of spatial epidemiological studies. We aim to develop a single organizational framework for conducting studies of landscape and infectious diseases that situates methods and data within the broader conceptual context and thereby promotes more rigorous and transparent study design, interpretation, and translation of findings.

Need for organizing framework for landscape approaches to health

The rapid expansion of methods and data available for landscape-oriented epidemiological studies has considerably transformed health-environment research over the last two decades. It is now customary to include landuse/landcover data sensed from satellites, and climate data obtained from global repositories of networked weather stations (e.g., WorldClim) or from satellite sensors such as the Tropical Rainfall Measuring Mission, which provides daily rainfall data for all of Asia. These data products are also increasingly available at a global scale, and as ‘pre-processed’ information products consumable by spatial models. For example, the WorldPop project (http://www.worldpop.org.uk/) aims to provide high-resolution spatially explicit demographic data across South and Central America, Asia, and Africa (Tatem 2014). Geographically granular case data aggregated over small health areas are also commonplace in studies in both the developed and developing worlds. Data describing other features of the environment such as roads, river networks, water bodies, built-up areas, and health care service locations are also increasingly available in even remote parts of the world. We are on the cusp of real-time global-scale epidemiology, whereby ‘feeds’ describing the important features of the environment for health are updated in real time. For diseases with immediate environmental components of risk (e.g., arboviruses, heat-related illnesses), timely updating of spatially explicit information on risk factors provides opportunities for highly granular public health responses and data-driven policy formulation. Further, diseases with longer latencies and more complex risk profiles also stand to benefit from enhanced environmental information. With this data-rich environment in machine-readable forms, and the capacity for automated model-building, it is important to structure analysis using such large, spatially explicit information sources within a broader framework for understanding disease-landscape dynamics (e.g., Lambin et al. 2010).

Citizen-based, user-generated, or volunteered data have also become increasingly prevalent, sometimes described as participatory epidemiology (Brownstein et al. 2009). The recent cholera outbreak in Haiti provides a salient example, where volunteers on the ground and on the other side of the world contributed data and expertise to the mapping of local environment in support of disease control and health programs. Citizen-generated data is also increasingly used in empirical studies themselves, for example—actively in mobile health-reporting apps such as ‘outbreaks near me’ (Freifeld et al. 2010) as well as passively, when web-search data are repurposed for syndromic surveillance of seasonal influenza (Hulth et al. 2009), in support of what is often called ‘epidemiological intelligence’. In landscape epidemiological research, all of these datasets can be accessed, mapped, and integrated via their geographies to derive factors affecting disease risk, and the cumulative risk distribution in space and time.

The data processing methods, software, statistical tools, visualization approaches, and reporting mechanisms that drive this research vary considerably, and encompass a wide array of levels and areas of expertise. As such, the place of geocomputation—the tools and methodologies of geographic computing—within this spectrum of approaches is unclear, and more importantly, selecting the appropriate approach for a given study objective is increasingly difficult.

Reviews of ‘GIS in Epidemiology’ have not provided this sort of framework. Many reviews have rightly focused on broad overviews of spatial analysis or GIS methods and disease (Cromley 2003; Rytkönen 2004), and/or their application to areas such as animal disease (Freier et al. 2007) and surveillance (Robertson et al. 2010). The shortcomings of GIS approaches to health research, and specifically the disconnect between analytical methods and parameters (such as neighbourhood weights matrices in disease mapping models) and their epidemiological meaning, were highlighted by Yang et al. (2013), whereas Fritz et al. (2013) review and compare methods for handling point event data in spatial health research. Barrett et al. (2013) reviewed the scope for big data to improve understanding and tracking of health-related behaviours and outcomes. And while the emphasis in health geography over the last two decades has been largely on social determinants (Kearns 1993; Kearns and Moon 2002), these approaches have bifurcated somewhat from approaches dominant in spatially oriented infectious disease epidemiology, which have remained quantitative and modelling/biomedically oriented. Rather than debate the relative merits of these paradigms, it is instructive to re-examine the motivation for these studies in the first place, and perhaps contextualize these differing research frames within a common framework. Recent work relating infectious disease risk to climate change, which has introduced ideas from adaptation research such as vulnerability and adaptive capacity that are conditioned on socioeconomic processes (Kienberger and Hagenlocher 2014), is an example of a new generation of integrative spatial studies of health and disease that may be bridging this gap. In this paper, we present a framework oriented around knowledge uses and existing tools in order to provide a methodological grounding for new integrative studies in landscape epidemiology (LE).

A key requirement for landscape epidemiology is the ability to handle, process, represent, and transform large amounts of data. The backbone of GIS, relational database management systems, has historically provided the technology for this, although it is rarely identified explicitly in research related to LE studies. However, given the increasing granularity of satellite data, the volume of mobile sensor data, and the advent of ‘big data’ generally (and real-time epidemiology), these technologies are becoming insufficient to handle the current data-rich environment. New geocomputational methods for spatial data storage and access may be required for new research in landscape epidemiology, including spatial indexes, data compression algorithms, and knowledge discovery algorithms. For example, social media data streams are by definition ‘always on’, and extracting and storing a subset of these defined by space and time into a relational database for the purposes of on-line surveillance of new disease reports induces a delay in the processing chain. Application programming interfaces (APIs), as a way of accessing data portals algorithmically, partially sidestep this issue, but these tools are by their nature reserved for researchers with advanced technical skills. Consequently, the ‘data processing step’ is an increasingly relevant but frequently overlooked component of LE studies, as data integration based on geography is often required prior to undertaking any analysis. Complex processing chains that incorporate web-based geographic data repositories, cloud-based storage and analysis, and delivery of results via web-map services may encompass the computational tools of neogeography (e.g., OpenLayers, cartoDB, D3, leaflet), open data portals and cloud services (e.g., Amazon EC2), and a myriad other software tools and packages. Situating these methods and tools within their functional context relevant for LE will provide an organizational mapping to aid understanding of this complex and rapidly evolving landscape.

A key theme in the adaptation of landscape approaches to human health is the explicit recognition of the importance of multiple spatial and temporal scales in the study of health outcomes. While scale has increasingly been emphasized in the health geography literatures, much of this work has been conceptual and theoretical rather than applied. While laying the necessary theoretical foundations for a multi-scale approach to understanding health and disease in populations is an important first step, concrete tools and methodologies are required in order to operationalize these ideas for applied LE. Geocomputational methods are well-suited to multi-scale analysis; however, several different approaches can be taken, and little guidance is available to researchers aiming to take a multi-scale approach to LE. Characterizing features of the landscape at multiple scales is perhaps one of the most widely employed tactics, whether the features are compositional factors (e.g., percent forest cover) or configurational properties (e.g., edge density, fragmentation). Identifying ‘zones’ of scale consistency and abrupt changes in landscape properties with scale can provide insight into the appropriate spatial scale of investigation (Wu 2004). This approach is consistent with the landscape ecology paradigm of hierarchical patch dynamics, which may provide a conceptual basis for investigating landscape influences on health in patchy landscapes.
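As a minimal sketch of what characterizing a compositional factor at multiple scales can look like in practice, the following Python example computes percent forest cover around one location in square windows of increasing size. The toy raster, the function names, and the choice of square windows clipped at the raster edge are illustrative assumptions rather than a prescribed method.

```python
# Toy land-cover raster: 1 = forest, 0 = other (hypothetical values).
LANDCOVER = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
]


def percent_cover(grid, row, col, radius):
    """Percent of cells equal to 1 in a square window centred on (row, col).

    The window is clipped at the raster edge, so windows near the border
    contain fewer cells than windows in the interior.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    cells = [
        grid[r][c]
        for r in range(max(0, row - radius), min(n_rows, row + radius + 1))
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1))
    ]
    return 100.0 * sum(cells) / len(cells)


def cover_profile(grid, row, col, radii):
    """Percent cover around one location at several window radii."""
    return {radius: percent_cover(grid, row, col, radius) for radius in radii}


# Scale profile for the central cell at three window sizes (1x1, 3x3, 5x5).
profile = cover_profile(LANDCOVER, 2, 2, radii=[0, 1, 2])
```

Plotting such a profile against window size is one simple way to look for the ranges of scale over which a landscape property is stable or changes abruptly.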

The theories, tools and data driving applied LE research today have significant potential to improve health of individuals and the allocation of scarce healthcare resources. Our aim in proposing the framework that follows is to more clearly organize current methods, provide a structure for research design, and more concretely link methods to knowledge translation and use. Surveillance can provide health workers with the baseline knowledge required to judge when and where unusual events are taking place, which can catalyze early responses to outbreaks or changes in disease incidence. Knowledge of risk factors can contribute to formulating effective interventions, and identifying areas and populations vulnerable to disease. Once knowledge of landscape influences on health and disease is established, this can be applied and integrated into healthcare delivery systems in contextually appropriate ways (Lai et al. 2008).

Landscape approaches: use cases

We provide the motivation for our framework through the lens of applied LE. While these use-cases do not encompass all studies that might employ LE approaches, they cover the majority of applied research objectives, and as such provide a useful lens through which to contextualize our organization of geocomputational methods applicable for LE. It is worth highlighting that these categories are not mutually exclusive, and studies and research projects aiming to take an LE approach will often encompass two or even three of the use-cases.

Surveillance: What has changed?

Public health surveillance is defined by the World Health Organization (WHO) as continuous, systematic collection, analysis and interpretation of health-related data needed for the planning, implementation, and evaluation of public health practice (WHO 2005). A variety of objectives may drive surveillance including outbreak detection, situational awareness, and identifying long term trends, and statistical and computational methods will be specific to the system objective (Robertson et al. 2010). Additionally, a variety of information sources may be required to achieve a given public health objective targeted through surveillance (WHO 2014). The inclusion of explicit geographical information is now a key component of most actionable public health surveillance systems, including those focused on infections with major landscape components.
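To make the outbreak-detection objective concrete, the sketch below flags weeks in which notified cases exceed a moving baseline. It is an illustration only: the rule (baseline mean plus two standard deviations over the preceding eight weeks), the function name, and the weekly counts are assumptions chosen for simplicity, not a method drawn from the surveillance systems cited here.

```python
from statistics import mean, stdev


def flag_aberrations(counts, baseline_weeks=8, z_threshold=2.0):
    """Return indices of weeks whose count exceeds baseline mean + z * sd.

    The baseline for each week is the preceding `baseline_weeks` weeks,
    so the first `baseline_weeks` weeks are never evaluated.
    """
    flagged = []
    for week in range(baseline_weeks, len(counts)):
        baseline = counts[week - baseline_weeks:week]
        threshold = mean(baseline) + z_threshold * stdev(baseline)
        if counts[week] > threshold:
            flagged.append(week)
    return flagged


# Hypothetical weekly notified case counts for one health area.
weekly_counts = [4, 6, 5, 3, 5, 4, 6, 5, 5, 4, 19, 6]
alerts = flag_aberrations(weekly_counts)
```

Running the same rule separately for each health area gives a simple spatially explicit alerting scheme, although an operational system would also need to account for seasonality and reporting delays.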

Spatial risk modelling: Why are things where they are?

Many spatial–epidemiological studies are interested in identifying the factors that contribute to the spatial risk distribution. Most regression-type models, whether employed through a generalized linear modelling framework or through Bayesian hierarchical modelling, fall into this category. With respect to landscape risks, a wide variety of modelling methods are now being used, such as MaxEnt (Phillips et al. 2006), random forests (and related methods) (Breiman et al. 1984), and generalized additive models employing various nonlinear mappings (Hastie and Tibshirani 1990). The elucidation of spatial risk factors in LE can provide key insight into how landscape impacts risk, but often these connections are difficult to identify and are confounded with sampling unit and spatial scale (e.g., the modifiable areal unit problem).
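The dependence of results on the sampling unit can be illustrated with a small example. In the sketch below, the same hypothetical cell-level case counts are aggregated under two zoning schemes, and the highest zone-level rate differs by a factor of two. The grid, the uniform population per cell, and the function name are assumptions made purely for illustration.

```python
# Hypothetical case counts on a 4 x 4 grid of cells.
CASES = [
    [0, 0, 6, 0],
    [0, 0, 6, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
POP_PER_CELL = 100  # assumed uniform population in every cell


def zone_rates(cases, zone_of):
    """Aggregate cell counts into zones and return cases per 1,000 people.

    `zone_of` maps a (row, col) cell position to a zone label.
    """
    totals, n_cells = {}, {}
    for r, row in enumerate(cases):
        for c, count in enumerate(row):
            zone = zone_of(r, c)
            totals[zone] = totals.get(zone, 0) + count
            n_cells[zone] = n_cells.get(zone, 0) + 1
    return {
        zone: 1000.0 * totals[zone] / (n_cells[zone] * POP_PER_CELL)
        for zone in totals
    }


# Zoning scheme 1: four 2 x 2 quadrants.
quadrants = zone_rates(CASES, lambda r, c: (r // 2, c // 2))
# Zoning scheme 2: a western and an eastern half.
west_east = zone_rates(CASES, lambda r, c: c // 2)
```

Here the north-east quadrant has a rate of 30 per 1,000, while the eastern half as a whole has a rate of 15 per 1,000, even though the underlying cases are identical.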

Interventions: What would happen if …?

Evaluating disease control methods has been largely the domain of mathematical modelling methods that have been developed for outbreaks of infectious diseases, and environmental heterogeneities are rarely included, often because spatially explicit data are lacking and/or relationships to environmental factors are unknown (although see a recent example of the Ebola outbreak by Merler et al. 2015). However, policy-scenario modelling in the context of disease control can be used to investigate the effect of different public health policies on health outcomes of interest (e.g., Claude et al. 2009). The knowledge required to parameterize models for evaluating interventions is generally very extensive, and requires understanding some degree of absolute risk (e.g., Lengeler et al. 1998).
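A ‘what would happen if’ comparison can be sketched with a very simple compartmental model. The example below is a non-spatial, discrete-time SIR model in which an assumed intervention reduces the transmission rate from a given day onward; all parameter values and names are hypothetical, and the model is far simpler than those used in the studies cited above.

```python
def simulate_sir(beta, gamma, days, population=10_000, initial_infected=10,
                 intervention_day=None, beta_reduction=0.0):
    """Discrete-time (daily step) SIR model; returns cumulative infections.

    beta: transmission rate per day; gamma: recovery rate per day.
    From `intervention_day` onward, beta is scaled by (1 - beta_reduction).
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    cumulative = float(initial_infected)
    for day in range(days):
        current_beta = beta
        if intervention_day is not None and day >= intervention_day:
            current_beta = beta * (1.0 - beta_reduction)
        # New infections cannot exceed the remaining susceptible pool.
        new_infections = min(
            susceptible, current_beta * susceptible * infected / population
        )
        recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - recoveries
        cumulative += new_infections
    return cumulative


# Scenario comparison with hypothetical parameters.
baseline = simulate_sir(beta=0.4, gamma=0.2, days=200)
with_control = simulate_sir(beta=0.4, gamma=0.2, days=200,
                            intervention_day=20, beta_reduction=0.5)
cases_averted = baseline - with_control
```

Making such a model landscape-aware would require, at a minimum, allowing the transmission rate to vary between locations, which is exactly where the data and parameterization demands noted above arise.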

Framework for geocomputational landscape epidemiology (GLE)

Here we propose a functional framework for organizing analytical tasks that are commonly employed as part of LE, which we are terming Geocomputational Landscape Epidemiology (GLE). Geocomputation was defined in Openshaw (2014) as the ‘application of a computational science paradigm to study a wide range of problems in geographical and earth systems (the geo) contexts’. This definition makes special point to emphasize that the geo includes both human and physical systems, and that geocomputation is part of a wider shift towards computational science in social sciences, natural sciences, and subfields of the humanities disciplines that emphasizes mathematical models, simulation, and high-performance computing. Geocomputation is not equivalent to data mining, machine learning, or computer science, but may include aspects of these disciplines (Couclelis 1998). The defining characteristic of geocomputation is that advanced computational tools and methods are deployed for solving complex geographical problems that would otherwise not be possible. Thus the tools in geocomputation do not only provide a faster way of doing calculations, they provide a methodology for conceptualizing new research questions from a computational science perspective, and the tools to answer these questions.

Geocomputation may be seen as analogous or congruous to the computational science approach to social science generally, described by Torrens (2010) as “making use of computing and informatics in exploring the mechanisms that drive complex social, behavioral, and economic systems”, emphasizing themes of complexity, modelling and simulation, visualization, cyberspaces, semantics, and socio-technical systems. In the natural sciences, advanced computational methods have been deployed for modelling and simulating environmental processes for many years, often as implementations of numerical models.

There is a natural fit to apply the tools of geocomputation to landscape epidemiology, which was described originally by Pavlovsky (1966) as the concept of the ‘natural nidality of human diseases’, recognizing that the sources of human disease existed naturally in the environment, varying with climate, soils, elevation, vegetation, and other landscape components. These ideas, most evident for the zoonotic infections for which they were discovered, have also been extended to chronic disease, and recognition of environmental effects on health is now widespread. Recent adaptations and extensions have been formulated as ecosystem-health, which focuses on health from an ecological perspective (Waltner-Toews et al. 2008), One Health, which emphasizes inter-relatedness of animal and human health (Coker et al. 2011), and wildlife health (Grogan et al. 2014; Stephen 2014), which takes a similar approach to the protection and promotion of healthy wildlife populations.

We present this framework as a general organizing system for studying disease from an explicitly ‘landscape’ approach, one which by its nature requires some sophistication in spatial analysis and geographic information handling methodology. More importantly, the framework provided in Fig. 1 outlines a knowledge-based organization of geocomputational methods.

Fig. 1 Framework for geocomputational landscape epidemiology, which moves from lower-level complexity of (a) describing patterns, (b) describing processes that interact with those patterns, (c) examining how patterns and processes contribute to disease risk and health-promoting factors, and (d) evaluating final information products (maps or other) and linking these to research gaps and knowledge uses

Characterize the assemblage

The first component of GLE is called ‘characterizing the assemblage’ and encompasses all methods of analysis that aim to describe a pattern or relationship. This phase usually comes before other phases and will often be the aim of the study itself, such as estimating a realistic distribution of disease risk from aggregated case data. There are two basic approaches to characterizing assemblages, which correspond to the two types of properties associated with a spatial pattern: pattern composition and pattern configuration. Pattern composition includes measures that describe how much of a quantity is distributed on the map, or broadly answer the question ‘what is where?’ We keep this definition broad, as many of the geocomputational approaches to LE are aimed at answering these types of questions. Cluster detection and hotspot mapping methods are examples, although they might not normally be described as measures of composition. Measures of configuration, by contrast, seek to describe the spatial arrangement of quantities on the landscape. Configurational measures answer the question ‘how is x configured on the landscape?’ Answers to these types of questions are therefore descriptions of spatial pattern, rather than locations. Details about the methods within this part of the framework can be found in Table 1.

Table 1 Methods and information sources for characterizing compositional properties of geographic distributions

Composition

Measures of composition in this context comprise all methods that aim to describe or characterize the spatial or temporal distribution of risks and/or risk factors. As this is an extremely common and broadly defined task, this encompasses many different statistical approaches, but all aim to quantify or explain the distribution.
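One of the simplest compositional summaries of aggregated case data is the ratio of observed to expected cases in each area. The sketch below assumes a single region-wide rate as the reference and uses hypothetical area labels, case counts, and populations; real applications would usually standardize by age or other strata and smooth unstable ratios in small populations.

```python
def observed_expected_ratios(observed, population):
    """Observed/expected case ratio per area.

    Expected cases in each area are the region-wide rate multiplied by the
    area's population (no age or other stratification is applied).
    """
    overall_rate = sum(observed.values()) / sum(population.values())
    return {
        area: observed[area] / (overall_rate * population[area])
        for area in observed
    }


# Hypothetical case counts and populations at risk for three health areas.
observed_cases = {"A": 12, "B": 30, "C": 8}
population_at_risk = {"A": 4000, "B": 5000, "C": 1000}
ratios = observed_expected_ratios(observed_cases, population_at_risk)
```

Ratios above one mark areas with more cases than the regional rate would predict, which is one basic answer to the ‘what is where?’ question.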

Configuration

Measures of configuration contrast with those of composition as they are aimed at quantifying the spatial configuration of the pattern. The methods here are used to complement compositional measures as part of a GLE study, and often have direct epidemiological relevance and can be used as covariates with modelling methods.
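The distinction between composition and configuration can be shown with two toy rasters that contain the same amount of habitat arranged differently. The sketch below counts habitat/non-habitat edges and 4-connected habitat patches; the rasters, the function names, and the choice of 4-neighbour connectivity are illustrative assumptions.

```python
def edge_count(grid):
    """Count cell edges shared by a habitat cell (1) and a non-habitat cell (0)."""
    n_rows, n_cols = len(grid), len(grid[0])
    edges = 0
    for r in range(n_rows):
        for c in range(n_cols):
            # Compare each cell with its right and lower neighbour only,
            # so every adjacent pair is examined exactly once.
            if c + 1 < n_cols and grid[r][c] != grid[r][c + 1]:
                edges += 1
            if r + 1 < n_rows and grid[r][c] != grid[r + 1][c]:
                edges += 1
    return edges


def patch_count(grid):
    """Count 4-connected patches of habitat cells using a flood fill."""
    n_rows, n_cols = len(grid), len(grid[0])
    seen = set()
    patches = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] != 1 or (r, c) in seen:
                continue
            patches += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc),
                               (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and grid[nr][nc] == 1 and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
    return patches


# Same composition (4 habitat cells out of 16), different configuration.
CLUMPED = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
SCATTERED = [[1, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 1]]

clumped_metrics = (edge_count(CLUMPED), patch_count(CLUMPED))
scattered_metrics = (edge_count(SCATTERED), patch_count(SCATTERED))
```

Dividing the edge count by the number of cells gives a simple edge density, which could then enter a risk model as a covariate alongside compositional measures.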

Characterize functions

Characterizing functions of a system or disease of interest with geocomputational methods differs from the above, as the focus here is on dynamic elements of quantities of interest. Many of the methods and approaches noted in this section are more general than geocomputation alone, but do have important or special considerations when implemented in a setting where space is explicit. Functional properties include describing inputs, outputs, and connections between different functions that might combine to form a complex system. As the ideas here are very general, we have broken up the concepts into some classes, though we emphasize that they are all descriptive of functional characteristics and, in most cases, require that basic characterization of the underlying landscape-disease assemblage has already been done.

Flows: activity patterns, trade flows, animal movement, vector movement

Many concepts relevant for describing health in LE are better described as ‘flows’ than as factors or static measurements of landscape properties. In our framework, flows can be defined as methods, variables, and concepts that describe movement of a quantity of interest in geographic and/or parameter space. In practice, this step of analysis might be used to develop spatial variables that are used in a study aimed at the ‘characterize the assemblage’ step, or may itself be the objective of analysis. Examples of flows include the movement of mosquito vectors with climate change, movement of cattle to markets in a region, commuting patterns of residents of a large metropolitan area, and the flow of capital into resource extraction industries in individual countries or regions. While we don’t constrain our description to physical flows (e.g., diurnal movement of people within the city), we emphasize aspects of flows that can be represented and/or analyzed within a geocomputational framework. Often, non-geographic flows have important properties and expressions that are embedded within geographic space. For example, the use of social network analysis has increased in recent years, not only to support physical contact tracing and identify exposure/transmission opportunities, but also to model the flow of knowledge about public health-related factors such as health-promoting activities or risk perceptions within and between vulnerable communities.
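As one illustration of how a flow might be turned into a spatial variable, the sketch below uses a basic gravity model, a standard technique from spatial interaction modelling that is not specific to the framework proposed here. Flow between two places is assumed to be proportional to the product of their sizes and inversely proportional to a power of the distance between them; the place names, sizes, coordinates, and exponent are all hypothetical.

```python
from math import hypot


def gravity_flows(sizes, coords, distance_exponent=2.0, scale=1.0):
    """Estimate pairwise flows with a simple gravity model.

    flow(i, j) = scale * size_i * size_j / distance(i, j) ** distance_exponent
    Coordinates are treated as planar (x, y) positions.
    """
    flows = {}
    for origin in sizes:
        for destination in sizes:
            if origin == destination:
                continue
            dx = coords[origin][0] - coords[destination][0]
            dy = coords[origin][1] - coords[destination][1]
            distance = hypot(dx, dy)
            flows[(origin, destination)] = (
                scale * sizes[origin] * sizes[destination]
                / distance ** distance_exponent
            )
    return flows


# Hypothetical market towns: herd sizes and planar coordinates in km.
herd_sizes = {"town_a": 100, "town_b": 50, "town_c": 200}
locations = {"town_a": (0, 0), "town_b": (3, 4), "town_c": (6, 8)}
flows = gravity_flows(herd_sizes, locations)
```

In an applied study the exponent and scale would be estimated from observed movement records rather than fixed in advance.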

Feedbacks

In complex systems, feedbacks are defined as those components of a system that engage in or foster learning and exhibit behavior. In geocomputation, learning and behavior are increasingly relevant concepts, as algorithms are developed that learn relationships across space from continuous data streams (Young 2013). Many of the dynamical models of mathematical modelling center on feedbacks between system components, and estimation of critical epidemiological parameters. While geography has traditionally been excluded or assumed away in these models, the field of spatial statistics has developed spatially explicit extensions to classical epidemiological models (Riley et al. 2015). However, for LE the majority of approaches that are focused on representing connectivity and positive and negative feedback between system components fall into the category of conceptual modelling approaches rather than empirical/mathematical modelling. The translation of conceptual models of complex systems to workable mathematical models that can be fit to observational data is one of the most challenging aspects of GLE. In practice, there is often iteration between the approaches described in Tables 2 and 3 and those in Table 4. Feedbacks can also exist within complex systems designed to represent a health domain that are created as loosely coupled ensembles of simple models—which are integrated with an aim of intelligence, situational awareness, or monitoring.

Table 2 Methods and information sources for characterizing configurational properties of geographic distributions
Table 3 Methods and information sources for characterizing functional flows of geographical processes
Table 4 Methods and information sources for characterizing functional feedbacks of geographical processes

Events

Events in the context of GLE are discrete in space and time, and have relevance for understanding how landscape influences disease dynamics. The International Health Regulations (2005) define an event as “a manifestation of disease or an occurrence that creates a potential for disease” (IHR 2005). This includes infectious disease, zoonotic disease, toxic pollution events, and these threats to public health form the basis for the ‘Event-based surveillance’ approach described as part of WHO’s early warning and response system (WHO 2014). Events in GLE therefore correspond to methods and tools that are required for the handling, processing, and analysis of information sources that provide context or early-warning for occurrences that create potential for landscape-oriented disease (see Table 5).

Table 5 Methods and information sources for characterizing events in geographical processes

Map interdependencies

While the ‘phases’ above have focused on a categorization of existing methods used in geocomputational approaches to LE, we now turn to a phase where the functions described in part 2 are mapped back onto the compositional and configurational patterns described in part 1. Here, we begin to see how interacting components of the system are spatially structured, which functional components share geographic properties, and what the overall controlling contribution of landscape is to the risk profile of interest. This might be considered an integrative or meta-analysis step, as it is rarely done within the context of a single study but is generally part of a review study or systems-based empirical analysis of existing literature. There are few formal methods available for this step, but these types of meta-analyses or systems analyses are typically relied upon when major new health events occur, such as the emergence of a new disease, as they provide a holistic description of the processes that conspired to lead to the event. A good example is provided by Wang and Eaton (2007), who describe the overall conditions that led to the emergence of SARS in southeast China in 2002, including documented human-to-human spread, independent transmission events from animal-to-human in four separate cases, and animal-to-animal transmission among palm civet cats in a market environment.

An example of a technological approach to this step might be that of HealthMap which provides integration of reported health events obtained from the web from all over the world (Brownstein et al. 2008). Through scraping, geocoding, and mapping health event data in a common platform, this platform provides the capacity for ‘epidemic intelligence’—using space to index health events and draw common cause and connections where they otherwise might not be apparent. This also provides multi-scale exploration of the patterns of outbreaks, from the local to international.
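The idea of using space to index health events can be sketched with a simple grid. In the example below, each reported event is assigned to a latitude/longitude cell so that reports from different sources falling in the same cell can be examined together. This is an illustrative assumption about how such indexing might work, not a description of how HealthMap is implemented, and the event records are hypothetical.

```python
from collections import defaultdict
from math import floor


def index_events(events, cell_size_deg=1.0):
    """Group event identifiers by the grid cell containing their coordinates.

    Each cell key is (floor(lat / cell_size_deg), floor(lon / cell_size_deg)).
    """
    cells = defaultdict(list)
    for event in events:
        key = (
            floor(event["lat"] / cell_size_deg),
            floor(event["lon"] / cell_size_deg),
        )
        cells[key].append(event["id"])
    return dict(cells)


# Hypothetical geocoded reports from different sources.
reports = [
    {"id": "news_1", "lat": 6.9, "lon": 79.9},
    {"id": "clinic_7", "lat": 6.2, "lon": 79.3},
    {"id": "news_2", "lat": 7.3, "lon": 80.6},
]
event_index = index_events(reports)

# Cells containing more than one report may merit a closer look.
shared_cells = {cell: ids for cell, ids in event_index.items() if len(ids) > 1}
```

Changing `cell_size_deg` gives a crude form of the multi-scale exploration described above, from local cells to much coarser regions.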

Examine outcomes

The final part of this framework is to examine the outcomes, real or simulated, from the geocomputational representation of the disease/landscape system of interest. The task of examining the outputs of a model can include visual analysis, pattern comparison measures and sensitivity analyses to changes in model parameters. Additionally, this sort of integrated interpretation also considers the advances made in the non-landscape aspects of the disease that have taken place and may provide alternate (re)-interpretations of patterns. For example, when a new disease emerges, patterns of spread and distribution will be unexplained until the natural reservoir has been isolated and genetic work completed. The objective of this level of analysis is to realize integrated insight into landscape–disease interactions and risk.

Case study

What follows is a brief case study that highlights the components of the framework in relation to a zoonotic disease of global importance, leptospirosis (Bharti et al. 2003). The case study is not exhaustive of the framework phases, and includes a review of previously published work and new analysis of a large outbreak in 2008 in Sri Lanka.

Surveillance: leptospirosis in Sri Lanka—epidemic in notified cases

Leptospirosis is one of the most prevalent zoonoses in the world, affecting millions of people every year. The bacterial spirochaete (i.e., leptospira) that gives rise to the disease in humans has a large number of animal reservoirs, including cattle, rodents and foxes. The bacteria pass from animal reservoirs to the environment through urine, where they can then infect humans.

In Sri Lanka, leptospirosis is historically associated with rice agriculture. Transmission occurs when open wounds or skin abrasions come into contact with urine from infected rodents (giving rise to the local name ‘rat fever’). Diagnostic facilities are limited, the clinical presentation is similar to that of Hantavirus infections (Gamage et al. 2011; Sunil-Chandra et al. 2015), and only a fraction of suspected cases are tested. A large outbreak of suspected leptospirosis occurred in Sri Lanka in 2008. The reasons for the outbreak remain speculative, but it is hypothesized to be at least partially due to expanded cultivation of paddy field areas resulting from policies aimed at increasing domestic rice production. As part of the programme ‘Api Vavamu, Rata Nagamu’, unused fertile lands were targeted for food production among landholders, many of whom had not traditionally been involved in farming. The development of this programme was a direct response to the global rise in food prices that began accelerating in 2007, with the international price of rice increasing 250 % between 2007 and 2008 (Kelegama 2010). According to the Central Bank of Sri Lanka, rice paddy production increased 18.2 % between 2007 and 2008 (Kelegama 2010).

Many questions about the causes of the 2008 epidemic of notified leptospirosis cases in Sri Lanka remain to be investigated, including (a) whether the cases truly were leptospirosis or a concurrent outbreak of some other clinically indistinguishable disease, as has been theorized (Agampodi et al. 2011); (b) whether the epidemic had strong environmental determinants, such as higher than average rainfall, flooding, or other factors, and whether it was predictable; (c) what early warning could have been forecast using available data; (d) whether the epidemic would have happened had the local food production policy not been in place; and finally (e) how important variability in clinical suspicion was in driving the spike in reported cases. Many of these questions have geographical dimensions that warrant and/or necessitate a geocomputational approach.

Employing the framework outlined in this paper as a guide, we show how some of these research questions could be approached. Starting with part 1: in many studies where a GLE approach is employed, assembling variables related to local meteorology and LU/LC is the first step towards characterizing the assemblage relevant to understanding the disease distribution. In many developing countries, these variables exist but are difficult to access or are not available directly in digital form, and researchers often rely on long-term normals. Depending on the application, these may be suitable candidates, but most applications and GLE contexts require daily meteorological observations (or a combination of daily observations and normals). Preparing such observations can require extensive text parsing to transform the data into a format suitable for storage in a geographic database. Note that an important precursor to this step is to identify the spatial and temporal scales that are relevant for the study.

Characterize the assemblage

The key questions we explore in this analysis concern the relationship between rainfall and reported cases of leptospirosis during the peak of the epidemic in 2008. As disease reporting occurs over the administrative units defined by the Ministry of Health, called Medical Officer of Health (MOH) Areas in Sri Lanka, we need to standardize both measures over these geographical units. The potential mechanistic relationships include at least those outlined in Table 6, so the temporal lag at which rainfall relates to reported cases is a key parameter of interest, as it could be used to determine the early-warning value of rainfall modelling.

Table 6 Hypothesized rainfall–leptospirosis relationships in Sri Lanka

Methods

To transform daily rainfall records obtained from the Department of Meteorology into seamless rainfall maps for the entire country, spatial interpolation modelling was performed. Previous comparative analysis of interpolation methods found that thin-plate smoothing splines were a generally effective method for interpolating rainfall in Sri Lanka (Plouffe et al. 2015). Daily rainfall records were tabulated into monthly totals for each of the 446 stations, and each month was interpolated using the Tps function in the fields package in R (Nychka et al. 2015). Averaging interpolated total rainfall for each month over each MOH area provided concurrent estimates of rainfall and reported cases of leptospirosis over equivalent geographical units. Surveillance data for weekly reported cases of leptospirosis by MOH area were aggregated by year and standardized by population to produce estimates of incidence per 100,000 for the year 2008 in each geographical unit. Plots of the time series of annual cases and the monthly rainfall maps were compared visually. Simple cross-correlation analysis between case counts and rainfall was used to identify potential temporal lags of importance for early warning.
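The tabulation, standardization and lagged-correlation steps described above can be sketched as follows. This is a minimal illustration in Python, not the code used for the analysis (which was carried out in R), and it omits the thin-plate spline interpolation and MOH-area averaging. The record layout, function names and all example values are assumptions made up for the sketch.

```python
from collections import defaultdict
from math import sqrt


def monthly_totals(daily_records):
    """Sum daily rainfall (mm) into monthly totals per station.

    daily_records: iterable of (station_id, 'YYYY-MM-DD', rainfall_mm).
    Returns {(station_id, 'YYYY-MM'): total_mm}.
    """
    totals = defaultdict(float)
    for station, date, mm in daily_records:
        totals[(station, date[:7])] += mm
    return dict(totals)


def incidence_per_100k(cases, population):
    """Standardize a case count by population."""
    return 100_000 * cases / population


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def lagged_correlation(rain, cases, lag):
    """Correlate cases in month t with rainfall in month t - lag (lag >= 0)."""
    if lag == 0:
        return pearson(rain, cases)
    return pearson(rain[:-lag], cases[lag:])


# Made-up example values for one hypothetical station and one MOH area.
daily = [("A", "2008-03-01", 12.5), ("A", "2008-03-02", 7.5), ("A", "2008-04-01", 3.0)]
totals = monthly_totals(daily)

rain = [10, 20, 30, 40, 50, 60]   # monthly rainfall (mm), invented
cases = [5, 3, 5, 7, 9, 11]       # monthly reported cases, invented
for lag in range(0, 3):
    print(f"lag {lag}: r = {lagged_correlation(rain, cases, lag):.3f}")
```

With only twelve monthly values per area, each additional month of lag removes an observation, so correlations at longer lags rest on very few points and should be read cautiously.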

Results

The monthly time series of cases reveals a large spike in cases in September 2008 (Fig. 2). The timing of this spike is consistent with peaks in cases in previous years; however, the magnitude is much greater and worthy of further investigation. Examining the rainfall patterns over 2008 evident in Fig. 3, the most striking pattern is the unseasonably intense and extensive rainfall that occurred in March 2008. Whether this unusually high amount of rainfall was related to the spike in reported cases of leptospirosis is unclear. Correlation analysis revealed both positive and negative associations between incidence and rainfall (Table 7). The highest-ranked monthly correlations in terms of t-statistic magnitude were July (positive), March (negative), and April (positive). The magnitudes of the correlations were not high, indicating limited predictive potential, at least at the scales investigated here. Direct comparison of month-to-month correlations would provide a more robust indicator of the relationship; however, power is also reduced due to the smaller numbers and smaller effect sizes. Month-to-month cross correlations indicated both positive and negative associations (Fig. 4). While some interesting spatial patterns emerge from this analysis, the evidence is insufficient to explain the outbreak in 2008, as key information is missing. Note that our previous modelling work on landscape factors related to clusters of high risk found associations with paddy areas, small agricultural holdings, distance to cities and distance to rivers (Robertson et al. 2012).

Fig. 2 Monthly distribution of reported cases of leptospirosis in Sri Lanka, 2008

Fig. 3 Monthly total rainfall estimates in Sri Lanka, 2008

Table 7 Monthly correlations between rainfall and annual incidence of reported leptospirosis in Sri Lanka, 2008

Fig. 4 Cross correlation of reported cases of leptospirosis and rainfall at the monthly scale in Sri Lanka, 2008 (red negative, white no correlation, blue positive), for lags of (a) 0 to (f) 5 months. (Color figure online)

Characterize functions

With a basic description of the reported leptospirosis surveillance data now in place, we can use this example to motivate additional analyses within the GLE framework that would improve our understanding of the dynamics of risk and the causes of the outbreak. The analyses reported in the section above indicated some evidence for a rainfall relationship driving at least the endemic cycle of cases, an anomalous event in September 2008 that does not have an obvious rainfall signal, and spatial variability in the relationships, suggesting that perhaps both a rural agricultural risk profile and a peri-urban risk profile combined to produce the risk landscape in 2008. A functional analysis would examine the flows, feedbacks, and events that describe the movement of people, their interaction with the health care system and care-seeking behaviours, the application of control mechanisms, the distributions of animal reservoirs and their habitats (or production chains in the case of livestock), and individual interactions with paddy fields. As these forms of data are much more difficult to obtain than those used in the previous section, a dynamic modelling approach using individual-based models would allow exploration of the parameter space, so that alternate scenarios that might lead to the observed pattern could be tested.

We may also investigate specific events in September 2008 that may explain the spike in reported cases. For example, the year 2008 saw heavy fighting as the Sri Lankan Army engaged in an offensive military push to end the civil war with the Tamil Tigers (LTTE), who controlled much of the territory in the north of the country. In early September, the Sri Lankan Army captured the town of Mallavi, which served as an administrative centre for the LTTE. Further study could investigate the impact of these events on disease risk and disease reporting.

Map interdependencies

Examining the surveillance data, the largest number of reported cases in September came from the Homagama area, part of the Colombo District in Western Province. This represents a 445 % increase over the next highest reported month in this district, which otherwise follows expected seasonal dynamics. The cause(s) of this spike remain to be investigated, and could range from a simple data entry error or misdiagnoses to linkages with the activities associated with the end of civil conflict in the north. Looking at the weekly surveillance data, the fact that these cases were relatively evenly distributed across the weeks suggests a disease-causing event did occur during this time frame.
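To make the size of this spike concrete, the percentage increase can be written out. The symbols below are introduced here for illustration only, and the underlying counts are not reported in the text: let c_Sep be the number of cases reported in September and c_next the number reported in the next highest month, and assume the 445 % figure is a percentage increase in the usual sense.

\[
\text{increase (\%)} = 100 \times \frac{c_{\mathrm{Sep}} - c_{\mathrm{next}}}{c_{\mathrm{next}}},
\qquad
445 = 100 \times \left( \frac{c_{\mathrm{Sep}}}{c_{\mathrm{next}}} - 1 \right)
\;\Rightarrow\;
\frac{c_{\mathrm{Sep}}}{c_{\mathrm{next}}} = 5.45
\]

Under that reading, the September count was roughly five and a half times the count in the next highest month.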

Examine outcomes

To obtain system-level inferences on the leptospirosis–landscape system being investigated here, several additional studies would be required. We would suggest that these ‘higher-level’ inferences tend to be qualitative in nature, collaborative in their genesis, and built from shared interpretations of the more quantitative analyses described earlier. This aspect of analysis, though rarely formally articulated, should constitute a significant step in GLE, especially given the highly multi- and interdisciplinary nature of team-based research projects that include ecologists, biologists, veterinarians, geographers, and social scientists. Knowledge translation activities, including workshops, videos, tutorials, and other formats that effectively convey scientific knowledge to user communities and knowledge users, are a critical integrative step for realizing the ‘decision-support’ capacity of advanced spatial and statistical modelling efforts.

Conclusions

We have presented a framework for GLE that categorizes methods commonly used to investigate landscape–disease interactions, can be used to design and frame future studies, and provides a functional mapping between knowledge uses and methods. The framework is general and extendable, and will support stronger research design for spatially focused projects in LE. Additionally, the framework may be a useful educational tool for introducing newcomers to the extensive library of models and methods available for spatially explicit data. This may enhance multidisciplinary research teams working on complex disease–landscape interactions.