Overview

The purpose of this chapter is to demonstrate how Google Earth Engine may be used to support modeling and forecasting of vector-borne infectious diseases such as malaria. In doing so, the chapter will also show how Earth Engine may be used to gather data for subsequent analyses outside of Earth Engine, the results of which can then also be brought back into Earth Engine.

We will be calculating and exporting data of remotely sensed environmental variables: precipitation, temperature, and a vegetation water index. These factors can impact mosquito life cycles, malaria parasites, and transmission dynamics. These data can then be used in R for modeling and forecasting malaria in the Amhara region of Ethiopia, using the Epidemic Prognosis Incorporating Disease and Environmental Monitoring for Integrated Assessment (EPIDEMIA) system, developed by the EcoGRAPH research group at the University of Oklahoma.

Learning Outcomes
  • Extracting and calculating malaria-relevant variables from existing datasets: precipitation, temperature, and wetness.

  • Importing satellite data and filtering for images over a region and time period.

  • Joining two data products to get additional quality information.

  • Computing zonal summaries of the calculated variables for elements in a FeatureCollection.

Helps if you know how to
  • Import images and image collections, filter, and visualize (Part 1).

  • Perform basic image analysis: select bands, compute indices, create masks (Part 2).

  • Use expressions to perform calculations on image bands (Chap. 9).

  • Write a function and map it over an ImageCollection (Chap. 12).

  • Mask cloud, cloud shadow, snow/ice, and other undesired pixels (Chap. 15).

  • Flatten a table for export to CSV (Chap. 22).

  • Use reduceRegions to summarize an image with zonal statistics in irregular shapes (Chaps. 22 and 24).

  • Write a function and map it over a FeatureCollection (Chaps. 23 and 24).

1 Introduction to Theory

Vector-borne diseases cause more than 700,000 deaths per year, of which approximately 400,000 are due to malaria, a parasitic infection spread by Anopheles mosquitoes (World Health Organization 2018, 2020). The WHO estimates that there were around 229 million clinical cases of malaria worldwide in 2019 (WHO 2020). Environmental factors including temperature, humidity, and rainfall are known to be important determinants of malaria risk as these affect mosquito and parasite development and life cycles, including larval habitats, mosquito fecundity, growth rates, mortality, and Plasmodium parasite development rates within the mosquito vector (Franklinos et al. 2019; Jones et al. 2008; Wimberly et al. 2021).

Data from Earth-observing satellites can be used to monitor spatial and temporal changes in these environmental factors (Ford et al. 2009). These data can be incorporated into disease modeling, usually as lagged functions, to help develop early warning systems for forecasting outbreaks (Wimberly et al. 2021, 2022). Accurate forecasts would allow limited resources for prevention and control to be more efficiently and effectively targeted at appropriate locations and times (WHO 2018).

To implement near-real-time forecasting, meteorological and climatic data must be acquired, processed, and integrated on a regular and frequent basis. Over the past 10 years, the Epidemic Prognosis Incorporating Disease and Environmental Monitoring for Integrated Assessment (EPIDEMIA) project has developed and tested a malaria forecasting system that integrates public health surveillance with monitoring of environmental and climate conditions. Since 2018, the environmental data have been acquired using Earth Engine scripts and apps (Wimberly et al. 2022). In 2019, a local team at Bahir Dar University in Ethiopia was using EPIDEMIA with near-real-time epidemiological data to generate weekly malaria early warning reports in the Amhara region of Ethiopia.

In this example, we are looking at near-real-time environmental conditions that affect disease vectors and human transmission dynamics. On longer time scales, issues such as climate change can alter vector-borne disease transmission cycles and the geographic distributions of various vector and host species (Franklinos et al. 2019). More broadly, health applications involving Earth Engine data likely align with a One Health approach to complex health issues. Under One Health, a core assumption is that environmental, animal, and human health are inextricably linked (Mackenzie and Jeggo 2019).

2 Practicum

The goal of the practicum is to create a download of three environmental variables:

  1. Precipitation.

  2. Mean land surface temperature (LST).

  3. Normalized Difference Water Index (NDWI) spectral index.

These downloads will be zonal summaries based on our uploaded shapefile of woredas (districts) in the Amhara region of Ethiopia.

The practicum is an extract from the longer Retrieving Environmental Analytics for Climate and Health (REACH) Earth Engine script (developed by Dr. Michael C. Wimberly and Dr. Dawn Nekorchuk) used in the EPIDEMIA project (Dr. Michael C. Wimberly, PI). This script also has a more advanced user interface for the user to request date ranges for the download of data. Links to this script and related apps can be found in the “For Further Reading” section of this book.

2.1 Section 1: Data Import

To start, we need to import the data we will be working with. The first item is an external asset of our study area—these are woredas in the Amhara region of Ethiopia. The four that follow are remotely sensed data that we will be processing:

  • The Integrated Multi-satellite Retrievals for GPM (IMERG) rainfall estimates from Global Precipitation Measurement (GPM) v6.

  • Terra Land Surface Temperature and Emissivity 8-Day Global 1 km.

  • MODIS Nadir Bidirectional Reflectance Distribution Function (BRDF) Adjusted Reflectance Daily 500 m.

  • MODIS BRDF-Albedo Quality Daily 500 m.

A snippet of several lines of code for Section 1 (data import). It defines woredas as an ee.FeatureCollection, creates the region outer boundary with amhara = woredas.geometry(), and defines gpm, LSTTerra8, and brdfReflect, among others.

We can take a look at the woreda boundaries by adding the following code to draw it onto the map (Fig. 37.1). See Chap. 25 for more information on visualizing feature collections.

A snippet of several lines of code. It visualizes the woredas with black borders and no fill: it creates an empty image (var empty = ee.Image), paints all the polygon edges (var outline = empty.paint, with featureCollection, color, and width), and calls Map.setCenter, among others.
Fig. 37.1
A map of Ethiopia marks the areas around Addis Ababa, Bahir Dar, and Gondar. It also marks Mek'ele, Aksum, Nekemte, Jimma, Awassa, Djibouti, Dire Dawa, Harar, Hargeisa, and Burco, among others.

Woreda (district) boundaries in the Amhara region of Ethiopia

Code Checkpoint A16a. The book’s repository contains a script that shows what your code should look like at this point.

2.2 Section 2: Date Preparation

The user will be requesting the date range for the summarized data, and it is expected that they will be looking for near-real-time data. Different data products that we are using have different data lags, and some data may not be available in the user-requested date range. We will want to get the last available data date, so we can properly create and name our export datasets.

We need daily data, but the LST data are in 8-day composites. For this, we will assign the 8-day composite value to each of the eight days in the range. This means we also need to acquire the 8-day composite value that covers the requested start date (i.e., the previous image).

A snippet of several lines of code for Section 2 (handling of dates). Part 2.1 sets the requested start and end dates (reqStartDate, reqEndDate). Part 2.2 handles the LST dates (MODIS LST is every 8 days) and defines LSTEarliestDate and priorLstImgCol, among others.
A snippet of 5 lines of code. If the data end before the requested start, it takes the last data date; otherwise, it uses the requested date. It includes var precipStartDate = ee.Date, .min, and print, among others.
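The date logic described in this section can also be sketched outside Earth Engine. The plain JavaScript below is a minimal illustration and not the chapter's script: the function names are hypothetical, and it assumes that the 8-day LST periods restart on 1 January of each year. It finds the start of the 8-day period that covers a requested date, and it picks the earlier of the requested start date and the last available data date.

```javascript
// Plain JavaScript sketch (no Earth Engine calls); all names are hypothetical.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Parse 'YYYY-MM-DD' as a UTC timestamp in milliseconds.
function toUtc(dateStr) {
  return Date.parse(dateStr + 'T00:00:00Z');
}

// Format a UTC timestamp back to 'YYYY-MM-DD'.
function toIso(ms) {
  return new Date(ms).toISOString().slice(0, 10);
}

// Start date of the 8-day period containing dateStr, ASSUMING periods
// begin on 1 January of each year (day of year 1, 9, 17, ...).
function compositeStartFor(dateStr) {
  const t = toUtc(dateStr);
  const jan1 = Date.UTC(new Date(t).getUTCFullYear(), 0, 1);
  const dayIndex = Math.round((t - jan1) / MS_PER_DAY); // 0 on 1 January
  return toIso(jan1 + Math.floor(dayIndex / 8) * 8 * MS_PER_DAY);
}

// If the data end before the requested start, fall back to the last
// data date; otherwise keep the requested start date.
function effectiveStart(reqStart, lastDataDate) {
  return toIso(Math.min(toUtc(reqStart), toUtc(lastDataDate)));
}

console.log(compositeStartFor('2021-10-01')); // 2021-09-30
console.log(effectiveStart('2021-10-01', '2021-09-20')); // 2021-09-20
```

Under this assumption, a request starting on 1 October 2021 needs the composite that begins on 30 September, which is why the previous image has to be acquired as well.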

Code Checkpoint A16b. The book’s repository contains a script that shows what your code should look like at this point.

Question 1. Explore the earliest date of LST images you get if you do not specifically acquire the previous image. The following code may be useful:

A snippet of 7 lines of code. It defines naiveLstFilter = LSTTerra8, naiveLstStart = naiveLstFilter.reduceColumns (with reducer and selectors), and naiveLstStartDate = ee.Date, among others.

Question 2. Try changing the requested dates to closer to the current date to see how the dates for the different data products adjust. If you have a narrow window (1–2 weeks), you may find that some data products do not have any data available for the requested time period yet.

2.3 Section 3: Precipitation

Now, we will calculate our precipitation variable for the appropriate date range and then perform a zonal summary (see Chap. 24) of our woredas.

2.3.1 Section 3.1: Precipitation Filtering and Dates

Using the dates when data actually exist in the user-requested date range, we create a list of dates for which we will calculate our variable.

A snippet of several lines of code for Section 3 (precipitation), Section 3.1 (precipitation filtering and dates). It filters gpm by date (var gpmFiltered = gpm), calculates the date of the most recent measurement, and defines gpmMax, gpmEndDate, and precipEndDate, among others.

2.3.2 Section 3.2: Calculate Daily Precipitation

In this section, we will use a mapped function and our filtered ImageCollection (gpmFiltered) to calculate the total rainfall for each day. In this product, precipitation in millimeters per hour is recorded every half hour, so we will sum the half-hourly values for the day and divide by two.

A snippet of several lines of code for Section 3.2 (calculate daily precipitation). It defines the function calcDailyPrecip (curDate = ee.Date, curYear, curDoy, totprec, return totprec), maps the function, and defines dailyPrecipExtended, among others.
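The unit conversion behind "sum the day and divide by two" can be checked with a few lines of plain JavaScript. This is only an arithmetic sketch with hypothetical names and made-up rates, not Earth Engine code: each half-hourly value is a rate in millimeters per hour that applies for half an hour, so the daily total in millimeters is the sum of the rates multiplied by 0.5.

```javascript
// Plain JavaScript sketch; the function name and sample rates are hypothetical.
// halfHourlyRates: precipitation rates in mm/hr, one value per half hour.
function dailyTotalMm(halfHourlyRates) {
  const sumOfRates = halfHourlyRates.reduce((acc, rate) => acc + rate, 0);
  // Each rate covers 0.5 hr, so total depth = sum(rate * 0.5) = sum / 2.
  return sumOfRates / 2;
}

// A mostly dry day with 1.5 hours of rain at 4, 6, and 2 mm/hr.
const rates = new Array(48).fill(0);
rates[20] = 4;
rates[21] = 6;
rates[22] = 2;

console.log(dailyTotalMm(rates)); // 6
```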

2.3.3 Section 3.3: Summarize Daily Precipitation by Woreda

In the last section for precipitation, we will calculate a zonal summary (a mean) of the rainfall per woreda and flatten the result for export as a CSV. The exports of all variables will be done in Sect. 37.2.7.

A snippet of 6 lines of code for Section 3.3 (summarize daily precipitation). It filters the precipitation data (var precipSummary = dailyPrecip, with .filterDate), among others.
A snippet of several lines of code. It calculates zonal statistics with the function sumZonalPrecip (var image2 = image.addBands, image.metadata, var output = image2.select, collection, reducer, return) and defines precipWoreda and precipFlat, among others.
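As a rough picture of what a zonal mean followed by flattening produces, the plain JavaScript sketch below works on small in-memory arrays instead of images. It is an illustration under stated assumptions, not the chapter's code: the function names, the record layout, and the column names (woreda, year, doy, totprec) are hypothetical, and masked pixels are represented as null.

```javascript
// Plain JavaScript sketch; names and record layout are hypothetical.
// pixels: [{zone: 'A', value: 2.0}, ...]; a null value stands for a masked pixel.
function zonalMean(pixels) {
  const acc = new Map();
  for (const {zone, value} of pixels) {
    if (value === null) continue; // masked pixels do not contribute
    const entry = acc.get(zone) || {sum: 0, n: 0};
    entry.sum += value;
    entry.n += 1;
    acc.set(zone, entry);
  }
  const means = {};
  for (const [zone, entry] of acc) {
    means[zone] = entry.sum / entry.n;
  }
  return means;
}

// dailySummaries: one {year, doy, means} object per day.
// Returns one flat row per woreda per day, ready to write as a table.
function flattenDaily(dailySummaries) {
  return dailySummaries.flatMap(({year, doy, means}) =>
    Object.entries(means).map(([woreda, totprec]) =>
      ({woreda: woreda, year: year, doy: doy, totprec: totprec})));
}

const day1 = zonalMean([
  {zone: 'A', value: 2}, {zone: 'A', value: 4}, {zone: 'A', value: null},
  {zone: 'B', value: 1}
]);
const rows = flattenDaily([{year: 2021, doy: 274, means: day1}]);
console.log(rows.length); // 2
```

The nested result (a set of woreda summaries for each day) becomes a single table with one row per woreda and day, which is the shape a CSV export needs.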

Code Checkpoint A16c. The book’s repository contains a script that shows what your code should look like at this point.

2.4 Section 4: Land Surface Temperature

We will follow a similar pattern of steps for land surface temperatures, though first we will calculate the variable (mean LST). Then, we will calculate the daily values and summarize them by woreda.

2.4.1 Section 4.1: Calculate LST Variables

We will use the daytime and nighttime observed values to calculate a mean value for the day, and we will use the quality layers to mask out poor-quality pixels. Working with the bitmask below, we take advantage of the fact that bits 6 and 7 are at the end (the two highest bits), so rightShift(6) returns just these two. Then, we check whether the result is less than or equal to 2, meaning an average LST error ≤ 3 K (see the MODIS documentation for the meaning of each element in the bit sequence). For more information on how to use bitmasks in other situations, see Chap. 15. To convert the pixel values, we will use the scaling factor in the data product (0.02) and convert from Kelvin to Celsius (−273.15). See Chap. 36, about heat islands, for another example using LST data.

A snippet of several lines of code for Section 4 (land surface temperature), Section 4.1 (calculate LST variables). It defines LSTFiltered = LSTTerra8 (with .filterDate and .filterBounds), the function filterLstQa, LSTFilteredQA, and the function rescaleLst, among others.
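The bit arithmetic and rescaling can be tried on single numbers in plain JavaScript. The sketch below is a hedged illustration with hypothetical function names, not the Earth Engine implementation. It assumes an 8-bit quality value whose bits 6 and 7 hold the LST error flag, and it assumes the product's documented scale factor of 0.02 for converting stored values to Kelvin.

```javascript
// Plain JavaScript sketch; function names are hypothetical.

// Keep only bits 6 and 7 of an 8-bit quality value by shifting right 6 places.
function lstErrorFlag(qc) {
  return (qc & 0xFF) >> 6;
}

// Flags 0, 1, and 2 are accepted (average LST error <= 3 K); flag 3 is rejected.
function isAcceptable(qc) {
  return lstErrorFlag(qc) <= 2;
}

// Stored value -> Kelvin (assumed scale factor 0.02) -> degrees Celsius.
function toCelsius(rawValue) {
  return rawValue * 0.02 - 273.15;
}

// Mean of the daytime and nighttime temperatures, in degrees Celsius.
function meanLstCelsius(rawDay, rawNight) {
  return (toCelsius(rawDay) + toCelsius(rawNight)) / 2;
}

console.log(lstErrorFlag(0b10000001)); // 2
console.log(isAcceptable(0b11000000)); // false
```

Because the two bits of interest are the highest ones, a single right shift isolates them and no additional mask is needed for an 8-bit value.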

2.4.2 Section 4.2: Calculate Daily LST

Now, using a mapped function over our filtered collection, we will calculate a daily value from the 8-day composite value by assigning each of the eight days the value of the composite. We will also filter to our user-requested dates, as data exist in that range.

A snippet of several lines of code for Section 4.2 (calculate daily LST). It defines LSTRange = LSTVars.reduceColumns, LSTEndDate, LSTDays, the functions makeLstDates and calcDailyLst, curYear, and baseDate, among others.
A snippet of 9 lines of code. It includes var LST_mean, .select, .filterDate, return LST_day, .addBands, and .set, among others.
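Assigning a composite value to each day it covers can be illustrated with plain JavaScript on a small array. This is a sketch under stated assumptions, not the chapter's mapped Earth Engine function: the names are hypothetical, the requested range is treated as inclusive at both ends, and a composite is assumed to stop at the end of its calendar year (so the last period of a year covers fewer than eight days).

```javascript
// Plain JavaScript sketch; names and record layout are hypothetical.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Repeat one composite value for each day it covers (up to 8 days,
// ASSUMING the period does not run past 31 December of its start year).
function expandComposite(startStr, value) {
  const start = Date.parse(startStr + 'T00:00:00Z');
  const startYear = new Date(start).getUTCFullYear();
  const daily = [];
  for (let i = 0; i < 8; i++) {
    const day = new Date(start + i * MS_PER_DAY);
    if (day.getUTCFullYear() !== startYear) break;
    daily.push({date: day.toISOString().slice(0, 10), value: value});
  }
  return daily;
}

// Expand all composites, then keep only days inside the requested range.
// ISO date strings compare correctly as plain strings.
function dailyInRange(composites, reqStart, reqEnd) {
  return composites
    .flatMap(c => expandComposite(c.start, c.value))
    .filter(d => d.date >= reqStart && d.date <= reqEnd);
}

const composites = [
  {start: '2021-09-30', value: 25.1},
  {start: '2021-10-08', value: 26.3}
];
const daily = dailyInRange(composites, '2021-10-01', '2021-10-09');
console.log(daily.length); // 9
```

The first requested day (1 October) takes its value from the composite that began on 30 September, which is the reason the earlier image is needed.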

2.4.3 Section 4.3: Summarize Daily LST by Woreda

In the final section for LST, we will compute a zonal mean of the temperature for each of our woredas and flatten the result in preparation for export as a CSV. The exports of all variables will be done in Sect. 37.2.7.

A snippet of several lines of code for Section 4.3 (summarize daily LST). It defines LSTSummary = dailyLst, the function sumZonalLst, image2, and image.metadata, among others.

Code Checkpoint A16d. The book’s repository contains a script that shows what your code should look like at this point.

2.5 Section 5: Spectral Index: NDWI

We will follow a similar pattern of steps for our spectral index, NDWI, as we did for precipitation and land surface temperatures: first, calculate the variable(s), then calculate the daily values, and finally summarize by woreda.

2.5.1 Section 5.1: Calculate NDWI

Here, we will focus on NDWI, which we actively used in forecasting malaria. For examples of other indices, see Chap. 9.

The MODIS MCD43A4 product contains simplified band quality information, and it is recommended to use the additional quality information in the MCD43A2 product for your particular application. We will join these two products to apply our selected quality information. (Note that we do not have to worry about snow in our study area.) For more information on joining image collections, see Chap. 21.

A snippet of several lines of code for Section 5 (spectral index: NDWI), Section 5.1 (calculate NDWI). It filters the BRDF-adjusted reflectance (brdfReflectVars, with .filterDate), defines brdfReflectQa, idJoin, and innerJoin, and defines the function addQaBands, among others.
A snippet of several lines of code. It includes var qa = ee.Image, var water = ee.Image, return nbar.addBands, var brdfMerged, the function filterBrdf (var qaBand, var wBand, return image.addBands), and the function calcBrdfIndices, among others.
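The idea of joining a reflectance product to its quality product, filtering on quality, and then computing a normalized difference can be sketched in plain JavaScript. The example below is illustrative only: the record layout, the function names, and the quality threshold are hypothetical, and the real quality bands carry more detail than the single qa number used here. The index is the usual normalized difference, (NIR − SWIR) / (NIR + SWIR).

```javascript
// Plain JavaScript sketch; names, record layout, and threshold are hypothetical.

// Inner join: keep reflectance records that have a quality record
// with the same id, and attach that record's qa value.
function innerJoinById(reflectance, quality) {
  const qaById = new Map(quality.map(q => [q.id, q]));
  return reflectance
    .filter(r => qaById.has(r.id))
    .map(r => ({...r, qa: qaById.get(r.id).qa}));
}

// Normalized difference of two bands; null when the denominator is zero.
function ndwi(nir, swir) {
  const sum = nir + swir;
  return sum === 0 ? null : (nir - swir) / sum;
}

// Drop records whose qa value exceeds maxQa, then compute the index.
function ndwiWithQa(merged, maxQa) {
  return merged
    .filter(m => m.qa <= maxQa)
    .map(m => ({id: m.id, ndwi: ndwi(m.nir, m.swir)}));
}

const reflectance = [
  {id: '2021_10_01', nir: 0.75, swir: 0.25},
  {id: '2021_10_02', nir: 0.5, swir: 0.5},
  {id: '2021_10_03', nir: 0.6, swir: 0.2} // no matching quality record
];
const quality = [
  {id: '2021_10_01', qa: 0},
  {id: '2021_10_02', qa: 3}
];

const merged = innerJoinById(reflectance, quality);
const result = ndwiWithQa(merged, 1);
console.log(result.length); // 1
```

An inner join silently drops records without a partner, so a day that is missing from the quality product also disappears from the merged result.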

2.5.2 Section 5.2: Calculate Daily NDWI

Similar to the other variables, we will calculate a daily value and filter to our user-requested dates, as data exist in that range.

A snippet of several lines of code for Section 5.2 (calculate daily NDWI). It defines brdfRange (with a reducer), brdfEndDate, brdfDays, and the functions makeBrdfDates, calcDailyBrdfExists, and calcDailyBrdfFiller, among others.
A snippet of several lines of code. It includes .filterDate, var brdfSize, var outImg = ee.Image.constant, .addBands, .rename, .set, return outImg, var dailyBrdfExtendedFiller, and var dailyBrdf = dailyBrdfExtended, among others.

2.5.3 Section 5.3: Summarize Daily Spectral Indices by Woreda

Lastly, in our NDWI section, we will use the mean to summarize the values for each of the woredas and prepare for export by flattening the dataset. The exports of all variables will be done in Sect. 37.2.7.

A snippet of 7 lines of code for Section 5.3 (summarize daily spectral indices). It filters the spectral indices (var brdfSummary, with .filterDate) and starts the function to calculate zonal statistics, among others.

Code Checkpoint A16e. The book’s repository contains a script that shows what your code should look like at this point.

Question 3. Here, we are only calculating NDWI, which is calculated from the near-infrared (NIR) and shortwave infrared 2 (SWIR2) bands. If we wanted to calculate a vegetation index like the Normalized Difference Vegetation Index (NDVI), which bands would we need to add? Where in Sects. 37.2.5.1 through 37.2.5.3 would we need to add or select the raw bands and/or our new calculated band? Note: Fully implementing this is one of the synthesis challenges, so this is a good head start!

2.6 Section 6: Map Display

Here, we will take a look at our calculated variables prior to the zonal summary (Fig. 37.2). The full user interface restricts the display date to the requested range, so be mindful of which date you choose to view in the code below (we set our time range in code Section 2.1; see Sect. 37.2.2).

A snippet of several lines of code for Section 6 (map display of calculated environmental variables). It defines displayDate = ee.Date, precipDisp, precipImage, LSTmImage, ndwi6Image, and palettePrecip, and calls Map.addLayer, among others.
Fig. 37.2
A screenshot of a Google Earth Engine window. It has a Layers dialog box with Precipitation and Woredas checked and a map of Ethiopia in the background. Parts of the map are shaded in different color gradients.

Calculated total daily precipitation overlaid on woreda boundaries in the Amhara region of Ethiopia

Code Checkpoint A16f. The book’s repository contains a script that shows what your code should look like at this point.

2.7 Section 7: Exporting

Two important strengths of Google Earth Engine are the ability to gather and process the remotely sensed data entirely in the cloud and to have the only download be a small text file ready to use in the forecasting software. Most of our partners on this project were experts in public health and did not have a remote sensing or programming background. We also had partners in areas of limited or unreliable internet connectivity. We needed something that our users could easily work with in these types of situations.

In this section, we will create small text CSV downloads for each of our three environmental factors prepared earlier. Each factor may have different data availabilities within the user’s requested range, and these dates will be added to the file name to indicate the actual date range of the downloaded data (Fig. 37.3).

Fig. 37.3
A set of 3 screenshots of Excel sheets for the precipitation, LST, and spectral data. The precipitation sheet has columns for wid, woreda, day, year, and totprec. The LST sheet has columns for wid, woreda, day, year, and the day, night, and mean LST. The spectral sheet has columns for wid, woreda, day, year, and ndwi6.

Examples of the three CSV files returned from the script

A snippet of several lines of code for Section 7 (exporting), part 7.1 (export naming). It defines reqStartDateText and, for each variable, a prefix, a last date, and a file name (for example, precipPrefix and precipLastDate, LSTPrefix and LSTLastDate), among others.
A snippet of 9 lines of code. It includes Export.table.toDrive with collection, description, and selectors, among others.
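The naming and column-selection steps can be pictured with a short plain JavaScript sketch. It is not the chapter's export code: the function names, the prefix, the underscore separator, and the column names are all hypothetical. It builds a file name from a prefix and the actual first and last data dates, and it writes rows to CSV text using only the selected columns, in the selected order.

```javascript
// Plain JavaScript sketch; names, prefix, and separator are hypothetical.

// File name that records the actual date range of the data it contains.
function exportName(prefix, startDate, endDate) {
  return [prefix, startDate, endDate].join('_');
}

// CSV text with a header row; only the listed columns are written,
// in the order given by selectors.
function toCsv(rows, selectors) {
  const header = selectors.join(',');
  const lines = rows.map(row =>
    selectors.map(column => String(row[column])).join(','));
  return [header, ...lines].join('\n');
}

const name = exportName('Export_Precip_Data', '2021-10-01', '2021-11-15');
const csv = toCsv(
  [{wid: 1, woreda: 'A', doy: 274, year: 2021, totprec: 6, extra: 'dropped'}],
  ['wid', 'woreda', 'doy', 'year', 'totprec']
);
console.log(name); // Export_Precip_Data_2021-10-01_2021-11-15
```

Putting the last available data date in the name, instead of the requested end date, lets a user see at a glance how current each file is.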

Code Checkpoint A16g. The book’s repository contains a script that shows what your code should look like at this point.

In the Earth Engine Tasks tab, click Run to configure and start each export to Google Drive.

2.8 Section 8: Importing and Viewing External Analysis Results

As mentioned at the start of the chapter, the environmental data obtained from Earth Engine can be used for infectious disease modeling and forecasting. The above Earth Engine code was written in support of EPIDEMIA, a software system based in the R language and computing environment for forecasting malaria, and was actively used in certain study pilot woredas in the Amhara region of Ethiopia. The R system consists of an R package—epidemiar—for generic functions and a companion R project for handling all the location-specific data and settings.

One of the main outputs of EPIDEMIA is the forecasted incidence of malaria in each woreda by week from one to eight (or more) weeks in advance. Using our publicly available demo project that uses synthetic data (not for use in epidemiological study), we created forecasts for week 32 of 2018 made eight weeks prior (“knowing” data up to week 24) and also added the observed incidence for comparison. (Note: dates and weeks follow International Organization for Standardization [ISO] standard 8601). These new data can be re-uploaded to Earth Engine for further analyses or exploration.

Starting a new script, you can use the Sect. 37.2.8 code that follows to visualize the pre-generated demo 2018W32 results (Fig. 37.4).

A snippet of several lines of code for Section 8 (viewing external analyses results). It defines epidemiaResults, pilot, and nonpilot, calls Map.setCenter, paints the pilot woredas with different colors (var fill_fc, var palette), and calls Map.addLayer, among others.
A snippet of several lines of code. It includes var fill_obs, var palette, Map.addLayer for fill_obs, var fill_na, and var outline = empty.paint, and it adds the woreda boundaries to the map, among others.
Fig. 37.4
A screenshot of the Google Earth Engine window. It has 9 lines of code at the top with an options bar. Below is a Layers dialog box with Woredas, Non-study woredas, and Forecasted incidence checked, and a map of Ethiopia in the background. Areas of the map are marked in different color gradients.

Visualization of forecasted malaria incidence for week 32 of 2018 made during week 24 (an eight-week lead time). Malaria data are synthetic, for demonstration purposes only. The incidence has been categorized into five categories (from light to dark red): 0–0.25, 0.25–0.5, 0.5–0.75, 0.75–1, and greater than 1. Only woredas in the pilot project have values; the rest of the Amhara region is marked in gray fill. Another layer available to view is the observed (synthetic) incidence rate for 2018W32.
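The five display categories can be expressed as a small classification function. The plain JavaScript below is a sketch with hypothetical names, not the code used to draw the figure. The caption does not say which class a value exactly on a break belongs to, so this sketch assumes that each break belongs to the lower class, and it returns -1 for woredas without a value (the gray fill).

```javascript
// Plain JavaScript sketch; names and the handling of break values are assumptions.
const INCIDENCE_BREAKS = [0.25, 0.5, 0.75, 1];

// Returns 0..4 for the five incidence classes (light to dark red),
// or -1 when there is no value (for example, a non-pilot woreda).
function incidenceClass(value) {
  if (value === null || value === undefined || Number.isNaN(value)) {
    return -1;
  }
  let cls = 0;
  for (const upperBound of INCIDENCE_BREAKS) {
    if (value > upperBound) cls += 1; // a value on a break stays in the lower class
  }
  return cls;
}

console.log(incidenceClass(0.3)); // 1
console.log(incidenceClass(1.2)); // 4
console.log(incidenceClass(null)); // -1
```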

Code Checkpoint A16h. The book’s repository contains a script that shows what your code should look like at this point.

3 Synthesis

Assignment 1. Calculate other spectral indices: In this chapter, we only calculate and export the NDWI from the spectral data. Add another index to the calculations, such as a vegetation index like NDVI, the Soil Adjusted Vegetation Index (SAVI), or the Enhanced Vegetation Index (EVI). Think about what bands you will need, how to calculate the index, and how to propagate the band through all the remaining processing steps (including exporting).

Assignment 2. Change location: In this chapter, we obtained data for woredas in the Amhara region of Ethiopia. Upload or import a new shapefile of different locations and acquire environmental data for those locations instead. Remember that you will need to adjust any references to asset-specific fields (as we did here for “woreda”). See Chap. 22 for help with uploading assets, if needed.

4 Conclusion

In this chapter, we saw how Earth Engine can be used to acquire environmental data to support external analyses, such as forecasting of malaria, a vector-borne disease. An understanding of the biology of the vector (e.g., mosquito, tick) and how different environmental conditions can affect the disease system and transmission risk will help identify environmental variables to investigate for use in mathematical modeling.

In this chapter, we obtained data from three different satellite-based datasets: rainfall from IMERG/GPM, 8-day composite land surface temperature from MODIS, and a spectral index calculated from MODIS bands. We saw how to compute zonal summaries for our locations of interest and download CSV files that are suitable for import into other programs for additional analyses.

This chapter shows the value of cloud computation and generates small downloads for use by professionals who may not have expertise in remote sensing or the computing resources that would otherwise be needed. Finally, we saw that the results of intermediate processing and work outside of Earth Engine can be re-imported for additional analyses within Earth Engine.