Overview

The purpose of this chapter is to review methods of converting between raster and vector data formats, and to understand the circumstances in which this is useful. By way of example, this chapter focuses on topographic elevation and forest cover change in Colombia, but note that these are generic methods that can be applied in a wide variety of situations.

Learning Outcomes
  • Understanding raster and vector data in Earth Engine and their differing properties.

  • Knowing how and why to convert from raster to vector.

  • Knowing how and why to convert from vector to raster.

  • Writing a function and mapping it over a FeatureCollection.

Assumes you know how to

  • Import images and image collections, filter, and visualize (Part I).

  • Understand distinctions among Image, ImageCollection, Feature, and FeatureCollection Earth Engine objects (Part I, Part II, Part V).

  • Perform basic image analysis: select bands, compute indices, and create masks (Part II).

  • Perform image morphological operations (Chap. 10).

  • Understand the filter, map, and reduce paradigm (Chap. 12).

  • Write a function and map it over an ImageCollection (Chap. 12).

  • Use reduceRegions to summarize an image in irregular shapes (Chap. 22).

1 Introduction to Theory

Raster data consist of regularly spaced pixels arranged into rows and columns, familiar as the format of satellite images. Vector data contain geometry features (i.e., points, lines, and polygons) describing locations and areas. Each data format has its advantages, and both will be encountered as part of GIS operations.

Raster data and vector data are commonly combined (e.g., extracting image information for a given location or clipping an image to an area of interest); however, there are also situations in which conversion between the two formats is useful. In making such conversions, it is important to consider the key advantages of each format. Rasters can store data efficiently where each pixel has a numerical value, while vector data can more effectively represent geometric features where homogeneous areas have shared properties. Each format lends itself to distinctive analytical operations, and combining them can be powerful.

In this exercise, we’ll use topographic elevation and forest change images in Colombia as well as a protected area feature collection to practice the conversion between raster and vector formats, and to identify situations in which this is worthwhile.

2 Practicum

2.1 Section 1: Raster to Vector Conversion

2.1.1 Section 1.1: Raster to Polygons

In this section, we will convert an elevation image (raster) to a feature collection (vector). We will start by loading the Global Multi-Resolution Terrain Elevation Data 2010 and the Global Administrative Unit Layers 2015 dataset to focus on Colombia. The elevation image is a raster at 7.5 arc-second spatial resolution containing a continuous measure of elevation in meters in each pixel.

A multi-line script includes the following. Load raster and vector datasets, elevation, feature collection, display elevation image, and add layer are depicted.

When converting an image to a feature collection, we will aggregate the continuous elevation values into a set of categories to create polygon shapes of connected pixels with similar elevations. For this exercise, we will create four zones of elevation by grouping the altitudes to 0–100 m = 0, 100–200 m = 1, 200–500 m = 2, and > 500 m = 3.

A multi-line script includes the following. Initialize the image with zeros and define elevation zones, mask pixels below sea level to retain only land areas, name the band with values 0 to 3 as zone, and elevation zones.
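The zoning rule itself can be illustrated outside Earth Engine. The following is a minimal plain-JavaScript sketch, not Earth Engine API code: the function name classifyZone and the sample elevations are hypothetical, null stands in for a masked pixel, and the handling of values exactly on a zone boundary (e.g., 100 m) is an assumption.

```javascript
// Map an elevation in meters to a zone index (0-3); null marks a masked pixel.
// Assumption: boundary values fall in the lower zone (100 m -> zone 0).
function classifyZone(elevation) {
  if (elevation < 0) return null; // below sea level: masked
  if (elevation > 500) return 3;
  if (elevation > 200) return 2;
  if (elevation > 100) return 1;
  return 0;
}

// Hypothetical elevation values in meters.
const elevations = [-5, 50, 100, 150, 350, 2600];
const zones = elevations.map(classifyZone);
console.log(zones);
```

Checking the thresholds from highest to lowest means each value is assigned the highest zone whose lower bound it exceeds.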

We will convert this zonal elevation image in Colombia to polygon shapes, which is a vector format (termed a FeatureCollection in Earth Engine), using the ee.Image.reduceToVectors method. This will create polygons delineating connected pixels with the same value. In doing so, we will use the same projection and spatial resolution as the image. Please note that loading the vectorized image in the native resolution (232 m) takes time to execute. For faster visualization, we set a coarse scale of 1000 m (Fig. 23.1).

Fig. 23.1
Four maps. The zonal elevation map of Colombia depicts the raster-based elevation and zones, vectorized elevation zones are depicted.

Raster-based elevation (top left) and zones (top right), vectorized elevation zones overlaid on the raster (bottom-left), and vectorized elevation zones only (bottom-right)

A multi-line script includes the following. var projection = elevation, var scale = elevation, elevation vector = zones, print, map add layer, and elevation drawn are depicted.
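To see what "connected pixels with the same value" means in practice, the grouping step can be sketched on a toy grid. This is a plain-JavaScript illustration rather than the Earth Engine implementation: labelRegions and the grid are hypothetical, the sketch only groups pixels (it does not trace polygon outlines), and it assumes that diagonal neighbors count as connected.

```javascript
// Label connected regions of equal value in a small zone grid.
// Assumption: pixels touching diagonally count as connected (8-connectivity).
function labelRegions(grid) {
  const rows = grid.length;
  const cols = grid[0].length;
  const labels = grid.map(row => row.map(() => -1));
  const regions = []; // one entry per region: {zone, pixelCount}
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      if (labels[r][c] !== -1) continue; // already part of a region
      const id = regions.length;
      const zone = grid[r][c];
      let pixelCount = 0;
      const stack = [[r, c]];
      labels[r][c] = id;
      // Flood fill outward from the seed pixel.
      while (stack.length > 0) {
        const [y, x] = stack.pop();
        pixelCount++;
        for (let dy = -1; dy <= 1; dy++) {
          for (let dx = -1; dx <= 1; dx++) {
            const ny = y + dy;
            const nx = x + dx;
            if (ny < 0 || ny >= rows || nx < 0 || nx >= cols) continue;
            if (labels[ny][nx] !== -1 || grid[ny][nx] !== zone) continue;
            labels[ny][nx] = id;
            stack.push([ny, nx]);
          }
        }
      }
      regions.push({zone, pixelCount});
    }
  }
  return {labels, regions};
}

// Hypothetical 4 x 4 grid of elevation zones.
const zoneGrid = [
  [0, 0, 1, 1],
  [0, 2, 1, 1],
  [0, 0, 0, 3],
  [3, 0, 0, 3],
];
const result = labelRegions(zoneGrid);
console.log(result.regions);
```

Each region would become one polygon. Note that the toy grid yields two single-pixel regions, the kind of small polygon discussed next.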

You may have noticed that the polygons have complex outlines and include some small polygons of just one pixel. This happens when a pixel has no neighbors in the same elevation zone. You may not need this level of detail if, for instance, you want to produce a regional or global map. We can use the morphological reducer focalMode to simplify the shapes by replacing each pixel with the most common value in a neighborhood around it. In this example, we will set the kernel radius to four pixels. This operation makes the resulting polygons look much smoother, but less precise (Fig. 23.2).

Fig. 23.2
Two maps depict the resulting polygons with pixels and complex lines, and layers are represented.

Before (left) and after (right) applying focalMode

A multi-line script includes the following. var zones smooth = zones focalMode, minimum 1, maximum 3, palette, opacity 0.7, elevation zones, geometry, Colombia, crs projection, scale, geometry type polygon, color black, and map add a layer.
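The effect of a mode filter can be shown on a toy grid. The sketch below is plain JavaScript, not the Earth Engine focalMode call: the helper modeFilter is hypothetical, it uses a square window, it only considers in-bounds neighbors at the edges, and it resolves ties in favor of the smallest value. All of these are simplifying assumptions.

```javascript
// Replace each pixel with the most common value in its (2r+1) x (2r+1) window.
// Assumptions: square window, edges use only in-bounds neighbors, and ties
// resolve to the smallest value.
function modeFilter(grid, radius) {
  const rows = grid.length;
  const cols = grid[0].length;
  return grid.map((row, r) => row.map((_, c) => {
    const counts = new Map();
    for (let y = Math.max(0, r - radius); y <= Math.min(rows - 1, r + radius); y++) {
      for (let x = Math.max(0, c - radius); x <= Math.min(cols - 1, c + radius); x++) {
        counts.set(grid[y][x], (counts.get(grid[y][x]) || 0) + 1);
      }
    }
    let best = null;
    let bestCount = 0;
    for (const [value, count] of counts) {
      if (count > bestCount || (count === bestCount && value < best)) {
        best = value;
        bestCount = count;
      }
    }
    return best;
  }));
}

// Hypothetical zone grid with one isolated pixel of zone 3.
const noisy = [
  [1, 1, 1, 1],
  [1, 3, 1, 1],
  [1, 1, 1, 2],
  [2, 2, 2, 2],
];
const smoothed = modeFilter(noisy, 1);
console.log(smoothed);
```

The isolated zone-3 pixel is absorbed by its neighbors, which is why the vectorized output contains fewer single-pixel polygons after smoothing.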

We can see now that the polygons have more distinct shapes with many fewer small polygons in the new map (Fig. 23.2). It is important to note that when you use methods like focalMode (or other, similar methods such as connectedComponents and connectedPixelCount), you need to reproject according to the original image in order to display properly with zoom using the interactive Code Editor.

2.1.2 Section 1.2: Raster to Points

Next, we will convert a small part of this elevation image into a point vector dataset. For this exercise, we will use the same example and build on the code from the previous subsection. This might be useful when you want to use geospatial data in a tabular format in combination with other conventional datasets such as economic indicators (Fig. 23.3).

Fig. 23.3
An illustration and table depict the elevation points. System index, elevation, latitude, longitude, and g e o values are illustrated.

Elevation point values with latitude and longitude

The easiest way to do this is to use sample with the geometries parameter set to true. This extracts a point at the centroid of each sampled elevation pixel.

10 lines of pseudo-code presents the parameters. It includes the following. To zoom into the area uncomment and run below, among others.
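The idea of placing one point at each pixel centroid can be sketched without Earth Engine. In this plain-JavaScript illustration, the function rasterToPoints, the grid origin, the pixel size, and the elevation values are all hypothetical; the output mimics the table of elevation, latitude, and longitude shown in Fig. 23.3.

```javascript
// Turn a small raster into point records located at pixel centers.
// Hypothetical grid definition: (originLon, originLat) is the top-left corner
// and pixelSize is the pixel width and height in degrees.
function rasterToPoints(values, originLon, originLat, pixelSize) {
  const points = [];
  values.forEach((row, r) => {
    row.forEach((elevation, c) => {
      points.push({
        elevation,
        longitude: originLon + (c + 0.5) * pixelSize,
        latitude: originLat - (r + 0.5) * pixelSize,
      });
    });
  });
  return points;
}

// Hypothetical 2 x 2 patch of elevation values in meters.
const patch = [
  [210, 215],
  [198, 202],
];
const points = rasterToPoints(patch, -75.0, 2.0, 0.5);
console.log(points);
```

The half-pixel offset places each point at the center of its pixel rather than at a corner.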

We can also extract sample points per elevation zone. Below is an example of extracting 10 randomly selected points per elevation zone (Fig. 23.4). You can also set different values for each zone using classValues and classPoints parameters to modify the sampling intensity in each class. This may be useful, for instance, to generate point samples for a validation effort.

Fig. 23.4
An illustration depicts sample points per elevation zone. The four layers of elevation zones with selected points are illustrated.

Stratified random sampling over different elevation zones

A multi-line script includes the following. v a r elevation samples stratified = zones, number points 10, region geometry, scale, projection, geometries, and map add layer.
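The logic of stratified random sampling can be sketched as follows. This is a plain-JavaScript illustration, not the Earth Engine method: stratifiedSample, the seeded generator, and the toy grid are hypothetical, and a zone with fewer pixels than requested simply contributes all of its pixels.

```javascript
// Small seeded pseudo-random generator so that runs are repeatable.
function seededRandom(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pick up to pointsPerZone random pixels from each zone of a grid.
function stratifiedSample(grid, pointsPerZone, seed) {
  const random = seededRandom(seed);
  const byZone = new Map();
  grid.forEach((row, r) => row.forEach((zone, c) => {
    if (zone === null) return; // skip masked pixels
    if (!byZone.has(zone)) byZone.set(zone, []);
    byZone.get(zone).push({zone, row: r, col: c});
  }));
  const samples = [];
  for (const pixels of byZone.values()) {
    // Partial Fisher-Yates shuffle: the first k entries become the sample.
    const k = Math.min(pointsPerZone, pixels.length);
    for (let i = 0; i < k; i++) {
      const j = i + Math.floor(random() * (pixels.length - i));
      [pixels[i], pixels[j]] = [pixels[j], pixels[i]];
      samples.push(pixels[i]);
    }
  }
  return samples;
}

// Hypothetical zone grid; null marks a masked pixel.
const zoneGrid = [
  [0, 0, 1, 1],
  [0, 1, 1, 2],
  [null, 2, 2, 3],
];
const samples = stratifiedSample(zoneGrid, 2, 42);
console.log(samples);
```

Sampling per zone guarantees that small zones are represented, which simple random sampling over the whole region would not.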

Code Checkpoint F51a. The book’s repository contains a script that shows what your code should look like at this point.

2.1.3 Section 1.3: A More Complex Example

In this section, we will use two global datasets, one to represent raster formats and the other vectors:

  • The Global Forest Change (GFC) dataset: a raster dataset describing global tree cover and change for 2001–present.

  • The World Database on Protected Areas (WDPA): a vector database of global protected areas.

The objective will be to combine these two datasets to quantify rates of deforestation in protected areas in the ‘arc of deforestation’ of the Colombian Amazon. The datasets can be loaded into Earth Engine with the following code:

A multi-line script includes the following. Read input data, consider searching the data catalog for newer versions, print assets to show available layers and properties, and show the first 10 records.

The GFC dataset (first presented in detail in Chap. 2) is a global set of rasters that quantify tree cover and change for the period beginning in 2001. We’ll use a single band from this dataset:

  • ‘lossyear’: a categorical raster of forest loss (1–20, corresponding to deforestation for the period 2001–2020), and 0 for no change

The World Database on Protected Areas (WDPA) is a harmonized dataset of global terrestrial and marine protected area locations, along with details on the classification and management of each. In addition to protected area outlines, we’ll use two fields from this database:

  • ‘NAME’: the name of each protected area

  • ‘WDPA_PID’: a unique numerical ID for each protected area

To begin with, we’ll focus on forest change dynamics in ‘La Paya’, a small protected area in the Colombian Amazon. We’ll first visualize these data using the paint command, which is discussed in more detail in Chap. 25. This will display the boundary of the La Paya protected area and deforestation in the region (Fig. 23.5).

A multi-line script includes the following. Display deforestation, map, add a layer, minimum 1, maximum 20, palette, display W D P A data, display protected area as an outline, add layer, palette white, and setup map display.

Fig. 23.5
A map depicts the outline of a protected area in the Colombian Amazon, and the recent changes are also depicted.

View of the La Paya protected area in the Colombian Amazon (in white) and deforestation over the period 2001–2020 (in yellows and reds, with darker colors indicating more recent changes)

We can use Earth Engine to convert the deforestation raster to a set of polygons. The deforestation data are appropriate for this transformation as each deforestation event is labeled categorically by year, and change events are spatially contiguous. This is performed in Earth Engine using the ee.Image.reduceToVectors method, as described earlier in this section. Figure 23.6 shows a comparison of the raster versus vector representations of deforestation within the protected area.

A multi-line script includes the following. Convert from a deforestation raster to vector, label polygons with a change year, count the number of individual change events, and display deforestation polygons and deforestation vector.

Fig. 23.6
Two maps depict the comparison of the raster versus vector representations of deforestation within the protected area. The protected area outline is marked.

Raster (left) versus vector (right) representations of deforestation data of the La Paya protected area

Having converted from raster to vector, a new set of operations becomes available for post-processing the deforestation data. We might, for instance, be interested in the number of individual change events each year (Fig. 23.7):

Fig. 23.7
A bar graph of the number of deforestation events versus year. It denotes a high at (20, 950), and a low at (6, 70). The values are approximate.

Plot of the number of deforestation events in La Paya for the years 2001–2020

A multi-line script includes the following. Chart features, property loss year, title year, number of deforestation events, position none, and print chart are depicted.
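The tally behind such a chart is a count of features grouped by their loss year. A minimal plain-JavaScript sketch follows; the helper countByProperty and the feature records are hypothetical and stand in for the deforestation polygons.

```javascript
// Count features per value of a property, as a chart of events per year would.
function countByProperty(features, property) {
  const counts = {};
  for (const feature of features) {
    const key = feature.properties[property];
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Hypothetical deforestation polygons; lossyear 1 corresponds to 2001.
const events = [
  {properties: {lossyear: 1}},
  {properties: {lossyear: 20}},
  {properties: {lossyear: 20}},
  {properties: {lossyear: 6}},
  {properties: {lossyear: 20}},
];
const eventsPerYear = countByProperty(events, 'lossyear');
console.log(eventsPerYear);
```

This count is only meaningful after vectorization, because each polygon represents one spatially contiguous change event rather than one pixel.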

There might also be interest in generating point locations for individual change events (e.g., to aid a field campaign):

A multi-line script includes the following. Generate deforestation point locations, vector, return feat centroid, map, add layer, and color dark blue.

The vector format makes it easy to filter for deforestation events of interest, such as only the largest ones:

A multi-line script includes the following. Add a new property to the deforestation feature collection, describing the area of the changed polygon, return feat, divide, filter the deforestation feature collection for only large-scale changes, display forestation area outline by year, color loss year, width 1, palette yellow, orange, red, minimum 1, and maximum 20.
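The two steps, attaching an area property and then filtering on it, can be sketched in plain JavaScript. This is an illustration under stated assumptions, not the chapter's script: the polygons are hypothetical, their coordinates are assumed to be in a projected system with units of meters, the planar shoelace formula stands in for a geodesic area calculation, and the 10 ha threshold is an illustrative choice.

```javascript
// Planar polygon area from a ring of [x, y] vertices in meters (shoelace formula).
// Assumption: coordinates are already in a projected, metric coordinate system.
function ringAreaSquareMeters(ring) {
  let twiceArea = 0;
  for (let i = 0; i < ring.length; i++) {
    const [x1, y1] = ring[i];
    const [x2, y2] = ring[(i + 1) % ring.length];
    twiceArea += x1 * y2 - x2 * y1;
  }
  return Math.abs(twiceArea) / 2;
}

// Hypothetical change polygons.
const polygons = [
  {lossyear: 5, ring: [[0, 0], [500, 0], [500, 400], [0, 400]]},
  {lossyear: 12, ring: [[0, 0], [30, 0], [30, 30], [0, 30]]},
  {lossyear: 19, ring: [[0, 0], [600, 0], [0, 500]]},
];

// Add an area property in hectares (1 ha = 10,000 square meters).
const withArea = polygons.map(p =>
  Object.assign({}, p, {areaHa: ringAreaSquareMeters(p.ring) / 10000}));

// Keep only events above a hypothetical 10 ha threshold.
const largeEvents = withArea.filter(p => p.areaHa > 10);
console.log(largeEvents.map(p => [p.lossyear, p.areaHa]));
```

Storing the area as a property first keeps the filter itself simple and lets the same property be reused for styling or summary statistics.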

Code Checkpoint F51b. The book’s repository contains a script that shows what your code should look like at this point.

2.1.4 Section 1.4: Raster Properties to Vector Fields

Sometimes we want to extract information from a raster to be included in an existing vector dataset. An example might be estimating a deforestation rate for a set of protected areas. Rather than performing this task on a case-by-case basis, we can attach information generated from an image as a property of a feature.

The following script shows how this can be used to quantify a deforestation rate for a set of protected areas in the Colombian Amazon.

A multi-line script includes the following. Load required datasets, display deforestation, add a layer, select protected areas in the Colombian Amazon, display protected areas as an outline, set up a map display, and note the new deforestation area property.

The output of this script is an estimate of deforested area in hectares for each reserve. However, as reserves vary substantially in size, we can normalize by the total area of each reserve to quantify rates of change.

A multi-line script includes the following. Normalize by area, function, and print to identify rates of change per protected area, which has the fastest rate of loss? and selectors name deforestation rate.
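The normalization is a simple ratio, sketched here in plain JavaScript. The reserve names and numbers are made up for illustration, and expressing the rate as a percentage of total reserve area over the whole period is an assumption about the chosen units.

```javascript
// Express deforested area as a percentage of each reserve's total area.
// The names and numbers below are made up for illustration only.
const reserves = [
  {NAME: 'Reserve A', deforestationHa: 1200, areaHa: 400000},
  {NAME: 'Reserve B', deforestationHa: 900, areaHa: 60000},
];

const withRates = reserves.map(r =>
  Object.assign({}, r, {deforestationRate: 100 * r.deforestationHa / r.areaHa}));

// Identify the reserve with the fastest rate of loss.
const fastest = withRates.reduce((a, b) =>
  (b.deforestationRate > a.deforestationRate ? b : a));
console.log(fastest.NAME, fastest.deforestationRate);
```

In this made-up case the reserve with the smaller absolute loss has the higher rate, which is exactly what normalization is meant to reveal.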

Code Checkpoint F51c. The book’s repository contains a script that shows what your code should look like at this point.

2.2 Section 2: Vector-To-Raster Conversion

In Sect. 23.2.1, we used the protected area feature collection in its original vector format. In this section, we will rasterize the protected area polygons to produce a mask and use this to assess rates of forest change.

2.2.1 Section 2.1: Polygons to a Mask

The most common operation to convert from vector to raster is the production of binary image masks, describing whether a pixel intersects a line or falls within a polygon. To convert from vector to a raster mask, we can use the ee.FeatureCollection.reduceToImage method. Let’s continue with our example of the WDPA database and Global Forest Change data from the previous section:

A multi-line script includes the following. Load required datasets, get deforestation, generate a new property called protected to apply to the output mask, rasterize using new property, unmask sets areas outside protected area polygons to 0, center on Colombia, and display on the map.

We can use this mask to, for example, highlight only deforestation that occurs within a protected area using logical operations:

A multi-line script includes the following. Set the deforestation layer to 0 where outside a protected area, update the mask to hide where deforestation layer = 0, display deforestation in protected areas, minimum 1, maximum 20, palette yellow, orange, red, and deforestation protected.
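The two logical steps, zeroing loss outside protected areas and then masking zeros, can be illustrated on toy grids. This is a plain-JavaScript sketch rather than Earth Engine code; both grids are hypothetical and null stands in for a masked pixel.

```javascript
// lossYear: 0 = no loss, 1-20 = year of loss.
// protectedMask: 1 inside a protected area, 0 outside.
const lossYear = [
  [0, 3, 0],
  [12, 0, 20],
];
const protectedMask = [
  [1, 1, 0],
  [0, 1, 1],
];

// Step 1: set loss to 0 outside protected areas.
// Step 2: mask (here: null) every pixel whose value is 0.
const lossInProtected = lossYear.map((row, r) => row.map((year, c) => {
  const value = protectedMask[r][c] === 1 ? year : 0;
  return value === 0 ? null : value;
}));
console.log(lossInProtected);
```

Only pixels that are both deforested and protected keep a value, so the displayed layer shows loss years inside protected areas and nothing elsewhere.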

In the above example, we generated a simple binary mask, but reduceToImage can also preserve a numerical property of the input polygons. For example, we might want to be able to determine which protected area each pixel represents. In this case, we can produce an image with the unique ID of each protected area:

A multi-line script includes the following. Produce an image with a unique I D of protected areas, map add a layer, minimum 1, maximum 100000, and protected area I D.

This output can be useful when performing large-scale raster operations, such as efficiently calculating deforestation rates for multiple protected areas.
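One way such an ID image supports large-scale summaries is a single pass over two aligned rasters, accumulating loss per ID. The plain-JavaScript sketch below illustrates the idea; lossAreaById, the IDs, and the grids are hypothetical, and a constant pixel area is assumed for simplicity.

```javascript
// Sum deforested area per protected-area ID using two aligned rasters.
// Assumption: every pixel covers the same area (900 square meters here).
function lossAreaById(idRaster, lossRaster, pixelAreaM2) {
  const totalsHa = {};
  idRaster.forEach((row, r) => row.forEach((id, c) => {
    if (id === 0) return; // 0 = outside any protected area
    if (!(id in totalsHa)) totalsHa[id] = 0;
    if (lossRaster[r][c] > 0) totalsHa[id] += pixelAreaM2 / 10000;
  }));
  return totalsHa;
}

// Hypothetical ID raster (0 = unprotected) and loss-year raster.
const ids = [
  [101, 101, 0],
  [101, 202, 202],
];
const loss = [
  [4, 0, 9],
  [7, 0, 15],
];
const totals = lossAreaById(ids, loss, 900);
console.log(totals);
```

Because every pixel is visited once regardless of how many protected areas there are, this approach scales better than clipping the image to each polygon in turn.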

Code Checkpoint F51d. The book’s repository contains a script that shows what your code should look like at this point.

2.2.2 Section 2.2: A More Complex Example

The reduceToImage method is not the only way to convert a feature collection to an image. We will create a distance image layer from the boundary of the protected area using distance. For this example, we return to the La Paya protected area explored in Sect. 23.2.1.

A multi-line script includes the following. Load required datasets, select a single protected area, maximum distance in meters is set in the brackets, map add a layer, minimum 0, maximum 20000, palette white, grey, black, opacity 0.6, and distance.
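Conceptually, a distance layer stores, for each pixel, the distance to the nearest part of a feature, up to a maximum search distance. The following plain-JavaScript sketch illustrates this on a toy grid. It is not the Earth Engine implementation: distanceRaster is a hypothetical helper, the feature is represented by a rasterized mask, the search is brute force, and pixels beyond the cutoff are set to null.

```javascript
// Distance (in meters) from every pixel center to the nearest pixel of a feature.
// Assumptions: brute-force search, square pixels, and a cutoff beyond which
// pixels are left as null (playing the role of a maximum search distance).
function distanceRaster(featureMask, pixelSize, maxDistance) {
  const targets = [];
  featureMask.forEach((row, r) => row.forEach((v, c) => {
    if (v === 1) targets.push([r, c]);
  }));
  return featureMask.map((row, r) => row.map((_, c) => {
    let best = Infinity;
    for (const [tr, tc] of targets) {
      best = Math.min(best, Math.hypot(r - tr, c - tc) * pixelSize);
    }
    return best <= maxDistance ? best : null;
  }));
}

// Hypothetical feature occupying a 2 x 2 block; pixels are 1000 m wide.
const featureMask = [
  [0, 0, 0, 0, 0],
  [0, 1, 1, 0, 0],
  [0, 1, 1, 0, 0],
  [0, 0, 0, 0, 0],
];
const distance = distanceRaster(featureMask, 1000, 2000);
console.log(distance);
```

Pixels that belong to the feature have a distance of 0, and the value grows with each step away from it until the cutoff is reached.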

We can also show the distance inside and outside of the boundary by using the rasterized protected area (Fig. 23.8).

Fig. 23.8
Three illustrations depict the distance inside and outside of the boundary by using the rasterized protected area. The distance from the La Paya boundary is on the left, the distance within the La Paya in the center, and the distance outside the La Paya is on the right.

Distance from the La Paya boundary (left), distance within the La Paya (middle), and distance outside the La Paya (right)

A multi-line script includes the following. Produce a raster of inside or outside the protected area, distance inside the protected area, and distance outside the protected area.

Sometimes it makes sense to treat features as objects within raster imagery, performing vector-like operations with raster data. This is unusual, but there is good reason for it where the vector equivalent would be computationally burdensome.

An example of this is estimating deforestation rates by distance to the edge of the protected area, as it is common that rates of change will be higher at the boundary of a protected area. We will create a distance raster with three zones from the La Paya boundary (< 1 km, < 3 km, and < 5 km) and estimate the deforestation by distance from the boundary (Fig. 23.9).

Fig. 23.9
Four illustrations of distance raster with three zones from the La Paya boundary less than 1 kilometer, less than 3 kilometers, and less than 5 kilometers, to estimate the deforestation by distance from the boundary.

Distance zones (top left) and deforestation by zone (<1 km, < 3 km, and < 5 km)

A multi-line script includes the following. Distance zones, deforestation, 1, 3, and 5, kilometers, update mask, add layer, deforestation within a 1-kilometer buffer, minimum 0, maximum 1, and opacity 0.5.
A multi-line script includes the following. Deforestation within a 3 kilometers buffer, map add a layer, minimum 0, maximum 1, opacity 0.5, and deforestation within a 5-kilometer buffer.

Lastly, we can estimate the deforestation area within 1 km of the protected area but only outside of the boundary.

A multi-line script includes the following. Get the value of each pixel in square meters, and divide by 10000 to convert to hectares, we need to set a larger geometry than the protected area, for the geometry parameter to reduce region and deforestation within a 1-kilometer buffer outside the protected area.
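The calculation combines three conditions per pixel: it lies outside the protected area, it is within 1 km of the boundary, and it was deforested. A plain-JavaScript sketch on toy grids follows. All values are made up, the three rasters are assumed to be aligned, and a constant pixel area is assumed, whereas real pixel areas vary with location.

```javascript
// Deforested area (ha) within a buffer distance of the boundary, outside only.
// All three rasters are assumed to be aligned; the values are made up.
const distanceM = [
  [1500, 800, 400],
  [900, 0, 0],
  [600, 0, 0],
];
const inside = [
  [0, 0, 0],
  [0, 1, 1],
  [0, 1, 1],
];
const lossYear = [
  [7, 0, 15],
  [2, 11, 0],
  [0, 0, 19],
];
const pixelAreaM2 = 900; // hypothetical 30 m x 30 m pixels
const bufferM = 1000;

let bufferLossHa = 0;
for (let r = 0; r < lossYear.length; r++) {
  for (let c = 0; c < lossYear[r].length; c++) {
    const inBuffer = inside[r][c] === 0 && distanceM[r][c] < bufferM;
    // Convert square meters to hectares (1 ha = 10,000 square meters).
    if (inBuffer && lossYear[r][c] > 0) bufferLossHa += pixelAreaM2 / 10000;
  }
}
console.log(bufferLossHa);
```

Loss inside the reserve and loss beyond the buffer are both excluded, so only the two deforested pixels in the outer 1 km ring contribute to the total.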

Code Checkpoint F51e. The book’s repository contains a script that shows what your code should look like at this point.

3 Synthesis

Question 1. In this lab, we quantified rates of deforestation in La Paya. There is another protected area in the Colombian Amazon named Tinigua. By modifying the existing scripts, determine how the dynamics of forest change in Tinigua compare to those in La Paya with respect to:

  • the number of deforestation events;

  • the year with the greatest number of change events;

  • the mean area of change events;

  • the total area of loss.

Question 2. In Sect. 23.2.1.4, we only considered losses of tree cover, but many protected areas will also have increases in tree cover from regrowth (which is typical of shifting agriculture). Calculate growth in hectares using the Global Forest Change dataset’s gain layer for the six protected areas in Sect. 23.2.1.4 by extracting the raster properties and adding them to vector fields. Which has the greatest area of regrowth? Is this likely to be sufficient to balance out the rates of forest loss? Note: The gain layer shows locations where tree cover has increased for the period 2001–2012 (0 = no gain, 1 = tree cover increase), so for comparability use deforestation between the same time period of 2001–2012.

Question 3. In Sect. 23.2.2.2, we considered rates of deforestation in a buffer zone around La Paya. Estimate the deforestation rates inside of La Paya using buffer zones. Is forest loss more common close to the boundary of the reserve?

Question 4. Sometimes it’s advantageous to perform processing using raster operations, particularly at large scales. It is possible to perform many of the tasks in Sects. 23.2.1.3 and 23.2.1.4 by first converting the protected area vector to raster and then using only raster operations. As an example, can you display only deforestation events > 10 ha in La Paya using only raster data? (Hint: Consider using ee.Image.connectedPixelCount. You may also want to look at Sect. 23.2.2.1).

4 Conclusion

In this chapter, you learned how to convert raster to vector and vice versa. More importantly, you now have a better understanding of why and when such conversions are useful. The examples should give you practical applications and ideas for using these techniques.