Diver surveys for training data
The field component of the KSLOF-GRE was conducted between 2006 and 2015, and Table 1 provides an overview of the quantity of data acquired by country visited. For each of the 1000 individual reefs visited, the benthic cover of major functional groups and substrate type was assessed along 10 m transects using a combination of diver-recorded observations, point-intercept counts, and photographic assessments. A minimum of four transects was completed at each dive site, and surveys were conducted at 25, 20, 15, 10, and 5 m water depths. Via these methods, the following parameters were quantified: corals identified to genus, other sessile invertebrates identified to phylum or class, and six functional groups of algae. Reef fish surveys were also conducted at each dive site, at depths stratified between 5 and 20 m, via visual census as described by English et al. (1997). For more detail, the reader is directed to the Foundation’s Field and Final Country Reports, which are available online (www.livingoceansfoundation.org/publications/final-reports/—accessed 03/11/2019). These reports contain exhaustive lists of all sites and survey protocols.
Diver-collected data were used to aid in the definition of map classes, as described in “Definition of habitat map classes” section. In addition, the dominant habitat type was extracted from each dive site to serve as labeling (training) data for satellite mapping, as described in “Level 4 biological cover and Level 5 habitat maps” section. A total of 1240 dive sites across the KSLOF-GRE were treated in this way. This number of dives equates to approximately 15,000 hours of underwater data collection achieved by the > 200 scientists involved in the expedition.
Small-vessel surveys for training and validation data
A small vessel was used to collect several datasets at each of the ~ 1000 visited reef sites. A total of 30 million tide-corrected single-beam sonar soundings were acquired throughout the KSLOF-GRE. These measurements were used to create the bathymetry maps (“Satellite-derived bathymetry maps” section). Additional ground-truth data were collected in the form of 2000 surficial sediment samples and 150 linear km of low-fold subbottom geophysical profiles obtained with a 5 kHz SyQwest Stratabox subbottom profiler, with protocols for each detailed by Purkis et al. (2014). These datasets were used in conjunction with the diver data just described to help define habitat classes and segment geomorphological structures (“Definition of habitat map classes” and “Development of habitat maps” sections).
A total of 11,000 seabed videos were captured across all sites via a tethered SeaViewer ‘drop’ camera integrated with a differential global positioning system (dGPS). This video system allowed seabed observations to be obtained from the intertidal to approximately 50 m water depth at a frequency far exceeding that achievable via SCUBA. The drop camera videos were analyzed in the laboratory and used for map validation (“Accuracy assessment” section).
WorldView-2 satellite imagery
The KSLOF-GRE employed the DigitalGlobe Inc. WorldView-2 (WV2) satellite to image each visited reef site. The instrument images in eight multispectral bands with pixel widths of 1.85 m for images acquired at look angles < 20° off-nadir, coarsening to 2.07 m for look angles exceeding 20°. Pixel brightness values are digitally encoded with 11-bit radiometric resolution. WV2 is particularly adept at imaging the shallow seabed since five of the eight spectral bands are of sufficiently short wavelength to have meaningful penetration in water—these five are the coastal blue band (400–450 nm), blue (450–510 nm), green (510–580 nm), yellow (585–625 nm), and red (630–690 nm). Experience across the KSLOF-GRE suggested that under ideal conditions, the seabed could routinely be imaged for habitat mapping down to water depths of 25 m.
The tropics are often cloudy and therefore challenging to image. To address this difficulty, at least eight months prior to each of the 15 field missions, the WV2 was tasked to acquire imagery at look angles < 15° off-nadir to minimize sun glint. One month prior to each mission, all acquired imagery was purchased from DigitalGlobe Inc. and assembled to support mission planning and subsequent fieldwork. If insufficient cloud-free data had been obtained for mapping a given country, the sensor was tasked for an additional two months post-cruise to fill areas that remained stubbornly cloud contaminated. In this way, the majority of imagery was acquired within four months of fieldwork, but with a maximum differential of eight months. For large sites, such as the 6000 sq. km Cay Sal Bank (Bahamas—Fig. 1e), up to 50 individual WV2 acquisitions were assembled to deliver an image mosaic with < 3% cloud cover, the threshold deemed the maximum tolerable for mapping. In many cases, cloud cover was further reduced by replacing individual cloud-contaminated areas with a portion of a cloud-free acquisition from an alternative date and equivalent tidal state, a process termed ‘cloud patching.’ Adjacent image scenes were selected to have a similar tidal state and equivalent water clarity.
Prior to mosaicking the individual scenes, each was processed to units of above-water remote sensing reflectance, a step encompassing radiometric, solar-geometry, and atmospheric corrections, as described in detail by Kerr and Purkis (2018), and corrected for sun glint following Hedley et al. (2005). At this point, the processed satellite scenes were stitched into a mosaic using the image-processing software ENVI (v. 5.4, Harris Geospatial Inc.), emergent areas were identified using a threshold in the 860–1040 nm spectral band, and deep-water areas were identified as those having < 5% reflectance in the 450–510 nm band (Fig. 2a). The remainder of the imagery was considered as potentially containing shallow-water habitat, defined as < 25 m water depth, and was passed forward to the mapping workflow (Fig. 2b–d).
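The masking logic just described can be sketched as follows. The band indices and the 10% near-infrared threshold are illustrative assumptions; only the < 5% blue-band deep-water criterion comes from the text.

```python
import numpy as np

def mask_scene(refl, nir_band=7, blue_band=1,
               nir_emergent_thresh=0.10, blue_deep_thresh=0.05):
    """Partition a reflectance cube (bands, rows, cols) into emergent land,
    optically deep water, and candidate shallow-water pixels.

    The NIR threshold is a placeholder; the study thresholded the
    860-1040 nm band for land and used < 5% reflectance in the
    450-510 nm (blue) band for deep water.
    """
    land = refl[nir_band] > nir_emergent_thresh       # water absorbs NIR strongly
    deep = (~land) & (refl[blue_band] < blue_deep_thresh)
    shallow = ~(land | deep)                          # passed to the mapping workflow
    return land, deep, shallow
```

Pixels in the `shallow` mask are the only ones carried forward into the bathymetry and habitat-mapping steps.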
Satellite-derived bathymetry maps
Bathymetry maps were derived for all the KSLOF-GRE sites via spectral derivation of water depth from WV2 satellite imagery (workflow detailed in Fig. 2b). These products served as stand-alone data layers, but were also utilized in the habitat-mapping workflow to partition each reef site into zones, which in turn were populated with a zone-specific suite of habitat classes. Stumpf et al. (2003) offer the most widely adopted empirical algorithm for extracting bathymetry from multispectral imagery. This solution uses a ratio of reflectance from two spectral bands which is tuned against known water depths to yield a bathymetry map. Motivated by the fact that this method does not exploit all five water-penetrating bands of WV2 and its successors, Kerr and Purkis (2018) evolved the algorithm via multi-linear regression of five bands, a solution which provided enhanced estimates of water depth. Their algorithm allowed viable bathymetric models to be derived even in cases where ground truth via sonar was limited, and, under ideal conditions, even absent. Mapping of water depth for the KSLOF-GRE sites followed the Kerr and Purkis (2018) methodology and was calibrated by the sonar soundings described in “Small-vessel surveys for training and validation data” section. Bathymetry maps were masked below the 25-m-depth contour, as derived from sonar soundings.
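The two approaches can be contrasted in a short sketch. The band-ratio form and the multi-band least-squares fit below are simplified stand-ins for the published algorithms of Stumpf et al. (2003) and Kerr and Purkis (2018), calibrated against sonar soundings; the coefficient names and the scaling constant `n` are illustrative.

```python
import numpy as np

def stumpf_depth(blue, green, m0, m1, n=1000.0):
    """Band-ratio bathymetry in the style of Stumpf et al. (2003): depth is
    linear in the ratio of log-transformed reflectances, with m0 and m1
    tuned against known water depths."""
    return m1 * (np.log(n * blue) / np.log(n * green)) - m0

def fit_multiband(refl_bands, sonar_depths, n=1000.0):
    """Multi-linear extension in the spirit of Kerr and Purkis (2018):
    regress known depths on log reflectance of all five water-penetrating
    bands. refl_bands: (n_points, 5); sonar_depths: (n_points,)."""
    X = np.log(n * refl_bands)
    X = np.column_stack([X, np.ones(len(X))])  # add intercept term
    coeffs, *_ = np.linalg.lstsq(X, sonar_depths, rcond=None)
    return coeffs  # five band weights plus intercept

def predict_depth(refl_bands, coeffs, n=1000.0):
    """Apply fitted coefficients to new pixels to yield a bathymetry map."""
    X = np.log(n * refl_bands)
    X = np.column_stack([X, np.ones(len(X))])
    return X @ coeffs
```

In practice the fitted model is applied per pixel across the mosaic and the result masked below the 25 m contour, as described above.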
Definition of habitat map classes
The KSLOF mapping endeavor built forward from two noteworthy regional-scale programs: the Millennium Coral Reef-Mapping Project (Andréfouët et al. 2006) and the NOAA Biogeography Reef-Mapping Program (Monaco et al. 2012). Although our habitat map classes differed from these predecessors, we adopted a hierarchical scheme which allows for cross-comparison (as also done by Roelfsema et al. 2018). The Landsat-derived maps of Andréfouët et al. (2006) delineated reef geomorphology rather than habitat, though habitat was often implied. For instance, the class ‘fore reef’ in the Andréfouët et al. (2006) scheme describes location within the benthic system but, importantly, does not address substrate or cover type at that location. A fore reef environment can reasonably be anticipated to be coral-dominated, however. The NOAA effort (Monaco et al. 2012) also captured geomorphology, termed ‘structure’ in their nomenclature, but developed two additional map layers, ‘biological cover’ and ‘geographic zone.’ The former described dominant biota (e.g., live coral, seagrass, etc.), whereas the latter referred to the location of the benthic community within the system (e.g., reef crest, back reef, etc.). Unlike NOAA, the KSLOF-GRE products do not provide three map layers for each area, but the classification scheme was hierarchically arranged such that geomorphological structure, geographic zone, and biological cover can be separated if required. As described in “Development of habitat maps” section, this cross-compatibility is implicit to the way that the maps are created; a bathymetric map was initially interpreted into geographic zones (termed the ‘Level 1’ output), which was subsequently populated with increasing detail of geomorphological structure (Levels 2 and 3), before addition of biological cover recorded in situ (Level 4), to produce a final homologated Level 5 ‘habitat’ map in which zone, structure, and cover are aggregated.
The combination of reef zone, geomorphological structure, and biotic cover resulted in 36 habitat classes used across the Red Sea, Pacific, and Indian Oceans (Table 2). In the Atlantic, the same scheme was used, but not all combinations of zone, geomorphology, and cover were found in this ocean basin; only 25 of the classes were represented in the Atlantic maps. For example, no distinction was made between ‘lagoon’ and ‘back reef’ at the Atlantic sites visited by the GRE. The description of these classes should make intuitive sense based on their zone, structure, and cover (Table 2), but there are also lengthy descriptions and example photographs for each class in the field reports previously published by KSLOF (see, for example, Bruckner et al. 2016).
Development of habitat maps
The KSLOF-GRE used eCognition software (v. 5.2, Trimble Inc.) to segment the WV2 imagery into polygons that were then labeled by zone, structure, and ultimately habitat class. In contrast to pixel-based classifiers, which assign image pixels to map classes based on their spectral content (Purkis and Klemas 2011), eCognition follows an object-based approach (Knudby et al. 2011; Phinn et al. 2012; Purkis et al. 2012a, b, 2014; Roelfsema et al. 2013, 2014, 2018; Zhang et al. 2013; Warren et al. 2016). In a workflow termed ‘hierarchical classification,’ edge-detection routines are used to segment imagery into eCognition ‘objects,’ which are precincts of the image set with similar spectral and/or textural attributes. These objects are subsequently assigned into one of several map classes based on rules which consider spectral/textural signatures, shape, and contextual relationships with surrounding classes.
Whereas recent progress has been made to automate the assignment of objects to map classes, such as by Saul and Purkis (2015) using multinomial logistic discrete choice models, we found the accuracy of the automated assignments to be consistently lower than that delivered manually by an expert user. For this reason, we elected to use manual assignment of eCognition objects to map classes in our workflow for the production of the KSLOF-GRE habitat maps (Fig. 2). The workflow required four steps to handle preprocessing of the satellite imagery, derivation of a bathymetry map, development of a habitat map, and accuracy assessment (Fig. 2a–d, respectively). This section deals solely with developing the habitat map (Fig. 2c); the other three steps are described in their corresponding sections.
Level 1 zone map
A Level 1 zone map for each site was created using eCognition by applying a multi-resolution segmentation algorithm to the bathymetry map. Because this algorithm was described in detail by Baatz and Schäpe (2000), it is treated only briefly here. The general concept of multi-scale image segmentation is to subdivide an image set into objects with spectral and/or textural homogeneity. The solution proposed by Baatz and Schäpe (2000) considers this task an optimization problem. In the first step, every image pixel is considered a separate image object. Each object is then visited iteratively and merged with its neighbors to form larger (multi-pixel) objects. With each iteration, the merging decision is based on local homogeneity criteria describing the similarity of adjacent image objects. In a process similar to the annealing function described by Purkis et al. (2012b), a cost function is tracked as each merge is conducted and objects cease to be further amalgamated at the point that the function ceases to reduce.
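The merge-until-cost-rises idea can be illustrated with a minimal one-dimensional sketch. This is not the eCognition implementation; the sum-of-squared-deviation criterion below is a simple stand-in for the full spectral and shape heterogeneity measure of Baatz and Schäpe (2000), and the `scale` stopping parameter is an illustrative analogue of their scale parameter.

```python
import numpy as np

def merge_cost(a, b):
    """Increase in total squared deviation when two segments merge --
    a simple homogeneity criterion standing in for the heterogeneity
    measure used by multi-resolution segmentation."""
    sse = lambda x: np.sum((x - x.mean()) ** 2)
    return sse(np.concatenate([a, b])) - sse(a) - sse(b)

def segment_1d(values, scale=1.0):
    """Greedy region merging on a 1-D profile: every pixel starts as its
    own object; adjacent objects merge cheapest-first until the cheapest
    remaining merge would exceed the scale parameter."""
    segments = [np.array([v], dtype=float) for v in values]
    while len(segments) > 1:
        costs = [merge_cost(segments[i], segments[i + 1])
                 for i in range(len(segments) - 1)]
        i = int(np.argmin(costs))
        if costs[i] > scale:
            break  # further merging would degrade homogeneity too much
        segments[i] = np.concatenate([segments[i], segments[i + 1]])
        del segments[i + 1]
    return segments
```

Run on a depth profile with two plateaus, the routine recovers the two homogeneous objects, mirroring how the bathymetry map is carved into depth-coherent zones.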
To create the Level 1 map, the multi-resolution segmentation was deployed on the bathymetry map, which has pixel values enumerating water depth. Once segmented, an expert user manually grouped the resulting image objects that correspond to five reef zones (lagoon, back reef, fore reef, reef crest, and shelf), plus two zones encompassing terrestrial areas (land and intertidal), and deep ocean (Table 2). The upshot of this process was a Level 1 zone map.
Levels 2 and 3 geomorphological structure maps
The next step toward the final habitat map was the delineation of geomorphological zones which were first crudely defined (Level 2) and then refined in more detail (Level 3). For the Level 2 map, the inputs were (a) the Level 1 reef zone map produced from bathymetry and (b) the multispectral WV2 image mosaic. First, for each Level 1 zone, the multispectral imagery was segmented via the multi-resolution method of Baatz and Schäpe (2000). Second, in a process termed ‘labeling’ and with reference to the surficial sediment samples and geophysical profiles acquired in the field, the expert user manually selected image objects and attributed them as belonging to one of the three Level 2 geomorphology classes (unconsolidated sediment, coral reef and hardbottom, or other; see Table 2). Third, based on these user-defined training sets for each Level 2 class in each Level 1 zone, eCognition was used to classify all of the objects in the image set into geomorphological structures based on spectral, textural, and neighborhood parameters. The upshot of this process was a map of major geomorphological structure primarily split into unconsolidated sediment-dominated areas (spectrally bright and texturally homogeneous) and coral reef and hardbottom-dominated areas (spectrally dark and texturally heterogeneous). Note that by conducting this process independently within each Level 1 zone, the bias introduced by varying bathymetry across the satellite imagery was mitigated by the fact that each zone occupies a limited range of water depths. This was important because the rapid attenuation of light by water tends to override the subtle spectral differences between reef habitats (e.g., Purkis 2005).
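The supervised assignment step can be illustrated with a nearest-centroid sketch, run separately within each Level 1 zone. eCognition's actual classifier also weighs texture, shape, and neighborhood context, so this is only a simplified stand-in; the feature vectors and class names are hypothetical.

```python
import numpy as np

def classify_objects(features, labels, unlabeled):
    """Nearest-centroid assignment of unlabeled image objects to the
    classes of a user-labeled training set.

    features : (n, d) per-object feature vectors for the labeled objects
               (e.g., mean reflectance per band, a texture statistic)
    labels   : length-n list of class names assigned by the expert user
    unlabeled: (m, d) feature vectors for the objects to classify
    """
    classes = sorted(set(labels))
    centroids = np.array([
        np.mean([f for f, l in zip(features, labels) if l == c], axis=0)
        for c in classes
    ])
    # distance from every unlabeled object to every class centroid
    dists = np.linalg.norm(unlabeled[:, None, :] - centroids[None, :, :], axis=2)
    return [classes[i] for i in np.argmin(dists, axis=1)]
```

Running this once per zone, as the text describes, keeps each training set within a narrow depth range and so limits the depth-driven spectral bias.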
The detailed geomorphological structure maps (Level 3) were produced in the same way as in the preceding step, but the imagery was re-segmented on the basis of the Level 2 classes and for each, the expert user applied labels for the 11 Level 3 classes defining seabed character (mud, sand, rock, etc.; see Table 2) and in the case of reefs, their morphological type (pinnacle versus aggregate, etc.; see Table 2). The advantage of conducting this segmentation based on the Level 2 classes was a radical reduction in computational overhead since subsets of the overall image mosaic were segmented separately. As before, the user manually developed these labels with reference to known points on the ground visited during fieldwork, and, again, eCognition was used to classify the unlabeled image objects based on their similarity to the training set.
Level 4 biological cover and Level 5 habitat maps
In the final step in the mapping workflow, field observations of biological cover (termed ‘Level 4’ data) were convolved with the geomorphology map to yield a map of habitat (e.g., Figs. 3d, 4d, 5d, 6d). This step was again achieved via application of the multi-resolution segmentation algorithm (Baatz and Schäpe 2000), but this time, the Level 3 classes were individually segmented and, again with reference to field data, the user manually selected labels for objects characterized by the 12 Level 4 classes of biological cover. Again, eCognition was used to classify the remaining unattributed image objects on the basis of similarity to the training set. As laid out in Table 2, each object, now classified according to benthic cover, was attributed with the addition of its zone and geomorphological structure, which varied by location within the image set, as defined by the previously created Level 1 and 2 maps, respectively. As an example, an image object describing a patch reef in the lagoon would be reattributed as ‘Lagoon–Patch Reef,’ and so on. This reattribution process delivered the 36 ‘aggregate classes’ of the final habitat map. To complete the map, boundaries existing between image objects of the same class were dissolved such that areas of a single habitat type were encompassed by a single polygon. At this stage, and again with reference to the diver observations, the evolving map was examined by an expert user and any obvious errors corrected in a process termed ‘contextual editing’ (as originally proposed by Mumby et al. 1998). To complete the process, the finished habitat map was exported as an ESRI shapefile for further analysis in a geographic information system (GIS).
Accuracy assessment
Accuracy assessment of the habitat maps was conducted using error matrices (Story and Congalton 1986; Congalton 1991) with reference to the 5106 dGPS-positioned seabed videos captured using a tethered ‘drop’ camera that remained independent from the map-making workflow. These drop camera videos had three advantages for the purposes of accuracy assessment: a large sample size; wide, consistent coverage across the entirety of the GRE sites; and independence from the training/labeling process of map creation. The drop camera dataset suffered a few limitations as well. First, some habitat types were under-sampled due to physical constraints navigating the vessel. Second, the limited field-of-view of the camera created difficulties discriminating certain habitat classes. Third, there was some geographic uncertainty in camera location due to the tether length and the horizontal field-of-view. The first two of these limitations were addressed by eliminating or consolidating certain map classes for the purposes of accuracy assessment. The third was addressed by considering the neighborhood around each drop camera point using a technique we call ‘lagged accuracy.’ It is important to emphasize that field-operation logistical planning helped reduce these uncertainties by accounting for wind, as well as current magnitude and direction, when deploying the camera and capitalizing on precise boat handling by the highly skilled skipper. This allowed us to accurately position and ‘fly’ the tethered camera over each habitat sampled.
Terrestrial habitat classes were impossible to sample with the drop camera, for obvious reasons. Thus, the accuracy of terrestrial habitat classes was not quantitatively assessed for these maps. Nevertheless, we assume that the maps are highly accurate for a consolidated ‘terrestrial’ class (i.e., consolidated map Class #1; Table 2), since segmenting land versus marine habitats is straightforward with the infrared channels of satellite imagery. Intertidal and reef crest classes also proved difficult to sample, due to their extremely shallow depths at the islands surveyed. Only two reef crest videos and no intertidal videos were captured. Thus, the accuracy for intertidal classes was not assessed, and reef crest was insufficiently sampled to draw strong conclusions. To put this limitation in perspective, however, intertidal and reef crest classes were each found to hold < 1% of the total number of classified pixels (Fig. 7). Therefore, their omission from the accuracy assessment is unlikely to change overall conclusions about classifier performance.
The limited field-of-view of the drop camera prevented the discrimination of many of the fine details between Level 5 classes (Table 2). For instance, the videos were adequate to classify the seabed in general as a ‘Lagoonal Reef,’ but the field-of-view was inadequate to resolve whether a given lagoonal reef was only 10 m in diameter, or smaller, which would correspond to the Level 5 map class ‘Lagoon–Coral Bommies,’ versus a much larger patch, which would be a Level 5 ‘Lagoon–Pinnacle Reef.’ To compensate for this discrepancy in scale between the satellite data and the ground-truth data, we grouped the 36 Level 5 classes into a smaller number of ‘consolidated classes’ (Table 2). For most sites around the world, 16 consolidated classes were used, reflecting different combinations of geographic zone and substrate. In the Atlantic, however, geographic zone was not as easy to define, so additional classes were consolidated in the Atlantic, reducing the total to seven for those sites.
Overall, producer’s, and user’s map accuracies were computed for each site using the consolidated classes (Table 2) via the error matrix approach (Story and Congalton 1986). In addition, the Kappa (Congalton 1991) and Tau (Ma and Redmond 1995) coefficients were computed to quantify the degree to which the accuracy of each map was better than random chance. Equal prior probability was used for calculating Tau because no a priori information on class probability was used in the hierarchical segmentation. It should be noted that the accuracies quoted in the error matrices (Table 3) are for the consolidated classes and cannot be extrapolated to speak to the accuracy of the individual classes prior to their consolidation.
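These statistics can all be derived from a single error matrix. The sketch below assumes rows hold the mapped class and columns the reference (ground-truth) class, and uses equal prior probabilities for Tau, as in the text.

```python
import numpy as np

def accuracy_metrics(conf, n_classes=None):
    """User's/producer's accuracy, overall accuracy, Kappa (Congalton 1991),
    and Tau (Ma and Redmond 1995) from a square error matrix `conf`
    (rows = mapped class, columns = reference class)."""
    conf = np.asarray(conf, dtype=float)
    n = conf.sum()
    diag = np.diag(conf)
    users = diag / conf.sum(axis=1)      # commission: correct / row total
    producers = diag / conf.sum(axis=0)  # omission: correct / column total
    overall = diag.sum() / n
    # Kappa: agreement beyond chance, with chance estimated from the margins
    pe_kappa = np.sum(conf.sum(axis=1) * conf.sum(axis=0)) / n ** 2
    kappa = (overall - pe_kappa) / (1 - pe_kappa)
    # Tau with equal priors: chance agreement is 1/M for M classes
    m = n_classes or conf.shape[0]
    tau = (overall - 1 / m) / (1 - 1 / m)
    return users, producers, overall, kappa, tau
```

For a balanced two-class matrix the two chance-corrected coefficients coincide; they diverge as the class margins become uneven, which is why both were reported.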
Map accuracy as assessed via standard error matrices does not allow for geographic offsets between the habitat map and reference data. Such offsets are often unavoidable, however, and stem from the many vagaries of setting an exact position on the ocean during fieldwork. Sources of positional error include GPS inaccuracies, diver observations not made exactly beneath the position recorded when entering the water, and the drift of the tethered ‘drop’ camera away from the boat. These offsets might reasonably be expected to routinely exceed the 1.85 m pixel width of the WorldView satellite, with the result that the ground-truth data are not perfectly registered with the habitat map. Furthermore, with a horizontal field-of-view, as was the case with the drop camera used for this study, the video data are directional, which can have just as great an impact as positional uncertainty. Imagine the camera positioned on an edge between two classes; the class assigned to that ground-truth point would depend on the direction in which the camera was orientated, even if the camera position did not change. Whereas such offsets might legitimately be considered as inaccuracies for habitat maps produced on a local scale, we consider them to be acceptable when mapping across hundreds of thousands of sq. km of Earth’s remotest reef systems. Thus, we wanted a way to assess accuracy that would account for uncertainty in the relative position of ground truth to satellite data.
To sensibly address geographic uncertainty in camera location, we used a metric called ‘lagged accuracy’ which, for each ground-truth point, collates the cumulative probability of encountering pixels mapped as the same class attributed to the assessment point, for lag distances between 0 and 300 m offset from that point, in all Cartesian directions. Provided that map pixels of the class sought exist within the specified lag distance around the accuracy assessment point, the cumulative probability of encounter will rise as a function of increasing lag, with the rate of that rise dictated by the density of pixels of that class in the queried portion of the habitat map. Of course, it is unreasonable to take the mere existence of a map pixel within the search radius with the same class assignment as that of the accuracy assessment point to justify scoring the map as accurate. Indeed, for large lag distances, the correct assignment will be recognized even if the queried habitat map is random. Hence, it is necessary to set sensible thresholds in both lag distance and cumulative probability that might rationally indicate that a positioning error has precluded an exact match at the location of the accuracy assessment point. While there are no precise answers, we felt 25 m was a sensible threshold for the accuracy of a regional-scale map used to support marine spatial planning.
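A simplified sketch of the lagged-accuracy idea follows: for each assessment point, test whether a same-class map pixel occurs within each lag distance, and aggregate across points. The published metric tracks a per-point cumulative probability of encounter; this binary-encounter reduction is illustrative only, and the pixel size and lag values are parameters, not prescriptions.

```python
import numpy as np

def lagged_accuracy(class_map, points, true_classes, lags, pixel_size=1.85):
    """For each lag distance, the fraction of ground-truth points for which
    at least one map pixel of the point's reference class occurs within
    that distance of the point.

    class_map   : 2-D integer class raster
    points      : (row, col) pixel indices of the assessment points
    true_classes: reference class observed at each point
    lags        : search radii in metres
    pixel_size  : metres per pixel (1.85 m for near-nadir WV2)
    """
    rows, cols = np.indices(class_map.shape)
    curve = np.zeros(len(lags))
    for (r, c), cls in zip(points, true_classes):
        dist = pixel_size * np.hypot(rows - r, cols - c)  # metres to every pixel
        same = class_map == cls
        for i, lag in enumerate(lags):
            if np.any(same & (dist <= lag)):
                curve[i] += 1
    return curve / len(points)  # probability of encounter per lag distance
```

The curve is monotonically non-decreasing in lag; the thresholds discussed above guard against the trivial matches that any sufficiently large search radius would produce.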
Patterns of habitat classification accuracy
To explore possible causes of habitat map error, the per-site Tau coefficients (a measure of map accuracy) were cross-plotted against mapped area and the complexity of the habitat maps (Fig. 9). Doubtless, the KSLOF-GRE dataset allows for all sorts of analysis of the spatial patterns among reef systems around the globe, and it is our hope that it will be used for such in the future. The present goal, however, was simply to check for broad and systematic patterns related to habitat classification accuracy.
There are many ways to quantify scene complexity. One of the simplest is to count the proportion of edge pixels, i.e., those which border a different class. The proportion of edge pixels is a useful metric for assessing habitat classification because it is affected by a combination of class variety, spatial arrangement, and pixel mixing (Heydari and Mountrakis 2018). Furthermore, accuracy has sometimes been shown to decrease with increasing proportion of edge pixels (Heydari and Mountrakis 2018). Checking whether this pattern held for the KSLOF-GRE was valuable because datasets with a sufficient number of scenes of varying complexity to test it are rare.
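Counting edge pixels is straightforward; a minimal sketch using 4-connectivity (the choice of connectivity is an assumption, as the text does not specify one):

```python
import numpy as np

def edge_pixel_proportion(class_map):
    """Proportion of pixels that border (4-connectivity) a pixel of a
    different class -- a simple scene-complexity metric."""
    cm = np.asarray(class_map)
    edge = np.zeros(cm.shape, dtype=bool)
    edge[:-1, :] |= cm[:-1, :] != cm[1:, :]   # differs from pixel below
    edge[1:, :]  |= cm[1:, :] != cm[:-1, :]   # differs from pixel above
    edge[:, :-1] |= cm[:, :-1] != cm[:, 1:]   # differs from pixel to the right
    edge[:, 1:]  |= cm[:, 1:] != cm[:, :-1]   # differs from pixel to the left
    return edge.mean()
```

A uniform map scores 0, while a map of alternating classes approaches 1, so the metric spans the range of scene complexity encountered across the sites.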