Data collection
The Sand Dune Survey of Great Britain (Radley 1994) is used as a starting point to identify the location of these coastal sand dune habitats within protected sites. This is then combined with the relevant SSSI boundary to create a survey polygon. The position of mean high water (MHW) is used to define the seaward limit of the survey, ensuring that data capture does not lose features such as embryo dunes. The survey polygon is then used by the Environment Agency’s aerial surveyors to capture two datasets: CASI (Compact Airborne Spectrographic Imager) and LIDAR. Both datasets are captured and processed in-house by Environment Agency Geomatics.
CASI
Multispectral data is captured using an ITRES Compact Airborne Spectrographic Imager (CASI) 1500. This is used to capture spectral data using a 22-channel set-up (Table 1) designed for coastal and intertidal work, at a resolution of 1 m. Data capture is targeted in the summer season, with optimum conditions for vegetation between June and August, although later capture in September is possible provided the vegetation on the target site is not in senescence.
Table 1 Wavelengths of CASI bands used. Each band shows the wavelength centre and spectral width of the band in nanometres.
The data is radiometrically corrected using ITRES algorithms following annual spectral calibrations, then quality checked (QC) for gaps, lighting anomalies and distortions within each flight line. The data is then orthorectified using the LIDAR data and mosaicked into a single dataset. Final QC checks are then carried out to ensure that the flight-line edges align correctly and to identify lighting differences between flight lines.
LIDAR
LIDAR data is captured using an Optech Gemini ALTM during the winter flying season (October–March). An exception to this is if suitable-resolution (0.5 m or 1 m) archive data from the past 0–2 years is available and no storm events causing major site changes have occurred. There is potential for some change in dune topography between the LIDAR and CASI data capture. However, winter LIDAR capture is preferred over summer because the frequency of laser pulses reaching the ground is greater in winter, especially through deciduous vegetation. This allows a more accurate terrain model to be generated, and hence better modelling of vegetation heights.
LIDAR processing goes through a number of stages, including checks on trajectory quality, flight-line overlap and coverage, and ground control to verify accuracy. All of these stages must be passed, including a Quality Control (QC) check that the elevation Root Mean Square Error is better than ±15 cm. The data is then processed to produce a Digital Terrain Model (DTM) using semi-automated raster classification and filtering techniques. Finally, the ground truthing is repeated alongside a number of other QC procedures to ensure full coverage and the absence of striping. The final outputs used in the habitat mapping are a Digital Surface Model (DSM) based on first returns, a DTM and intensity data. From these, two derived products are made: a slope model from the DTM, and a Canopy Height Model (CHM) from the DTM and DSM.
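The two derived products amount to simple raster arithmetic on the elevation models. A minimal sketch using NumPy on a synthetic grid (the 1 m cell size matches the survey resolution, but the elevation values are illustrative only):

```python
import numpy as np

# Synthetic 1 m resolution elevation grids (metres; values are illustrative)
dtm = np.array([[1.0, 1.2, 1.5],
                [1.1, 1.4, 1.8],
                [1.3, 1.7, 2.2]])
dsm = np.array([[1.0, 1.2, 3.5],
                [1.1, 3.9, 4.0],
                [1.3, 1.7, 2.2]])

cell_size = 1.0  # metres

# Canopy Height Model: surface minus terrain, clipped so sensor noise
# cannot produce negative vegetation heights
chm = np.clip(dsm - dtm, 0.0, None)

# Slope (degrees) derived from the DTM via finite differences
dz_dy, dz_dx = np.gradient(dtm, cell_size)
slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
```

Where DSM and DTM agree (no canopy), the CHM is zero; where the first-return surface sits above the terrain, the CHM records the canopy height at that cell.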
A known limitation of winter LIDAR capture is that returns from deciduous vegetation come not from the top of the tree and shrub canopy but from large branches and trunks lower in the tree structure. This means the heights of tall vegetation recorded in the CHM are likely to underestimate the true vegetation height. Knowledge of this limitation has enabled us to develop a methodology which takes it into account.
Ground data collection
Ground data collection is carried out alongside the aerial data collection to assist with interpretation of imagery (Table 2). It is used to train the classification system and validate the final habitat map. Data is collected as soon as possible after the CASI data capture, normally within 1–2 months. The field collection of data can take between 1 and 4 days. The aim when collecting ground data is to capture a good spatial spread of samples for each habitat type present on the site. Access and time limitations on site may lead to multiple samples of different habitats being collected within a small area; however, subsequent samples need to be collected over the rest of the site. This ensures that variations in habitat across the site are picked up and that spectral differences between flight lines can also be identified in the analysis.
Table 2 Data collection dates
When collecting ground data, a CASI true-colour image is taken into the field and polygons are drawn onto it to identify the location of habitats; each polygon must contain only a single habitat class, according to our habitat classes (Table 3). Mapping the features actually seen reduces errors from poor ground data, so greater confidence can be assigned to the quality of the data. For each ground data sample collected the following are recorded: class name, prominent species, other notes of interest (e.g. recent management, vegetation heights or topography) and surveyor. Where mosaics or transitional habitats are present, polygons are drawn and a description of key species is noted, but no class is assigned. Many of the samples are photographed, where possible with GPS co-ordinates, to help interpret the final maps and address queries.
Table 3 Habitat Map Classes and equivalent Annex I class where appropriate
To help gain an understanding of the site, plan effective and time-efficient site visits, and obtain a good spatial spread of samples, it is important to work closely with the site officer when planning the site visit. We also aim to have a site officer come out during the visit; this can help clarify unique site features, ongoing management and natural changes occurring on the site.
The ground data is digitised as polygons into ArcGIS by Natural England, with all the sample information recorded as attributes. The data entry is validated to ensure that classes are correctly identified and to avoid the inconsistency that can arise when multiple surveyors work on a site (Hearn et al. 2011).
Data analysis
The data analysis process runs through seven key stages to produce the habitat map (Fig. 2). The first stage, data preparation, was undertaken using ArcGIS 9.3 or 10.2 and ERDAS Imagine 9.3 or 2014. This includes setting all datasets (CASI, DTM, CHM and Slope) to the British National Grid projection system within a single software package. This avoids the classification software treating the datasets as having different projection systems because of differences in how each package formats the projection information. The null data value of the DTM is also set to the lowest actual elevation value, allowing the graphical functionality in the classification software to work.
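The null-data step above can be sketched with NumPy: cells carrying the sentinel null value are reset to the lowest genuine elevation so that no artificial minimum skews the value range the classification software displays. The −9999 sentinel and the elevation values are assumptions for illustration, not necessarily those used in the survey:

```python
import numpy as np

NODATA = -9999.0  # assumed sentinel; the survey's actual null value may differ

# Small illustrative DTM tile (metres)
dtm = np.array([[2.3, 1.8, NODATA],
                [2.1, NODATA, 1.6],
                [2.5, 2.0, 1.9]])

# Lowest genuine elevation on the site
valid = dtm[dtm != NODATA]
floor_value = valid.min()

# Reset null cells to that floor value so the value range stays realistic
dtm_filled = np.where(dtm == NODATA, floor_value, dtm)
```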
Rather than traditional pixel based methods (Lucas et al. 2007), an object based image analysis (OBIA) method is used to undertake the classification. This classification used the OBIA software Trimble eCognition v8.9 or 9.0. OBIA has been shown to give improved results with a higher accuracy compared to pixel based methods (Hussain et al. 2013 & Strasser and Lang 2015). It also provides a powerful solution to using various data sources and through technological developments is able to deal with the complex processing needed (Toth and Jozkow 2016). OBIA has been successfully used by large detailed habitat mapping studies, such as the Phase 1 mapping of Wales (Lucas et al. 2011). It is also recommended and used in the Making Earth Observation Work projects, looking to create a Living Map of the UK (Medcalf et al. 2011 & Medcalf et al. 2013).
Stage 2 (Fig. 2) is the project setup in eCognition. This includes loading all the image and thematic data (as Shapefiles) and the process ruleset, which loads the Process Tree, class list and features needed for the classification. eCognition allows a decision-tree methodology to be applied (called a Process Tree in eCognition), ordering the various classification stages (Fig. 3). These stages contain a number of algorithms which implement the classification, including segmentations, classifications and merging.
Initial steps in the classification are to carry out a chessboard segmentation, identification of intertidal areas, exclusion of parts of the image outside the area of interest (AOI), and classification into two classes: Vegetation and Non-Vegetation (Fig. 3 – Part 1). The chessboard segmentation creates square objects with a size equal to the CASI resolution. This ensures that small, subtle features such as embryo dunes and bare sand within dunes are retained as single objects during the later multi-resolution segmentations, and helps improve the classification accuracy of these features. After the chessboard segmentation, it is possible to start classifying the objects. The first classification identifies the intertidal area, with the Highest Astronomical Tide (HAT) height as its upper limit. A second classification then assigns all objects to either a Vegetation or a Non-Vegetation class (Fig. 4 – centre image). These results are then merged, creating large objects for each class. This allows the subsequent processes to focus on the relevant site areas when identifying a habitat. For example, Saltmarsh is only mapped within areas already found to be both intertidal and vegetated.
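The logic of these first two classifications can be sketched as raster masks: an intertidal mask where the terrain sits at or below HAT, and an NDVI-based vegetation split. The band choice, NDVI threshold, HAT value and reflectances below are all illustrative assumptions, not the rules used in the actual ruleset:

```python
import numpy as np

# Illustrative reflectance grids for one red and one near-infrared CASI band
red = np.array([[0.30, 0.05],
                [0.28, 0.04]])
nir = np.array([[0.32, 0.50],
                [0.30, 0.55]])
dtm = np.array([[1.0, 2.0],
                [2.0, 5.0]])  # elevations (m)

HAT = 3.0  # Highest Astronomical Tide for the site (assumed value)

# Intertidal wherever the terrain lies at or below HAT
intertidal = dtm <= HAT

# NDVI-based split into Vegetation / Non-Vegetation
ndvi = (nir - red) / (nir + red)
vegetation = ndvi > 0.2  # threshold is illustrative

# e.g. Saltmarsh candidates: vegetated cells inside the intertidal zone
saltmarsh_candidates = intertidal & vegetation
```

Restricting Saltmarsh to the intersection of the two masks mirrors the way the merged Vegetation and intertidal objects constrain the later detailed classification.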
The main stage of the classification is to classify the detailed habitat classes (Fig. 3 – Parts 3 & 4). This begins by creating the Detailed Habitat Level through a multi-resolution segmentation algorithm (Fig. 3 – Part 2), which creates a new, lower level in the classification onto which the main classification happens (Fig. 4 – right image). Segmentation into these objects has been found to help in classifications, as it allows plant communities to be mapped rather than the individual plants mapped by a pixel-based method (Blaschke 2010; Lucas et al. 2011). The software segments at a lower level so that it can create smaller, more detailed objects than those above it, allowing the distinct habitat communities to become objects.
Non-vegetation classes are classified first (Fig. 3 – Part 3); these include Artificial Surfaces, Bare Sand and Water. Thematic data and spectral features are used in the decision rules for the non-vegetation classes; the thematic data includes Ordnance Survey VectorMap™ District data for buildings, roads and railways. The eCognition functionality for interrogating information from objects is called Features. These include simple features, such as the mean spectral values of an object, and advanced features, such as indices like the Normalized Difference Vegetation Index (NDVI), object extent/shape information, and relational information such as border to/distance to. The ability to integrate relational information into the classification is another advantage of OBIA (Blaschke 2010). The non-vegetation classes are classified before the detailed vegetative classes because relational rules are used in some of the vegetative classes, such as Embryo Dunes, which has a rule based on distance to Bare Sand.
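A relational "distance to" rule such as the Embryo Dunes example can be sketched as a distance-to-class test. The brute-force version below works on a tiny 1 m grid; the 3 m threshold and the cell layout are illustrative assumptions, not the rule in the actual ruleset:

```python
import numpy as np

# 1 m grid: True where a cell was already classified as Bare Sand
bare_sand = np.zeros((5, 5), dtype=bool)
bare_sand[0, 0] = True

# Candidate cells for the Embryo Dunes class
candidates = np.zeros((5, 5), dtype=bool)
candidates[0, 2] = True   # 2 m from bare sand
candidates[4, 4] = True   # well away from bare sand

MAX_DIST = 3.0  # metres; illustrative threshold

sand_cells = np.argwhere(bare_sand)

def distance_to_sand(r, c):
    # Euclidean distance to the nearest Bare Sand cell
    return np.min(np.hypot(sand_cells[:, 0] - r, sand_cells[:, 1] - c))

# Keep only candidates within the distance threshold of Bare Sand
embryo_dune = np.array([[bool(candidates[r, c]) and distance_to_sand(r, c) <= MAX_DIST
                         for c in range(5)] for r in range(5)])
```

In eCognition the equivalent test is evaluated per object rather than per cell, but the relational principle is the same.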
Major vegetation classes are classified in sections, allowing ecological knowledge to help guide and limit the classification. The order in which vegetation classes are classified is shown in Fig. 3 – Part 4. The first to be classified are the scrub and tree classes, to remove these objects from potential misclassification as low vegetation types, e.g. Dune Slacks – Creeping Willow. The Scrub and Tree classes rely on the CHM to distinguish them clearly from other vegetation, as previous studies have found that improvements can be achieved by combining LIDAR and spectral information (Jeong et al. 2016; Mucher et al. 2015). In this mapping, Scrub is mostly defined as 0.5 m to 2 m in height, with Broadleaved Dune Woodlands and Coniferous Dune Woodlands defined as having a height above 2 m and an area over 2000 m². While all the classes rely on spectral information in their rules, these early distinctions allow the remaining classes to focus on the wealth of information in the CASI data. The spectral bands used in the rules, and appropriate thresholds, were decided upon using the various spectral information displays in eCognition, including the graphical display of the Spectral Selection Information. This shows a histogram of the spectral responses for each band and can compare them with another class; it also shows the range of values for each class and the amount of overlap between the two classes. Producing an accurate classification is an iterative process of identifying areas of misclassification and adapting rules and thresholds to limit them. Because this iterative process was slower on some of the more recent large sites, eCognition Server 9.0 was acquired. This splits the project into tiles of 1500 × 1500 pixels and processes each tile individually before stitching the results together, which is more efficient because multiple small tiles can be processed faster than a single large scene.
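The height and area thresholds above can be expressed as a simple per-object decision. The thresholds follow the definitions in the text (scrub mostly 0.5–2 m; woodland above 2 m and over 2000 m²), but the object statistics are illustrative and the broadleaved/coniferous split, which relies on spectral rules, is omitted:

```python
# Objects summarised by mean canopy height (m) and area (m²); values illustrative
objects = [
    {"id": 1, "height": 1.2, "area": 300.0},    # within the scrub height range
    {"id": 2, "height": 6.5, "area": 3500.0},   # tall and large: woodland
    {"id": 3, "height": 6.5, "area": 500.0},    # tall but too small for woodland
]

def height_class(obj):
    # Scrub: mostly 0.5 m to 2 m in height
    if 0.5 <= obj["height"] <= 2.0:
        return "Scrub"
    # Dune Woodland: height above 2 m and area over 2000 m²
    # (spectral rules then separate broadleaved from coniferous)
    if obj["height"] > 2.0 and obj["area"] > 2000.0:
        return "Dune Woodland"
    return "Unassigned"  # left for later spectral rules

labels = {obj["id"]: height_class(obj) for obj in objects}
```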
The habitat map is then tidied through a number of merge algorithms, which combine neighbouring objects of the same class into single larger objects, and finally exported as a vector dataset (Fig. 3 – Part 5). Stage 4 (Fig. 2) of the habitat mapping workflow then carries out QC. This involves three checks: ensuring data matches on tile boundaries, identifying any objects which have not been mapped, and checking the results against the training ground data. Once this QC has been undertaken, manual edits are made to the classification (Fig. 2 – Stage 5). This work did not set out to create a classification purely in one software package, or a solely automated classification, but rather a high-quality, accurate habitat map. This need for high-quality, accurate results has allowed us to be confident and open about the fact that we carry out manual changes. These edits are normally limited to small, distinct features falsely classified within larger classes, or to cleaning feature edges where a transitional area was better defined during the ground data collection.
Finalising the habitat map involves a final stage of QC (Fig. 2 – Stage 6), including verification of the results by habitat specialists/site officers and, where possible, an accuracy assessment. The verification by ecological specialists/site officers is viewed as crucial in ensuring the results are correct and fit for purpose. It takes place through a systematic viewing process covering the complete site, throughout which the habitat map is compared against the CASI data and aerial photography. An accuracy assessment is also carried out where sufficient ground data has been collected both to train the classification and to provide an appropriate number of samples for a robust assessment. In such cases, a subset of the ground data was selected before the classification started and kept separate from the data used for training. The accuracy assessment is then run in eCognition: the ground data is loaded and used to create a Training and Test Area (TTA) mask, which allows the samples to be used on other scene levels. The TTA mask is then used to test the results and produce an Error Matrix, including user’s and producer’s accuracy values for each class and an overall accuracy value.
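The accuracy measures derived from the error matrix are standard and can be computed directly. In the sketch below the class names and counts are invented for illustration; rows are the mapped classes, columns the reference (ground) classes:

```python
import numpy as np

# Error matrix: rows = mapped class, columns = reference class (counts illustrative)
classes = ["Bare Sand", "Saltmarsh", "Scrub"]
matrix = np.array([[50,  2,  0],
                   [ 3, 40,  5],
                   [ 0,  4, 46]])

total = matrix.sum()
correct = np.trace(matrix)  # correctly classified samples on the diagonal

# Overall accuracy: proportion of all samples classified correctly
overall_accuracy = correct / total

# User's accuracy per class: correct / total mapped to the class (row-wise)
users_accuracy = matrix.diagonal() / matrix.sum(axis=1)

# Producer's accuracy per class: correct / total reference samples (column-wise)
producers_accuracy = matrix.diagonal() / matrix.sum(axis=0)
```

User's accuracy reflects errors of commission (how reliable a mapped class is), while producer's accuracy reflects errors of omission (how much of the true class was captured).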
Evolving to operational use
The mapping work has evolved since 2012 into an operational product, and the methodology and process have been used every year since development. Lessons learned from class classification (Table 3 shows the habitat map classes and equivalent Annex I habitats) and processing have been fed back into continual improvement over the period of the project (J. Brownett, Ground truthing & class codes report & guidance, unpublished). Additional classes were added when new sites mapped in 2014 contained Coastal and Floodplain Grazing Marsh and Shingle – Vegetated.
Each CASI flight generates a unique dataset, as the spectral signatures of the vegetation vary for several reasons: lighting conditions, the differing species making up the vegetation communities, and phenological differences (even over the summer capture period). This means that the classification process needs to be adjusted for each site and each session of data collection, with a number of changes required.
The first change when starting a site mapping is to ensure that only the classes needed for the site are enabled within the classification process. This is done by checking which classes had samples collected during ground data collection and enabling only those. This stops classes being wrongly identified on a site where they do not exist, and checking against the ground data guarantees that ground data is available for adjusting the spectral rules. The classification also makes use of tide heights, including the HAT; these need to be updated to the tide height levels relevant to the site.
Finally, the spectral rule thresholds within each class need adjusting, although the spectral bands used in the rules should not be changed between sites. This limits process-driven variability between classified sites while still accounting for the different spectral signatures. Refining these thresholds is an iterative classification process. Only when the original spectral bands used in the rules do not allow accurate discrimination of a class should the bands themselves be changed; this has not occurred often, and has mostly been due to some sites having heavily mown, dry improved grassland compared with more productive growth on other sites, or heavy senescence in bracken. By using this classification process and making these changes at each site, we are able to use and repeat sand dune classifications operationally, in an approach we describe as a Semi-Automated Classification.