Study Area
The study was carried out in two grassland fields in the UNESCO Biosphere Reserve Rhön in Germany, both of which were invaded by lupine (Figs. 1a, 1b, 2). One field was classified as a former mountain hay meadow (hereafter referred to as G1), and the other was an old Nardus stricta grassland (hereafter referred to as G2). In each field, a rectangular plot of 1500 m2 (50 m by 30 m) was chosen as the study area, and 15 small plots of 64 m2 (8 m by 8 m) were established within a grid (Fig. 1c, 1d). Three cutting dates (12 June, 26 June, and 9 July, hereafter referred to as D1, D2, and D3, respectively) were randomly assigned to five replicate plots each (Fig. 1c, 1d). At each date, the assigned plots were mowed to a stubble height of 5 cm, and the biomass was removed from the field.
Data Collection
At each sampling date, UAV-borne images were acquired in each grassland field. A DJI Phantom 4 quadcopter (DJI, China) with its built-in off-the-shelf camera (FC330) was employed to obtain UAV-borne RGB images. The camera (FC330) captures a 12-megapixel image in the red (R), green (G), and blue (B) bands. The UAV was flown at a height of 20 m, which resulted in a ground sampling distance of 0.009 m. The UAV flight missions were designed using the Pix4Dcapture app for Android (app version 4.4.0, Pix4D, Switzerland). The UAV was flown in a double-grid mission (two perpendicular grids), and the camera was triggered automatically to capture nadir-looking images according to the configured image overlap (80% forward and side overlap). All flight sessions were conducted between 12:00 and 14:00. Before each flight session, nine black-and-white 1 m2 ground control points were distributed over the study area. Immediately after the UAV flights, the position of each ground control point was measured using a Leica RTK GNSS (Leica Geosystems GmbH, Germany) with 2 cm 3D coordinate precision. An additional UAV-borne RGB dataset was acquired on 16 August 2019, after the fields had been mowed completely.
A FLIR Vue Pro R thermal camera (FLIR Systems Inc., USA) was mounted on the UAV parallel to the RGB camera. The camera has a 19 mm lens and a spectral sensitivity between 7500 and 13,500 nm. Thermal and RGB images were thus captured simultaneously during a single UAV flight. The thermal camera records images as radiometric JPEGs, which contain radiometrically calibrated temperature data. Each thermal image has 640 by 512 pixels (FLIR 2016). The thermal camera was triggered every second throughout the whole UAV mission. Before each thermal data collection, metadata related to the thermal camera, such as distance to the target (20 m), relative humidity, air temperature, and emissivity (0.98), were recorded using the FLIR UAS 2 app (app version 2.2.4, FLIR Systems Inc., USA). All metadata were saved in each captured image's EXIF header.
A total of six UAV-borne RGB and six thermal datasets were collected. Hereafter, each dataset is labelled according to cutting date and grassland type (DiGj, where i = 1, 2, 3 and j = 1, 2). Within each dataset, the maturity stages of the vegetation differed among the 64 m2 small plots owing to the mowing treatments. The maturity stage was lowest (V0) in the D1 dataset and was the same for all 30 small plots. At the second cutting date (D2), 20 of the 30 small plots were covered by vegetation that was 2 weeks older (V2weeks), while the 10 small plots cut at D1 carried vegetation that had regrown for 2 weeks (VR2weeks). The D3 dataset comprised 10 plots with undisturbed vegetation 4 weeks older than V0 (V4weeks), 10 plots with VR2weeks vegetation, and a further 10 plots with vegetation that had regrown for 4 weeks after D1 (VR4weeks).
Object-Based Image Analysis
Canopy Height Model and Point Density
Each collected dataset was processed separately with the same procedure, as explained below. The UAV-borne RGB images and the coordinates of the ground control points were processed with Agisoft PhotoScan Professional version 1.4.4 (Agisoft LLC, Russia). The software applies the structure-from-motion (SfM) technique to align the multi-view overlapping images and to build a dense 3D point cloud. The procedure for point cloud generation and canopy height computation was adopted from Wijesingha et al. (2019), where further details of the process can be found.
The point density (PD) raster was created by binning the dense point cloud into a raster with a 2 cm cell size; each PD cell thus contained the number of points within its 4 cm2 cell area. The digital terrain model (reference plane) was generated from the August RGB images as a raster with a 5 cm cell size. For each point, the digital terrain model value at its x, y location was subtracted from the point's z value to generate a point cloud of canopy heights. This point cloud was binned into a 2 cm cell size raster in which each cell contained the mean canopy height value; this raster is hereafter referred to as the canopy height model (CHM). A minimal sketch of the binning step is given below.
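As an illustration, the following minimal Python/NumPy sketch shows how such a binning could be implemented; it assumes the canopy point cloud is already loaded as coordinate arrays and that a hypothetical dtm_sampler function returns the terrain elevation at any x, y location (the study itself followed the Wijesingha et al. (2019) workflow).

```python
import numpy as np

def bin_point_cloud(x, y, z, dtm_sampler, cell=0.02, x0=0.0, y0=0.0, ncols=2500, nrows=1500):
    """Bin a canopy point cloud into point-density (PD) and mean-canopy-height (CHM) rasters.

    x, y, z     : 1-D arrays of point coordinates (metres)
    dtm_sampler : hypothetical function returning terrain elevation at (x, y)
    cell        : raster cell size in metres (2 cm as in the study)
    """
    # Canopy height = point elevation minus terrain elevation at the same x, y location
    height = z - dtm_sampler(x, y)

    # Convert coordinates to raster row/column indices
    col = ((x - x0) / cell).astype(int)
    row = ((y - y0) / cell).astype(int)
    valid = (col >= 0) & (col < ncols) & (row >= 0) & (row < nrows)
    idx = row[valid] * ncols + col[valid]

    # Point density: number of points per 2 cm cell (4 cm^2)
    pd_counts = np.bincount(idx, minlength=nrows * ncols).astype(float)

    # CHM: mean canopy height per cell (NaN where no points fall)
    height_sum = np.bincount(idx, weights=height[valid], minlength=nrows * ncols)
    with np.errstate(invalid="ignore", divide="ignore"):
        chm = height_sum / pd_counts

    return pd_counts.reshape(nrows, ncols), chm.reshape(nrows, ncols)
```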
RGB Ortho-Mosaic
The RGB ortho-mosaic was obtained by further processing the dense point cloud in the PhotoScan software. The output RGB ortho-mosaic was geo-referenced with a 1 cm spatial resolution. It was then converted into the hue (H), intensity (I), and saturation (S) colour model using GRASS GIS and is hereafter referred to as the HIS ortho-mosaic (Gonzalez and Woods 2008; GRASS Development Team 2017); a sketch of this conversion is given below.
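A minimal sketch of this conversion using the GRASS Python scripting API follows; the raster names are hypothetical, and the i.rgb.his parameter names follow the GRASS 7 manual and may differ in other versions.

```python
# Minimal sketch of the RGB -> HIS conversion via the GRASS Python scripting API.
# Assumes the RGB ortho-mosaic bands are already imported as GRASS rasters named
# rgb.red, rgb.green, rgb.blue (hypothetical names); run inside a GRASS session.
import grass.script as gs

gs.run_command(
    "i.rgb.his",
    red="rgb.red",
    green="rgb.green",
    blue="rgb.blue",
    hue="ortho.hue",
    intensity="ortho.intensity",
    saturation="ortho.saturation",
    overwrite=True,
)
```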
Thermal Digital Ortho-mosaic
Each single JPEG thermal image contained 8-bit digital numbers. The following workflow and equations were adapted from Turner et al. (2017) to convert the digital numbers to temperature values. The conversion workflow was conducted with ExifTool and the R programming language (Phil Harvey 2016; Dunnington and Harvey 2019; R Core Team 2019). For each image, a raw thermal TIFF image was exported from the JPEG, and the image metadata were extracted from the JPEG EXIF header. Based on the metadata and the raw TIFF values, a temperature image was computed and exported as a TIFF file containing calibrated temperature values in degrees Celsius (°C); a simplified sketch of the conversion is shown below. Analogous to the RGB ortho-mosaic generation, a thermal ortho-mosaic with 2 cm spatial resolution was generated from the calibrated thermal images.
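The sketch below illustrates the core radiometric conversion in Python. It assumes the FLIR Planck calibration constants (EXIF tags such as PlanckR1, PlanckR2, PlanckB, PlanckF, and PlanckO, extractable with ExifTool) are available, and it omits the emissivity and atmospheric corrections of the full Turner et al. (2017) workflow; the constants in the usage example are placeholders.

```python
import numpy as np

def raw_to_celsius(raw, R1, R2, B, F, O):
    """Convert raw FLIR sensor counts to temperature (deg C) using the Planck
    calibration constants stored in the image EXIF header (tags PlanckR1, PlanckR2,
    PlanckB, PlanckF, PlanckO).

    Simplified sketch: emissivity and atmospheric corrections from the full
    Turner et al. (2017) workflow are omitted here.
    """
    raw = np.asarray(raw, dtype=float)
    return B / np.log(R1 / (R2 * (raw + O)) + F) - 273.15

# Usage example with placeholder (hypothetical) calibration constants:
# temp_c = raw_to_celsius(raw_tiff_array, R1=17096.0, R2=0.0463, B=1428.0, F=1.0, O=-342.0)
```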
Spectral Shape Index and Texture Images
A spectral shape index (SSI) (Eq. 1) was computed from the RGB image values; it has shown excellent results for isolating shadows within vegetation (Chen et al. 2009). In addition, two second-order statistical texture features (Haralick 1979), namely the angular second moment (ASM; uniformity) and the inverse difference moment (IDM; homogeneity), were computed from both the intensity image and the thermal image. A short sketch of both computations follows Eq. 1.
$$\mathrm{SSI} = \left| R + B - 2 \times G \right|$$
(1)
where R, G, and B are red, green, and blue values, respectively.
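As an illustration, the following Python sketch computes the SSI per pixel and the two texture features from a grey-level co-occurrence matrix using scikit-image; the exact GLCM settings (distance, angle) are assumptions, as they are not specified here.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # 'greycomatrix'/'greycoprops' in older scikit-image

def spectral_shape_index(r, g, b):
    """SSI = |R + B - 2*G| computed per pixel (Eq. 1)."""
    return np.abs(r.astype(float) + b.astype(float) - 2.0 * g.astype(float))

def asm_idm(gray_8bit):
    """Angular second moment (uniformity) and inverse difference moment (homogeneity)
    from a grey-level co-occurrence matrix of an 8-bit image patch (settings assumed)."""
    glcm = graycomatrix(gray_8bit, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    asm = graycoprops(glcm, "ASM")[0, 0]
    idm = graycoprops(glcm, "homogeneity")[0, 0]
    return asm, idm
```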
Segmentation
Segmentation and classification are the two main steps in OBIA (Silver et al. 2019). Segmentation is the first step and, by definition, "it divides an image or any raster or point data into spatially continuous, disjoint and homogeneous regions, referred to as segments or image objects" (Blaschke et al. 2014). The quality of a segmentation depends on the balance between intrasegment homogeneity and intersegment heterogeneity (Espindola et al. 2006). Variance within segments and spatial autocorrelation (Moran's I) between segments are used as measures of these two properties, respectively. The segmentation threshold (also referred to as scale) controls the balance between variance and spatial autocorrelation. Therefore, finding the optimum threshold is essential to produce segments that match real-world objects (Espindola et al. 2006). Johnson et al. (2015) proposed an F measure to quantify the quality of a segmentation result for a given threshold value. The F measure is based on variance and spatial autocorrelation and is calculated using Eq. 2 (Johnson et al. 2015). A weight value (a) must be defined in the F measure, where 0.5 corresponds to half weighting and 2 to double weighting. The higher the F measure, the higher the quality of the segmentation; a sketch of its computation is given after Eq. 2.
$$F = \left(1 + a^{2}\right)\left(\frac{\mathrm{MI}_{\mathrm{norm}} \times V_{\mathrm{norm}}}{a^{2} \times \mathrm{MI}_{\mathrm{norm}} + V_{\mathrm{norm}}}\right)$$
(2)
where MInorm is the normalised Moran’s I value, Vnorm is the normalised variance value, a is the alpha weight, and F is the F measure.
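A minimal sketch of the F measure computation is shown below; it assumes the max-min normalisation described by Johnson et al. (2015), in which lower raw variance and Moran's I values (i.e. better segmentations) receive higher normalised scores.

```python
import numpy as np

def normalise(values):
    """Max-min normalisation so that lower raw variance / Moran's I values
    (i.e. better segmentations) receive higher normalised scores."""
    values = np.asarray(values, dtype=float)
    return (values.max() - values) / (values.max() - values.min())

def f_measure(mi_norm, v_norm, a=0.5):
    """F measure of Johnson et al. (2015), Eq. 2; a = 0.5 was used in this study."""
    return (1.0 + a**2) * (mi_norm * v_norm) / (a**2 * mi_norm + v_norm)
```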
Espindola et al. (2006) introduced the Unsupervised Parameter Optimisation (USPO) procedure to identify the optimum threshold value for a given image from a range of candidate thresholds based on one of the quality measures mentioned above. The USPO procedure is implemented as an add-on tool called i.segment.uspo in GRASS GIS (Lennert and GRASS Development Team 2019a). The CHM raster, the PD raster, and the hue image from the HIS ortho-mosaic were used in the segmentation process (Fig. 3). According to Georganos et al. (2018), finding optimum threshold values for different local image regions gives superior results compared with using a single threshold for the whole image. Hence, each image was divided into sixteen zones (15 zones overlapping with the 64 m2 plots and one zone for the paths between the plots). A specific local threshold (from the range 0.01 to 0.15) was determined for each zone based on the F measure, with the alpha value kept at 0.5. Python Jupyter Notebook code from Grippa (2018) was adapted and modified for this study to automate the segmentation process with i.segment.uspo for each zone; a conceptual sketch of the per-zone threshold search is shown below. The segmentation procedure was applied separately for each grassland field and sampling date, yielding a total of six segmented rasters corresponding to the six datasets.
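The following conceptual Python sketch illustrates the per-zone threshold search that i.segment.uspo automates: it calls the core GRASS i.segment module for each candidate threshold and selects the threshold with the highest F measure. The helper functions weighted_variance and segment_morans_i are hypothetical placeholders for the quality-metric computations, and normalise and f_measure are taken from the sketch above.

```python
# Conceptual sketch of the per-zone threshold search automated by i.segment.uspo.
# weighted_variance() and segment_morans_i() are hypothetical placeholders for the
# intrasegment variance and intersegment Moran's I computations.
import numpy as np
import grass.script as gs

thresholds = np.arange(0.01, 0.151, 0.01)

def best_threshold_for_zone(group, zone_region, weighted_variance, segment_morans_i, a=0.5):
    variances, morans = [], []
    for t in thresholds:
        out = f"seg_{zone_region}_{t:.2f}"
        gs.run_command("g.region", region=zone_region)      # restrict processing to this zone
        gs.run_command("i.segment", group=group, output=out,
                       threshold=float(t), overwrite=True)  # region-growing segmentation
        variances.append(weighted_variance(out))
        morans.append(segment_morans_i(out))
    v_norm = normalise(np.array(variances))                 # normalise() and f_measure()
    mi_norm = normalise(np.array(morans))                   # are defined in the sketch above
    f = f_measure(mi_norm, v_norm, a=a)
    return thresholds[int(np.argmax(f))]
```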
Attribute Calculation for Segmented Image Objects
Each segmented raster was vectorised, and every segment was converted into a polygon object. Four geometric attributes [area (A), perimeter (P), fractional dimension (FD) (Eq. 3), and circle compactness (CC) (Eq. 4)] were calculated for the segmented objects; a sketch of Eqs. 3 and 4 follows the equations. Based on all raster data (RGB image, HIS image, CHM raster, PD raster, thermal image, SSI image, and texture rasters), the mean and standard deviation values for each polygon were computed as image-based attributes. The attribute calculation was done using the i.segment.stats add-on in GRASS GIS (Lennert and GRASS Development Team 2019b). In total, 32 attributes (4 geometric and 28 image-based) were generated (Table 1).
Table 1 Description of the calculated objects' attributes (FD fractional dimension, CC circle compactness, SD standard deviation, SSI spectral shape index, ASM angular second moment, IDM inverse difference moment, CHM canopy height model, PD point density)

$$\mathrm{FD} = 2 \times \frac{\log P}{\log\left(A + 0.001\right)}$$
(3)
$$\mathrm{CC} = \frac{P}{2 \times \sqrt{\pi \times A}}$$
(4)
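For illustration, Eqs. 3 and 4 translate directly into the following Python functions operating on a polygon's perimeter and area (the study itself computed these attributes with the i.segment.stats add-on).

```python
import math

def fractional_dimension(perimeter, area):
    """Fractional dimension (Eq. 3): FD = 2 * log(P) / log(A + 0.001)."""
    return 2.0 * math.log(perimeter) / math.log(area + 0.001)

def circle_compactness(perimeter, area):
    """Circle compactness (Eq. 4): CC = P / (2 * sqrt(pi * A))."""
    return perimeter / (2.0 * math.sqrt(math.pi * area))
```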
Classification Model
Ten percent of the segmented objects (3698 out of a total of 81,704 objects) were manually labelled as either lupine (L) or non-lupine (NL) based on visual inspection of the RGB ortho-mosaics. The numbers of L and NL labels were very similar (L = 1892 and NL = 1806), and the labelled objects were spatially randomly distributed within each dataset. The labelled objects with their attributes were used to develop a supervised classification model.
Classification model training and testing were conducted using R statistical software (R Core Team 2019). The random forest (RF) machine learning algorithm was employed to build the classification models using the mlr package in R (Bischl et al. 2016). RF has proven efficient for image-object classification using objects' attribute data (Belgiu and Drăguț 2016). The RF algorithm combines decision trees with bagging (Breiman 2001): each decision tree is built from a subset of the training samples drawn with replacement (bagging), and, based on the combined outcome of all decision trees, a sample is assigned to the majority class (Belgiu and Drăguț 2016).
A total of six RF classification models were built; in each model, five datasets were employed to train the model, while the remaining dataset was used to test it (Table 2). All 32 attributes were employed as model inputs along with the objects' labels. Two hyperparameters, namely mtry (the number of variables selected at each split) and node size (the number of observations in a terminal node) (Probst et al. 2019), were tuned in the model training phase using random search. The model was trained with repeated spatial cross-validation resampling (five folds, two repeats). Spatial cross-validation reduces the effect of spatial autocorrelation on model accuracy estimates (Brenning 2012); the resampling is based on the locations of the observations, here the centroids of the objects. The trained model was then employed to predict the labels of the objects in the holdout dataset. Based on the predicted and actual labels, model performance was evaluated by calculating the overall accuracy (OA), true-positive rate (TPR), and false-positive rate (FPR) (Eqs. 5, 6, and 7, respectively); a simplified sketch of this evaluation scheme is given after the equations.
$$\mathrm{OA} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{FN} + \mathrm{FP} + \mathrm{TN}}$$
(5)
$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$
(6)
$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}$$
(7)
where TP is true positive, TN is true negative, FP is false positive, and FN is false negative.
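The study trained and tuned the RF models with the mlr package in R using spatial cross-validation; the simplified Python/scikit-learn sketch below only illustrates the leave-one-dataset-out scheme and the metrics of Eqs. 5-7, omitting the hyperparameter tuning and the spatial resampling. The column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def evaluate(y_true, y_pred, positive="L"):
    """Overall accuracy, true-positive rate and false-positive rate (Eqs. 5-7)."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    oa = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return oa, tpr, fpr

# Leave-one-dataset-out scheme: train on five datasets, test on the remaining one.
# 'data' is assumed to hold the 32 attribute columns, a 'label' column (L/NL) and a
# 'dataset' column (D1G1 ... D3G2); all names are hypothetical.
def leave_one_dataset_out(data, attribute_cols):
    results = {}
    for holdout in data["dataset"].unique():
        train, test = data[data["dataset"] != holdout], data[data["dataset"] == holdout]
        rf = RandomForestClassifier(n_estimators=500, random_state=42)
        rf.fit(train[attribute_cols], train["label"])
        pred = rf.predict(test[attribute_cols])
        results[holdout] = evaluate(test["label"].to_numpy(), pred)
    return results
```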
Table 2 Details of the training and testing datasets for different classification models

Lupine Coverage Mapping
A single RF classification model (Mall) was trained using all labelled objects from the six datasets. Based on the labels predicted by Mall, a lupine coverage map was generated (hereafter referred to as the classification-based lupine coverage map). Additionally, the importance of the objects' attributes in the Mall classification model was assessed based on the mean decrease Gini value, i.e. "the total decrease in node impurities from splitting on the variable, averaged over all trees" (Liaw and Wiener 2002). A higher mean decrease Gini value indicates a higher importance of the respective attribute in the RF model; a short sketch of extracting such importances is given below.
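As an illustration, scikit-learn's impurity-based feature importances are the analogue of the mean decrease Gini reported by the R randomForest implementation used in the study; the snippet below assumes a fitted RandomForestClassifier named rf and the attribute_cols list from the previous sketch (hypothetical names), here trained on all labelled objects for Mall.

```python
import pandas as pd

# Impurity-based (Gini) importances from a fitted scikit-learn RandomForestClassifier,
# analogous to the mean decrease Gini of the R randomForest implementation.
# 'rf' and 'attribute_cols' are assumed to exist (hypothetical names).
importances = pd.Series(rf.feature_importances_, index=attribute_cols)
print(importances.sort_values(ascending=False).head(10))
```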
A reference lupine coverage map for each dataset was created by digitising each RGB ortho-mosaic and was compared with the classification-based lupine coverage map. The relative number of equally categorised pixels in the two maps was computed as a measure of map accuracy (MA) (Eq. 8); a sketch of this comparison is given after the equation. Additionally, the pixel-wise correlation coefficient (PCC) was calculated. Each 64 m2 plot was divided into four equal areas of 16 m2, and the relationship between the relative digitised lupine area (LA) and the MA of these sub-plots was analysed to assess the MA at different levels of LA.
$$\mathrm{MA} = \frac{\text{Number of equally categorised pixels in the two maps}}{\text{Total number of pixels}} \times 100$$
(8)
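For illustration, the following Python sketch computes the MA of Eq. 8 and the PCC from two aligned binary lupine maps (1 = lupine, 0 = non-lupine); the input arrays are hypothetical.

```python
import numpy as np

def map_accuracy(classified, reference):
    """Map accuracy (Eq. 8): percentage of equally categorised pixels in the two maps.
    Both inputs are assumed to be aligned binary arrays (1 = lupine, 0 = non-lupine)."""
    classified = np.asarray(classified)
    reference = np.asarray(reference)
    return 100.0 * np.mean(classified == reference)

def pixelwise_correlation(classified, reference):
    """Pixel-wise correlation coefficient (PCC) between the two maps."""
    return np.corrcoef(np.asarray(classified).ravel(), np.asarray(reference).ravel())[0, 1]
```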