1 Introduction

Unmanned aerial vehicles (UAVs) were initially developed for military purposes, but their potential for civilian applications was quickly recognised. Examples include [18], who proposes the use of UAVs for vineyard management based on vegetation canopy reflectance, the mapping of invasive weeds [17] to plan herbicide applications, and the use of an infra-red camera to map the crop water stress index (CWSI) to guide irrigation. [2, 6] use UAVs to monitor soil erosion, and [21] use aerial data to quantify tree height by reconstructing 3D point clouds.

Vegetation classification is an important source of information for conservation science, agriculture, forestry and planning. UAV technology provides an alternative to ground-based measurements and satellite data for obtaining imagery suitable for vegetation classification. [19] uses k-means clustering to classify airborne laser scans from UAVs into two different forest regions. [7, 9, 15] also use UAV remote sensing to classify vegetation, and [22] compares classification methods for data recorded from a UAV.

Fig. 1.
figure 1

Native vegetation, exotic weed species and agricultural land on Waiheke Island, Auckland, New Zealand. Mosaic processed using Pix4Dmapper Pro 2.071 from Sony Action Cam images taken from a Blade QX350 quadcopter

The basic aim of image segmentation is to partition an image into connected regions of pixels with similar colour or texture. The papers cited above generally conclude that segmentation-based classification methods outperform pixel-based ones. In this paper, we compare four segmentation algorithms on aerial data recorded with UAVs over a warm temperate island (Waiheke Island, New Zealand) and over a polar desert (McMurdo Dry Valleys, Antarctica) [3]. Figure 1 shows a mosaic generated from individual frames of a video recorded in New Zealand.

The algorithms compared are simple linear iterative clustering (SLIC) [1], a Gaussian mixture model (GMM) [13], a hidden Markov model with expectation maximisation (HMM-EM) [20], and a mean-shift (MS) algorithm [5]. We assess how well these algorithms separate native forest, agriculture and invasive weeds, as well as dry valley habitats in Antarctica. Because none of these methods performed well on its own, we propose a post-processing technique that merges and splits output regions to improve the classification obtained from the original segmentation algorithms.

The segmentation step, which is only the first step in a sequence of scene-analysis procedures, aims at separating the scene into semantic regions, for instance lawn, forest, patches of gravel, or houses. For post-processing, we use a colour and texture similarity measure.

The paper is structured as follows. Section 2 presents the segmentation methods and discusses their drawbacks for our image data. Section 3 describes the proposed post-processing. A comparative performance analysis is given in Sect. 4, and Sect. 5 concludes.

2 Segmentation Methods

The trials were conducted on images from Waiheke Island, New Zealand, recorded with a Sony Action Cam HDR100 camera mounted on a Blade QX350 version 3 quadcopter UAV, and from the Taylor Dry Valleys in Antarctica, recorded from a fixed-wing Skycam “Polarfox” with a Sony Nex5 camera (Bollard-Breen et al. 2015). The images were initially segmented using the SLIC, GMM, HMM & EM, and MS algorithms. Each method has its merits and drawbacks for different image segmentation tasks.

2.1 Tested Segmentation Methods

We briefly recall the segmentation methods used later in Sect. 4. Given an \(N \times N\) image I (for notational simplicity we assume square images), each of the four methods generates a labelling f (i.e. the segmentation result) that assigns exactly one segment number (i.e. the label) to each of the \(N^2\) pixels.

SLIC. SLIC segments an image into “nearly convex” regions called superpixels. It uses a strategy similar to k-means, but with some crucial modifications. Every cluster has a defined neighbourhood, and the algorithm is spatially constrained: a pixel can only be assigned to a cluster whose centre is nearby. The size of this neighbourhood decreases as the number k of clusters in the image increases. This number k is the only input parameter for the procedure.

The clustering distance depends on the L\(^\star \)a\(^\star \)b\(^\star \) colour of the pixels and their (x, y) coordinates; see [1, 14] for details. Consider a pixel \((x_i,y_i,L_i,a_i,b_i)\) and a cluster centroid \((x_j,y_j,L_j,a_j,b_j)\). SLIC uses the following distance between them:

$$\begin{aligned} D_{SLIC}= & {} \sqrt{d_c^2+\frac{d_s^2}{S^2}\lambda ^2}, \quad \text{ where } \quad S=\sqrt{N^2/k} \end{aligned}$$
(1)
$$\begin{aligned} d_c= & {} \sqrt{(L_i-L_j)^2+(a_i-a_j)^2+(b_i-b_j)^2} \end{aligned}$$
(2)
$$\begin{aligned} d_s= & {} \sqrt{(x_i-x_j)^2+(y_i-y_j)^2} \end{aligned}$$
(3)

where \(d_c\) is the color distance, \(d_s\) the spatial distance, S is the length of one side of a cluster neighbourhood, and \(\lambda > 0\) is a weight constant. See [1].
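Equations (1)–(3) can be transcribed directly. The following is a minimal sketch, assuming the pixel and centroid are given as (x, y, L, a, b) tuples; the function name `slic_distance` is hypothetical, and the grid interval S is passed in rather than computed, so the sketch does not depend on how S is derived from the image size.

```python
import math

def slic_distance(pixel, centroid, S, lam):
    """D_SLIC of Eq. (1) between a pixel and a cluster centroid.

    pixel, centroid: (x, y, L, a, b) tuples (assumed layout).
    S: grid interval (side length of a cluster neighbourhood); lam: weight lambda > 0.
    """
    x_i, y_i, L_i, a_i, b_i = pixel
    x_j, y_j, L_j, a_j, b_j = centroid
    # Colour distance in L*a*b* space, Eq. (2)
    d_c = math.sqrt((L_i - L_j) ** 2 + (a_i - a_j) ** 2 + (b_i - b_j) ** 2)
    # Spatial distance in image coordinates, Eq. (3)
    d_s = math.sqrt((x_i - x_j) ** 2 + (y_i - y_j) ** 2)
    # Spatial term normalised by S and weighted by lambda, Eq. (1)
    return math.sqrt(d_c ** 2 + (d_s ** 2 / S ** 2) * lam ** 2)
```

Because the spatial term is scaled by \(\lambda /S\), a larger \(\lambda \) favours compact superpixels, while a smaller \(\lambda \) lets colour differences dominate the assignment.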

GMM. GMM is a segmentation method based on a parametric model in which the probability density function of the image values is a mixture of several Gaussian density functions. The goal is to find the optimal thresholds that divide the probability density function of the image (i.e. the histogram of the given image) into \(\kappa \) Gaussian density functions, where each Gaussian density function represents a region of the image; see [13]. The Gaussian density function has the following form:

$$\begin{aligned} \mathrm {Pr} (u|C_j)=\frac{1}{\sigma _j\sqrt{2\pi }} \cdot e^{-\frac{(u-\mu _j)^2}{2\sigma ^2_j}} \end{aligned}$$
(4)

where \(\mathrm {Pr} (.\!|C_j)\) is the jth Gaussian density function for image values u, \(\sigma _j\) is its standard deviation (so \(\sigma ^2_j\) is its variance), and \(\mu _j\) is its mean value.
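Once the component parameters are known, each image value can be labelled with the component that explains it best. The sketch below evaluates Eq. (4) and assigns a value to the component with the largest weighted density; it assumes the parameters have already been estimated (e.g. by EM, not shown), and the per-component weights \(w_j\) and the tuple format are assumptions not stated in Eq. (4).

```python
import math

def gaussian_density(u, mu, sigma):
    """Pr(u | C_j) of Eq. (4) for mean mu and standard deviation sigma."""
    return math.exp(-((u - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def assign_component(u, components):
    """Index j of the component maximising w_j * Pr(u | C_j).

    components: list of (w_j, mu_j, sigma_j) tuples (hypothetical format).
    """
    scores = [w * gaussian_density(u, mu, sigma) for (w, mu, sigma) in components]
    return max(range(len(scores)), key=scores.__getitem__)
```

With two equally weighted components of equal standard deviation, the implied threshold lies halfway between the two means.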

HMM & EM. HMM & EM is an edge-prior preserving segmentation algorithm which can be used for obtaining accurate segmentation labels by using the maximum a posteriori (MAP) criterion. This algorithm uses an initially segmented image, which is obtained using GMM or k-means clustering with estimated parameters. See [20].

HMM & EM is defined for images \(I=(\mathbf{u}_1,\ldots , \mathbf{u}_{N^2})\) drawn from an assumed distribution, where \(N^2\) is the total number of pixels and each \(\mathbf{u}_i=[u_{iR},u_{iG},u_{iB}]^\top \) holds the colour values of a pixel. Let \(L=\{l_1,\ldots , l_m\}\) be a set of possible labels, also drawn from an assumed distribution; see [20]. The goal of HMM & EM is to find a labelling \(f^*\) (i.e. a function that maps each of the \(N^2\) pixels to exactly one label in L) which satisfies

$$\begin{aligned} f^*= \mathrm{argmax}_f\{\mathrm {Pr} (I |f,\varTheta )\mathrm {Pr} (f)\} \end{aligned}$$
(5)

where \(\varTheta =\{\theta _l : \, l \in L\}\) is a set of distribution parameters

$$\begin{aligned} \theta _l=\{(\mu _{l,1},\sigma _{l,1},w_{l,1}), \ldots ,(\mu _{l,g},\sigma _{l,g},w_{l,g})\} \end{aligned}$$
(6)

for g Gaussian components, each with mean value \(\mu \), standard deviation \(\sigma \), and weighting probability w (the mixing weight of that component).
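To make the MAP criterion concrete, the following sketch maximises \(\log \mathrm {Pr} (I|f,\varTheta ) + \log \mathrm {Pr} (f)\) over labellings f for a greyscale image. Several assumptions are made that are not taken from [20]: \(\mathrm {Pr} (f)\) is modelled as a Potts prior with weight `beta` on the 4-neighbourhood, the maximisation uses iterated conditional modes (ICM), and the EM re-estimation of \(\varTheta \) is omitted (parameters are fixed). The names `mixture_loglik` and `icm_map` are hypothetical.

```python
import math
import numpy as np

def mixture_loglik(u, theta):
    """log Pr(u | theta_l) for theta_l given as (mu, sigma, w) triples, cf. Eq. (6)."""
    p = sum(w * math.exp(-((u - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
            for (mu, sigma, w) in theta)
    return math.log(p + 1e-300)  # guard against log(0)

def icm_map(image, f, Theta, beta=1.0, n_iter=5):
    """Approximate argmax_f Pr(I | f, Theta) Pr(f) by iterated conditional modes.

    image: 2-D array of grey values; f: initial integer label map (e.g. from GMM or k-means);
    Theta: dict mapping each label l to theta_l. Pr(f) is an assumed Potts prior.
    """
    f = f.copy()
    H, W = image.shape
    labels = list(Theta)
    for _ in range(n_iter):
        for y in range(H):
            for x in range(W):
                best_label, best_score = f[y, x], -math.inf
                for l in labels:
                    # Potts prior: penalise each 4-neighbour whose label differs from l
                    disagree = sum(1 for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                                   if 0 <= y + dy < H and 0 <= x + dx < W
                                   and f[y + dy, x + dx] != l)
                    score = mixture_loglik(image[y, x], Theta[l]) - beta * disagree
                    if score > best_score:
                        best_label, best_score = l, score
                f[y, x] = best_label
    return f
```

The prior pulls isolated labels towards their neighbours, which is one way such methods obtain smooth region borders; a full HMM-EM implementation would alternate this labelling step with re-estimating \(\varTheta \).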

MS. MS is a variant of the steepest-ascent method that seeks stationary peaks (modes) of a density function defined in a property space. Properties are values in the image, either directly available (such as colour values or pixel coordinates) or derived at pixels (such as local variance or gradient values). The algorithm is essentially based on density estimation: discrete gradients in property space are estimated using a local weighting function (the kernel) to approximate derivatives. Let r be the radius of the kernel function K, and let \(K'\) be the derivative of K.

The estimated steepest-ascent direction defines the mean-shift vector and thus a new location in property space, from which the procedure repeats until the magnitude of the gradient is close to zero. The density estimator used has the following form:

$$\begin{aligned} E_K(\mathbf u )=\frac{c_k}{Mr^n}\sum _{i=1}^{M}K(\frac{1}{r^2}||\mathbf u -\mathbf u _i||^2_2) \end{aligned}$$
(7)

The function is defined by kernel K at property points or vectors \(\mathbf u \) in an n-dimensional property space, \(c_k\) is a normalization constant, and M is the number of sample property vectors \(\mathbf u _i\). For time efficiency, we use a simplified MS algorithm, essentially without the second (delineation) step in [5].
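The mode-seeking step can be sketched in a few lines of NumPy. This sketch assumes a Gaussian kernel profile \(K(t)=e^{-t/2}\), for which \(-K'\) is proportional to K, so the mean-shift update becomes a kernel-weighted mean and the constant \(c_k/(Mr^n)\) of Eq. (7) cancels. The grouping of converged modes by a distance threshold (`label_by_mode`, `merge_dist`) is a hypothetical stand-in, not the delineation step of [5].

```python
import numpy as np

def mean_shift_modes(points, r, tol=1e-4, max_iter=200):
    """Move each property vector uphill on the density of Eq. (7) until the shift is below tol.

    points: (M, n) array of property vectors u_i; r: kernel radius (bandwidth).
    Assumes the Gaussian profile K(t) = exp(-t / 2).
    """
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for idx in range(len(modes)):
        u = modes[idx]
        for _ in range(max_iter):
            # Kernel weights g(||u - u_i||^2 / r^2); proportional to K for this profile
            weights = np.exp(-0.5 * np.sum((points - u) ** 2, axis=1) / r ** 2)
            new_u = weights @ points / weights.sum()  # u plus the mean-shift vector
            shift = np.linalg.norm(new_u - u)
            u = new_u
            if shift < tol:
                break
        modes[idx] = u
    return modes

def label_by_mode(modes, merge_dist):
    """Give the same label to points whose modes lie within merge_dist of an earlier mode."""
    centres, labels = [], []
    for m in modes:
        for c_idx, c in enumerate(centres):
            if np.linalg.norm(m - c) < merge_dist:
                labels.append(c_idx)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return np.array(labels)
```

A small radius r yields many distinct modes, which matches the over-segmentation behaviour of MS discussed below.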

2.2 Drawbacks of the Segmentation Methods

The segmentation methods above have important drawbacks for the considered application; we discuss them here (see Figs. 2 and 4).

SLIC. SLIC has two main problems: it over-segments a single, relatively homogeneous region into many different superpixels, and it forms large superpixels that contain two or more regions. The latter mostly occurs for superpixels near the image borders, or where there are too few superpixels to segment the image.

GMM. GMM tends to create many isolated pixels and many islands (i.e. connected regions of pixels surrounded by pixels of another region). Borders between regions are also poorly defined, with regions interleaving gradually.

HMM & EM. HMM & EM tends to produce nested regions, where a segment is surrounded by a larger region, which in turn lies within another, still larger region. This nesting does not adequately describe the patterns present in the studied vegetation or landscape images.

MS. In mean-shift segmentation, the number of clusters depends on the image content, and the method creates a very large number of labels for segments that belong to the same semantic region. A post-processing algorithm that joins similar regions into single segments can mitigate this.

3 Proposed Post-processing

Given a label map \(I^m\) generated by a segmentation method (i.e. a labelling function f represented as an image), we propose post-processing that improves the segmentation by splitting segments, merging segments, and removing islands. This section describes these three steps.

Splitting Segments. In \(I^m\), one segment may contain multiple regions, such as both grass and forest, which require further segmentation. We use color similarity to decide whether a segment is split further. A segment i is considered homogeneous, and is kept as it is, if it satisfies

$$\begin{aligned} \sigma _{color_i}< & {} \tau _1\end{aligned}$$
(8)
$$\begin{aligned} \sigma _{color_i}= & {} \sqrt{\sigma _{L_i}^2+\sigma _{a_i}^2+\sigma _{b_i}^2} \end{aligned}$$
(9)

where i denotes the ith segment, L, a, and b are the color components in the L\(^\star \)a\(^\star \)b\(^\star \) color space, and \(\sigma \) denotes the standard deviation. If \(\sigma _{color_i}\) exceeds the threshold \(\tau _1\), segment i is further segmented using the SLIC method.
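The splitting test of Eqs. (8) and (9) only needs per-segment channel statistics. The sketch below flags the segments whose combined colour standard deviation reaches \(\tau _1\); it assumes the image has already been converted to L\(^\star \)a\(^\star \)b\(^\star \) (e.g. with scikit-image's `rgb2lab`), uses the population variance, and leaves out the actual re-segmentation with SLIC. The function names are hypothetical.

```python
import numpy as np

def colour_std(lab, mask):
    """sigma_color of Eq. (9): sqrt(sigma_L^2 + sigma_a^2 + sigma_b^2) over the pixels in mask."""
    pixels = lab[mask]  # (n_pixels, 3) array of L*, a*, b* values
    return float(np.sqrt(np.sum(np.var(pixels, axis=0))))

def segments_to_split(lab, labels, tau1):
    """Labels of segments that fail Eq. (8), i.e. sigma_color >= tau1, and need re-segmenting."""
    return [int(i) for i in np.unique(labels) if colour_std(lab, labels == i) >= tau1]
```

Each flagged segment would then be passed to SLIC on its own, for example by masking out all other pixels.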

figure a

Merging Segments. As a result of the original segmentation, or after further splitting, a region (e.g. grass) may be divided into too many small segments. We use color similarity and a texture-based measure for merging segments. The Laplacian kernel approximates the sum of the second-order derivatives along the horizontal and vertical directions of an image; we compute the Laplacian on luminosity images [8, 10]. Because color values change with illumination, color similarity alone may cause misclassifications; the texture information is largely illumination independent and complements it.

The merging condition is defined as follows:

$$\begin{aligned} D_{ij}< & {} \tau _2\end{aligned}$$
(10)
$$\begin{aligned} \mathcal{L}_{ij}< & {} \tau _3\end{aligned}$$
(11)
$$\begin{aligned} D_{ij}= & {} \sqrt{(L_i-L_j)^2+(a_i-a_j)^2+(b_i-b_j)^2}\end{aligned}$$
(12)
$$\begin{aligned} \mathcal{L}_{ij}= & {} \left| \mathcal{L}_i-\mathcal{L}_j \right| \end{aligned}$$
(13)

where \(L_i\), \(a_i\) and \(b_i\) are the average values of the luminosity, a, and b components of segment i (and likewise \(L_j\), \(a_j\) and \(b_j\) for segment j), and \(\mathcal{L}_i\) and \(\mathcal{L}_j\) are the average values of the Laplacian in segments i and j, respectively. If the color distance \(D_{ij}\) is smaller than \(\tau _2\), or the texture distance \(\mathcal{L}_{ij}\) is smaller than \(\tau _3\), the two segments i and j are merged into one segment.
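The merging rule of Eqs. (10)–(13) can be sketched as follows. Several details are assumptions: the image is already in L\(^\star \)a\(^\star \)b\(^\star \), the Laplacian uses the common 4-neighbour kernel with replicated borders, all pairs of segments are tested (not only adjacent ones), and merges are made transitive with a union-find structure. Because the prose combines the two conditions with "or" while Eqs. (10) and (11) are listed together, the hypothetical flag `require_both` selects between the two readings.

```python
import numpy as np

def laplacian(lum):
    """4-neighbour discrete Laplacian of a luminosity image (edge pixels replicated)."""
    p = np.pad(lum, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * lum

def merge_segments(lab, labels, tau2, tau3, require_both=False):
    """Merge segments i, j whose colour distance D_ij < tau2 or (if require_both, and)
    texture distance |Lap_i - Lap_j| < tau3. Returns a new label map."""
    ids = [int(i) for i in np.unique(labels)]
    lap = laplacian(lab[..., 0])                                # Laplacian of the L* channel
    mean_lab = {i: lab[labels == i].mean(axis=0) for i in ids}  # (L_i, a_i, b_i)
    mean_lap = {i: lap[labels == i].mean() for i in ids}        # average Laplacian of segment i
    parent = {i: i for i in ids}                                # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for k, i in enumerate(ids):
        for j in ids[k + 1:]:
            colour_ok = np.linalg.norm(mean_lab[i] - mean_lab[j]) < tau2  # Eqs. (10), (12)
            texture_ok = abs(mean_lap[i] - mean_lap[j]) < tau3           # Eqs. (11), (13)
            if (colour_ok and texture_ok) if require_both else (colour_ok or texture_ok):
                parent[find(j)] = find(i)
    return np.vectorize(lambda v: find(int(v)))(labels)
```

Restricting the pair loop to spatially adjacent segments would be a natural variant if merged segments should stay connected.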

figure b

Avoiding Islands. An island is a connected cluster of pixels surrounded by pixels of another segment. Islands complicate the interpretation of the resulting image, so we use a \(3\times 3\) filter to remove small islands: if more than four of the eight neighbours of a pixel (8-adjacency) share the same segment label, and this label differs from the label of the central pixel, the central pixel is relabelled with this majority label.
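A direct implementation of this \(3\times 3\) filter is sketched below. Two details are assumptions: border pixels (which lack a full 8-neighbourhood) are left unchanged, and all decisions in one pass are based on the input labels, so the result does not depend on the scan order. The name `remove_islands` is hypothetical.

```python
import numpy as np
from collections import Counter

def remove_islands(labels):
    """One pass of the 3x3 island filter on an integer label map.

    A pixel is relabelled when more than four of its eight neighbours share
    a label that differs from its own; border pixels are not modified.
    """
    out = labels.copy()
    H, W = labels.shape
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            window = labels[y - 1:y + 2, x - 1:x + 2].ravel()
            neighbours = np.delete(window, 4)  # drop the central pixel
            label, count = Counter(neighbours.tolist()).most_common(1)[0]
            if count > 4 and label != labels[y, x]:
                out[y, x] = label
    return out
```

A single pass removes isolated pixels and 2 × 2 islands, while larger regions only lose their corner pixels; repeating the pass removes progressively larger islands.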

figure c

4 Comparative Performance Analysis

The experiments are conducted on images recorded with a Sony NEX5N14Mp camera mounted on a Swampfox UAV. The post-processing thresholds were set to \(\tau _1=5.5\), \(\tau _2=62\), and \(\tau _3=2.3\) for the illustrated Waiheke video, and to \(\tau _1=3.5\), \(\tau _2=50\), and \(\tau _3=0.2\) for the Antarctica images used.

The original segmentation results are shown in Fig. 2. The SLIC method generates similar-sized, nearly rectangular segments and follows edges between grass and forest. GMM labels pixels individually, which generates many small islands, but the GMM segments follow the outlines of interest better than the SLIC segments. The HMM-EM method generates smooth multiple-layer segments and is sensitive to regions with different colours or textures. MS generates an enormous number of islands; the applied MS algorithm appears to be too sensitive to value changes.

The segmentation results with post-processing are shown in Fig. 3. The borders between grass and forest are noticeably improved, and the regions of different color and texture within the forest are separated to some extent. The post-processing parameters \(\tau \) can be adjusted for different applications.

The results change with the thresholds used. Here we present results for thresholds that divide the used Waiheke images into forest and grass. However, the thresholds can also be adjusted to achieve a modified segmentation for different targets, for example for groups of trees.

Fig. 2.
figure 2

Segmentation results of original segmentation methods. Top to bottom: input images, results by SLIC method, results by GMM method, results by HMM-EM method and results by MS method

Fig. 3.
figure 3

Segmentation results after the proposed post-processing. Top to bottom: input images, results by SLIC method with post-processing, results by GMM method with post-processing, results by HMM-EM method with post-processing and results by MS method with post-processing

Figure 4 shows the results of the original segmentation techniques for two images from the Antarctica data set. The first column shows the input images, the second the SLIC results, the third the GMM results, and the fourth the HMM-EM results.

In Fig. 5 we illustrate the effects of our post-processing. It cleans the GMM results by removing islands, and in the resulting GMM images the borders follow the outlines of interest more closely than in the SLIC images. In this case too, the HMM-EM method generates smooth multiple-layer segments.

Fig. 4.
figure 4

Segmentation results of the Antarctica images with the original segmentation methods. Left to right: input images, results by SLIC method, results by GMM method, results by HMM-EM method and results by MS method

Fig. 5.
figure 5

Segmentation results of the Antarctica images after applying the proposed post-processing. Left to right: input images, results by SLIC method with post-processing, results by GMM method with post-processing, results by HMM-EM method with post-processing and results by MS method with post-processing

4.1 Quantitative Measurement

Let \(\mathcal{S} = \{S_1,\ldots ,S_{N_{segments}}\}\) be a segmentation result. We introduce a quantitative measure for \(\mathcal{S}\) that prefers segments with a small colour standard deviation and a large area. It is defined as follows:

$$\begin{aligned} \chi (\mathcal{S})=\frac{1}{N_{segments}}\sum _{i=1}^{N_{segments}}\frac{A_i}{\sigma _{color_i}+1} \end{aligned}$$
(14)

where i denotes segment \(S_i\), \(\sigma _{color_i}\) is the standard colour deviation in \(S_i\), \(N_{segments}\) is the total number of segments, and \(A_i\) is the number of pixels in \(S_i\), known to be a good estimator for the area of a region [11]. We identify a larger value of \(\chi (\mathcal{S})\) with a better segmentation performance for the defined target. Results of this measure for the shown Waiheke images are listed in Tables 1 and 2. Results for the shown Antarctica images are in Table 3.
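The measure is straightforward to compute from a label map. This sketch reads the denominator of Eq. (14) as \(\sigma _{color_i}+1\) (the +1 keeps the ratio finite for perfectly uniform segments), computes \(\sigma _{color_i}\) as in Eq. (9) with the population variance, and assumes an L\(^\star \)a\(^\star \)b\(^\star \) input; the function name `chi` is hypothetical.

```python
import numpy as np

def chi(lab, labels):
    """Measure of Eq. (14): mean over segments of A_i / (sigma_color_i + 1)."""
    scores = []
    for i in np.unique(labels):
        pixels = lab[labels == i]                        # L*a*b* values of segment S_i
        sigma = np.sqrt(np.sum(np.var(pixels, axis=0)))  # sigma_color_i
        scores.append(len(pixels) / (sigma + 1.0))       # A_i = number of pixels in S_i
    return float(np.mean(scores))
```

For a uniform 4 × 4 image, one segment gives \(\chi = 16\) and two equal halves give \(\chi = 8\); the measure thus favours fewer, larger segments as long as they stay homogeneous in colour.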

Table 1. Values \(\chi (\mathcal{S})\) for the original Waiheke segmentation results
Table 2. Values \(\chi (\mathcal{S})\) for the post-processed Waiheke segmentation results
Table 3. Values \(\chi (\mathcal{S})\) for original and post-processed Antarctica segmentation results

5 Conclusions

This paper compared four segmentation methods on aerial data recorded with a UAV. Each method has its own merits and drawbacks: SLIC generates similar-sized segments, GMM produces isolated regions, HMM & EM generates regions with smooth borders, and MS produces an enormous number of isolated pixels, which the post-processing cleans up. We proposed post-processing for splitting and merging segments, whose parameter settings can be adjusted to the requirements of a given application. We also introduced a measure for evaluating segmentation results and illustrated it on samples of video data recorded over a warm temperate island (Waiheke Island, New Zealand) and a polar desert (McMurdo Dry Valleys, Antarctica).

Based on our experiments, we consider the following two-step procedure very attractive for further studies. First, apply GMM with post-processing to obtain “clean” borders between regions of interest (possibly followed by spline approximation [4, 12] for smoother borders). Second, apply SLIC only within each region provided by the first step to identify ecologically important smaller segments.