Background

Remote sensing is the science of acquiring information about objects or areas from a distance, typically from aircraft or satellites.

In tree canopies, the amount of radiation reflected in regions of different wavelengths is related to the chemical and physical properties of single trees as well as biotic and abiotic characteristics of an entire stand. Among the chemical properties of single trees are the levels of lignin, cellulose, nitrogen, chlorophyll, carotenoids, anthocyanins (Asner 1998; Clark et al. 2005; Grant 1987; Clark and Roberts 2012; Ustin et al. 2009), and water (Asner 1998, Gao and Hoetz 1990, Zarco-Tejada et al. 2003, Lee et al. 2010); these affect the health status of the trees (Waser et al. 2014). Among the physical properties of single trees are leaf and wood morphology, transmission characteristics (Asner 1998; Clark et al. 2005; Grant 1987; Clark and Roberts 2012; Ustin et al. 2009), vertical leaf area density (Treuhaft et al. 2002), and age (Ghiyamat et al. 2013; Roberts et al. 1997; Einzmann et al. 2014). The biotic characteristics of the whole stands include leaf and branch density, angular distribution, clumping, tree size compared to neighbours (Leckie et al. 2005, Korpela et al. 2011), and lichens, mosses, herbaceous vegetation, lianas, or other epiphytes (Clark and Roberts 2012). The abiotic characteristics of whole stands include topography, soil type (and its influence), moisture, and microclimate (Portigal et al. 1997).

It is hardly feasible to identify species-specific absorption features in the visible and near-infrared (VIS-NIR) spectral region; however, this is much easier in the shortwave infrared (SWIR) spectral region (Asner 1998). Salisbury (1986) presented leaf-level thermal infrared (TIR) spectra of four species and identified well-defined and notably different spectral features. Salisbury and Milton (1998) obtained close-range thermal reflectance measurements for several other species and reported differences in the spectra in most of them. Ribeiro da Luz and Crowley (2007) found that TIR spectra were associated with several chemical and structural compounds of plants such as cellulose, silica, xylan, and oleanolic acid levels, and reported that TIR signals were much more species-specific than the reflectance signals observed in the visible, shortwave, and infrared regions. Many plants develop chemical and aromatic compounds that might help define species-specific middle infrared and TIR signatures (Ribeiro da Luz and Crowley 2007; Ullah et al. 2012).

Identifying tree species using remote sensing data is useful in the context of detecting changes (Adams et al. 1995) and managing water stress (Cho et al. 2010; Fassnacht et al. 2016). It helps in the development of sustainable management policies (Dalponte et al. 2012, Jones et al. 2010, Plourde et al. 2007, Heinzel and Koch 2012, Kennedy and Southwood 1984) and in the performance of forest resource inventories (Van Aardt and Wynne 2007) and single tree inventories (Korpela and Tokola 2006; Immitzer et al. 2015; Tompalski et al. 2014). It enables the assessment and monitoring of biodiversity, species compositions (Shang and Chisholm 2014; Wulder et al. 2006), wildlife habitats (Jansson and Angelstam 1999; Pausas et al. 1997), and invasive species migrations (Chambers et al. 2013; Van Ewijk et al. 2014), and aids the understanding of tree ecology (Chambers et al. 2013, Van Ewijk et al. 2014). It can also be applied to the estimation of insect abundance in forests (Kennedy and Southwood 1984) and the development of species-specific growth and yield models as well as allometric equations (Ørka et al. 2013; Vauhkonen et al. 2014).

Proper forest management and planning based on accurate distinction of tree species requires highly accurate classification maps that cannot yet be produced using the multispectral images typically acquired in four to eight wide spectral bands. Hyperspectral data are more useful for classifying tree species: the only condition is that the species must appear significantly different in the spectral reflectance measured in dozens of narrow spectral intervals (Clark et al. 2005, Heinzel and Koch 2012, Carlson et al. 2010, Dalponte et al. 2010, Dalponte et al. 2011, Stavrakoudis et al. 2014, Farreira et al. 2016). The reflectance of individual tree species is dependent on numerous factors, and the differences are sometimes too subtle to be observed using wide, multispectral bands (Dalponte et al. 2009; Mickelson et al. 1998). Since the technology was released, the cost of hyperspectral images has decreased gradually. It is expected that it will soon be possible to use hyperspectral imagery routinely to study forest ecology and develop management and planning techniques (Innes and Koch 1998; Dalponte et al. 2008; Voss and Sugumaran 2008).

However, hyperspectral images contain a huge amount of auto-correlated data. Principal component analysis (PCA) is often used to solve this problem (Zagajewski 2010; Olesiuk and Zagajewski 2008; Bartold 2008). This widely known technique creates a set of artificial bands in which each band is less informative than the previous one. The minimum noise fraction (MNF) transformation works in a similar manner but reduces the noise first. More detailed information on these transformations is provided later in this paper. This method (MNF) was used by Zagajewski (2010) to classify vegetation in the Tatra Mountains, by Olesiuk and Zagajewski (2008) to classify the land cover of the Bystrzanka river drainage basin, and by Bartold (2008), Dian et al. (2014), and Han et al. (2004) to classify forest tree species. Han et al. (2004) compared the results with those obtained by canonical transformation, while Harsanyi (1994) used the ‘orthogonal subspace projection’ method. This method eliminates the response from non-targets while applying a filter to match desired targets in the data, and is most efficient and effective when the target signatures are distinct.
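The idea behind the MNF transformation described above can be illustrated with a short sketch. It is not the ENVI implementation: the noise covariance is estimated here from differences between horizontally adjacent pixels (an assumption), the noise is whitened, and an ordinary PCA is then applied to the whitened data. The function name `mnf` and the synthetic cube are hypothetical.

```python
import numpy as np

def mnf(cube, n_components):
    """Sketch of a minimum noise fraction transform for a (rows, cols, bands) cube."""
    rows, cols, bands = cube.shape
    x = cube.reshape(-1, bands).astype(float)
    x = x - x.mean(axis=0)
    # Assumed noise model: differences between horizontal neighbours are
    # dominated by noise, so half their covariance estimates the noise covariance.
    diff = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, bands).astype(float)
    noise_cov = np.cov(diff, rowvar=False) / 2.0
    # Whiten the noise: scale each noise eigenvector by 1 / sqrt(eigenvalue).
    n_vals, n_vecs = np.linalg.eigh(noise_cov)
    whitener = n_vecs / np.sqrt(n_vals)
    xw = x @ whitener
    # PCA on the noise-whitened data; eigenvalues are (signal + noise) / noise.
    s_vals, s_vecs = np.linalg.eigh(np.cov(xw, rowvar=False))
    order = np.argsort(s_vals)[::-1][:n_components]
    components = (xw @ s_vecs[:, order]).reshape(rows, cols, n_components)
    return components, s_vals[order]

# Synthetic example: a smooth vertical gradient in 8 bands plus white noise.
rng = np.random.default_rng(0)
yy = np.mgrid[0:30, 0:30][0] / 30.0
cube = yy[..., None] * np.linspace(1.0, 2.0, 8) + 0.05 * rng.normal(size=(30, 30, 8))
components, snr = mnf(cube, 3)
```

In this toy case only the first component carries signal, so its eigenvalue is far above 1 while the remaining ones stay close to 1 (pure noise).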

Algorithms such as the Pixel Purity Index (PPI) (Zagajewski 2010, Olesiuk and Zagajewski 2008, Bartold 2008) or linear spectral unmixing (LSU), which produces ‘maps of abundance’ in which each pixel is assigned to more than one class with a specified probability level (Luo and Chanussot 2009; Villa et al. 2013; Li et al. 2014), can be used to extract the pixels most useful for the classification (endmembers). Schull et al. (2010) also used pure spectral pixels to classify forests in the north-eastern USA and achieved an overall accuracy of 92%.

The ability to successfully classify forest tree species using hyperspectral data has been demonstrated for forests in the equatorial zone (Clark et al. 2005; Mickelson et al. 1998; Peerbhay et al. 2013; Goodwin et al. 2005), where seven tree species were classified using linear discriminant analysis (LDA), maximum likelihood (ML), and spectral angle mapping (SAM) methods, with accuracies of 80 to 100%. Hyperspectral data have also been used in the tropical and sub-tropical zones (Dalponte et al. 2008, Dian et al. 2014, Dennison and Roberts 2003, Lucas et al. 2008, Yang et al. 2009, Gong et al. 1997, van Aardt and Norris-Rogers 2008, Bellanti et al. 2016) with accuracies of over 90% and in the temperate zone (Zagajewski 2010; Olesiuk and Zagajewski 2008; Bartold 2008; Dian et al. 2014; Martin et al. 1998; Dalponte et al. 2013; Dmitriev 2014; Tarabalka 2010; Richter et al. 2016) with accuracies of 74 to 93%.

Classification results may be improved by combining hyperspectral data with light detection and ranging (LIDAR) data (Alonzo et al. 2014). For the temperate and sub-tropical zones (Heinzel and Koch 2012; Dalponte et al. 2008; Caiyun and Fang 2012), the accuracies were over 80%. Passive optical systems, particularly hyperspectral ones, generally showed higher potential for tree species classification than active synthetic aperture radar (SAR) or LIDAR sensor systems. However, LIDAR data have proven suitable for regions with a low number of species (Fassnacht et al. 2016). Forest stands classified with the highest accuracy in the European temperate zone are mostly homogeneous ones, dominated by Scots pine (Pinus sylvestris L.) and Norway spruce (Picea abies L.). Of the broadleaved species, European beech (Fagus sylvatica L.) and oak (Quercus spp. L.) are classified with the highest accuracy, but these classifications have lower accuracies than those of coniferous species (Wietecha et al. 2017).

The aim of this study was to evaluate the accuracy of tree species classification methods using a hyperspectral Airborne Imaging Spectrometer for Application (AISA) Eagle image for a forested area in northern Poland. The following algorithms were evaluated in the study: PCA and MNF transformation (to reduce noise and auto-correlated data), and parallelepiped (P), minimum distance (MD1), Mahalanobis distance (MD2), ML, SAM, spectral information divergence (SID), neural net (NN), and support vector machine (SVM) (to perform the supervised classification). The results were evaluated using a set of 300 test pixels, distributed randomly across the sample plots in the study area, to achieve the most reliable assessment of accuracy.

Materials and methods

Study area

The survey was performed in the Miłomłyn Forest District in north-eastern Poland (Fig. 1). Background information about this area was obtained from www.milomlyn.olsztyn.lasy.gov.pl/zasoby-lesne. The size of the area and relative tree composition are given in Table 1 and detailed information is listed in Appendix 1. The individual compositions of Scots pine (Pinus sylvestris L.) and European larch (Larix decidua L.) were not provided. The study area was a 15 km² (10 km long and 1.5 km wide) rectangle including three lakes: Szeląg Wielki, Tabórz (southern part), and Długie (northern part) (Fig. 2).

Fig. 1 The study area

Table 1 Basic information about the Miłomłyn Forest District (source: www.milomlyn.olsztyn.lasy.gov.pl/zasoby-lesne)

Fig. 2 AISA EAGLE hyperspectral image (natural colour composite) constructed by MGGP AERO; the sample plots

A local survey was performed on 9.85 ha of the Miłomłyn Forest District using a series of circular test plots (radius: 12.62 m; area: 500.34 m²) in March 2014. The sample plots were surveyed individually to achieve the highest level of diversity for various forest characteristics (e.g. age, species, forest type), where the influence of slope was minimal (Fig. 2). We corrected for the influence of slope on the tree-position measurements. Each tree with a diameter at breast height (dbh) over 5 cm was inventoried and had the following information recorded: distance from the centre of the plot, azimuth (measured from the centre of the plot to each tree), defoliation (assessed using an expert method), and height. The centre of each test plot was determined using a Pathfinder ProXT (Trimble, Sunnyvale, California) Global Navigation Satellite System (GNSS) receiver operating in Differential Global Positioning System (DGPS) mode. Its vertical and horizontal accuracies were estimated to be 1.4 m and 0.97 m, respectively. Tree heights were measured using a Vertex IV device (Haglof Sweden AB, Langsele, Sweden) and dbh was measured using a Codimex calliper (Codimex, Warsaw, Poland). The data collected were used to calibrate and verify the hyperspectral image classification process. No grey alder trees were found in the plots, so this species was not considered further. Although hornbeam occurs only occasionally in the forest, it was found in one plot, so it was included in the analysis.
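As a worked illustration of the field geometry, the sketch below reproduces the plot area from the stated radius and converts a (distance, azimuth) measurement into map coordinates. It assumes that the azimuth is measured clockwise from grid north and that distances are already slope-corrected; the plot-centre coordinates and the function names are hypothetical.

```python
import math

PLOT_RADIUS_M = 12.62  # plot radius reported in the text

def plot_area_m2(radius_m=PLOT_RADIUS_M):
    """Area of a circular sample plot."""
    return math.pi * radius_m ** 2

def tree_position(centre_e, centre_n, distance_m, azimuth_deg):
    """Convert a (distance, azimuth) measurement to easting/northing.

    Assumption: azimuth is measured clockwise from grid north.
    """
    az = math.radians(azimuth_deg)
    easting = centre_e + distance_m * math.sin(az)
    northing = centre_n + distance_m * math.cos(az)
    return easting, northing

# Hypothetical plot centre with one tree located 10 m due east of it.
e, n = tree_position(500000.0, 5950000.0, 10.0, 90.0)
area = plot_area_m2()
```

With a radius of 12.62 m the computed area agrees with the 500.34 m² given above.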

Data and software

The hyperspectral image was provided by MGGP AERO and taken by the AISA Eagle camera (SPECIM) on 3 August 2013 at an altitude of 2303–2328 m (single flight). The spectral range of the image was 400–970 nm (129 spectral bands, each 4–5 nm wide); the radiometric resolution was 12 bits and the spatial resolution was 1.5 m. The lens size was 18.5 mm and the field of view (FOV) was 37.7°.
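The acquisition parameters can be cross-checked with simple arithmetic. The sketch below assumes flat terrain, a nadir-looking sensor, and that the stated altitude is height above ground (none of which is confirmed by the text); the function names are hypothetical.

```python
import math

def swath_width_m(altitude_m, fov_deg):
    """Ground swath of a nadir-looking line scanner over flat terrain."""
    return 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)

def mean_band_width_nm(range_start_nm, range_end_nm, n_bands):
    """Average band width if the bands tile the spectral range evenly."""
    return (range_end_nm - range_start_nm) / n_bands

swath_low = swath_width_m(2303.0, 37.7)   # lowest reported altitude
swath_high = swath_width_m(2328.0, 37.7)  # highest reported altitude
band_width = mean_band_width_nm(400.0, 970.0, 129)
```

Under these assumptions the swath is roughly 1.6 km, which is consistent with the 1.5 km width of the study area, and the mean band width falls within the stated 4–5 nm.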

The hyperspectral image classification process (as detailed below) was performed using ENVI 5.0 (developed by Exelis Inc.), ArcGIS 10.3 (developed by ESRI), and Statistica 8.0 (developed by StatSoft). Radiometric calibration, atmospheric correction (using the Quick Atmospheric Correction, QUAC, method), data reduction (PCA and MNF transformations), band selection, classification using nine different algorithms, and the accuracy analysis were performed in ENVI 5.0; ArcGIS 10.3 was used to select training and test pixels. Figures were created using the ETRS 1989 Poland C92 Projected Coordinate System.

Pre-processing

The image was geometrically corrected by MGGP AERO (UTM, Zone 34N, WGS-84). It was also subjected to radiometric calibration (using the ENVI ‘Radiometric calibration’ tool; calibration type: reflectance, output interleave: BSQ (band sequential), output data type: float, scale factor: 1.00) and atmospheric correction using the ‘Quick Atmospheric Correction’ (QUAC) tool (sensor type: AISA) (Dalponte et al. 2012; Bernstein et al. 2005) (Appendix 2). These procedures were performed using ENVI 5.0.

Data reduction

After the atmospheric correction, the amount of data was reduced. The image containing 129 bands was not an ideal data set with which to perform supervised classification, because it contained too much auto-correlated data. The reduction of the data may be performed using one of two types of methods: data transformation (PCA or MNF transformation) (Clark et al. 2005) or band selection. Data transformation is fully automatic but produces artificial bands. Band selection retains the original bands but is highly subjective. Both methods were tested. The data reduction was performed using ENVI 5.0 software.
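The data-transformation route can be sketched with a plain PCA on the pixel spectra. This is a generic NumPy illustration on synthetic data, not the ENVI routine used in the study; `pca_reduce` and the simulated cube are hypothetical.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Project a (rows, cols, bands) cube onto its leading principal components."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)
    centred = pixels - pixels.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centred @ eigvecs[:, :n_components]
    explained = eigvals[:n_components] / eigvals.sum()
    return scores.reshape(rows, cols, n_components), explained

# Synthetic cube: 129 strongly correlated bands driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 3))
mixing = rng.normal(size=(3, 129))
cube = (latent @ mixing + 0.01 * rng.normal(size=(400, 129))).reshape(20, 20, 129)
pcs, explained = pca_reduce(cube, 3)
```

Because the simulated bands are highly auto-correlated, three components retain almost all of the variance, which is the situation that motivates the reduction step.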

Classification

Finally, four sets of data were classified (using nine algorithms):

  • The result of the PCA transformation—first three bands

  • The result of the MNF transformation—first seven bands

  • All 129 bands

  • 36 original bands with the largest differences in the spectral profiles generated from training pixels for each tree species
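One way to make the band-selection step less subjective is to rank bands by how far apart the class mean spectra are. The sketch below is an assumed criterion (range of the class means per band), not the procedure used in the study, where the bands were chosen by inspecting the spectral profiles; `select_bands` and the toy spectra are hypothetical.

```python
import numpy as np

def select_bands(spectra, labels, n_bands):
    """Return indices of the bands with the largest spread between class means."""
    classes = np.unique(labels)
    means = np.stack([spectra[labels == c].mean(axis=0) for c in classes])
    spread = means.max(axis=0) - means.min(axis=0)   # per-band separation
    chosen = np.argsort(spread)[::-1][:n_bands]
    return np.sort(chosen)

# Toy training set: two classes that differ only in bands 1 and 3.
spectra = np.array([[0.1, 0.2, 0.3, 0.4, 0.5]] * 3 + [[0.1, 0.6, 0.3, 0.9, 0.5]] * 3)
labels = np.array([0, 0, 0, 1, 1, 1])
selected = select_bands(spectra, labels, 2)
```

For the study data, the analogous call would request 36 of the 129 bands.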

To perform the supervised classification, it was important to choose representative pixels with which to train the algorithm. This was performed using two MNF band compositions and the data from the test plots. A total of 260 training pixels were selected: 15 for birch, 80 for European beech, 30 for European larch, 30 for Scots pine, 30 for oak, 10 for hornbeam, 15 for Norway spruce, and 50 for no forest. The pixels of each class were randomly divided into training and validation sets within each plot. There was no spatial distinction between individual plots of training and test pixels; however, in some cases, only training or only test pixels might have been chosen for a single plot.

The spectral reflectance of more than one object (tree, bare ground, or any other) could have been contained in a single 1.5-m pixel. The GNSS device could also have introduced an error. Therefore, the normalised Digital Surface Model (nDSM) was used to overcome this problem. All areas below 1 m were removed. The spatial resolution of the nDSM was 0.5 m, so it was possible to choose training and test pixels containing a single tree or at least a group of trees of a single species. The species were identified using data points representing the location of tree tops. We assumed that they were directly above the section of trunks that had their location mapped during field measurements. By the end of the classification process, the entire area was classified since all pixels, not only the ‘clear’ ones, were used. The classifications were verified on separate data sets and evaluated at the sample plot level.
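The height-masking step can be sketched as follows. The example assumes that the 0.5-m nDSM is perfectly aligned with the 1.5-m image grid, so that each image pixel covers a 3 × 3 block of nDSM cells, and that a pixel is kept only if all nine cells are at least 1 m high (a strict rule chosen for illustration). `canopy_mask` is a hypothetical helper.

```python
import numpy as np

def canopy_mask(ndsm, factor=3, min_height_m=1.0):
    """Boolean mask on the image grid: True where every nDSM cell is tall enough."""
    rows, cols = ndsm.shape
    rows -= rows % factor                      # drop incomplete edge blocks
    cols -= cols % factor
    blocks = ndsm[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return blocks.min(axis=(1, 3)) >= min_height_m

# Toy nDSM (heights in metres) covering 2 x 2 image pixels.
ndsm = np.full((6, 6), 20.0)
ndsm[1, 4] = 0.2        # a gap inside the upper-right pixel
ndsm[3:, :3] = 0.0      # bare ground in the lower-left pixel
mask = canopy_mask(ndsm)
```

Using the block maximum instead of the minimum would give a more permissive mask that also keeps mixed pixels.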

To perform the supervised classification, nine algorithms were used on three out of four datasets: P, binary encoding (BE), SID, MD1, MD2, ML, SVM, SAM, and NN. The settings for the algorithms are provided in Appendix 2. The classification was performed using ENVI 5.0 software.
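To show what one of these classifiers does, the following is a minimal sketch of a Gaussian maximum likelihood classifier with equal prior probabilities. It is a textbook formulation written with NumPy, not the ENVI implementation or its settings; the function names and the synthetic training data are hypothetical.

```python
import numpy as np

def fit_ml(x, y):
    """Estimate a mean vector and covariance matrix for every class."""
    params = {}
    for c in np.unique(y):
        xc = x[y == c]
        cov = np.cov(xc, rowvar=False)
        params[c] = (xc.mean(axis=0), np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return params

def predict_ml(params, x):
    """Assign each pixel to the class with the highest Gaussian log-likelihood."""
    classes = list(params)
    scores = []
    for c in classes:
        mean, inv_cov, logdet = params[c]
        d = x - mean
        maha = np.einsum("ij,jk,ik->i", d, inv_cov, d)   # squared Mahalanobis distance
        scores.append(-logdet - maha)                    # equal priors assumed
    return np.array(classes)[np.argmax(np.stack(scores, axis=1), axis=1)]

# Synthetic training pixels: two well-separated classes in three bands.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(5.0, 1.0, (50, 3))])
y = np.repeat([0, 1], 50)
pred = predict_ml(fit_ml(x, y), x)
accuracy = (pred == y).mean()
```

Dropping the log-determinant term and using a single pooled covariance matrix turns this into a Mahalanobis distance classifier.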

Accuracy assessment

The accuracy analysis was performed using 300 test pixels and 2 MNF-band compositions (5-4-3 and 4-3-2) in which the differences in colour among the species were most observable. Some of the pixels representing trees in the sample plots were used as test pixels, but only those that were most recognisable were used (European beech: 50, birch: 20, oak: 50, hornbeam: 10, European larch: 50, no forest: 50, Scots pine: 50, and Norway spruce: 20).

The pixels of each class within each plot were randomly divided into training and validation data sets. There was no spatial distinction between the plots of individual training and test pixels; however, in some cases, only training or only test pixels might have been chosen for a single plot. Nevertheless, both data sets covered the entire study area randomly.

A normalised Digital Surface Model was used to choose the test pixels representing only one particular species and to overcome inaccuracies caused by the spatial resolution of the image and the GNSS device. Only pixels in which a tree top was located close to the centre were selected as test pixels. The accuracy assessment was performed using ENVI 5.0 software.
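The overall accuracy and the kappa coefficient reported in this paper are both derived from a confusion matrix of test pixels. A minimal sketch of the standard formulas is given below; the 2 × 2 matrix is a made-up example, not data from this study.

```python
def overall_accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical confusion matrix for two classes (100 test pixels).
example = [[45, 5],
           [10, 40]]
oa, kappa = overall_accuracy_and_kappa(example)
```

Here 85 of the 100 pixels lie on the diagonal, chance agreement is 0.5, and kappa is therefore 0.7.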

The classification results of 98 individual sample plots (values represented in %) were compared to the number of trees (values represented in %) belonging to individual species on each plot using the coefficient of determination (R2) calculated in Statistica 8.0. Only trees from the upper canopy were taken into consideration.
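For a comparison of two percentages per plot, the coefficient of determination can be computed as the squared Pearson correlation. The sketch below assumes a simple linear relationship (the exact Statistica procedure is not stated in the text) and uses invented values.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear fit (squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical share of one species on four plots (%): field survey vs classified map.
field_share = [0.0, 25.0, 50.0, 100.0]
classified_share = [5.0, 30.0, 45.0, 95.0]
r2 = r_squared(field_share, classified_share)
```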

Results

The highest accuracy was obtained using the ML algorithm on the data set of seven MNF bands. The final map was processed with a majority filter. The overall accuracy was 91.3% (kappa = 0.9) (Fig. 3).
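A majority filter replaces each pixel label with the most frequent label in its neighbourhood, which removes isolated misclassified pixels. The sketch below assumes a 3 × 3 window and edge replication at the image border; the window size used in the study is not stated, and ties are resolved here in favour of the lowest class code.

```python
import numpy as np

def majority_filter(labels, size=3):
    """Replace every label with the most frequent label in a size x size window."""
    pad = size // 2
    padded = np.pad(labels, pad, mode="edge")
    out = np.empty_like(labels)
    for r in range(labels.shape[0]):
        for c in range(labels.shape[1]):
            window = padded[r:r + size, c:c + size].ravel()
            out[r, c] = np.bincount(window).argmax()   # ties go to the lowest code
    return out

# Toy classified map: class 1 everywhere except one isolated pixel of class 2.
classified = np.ones((5, 5), dtype=int)
classified[2, 2] = 2
smoothed = majority_filter(classified)
```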

Fig. 3 Classification results (maximum likelihood on seven minimum noise fraction bands). Legend: red: birch; orange: European beech; yellow: oak; pink: hornbeam; pale blue: European larch; green: Scots pine; dark blue: Norway spruce

For the classification performed on all 129 bands, the overall accuracy ranged from 31% (BE) to 76.7% (SVM), excluding P and NN (below 10%); SID also performed relatively well (64.7%). There were not enough training pixels to perform ML and MD2 on this data set. For the 36 original bands, the overall accuracy ranged from 33.6% (BE) to 66.3% (NN), excluding P (below 20%); SVM also performed relatively well (58.7%), and again there were not enough training pixels to perform ML and MD2. For the first three PCA bands, the overall accuracy ranged from 30.3% (P) to 88.3% (ML), with NN, SAM, and MD2 ranging from 68.3% to 72.7%. For the first seven MNF bands, the overall accuracy ranged from 10.3% (P) to 90.7% (ML); SID and SAM also performed relatively well (84.7–85%) (Table 2).
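A likely reason why ML and MD2 could not be run on the 129-band and 36-band data sets is that both need an invertible class covariance matrix, and a covariance matrix estimated from n pixels has a rank of at most n − 1. The sketch below illustrates this with random numbers for a class of 10 training pixels (the size of the smallest training class); it is an explanatory example, not a reproduction of the ENVI check.

```python
import numpy as np

rng = np.random.default_rng(0)

def covariance_rank(n_pixels, n_bands):
    """Rank of a sample covariance matrix estimated from n_pixels spectra."""
    sample = rng.normal(size=(n_pixels, n_bands))
    return int(np.linalg.matrix_rank(np.cov(sample, rowvar=False)))

rank_36_bands = covariance_rank(10, 36)   # 36 x 36 matrix, singular
rank_7_bands = covariance_rank(10, 7)     # 7 x 7 matrix, full rank
```

With 10 pixels the 36-band covariance matrix has rank 9 and cannot be inverted, whereas the 7-band matrix is of full rank, which is consistent with ML being feasible only on the PCA and MNF bands.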

Table 2 Classification results

The producer’s and user’s accuracy for the four best classification results (each based on a different data set) is provided in Table 3. The producer’s accuracy is the fraction of correctly classified pixels with regard to all pixels of that ground truth class. The user’s accuracy is the fraction of correctly classified pixels with regard to all pixels classified as this class in the classified image.
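The two definitions above translate directly into code. The sketch assumes that the rows of the confusion matrix hold the classified labels and the columns hold the ground truth; the matrix is a made-up example, and classes with an empty row or column are given an accuracy of 0 by convention.

```python
def producers_and_users_accuracy(matrix):
    """Per-class producer's and user's accuracy.

    Assumption: rows are classified labels, columns are ground-truth labels.
    """
    n = len(matrix)
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    producers = [matrix[i][i] / col_totals[i] if col_totals[i] else 0.0 for i in range(n)]
    users = [matrix[i][i] / row_totals[i] if row_totals[i] else 0.0 for i in range(n)]
    return producers, users

# Hypothetical confusion matrix for two classes.
example = [[45, 5],
           [10, 40]]
producers, users = producers_and_users_accuracy(example)
```

For the first class, 45 of the 55 ground-truth pixels are found (producer's accuracy of about 82%), and 45 of the 50 pixels labelled as that class are correct (user's accuracy of 90%).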

Table 3 The producer’s and user’s accuracy for each class using different datasets and algorithms

For the classification based on all 129 spectral bands performed with the SVM algorithm, the highest producer’s accuracy was observed for European larch, no-forest, and Scots pine (92–98%) and the lowest was for birch and hornbeam (10%). The highest user’s accuracy was observed for Scots pine and hornbeam (94–100%) and the lowest was for European beech (52.4%).

For the classification based on 36 spectral bands performed with the NN algorithm, the highest producer’s accuracy was observed for European larch, no-forest, and Scots pine (94%) and the lowest was observed for birch and hornbeam (0%). The highest user’s accuracy was observed for Scots pine and hornbeam (71.2–77.4%) and the lowest was for birch, hornbeam, and Norway spruce (0%).

For the classification based on three PCA spectral bands performed with the ML algorithm, the highest producer’s accuracy was observed for European larch and no-forest (100%) and the lowest was for birch (10%). The highest user’s accuracy was observed for Scots pine and Norway spruce (100%) and the lowest was for birch (40%).

For the classification based on 7 MNF spectral bands performed with the ML algorithm, the highest producer’s accuracy was observed for beech, European larch, and no-forest (100%) and the lowest was for birch (10%). The highest user’s accuracy was observed for hornbeam, European larch, Scots pine, and Norway spruce (100%) and the lowest was for birch (33.33%) (Table 3). Birch was spread across the study area with no observable concentration, while hornbeam was very rare; only one sample plot contained enough of the latter (85%) to be detectable from the flight altitude.

The visual comparison of these four classification approaches on a single-plot scale is shown for two chosen plots in Figs. 4 and 5. The best results were achieved using the ML algorithm.

Fig. 4 Comparison of the results of four classification techniques (SVM-129, NN-36, ML-PCA, ML-MNF) on a single sample plot (Adams et al. 1995). Legend: red: birch; orange: European beech; yellow: oak; pink: hornbeam; pale blue: European larch; green: Scots pine; dark blue: Norway spruce

Fig. 5 Comparison of the results of four classification techniques (SVM-129, NN-36, ML-PCA, ML-MNF) on a single sample plot (Alonzo et al. 2014). Legend: red: birch; orange: European beech; yellow: oak; pink: hornbeam; pale blue: European larch; green: Scots pine; dark blue: Norway spruce

The coefficient of determination between the number of trees and the classification results of individual sample plots ranged from 0.68 (birch) to 0.99 (European larch), while those of Norway spruce, hornbeam, and oak were approximately 0.9 (Table 4).

Table 4 Coefficient of determination between the number of trees (%) and the classification results (%) of individual species on individual test plots

Discussion

Hyperspectral images are difficult to use for classification purposes because they contain many narrow bands that are correlated with one another. It is important to reduce both the amount of data and the noise before performing classifications. Clark et al. (2005) observed that accuracy generally increased as input bands were added, up to 30 bands, when using a feature selection algorithm combined with a linear discriminant analysis classifier; including more bands produced a lower or equal accuracy when classifying tree species in a tropical environment. Dalponte et al. (2009) reported a slight decrease in accuracy when dropping several bands from the initial 126 in a tree-species classification that combined an SVM classifier with a feature-selection procedure. These findings were most likely also connected to the classifiers applied, given that SVM is known to handle high-dimensional data well without the need for a large training sample size. Thus, SVM is not strongly affected by the Hughes phenomenon (Dalponte et al. 2009, Hughes 1968): as the number of narrow hyperspectral bands increases, the number of samples (i.e. training pixels) required to maintain a minimum statistical confidence in the classification increases exponentially, a requirement that is very difficult to meet in practice.

The compositions made from the PCA or MNF bands may be useful to distinguish tree species and create a layer of training pixels or polygons used to perform the supervised classification. However, PCA is not the most suitable method to reduce multidimensionality when the objective is to classify remotely sensed data (Cheriyadat and Bruce 2003). Principal component analysis identifies variabilities that may not perform well in multi-class discrimination and does not differentiate between within-group and between-group variations (Hobro et al. 2010).

The classification of the first seven MNF bands using the ML algorithm resulted in the best overall accuracy (91.3%) and kappa (0.9). The results are comparable to those obtained for forest species in the equatorial zone (Clark et al. 2005; Mickelson et al. 1998; Peerbhay et al. 2013; Goodwin et al. 2005) at 80–100%, in the tropical and sub-tropical zones (Carlson et al. 2010; Dian et al. 2014; Goodwin et al. 2005; Dennison and Roberts 2003; Lucas et al. 2008; Yang et al. 2009; Gong et al. 1997; van Aardt and Norris-Rogers 2008) at over 90%, and in the temperate zone (Zagajewski 2010; Olesiuk and Zagajewski 2008; Bartold 2008; Dian et al. 2014; Martin et al. 1998; Dalponte et al. 2013; Dmitriev 2014; Tarabalka 2010; Richter et al. 2016) at 74–93%.

Our results have a high correspondence with tree species frequencies at the sample-plot level. Differences between the classification results and data from the local survey may be explained by the leaves and branches of the trees growing near, but outside the borders of, the testing areas. Stumps were observed in the field, so it is possible that some parts of unmapped trees were included in the sample. It is also possible that the reflectance of the input image was disturbed by the plants growing in the lower canopy layers of the stand. Additionally, even the same tree species may have different values of spectral reflectance depending on their age, weather and soil conditions, moisture, vegetation period, and many other factors (Ghiyamat and Shafri 2010), which is the premise for using hyperspectral imagery to detect disease and nutrient deficiencies in even-aged single-species stands.

The set of training polygons used in this study was suitable for performing the classification on a neighbouring area using the same type of data (AISA Eagle hyperspectral image), acquired at the same flight height, during the peak of the vegetation season (July and August), when the weather conditions were similar (although the atmospheric correction was performed). Otherwise, a separate set of training polygons should be used because the spectral signatures of the different tree species vary with the study area, data type, acquisition date, weather conditions, altitude, and other factors (Ghiyamat and Shafri 2010). However, this is a common issue when dealing with remotely sensed data. Unfortunately, more issues can be expected with hyperspectral data; for example, a comparison to satellite images and reference data is needed. This is due to the fact that flight strips are relatively narrow and a longer time is needed to cover large areas. As a result, there will be large differences between single strips or groups of strips. In these cases, a smaller part of the data set is required for training, and verification can be undertaken immediately.

It is also important to select the training and test pixels from the same (or at least neighbouring) areas, using the same methodology, and with a similar proportion of class samples to avoid differences between the accuracy assessment and the true classification results.

Conclusions

The classification based on 7 MNF spectral bands performed with the ML algorithm was found to be the most accurate method for classifying species (overall accuracy of 90.3%), with the highest kappa coefficient of 0.9. The results from the study reported here showed that this method is sufficiently reliable, accurate and user-friendly to be used in practice. However, the data and software required are still expensive, which may limit its practical use by forest managers at present.