1 Introduction

The spatial distribution of snow depth in alpine environments is highly heterogeneous (Elder et al. 1998). This is mainly owed to the complex interaction between alpine terrain and meteorological factors, such as precipitation and surface energy fluxes, as well as the redistribution of snow by wind, sloughing or avalanche activity (Cline et al. 1998; Elder et al. 1991). Area-wide approaches to determine snow depth [e.g., based on automatic weather station (AWS) data combined with medium-resolution satellite imagery (Foppa et al. 2007)] are not able to capture its high local variability (Ginzler et al. 2013). However, detailed information on slope-scale snow depth distribution plays an important role for many applications in snow science and practice, including numerical modelling of snow drift (Durand et al. 2005; Beyers et al. 2004), ecological studies on alpine flora and fauna (Bilodeau et al. 2013; Peng et al. 2010), planning avalanche hazard mitigation measures (Margreth and Romang 2010; Fuchs et al. 2007), avalanche forecasting and warning (Helbig et al. 2015; Vernay et al. 2015), avalanche event documentation, e.g., for hazard zone mapping (Holub and Fuchs 2009; Decaulne 2007), prediction and assessment of flood hazard resulting from snow melt (Painter et al. 2016; Schöber et al. 2014) or as an input for the optimisation of numerical simulation models in avalanche dynamics research (Fischer et al. 2015; Teich et al. 2014). Manually measuring this information in situ is labour-intensive, potentially hazardous or even impossible (Nolin 2010). Therefore, a wide range of terrestrial, airborne and spaceborne remote and close-range sensing techniques have been applied to retrieve digital surface models (DSMs)/snow depth maps (SDMs) at the slope-scale (Deems et al. 2013; Dietz et al. 2012; Rees 2006). 
One of the most recent techniques is unmanned aerial system (UAS) photogrammetry, which has quickly become a wide-spread method for geodata collection in different fields of earth science (Colomina and Molina 2014; Nex and Remondino 2013). This development has been fostered by the proliferation of easy-to-use UAS platforms and sensors, as well as recent progress in the field of computer vision [structure-from-motion (Koenderink and van Doorn 1991) and multi-view stereopsis (Furukawa and Ponce 2009)], considerably reducing requirements for photogrammetric processing of aerial imagery (Mancini et al. 2013). Despite some drawbacks (e.g., range limited to slope-scale, legal regulations, necessity for stable flight weather conditions), UAS photogrammetry offers many advantages over established techniques for snow depth mapping: compared to manned aircraft campaigns, UAS can acquire imagery at a much lower cost (e.g., for equipment, training, maintenance, operation) (Harder et al. 2016), higher operational flexibility (Vander Jagt et al. 2015), as well as higher flexibility and choice regarding the sensors’ spatial and radiometric resolution, including an option for UAS-based laser scanning (Whitehead and Hugenholtz 2014); compared to terrestrial laser scanning (TLS), UAS photogrammetry is more flexible regarding deployment in alpine terrain [high-accuracy UAS positioning or point cloud registration routines as presented by Miziński and Niedzielski (2017) make georeferencing targets obsolete] and it does not suffer the limitations of the line-of-sight due to acute viewing angles or occlusions (Marti et al. 2016; Harder et al. 2016). However, while the above-mentioned techniques are well-established, their quality and repeatability well-known (Hartzell et al. 2015; Müller et al. 2014), crucial questions regarding the accuracy and precision of UAS-based snow depth mapping are still under discussion (Avanzi et al. 2017). 
Several contributions have recently been published, reporting on the application of UAS photogrammetry to snow depth mapping, using both multicopter and fixed-wing UAS. In all of these studies, the UAS results were validated with reference data including:

  i. Global navigation satellite system (GNSS) measurements of the snow surface and/or manual snow depth probing (MP) (Miziński and Niedzielski 2017; De Michele et al. 2016; Harder et al. 2016; Lendzioch et al. 2016; Bühler et al. 2016; Vander Jagt et al. 2015).

  ii. Very high resolution optical satellite imagery (Marti et al. 2016).

  iii. A large-frame aerial camera mounted on a manned aircraft (Boesch et al. 2016).

  iv. A multi station in scanning mode (Avanzi et al. 2017).

However, all these assessments were based on a comparatively small number of UAS flights (1–3), except for Harder et al. (2016), and the majority used small numbers of discrete samples (GNSS and MP measurements). Most studies evaluated imagery collected in the visible part of the spectrum (VIS) (except for Miziński and Niedzielski 2017; Bühler et al. 2016; Boesch et al. 2016), although several authors have pointed to the benefits of using near-infrared (NIR) imagery for snow mapping (Bühler et al. 2015; Nolin and Dozier 2000). Nolan et al. (2015) performed a large-scale accuracy and precision assessment of imagery collected over large areas with a consumer-grade digital camera mounted on a manned aircraft, using GNSS and airborne laser scanning data as reference. However, since the employed methodology differs substantially from the presented study (size of target area, georeferencing routine, employed platform), results were not directly compared.

In this contribution, we present a multitemporal assessment (12 UAS flights) of the accuracy and precision of UAS photogrammetry for snow depth mapping. Adding to findings from the above-mentioned studies, we used TLS data to assess the accuracy of UASDSM, MP as reference data for UASSDM accuracy and calculated precision by intercomparison of UAS results. VIS and NIR imagery was used to map snow depth with a fixed-wing UAS.

2 Materials and Methods

2.1 Study Site

The study site is located in the Tuxer Alps of North Tyrol, Austria (47°10′N; 11°38′E), between the Northern Calcareous Alps and the Main Alpine Ridge. It lies at approximately 2020 m a.s.l., near the head of a north–south running valley. The area features a typical inner-alpine climate, with annual precipitation between 1200 and 1700 mm (period 1983–2003) and snow depths of 1–2 m (Schaffhauser and Fromm 2008). The land cover of the site is mainly characterised by (partially boggy) alpine grasslands, mixed with various types of small shrubs (height < 1 m). In the west and north, large clusters of dwarf pine (Pinus mugo, height 1–3 m) and single trees or groups of stone pine (Pinus cembra) are present. Several small streams run parallel to the valley axis, some of which drain into a pond (approximate size 0.006 km²) in the north. The topography of the site (mean slope angle 6°) is dominated by the flat valley bottom; the steepest areas lie in the east and west, where the lower sections of the adjoining slopes reach into the study site. Large boulders (max. width < 30 m, max. height < 5 m) are scattered in the centre of the site. Multiple small buildings are situated in the north and east, connected by a network of gravel roads, which are partially cleared in winter. An overview of the site is provided in Fig. 1; it highlights where TLS, UAS and MP data were collected, as well as the location of the AWS and reference points (RPs). The site was chosen on account of its good accessibility, even during periods with high avalanche danger, and well-established infrastructure (power supply and network connection) (Adams et al. 2016).

Fig. 1

Study site overview; a outline of areas of interest (AOI) for UAS and TLS data acquisition, as well as points where MP measurements were performed; positions of instruments (TLS and AWS) and RPs; 10 and 50 m contour lines were derived from airborne laser scanning data; ‘snow-off’ reference data included as hillshaded UASDSM [background of a] and orthophoto (b)

2.2 Data Acquisition and Processing

We collected data at the study site during four measurement campaigns in ‘snow-on’ conditions in February and March 2015 (Fig. 2, upper image). This allowed us to take different snow pack properties (snow depth, snow type at surface) and illumination conditions at the study site into account. For example, the snow depth measured at the AWS ranged between 0.68 m (13 February, 1 p.m.) and 1.01 m (3 March, 2 a.m.).

Fig. 2

Central part of study site on 11 February 2015 (upper image); launching Mentor UAS (lower image)

Each campaign consisted of:

  • Two to four UAS flights.

  • One to two TLS scans.

  • 149 MP measurements (February campaigns only).

The UAS and TLS data were acquired over an area of interest (AOI) of 0.12 km2 (Fig. 1). It was located in the centre of the valley floor, where MP measurements were performed, too. Due to the geometric properties of the measurement setup, TLS data was only collected on 70% of the area of the core AOI (not considering occlusions). Reference ‘snow-off’ UAS imagery was acquired on 21 August 2015.

2.2.1 Unmanned Aerial Systems

The aerial imagery was collected with a Multiplex Mentor Elapor fixed-wing UAS (Fig. 2, lower image; Table 1) at different times of the day. The original Mentor model was modified to add UAS capabilities: it was fitted with navigation sensors to determine its absolute position (GNSS) and orientation (inertial measurement unit), and this data was managed by the on-board autopilot for autonomous flight (3DR ArduPilotMega). Pre-flight mission planning to define the flight path, height and speed was performed in the open-source software Mission Planner (Table 2). An additional on-board GNSS unit (SM GPS-Logger 2) recorded 10 Hz positional data (x, y and z). After completing each flight, the on-board GNSS data was synchronised with the recorded imagery (geotagging), to facilitate the image processing (Adams et al. 2016). The UAS had a maximum flight time of 40 min, during which it could map up to 0.6 km² at 2000 m a.s.l. in wintry conditions.
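The geotagging step can be illustrated with a minimal sketch. It assumes that each image timestamp is matched to the 10 Hz GNSS log by linear interpolation between the two neighbouring fixes; the actual synchronisation routine is not described here, and the function name `interpolate_position` and the sample values are hypothetical.

```python
from bisect import bisect_left


def interpolate_position(track, t):
    """Linearly interpolate (x, y, z) at time t from a time-sorted GNSS track.

    track: list of (time_s, x, y, z) tuples sorted by time_s (hypothetical layout).
    Returns None if t lies outside the logged interval.
    """
    times = [fix[0] for fix in track]
    if not track or t < times[0] or t > times[-1]:
        return None
    i = bisect_left(times, t)
    if times[i] == t:
        # Exact match with a logged fix: no interpolation needed.
        return track[i][1:]
    t0, *p0 = track[i - 1]
    t1, *p1 = track[i]
    w = (t - t0) / (t1 - t0)  # relative position between the two fixes
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))


# Two consecutive 10 Hz fixes and an image taken halfway between them.
track = [(0.0, 100.0, 200.0, 2400.0), (0.1, 102.0, 200.0, 2401.0)]
print(interpolate_position(track, 0.05))
```

With images recorded at 1.25 Hz and positions logged at 10 Hz, every image falls between two fixes that are at most 0.1 s apart, which is why a simple linear scheme is a plausible assumption for this sketch.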

Table 1 Technical specifications of the Mentor UAS (Adams et al. 2016)
Table 2 Typical UAS flight and camera settings

A Sony NEX5R digital camera was installed in the UAS fuselage to record the imagery on all the flights. It weighed 0.4 kg and was fitted with a 50 mm Sony prime lens (0.2 kg). The camera’s 16-megapixel APS-C sensor was modified by removing the built-in short-pass filter, increasing its sensitivity in the near-infrared from 700 to 1100 nm. This allowed us to mount the lens with different notch filters to record data in various parts of the electromagnetic spectrum: VIS (λ = 350–680 nm), NIR700 (λ > 700 nm) and NIR830 (λ > 830 nm). Each flight was carried out with a single camera on-board the UAS, set to record imagery at a defined wavelength, and the filters changed between retrievals. The camera was triggered via infrared signal, recording images at 1.25 Hz. Basic camera settings were fixed pre-flight, as no telemetry was available (Table 2); imagery was recorded with manual focus. The high image overlap (80% along- and 90% cross-track) was chosen based on the authors’ own experience, as well as recommendations from authors of similar studies dealing with UAS-based mapping of low contrast surfaces (Harder et al. 2016; Klemas 2015). We performed no internal camera calibration.

The study site was surrounded by high peaks, which cast a shadow on the valley floor from 1 p.m. onwards. This allowed a direct comparison of imagery collected in good (full sunlight) and in poor illumination conditions (shadow) on the same day. During all the campaigns, the sky was clear or only partially cloudy, with no precipitation; the nearby AWS (located at 2041 m a.s.l.) recorded air temperatures between −8 and 5 °C, with only very light winds (\(V_{max}\) < 3 m s⁻¹) 7 m above ground level. These can be considered good weather conditions for our UAS flights, especially considering the alpine environment. However, higher wind speeds and lower air temperatures can be expected at our typical flight height (400 m above ground level).
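To relate the 50 mm lens and the typical flight height of 400 m above ground level to the expected image resolution, the ground sampling distance (GSD) can be estimated with the standard pinhole relation. The sketch below is illustrative only: the sensor width (23.5 mm), the image width (4912 px) and the ground speed (15 m/s) are assumptions that are not stated in the text, and the function names are hypothetical.

```python
def ground_sampling_distance(sensor_width_mm, image_width_px, focal_length_mm, height_m):
    """Ground footprint of one pixel (m) for a nadir-looking camera."""
    pixel_pitch_mm = sensor_width_mm / image_width_px
    return pixel_pitch_mm * height_m / focal_length_mm


def forward_blur_px(ground_speed_ms, exposure_s, gsd_m):
    """Image blur (pixels) caused by forward motion during one exposure."""
    return ground_speed_ms * exposure_s / gsd_m


# Assumed sensor geometry (not from the text): 23.5 mm wide, 4912 px across.
gsd = ground_sampling_distance(23.5, 4912, focal_length_mm=50.0, height_m=400.0)
# Assumed ground speed of 15 m/s (hypothetical) at a 1/500 s exposure.
blur = forward_blur_px(15.0, 1.0 / 500.0, gsd)
print(f"GSD = {gsd:.3f} m, forward blur = {blur:.2f} px")
```

Under these assumptions the GSD is a few centimetres, and forward motion alone would blur the image by less than one pixel at the shortest exposure; rotation about the roll axis is not covered by this sketch.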

In lieu of survey-grade GNSS sensors on-board the UAS, indirect georeferencing had to be used (Harwin et al. 2015; Vander Jagt et al. 2015). Therefore, prior to each campaign, we distributed 10–20 RPs, consisting of 0.4 × 0.4 m black-and-white checkered wooden boards, within the AOI. We surveyed the location of each RP using a Trimble GEO-XT 2008, with an expected accuracy in the decimetre range (Adams et al. 2016). The data was corrected in real time in the field and differentially during post-processing. Final RP coordinates were averaged from more than 200 point measurements made at each RP location. However, the resulting overall georeferencing errors (especially in z-direction) proved too high and resulted in implausible SDMs. Therefore, the z values used for georeferencing the ‘snow-off’ UASDSM were extracted from an airborne laser scanning DSM from 2009, while retaining the x- and y-coordinates surveyed with the GNSS. To georeference the ‘snow-on’ UASDSMs, seven natural or man-made RPs were chosen. These RPs had remained snow-free throughout the winter (e.g., centre of flat stones in the river bed, corner of wooden patio outside hut) (Fig. 1). Their coordinates were extracted from the ‘snow-off’ UAS data. Thus, the ‘snow-on’ data could be referenced using a stable set of RPs, ensuring minimal systematic error introduced by the georeferencing procedure (Adams et al. 2016). However, this resulted in a comparatively small number of only seven RPs.
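The construction of the ‘snow-off’ RP coordinates (planimetric position from averaged GNSS fixes, height from the airborne laser scanning DSM) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and `dsm_height_at` stands in for whatever raster lookup is used to sample the DSM.

```python
from statistics import mean


def average_fix(fixes):
    """Average repeated (x, y, z) GNSS fixes recorded at one reference point."""
    xs, ys, zs = zip(*fixes)
    return mean(xs), mean(ys), mean(zs)


def hybrid_coordinate(fixes, dsm_height_at):
    """Keep the averaged GNSS x and y, but take z from an external DSM.

    dsm_height_at: callable (x, y) -> height in m (hypothetical lookup).
    """
    x, y, _ = average_fix(fixes)
    return x, y, dsm_height_at(x, y)


# Two fixes stand in for the 200+ measurements per RP; constant dummy DSM.
fixes = [(10.0, 20.0, 99.0), (12.0, 22.0, 101.0)]
print(hybrid_coordinate(fixes, lambda x, y: 2020.5))
```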

All the UAS imagery was processed with Agisoft’s PhotoScan Pro (version 1.2.3), a commercially available photogrammetric software suite, that is widely used in the UAS community (Tonkin et al. 2014). It is credited to be among the most reliable (Sona et al. 2014) and accurate (Gini et al. 2013) software packages available. PhotoScan is based on a structure-from-motion algorithm (Verhoeven 2011) and provides a complete, photogrammetric workflow, with particular emphasis on multi-view stereopsis (Harwin et al. 2015). This workflow consists of the following principal steps (Vander Jagt et al. 2015):

  i. Tie-point matching.

  ii. Bundle adjustment (constrained by assigning high weights to the RP coordinates).

  iii. Linear seven-parameter conversion; removal of nonlinear deformations.

  iv. Dense point cloud generation with multi-view stereo reconstruction.

  v. Triangulation of dense point cloud into mesh, subsequently generating DSM and orthophotos.

In a related study, Boesch et al. (2015) analysed PhotoScan’s suitability for snow depth mapping and the best combination of processing parameters. Therefore, all the imagery was processed with the following alignment parameters: accuracy—highest, pair selection—reference, key point limit—40,000, tie-point limit—10,000. The dense point cloud was generated with the settings: quality—medium, depth filtering—moderate. One of the main reasons for corrupt UAS imagery is motion blur, which results from shutter speeds that are too slow in relation to the movement of the UAS (Turner et al. 2015; Immerzeel et al. 2014). This applies in particular to motion in direction of the UAS’ roll-axis, resulting from crosswinds, and increases with the length of the camera lens (Morgenthal and Hallermann 2016). As reported in Bühler et al. (2016), in our experience, fixed-wing UAS are generally more susceptible to crosswinds and thus less stable in windy conditions, than some multicopters. The sensor on-board our fixed-wing UAS was not stabilised by a gimbal. To systematically evaluate our imagery, we routinely calculated the ‘quality index’ (QI) during pre-processing in PhotoScan (Adams et al. 2016). As reported in the PhotoScan documentation (Agisoft 2016), it provides a normalised value for the sharpness of the imagery; images with QI < 0.5 are recommended to be excluded from photogrammetric processing. Orthophotos and DSM were exported from PhotoScan in 0.05 and 0.2 m resolution, respectively. We calculated snow depth for each pixel by subtracting the ‘snow-off’ DSM from the ‘snow-on’ DSM. This follows the definition by Fierz et al. (2009), where snow depth is the vertical distance from the base to the surface of the snow pack.
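The per-pixel snow depth calculation described above can be written as a short sketch. It assumes that both DSMs are already co-registered on the same 0.2 m grid and represented as nested lists, with `None` marking cells without data; this layout and the function name `snow_depth_map` are illustrative choices, not part of the processing chain used in the study.

```python
def snow_depth_map(dsm_snow_on, dsm_snow_off):
    """Per-pixel snow depth (m): 'snow-on' minus 'snow-off' surface height.

    Both inputs are lists of rows on the same grid; None marks missing cells.
    """
    if len(dsm_snow_on) != len(dsm_snow_off):
        raise ValueError("DSMs must have the same number of rows")
    sdm = []
    for row_on, row_off in zip(dsm_snow_on, dsm_snow_off):
        if len(row_on) != len(row_off):
            raise ValueError("DSM rows must have the same length")
        sdm.append([
            None if on is None or off is None else on - off
            for on, off in zip(row_on, row_off)
        ])
    return sdm


snow_on = [[2021.0, 2021.5], [None, 2022.0]]
snow_off = [[2020.0, 2020.7], [2020.5, 2021.2]]
print(snow_depth_map(snow_on, snow_off))
```

Because the difference is taken along the z axis, the result matches the vertical definition of snow depth rather than the slope-normal snow thickness.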

2.2.2 Terrestrial Laser Scanning

We used two Riegl long-range TLS instruments to collect the validation data: a LPM-321 (Fig. 3—left image) and a LPM 98-2k (Fig. 3—right image). Both the instruments operate at 905 nm wavelength, therefore, the penetration depths into the snow surface are only a few millimetres (Dozier and Painter 2004). They were positioned in a purpose-built shed, overlooking the valley (Fig. 1), and set to map the valley floor in a single scan window. The LPM-321 was used for the first campaign; for the subsequent campaigns, the LPM 98-2k was installed in a fixed, weatherproof transparent glass fibre enclosure. We set up the LPM 98-2k to continuously and automatically acquire scans from the study site approximately every 6 h, and a datalink allowed remote access [detailed setup description and technical specifications of the TLS instruments are provided in Adams et al. (2016)].

Fig. 3

TLS instruments Riegl LPM-321 (left image) and LPM 98-2k (right image) in operation at the study site on 11 February 2015 and 13 February 2015, respectively (Adams et al. 2016)

To georeference the TLS data, five RPs, consisting of 0.3–0.5 m rectangular aluminium plates, coated with highly reflective material, were installed in the target area prior to the UAS campaigns. Their positions were surveyed with a Trimble M3 total station [expected accuracy ± 0.002 m (1σ), plus 2 ppm distance-dependent error]. The RPs were scanned by the TLS instrument before and after mapping the valley floor. Point clouds from both the TLS instruments were processed in RiPROFILE (version 1.5.7). Here the locations of the RPs in the global coordinate system and the scanner's own coordinate system were linked by minimising the standard deviation of the residues. An unfavourable geometry of the measurement setup and the inherent scanning routine of the instruments caused inhomogeneous point distances. To counter this distance-dependent point density, mean z values were calculated within a 0.2 m raster (corresponding to the UASDSM resolution) and the raster centre location plotted for validation. The TLSDSMs themselves, rather than snow depth values calculated from them, were used to assess the accuracy of the UASDSMs. No additional co-registration of these DSMs was performed.
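The rasterisation of the TLS point cloud can be sketched as a simple binning operation: every point is assigned to a 0.2 m cell, and the mean z value of each cell is reported at the cell centre. The following is a minimal sketch under the assumption of a grid anchored at the coordinate origin; the function name `grid_mean_z` and the sample points are hypothetical.

```python
from collections import defaultdict
from math import floor


def grid_mean_z(points, cell_size=0.2):
    """Average z of all points falling into each square raster cell.

    points: iterable of (x, y, z) tuples.
    Returns {(col, row): (x_centre, y_centre, mean_z)}.
    """
    cells = defaultdict(list)
    for x, y, z in points:
        cells[(floor(x / cell_size), floor(y / cell_size))].append(z)
    return {
        (col, row): ((col + 0.5) * cell_size, (row + 0.5) * cell_size, sum(zs) / len(zs))
        for (col, row), zs in cells.items()
    }


# Two points share the first cell; the third falls into its neighbour.
points = [(0.05, 0.05, 1.0), (0.15, 0.10, 2.0), (0.25, 0.05, 5.0)]
for cell, (xc, yc, z_mean) in sorted(grid_mean_z(points).items()):
    print(cell, round(xc, 2), round(yc, 2), z_mean)
```

Averaging per cell gives densely sampled areas close to the scanner the same weight as sparsely sampled areas further away, which is the purpose of the filtering step.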

2.2.3 Manual Snow Depth Probing

MP measurements were used as reference for both February campaigns. The snowpack was sounded at each checkpoint with an avalanche probe. The checkpoints were distributed randomly within the AOI, roughly following a grid pattern to avoid spatial bias. At each checkpoint, five measurements were performed by probing the four corners and the centre of a 2 × 2 m square. The snow depths were recorded to the nearest centimetre. Additionally, a GNSS (Garmin GPSMap 64s) was used to record the geographic coordinates of the square's centre. The data was collected after completing campaign one, but is assumed to also be valid for campaign two, as the AWS recorded no intervening snowfall, and snow melt/settling was minimal (0.03 m). For validating the UAS-based snow depth maps, the centre location of the MP checkpoints was corrected by plotting them on the UAS orthophotos and manually adjusting their position. This was necessary as the Garmin GNSS has a nominal accuracy of only ± 3 m [1 standard deviation (σ)]. Additionally, it ensured the correct position of the checkpoints relative to the UAS results. To minimise the effect of the micro-topography below the snowpack on the results, the mean value of the five measurements was calculated. For accuracy assessment, the UASSDM values of all the pixels within a 2 m radius around a checkpoint were averaged (Adams et al. 2016).
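The comparison between probed and UAS-derived snow depth at one checkpoint can be sketched as follows. The sketch assumes a regular grid with a known origin and cell size and treats a cell as inside the 2 m radius if its centre is; the grid convention and the function names are illustrative assumptions.

```python
from math import hypot
from statistics import mean


def mean_sdm_within_radius(sdm, origin_x, origin_y, cell_size, cx, cy, radius=2.0):
    """Mean of SDM cells whose centres lie within `radius` of (cx, cy).

    sdm: list of rows; cell (row, col) is centred at
    (origin_x + (col + 0.5) * cell_size, origin_y + (row + 0.5) * cell_size).
    Returns None if no valid cell lies within the radius.
    """
    values = [
        depth
        for r, row in enumerate(sdm)
        for c, depth in enumerate(row)
        if depth is not None
        and hypot(origin_x + (c + 0.5) * cell_size - cx,
                  origin_y + (r + 0.5) * cell_size - cy) <= radius
    ]
    return mean(values) if values else None


def probe_error(sdm_value, probe_depths):
    """UAS snow depth minus the mean of the probe readings at one checkpoint."""
    return sdm_value - mean(probe_depths)


# Coarse 2 m demo grid: the third cell lies outside the 2 m radius.
sdm = [[1.0, 1.0, 9.0]]
uas_depth = mean_sdm_within_radius(sdm, 0.0, 0.0, 2.0, cx=2.0, cy=1.0)
print(uas_depth, probe_error(uas_depth, [0.9, 1.0, 1.1, 1.0, 1.0]))
```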

2.3 Accuracy and Precision Assessment

To evaluate the performance of the UAS for slope-scale snow depth mapping in alpine terrain, we need to answer the following questions: (i) How well do the UAS-based DSMs and SDMs correspond to measurements taken with established, state-of-the-art techniques? (ii) How reliable are the UAS results in terms of their reproducibility? These questions correspond to determining the accuracy and precision of the UAS results, respectively (Nolan et al. 2015).

2.3.1 Accuracy

Two reference data sets were used for accuracy assessment:

  1. The TLS measurements allowed an assessment of UASDSM accuracy at high spatial resolution (mean point distance: 0.2 m). The TLS instruments effectively surveyed a very large number of (pseudo-) checkpoints within the AOI, at high accuracy; the LPM-321 operates at a nominal accuracy of ± 0.025 m (1σ) plus a distance-dependent error of ≤ 20 ppm (Grünewald et al. 2010; Riegl 2010); the LPM 98-2k at ± 0.05 m (1σ), plus a distance-dependent error of ≤ 20 ppm (Schaffhauser et al. 2008; Riegl 2006). Considering all the areas surveyed by the TLS are within 300 m range of the instruments, the nominal TLS accuracy is ± 0.031 and ± 0.056 m (1σ) for LPM-321 and LPM 98-2k, respectively. However, these values assume that the area illuminated by the laser beam (the footprint) is circular; this implies an incidence angle θ = 0° on a planar surface (Prokop 2008). θ is defined as the angle between the vector normal to the measured surface and the incoming laser beam (Jörg et al. 2006). The size of the footprint (δ) generally increases with an increase of distance from the instrument, beam divergence and θ (when only considering planar surfaces) (Prokop et al. 2008). According to Jörg et al. (2006), δ remains below 1 m in diameter for close range (< 500 m) TLS measurements, even at unfavourable scanning angles (i.e., θ < 75°). In the present case, the TLS instrument surveys the valley floor from a small mound at the base of the slope east of the AOI, resulting in high θ (> 75°) and thus large δ values (> 1 m diameter). Therefore, we calculated θ and the resulting δ for the 11 February 2015 TLS data set (change in snow depth and the position of the TLS to the following campaigns were considered negligible). Subsequently, the correlation between δ and UASDSM error was determined. Following the general practice in statistics, the Bravais-Pearson correlation coefficient r was calculated for normally distributed data and the Spearman’s rank correlation coefficient \(r_{SP}\) for non-normally distributed data (Fahrmeir et al. 2011).

  2. The MP measurements were the basis for the assessment of the UASSDM accuracy. Snow depth values were surveyed at comparatively low spatial resolution (mean distance between checkpoints: 18 m). However, this data has a high vertical accuracy, as a majority of the checkpoints were located above the frozen ground; the penetration depth of the probe is therefore considered to be within ± 0.02 m. Similar values are reported in comparative studies (e.g., Harder et al. 2016; Nolan et al. 2015). As this area is an unmanaged (high-) alpine grassland, it features a jagged micro-topography with local terrain height variations in the decimetre range. However, the fine ground sampling distance of the UASDSMs (0.2 m) and the MP sampling routine (Sect. 2.2.3) are considered to be able to account for these variations.
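The nominal TLS accuracy values quoted for a 300 m range follow directly from adding the distance-dependent term to the base accuracy. A minimal worked sketch, with a hypothetical function name, reproduces the arithmetic:

```python
def nominal_accuracy(base_m, ppm, range_m):
    """Nominal 1-sigma accuracy (m): base term plus distance-dependent term."""
    return base_m + ppm * 1e-6 * range_m


# Base accuracies and 20 ppm term as quoted in the text, at 300 m range.
for name, base in (("LPM-321", 0.025), ("LPM 98-2k", 0.05)):
    print(name, round(nominal_accuracy(base, ppm=20, range_m=300.0), 3))
```

The distance-dependent term contributes 0.006 m at 300 m, so the base accuracy of each instrument dominates the nominal error budget at this range.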

Authors of the comparable studies (e.g., Fras et al. 2016; Hugenholtz et al. 2013; Harder et al. 2016) used checkpoints surveyed with high-accuracy GNSS as reference data. Such data are not included in the presented study, as it focusses on the area-wide, multitemporal evaluation of UAS-based photogrammetry of snow-covered surfaces with TLS. Such a comparison has only been marginally covered in the literature published to date (Sect. 1). Additionally, MP data was included for a direct assessment of the UAS’ snow depth mapping accuracy. This study focusses on vertical accuracy assessment, as no planimetric offset could be derived from MP or TLS data. Thus, the error of the UAS results was calculated as a difference in z value between UAS and the reference data sets (Müller et al. 2014).

We followed the accuracy assessment procedure outlined in Höhle and Höhle (2009), which was also adapted in similar studies [e.g., Fras et al. (2016) and Müller et al. (2014)]. Thus, the normality of UASDSM and UASSDM error distributions was checked by visually interpreting their histograms and quantile–quantile (Q–Q) plots. Q–Q plots juxtapose theoretical quantiles of a normal distribution with the quantiles of the empirical distribution function. If the latter is normally distributed, the Q–Q plot will result in a straight line; strong deviation indicates non-normal distribution (Höhle and Höhle 2009). Additionally, skewness and kurtosis were calculated. Based on recommendations from Höhle and Höhle (2009) and Willmott and Matsuura (2006), different accuracy measures were applied to normally and non-normally distributed errors (Table 3).

Table 3 Accuracy measures applied to normally (A) and non-normally (B) distributed errors; n is the number of tested points, and \(\Delta h_{i}\) denotes the difference from reference data for a point i (Höhle and Höhle 2009)
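Since the entries of Table 3 are not reproduced in the text, the following sketch assumes the measures commonly taken from Höhle and Höhle (2009): RMSE, mean and standard deviation for normally distributed errors, and median, normalised median absolute deviation (NMAD) and the 68.3% and 95% quantiles of the absolute errors for non-normally distributed errors. The function names and the linear-interpolation quantile are illustrative choices.

```python
from math import sqrt
from statistics import mean, median, stdev


def quantile(values, q):
    """Linear-interpolation sample quantile for 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def normal_measures(dh):
    """Accuracy measures for normally distributed height errors dh (m)."""
    return {
        "RMSE": sqrt(mean(d * d for d in dh)),
        "mean": mean(dh),
        "SD": stdev(dh),  # sample standard deviation (n - 1)
    }


def robust_measures(dh):
    """Accuracy measures for non-normally distributed height errors dh (m)."""
    m = median(dh)
    abs_dh = [abs(d) for d in dh]
    return {
        "median": m,
        "NMAD": 1.4826 * median(abs(d - m) for d in dh),
        "Q68.3": quantile(abs_dh, 0.683),
        "Q95": quantile(abs_dh, 0.95),
    }


dh = [-0.1, 0.0, 0.1, 0.2, 0.3]  # hypothetical error sample in metres
print(normal_measures(dh))
print(robust_measures(dh))
```

The factor 1.4826 scales the median absolute deviation so that NMAD is comparable to the standard deviation when the errors are in fact normally distributed.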

All the data sets were referenced to and compared within common global projected planimetric (MGI Austria GK West; EPSG Code 31254) and vertical coordinate systems (Gebrauchshöhen Adria; WKID 5778).

2.3.2 Precision

Performing several UAS flights per campaign day allowed assessing the precision, i.e., the reproducibility of the UAS results. As argued by Fabris and Pesci (2005) and Nolan et al. (2015), precision assessment of photogrammetric DSMs by intercomparison generally provides the basis for two different assumptions: (i) if the intervening changes of the reference surfaces between flights are negligible, it yields an empirical estimate of the internal precision of the UAS data acquisition and processing setup; (ii) in case real height changes of the snow surface (e.g., due to wind drift, snow fall, snow melt/settling) occur between two flights, it allows one to track the magnitude of these changes.

On all the campaign days, the AWS recorded air temperatures below 5 °C, snow temperatures below −3 °C, calm winds (< 3 m s⁻¹), no precipitation and a snow settling of less than 0.03 m. We, therefore, follow assumption (i) in this paper when interpreting the precision assessment results. Precision of the UASSDMs was determined following Fabris and Pesci (2005), by calculating \(\Delta h_{i}\) residuals for each pixel of two UAS flights performed on the same day (Table 4). Standard deviation (SD) of the \(\Delta h_{i}\) residuals distribution was reported as precision value. No separate precision calculations were performed for UASDSMs, as the reference ‘snow-off’ UASDSM was the same for all the campaigns. We performed the assessment for:

  i. A small area we considered to be the best-case scenario (snow heavily compacted, therefore, intermittent snow depth change was zero; high-contrast snow surface, therefore, high-point density expected in photogrammetric processing; planar surface with little elevation change, therefore, there is no influence of topography on the result).

  ii. The whole AOI.
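The precision measure described above (standard deviation of the per-pixel differences between two same-day SDMs) can be sketched as follows. The nested-list raster layout with `None` for missing cells and the function name `precision_sd` are assumptions made for this illustration.

```python
from statistics import stdev


def precision_sd(sdm_a, sdm_b):
    """SD (m) of per-pixel differences between two SDMs on the same grid.

    Cells that are None in either map are skipped.
    """
    residuals = [
        a - b
        for row_a, row_b in zip(sdm_a, sdm_b)
        for a, b in zip(row_a, row_b)
        if a is not None and b is not None
    ]
    return stdev(residuals)


flight_1 = [[1.0, 1.2], [0.8, None]]  # hypothetical snow depths in metres
flight_2 = [[0.9, 1.2], [0.9, 1.0]]
print(round(precision_sd(flight_1, flight_2), 3))
```

Restricting the input rasters to the best-case area or to the whole AOI yields the two variants of the assessment listed above.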

Table 4 Overview of UAS campaigns, details of data acquisitions (columns one through three), camera settings and output (‘imagery’ columns), (pre-) processing results as reported in PhotoScan (‘photogrammetric processing’ columns) (Adams et al. 2016)

3 Results and Discussion

3.1 Unmanned Aerial System

Four ‘snow-on’ UAS campaigns were conducted between 11 February and 13 March 2015; details on data acquisition, camera settings and quality reports from photogrammetric (pre-) processing are presented in Table 4. 12 UAS flights were performed to record approximately 11,000 images in total, of which 9500 were used in photogrammetric image processing. Seven VIS, one NIR700 and four NIR830 data sets were acquired between 10.30 a.m. and 4 p.m. Each flight took approximately 35 min. Camera settings were chosen according to the illumination conditions prior to UAS launch; priority was given to exposure (1/320–1/500), ISO was set at 100 for most flights, while aperture was adapted dynamically by the camera for each image (typically between f/4 and f/18).

Results from the QI calculation showed that two-thirds of the UAS imagery have a satisfactory average QI above 0.62. Low average QI values were reported for imagery recorded on flights 1/3 (0.52), 2/1 (0.57) and 3/2 (0.40); QI calculation failed for imagery from flights 1/1 and 4/2 (QI = 0) (Table 4). A visual check of the data sets confirmed a large amount of blurry imagery on flight 1/1, possibly due to an error in data acquisition; no apparent deficiencies with regard to image sharpness were detected in the other imagery. As the calculation of the QI is poorly documented and therefore essentially a black box, no details on the impact of other deficiencies in UAS imagery are available. Therefore, the reason for the low QI of some UAS imagery is unknown. Average overlap was nine, indicating that each point within the AOI was, on average, visible in nine UAS images. The lowest overlap was calculated for flight 2/2, which, in turn, also features the highest marker error (0.22 m) and reprojection error (0.9 m/3.7, RMS/maximum error) of the ‘snow-on’ flights; all the marker errors reported in this section are mean values of all seven RPs. The highest overlap by far (36) was achieved for the ‘snow-off’ UAS campaign. This was owed to the flight path design, which consisted of crossing flight lines parallel and orthogonal to the valley axis, as opposed to the winter flight lines, which were always orthogonal. The marker error for the summer reference flight was markedly higher (0.4 m) than the average winter marker error (0.13 m). This may be due to the low accuracy of the GNSS instrument used to survey the RPs (Sect. 2.2.1). Reprojection errors for all the data sets were below 0.4 m (RMSE) and below 1.9 m (maximum error).
In the presented case, the sample size (n = 13) is too small to calculate a statistically significant correlation between overlap, quality index and marker/reprojection error. A visual interpretation of the results points to high overlap (> 8.9, when excluding outlier 36) leading to low marker error (< 0.15 m) and vice versa (valid for all flights except 2/4); little or no connection was found between the other parameters. This confirms results from previous studies, which have shown that high image overlap generally leads to a high signal-to-noise ratio in photogrammetric outputs and therefore low error at the RPs (Zongjian et al. 2012; Haala 2011). This holds true especially when mapping low contrast surfaces, such as snow (Vander Jagt et al. 2015; Harder et al. 2016) or sand (Klemas 2015; Mancini et al. 2013).

In total, 12 ‘snow-on’ and one ‘snow off’ orthophoto and DSM, as well as 12 SDMs for all the ‘snow-on’ flights were calculated. An example for an SDM (a), orthophoto (b) and hillshaded DSM (c) of flight 3/1, are shown in Fig. 4.

Fig. 4

Results from photogrammetric processing of UAS imagery, generated on flight 3/1; SDM (a), orthophoto (b), hillshaded DSM (c)

3.2 Terrestrial Laser Scanning and Manual Snow Depth Probing

Four TLS scans were selected for accuracy assessment of UASDSMs, based on their temporal proximity to UAS flights, quality and completeness. One scan was performed with the LPM-321 (11 February 2015), three with the LPM 98-2K (14 February, 3 and 11 March 2015). The details of these scans are provided in Table 5. As described above, the number of measured TLS points (column ‘AOI’) was subsequently reduced to mitigate range bias (column ‘Filtered’). In the section ‘Point distances’, descriptive statistics of the unfiltered point clouds are reported: The mean distance between points and consequently 1σ is lowest for the LPM-321 measurement (0.13 and 0.1 m, respectively). Average values for mean and 1σ of point distances for all the campaigns are at 0.21 and 0.16 m, respectively. Both measurements performed in March were recorded at lower resolution than the February data sets (0.107° and 0.054° azimuth resolution, respectively) and thus show larger point distances. Average geolocation residues (column ‘Standard deviation residues’) were 0.09 m and show no connection with the type of instrument used.

Table 5 Details of collected TLS data (Adams et al. 2016)

Results from the analysis of θ for the TLS measurements conducted on 11 February 2015 show a normal distribution around a median of 82° (1σ = 3.9°). The corresponding δ values are left-skewed (skewness = 230) and comparatively large (median = 0.47 m², 68.3% quantile = 1.24 m²), considering the close range (< 300 m) and small beam divergence of the LPM-321 (typically 0.8 mrad). This confirms the assumption of an unfavourable measurement setup for TLS validation. An analysis of the spatial distribution of δ values shows they are dominated by range, due to increasingly acute viewing angles (r SP = 0.94 between δ and θ). By comparison, increasing divergence of the laser beam with range (diameter increases by 0.24 m between 0 and 300 m) and local variations of the terrain slope angle have less influence on the result (r SP = − 0.3 between δ and slope angle). We also checked the correlation between δ and TLS error values for all the flights and found none (r SP between − 0.07 and 0.04). To sum up, although the calculated δ values are relatively large compared to the size of the UASDSM pixels, δ is independent of the magnitude of error in the UASDSMs.
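The relation between range, beam divergence and incidence angle can be illustrated with a short sketch. It is not the footprint calculation used in this study: it assumes a linearly diverging beam with negligible exit diameter, a locally planar surface, and θ measured from the surface normal, so that the footprint is an ellipse stretched by 1/cos θ. The function names are hypothetical.

```python
import math


def beam_diameter(range_m, divergence_mrad, exit_diameter_m=0.0):
    """Beam diameter [m] at a given range for a linearly diverging beam.

    Assumption: small-angle approximation, diameter grows by range * divergence.
    """
    return exit_diameter_m + range_m * divergence_mrad * 1e-3


def footprint_area(range_m, divergence_mrad, incidence_deg, exit_diameter_m=0.0):
    """Area [m^2] of the elliptical footprint on a planar surface.

    Assumption: incidence angle is measured from the surface normal, so the
    footprint is elongated by 1 / cos(incidence) along the viewing direction.
    """
    if not 0.0 <= incidence_deg < 90.0:
        raise ValueError("incidence angle must be in [0, 90) degrees")
    minor = beam_diameter(range_m, divergence_mrad, exit_diameter_m)
    major = minor / math.cos(math.radians(incidence_deg))
    return math.pi / 4.0 * minor * major


if __name__ == "__main__":
    # 0.8 mrad divergence at 300 m range, as quoted for the LPM-321
    print(beam_diameter(300.0, 0.8))
    # Footprint at perpendicular incidence versus the median incidence of 82 degrees
    print(footprint_area(300.0, 0.8, 0.0), footprint_area(300.0, 0.8, 82.0))
```

Under these assumptions, the divergence term alone reproduces the 0.24 m diameter increase over 300 m, while the 1/cos θ term grows rapidly at high incidence angles, which is consistent with viewing geometry dominating δ.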

A total of 149 MP checkpoints were measured in the late afternoon of 11 February 2015. The grid pattern of the data collection routine had an average spacing of 18 m. Average snow depth was 0.83 m, with a maximum of 1.4 m. Snow depth differences within the 2 × 2 m plots at each checkpoint were as high as 0.8 m, with an average of 0.23 m.

3.3 Accuracy and Precision Assessment

3.3.1 Error Distribution

The histograms and Q–Q plots showed that distributions of UASDSM and UASSDM error followed a characteristic pattern. Examples of each type of distribution and plot are shown in Fig. 5.

Fig. 5 Visualisation of the error distributions of UASDSMs (a, b) and UASSDMs (c, d) for flight 2/1; histograms (a, c) of Δh i in [m] with superimposed normal distribution, frequency corresponds to the number of measurements; Q–Q plots of Δh i (b, d) (Höhle and Höhle 2009)

The UASDSM error distributions (Fig. 5a, b) show a large number of values around the median (± 0.5 m) and a clear deviation from the superimposed normal distribution in the histogram. The Q–Q plot confirms the impression of a non-normal error distribution; the deviation of the plotted values from the straight line indicates a large number of outliers and therefore heavy tails of the error distribution (Höhle and Höhle 2009). These observations agree with the general notion that non-normal error distributions are very common in photogrammetric DSMs, as stated in textbooks and related studies (e.g., Müller et al. 2014; Maune 2007). The UASSDM error distribution, on the other hand (Fig. 5c, d), shows good agreement with the normal distribution in the histogram. This observation is confirmed by the Q–Q plot; the plotted values are mostly located close to the line, indicating close resemblance between the empirical quantiles and the theoretical quantiles of a normal distribution (Höhle and Höhle 2009). Skewness and kurtosis are shown in Table 6; skewness remains within ± 3 (except for flight 2/4) for UASDSM and UASSDM errors; however, kurtosis is high for UASDSM errors (mean = 100, if excluding the outlier flight 2/4) and very low (mean = 0.6) for the UASSDM errors (also apparent in the histograms in Fig. 5a, c). The general difference in error distributions and kurtosis values between UASDSM and UASSDM could be explained by the fact that the majority of outliers in laser scanning and photogrammetric DSMs are caused by objects with high vertical offset from the terrain (Höhle and Höhle 2009). In the presented case, the TLS surveys the height of the snow surface and of objects with high vertical offset (i.e., buildings, boulders, trees), thus potentially generating outliers (and therefore high kurtosis values); the MP routine, however, only samples the snow surface, so the accuracy assessment of UASSDM is less prone to outliers and shows very low kurtosis values.
Based on these observations, a normal distribution is assumed for the UASSDM errors, and a non-normal distribution for the UASDSM errors.
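As an illustration of how skewness and kurtosis reflect outliers, the following sketch computes both from a list of height errors Δh. It uses the simple moment-based (population) estimators and reports excess kurtosis (0 for a normal distribution); whether Table 6 uses exactly these estimators is an assumption, and the function name is hypothetical.

```python
def skewness_and_excess_kurtosis(errors):
    """Moment-based skewness and excess kurtosis of a sequence of errors.

    Assumption: population (biased) estimators; excess kurtosis is 0 for a
    normal distribution and large for heavy-tailed distributions.
    """
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(errors) / n
    m2 = sum((e - mean) ** 2 for e in errors) / n
    m3 = sum((e - mean) ** 3 for e in errors) / n
    m4 = sum((e - mean) ** 4 for e in errors) / n
    if m2 == 0.0:
        raise ValueError("all values are identical")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    clean = [-0.2, -0.1, 0.0, 0.1, 0.2]
    with_outlier = clean + [5.0]  # e.g., a checkpoint on a building facade
    print(skewness_and_excess_kurtosis(clean))
    print(skewness_and_excess_kurtosis(with_outlier))
```

A single large error is enough to raise the kurtosis markedly in this toy example, which mirrors the contrast between the TLS-based and MP-based assessments described above.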

Table 6 Overview of statistical analysis of the UASDSM and UASSDM errors for all flights

3.3.2 Accuracy Assessment of UASDSMs

Results of the UASDSM accuracy assessment are provided in Table 6 and visualised in boxplots (Fig. 6). The results show a high error variability between the UAS flights. Low values for NMAD (< 0.2 m), 68.3% quantile (< 0.25 m) and 95% quantile (< 0.55 m) were determined for approximately half the flights (e.g., 1/3, 2/2 or 3/2). This translates to 68.3 and 95% of the UASDSM errors of these flights being within a magnitude of ± 0.25 and ± 0.55 m, respectively (Höhle and Höhle 2009). The assessment of at least three flights (i.e., 1/1, 2/4 and 4/3), however, shows comparatively high values for the above-mentioned robust accuracy measures (> 0.3, > 0.4 and ≥ 1 m, respectively). The error medians are all within ± 0.2 m (except for flight 4/3). We analysed the correlation between indicators from the photogrammetric processing report (i.e., QI, overlap, marker error and reprojection error) and results from the UASDSM accuracy assessment (i.e., median, NMAD, 68.3 and 95% quantiles). The highest correlations were found between marker error and NMAD (r = − 0.39), and marker error and 68.3% quantile (r = − 0.35); all the other pairings were |r| < 0.3. It therefore seems that, from a statistical viewpoint, the photogrammetric processing indicators have little explanatory power regarding the accuracy of the UASDSM when assessed with TLS. This could be because the chosen indicators are not statistically related to UASDSM accuracy, or because the sample size is too small to reveal a correlation between these indicators.
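The robust accuracy measures referred to above can be sketched as follows, using the definitions given by Höhle and Höhle (2009): the median, the NMAD (1.4826 times the median absolute deviation from the median), and the 68.3% and 95% quantiles of the absolute errors. The quantile interpolation scheme and the function names are assumptions made for this illustration and may differ from the software used in the study.

```python
import statistics


def quantile(values, p):
    """Sample quantile with linear interpolation between order statistics."""
    s = sorted(values)
    if not s:
        raise ValueError("empty input")
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def robust_accuracy(errors):
    """Median, NMAD and quantiles of absolute errors for non-normal errors."""
    med = statistics.median(errors)
    nmad = 1.4826 * statistics.median([abs(e - med) for e in errors])
    abs_errors = [abs(e) for e in errors]
    return {
        "median": med,
        "nmad": nmad,
        "q683": quantile(abs_errors, 0.683),
        "q95": quantile(abs_errors, 0.95),
    }


if __name__ == "__main__":
    # Hypothetical height errors [m] between UAS DSM and TLS checkpoints
    dh = [-0.2, -0.1, 0.0, 0.1, 0.2]
    print(robust_accuracy(dh))
```

Because all four measures are based on order statistics, a small number of extreme errors (e.g., at building façades) has little effect on them, which is why they are preferred here over mean and RMSE.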

Fig. 6 Boxplots of UASDSM error (left image) and UASSDM error (right image); y-axis shows Δh i [m], ID of UAS flight (campaign/flight number) are plotted on x-axis; whiskers in boxplot correspond to 1σ, outliers not shown; boxplots truncated at 1/− 1.5 m for better visualisation

Exemplary results of the spatial distribution of errors from flights 1/1 through 1/3 are provided in Fig. 7. They show that the high (mostly negative) UASDSM errors of flight 1/1 reported in Table 6 occur mainly in the east (near the TLS instrument) and far west of the AOI (close to a large boulder and stone pine cluster; see inset in Fig. 1). The UASDSM errors of flight 1/2 are mostly positive and are located in the AOI centre. While the overall error in the UASDSM of flight 1/3 is low (i.e., 68.3% quantile = 0.23 m, most plotted checkpoints within ± 0.2 m), an area of approximately 0.002 km² in the central part of the AOI shows high errors (0.5–1 m), surrounded by a 0.012 km² area with errors in the 0.2–0.5 m range. Additionally, outliers were identified (i.e., errors outside the above-mentioned 95% quantile) and their location mapped for all the UAS flights. They mainly occur in the central and northern area of the AOI (Fig. 8, left image). Furthermore, the number of outlier occurrences was counted in each 0.2 m grid cell (equivalent to UASDSM cell size); high values are predominant on steep rock faces of large boulders (central AOI section, not shown) and the façades of buildings (Fig. 8, right image). A small number of pixels (n = 46) within these areas were classified as outliers on all the flights. This further confirms the observations described in Sect. 3.3.1. Generally, the statistical and spatial representations of error magnitude are complementary. However, the latter allows a more goal-oriented analysis of factors influencing UASDSM/UASSDM error. An example, presented in a related publication, is the potential of using NIR sensors to collect UAS imagery under very poor illumination conditions and its impact on the accuracy of UASDSMs/UASSDMs (Bühler et al. 2017).
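The outlier counting described above can be sketched as follows. For each flight, errors whose magnitude exceeds that flight's 95% quantile of absolute errors are flagged, and the number of flags is accumulated per grid cell. The data layout (one dictionary per flight, keyed by a (row, column) cell index) and all names are assumptions made for this illustration, not the implementation used in the study.

```python
from collections import Counter


def abs_quantile(errors, p):
    """Quantile of absolute errors with linear interpolation."""
    s = sorted(abs(e) for e in errors)
    if not s:
        raise ValueError("empty input")
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def count_outlier_occurrences(flights, p=0.95):
    """Count, per grid cell, on how many flights the error was an outlier.

    flights: list of dicts mapping a cell index (row, col) to the error [m]
             of that cell on one flight (hypothetical layout).
    """
    counts = Counter()
    for cell_errors in flights:
        threshold = abs_quantile(cell_errors.values(), p)
        for cell, err in cell_errors.items():
            if abs(err) > threshold:
                counts[cell] += 1
    return counts


if __name__ == "__main__":
    # Two hypothetical flights; cell (5, 5) lies on a facade in both
    flight = {(0, i): 0.01 * i for i in range(19)}
    flight[(5, 5)] = 5.0
    print(count_outlier_occurrences([flight, dict(flight)]))
```

Cells whose count equals the number of flights correspond to the pixels classified as outliers on all the flights.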

Fig. 7 Results from accuracy assessment of UASDSMs recorded on 11 February 2015 [flights 1/1 (a), 1/2 (b) and 1/3 (c)]; reds indicate negative, blues positive Δh i values (Adams et al. 2016)

Fig. 8 Location and occurrence of outliers; outlier heatmap of AOI, where darker red indicates more outliers in this area (left image); number of outlier occurrences at largest hotspot (red rectangle), coloured blue = 2 through red = 7 or more, per 0.2 m grid cell (right image); UAS orthophoto from 21 August 2015 in background of both figures

3.3.3 Accuracy Assessment of UASSDMs

Results of the UASSDM error analysis are also included in Table 6 and Fig. 6 (boxplots, right image). The error median and five out of seven upper quantiles are below zero, indicating a systematic offset between the two data sets: snow depth values mapped with the UAS were generally lower than snow depth values measured with MP. The average of this offset was − 0.19 m. As described in detail in related publications (Adams et al. 2016; Bühler et al. 2016), these irregularities are caused by the interaction of vegetation with the snow cover. To exclude systematic error from the subsequent calculation of accuracy measures, mean, MAE and RMSE were determined after subtracting the offset from the measured values (Table 6). Compared to the UASDSM errors described above, the boxplots show a lower spread of the UASSDM errors. The UASSDM RMSE of the middle five UAS flights (1/2 through 2/3) is ≤ 0.28 m, while flights 1/1 and 2/4 show a higher RMSE (> 0.37 m). The same holds true for the reported MAE (≤ 0.23 and > 0.29 m, respectively) and SD (≤ 0.26 and > 0.35 m, respectively) (Fig. 6, right image). As with the UASDSM errors, we analysed the correlation between UASSDM accuracy measures and indicators reported during photogrammetric processing; the highest r values were found for QI − SD (r = − 0.65), RMS reprojection error − SD (r = − 0.64), RMS reprojection error − MAE (r = − 0.68), QI − RMSE (r = − 0.67) and RMS reprojection error − RMSE (r = − 0.65); all the other pairings were |r| < 0.6. Although, in absolute terms, these correlations are again not very strong, they indicate that a higher QI and/or lower RMS reprojection error may result in an overall higher accuracy of the UASSDMs.
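The bias correction and the standard accuracy measures used for the normally distributed UASSDM errors can be sketched as follows. The offset is passed in explicitly; how it was estimated per flight in the study is not restated here, and the function name and example values are hypothetical.

```python
import math
import statistics


def standard_accuracy(errors, offset=0.0):
    """Mean, MAE, RMSE and SD of errors after removing a systematic offset.

    errors: UAS snow depth minus manually probed snow depth [m]
    offset: systematic offset [m] to subtract (e.g., a negative bias)
    """
    corrected = [e - offset for e in errors]
    n = len(corrected)
    if n < 2:
        raise ValueError("need at least two values")
    return {
        "mean": sum(corrected) / n,
        "mae": sum(abs(e) for e in corrected) / n,
        "rmse": math.sqrt(sum(e * e for e in corrected) / n),
        "sd": statistics.stdev(corrected),  # sample standard deviation
    }


if __name__ == "__main__":
    # Hypothetical errors [m] with a negative bias of -0.19 m
    dh = [-0.39, -0.19, 0.01]
    print(standard_accuracy(dh, offset=-0.19))
```

Without the offset correction, the bias would inflate MAE and RMSE, whereas SD is unaffected by a constant shift; this is why the three measures are compared only after the offset has been removed.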

3.3.4 Precision

As highlighted in related publications concerning UAS-based snow depth mapping (Bühler et al. 2016, 2017), illumination of the snow surface, sensor choice and the presence of minor disturbances of the snow surface (e.g., due to ski tracks) have a large impact on structure-from-motion image matching and therefore on the accuracy and precision of UASDSMs and UASSDMs. Thus, only data collected under similar illumination conditions with the same VIS sensor setup from the same day (minimal disturbance of the snowpack by MP measurements) were considered in this precision assessment (Table 5). None of the available NIR data fitted these criteria; therefore, the precision assessment was limited to VIS data. A comparison of flights 1/2 versus 1/3 and 4/1 versus 4/2 for precision assessment is presented in Fig. 9. The profile plot shows snow depth values for four UASSDMs mapped along a 90 m stretch of cleared road (transect A–B). In three of the four UASSDMs in Fig. 9, negative snow depth values occur (grey patches), especially in the flight 1/3 profile plot. This can be explained by the very shallow snow depths along the cleared road (Sect. 2.3.2) and the slightly negative bias of the UASSDM error (Fig. 6, Sect. 3.3.3). As described above, the precision is reported as the SD of the residues between the two acquisitions of each pair. For the profile line in Fig. 9, this was 0.04 m for both flight comparisons. For the entire AOI (approximately 6 million pixels), analysis of the residues showed they were non-normally distributed; following the procedure applied to the accuracy assessment above, the 68.3 and 95% quantiles are reported: 0.25 and 0.55 m for 11 February, 0.33 and 0.99 m for 13 March, respectively.
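The precision measure for the profile line can be sketched as the standard deviation of the cell-wise residues between two snow depth acquisitions. The sketch assumes both acquisitions are sampled at the same positions and uses None as a hypothetical no-data marker; names and example values are illustrative only.

```python
import statistics


def precision_sd(sdm_a, sdm_b):
    """SD of residues between two co-located snow depth samples [m].

    Pairs in which either value is None (hypothetical no-data marker)
    are skipped.
    """
    if len(sdm_a) != len(sdm_b):
        raise ValueError("acquisitions must have the same length")
    residues = [a - b for a, b in zip(sdm_a, sdm_b)
                if a is not None and b is not None]
    if len(residues) < 2:
        raise ValueError("need at least two valid pairs")
    return statistics.stdev(residues)


if __name__ == "__main__":
    # Hypothetical snow depths [m] along a transect from two flights
    flight_a = [0.10, 0.20, None, 0.30, 0.40]
    flight_b = [0.12, 0.18, 0.25, 0.32, 0.38]
    print(precision_sd(flight_a, flight_b))
```

For area-wide residues that are not normally distributed, quantiles of the absolute residues are the more appropriate summary, as reported above for the entire AOI.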

Fig. 9 Precision assessment of UASSDMs; overview AOI snow depth values for flight 1/2 (a); detail-view of area marked with red rectangle for flights 1/2, 1/3, 4/1 and 4/2 (top row); snow depth values along profile (A–B) for all flights (b)

3.4 Comparable Studies

To relate these results to the existing literature, Table 7 provides an overview of recent, comparable studies dealing with accuracy assessment of UAS photogrammetry for snow depth mapping. A direct comparison of findings from the presented study with comparable studies is limited, because:

  i. Most studies are based on one to three UAS winter flights (all except Harder et al. 2016), thus temporal accuracy change cannot be investigated.

  ii. Different kinds of reference data, with varying nominal accuracies, were used.

  iii. The employed methodologies for data processing and error analysis varied considerably and different accuracy measures were reported.

  iv. The size and topography of the AOI varied considerably (0.007–0.65 km²; steep mountain slopes vs. flat areas), incurring size- and terrain-effects on the results.

Table 7 Overview of results from accuracy assessments in comparable studies for UAS-based snow depth mapping; geolocation errors reported in some listed studies are not included in this overview; multiple rows per study correspond to different sites and/or different reference measurements and related results

Setting these difficulties aside, the accuracy reported here appears to be within a similar range to that of the related studies. However, the implementation of recently available high-accuracy geolocation routines (e.g., real time kinematic GNSS) in particular is able to substantially increase accuracy (Harder et al. 2016).

Of the studies presented in Table 7, only four report on the precision of UASDSMs/UASSDMs: De Michele et al. (2016), Lendzioch et al. (2016) and Vander Jagt et al. (2015) report precision as the SD of UASSDM values of a single flight (0.1, 0.22–0.45, 0.21 m, respectively); Bühler et al. (2016) determined the precision by calculating the SD of residues for several UAS acquisitions along a limited, stable area (SD < 0.1 m), similar to the method used here. Precision results from the presented study are, therefore, in a similar range as in comparable studies, both with regard to small, stable areas and area-wide estimates.

4 Conclusions

In this work, we present a multitemporal assessment of the accuracy and precision of fixed-wing UAS photogrammetry for slope-scale snow depth mapping in alpine terrain. VIS and NIR imagery was collected with a modified off-the-shelf digital camera, mounted on a custom-built fixed-wing UAS. We performed 12 UAS flights during four campaigns in February and March 2015 in ‘snow-on’ and one in August 2015 in ‘snow-off’ conditions. The data were collected at a flat 0.12 km2 study site, located at approximately 2000 m a.s.l. While all the UAS imagery was processed with the same parameter settings in structure-from-motion photogrammetry software, it was collected under different site- and UAS-specific settings. This allowed testing the setup under different conditions and investigating factors influencing the quality of the UASDSMs and UASSDMs. Our assessment followed a threefold approach:

  1. Accuracy assessment of UASDSMs with TLS reference data.

  2. Accuracy assessment of UASSDMs with MP reference data.

  3. Precision assessment of UASSDMs by intercomparison of multiple UAS results.

Ad (1) To determine how well UAS photogrammetry was able to map the absolute height of the snow surface, an accuracy assessment of the UASDSMs was performed with high-resolution, high-accuracy TLS reference data. While the choice of the study site location in a flat valley floor benefitted the UAS data acquisition (e.g., easy accessibility for MP and RP measurements), it resulted in an unfavourable TLS setup. The only slightly elevated location of the instrument over the valley floor resulted in a large number of observations with high incidence angles, and caused large TLS footprints and occlusions (45% of AOI beyond TLS line-of-sight). However, the results showed no correlation between UASDSM error magnitude and footprint size. Therefore, we considered the TLS data valid for UASDSM accuracy assessment. UASDSM error distribution was non-normal, but without systematic bias. The heavy-tailed error distribution resulted from outliers, which typically occurred at steep rock faces, trees or building façades. Robust accuracy measures were calculated for UASDSM error and error maps were interpreted visually. The results showed:

  i. Low errors were mostly observed for VIS or NIR UAS imagery acquired with the AOI in full sunlight (68.3% quantile ≤ 0.35 m; 95% quantile ≤ 0.57 m).

  ii. High errors were determined for VIS UAS imagery acquired with the AOI shadowed or with a high proportion of blurry images (68.3% quantile > 0.41 m; 95% quantile > 0.96 m).

  iii. For NIR imagery collected with the AOI shadowed, one flight showed errors in the same range as for imagery collected in full sunlight; a similar scenario in a different flight, however, showed very high errors. Visual interpretation suggested an underlying systematic error (e.g., poor image alignment due to changing illumination caused by shadow moving across AOI during image acquisition).

Ad (2) If UAS imagery was available for two or more points in time, height differences were calculated between the UASDSMs. In this case, one ‘snow-off’ UASDSM was compared to several ‘snow-on’ UASDSMs to determine relative surface height change, i.e., UASSDMs. The accuracy of UASSDMs was assessed with MP data, collected for the February campaigns. The UASSDM error was distributed normally, thus standard accuracy measures were applied. The results show a negative bias in the data, caused by the interaction of vegetation and snow cover (Adams et al. 2016; Bühler et al. 2016). This problem has also been described in comparable studies by other authors (e.g., Marti et al. 2016; Vander Jagt et al. 2015; Nolan et al. 2015). After correction of this bias, UASSDM errors follow a pattern similar to the UASDSM errors (RMSE < 0.31 and > 0.39 m for flights recorded under good and poor illumination conditions, respectively). The results also showed some indication of high QI and low RMS reprojection error being correlated to low UASSDM error.

Ad (3) The precision of snow depth mapping with UAS photogrammetry was assessed by intercomparison of two sets of UASSDMs recorded under similar site-specific settings (VIS imagery acquired with AOI in full sunlight). The results showed that over a small area with negligible intermittent height change, the normally distributed residues were within 0.04 m (SD) for both comparisons; over the whole AOI, the 68.3% quantiles of the non-normally distributed residues were within ± 0.25 to ± 0.33 m.

UAS-based snow depth mapping provides a reliable source of snow depth information at an unprecedented level of detail (decimetres to centimetres). On-demand UAS surveys at the slope-scale can be performed cost-efficiently to provide details on snow depth distribution and allow visual interpretation of the snow surface with orthophotos. In the presented case, testing different setups and sensors, and evaluating the spatial and temporal variations of accuracy and precision, provides further insight into UAS-based snow depth mapping. However, the employed indirect georeferencing technique proved to be a considerable drawback, because it was very time-consuming, limited the achievable accuracy and reduced the benefits of close-range sensing (Adams et al. 2016). Additionally, the TLS setup showed some weaknesses due to high incidence angles. Further developments could include applying the presented technique to an AOI with a geometric setup more suited to TLS measurements or the use of additional sensors for snow quality, rather than only snow quantity mapping.