1 Introduction

For grassland management decisions, spatial information on sward growth and forage mass is important (Castle 1976; Catchpole and Wheeler 1992; Schellberg et al. 2008). In the last decade, the developments in unmanned aerial vehicle-based (UAV) sensing systems enable the acquisition of image data in ultrahigh spatial resolution for important phenological growth stages (Bareth et al. 2011; Zhang and Kovacs 2012; Colomina and Molina 2014). Furthermore, new photogrammetric software products support the analysis of such image data to produce 3D point clouds and digital surface models (DSMs) using Structure from Motion (SfM) and Multi-View Stereopsis (MVS) (Harwin and Lucieer 2012; Bendig et al. 2013). The analysis of multi-temporal DSMs supports the monitoring of sward or crop height development in high spatial resolution (Hoffmeister et al. 2010; Bareth et al. 2016).

In grasslands, the information of sward height development is closely related to forage mass determination (Fricke et al. 2011). Since the 1960s, the so-called rising plate meters (RPMs) are successfully used to obtain point measurements of compressed sward height as an estimator for forage mass (King et al. 1986; Sanderson et al. 2001). Such measurements are, however, labour-intensive and costly, and do not cover the area of interest (e.g., field or plots) completely (Harmoney et al. 1997; Goulding et al. 2008). Hence, the development of methods for spatially explicit monitoring of sward height as an estimator for forage mass is desired and several promising approaches exist (Reddersen et al. 2014; Bareth et al. 2015; Wachendorf et al. 2017). Hardin and Jackson (2005), Rango et al. (2009), and Laliberte et al. (2010) describe the potential of UAV-based monitoring of grasslands and rangelands. An early work on using UAV-derived RGB images for DSM generation for monitoring grassland traits is provided by Bareth et al. (2015). The authors report an R2 of 0.89 between RPM measurements and UAV-derived sward height. Lately, Wijesingha et al. (2019) investigated terrestrial laser scanning (TLS) and low-cost UAV-based image acquisition for SfM/MVS analysis for biomass prediction and report promising results for both methods. For grazing systems, Gillan et al. (2019) achieved an R2 of 0.78 and Michez et al. (2019) of 0.23–0.49. Bareth and Schellberg (2018) investigated the correlation of low-cost UAV-derived DSMs with RPM data and achieved a high R2 of 0.86 for a 3-year data analysis. Lussem et al. (2019) report varying performance (R2 between 0.57 and 0.73) of low-cost UAV-derived DSMs for estimating forage mass, depending on the different harvest cuts, but RPM measurements outperformed the UAV approach. Similar performance of UAV-based sward height metrics are described by Grüner et al. (2019) and Borra‐Serrano et al. (2019). The latter study used a combined approach from UAV-derived DSMs and vegetation indices, reporting a high R2 of 0.81. Näsi et al. (2018), Viljanen et al. (2018), and Zhang et al. (2018) report comparable effectiveness of such combined UAV-derived data analysis. However, these studies do not investigate the model performance for multiple years. Furthermore, several height metrics were investigated by the above mentioned studies, but a universally applicable height metric for biomass estimation could not be determined. In general, it can be summarized that UAV-based image acquisition with low-cost systems provides a promising potential for forage mass monitoring.

Thus far, there has no study been published, which utilizes low-cost UAVs to investigate forage mass prediction for 3 consecutive years with multiple annual cuts. The latter is important, because in managed grasslands, the varying weather conditions within a year and between years result in a significant variability of sward growth patterns and consequently in forage mass production. The overall objective of this study is to investigate the potential of UAV-derived sward height metrics for monitoring forage mass. Therefore, this study compares the performance of empirical models for forage mass production in a long-term grassland experiment over 3 years (2014–2016) with two harvest cuts per year. The developed models are evaluated at different levels: (i) harvest date-specific, (ii) year-specific, (iii) cut-specific, (iv) plant community-specific, and (v) a pooled model including all data. This study utilizes parts of the dataset published by Bareth and Schellberg (2018), which was investigated for replacing RPM measurements with UAV-derived sward height data.

2 Materials and Methods

2.1 Study Site: The Rengen Grassland Experiment (RGE)

The Rengen Grassland Experiment (RGE) is located in Rhineland-Palatinate, Germany, in the Eifel mountain region and is part of a former grassland research station of Bonn University, Germany (50° 13′ N, 6° 51′ E, 475 m above mean sea level). The local mean annual precipitation is 811 mm and mean annual temperature is 6.9 °C. The soil is classified as Pseudogley. The RGE was established in 1941 on a low productive Nardetum meadow and is one of the oldest continuously managed grassland experiments in Europe. The RGE is arranged in a randomized block design with five treatments (see Table 1) and ten replicates per treatment. The individual plot size is 3 m × 5 m. The fertilization levels are: lime only (Ca), lime/nitrogen (CaN), lime/nitrogen/phosphorus (CaNP), and lime/nitrogen/phosphorus/potassium as CaNP–KCl and CaNP–K2SO4 (Hejcman et al. 2010a). This study focuses on the northern block of the RGE (5 replications per treatment, 25 plots total, see Figs. 1, 2).

Table 1 Annual amount of nutrients (kg ha−1) supplied per treatment since 1941.
Fig. 1
figure 1

Design of the study area at the Rengen Grassland Experiment

Fig. 2
figure 2

Overview of the Rengen Grassland Experiment (left) and close-up of investigated plots (right), both pictures taken at first harvest cut in 2014 (Photos: G. Bareth)

From 1941 to 1961, the RGE was mown once per year in late summer and from 1962 onward twice per year in late June and beginning to mid-October. Due to the long-term application of different fertilizers, distinctive grassland communities have developed with significantly different species compositions (Hejcman et al. 2010a). Plant communities in Ca and CaN treatments are classified as Polygono–Trisetion and produce lower amounts of biomass relative to the NP(K) treatments (Schellberg et al. 1999; Hejcman et al. 2010b). The NP(K)-treated communities are classified as Arrhenatherion and senescence starts in June (cut 1) and September (cut 2), which leads to a certain amount of lodging, while the Ca- and CaN-treated communities maintain a longer time of green leaf area. More information on the RGE can also be found in Chytry et al. (2009), Hejcman et al. (2007), and Hollberg and Schellberg (2017).

2.2 Platform and Sensor

A DJI Phantom 2 was used as a platform to carry a Canon Powershot S110 camera with 12.1 megapixels. The camera was mounted with a fixed camera mount at the bottom of the UAV, pointing in nadir direction. A Canon Hack Development Kit (https://www.chdk.org) via SD card enabled continuous image acquisition. Speed and aperture were fixed before take-off and zoom lens was set to 35 mm. The complete UAV imaging system costs less than 800 € in 2013 (Bareth et al. 2015).

2.3 Data Acquisition and Processing

The UAV campaigns were scheduled 1 or 2 days before the reference measurements and harvest dates (biomass and sward height). Additionally, UAV campaigns were scheduled to obtain a DSM at the beginning of the growing season in early March for the first cut and directly after the first cut (July) for the second cut to obtain the base models (see Table 2).

Table 2 Acquisition dates of UAV-based data and ground control point errors (XYZ direction) in cm

The UAV was operated manually, flying in a grid pattern about 20 m above ground over the investigated plots. The image acquisition mode was set to continuous mode, to automatically acquire images every second from take-off to landing. The operating mode ensured an image overlap of about 90%. Ground control points were evenly distributed across the 25 plots (see Fig. 1) and measured with a RTK-DGPS (Real-time Kinematic Differential Global Positioning System; TopCon Hiper-Pro, Japan). The acquired images were processed in the SfM/MVS software Agisoft PhotoScan v.1.3 (Agisoft Ltd., St. Petersburg, Russia). After an initial image alignment of about 150 images per sampling date, the GCPs were placed in the images for accurate georeferencing of the data. Subsequently, the image alignment was run using ‘high’ quality setting and the dense point cloud was built using ‘high’ quality settings and ‘mild’ depth filtering, to preserve finer details of the sward (Viljanen et al. 2018). Pitch, roll, and yaw parameters were not further analyzed and the default settings in PhotoScan were used. However, the authors are aware of possible image distortions due to UAV movements not compensated by a gimbal. Furthermore, no filtering algorithm was applied to the point cloud. From the point cloud, a DSM was generated and exported as TIFF file. The DSMs had a spatial resolution of 2 cm/pixel horizontally and sub-millimetre resolution vertically. Table 2 provides details about the average accuracy of the GCPs per sampling date. To obtain sward height metrics, the base model DSM (T0) was subtracted from the respective sampling date DSM (TS). The sward height metrics were calculated using zonal statistics with a polygonal shape file that represented the plot outline with an inward buffer of 30 cm. The derived sward height metrics included the 80th (SHp80) and 90th percentiles (SHp90), the mean sward height (SHmean), and maximum (SHmax) and minimum sward height (SHmin) per plot.

Reference height measurements were taken with a rising plate meter to obtain compressed sward height in cm. Fresh biomass samples were collected from the center of each plot using a mower. A subsample per plot was dried to constant weight in a forced air drier oven (Memmert GmbH, Schwabach, Germany) at 65 °C to obtain dry matter. Subsequently, dry biomass yield (DBY) per plot in kg ha−1 was calculated.

Statistical analysis was performed in R v.3.6. To test the UAV-based sward height metrics as predictors for grassland DBY, the sensitivity of the variables was tested using Pearson’s correlation coefficient (PCC) on different levels of aggregation. Furthermore, the most sensitive and significant variables were chosen for regression analysis using ordinary least-squares regression (simple linear regression) with leave one out cross-validation (LOO-CV). Linear regression was chosen, since the relationship of the variables is best represented by a linear relationship. LOO-CV was chosen over k-fold cross-validation due to the relatively small sample size (Hair et al. 2014) and because it is widely applied in remote sensing research (Homolova et al. 2014; Ferner et al. 2015; Viljanen et al. 2018; Obermeier et al. 2019). To evaluate the prediction accuracy, the coefficient of determination (R2) and root-mean-square error (RMSE) were calculated from the LOO-CV results.

3 Results

The UAV-based canopy height models for all sampling dates are displayed in Fig. 3. The first cuts show higher sward heights than the second cuts. Two plots of the highest fertilizer treatment clearly show the highest sward height on all six dates. The lowest sward heights are found in the Polygono–Trisetion plots, and the higher ones are in the Arrhenatherion plots. In general, similar growth patterns can be observed for all years and all cuts.

Fig. 3
figure 3

UAV-based canopy height models of all six sampling dates

Table 3 summarizes descriptive statistics for the mean UAV-based sward height (SHmean), the RPM-based sward height, and DBY for each cut per year. Missing values or extreme outliers of DBY were excluded, which results in a different number of samples per cut. The first cut shows the highest values for all variables, while the UAV-based SHmean is about 10 cm higher than the RPM-based compressed SH. Furthermore, the maximum UAV-based SHmean is higher than the RPM-based compressed SH. The standard deviation of the UAV-based SH is higher compared to the RPM-based compressed SH. In summary, the spatially continuous acquisition of the sward using UAV data captures more details of the SH.

Table 3 Descriptive statistics of dry matter yield (DBY), rising plate meter (RPM), and UAV-based mean sward height (SHmean)

3.1 Sensitivity Analysis

Prior to DBY prediction, correlations between DBY and SH metrics (UAV and RPM) were evaluated. Because RPM measurements serve as an established predictor for forage mass, in a first analysis, the correlation between RPM data and UAV-derived sward height (UAV-SH) is investigated. Figure 4 shows the results of RPM and UAV-SH. The best fit is achieved for 2014 and 2015, while, in 2016, only the second cut shows a significant correlation. However, R2 for each single cut in 2014 and 2015 range between 0.75 and 0.96. Only in 2016, the performance is not convincing for the first cut and of moderate performance (R2 of 0.50) for the second cut.

Fig. 4
figure 4

Scatterplots of UAV-based sward height (mean) and rising plate meter (RPM) sward height by year and harvest cut [secondary y-axis: first (1) and second (2) harvest cut in upper and lower panels, respectively]

Figures 5 and 6 show the regression of RPM and UAV-SH to DBY, respectively, by year and harvest cut. Except for 2016, RPM performs slightly better than UAV-SH with R2 ranging between 0.67 and 0.89. The variability in canopy height and lodging swards in the first cut leads to a higher deviation from the regression line compared to the second cut. The UAV-SH for the first cut in 2016 shows a reasonable pattern compared to the RPM SH.

Fig. 5
figure 5

Scatterplots of rising plate meter (RPM) sward height and dry biomass yield (DBY) by year and harvest cut [secondary y-axis: first (1) and second (2) harvest cut in upper and lower panels, respectively]

Fig. 6
figure 6

Scatterplots of UAV-based sward height (mean) and dry biomass yield (DBY) by year and harvest cut [secondary y-axis: first (1) and second (2) harvest cut in upper and lower panels, respectively]

The results of the sensitivity analysis are shown in Table 4. Pearson’s Correlation Coefficient (PCC) is displayed for the UAV-based canopy height metrics for different levels of data aggregation. From the results of the sensitivity analysis, it can be summarized that all height metrics have a significant relationship to DBY for all aggregation levels, except for SHmin. Thus, SHmin was excluded as a predictor variable in the biomass prediction models. The first cut shows more variability from year to year than the second cut. The correlation of SH metrics and DBY for the second cut are consistently higher.

Table 4 Pearson’s correlation coefficient for dry biomass yield and UAV-based height metrics for different levels of data aggregation

3.2 Biomass Prediction Cross-Validation Results

The following tables and figures show the results of the biomass prediction models for different levels of aggregation: (i) harvest date-specific, (ii) year-specific, (iii) cut-specific, (iv) plant community-specific, and (v) a pooled model including all data. The SHmean performs best in most models, and thus, SHmean serves as predictor variable for the exemplary scatterplots of observed and predicted DBY.

In Table 5 and Fig. 7, the results for harvest date-specific biomass prediction models are displayed. This level provides a detailed insight in the consistently better performing models for the second cut per year (R2 values of > 0.70). Furthermore, while the 80th and 90th percentile and the mean perform within a similar range, SHmax performs consistently lower, except for the second cut in 2014. As can be seen from Fig. 7, the deviation from the regression line is higher for the first cut compared to the second cut.

Table 5 Leave-one-out cross-validation results for dry biomass yield and UAV-based height metrics by year and cut (harvest date-specific)
Fig. 7
figure 7

Observed vs. predicted dry biomass yield (DBY) by year and harvest cut (predictor SHmean). Secondary y-axis: first (1) and second (2) harvest cut in upper and lower panels, respectively

Table 6 and Fig. 8 provide the results for the year-specific biomass prediction models. It is obvious that the variability of the prediction quality of the investigated years is large. While for 2015 and 2016, the SHmean model performs moderately with R2 of 0.59 and 0.55, respectively, model performance for 2014 is high (R2 0.78). Again, SHmax shows the lowest R2 in all models.

Table 6 Leave-one-out cross-validation results for dry biomass yield (DBY) and UAV-based height metrics by year
Fig. 8
figure 8

Observed vs. predicted dry biomass yield (DBY) by year (predictor SHmean)

Table 7 and Fig. 9 show the results for biomass prediction models aggregated by cut over all 3 years. The model performance is only moderate and the models for the first cut perform less effective than for the second cut. The deviation from the regression line is as expected higher for the highly fertilized plots. The variability of sward growth between the years is again high and model performance is only moderate. While SHp90, SHp80, and SHmean show similar performance, SHmax is only in comparable range to the other metrics for the second cut.

Table 7 Leave-one-out cross-validation results for dry biomass yield (DBY) and UAV-based height metrics by cut
Fig. 9
figure 9

Observed vs. predicted dry biomass yield (DBY) by harvest cut (predictor SHmean)

The results of the biomass prediction models on the level of plant communities are shown in Table 8 and Fig. 10. Clearly, the predictive ability of the height metrics is higher for the low fertilized plots, e.g., the Polygono–Trisetion community. Model performance is poor for Arrhenatherion with R2 between 0.34 and 0.40 and moderate for Polygono–Trisetion with R2 between 0.54 and 0.59. Again, SHmax shows the lowest R2 values.

Table 8 Leave-one-out cross-validation results for dry biomass yield (DBY) and UAV-based height metrics by plant community
Fig. 10
figure 10

Observed vs. predicted dry biomass yield (DBY) by plant community (predictor SHmean)

Finally, Table 9 and Fig. 11 show the results of the pooled data biomass prediction model (all data combined). Except for SHmax, all metrics perform moderately well. SHp90, SHp80, and SHmean perform very similar for the pooled data model. SHmax performs significantly lower. While DBY in the high-fertilized plots is mostly underestimated, the DBY in the low fertilized plots is mostly overestimated (see Fig. 11).

Table 9 Leave-one-out cross-validation results for dry biomass yield (DBY) and UAV-based height metrics (pooled dataset)
Fig. 11
figure 11

Observed vs. predicted dry biomass yield (DBY) for all years and cuts combined (pooled dataset)

4 Discussion

The primary objective of this study was to evaluate the prediction accuracy of empirical models for grassland DBY at different levels of aggregation: (i) harvest date-specific, (ii) year-specific, (iii) cut-specific, (iv) plant community-specific, and (v) a pooled model including all data. The results showed that biomass prediction based on low-cost UAV image data is a reliable method compared to the reference data of the RPM. RPM did not outperform UAV-derived metrics as a predictor for forage mass in our study. According to the results, sward type and time of harvest cut impact biomass prediction. This was also observed in similar studies (Fricke et al. 2011; Viljanen et al. 2018; Grüner et al. 2019; Lussem et al. 2019; Wijesingha et al. 2019). On one hand, sward density decreases in upper layers of mixed swards, and on the other hand, lodging in very mature swards led to an underestimation of sward height (see Fig. 2). This is especially important for the RGE, because it is a system with two cuts a year, resulting in very high sward heights with a significant risk of lodging in the Arrhenatherion plots. Overall, data of 2014 performed better compared to 2015 and 2016. Furthermore, the second cut showed higher R2 values compared to the first cut, due to less sward height variability in general and especially in the Polygono–Trisetion plots. These findings also have implications for monitoring of natural grassland habitats, where a wide range of species composition can be found, producing a significant variability in sward heights Thus, the pooled model and the models for the first and second cut for forage mass prediction are not satisfactory in their predictive ability from our 3-year dataset. However, the results are in line with other studies that reached R2 values for linear pooled dataset models from 1-year field experiments of 0.72 (Grüner et al. 2019) or 0.45 (Wijesingha et al. 2019).

Overall, the results for the investigated year-specific and harvest cut-specific models for forage mass prediction from multi-temporal DSMs are in agreement with similar studies. Grüner et al. (2019) reported R2 values of 0.40–0.87 for mixed and pure legume-grass swards in Germany (Hesse). Lussem et al. (2019) reported R2 values between 0.50 and 0.70 for grass swards in Germany (North Rhine-Westphalia), and Wijesingha et al. (2019) reported a wider range of R2 values from 0.14 to 0.62 depending on sward type and harvest date for three different sites in Germany (Hesse). Viljanen et al. (2018) described correlation coefficient (R) values between 0.93 and 0.98 for a grassland experiment in Finland, and Zhang et al. (2018) reported R2 values within a range of 0.66–0.89 for linear and logarithmic biomass prediction models in China (natural grasslands). Borra‐Serrano et al. (2019) described a R2 of 0.67 for the best-performing height metric (median) in perennial ryegrass.

In Bareth and Schellberg (2018), the R2 of UAV-based SH to RPM SH is higher than in this study, although the study utilizes the same SH data as in this study. This is because in Bareth and Schellberg (2018), the values were arithmetically averaged by treatment in the analysis, leading to a more uniform distribution along the regression line. Furthermore, the number of observations in the present study is quite small, since only the harvest dates were included in the analysis and not subsequent sampling dates in the growing period, as in Bareth and Schellberg (2018). A pooled model with data averaged by treatment for this study (n = 30) would result in an R2 of 0.81.

One drawback of this study is the suboptimal design of the RGE for this approach. As a two-cut system, the sward in highly fertilized plots tends to lodge when harvest time approaches. Lodging leads to a higher uncertainty regarding accurate DBY estimation, since the linear relationship between sward height and biomass is randomly altered. Lodging quantification in mixed grasslands is difficult compared to graminoid monoculture crops such as wheat or barley, since grassland has a spatial and temporal diverse emergence of different species throughout the growing season. Due to this diversity, a common maximum height, from which lodging and erect plants can be differentiated (Wilke et al. 2019), is not yet applicable.

One aim of this study was to determine a universally applicable height metric for biomass estimation in grassland. The 80th and 90th percentile and the mean sward height performed equally well, while the minimum and maximum sward height performed less potent. Similar findings were observed by Viljanen et al. (2018), Lussem et al. (2019), and Borra‐Serrano et al. (2019). Today’s computational possibilities do not hinder the fast computation of large sets of variables. Thus, taking multiple variables (e.g., in a Random Forest model) might lead to better predictive ability in terms of grassland differentiation. Future studies should be directed towards the analysis of the full information derived from the photogrammetric point cloud. Wallace et al. (2017) compared LiDAR and photogrammetric point clouds for biomass estimation in pastures and found good agreement between the voxel-based analysis from photogrammetric reconstruction and pasture biomass (R2 0.70–0.78). Using terrestrial laser scanning (TLS) for grassland biomass estimation might lead to better canopy penetration and, thus, a better vertical representation of the sward’s density. However, Wallace et al. (2017) and Schulze-Brüninghoff et al. (2019) found different performances of TLS voxel-based analysis of grassland biomass due to voxel size and outliers in the point cloud. Furthermore, combining spectral and structural variables is also a promising approach as shown by Bendig et al. (2015), Tilly et al. (2015), Näsi et al. (2018), and Borra-Serrano et al. (2019). Finally, such combined approaches might have even a better model performance by including short-wave infrared wavelengths (Honkavaara et al. 2016; Camino et al. 2018; Jenal et al. 2019).

5 Conclusion and Outlook

Timely and accurate yield estimation is a key parameter in grassland management. The results of this study demonstrate that DBY can be predicted using sward height derived from multi-temporal DSMs derived from low-cost UAV-based imaging with consistent results over 3 years. Although the prediction accuracy may not be satisfactory for management applications, such as fertilizer application based on yield maps, our findings provide valuable insights on how to conduct further research on biomass modeling in grassland based on SfM/UAV-based image data. The ongoing miniaturization and cost efficiency of sensors and platforms, as well as powerful algorithms (e.g., deep learning) and computer hardware can open new paths to sustainable grassland management.