Monitoring Forage Mass with Low-Cost UAV Data: Case Study at the Rengen Grassland Experiment

Monitoring and predicting above ground biomass yield of grasslands are of key importance for grassland management. Established manual methods such as clipping or rising plate meter measurements provide accurate estimates of forage yield, but are time consuming and labor intensive, and do not provide spatially continuous data as required for precision agriculture applications. Therefore, the main objective of this study is to investigate the potential of sward height metrics derived from low-cost unmanned aerial vehicle-based image data to predict forage yield. The study was conducted over a period of 3 consecutive years (2014–2016) at the Rengen Grassland Experiment (RGE) in Germany. The RGE was established in 1941 and is since then under the same management regime of five treatments in a random block design and two harvest cuts per year. For UAV-based image acquisition, a DJI Phantom 2 with a mounted Canon Powershot S110 was used as a low-cost aerial imaging system. The data were investigated at different levels (e.g., harvest date-specific, year-specific, and plant community-specific). A pooled data model resulted in an R2 of 0.65 with a RMSE of 956.57 kg ha−1, although cut-specific or date-specific models yielded better results. In general, the UAV-based metrics outperformed the traditional rising plate meter measurements, but was affected by the timing of the harvest cut and plant community.


Introduction
For grassland management decisions, spatial information on sward growth and forage mass is important (Castle 1976;Catchpole and Wheeler 1992;Schellberg et al. 2008). In the last decade, the developments in unmanned aerial vehicle-based (UAV) sensing systems enable the acquisition of image data in ultrahigh spatial resolution for important phenological growth stages (Bareth et al. 2011;Zhang and Kovacs 2012;Colomina and Molina 2014). Furthermore, new photogrammetric software products support the analysis of such image data to produce 3D point clouds and digital surface models (DSMs) using Structure from Motion (SfM) and Multi-View Stereopsis (MVS) (Harwin and Lucieer 2012;Bendig et al. 2013). The analysis of multi-temporal DSMs supports the monitoring of sward or crop height development in high spatial resolution (Hoffmeister et al. 2010; Bareth et al. 2016).
In grasslands, the information of sward height development is closely related to forage mass determination (Fricke et al. 2011). Since the 1960s, the so-called rising plate meters (RPMs) are successfully used to obtain point measurements of compressed sward height as an estimator for forage mass (King et al. 1986;Sanderson et al. 2001). Such measurements are, however, labour-intensive and costly, and do not cover the area of interest (e.g., field or plots) completely (Harmoney et al. 1997;Goulding et al. 2008). Hence, the development of methods for spatially explicit monitoring of sward height as an estimator for forage mass is desired and several promising approaches exist (Reddersen et al. 2014; Bareth et al. 2015;Wachendorf et al. 2017). Hardin and Jackson (2005), Rango et al. (2009), and Laliberte et al. (2010) describe the potential of UAV-based monitoring of grasslands and rangelands. An early work on using UAV-derived RGB images for DSM generation for monitoring grassland traits is provided by Bareth et al. (2015). The authors report an R 2 of 0.89 between RPM measurements and UAV-derived sward height. Lately, Wijesingha et al. (2019) investigated terrestrial laser scanning (TLS) and lowcost UAV-based image acquisition for SfM/MVS analysis for biomass prediction and report promising results for both methods. For grazing systems, Gillan et al. (2019) achieved an R 2 of 0.78 and Michez et al. (2019) of 0.23-0.49. Bareth and Schellberg (2018) investigated the correlation of lowcost UAV-derived DSMs with RPM data and achieved a high R 2 of 0.86 for a 3-year data analysis. Lussem et al. (2019) report varying performance (R 2 between 0.57 and 0.73) of low-cost UAV-derived DSMs for estimating forage mass, depending on the different harvest cuts, but RPM measurements outperformed the UAV approach. Similar performance of UAV-based sward height metrics are described by Grüner et al. (2019) and Borra-Serrano et al. (2019). The latter study used a combined approach from UAV-derived DSMs and vegetation indices, reporting a high R 2 of 0.81. Näsi et al. (2018), Viljanen et al. (2018), and Zhang et al. (2018) report comparable effectiveness of such combined UAV-derived data analysis. However, these studies do not investigate the model performance for multiple years. Furthermore, several height metrics were investigated by the above mentioned studies, but a universally applicable height metric for biomass estimation could not be determined. In general, it can be summarized that UAV-based image acquisition with low-cost systems provides a promising potential for forage mass monitoring.
Thus far, there has no study been published, which utilizes low-cost UAVs to investigate forage mass prediction for 3 consecutive years with multiple annual cuts. The latter is important, because in managed grasslands, the varying weather conditions within a year and between years result in a significant variability of sward growth patterns and consequently in forage mass production. The overall objective of this study is to investigate the potential of UAV-derived sward height metrics for monitoring forage mass. Therefore, this study compares the performance of empirical models for forage mass production in a long-term grassland experiment over 3 years (2014)(2015)(2016) with two harvest cuts per year. The developed models are evaluated at different levels: (i) harvest date-specific, (ii) year-specific, (iii) cut-specific, (iv) plant community-specific, and (v) a pooled model including all data. This study utilizes parts of the dataset published by Bareth and Schellberg (2018), which was investigated for replacing RPM measurements with UAV-derived sward height data.

Study Site: The Rengen Grassland Experiment (RGE)
The Rengen Grassland Experiment (RGE) is located in Rhineland-Palatinate, Germany, in the Eifel mountain region and is part of a former grassland research station of Bonn University, Germany (50° 13′ N, 6° 51′ E, 475 m above mean sea level). The local mean annual precipitation is 811 mm and mean annual temperature is 6.9 °C. The soil is classified as Pseudogley. The RGE was established in 1941 on a low productive Nardetum meadow and is one of the oldest continuously managed grassland experiments in Europe. The RGE is arranged in a randomized block design with five treatments (see Table 1) and ten replicates per treatment.  (Hejcman et al. 2010a). Plant communities in Ca and CaN treatments are classified as Polygono-Trisetion and produce lower amounts of biomass relative to the NP(K) treatments (Schellberg et al. 1999;Hejcman et al. 2010b). The NP(K)-treated communities are classified as Arrhenatherion and senescence starts in June (cut 1) and September (cut 2), which leads to a certain amount of lodging, while the Ca-and CaN-treated communities maintain a longer time of green leaf area. More information on the RGE can also be found in Chytry et al. (2009), Hejcman et al. (2007, and Hollberg and Schellberg (2017).

Platform and Sensor
A DJI Phantom 2 was used as a platform to carry a Canon Powershot S110 camera with 12.1 megapixels. The camera was mounted with a fixed camera mount at the bottom

Data Acquisition and Processing
The UAV campaigns were scheduled 1 or 2 days before the reference measurements and harvest dates (biomass and sward height). Additionally, UAV campaigns were scheduled to obtain a DSM at the beginning of the growing season in early March for the first cut and directly after the first cut (July) for the second cut to obtain the base models (see Table 2).
The UAV was operated manually, flying in a grid pattern about 20 m above ground over the investigated plots. The image acquisition mode was set to continuous mode, to automatically acquire images every second from take-off to landing. The operating mode ensured an image overlap of about 90%. Ground control points were evenly distributed across the 25 plots (see Fig. 1) and measured with a RTK-DGPS (Real-time Kinematic Differential Global Positioning System; TopCon Hiper-Pro, Japan). The acquired images were processed in the SfM/MVS software Agisoft PhotoScan v.1.3 (Agisoft Ltd., St. Petersburg, Russia). After an initial image alignment of about 150 images per sampling date, the GCPs were placed in the images for accurate georeferencing of the data. Subsequently, the image alignment was run using 'high' quality setting and the dense point cloud was built  using 'high' quality settings and 'mild' depth filtering, to preserve finer details of the sward . Pitch, roll, and yaw parameters were not further analyzed and the default settings in PhotoScan were used. However, the authors are aware of possible image distortions due to UAV movements not compensated by a gimbal. Furthermore, no filtering algorithm was applied to the point cloud. From the point cloud, a DSM was generated and exported as TIFF file. The DSMs had a spatial resolution of 2 cm/pixel horizontally and sub-millimetre resolution vertically. Table 2 provides details about the average accuracy of the GCPs per sampling date. To obtain sward height metrics, the base model DSM (T0) was subtracted from the respective sampling date DSM (TS). The sward height metrics were calculated using zonal statistics with a polygonal shape file that represented the plot outline with an inward buffer of 30 cm. The derived sward height metrics included the 80th (SH p80 ) and 90th percentiles (SH p90 ), the mean sward height (SH mean ), and maximum (SH max ) and minimum sward height (SH min ) per plot. Reference height measurements were taken with a rising plate meter to obtain compressed sward height in cm. Fresh biomass samples were collected from the center of each plot using a mower. A subsample per plot was dried to constant weight in a forced air drier oven (Memmert GmbH, Schwabach, Germany) at 65 °C to obtain dry matter. Subsequently, dry biomass yield (DBY) per plot in kg ha −1 was calculated.
Statistical analysis was performed in R v.3.6. To test the UAV-based sward height metrics as predictors for grassland DBY, the sensitivity of the variables was tested using Pearson's correlation coefficient (PCC) on different levels of aggregation. Furthermore, the most sensitive and significant variables were chosen for regression analysis using ordinary least-squares regression (simple linear regression) with leave one out cross-validation (LOO-CV). Linear regression was chosen, since the relationship of the variables is best represented by a linear relationship. LOO-CV was chosen over k-fold cross-validation due to the relatively small sample size (Hair et al. 2014) and because it is widely applied in remote sensing research (Homolova et al. 2014;Ferner et al. 2015;Viljanen et al. 2018;Obermeier et al. 2019). To evaluate the prediction accuracy, the coefficient of determination (R 2 ) and root-mean-square error (RMSE) were calculated from the LOO-CV results.

Results
The UAV-based canopy height models for all sampling dates are displayed in Fig. 3. The first cuts show higher sward heights than the second cuts. Two plots of the highest fertilizer treatment clearly show the highest sward height on all six dates. The lowest sward heights are found in the Polygono-Trisetion plots, and the higher ones are in the Arrhenatherion plots. In general, similar growth patterns can be observed for all years and all cuts. Table 3 summarizes descriptive statistics for the mean UAV-based sward height (SH mean ), the RPM-based sward height, and DBY for each cut per year. Missing values or extreme outliers of DBY were excluded, which results in a different number of samples per cut. The first cut shows the highest values for all variables, while the UAVbased SH mean is about 10 cm higher than the RPM-based compressed SH. Furthermore, the maximum UAV-based SH mean is higher than the RPM-based compressed SH. The standard deviation of the UAV-based SH is higher compared to the RPM-based compressed SH. In summary, the spatially continuous acquisition of the sward using UAV data captures more details of the SH.

Sensitivity Analysis
Prior to DBY prediction, correlations between DBY and SH metrics (UAV and RPM) were evaluated. Because RPM measurements serve as an established predictor for forage mass, in a first analysis, the correlation between RPM data and UAV-derived sward height (UAV-SH) is investigated. Figure 4 shows the results of RPM and UAV-SH. The best fit is achieved for 2014 and 2015, while, in 2016, only the second cut shows a significant correlation. However, R 2 for each single cut in 2014 and 2015 range between 0.75 and 0.96. Only in 2016, the performance is not convincing for the first cut and of moderate performance (R 2 of 0.50) for the second cut. Figures 5 and 6 show the regression of RPM and UAV-SH to DBY, respectively, by year and harvest cut. Except for 2016, RPM performs slightly better than UAV-SH with R 2 ranging between 0.67 and 0.89. The variability in canopy height and lodging swards in the first cut leads to a higher deviation from the regression line compared to the second cut. The UAV-SH for the first cut in 2016 shows a reasonable pattern compared to the RPM SH.
The results of the sensitivity analysis are shown in Table 4. Pearson's Correlation Coefficient (PCC) is displayed for the UAV-based canopy height metrics for different levels of data aggregation. From the results of the sensitivity analysis, it can be summarized that all height metrics have a significant relationship to DBY for all aggregation levels, except for SH min . Thus, SH min was excluded as a predictor variable in the biomass prediction models. The first cut shows more variability from year to year than the second cut. The correlation of SH metrics and DBY for the second cut are consistently higher.

Biomass Prediction Cross-Validation Results
The following tables and figures show the results of the biomass prediction models for different levels of aggregation: (i) harvest date-specific, (ii) year-specific, (iii) cut-specific, (iv) plant community-specific, and (v) a pooled model including all data. The SH mean performs best in most models, and thus, SH mean serves as predictor variable for the exemplary scatterplots of observed and predicted DBY.
In Table 5 and Fig. 7, the results for harvest date-specific biomass prediction models are displayed. This level provides a detailed insight in the consistently better performing models for the second cut per year (R 2 values of > 0.70). Furthermore, while the 80th and 90th percentile and the mean perform within a similar range, SH max performs consistently lower, except for the second cut in 2014. As can be seen from Fig. 7, the deviation from the regression line is higher for the first cut compared to the second cut. Table 6 and Fig. 8 provide the results for the year-specific biomass prediction models. It is obvious that the variability of the prediction quality of the investigated years is large. While for 2015 and 2016, the SH mean model performs moderately with R 2 of 0.59 and 0.55, respectively, model performance for 2014 is high (R 2 0.78). Again, SH max shows the lowest R 2 in all models. Table 7 and Fig. 9 show the results for biomass prediction models aggregated by cut over all 3 years. The model performance is only moderate and the models for the first cut perform less effective than for the second cut. The deviation from the regression line is as expected higher for the highly fertilized plots. The variability of sward growth between the years is again high and model performance is only moderate. While SH p90 , SH p80 , and SH mean show similar performance, SH max is only in comparable range to the other metrics for the second cut.   The results of the biomass prediction models on the level of plant communities are shown in Table 8 and Fig. 10. Clearly, the predictive ability of the height metrics is higher for the low fertilized plots, e.g., the Polygono-Trisetion community. Model performance is poor for Arrhenatherion with R 2 between 0.34 and 0.40 and moderate for Polygono-Trisetion with R 2 between 0.54 and 0.59. Again, SH max shows the lowest R 2 values. Finally, Table 9 and Fig. 11 show the results of the pooled data biomass prediction model (all data combined). Except for SH max , all metrics perform moderately well. SH p90 , SH p80 , and SH mean perform very similar for the pooled data model. SH max performs significantly lower. While DBY in the highfertilized plots is mostly underestimated, the DBY in the low fertilized plots is mostly overestimated (see Fig. 11).

Discussion
The primary objective of this study was to evaluate the prediction accuracy of empirical models for grassland DBY at different levels of aggregation: (i) harvest date-specific, (ii) year-specific, (iii) cut-specific, (iv) plant community-specific, and (v) a pooled model including all data. The results showed that biomass prediction based on low-cost UAV image data is a reliable method compared to the reference data of the RPM. RPM did not outperform UAV-derived metrics as a predictor for forage mass in our study. According to the results, sward type and time of harvest cut impact biomass prediction. This was also observed in similar studies (Fricke et al. 2011;Viljanen et al. 2018;Grüner et al. 2019;Lussem et al. 2019;Wijesingha et al. 2019). On one hand, sward density decreases in upper layers of mixed swards, and on the other hand, lodging in very mature swards led to an underestimation of sward height (see Fig. 2). This is especially important for the RGE, because it is a system with two cuts a year, resulting in very high sward heights with a significant risk of lodging in the Arrhenatherion plots.
Overall, data of 2014 performed better compared to 2015 and 2016. Furthermore, the second cut showed higher R 2 values compared to the first cut, due to less sward height variability in general and especially in the Polygono-Trisetion plots. These findings also have implications for monitoring of natural grassland habitats, where a wide range of species composition can be found, producing a significant variability in sward heights Thus, the pooled model and the models for the first and second cut for forage mass prediction are not satisfactory in their predictive ability from our 3-year dataset. However, the results are in line with other studies that reached R 2 values for linear pooled dataset models from 1-year field experiments of 0.72 (Grüner et al. 2019) or 0.45 (Wijesingha et al. 2019).
Overall, the results for the investigated year-specific and harvest cut-specific models for forage mass prediction from multi-temporal DSMs are in agreement with similar studies. Grüner  In Bareth and Schellberg (2018), the R 2 of UAV-based SH to RPM SH is higher than in this study, although the study utilizes the same SH data as in this study. This is because in Bareth and Schellberg (2018), the values were arithmetically averaged by treatment in the analysis, leading to a more uniform distribution along the regression line. Furthermore, the number of observations in the present study is quite small, since only the harvest dates were included in the analysis and not subsequent sampling dates in the growing period, as in Bareth and Schellberg (2018). A pooled model with data averaged by treatment for this study (n = 30) would result in an R 2 of 0.81.
One drawback of this study is the suboptimal design of the RGE for this approach. As a two-cut system, the sward in highly fertilized plots tends to lodge when harvest time approaches. Lodging leads to a higher uncertainty regarding accurate DBY estimation, since the linear relationship between sward height and biomass is randomly altered.
Lodging quantification in mixed grasslands is difficult compared to graminoid monoculture crops such as wheat or barley, since grassland has a spatial and temporal diverse emergence of different species throughout the growing season. Due to this diversity, a common maximum height, from which lodging and erect plants can be differentiated (Wilke et al. 2019), is not yet applicable.
One aim of this study was to determine a universally applicable height metric for biomass estimation in grassland. The 80th and 90th percentile and the mean sward height performed equally well, while the minimum and maximum sward height performed less potent. Similar findings were observed by Viljanen et al. (2018), Lussem et al. (2019), andBorra-Serrano et al. (2019). Today's computational possibilities do not hinder the fast computation of large sets of variables. Thus, taking multiple variables (e.g., in a Random Forest model) might lead to better predictive ability in terms of grassland differentiation. Future studies should be directed Table 4 Pearson's correlation coefficient for dry biomass yield and UAV-based height metrics for different levels of data aggregation 'ns' not significant; '*' significant at p < 0.1; '**' significant at p < 0.05; '***' significant at p < 0.001   Fig. 7 Observed vs. predicted dry biomass yield (DBY) by year and harvest cut (predictor SH mean ). Secondary y-axis: first (1) and second (2) harvest cut in upper and lower panels, respectively  short-wave infrared wavelengths (Honkavaara et al. 2016;Camino et al. 2018;Jenal et al. 2019).

Conclusion and Outlook
Timely and accurate yield estimation is a key parameter in grassland management. The results of this study demonstrate that DBY can be predicted using sward height derived from multi-temporal DSMs derived from low-cost UAV-based imaging with consistent results over 3 years. Although the prediction accuracy may not be satisfactory for management applications, such as fertilizer application based on yield maps, our findings provide valuable insights on how to conduct further research on biomass modeling in grassland based on SfM/UAV-based image data. The ongoing miniaturization and cost efficiency of sensors and platforms, as well as powerful algorithms (e.g., deep learning) and computer hardware can open new paths to sustainable grassland management.