On-farm cereal rye biomass estimation using machine learning on images from an unmanned aerial system

KC, Kushal; Romanko, Matthew; Perrault, Andrew; Khanal, Sami

doi:10.1007/s11119-024-10162-9

Open access
Published: 06 July 2024

Precision Agriculture

Kushal KC¹,
Matthew Romanko²,
Andrew Perrault³ &
…
Sami Khanal¹

Abstract

This study assesses the potential of using multispectral images collected by an unmanned aerial system (UAS) in machine learning (ML) frameworks to estimate cereal rye (Secale cereale L.) biomass. Multispectral images and ground-truth cereal rye biomass data were collected from 15 farmers’ fields up to three times between March and May in northwest Ohio. Images were processed to derive 13 vegetation indices (VIs). Of the 13 VIs, an optimal set of six, comprising excess green (ExG), normalized green red difference index (NGRDI), soil adjusted vegetation index (SAVI), blue green ratio (B_G_ratio), red-edge triangular vegetation index (RTVI), and normalized difference red-edge (NDRE), was selected using a variance inflation factor (VIF) based feature selection approach. Six regression models, namely multiple linear regression (MLR), elastic net (ENET), multivariate adaptive regression splines (MARS), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGB), were investigated for estimation of cereal rye biomass based on the VIs. For most of the models, the six selected VIs performed better than or similar to the full set of 13 VIs with R² ranging from 0.24 to 0.59 and RMSE ranging from 83.13 to 91.89 g/m² during 10-fold cross-validation. During independent accuracy assessment with the selected set of VIs, XGB exhibited the highest R² (0.67) and lowest RMSE (83.13 g/m²) and MAE (48.13 g/m²), followed by RF and ENET. For all the models, the agreement between observed and predicted biomass was high for biomass less than or equal to 200 g/m² but decreased for biomass greater than 200 g/m². When field-collected structural features were integrated with the selected VIs, the models showed improved performance, with R² and RMSE of the models reaching up to 0.82 and 61.67 g/m², respectively. Among the six VIs, SAVI showed the strongest impact on the model prediction for the best-performing RF and XGB regression models. The findings of this study demonstrate the potential of precisely estimating and mapping cereal rye biomass based on UAS-captured multispectral images. Timely information on cover crop growth can facilitate numerous decision-making processes, including planning of planting operations and management of nutrients, weeds, and soil moisture, to improve agronomic and environmental outcomes.

Introduction

Cover cropping, a widely advocated conservation practice in agriculture for improving soil and water quality, offers numerous benefits, including increased nitrogen (N) retention, soil erosion control, weed suppression, enhancement of soil microbiology, and an increase in overall soil organic matter (Adetunji et al., 2020; Daryanto et al., 2018; Finney et al., 2016). Timely information on cover crop growth is crucial for managing nutrient and water balances within a field (Abdalla et al., 2019; Meyer et al., 2022), helping farmers to make timely agronomic decisions to enhance both the profitability and sustainability of crop production. Traditional methods for assessing the growth and biomass of cover crops, such as on-field surveys involving destructive sampling of living biomass followed by measuring dry weight, are costly, time-consuming, and labor-intensive processes (Kaspar & Bakker, 2015). This limits data collection to fewer locations, potentially failing to accurately represent within-field variabilities and differences across larger geographic regions. In contrast, remote sensing (RS) technology provides a non-destructive, cost-effective, rapid, and scalable approach to estimating cover crop biomass by assessing spectral signatures of crops within a field and/or region.

Unmanned aerial system (UAS) based RS for crop monitoring has gained significant interest in recent years due to its operational ease, affordability, and capacity to capture high-resolution data. Specifically, UASs have been used to study various aspects of crop growth, including canopy characteristics, crop growth rate, plant density, growth stage detection, disease detection, biomass estimation, and yield estimation (Awais et al., 2021; Holman et al., 2016; Koh et al., 2019; Rosle et al., 2019; Ye et al., 2020; Yue et al., 2021; Zhu et al., 2022). Vegetation indices (VIs) based on multispectral images have been commonly used to estimate biomass among various cover crops including cereal rye (Secale cereale L.), triticale (Yuan et al., 2019), barley (Prabhakara et al., 2015), wheat (Jennewein et al., 2022), legumes (Roth & Streit, 2018), oats (Sharma et al., 2022), and alfalfa (Tang et al., 2021). For instance, Yuan et al. (2019) compared the linear relationships between four VIs, including the normalized difference vegetation index (NDVI), green ratio vegetation index (GRVI), green NDVI (GNDVI), and triangular vegetation index (TVI), derived from a series of images and ground truth biomass of cereal rye and triticale before termination. Relationships between VIs and cover crop biomass varied among VIs and across different fields, with R² ranging from 0.42 to 0.93 and RMSE from 9.4 to 27.2%. Similarly, the NDVI, red edge inflection point (REIP), and green red vegetation index (GRVI) showed correlations with the biomass of legumes and dicots in Switzerland, with R² ranging from 0.03 to 0.37 (Roth & Streit, 2018). The significance of VIs in capturing variabilities in biomass and crop growth can vary among different crops. This variation is attributed to factors such as photosynthetic pigments, water content, nutrient status, and several other physiological aspects. Furthermore, VIs can be sensitive to external environmental factors and crop growth stages (Vélez et al., 2023), as well as the spectral region utilized in deriving them. The latter often depends on the quality of sensors (Alckmin et al., 2019). Hence, the selection of an optimal set of VIs specific to crops and growth stages is crucial for establishing meaningful relationships between the spectral and biophysical properties of crops.

The application of VIs in estimating biomass, however, can be limited, mainly because VIs saturate at higher biomass ranges (Jennewein et al., 2022; Prabhakara et al., 2015). In contrast, structural features, such as crop height and canopy cover, are less prone to saturation at higher biomass and thus can capture variability in crop growth more effectively than VIs (Tang et al., 2021). Galán et al. (2020) combined agronomic traits such as grain yield and plant height with VIs in a linear regression model and obtained an R² of 0.77 when predicting 274 different cereal rye hybrids.

A majority of the prior works focused on cereal rye biomass estimation using VIs have extensively relied upon linear models. While linear models are valuable in understanding the interaction and making inferences, they are limited when it comes to learning complex non-linear relationships between variables. Therefore, there remains untapped potential in harnessing non-linear machine learning (ML) models for estimating cereal rye biomass using VIs.

In recent years, ML models have received increased interest for modeling crop biophysical properties using RS variables. Sharma et al. (2022) developed four ML models, including partial least squares (PLS), support vector machine (SVM), random forest (RF), and artificial neural network (ANN), that integrated UAS-collected multispectral data with 315 ground-truth observations from three oat fields in South Dakota, USA. The best model for oat biomass estimation varied across different sites and growth stages, suggesting the importance of careful examination of multiple ML algorithms. Bai et al. (2023) evaluated four ML models (RF, SVM, PLS, and ANN) to estimate above-ground biomass of multiple cover crop species based on a single-site experiment in Nebraska. In that study, the linear model, specifically PLS, demonstrated superior performance compared to non-linear models, underscoring the importance of linear models in some instances. In a similar study utilizing PlanetScope satellite imagery and hyperspectral data collected via spectroradiometer, biomass estimation of mixed cover crop species was performed using an RF model. The model’s performance varied across different periods, with R² ranging from 0.36 to 0.61 (Kharel et al., 2023).

To the best of our knowledge, there is a noticeable lack of studies, specifically addressing cereal rye biomass estimation in on-farm fields across multiple growth stages using UAS images, while employing a variety of both linear and non-linear ML models. Therefore, the objective of this study is to develop and evaluate cereal rye biomass estimation models by integrating ground truth data collected from 15 on-farm fields representing various field conditions and growth stages using multitemporal UAS data in linear and non-linear ML frameworks. The specific objectives of this study are to (i) determine the optimal set of VIs for cereal rye biomass estimation, (ii) compare the performance of different ML models for cereal rye biomass estimation, and (iii) evaluate the potential application of structural features for cereal rye biomass estimation. We hypothesize that (i) selection of optimal VIs minimizes multicollinearity issues while also resulting in better model performance, (ii) ML models will outperform linear regression models in cereal rye biomass estimation, and (iii) structural features will outperform spectral features and their integration will improve model stability.

Methods

Study sites and experimental design

This study was conducted in Ohio’s portion of the Western Lake Erie basin (WLEB). WLEB is one of the most impaired watersheds in the Great Lakes region, where water quality issues driven by nutrient and sediment loading from agricultural lands are a major concern (Berry et al., 2017; Michalak et al., 2013). Fifteen fields, planted with cereal rye between late September and mid-November in the fall of 2020, were selected across 10 counties in the study area (Fig. 1). Of 15 fields, two were planted in corn and 13 in soybean, representing varying management practices (Table 1). Cereal rye was planted using a drilling technique in 13 of the 15 fields, and aerial seeding and broadcasting were implemented in two other fields. The weather conditions varied across the cover crop growing season, with differences in average temperature ranging from 6.2 to 19 ℃ and monthly total precipitation ranging from 19 to 151 mm among sites.

Table 1 Management practices for 15 farmers’ fields planted in cereal rye

Data collection

Field data collection

Ground-truth data on soil properties, biomass weight, canopy cover, and crop height were collected from six locations per field for all fields by considering variability in soil properties present within a field. The approximate sampling locations were mapped and stored before heading out for data collection. After locating their positions on the field, sampling locations were marked and bounded using a 0.25 m² quadrat (Fig. 2b). First, canopy cover was measured using the Canopeo mobile application, while crop height was measured manually with a measuring tape (Fig. 2c, d). Following that, all the cereal rye plants within the quadrat were cut down to the root and collected in a paper bag. To capture variability in cereal rye biomass growth, samples were collected up to three times in the spring of 2021: T1 (March 2 to March 9), T2 (March 29 to April 6), and T3 (April 26 to May 12). It is important to note that three fields were sampled only twice due to logistical issues, while the remaining 12 fields were sampled three times. This resulted in a total of 252 ground-truth observations containing biomass information. Since samples were collected through destructive sampling, the locations across the three sampling times were kept apart but close enough to represent the same portion of the field. The biomass samples were then oven-dried at 55 ℃ and weighed to determine dry weight. The term cereal rye biomass hereafter refers to the total dry weight biomass obtained after drying the sampled biomass.

Biomass samples collected over three times—T1, T2, and T3 from 15 fields, were grouped into two categories—early and late planting, for the ANOVA test to understand variabilities in biomass based on the timing of cereal rye planting. Fields planted in late September to early October were grouped in early planting, and the fields planted in late October to early November were grouped in late planting.

Image acquisition and processing

Before every UAS flight, markers were placed around six sampling locations, with one marker at each sampling location, at every field site. These markers were later used in the georeferencing of Orthomosaic images, which helped in precisely locating sampling locations on the image and facilitating the subsequent extraction of spectral values corresponding to ground truth data (Fig. 3a). UAS flights were conducted using a DJI Phantom 4 multispectral (DJI Innovations, Shenzhen, China) to capture images in the visible and near-infrared regions (Blue [450 ± 16 nm], Green [560 ± 16 nm], Red [650 ± 16 nm], Red Edge [730 ± 16 nm], and Near-infrared [840 ± 26 nm]). Notably, this UAS differs from the standard DJI Phantom 4, featuring an RGB camera and a multispectral camera array comprising five cameras. UAS surveys were conducted at a speed of approximately 5 m/s at about 90 m above-ground surface, maintaining mostly 80% front and 70% side overlap in a lawnmower pattern, which resulted in a Ground Sampling Distance (GSD) of 4.8 cm in images. These images were then processed, stitched together, and rectified using an image processing software, Pix4Dmapper (Pix4D SA, Switzerland), at a spatial resolution of 6 cm/pixel.

Data extraction

Data extraction at six sampling locations within each field was completed using geopandas and rasterio packages in the Python platform. At first, a 0.5 m × 0.5 m region of interest (ROI) around each sampling location was created, which was then used to compute average spectral values within the ROI based on zonal statistics (Fig. 3b). Of 252 locations containing biomass data, 13 observations were removed from the analyses due to gaps in data, resulting from missing cover crop height and canopy cover data and stitching issues of multispectral images (Table 1). The extracted spectral values of five multispectral bands from images were then used for computing 13 commonly used VIs (Table 2), representing crop growth and biomass (Kross et al., 2015; Meng et al., 2013; Santin-Janin et al., 2009). For instance, NDVI has been found to better estimate low to medium crop biomass, while EVI and TVI are good predictors of biomass at a higher range (Alexandridis et al., 2020; Prabhakara et al., 2015). Green band-based indices such as GNDVI and NGRDI are more sensitive to N content and growth than the red band-based indices (Gitelson & Merzlyak, 1998; Hunt et al., 2005). SAVI is recognized for minimizing spectral responses influenced by background soil reflectance, especially in scenarios where biomass is sparse, resulting in minimal crop reflectance. It has demonstrated effective performance in accurately estimating crop biomass (Xu et al., 2018). Unlike NDVI, red edge-based indices were found to be more strongly correlated with biomass at a higher range due to their higher sensitivity to leaf chlorophyll content (Jennewein et al., 2022).
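The extraction and index computation described above can be sketched as follows. This is a minimal illustration, not the authors' geopandas/rasterio pipeline: the helper names `roi_mean` and `vegetation_indices` are hypothetical, the rasters are synthetic, and the index formulas are the commonly published forms (with a SAVI soil factor of 0.5 and ExG on chromatic coordinates), which may differ from the definitions in Table 2.

```python
import numpy as np

def roi_mean(band, row, col, half_size):
    """Mean of a square pixel window centred on (row, col).

    A simplified stand-in for zonal statistics over a region of interest.
    """
    window = band[row - half_size: row + half_size + 1,
                  col - half_size: col + half_size + 1]
    return float(np.nanmean(window))

def vegetation_indices(blue, green, red, red_edge, nir, soil_factor=0.5):
    """Six of the indices named in the text, using commonly published formulas.

    These forms are assumptions; the paper's Table 2 may define them differently.
    """
    total = blue + green + red
    r, g, b = red / total, green / total, blue / total  # chromatic coordinates
    return {
        "NDVI": (nir - red) / (nir + red),
        "NDRE": (nir - red_edge) / (nir + red_edge),
        "NGRDI": (green - red) / (green + red),
        "SAVI": (1 + soil_factor) * (nir - red) / (nir + red + soil_factor),
        "B_G_ratio": blue / green,
        "ExG": 2 * g - r - b,
    }

# Synthetic reflectance rasters standing in for the five orthomosaic bands.
rng = np.random.default_rng(42)
bands = {name: rng.uniform(0.02, 0.6, size=(50, 50))
         for name in ("blue", "green", "red", "red_edge", "nir")}

# With 6 cm pixels, a 9 x 9 window (half_size=4) spans roughly 0.5 m.
means = {name: roi_mean(arr, row=25, col=25, half_size=4)
         for name, arr in bands.items()}
indices = vegetation_indices(**means)
```

Averaging the band values over the ROI before computing the indices mirrors the order described in the text; computing per-pixel indices and then averaging would generally give slightly different values.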

Table 2 Independent RS variables selected for cereal rye biomass estimation models

Data preprocessing and model development

Feature selection

To develop a model for cereal rye biomass estimation, the derived VIs (Table 2) were used as independent variables and observed dry cereal rye biomass weight as a response variable. The 239 observations were initially split into 70% training (n = 151) and 30% testing (n = 88) datasets, with careful stratification by field, to avoid potential correlations between observations within a specific field. This approach resulted in 10 fields randomly assigned for training and 5 for testing. Feature scaling was performed on both training and test data by applying the min–max normalization technique as a part of the initial steps in model training and testing.
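A field-wise split with min–max scaling might look like the sketch below. The function names and the synthetic data are hypothetical, and one detail is an assumption rather than something the text states: the scaling minimum and range are learned from the training set only and then applied to the test set, which keeps test information out of training.

```python
import numpy as np

def split_by_field(field_ids, n_test_fields, seed=0):
    """Return boolean train/test masks with every field wholly on one side."""
    rng = np.random.default_rng(seed)
    fields = np.unique(field_ids)
    test_fields = rng.choice(fields, size=n_test_fields, replace=False)
    test_mask = np.isin(field_ids, test_fields)
    return ~test_mask, test_mask

def min_max_scale(train, test):
    """Scale both sets using the minimum and range of the training set."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)  # guard against constant columns
    return (train - lo) / span, (test - lo) / span

# Synthetic example: 15 fields with 16 observations each and six predictors.
rng = np.random.default_rng(1)
field_ids = np.repeat(np.arange(15), 16)
X = rng.uniform(size=(field_ids.size, 6))

train_mask, test_mask = split_by_field(field_ids, n_test_fields=5, seed=1)
X_train, X_test = min_max_scale(X[train_mask], X[test_mask])
```

Because whole fields are held out, test values can fall slightly outside the 0 to 1 range after scaling; that is expected with this design.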

Before training and tuning of models, multicollinearity among VIs in the training dataset was assessed. The VIs, derived from various combinations of spectral bands in multispectral images, contained somewhat redundant information about crop growth, as evidenced by cross-correlation (Fig. 4). To enhance model stability and interpretation, a subset of VIs was selected using variance inflation factor (VIF) statistics. VIF measures the inflation of variance due to correlations among predictor variables and can be calculated for each predictor by regressing it with every other variable (Salmerón et al., 2020) (Eq. 1).

$$VIF = \frac{1}{{1 - R^{2} }}$$

(1)

where R² is the coefficient of determination obtained by regressing a given predictor on all the other predictors. A VIF < 10 was taken to indicate no serious multicollinearity, while a VIF ≥ 10 indicates serious multicollinearity between predictor variables. The calculation of VIF began with all 13 VIs, and the VI with the highest VIF was removed iteratively until all remaining VIs had VIF scores below 10.
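The iterative VIF screening can be illustrated with a short NumPy sketch. The functions `vif_scores` and `select_by_vif` are hypothetical names, the data are synthetic, and constant columns (zero variance) are not handled.

```python
import numpy as np

def vif_scores(X):
    """VIF of each column, from regressing it on all other columns plus an intercept."""
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        scores[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return scores

def select_by_vif(X, names, threshold=10.0):
    """Drop the highest-VIF column repeatedly until every VIF is below the threshold."""
    names = list(names)
    while X.shape[1] > 1:
        scores = vif_scores(X)
        worst = int(np.argmax(scores))
        if scores[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names

# Synthetic predictors: "c" is almost an exact linear combination of "a" and "b".
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + 0.01 * rng.normal(size=200)
X_demo = np.column_stack([a, b, c])

selected = select_by_vif(X_demo, ["a", "b", "c"])
```

In this toy case all three columns start with very large VIFs, and removing any one of them brings the remaining two below the threshold, so two names survive.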

Machine learning models

A suite of regression algorithms, including multiple linear regression (MLR), elastic net (ENET), multivariate adaptive regression splines (MARS), SVM, RF, and extreme gradient boosting (XGB), was used to model relationships between predictor and response variables. The entire modeling pipeline was conducted in RStudio (Posit Software, Boston).

Linear regression is the simplest model which can be used as a starting point or as a base model for other advanced models. It uses the least squares method to estimate the best-fitting line. Meanwhile, ENET is a regularized form of linear regression that takes advantage of both ridge regression and LASSO regression for accurate prediction (Zou & Hastie, 2005). The ENET algorithm uses a weighted combination of both regularization techniques. While ENET aims to improve the linear regression model using regularization techniques, models such as MARS, SVM, RF, and XGB can be used to learn non-linear relationships.
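For reference, the weighted combination of the ridge and LASSO penalties can be written as below. This is one common parametrization (used, for example, by the glmnet package) and is shown as an assumption, since the text does not state which form or software implementation was applied. Here λ ≥ 0 sets the overall penalty strength and α ∈ [0, 1] sets the mix, with α = 1 giving LASSO and α = 0 giving ridge regression.

$$\hat{\beta } = \mathop {\arg \min }\limits_{{\beta_{0} ,\beta }} \;\frac{1}{2n}\mathop \sum \limits_{i = 1}^{n} (y_{i} - \beta_{0} - x_{i}^{\top } \beta )^{2} + \lambda \left[ {\alpha \left\| \beta \right\|_{1} + \frac{1 - \alpha }{2}\left\| \beta \right\|_{2}^{2} } \right]$$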

MARS is an algorithm for automatically creating a piecewise linear model that provides an intuitive way of capturing nonlinearity in a dataset using polynomial and step functions, unlike linear regression (Friedman, 1991). The MARS model is tuned using two tuning parameters: the maximum degree of interactions (degree) and the number of terms (nprune) retained in the final model. The degree was set from 1 to 3 while nprune was started with 10 values between 1 and 100.

SVM minimizes the error by using a hyperplane and maximizing the margin between predicted and actual values (Smola & Schölkopf, 2004). SVM requires tuning of hyperparameters including the loss function (ε), penalty (c), and the kernel function (linear, polynomial, and radial kernel functions). During the tuning process, a sigma between 0.01 and 1 and cost between 0.1 and 2 were considered.

RF is one of the widely used ensemble models, which is expressed as an average of simple individual decision trees d_n(x) which are built using different sets of subsamples from the data (Breiman, 2001) (Eq. 2). Since the model is created from random bootstrap samples, it is considered to be robust in handling noise and minimizing overfitting issues (Fawagreh et al., 2014). Three different hyperparameters were considered for tuning: ntree (number of decision trees: between 100 and 1000), mtry (number of features at each split: between 1 to number of features present), and nodesize (minimum node size: 1 to 10).

$${\text{RF}}\left( {\text{x}} \right) = \frac{1}{{\text{n}}}\left[ {{\text{d}}_{{1}} \left( {\text{x}} \right) + {\text{d}}_{{2}} \left( {\text{x}} \right) + {\text{d}}_{{3}} \left( {\text{x}} \right) + \cdots + {\text{d}}_{{\text{n}}} \left( {\text{x}} \right)} \right]$$

(2)

XGB is an optimized version of a gradient-boosting algorithm that aims to improve the speed and accuracy of the algorithm (Chen & Guestrin, 2016). In this algorithm, unlike in RF, boosting is done to combine weak tree models with low accuracy to build strong models with better performance and accuracy. Four hyperparameters were tuned: ntrees (number of trees: between 100 and 1000), max_depth (complexity of the model: 3, 6, 8, 10, 15, 20, 30), eta (learning rate: 0.001, 0.01, 0.1), and gamma (pseudo-regularization: 0, 0.1, 0.5).

These six ML models were trained using a 10-fold cross-validation (CV) approach across 10 random seeds to better understand the sensitivity and variance of the models due to the randomness in selecting different seed values. During the 10-fold CV, hyperparameters associated with each model were tuned using a random grid search method (Table S1). A model instance with minimum RMSE across 10 runs for each ML model was selected for further analyses. To understand the significance of feature selection in improving model performance, models were compared for both full and selected set of VIs.
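The repeated cross-validation scheme can be sketched as follows. To stay self-contained, the sketch uses ordinary least squares as a placeholder model and synthetic data, and it omits the random hyperparameter search; all function names are hypothetical.

```python
import numpy as np

def kfold_indices(n_samples, n_folds, seed):
    """Shuffle sample indices with the given seed and split them into folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

def fit_ols(X, y):
    """Least-squares fit with an intercept (placeholder for any of the six models)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def cv_rmse(X, y, n_folds=10, seed=0):
    """Mean validation RMSE over the folds for one seed."""
    folds = kfold_indices(len(y), n_folds, seed)
    fold_rmse = []
    for k, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
        coef = fit_ols(X[train_idx], y[train_idx])
        err = y[val_idx] - predict_ols(coef, X[val_idx])
        fold_rmse.append(np.sqrt(np.mean(err ** 2)))
    return float(np.mean(fold_rmse))

# Synthetic training data: 150 observations, six predictors.
rng = np.random.default_rng(7)
X_cv = rng.normal(size=(150, 6))
y_cv = X_cv @ rng.normal(size=6) + rng.normal(scale=0.5, size=150)

# Repeat 10-fold CV across 10 seeds and keep the run with the lowest RMSE.
rmse_by_seed = {seed: cv_rmse(X_cv, y_cv, seed=seed) for seed in range(10)}
best_seed = min(rmse_by_seed, key=rmse_by_seed.get)
```

The spread of `rmse_by_seed` values gives a rough view of how sensitive a model is to the random fold assignment.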

Model evaluation

After training, the models were evaluated using the test dataset, which was not used in model tuning. The evaluation was based on three metrics: R² (Eq. 3), root mean square error (RMSE) (Eq. 4), and mean absolute error (MAE) (Eq. 5). RMSE is the square root of the mean squared prediction error, while MAE is the average of the absolute differences between observed and predicted values. They are expressed as:

$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} (y_{i} - \hat{y}_{i} )^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} (y_{i} - \overline{y})^{2} }}$$

(3)

$$RMSE = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} (y_{i} - \hat{y}_{i} )^{2} }$$

(4)

$$MAE = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {y_{i} - \hat{y}_{i} } \right|$$

(5)

where n is the total number of samples, y_i and ŷ_i are the observed and predicted values of sample i, and ȳ is the mean of the observed values. Higher values of R² and lower values of RMSE and MAE indicate higher accuracy and precision of a model.
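The three metrics in Eqs. 3 to 5 translate directly into a few lines of NumPy. The sketch below uses hypothetical function names and made-up biomass values purely for illustration.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination (Eq. 3)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y, y_hat):
    """Root mean square error (Eq. 4)."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error (Eq. 5)."""
    return float(np.mean(np.abs(y - y_hat)))

# Illustrative observed and predicted biomass values in g/m² (not from the study).
observed = np.array([10.0, 50.0, 120.0, 300.0])
predicted = np.array([20.0, 40.0, 100.0, 260.0])

scores = {
    "R2": r_squared(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
}
```

Because RMSE squares the errors before averaging, the single 40 g/m² miss on the largest observation pulls RMSE above MAE in this example, which is the same pattern the study reports at high biomass.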

Spectral and structural features

To develop models of various combinations of VIs and crop structural properties in cereal rye biomass estimation, three approaches (M): M1 (based only on VIs), M2 (based only on structural features), and M3 (based on a combination of VIs and structural features) were used. These models were then compared to identify the ones that offer the best overall performance in estimating cereal rye biomass, along with determining the most effective independent variables.

Shapley Additive Explanations (SHAP) analysis (Lundberg & Lee, 2017) was used to investigate model behavior and explore the impact that each feature has on the model prediction of the target output (response variable). The impact of features on model behavior was investigated using bar and summary plots. The SHAP values were computed in R using the kernelshap function available in the kernelshap package. It is, however, important to emphasize that SHAP values do not imply causation but rather help explain how the model arrives at its predictions of the output variable.

Results/findings

Cover crop biomass over time

The cereal rye biomass varied across the sampling period from early to late spring of 2021. The average biomass of 15 fields changed from 17.92 g/m² in March to 189.57 g/m² in May. Along with an increase in biomass, the variability in biomass growth increased significantly, with a standard deviation of 22.12 g/m² in early March (T1) to 42.41 g/m² in later March and early April (T2) to 187 g/m² in late April and early May (T3) (Fig. 5). The increase in cereal rye biomass in early spring compared to winter can be attributed to the warming of the weather, with an average monthly temperature that ranged between 6.5 ℃ in March to 10.7 ℃ in April to 15 ℃ in May. Fields that were planted early had significantly higher biomass than fields that were planted late at p < 0.05 based on the ANOVA test.

Feature selection

During the screening of thirteen VIs to address multicollinearity issues using the VIF approach, NDVI was removed in step 1, followed by SR_RE in step 2, TVI in step 3, R_G_ratio in step 4, GNDVI in step 5, EVI in step 6, and SR in step 7, due to their higher VIF scores (Fig. 6). Throughout these steps, SAVI, B_G_ratio, and ExG consistently exhibited lower VIF scores, indicating their reduced correlation with the remaining VIs. Ultimately, in the final iteration, six VIs including ExG, NGRDI, SAVI, B_G_ratio, RTVI, and NDRE, with a VIF score below 10, were selected for further model comparisons.

Cereal rye biomass estimation models

Models based on VIs

For the RF model utilizing the full set of VIs, the average values of R², RMSE, and MAE were 0.52, 94.07 g/m², and 56.13 g/m², respectively. Conversely, when trained on the selected VIs, RF achieved slightly improved performance, with average values of 0.55 for R², 90.48 g/m² for RMSE, and 54.11 g/m² for MAE. Similarly, for XGB, the corresponding average values were 0.49, 97.09 g/m², and 59.58 g/m² for the full set, and 0.52, 94.77 g/m², and 57.36 g/m² for the selected VIs showing improvement with the selected subset (Fig. 7).

Meanwhile, MLR and ENET demonstrated comparable performance across both the full set and selected VIs. MLR achieved average R² of 0.46 and 0.47, average RMSE of 98.62 g/m² and 96.89 g/m², and average MAE of 65.43 g/m² and 65.04 g/m² for the full set and selected VIs, respectively. Similarly, ENET yielded average R² values of 0.47 and 0.50, average RMSE of 95.49 g/m² and 95.78 g/m², and average MAE of 61.78 g/m² and 63.41 g/m² for the full set and selected VIs, respectively. While R² seemed to be better for the selected subset, RMSE was in a similar range and MAE was even higher. Therefore, there was no clear evidence for a stronger performance for either the full set or selected subset in the case of ENET.

Conversely, MARS and SVM models exhibited stronger performance when trained on the full set of VIs compared to the selected VIs. MARS achieved average R² values of 0.50 and 0.38, average RMSE of 93.72 g/m² and 98.97 g/m², and average MAE of 57.65 g/m² and 63.74 g/m² for the full set and selected VIs, respectively. Similarly, SVM achieved R² values of 0.52 and 0.44, average RMSE of 95.05 g/m² and 97.24 g/m², and average MAE of 61.80 g/m² and 52.03 g/m² for the full set and selected VIs, respectively. While R² of the SVM model based on the full set of features was higher than that of the selected VIs, MAE did not decrease. For further analyses, the best-performing models based on six selected VIs were considered.

During the assessment of models using an independent test dataset on six selected VIs, ENET, RF, and XGB outperformed the MLR. XGB exhibited the highest R² (0.67) and lowest RMSE (83.13 g/m²) and MAE (48.13 g/m²), followed by RF (R² = 0.59, RMSE = 91.89 g/m², MAE = 53.93 g/m²) and ENET (R² = 0.55, RMSE = 96.77 g/m², MAE = 61.50 g/m²) (Fig. 8). MLR showed an R² of 0.54, RMSE of 97.63 g/m², and MAE of 64.40 g/m². Conversely, MARS and SVM performed worse than the MLR model: MARS had the lowest R² (0.24) and the highest RMSE (126.33 g/m²) and MAE (72.83 g/m²), while SVM had an R² of 0.49, RMSE of 102.97 g/m², and MAE of 59.99 g/m². Figure S.1 in the supporting document compares the ground truth and predicted cereal rye biomass across 88 observations in five test fields, organized by the three sampling times to show trends in cereal rye biomass growth.

While all models demonstrated strong performance for biomass ranges below 200 g/m², the prediction error notably increased at higher biomass values greater than 200 g/m² as the growing season progressed (Figs. 8, 9). RF (RMSE = 185 g/m²) and XGB (RMSE = 156.91 g/m²) had lower prediction errors at higher biomass ranges, in contrast to MLR (RMSE = 196.60 g/m²), ENET (RMSE = 201.92 g/m²), MARS (RMSE = 282.75 g/m²), and SVM (RMSE = 223.44 g/m²) (Table 3). Nonetheless, the prediction error remained relatively high, particularly in comparison to lower cereal rye biomass values (59.72 g/m² for RF and 59.67 g/m² for XGB). This discrepancy highlights the limitations of relying solely on spectral VIs for predicting cereal rye biomass, especially at higher biomass ranges. Notably, RF and XGB showcased superior performance when utilizing the selected group of VIs compared to all the other models. Consequently, we exclusively employed these two best-performing models with the selected subset of six VIs for further analyses.

Table 3 Errors associated with models across three groups of observations, including (a) all observations, (b) observations with observed biomass less than or equal to 200 g/m², and (c) observations with biomass greater than 200 g/m²

Integration of structural features

Models based on canopy cover and crop height

The best-performing RF and XGB models with field-collected canopy cover and crop height were observed to perform significantly better than their VI-based versions. In an independent accuracy assessment, RMSE decreased from 91.89 to 57.49 g/m² and R² values increased from 0.59 to 0.84 for RF whereas RMSE decreased from 83.13 to 59.94 g/m² and R² values increased from 0.67 to 0.83 for XGB (Figs. 8, 10) when compared with the performance of models based exclusively on VIs. Notably, with structural features, the models show a better fit for a higher biomass range. RF and XGB showed RMSE of 114.15 and 110.10 g/m² respectively for data points with observed biomass greater than 200 g/m². This is a 38.3% and 27.3% decrease compared to the RMSE observed with RF and XGB models with only VIs at biomass greater than 200 g/m².

Models based on VI, canopy cover, and crop height

When field-collected canopy cover and crop height variables were combined with the selected VIs, the models performed better than VIs-based models but not compared to the models that relied only on canopy cover and crop height (M2). During the testing phase, the RF model had an R² of 0.82, with corresponding RMSE and MAE values of 61.67 and 33.18 g/m² respectively, while for the XGB model, the R² was 0.78, and the RMSE and MAE were 67.19 and 39.37 g/m², respectively (Fig. 11). These results signify a notable improvement, with a 39% and 16% increase in R², a 32% and 19% decrease in RMSE, and a 38% and 18% decrease in MAE for RF and XGB, respectively, compared to their VIs-based counterparts (i.e. M1). Notably, the improvement was more pronounced for the RF model than the XGB. Similarly, the error went down for both models at a higher biomass range compared to the VIs-based models. For biomass higher than 200 g/m², RMSE decreased by 26% and 22% for RF and XGB, respectively. This shows significant improvement in model fit in resolving issues encountered with the VIs with the inclusion of canopy cover and crop height. The performance of these models however was slightly lower than the same models when trained solely on structural features, such as canopy cover and crop height. This emphasizes the significance of considering the potential concerns associated with the use of VIs and underscores the importance of incorporating structural features in biomass estimation. One thing to note here is that structural features were derived from field measurements while spectral VIs were based on multispectral RS images. Further elaboration on this topic has been provided in the discussion section.
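The relative changes quoted above can be checked directly from the test-set metrics reported for the VIs-only models (M1) and the combined models (M3). The sketch below only redoes that arithmetic; the helper name `percent_change` is hypothetical, and rounding may differ slightly from the percentages quoted in the text.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

# Test-set metrics as reported in the text (RMSE and MAE in g/m²).
m1 = {
    "RF": {"R2": 0.59, "RMSE": 91.89, "MAE": 53.93},
    "XGB": {"R2": 0.67, "RMSE": 83.13, "MAE": 48.13},
}
m3 = {
    "RF": {"R2": 0.82, "RMSE": 61.67, "MAE": 33.18},
    "XGB": {"R2": 0.78, "RMSE": 67.19, "MAE": 39.37},
}

changes = {
    model: {metric: percent_change(m1[model][metric], m3[model][metric])
            for metric in m1[model]}
    for model in m1
}

for model, metrics in changes.items():
    summary = ", ".join(f"{name}: {value:+.1f}%" for name, value in metrics.items())
    print(f"{model} (M1 to M3): {summary}")
```

Positive values indicate an increase (desirable for R²) and negative values a decrease (desirable for RMSE and MAE).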

Feature importance

Based on the SHAP analyses of the best-performing RF and XGB regression models for cereal rye biomass prediction using only VIs, SAVI had the strongest overall impact on model prediction. While the ranking of the next three most important variables (NDRE, NGRDI, and B_G_ratio) varied between the two models, they consistently appeared among the top four variables (Fig. 12). Across both models, higher values of SAVI, NDRE, and NGRDI (light orange dots in the figure) had a positive impact on predicted cereal rye biomass, while higher values of B_G_ratio (purple dots) had a negative impact. Global feature importance, computed as the mean SHAP value across samples in the training set, is provided in the supporting document (Figure S.2).
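A global ranking of this kind can be derived from per-sample SHAP values by averaging their magnitudes per feature. The sketch below assumes the common convention of using the mean absolute SHAP value; whether the study used absolute values is an assumption, and the SHAP matrix shown is made up purely for illustration (it is not output from the study's models).

```python
def global_importance(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across samples (descending)."""
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

features = ["SAVI", "NDRE", "B_G_ratio"]

# Made-up per-sample SHAP values (g/m²): one row per sample, one column per feature
shap_rows = [
    [40.0, 10.0, -5.0],
    [-30.0, 15.0, -10.0],
    [20.0, -5.0, 9.0],
]

ranking = global_importance(shap_rows, features)
for name, score in ranking:
    print(f"{name}: {score:.1f}")
```

Using absolute values prevents positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks as important.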

Based on the SHAP analyses of both RF and XGB models consisting of VIs and structural features, crop height had the strongest overall impact on model prediction behavior, followed by canopy cover (Fig. 13; Figure S.3). This outcome is consistent with the enhanced performance of models considering only these two structural variables. An increase in both crop height and canopy cover increased predicted cereal rye biomass. After the inclusion of these two variables, the impact of the four VIs that were ranked high in the VI-based models diminished, with SAVI consistently emerging as the strongest VI. The rankings of the other VIs, such as NDRE, B_G_ratio, and NGRDI, were inconsistent between models and differed from the rankings observed in the VI-based models.

Discussion

Importance of feature selection

In this study, we took multiple modeling approaches to predict cereal rye biomass. Given our objective to find the optimal set of VIs for cereal rye biomass prediction, our focus was on comparing models that incorporated all available VIs with those utilizing a limited set of VIs. The models built using only the six selected VIs performed better than or similar to the models with the full set of 13 VIs most of the time (Fig. 7). This supports our hypothesis that the selection of optimal VIs minimizes multicollinearity issues while also resulting in better model performance. While non-linear models like RF and XGB can handle multicollinearity, prior studies focused on above-ground biomass prediction (Adame-Campos et al., 2019; Bai et al., 2023; Kuhn & Johnson, 2013) have reported findings similar to ours, emphasizing that feature selection is crucial for improving model interpretability, reducing overfitting, improving computational efficiency, improving generalization, and reducing noise in the dataset.

In this study, we used the VIF approach to identify and remove highly correlated variables to address multicollinearity. However, it is important to recognize that various other common approaches to feature selection exist, including stepwise regression and ranking- and wrapper-based selection, which can yield different outcomes from the VIF approach. This discrepancy arises from the distinct criteria and algorithms employed by each method, along with their varying assumptions about the data. For instance, stepwise regression iteratively adds or removes variables based on statistical significance and thus can be sensitive to the order of variable inclusion or exclusion. Wrapper-based methods evaluate subsets of features using predictive performance and thus can perform well on the training data but may not generalize well to unseen data. Ranking-based methods, on the other hand, prioritize features based on individual importance measures but can include variables with redundant information when considered together in the model. In comparison, the VIF approach explicitly addresses multicollinearity by quantifying the correlation between predictors. Being aware of these differences is crucial during the modeling workflow and ultimately leads to more robust and reliable model outcomes.
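To make the VIF idea concrete, the sketch below shows one common way to implement VIF-based selection: compute each predictor's VIF as the corresponding diagonal element of the inverse correlation matrix (equivalent to 1/(1 − R²) from regressing that predictor on the others), then iteratively drop the predictor with the largest VIF until all values fall below a threshold. The threshold of 10 is a widely used rule of thumb and an assumption here, as are the feature names and data; this is not necessarily the exact procedure used in the study.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length feature columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif_scores(data):
    """VIF per feature: diagonal of the inverse correlation matrix."""
    names = list(data)
    inv = invert(correlation_matrix([data[n] for n in names]))
    return {n: inv[i][i] for i, n in enumerate(names)}

def select_by_vif(data, threshold=10.0):
    """Drop the highest-VIF feature repeatedly until all VIFs are <= threshold."""
    kept = dict(data)
    while len(kept) > 1:
        scores = vif_scores(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)

# Hypothetical predictors: vi_b is nearly a linear function of vi_a
data = {
    "vi_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "vi_b": [2.1, 3.9, 6.2, 7.8, 10.1, 11.9],
    "vi_c": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
selected = select_by_vif(data)
print(selected)
```

In this toy dataset, one of the two nearly collinear predictors is removed and the remaining pair has low VIFs. Recomputing the scores after each removal matters, because dropping one predictor changes the VIF of every other predictor.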

Spectral vs structural features in biomass prediction

In our study, we observed a notable improvement in model performance when structural features such as crop height and canopy cover were included. Consistent with our hypothesis, models with structural features outperformed the models based solely on VIs. Due to the greater sensitivity of canopy cover and crop height to higher biomass compared to VIs, VI-based features did not provide much additional information. This could explain the inflated standard errors and lower R² values observed in models that considered both VIs and structural properties (Figs. 10 and 11) compared to models based solely on structural properties.

Based on the SHAP analysis of the best-performing RF and XGB models for biomass estimation in our study, SAVI consistently emerged as the most influential feature, followed by B_G_ratio, NGRDI, and NDRE. This finding aligns well with prior studies in which SAVI outperformed other VIs in estimating various crop biophysical characteristics (Venancio et al., 2020; Xu et al., 2018). For instance, Venancio et al. (2020) evaluated spectral VIs for estimating corn green biomass (i.e. the total above-ground vegetative part excluding corn grain) and identified SAVI as one of the best-performing among nine VIs. Similarly, Xu et al. (2018) demonstrated a robust linear relationship (R² = 0.77) between SAVI and cover crop biomass, including cereal rye.

This distinctive robustness of SAVI could be attributed to its ability to minimize data noise arising from soil reflectance, achieved through its incorporation of a soil correction factor. This enhances its suitability for providing reliable estimates, particularly under conditions of sparse vegetation. Notably, our original data consists of all pixels within a sampling region, further supporting the robustness of the SAVI-based model. In addition to SAVI, NDRE emerged as a noteworthy contributor in both RF and XGB models, a finding consistent with previous studies (Amaral et al., 2015; Kanke et al., 2016; Wang et al., 2016). NDRE has been found to exhibit greater sensitivity than NDVI as a biomass predictor (Amaral et al., 2015; Wang et al., 2016). Such phenomena can be attributed to the stronger relationships of red-edge-based VIs compared to red-based VIs with agronomic attributes (Kanke et al., 2016; Peng & Gitelson, 2012) due to the higher sensitivity of the red-edge spectrum to change in chlorophyll than the red portion (Filella & Penuelas, 1994; Lichtenthaler et al., 1996).
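For reference, the standard formulations of the indices discussed here can be written as short functions. The sketch below uses the SAVI definition of Huete (1988) with the commonly used soil adjustment factor L = 0.5; the value of L used in this study is not stated in this section, so that default is an assumption, and the reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index (Huete, 1988); soil_factor is L."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)

def ndre(nir, red_edge):
    """Normalized difference red-edge index."""
    return (nir - red_edge) / (nir + red_edge)

# Hypothetical surface reflectances for a sparsely vegetated pixel
nir, red, red_edge = 0.30, 0.10, 0.20

print(f"NDVI = {ndvi(nir, red):.3f}")
print(f"SAVI = {savi(nir, red):.3f}")
print(f"NDRE = {ndre(nir, red_edge):.3f}")
```

The additive L term in the SAVI denominator damps the index when total reflectance is low, which is how the soil correction reduces the influence of background soil under sparse canopies.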

In this study, we observed reduced sensitivity of the models in predicting cereal rye biomass, particularly for biomass above 200 g/m². Models that used VIs consistently underestimated biomass exceeding 200 g/m². These findings align with prior studies that estimated crop biomass using RS-derived VIs (Chen et al., 2009; Jennewein et al., 2022; Mutanga & Skidmore, 2004; Nguy-Robertson et al., 2012; Prabhakara et al., 2015; Thenkabail et al., 2000; Thieme et al., 2020) and attributed the poor performance of models to the saturation of VIs. The reduced sensitivity of VIs to higher biomass is predominantly linked to saturation in light reflectance within specific bands such as blue, green, red, and NIR (Chen et al., 2009; Thenkabail et al., 2000). Specifically, in a healthy crop canopy, there is high reflectance of light in the NIR and green regions, coupled with strong absorption in the red and blue regions. As the canopy cover increases, light absorption in the red and blue regions peaks, while reflectance in the NIR and green regions continues to rise. Consequently, this disparity in reflectance changes results in minimal or no increase in VI values, even with a further increase in biomass.
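The saturation mechanism described above can be illustrated numerically with a generic normalized-difference index. In the sketch below, red reflectance bottoms out while NIR reflectance keeps rising across successive growth stages; all reflectance values are hypothetical and chosen only to show the shrinking index gains, not measured in this study.

```python
def normalized_difference(band_a, band_b):
    """Generic normalized-difference index, e.g. NDVI with (NIR, red)."""
    return (band_a - band_b) / (band_a + band_b)

# Hypothetical (NIR, red) reflectance pairs for successive growth stages:
# red absorption reaches its floor (0.03) while NIR continues to increase.
stages = [(0.25, 0.10), (0.35, 0.05), (0.45, 0.03), (0.55, 0.03), (0.65, 0.03)]

index_values = [normalized_difference(nir, red) for nir, red in stages]

# Gain in the index between consecutive stages
gains = [later - earlier for earlier, later in zip(index_values, index_values[1:])]

for (nir, red), value in zip(stages, index_values):
    print(f"NIR={nir:.2f} red={red:.2f} index={value:.3f}")
print("gains:", [round(g, 3) for g in gains])
```

Each equal step in NIR reflectance yields a smaller index gain than the previous one, so late-season biomass increases translate into almost no change in the index.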

Given the saturation of VIs during peak growing seasons, it becomes imperative to consider other proxies, such as canopy cover and height, that provide greater sensitivity in high-biomass situations and thus more reliable biomass estimation. This was demonstrated in our study, where canopy cover and height consistently outperformed VIs, as is evident in the SHAP analyses of both RF and XGB models that included VIs and structural features (Fig. 13). Between crop height and canopy cover, crop height contributed more to model performance, possibly due to saturation issues associated with canopy cover. Some of the challenges of using canopy cover in biomass estimation were discussed by Prabhakara et al. (2015), who focused on estimating the biomass of various cover crop species, including cereal rye. That study observed that canopy cover saturated at a biomass of around 150 g/m². Their findings suggested that estimating biomass with canopy cover as a predictor becomes challenging once canopy cover reaches 75% or more.

Recognizing the importance of these structural properties, studies in recent years have included these factors in biomass estimation. Roth and Streit (2018) employed spectral and structural features derived from visual and NIR images collected using UAS to assess legumes and dicots’ biomass in Switzerland. They observed that crop height exhibited the highest correlation (R² = 0.58 to 0.74) with biomass compared to VIs (R² = 0.03 to 0.37). However, even with the inclusion of structural features, most studies demonstrated limited predictive accuracy. Tang et al. (2021) used crop height, canopy area, NDRE, and NGRDI in a linear regression model for alfalfa biomass estimation and achieved R² values ranging from 0.51 to 0.71. Similarly, Galán et al. (2020) utilized crop height, grain yield, and six selected VIs (GNDVI, Lichtenthaler index 3 (LI3), green leaf index (GLI), SR, photochemical reflectance index (PhoRI), and pigment-specific simple ratio chlorophyll a (PSSRa)) in a linear regression model for cereal rye biomass prediction and observed an R² of 0.77.

While our study showcased improved model performance by integrating spectral VIs and structural features compared to previous research, it's worth noting that the features such as canopy cover and crop height included in the model were field-based, meaning they were manually collected, in contrast to these studies (Dhakal et al., 2023; Liu et al., 2023; Xu et al., 2022) that considered crop height information derived using visual or multispectral images collected via UAS.

It is also worth noting that previous studies have primarily extracted crop height information using structure from motion (SFM) in two ways: (i) processing images collected at two time stamps, one during bare soil conditions (digital terrain model (DTM)) and another during leaf-on conditions (digital surface model (DSM)) (Liu et al., 2023), and (ii) identifying the bare soil surface present within a field to interpolate the ground surface (Dhakal et al., 2023; Xu et al., 2022). However, this process may not always be feasible, especially in areas with dense vegetation cover or when bare soil images are not available for a reference DTM. Additionally, the quality of DTMs and DSMs generated through SFM reconstruction is often influenced by environmental conditions during image acquisition, such as lighting (sunny vs. cloudy days), and by sensor characteristics (Feng et al., 2021; Tmušić et al., 2020). Although we lacked images of fields in bare soil conditions and therefore could not use image-derived height information in our study, we believe that the SFM reconstruction process can serve as an effective alternative in situations where sensors capable of capturing accurate crop height information, such as LIDAR, are unavailable.
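The first approach described above amounts to a per-pixel subtraction of the terrain elevation from the surface elevation. The sketch below shows this on small co-registered grids; the elevation values are hypothetical, missing pixels are represented as `None`, and clipping negative differences to zero is a design choice assumed here rather than a step taken from the cited studies.

```python
def crop_height_model(dsm, dtm):
    """Per-pixel crop height = DSM - DTM, clipped at zero; None marks no data."""
    chm = []
    for dsm_row, dtm_row in zip(dsm, dtm):
        row = []
        for surface, ground in zip(dsm_row, dtm_row):
            if surface is None or ground is None:
                row.append(None)  # propagate missing data
            else:
                row.append(max(0.0, surface - ground))
        chm.append(row)
    return chm

def mean_height(chm):
    """Mean crop height over valid pixels, or None if there are none."""
    values = [v for row in chm for v in row if v is not None]
    return sum(values) / len(values) if values else None

# Hypothetical co-registered elevation grids (m above sea level)
dsm = [[210.50, 210.80], [210.45, None]]     # leaf-on surface model
dtm = [[210.00, 210.10], [210.50, 210.05]]   # bare-soil terrain model

chm = crop_height_model(dsm, dtm)
print(chm)
print(mean_height(chm))
```

Because SFM-derived surfaces carry reconstruction noise, small negative differences can occur where the canopy is short; clipping them avoids physically impossible heights but may slightly bias plot-level means.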

Performance of biomass estimation model

Among the six ML regression models, RF and XGB estimated cereal rye biomass with lower error and explained variance in both low and high biomass ranges better than MLR, ENET, MARS, and SVM. This agrees with prior studies that have consistently shown that RF and XGB models fitted with RS-derived features outperform other regression models, including MLR, SVM, artificial neural network (ANN), and partial least squares regression (PLSR), in estimating crop yield and biomass (Geng et al., 2021; Han et al., 2019; Meng et al., 2013; Tian et al., 2021; Wang et al., 2022a, 2022b; Zhang et al., 2021). For instance, Han et al. (2019) found RF to be the most stable model in estimating maize biomass in a plot-scale study, while Tian et al. (2021) reported RF and XGB as superior in estimating above-ground mangrove biomass. Additionally, Zhang et al. (2021) demonstrated that XGB surpassed stepwise and RF regression in predicting maize biomass. Although traditional parametric models offer better interpretability and ease of implementation, non-parametric models like RF and XGB tend to outperform them due to their ability to capture non-linear interactions among variables (Meng et al., 2020).

The accuracy of the models based on spectral VIs in our study aligns well with previous research that used solely RS variables to estimate cover crop biomass, particularly that of cereal rye. Our best-performing models, RF (R² = 0.59, RMSE = 91.89 g/m², MAE = 53.93 g/m²) and XGB (R² = 0.67, RMSE = 83.13 g/m², MAE = 48.13 g/m²), demonstrated robust outcomes across both lower (0 to 200 g/m²) and higher biomass ranges (> 200 g/m²) across the cereal rye growing season. In a similar study aimed at estimating cover crop biomass, Prabhakara et al. (2015) reported R² values ranging from 0.26 to 0.40 and 0.40 to 0.63 using linear and log-linear regression models, respectively, with 10 VIs for five cover crop species including cereal rye, based on field samples collected throughout the cover crop growing season. Similar to the results in our study, when the data were limited to early sampling dates, the regression between the VIs and the biomass improved significantly, with R² values ranging from 0.59 to 0.84 and 0.63 to 0.86 for linear and log-linear regression models, respectively. This could be due to reduced VI saturation, which occurs at high biomass in later growth stages. Another study employed a simple linear regression approach using four VIs and estimated cereal rye and triticale biomass with validation R² ranging from 0.47 to 0.93 (Yuan et al., 2019). However, that study covered a limited biomass range, reaching only up to 140 g/m², and thus likely did not encounter issues related to VI saturation, unlike our study, in which biomass ranged up to 634 g/m².

To improve the estimation of springtime cereal rye biomass, Jennewein et al. (2022) integrated VIs with synthetic aperture radar (SAR) imagery-derived features in a log-linear model but achieved limited success, with an R² of 0.38 and RMSE of 77.2 g/m². In a two-year field experiment conducted in Nebraska involving biomass estimation for five cover crop species, Bai et al. (2023) evaluated the performance of four ML models (RF, SVM, PLS, and ANN) using only spectral VIs. They reported R² values ranging from 0.62 to 0.77, with higher RMSE values ranging from 106.3 to 136.5 g/m². Similarly, Kharel et al. (2023) reported biomass predictions for mixed species within a similar range (R² = 0.25 to 0.61) using RF models on the individual spectral bands of PlanetScope images and the derived VIs. By integrating a large dataset from multiple on-farm fields and multiple growth stages, and by employing various ML models coupled with a variable selection process, the VI-based models in our study achieved relatively superior performance compared to prior studies. This underscores the robustness of our approach in the context of cereal rye biomass estimation and supports our hypotheses regarding the superior performance of ML models. These models, combined with RS observations, can be used to understand the spatial variability of cereal rye growth within a field (e.g. Fig. 9) for precision management of planting, weeds, nutrients, and moisture.

Limitations and future work

In this study, our focus was on identifying VIs and developing models leveraging VIs and/or structural properties suitable for predicting cereal rye biomass. We used a total of 239 ground-truth samples collected from 15 farmers' fields ranging in size from 9 to 78 acres, spread across the northwestern part of Ohio. While our analyses provided useful insights into the importance of VIs and structural properties in cereal rye biomass prediction under diverse field conditions, in contrast to prior studies that mostly relied on plot-based or limited on-farm data, we think data from more fields and multiple years could further enhance the model's generalizability and robustness.

Saturation of VIs, as observed in this study, is a known limitation for the application of VIs in biomass estimation. Therefore, it is necessary to assess such saturation effects to avoid the underestimation of biomass. Since structural properties such as crop height are less sensitive to saturation, efforts should be directed toward including them whenever feasible.

Our ability to create a DTM, which is essential for computing crop height based on the SFM technique, was limited by the lack of images representing bare soil conditions and the absence of a bare soil surface within cereal rye fields at later stages of data collection (T2 to T3). Future efforts should be directed towards developing new methods and/or employing existing methods that leverage RGB and/or multispectral images to accurately derive crop height information, which can then be integrated into the model for robust estimation of crop biomass. Furthermore, special consideration should be given to ensuring the quality of aerial multispectral images for remote sensing applications in agricultural fields. This will help the orthomosaic stitching process run smoothly and prevent data gaps.

Conclusion

This study explores the potential of using UAS multispectral imagery for the estimation of cereal rye biomass using data collected over three growth stages of cereal rye at 15 on-farm fields in northwest Ohio, USA. It also investigates the significance of different combinations of VIs and crop structural properties for predicting cereal rye biomass using six regression algorithms: MLR, ENET, MARS, SVM, RF, and XGB. The performance of these models was evaluated to gauge their effectiveness in biomass prediction. The main findings of this study include:

Feature selection based on VIF scoring resulted in the selection of six VIs out of a total of 13 VIs. The selected VIs included ExG, NGRDI, SAVI, B_G_ratio, RTVI, and NDRE.
The models built using only the six selected VIs performed better than or similar to the models with the full set of 13 VIs most of the time, with R² ranging from 0.24 to 0.59 and RMSE ranging from 83.13 to 91.89 g/m² during 10-fold cross-validation.
During the independent assessment of VI-based models using the test dataset, XGB exhibited the highest R² (0.67) and lowest RMSE (83.13 g/m²) and MAE (48.13 g/m²), followed by RF and ENET.
All ML models provided reasonable estimates for biomass less than or equal to 200 g/m² but underestimated biomass greater than 200 g/m².
When field-collected structural features such as canopy cover and crop height were combined with the selected VIs, the models performed better than the VI-based models but not better than the models that relied only on canopy cover and crop height.
Based on the SHAP analyses of the best-performing RF and XGB regression models for cereal rye biomass prediction using only VIs, SAVI had the strongest overall impact on model prediction. While the ranking of the next three most important variables (NDRE, NGRDI, and B_G_ratio) varied between the two models, they consistently appeared among the top four variables.

The methodology developed in this study demonstrates the potential for the development of a tool for farmers that may facilitate precise estimation and mapping of cereal rye biomass. Timely information on cover crop growth is crucial for the production of cash crops following cover crops for numerous decision-making purposes, including planning the planting operations, and management of nutrients, weeds, and soil moisture to improve agronomic and environmental outcomes.

Data availability

Data will be made available on request.

References

Abdalla, M., Hastings, A., Cheng, K., Yue, Q., Chadwick, D., Espenberg, M., Truu, J., Rees, R. M., & Smith, P. (2019). A critical review of the impacts of cover crops on nitrogen leaching, net greenhouse gas balance and crop productivity. Global Change Biology, 25(8), 2530–2543. https://doi.org/10.1111/gcb.14644
Adame-Campos, R. L., Ghilardi, A., Gao, Y., Paneque-Gálvez, J., & Mas, J.-F. (2019). Variables selection for aboveground biomass estimations using satellite data: A comparison between relative importance approach and stepwise Akaike’s information criterion. ISPRS International Journal of Geo-Information. https://doi.org/10.3390/ijgi8060245
Adetunji, A. T., Ncube, B., Mulidzi, R., & Lewu, F. B. (2020). Management impact and benefit of cover crops on soil quality: A review. Soil and Tillage Research, 204, 104717. https://doi.org/10.1016/j.still.2020.104717
Alckmin, G., Kooistra, L., Lucieer, A., & Rawnsley, R. (2019). Feature filtering and selection for dry matter estimation on perennial ryegrass: A case study of vegetation indices. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 42, 1827–1831.
Alexandridis, T. K., Ovakoglou, G., & Clevers, J. G. P. W. (2020). Relationship between MODIS EVI and LAI across time and space. Geocarto International, 35(13), 1385–1399. https://doi.org/10.1080/10106049.2019.1573928
Amaral, L. R., Molin, J. P., Portz, G., Finazzi, F. B., & Cortinove, L. (2015). Comparison of crop canopy reflectance sensors used to identify sugarcane biomass and nitrogen status. Precision Agriculture, 16(1), 15–28. https://doi.org/10.1007/s11119-014-9377-2
Awais, M., Li, W., Cheema, M. J. M., Hussain, S., AlGarni, T. S., Liu, C., & Ali, A. (2021). Remotely sensed identification of canopy characteristics using UAV-based imagery under unstable environmental conditions. Environmental Technology & Innovation, 22, 101465. https://doi.org/10.1016/j.eti.2021.101465
Bai, G., Koehler-Cole, K., Scoby, D., Thapa, V. R., Basche, A., & Ge, Y. (2023). Enhancing estimation of cover crop biomass using field-based high-throughput phenotyping and machine learning models. Frontiers in Plant Science, 14, 1277672. https://doi.org/10.3389/fpls.2023.1277672
Berry, M. A., Davis, T. W., Cory, R. M., Duhaime, M. B., Johengen, T. H., Kling, G. W., Marino, J. A., Den Uyl, P. A., Gossiaux, D., Dick, G. J., & Denef, V. J. (2017). Cyanobacterial harmful algal blooms are a biological disturbance to Western Lake Erie bacterial communities. Environmental Microbiology, 19(3), 1149–1162. https://doi.org/10.1111/1462-2920.13640
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Broge, N. H., & Leblanc, E. (2001). Comparing prediction power and stability of broadband and hyperspectral vegetation indices for estimation of green leaf area index and canopy chlorophyll density. Remote Sensing of Environment, 76(2), 156–172. https://doi.org/10.1016/S0034-4257(00)00197-8
Cao, Q., Miao, Y., Shen, J., Yu, W., Yuan, F., Cheng, S., Huang, S., Wang, H., Yang, W., & Liu, F. (2016). Improving in-season estimation of rice yield potential and responsiveness to topdressing nitrogen application with Crop Circle active crop canopy sensor. Precision Agriculture, 17(2), 136–154. https://doi.org/10.1007/s11119-015-9412-y
Chen, J., Gu, S., Shen, M., Tang, Y., & Matsushita, B. (2009). Estimating aboveground biomass of grassland having a high canopy cover: An exploratory analysis of in situ hyperspectral data. International Journal of Remote Sensing, 30(24), 6497–6517. https://doi.org/10.1080/01431160902882496
Chen, P., Tremblay, N., Wang, J., Philippe, V., Huang, W., & Li, B. (2010). New index for crop canopy fresh biomass estimation. Spectroscopy and Spectral Analysis. https://doi.org/10.3964/j.issn.1000-0593(2010)02-0512-06
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. https://doi.org/10.1145/2939672.2939785
Daryanto, S., Fu, B., Wang, L., Jacinthe, P. A., & Zhao, W. (2018). Quantitative synthesis on the ecosystem services of cover crops. Earth-Science Reviews, 185(June), 357–373. https://doi.org/10.1016/j.earscirev.2018.06.013
Dhakal, R., Maimaitijiang, M., Chang, J., & Caffe, M. (2023). Utilizing spectral, structural and textural features for estimating oat above-ground biomass using UAV-based multispectral data and machine learning. Sensors. https://doi.org/10.3390/s23249708
Fawagreh, K., Gaber, M. M., & Elyan, E. (2014). Random forests: From early developments to recent advancements. Systems Science & Control Engineering, 2(1), 602–609.
Feng, L., Chen, S., Zhang, C., Zhang, Y., & He, Y. (2021). A comprehensive review on recent applications of unmanned aerial vehicle remote sensing with various sensors for high-throughput plant phenotyping. Computers and Electronics in Agriculture, 182, 106033. https://doi.org/10.1016/j.compag.2021.106033
Filella, I., & Penuelas, J. (1994). The red edge position and shape as indicators of plant chlorophyll content, biomass and hydric status. International Journal of Remote Sensing, 15(7), 1459–1470. https://doi.org/10.1080/01431169408954177
Finney, D. M., White, C. M., & Kaye, J. P. (2016). Biomass production and carbon/nitrogen ratio influence ecosystem services from cover crop mixtures. Agronomy Journal, 108(1), 39–52. https://doi.org/10.2134/agronj15.0182
Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1–67. https://doi.org/10.1214/aos/1176347963
Galán, R. J., Bernal-Vasquez, A.-M., Jebsen, C., Piepho, H.-P., Thorwarth, P., Steffan, P., Gordillo, A., & Miedaner, T. (2020). Hyperspectral reflectance data and agronomic traits can predict biomass yield in winter rye hybrids. BioEnergy Research, 13(1), 168–182. https://doi.org/10.1007/s12155-019-10080-z
Gamon, J., & Surfus, J. (1999). Assessing leaf pigment content and activity with a reflectometer. The New Phytologist, 143(1), 105–117.
Geng, L., Che, T., Ma, M., Tan, J., & Wang, H. (2021). Corn biomass estimation by integrating remote sensing and long-term observation data based on machine learning techniques. Remote Sensing. https://doi.org/10.3390/rs13122352
Gitelson, A., & Merzlyak, M. (1998). Remote sensing of chlorophyll concentration in higher plant leaves. Advances in Space Research, 22(5), 689–692. https://doi.org/10.1016/S0273-1177(97)01133-2
Gitelson, A., & Merzlyak, M. N. (1994). Quantitative estimation of chlorophyll-a using reflectance spectra: Experiments with autumn chestnut and maple leaves. Journal of Photochemistry and Photobiology B: Biology, 22(3), 247–252. https://doi.org/10.1016/1011-1344(93)06963-4
Han, L., Yang, G., Dai, H., Xu, B., Yang, H., Feng, H., Li, Z., & Yang, X. (2019). Modeling maize above-ground biomass based on machine learning approaches using UAV remote-sensing data. Plant Methods, 15(1), 1–19. https://doi.org/10.1186/s13007-019-0394-z
Holman, F. H., Riche, A. B., Michalski, A., Castle, M., Wooster, M. J., & Hawkesford, M. J. (2016). High throughput field phenotyping of wheat plant height and growth rate in field plot trials using UAV based remote sensing. Remote Sensing. https://doi.org/10.3390/rs8121031
Huete, A. (1988). A soil-adjusted vegetation index (SAVI). Remote Sensing of Environment, 25(3), 295–309. https://doi.org/10.1016/0034-4257(88)90106-X
Huete, A., Didan, K., Miura, T., Rodriguez, E. P., Gao, X., & Ferreira, L. G. (2002). Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sensing of Environment, 83(1), 195–213. https://doi.org/10.1016/S0034-4257(02)00096-2
Hunt, E. R., Cavigelli, M., Daughtry, C. S. T., Mcmurtrey, J. E., & Walthall, C. L. (2005). Evaluation of digital photography from model aircraft for remote sensing of crop biomass and nitrogen status. Precision Agriculture, 6(4), 359–378. https://doi.org/10.1007/s11119-005-2324-5
Jennewein, J. S., Lamb, B. T., Hively, W. D., Thieme, A., Thapa, R., Goldsmith, A., & Mirsky, S. B. (2022). Integration of satellite-based optical and synthetic aperture radar imagery to estimate winter cover crop performance in cereal grasses. Remote Sensing. https://doi.org/10.3390/rs14092077
Jordan, C. F. (1969). Derivation of leaf-area index from quality of light on the forest floor. Ecology, 50(4), 663–666. https://doi.org/10.2307/1936256
Kanke, Y., Tubaña, B., Dalen, M., & Harrell, D. (2016). Evaluation of red and red-edge reflectance-based vegetation indices for rice biomass and grain yield prediction models in paddy fields. Precision Agriculture, 17(5), 507–530. https://doi.org/10.1007/s11119-016-9433-1
Kaspar, T. C., & Bakker, M. G. (2015). Biomass production of 12 winter cereal cover crop cultivars and their effect on subsequent no-till corn yield. Journal of Soil and Water Conservation, 70(6), 353–364. https://doi.org/10.2489/jswc.70.6.353
Kharel, T. P., Bhandari, A. B., Mubvumba, P., Tyler, H. L., Fletcher, R. S., & Reddy, K. N. (2023). Mixed-species cover crop biomass estimation using planet imagery. Sensors. https://doi.org/10.3390/s23031541
Koh, J. C. O., Hayden, M., Daetwyler, H., & Kant, S. (2019). Estimation of crop plant density at early mixed growth stages using UAV imagery. Plant Methods, 15(1), 64. https://doi.org/10.1186/s13007-019-0449-1
Kross, A., McNairn, H., Lapen, D., Sunohara, M., & Champagne, C. (2015). Assessment of RapidEye vegetation indices for estimation of leaf area index and biomass in corn and soybean crops. International Journal of Applied Earth Observation and Geoinformation, 34, 235–248. https://doi.org/10.1016/j.jag.2014.08.002
Lichtenthaler, H. K., Gitelson, A., & Lang, M. (1996). Non-destructive determination of chlorophyll content of leaves of a green and an aurea mutant of tobacco by reflectance measurements. Journal of Plant Physiology, 148(3), 483–493. https://doi.org/10.1016/S0176-1617(96)80283-5
Liu, Y., Feng, H., Yue, J., Fan, Y., Bian, M., Ma, Y., Jin, X., Song, X., & Yang, G. (2023). Estimating potato above-ground biomass by using integrated unmanned aerial system-based optical, structural, and textural canopy measurements. Computers and Electronics in Agriculture, 213, 108229. https://doi.org/10.1016/j.compag.2023.108229
Lundberg, S., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30. https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html
Meng, B., Liang, T., Yi, S., Yin, J., Cui, X., Ge, J., Hou, M., Lv, Y., & Sun, Y. (2020). Modeling alpine grassland above ground biomass based on remote sensing data and machine learning algorithm: A case study in East of the Tibetan Plateau, China. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 2986–2995. https://doi.org/10.1109/JSTARS.2020.2999348
Meng, J., Du, X., & Wu, B. (2013). Generation of high spatial and temporal resolution NDVI and its application in crop biomass estimation. International Journal of Digital Earth, 6(3), 203–218. https://doi.org/10.1080/17538947.2011.623189
Meyer, N., Bergez, J.-E., Justes, E., & Constantin, J. (2022). Influence of cover crop on water and nitrogen balances and cash crop yield in a temperate climate: A modelling approach using the STICS soil-crop model. European Journal of Agronomy, 132, 126416. https://doi.org/10.1016/j.eja.2021.126416
Michalak, A. M., Anderson, E. J., Beletsky, D., Boland, S., Bosch, N. S., Bridgeman, T. B., Chaffin, J. D., Cho, K., Confesor, R., Daloğlu, I., DePinto, J. V., Evans, M. A., Fahnenstiel, G. L., He, L., Ho, J. C., Jenkins, L., Johengen, T. H., Kuo, K. C., LaPorte, E., … Zagorski, M. A. (2013). Record-setting algal bloom in Lake Erie caused by agricultural and meteorological trends consistent with expected future conditions. Proceedings of the National Academy of Sciences, 110(16), 6448–6452. https://doi.org/10.1073/pnas.1216006110
Moges, S. M., Raun, W. R., Mullen, R. W., Freeman, K. W., Johnson, G. V., & Solie, J. B. (2005). Evaluation of green, red, and near infrared bands for predicting winter wheat biomass, nitrogen uptake, and final grain yield. Journal of Plant Nutrition, 27(8), 1431–1441. https://doi.org/10.1081/PLN-200025858
Mutanga, O., & Skidmore, A. K. (2004). Narrow band vegetation indices overcome the saturation problem in biomass estimation. International Journal of Remote Sensing, 25(19), 3999–4014. https://doi.org/10.1080/01431160310001654923
Nguy-Robertson, A., Gitelson, A., Peng, Y., Viña, A., Arkebauer, T., & Rundquist, D. (2012). Green leaf area index estimation in maize and soybean: Combining vegetation indices to achieve maximal sensitivity. Agronomy Journal, 104(5), 1336–1347. https://doi.org/10.2134/agronj2012.0065
Peng, Y., & Gitelson, A. A. (2012). Remote estimation of gross primary productivity in soybean and maize based on total crop chlorophyll content. Remote Sensing of Environment, 117, 440–448. https://doi.org/10.1016/j.rse.2011.10.021
Peroni Venancio, L., Chartuni Mantovani, E., do Amaral, C. H., Usher Neale, C. M., Zution Gonçalves, I., Filgueiras, R., & Coelho Eugenio, F. (2020). Potential of using spectral vegetation indices for corn green biomass estimation based on their relationship with the photosynthetic vegetation sub-pixel fraction. Agricultural Water Management, 236, 106155. https://doi.org/10.1016/j.agwat.2020.106155
Prabhakara, K., Dean Hively, W., & McCarty, G. W. (2015). Evaluating the relationship between biomass, percent groundcover and remote sensing indices across six winter cover crop fields in Maryland, United States. International Journal of Applied Earth Observation and Geoinformation, 39, 88–102. https://doi.org/10.1016/j.jag.2015.03.002
Rosle, R., Che’Ya, N. N., Roslin, N. A., Halip, R. M., & Ismail, M. R. (2019). Monitoring early stage of rice crops growth using normalized difference vegetation index generated from UAV. IOP Conference Series: Earth and Environmental Science, 355(1), 012066. https://doi.org/10.1088/1755-1315/355/1/012066
Roth, L., & Streit, B. (2018). Predicting cover crop biomass by lightweight UAS-based RGB and NIR photography: An applied photogrammetric approach. Precision Agriculture, 19(1), 93–114. https://doi.org/10.1007/s11119-017-9501-1
Rouse, J. W., Haas, R. H., Deering, D. W., Schell, J. A., & Harlan, J. C. (1974). Monitoring the vernal advancement and retrogradation (greenwave effect) of natural vegetation. E75-10354
Salmerón, R., García, C., & García, J. (2020). Overcoming the inconsistences of the variance inflation factor: A redefined VIF and a test to detect statistical troubling multicollinearity. arXiv:2005.02245
Santin-Janin, H., Garel, M., Chapuis, J.-L., & Pontier, D. (2009). Assessing the performance of NDVI as a proxy for plant biomass using non-linear models: A case study on the Kerguelen archipelago. Polar Biology, 32(6), 861–871. https://doi.org/10.1007/s00300-009-0586-5
Sellaro, R., Crepy, M., Trupkin, S. A., Karayekov, E., Buchovsky, A. S., Rossi, C., & Casal, J. J. (2010). Cryptochrome as a sensor of the blue/green ratio of natural radiation in Arabidopsis. Plant Physiology, 154(1), 401–409. https://doi.org/10.1104/pp.110.160820
Sharma, P., Leigh, L., Chang, J., & Maimaitijiang, M. (2022). Above-ground biomass estimation in oats using UAV remote sensing and machine learning. Sensors, 22(2), 601.
Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14(3), 199–222. https://doi.org/10.1023/B:STCO.0000035301.49549.88
Tang, Z., Parajuli, A., Chen, C. J., Hu, Y., Revolinski, S., Medina, C. A., Lin, S., Zhang, Z., & Yu, L. X. (2021). Validation of UAV-based alfalfa biomass predictability using photogrammetry with fully automatic plot segmentation. Scientific Reports, 11(1), 1–13. https://doi.org/10.1038/s41598-021-82797-x
Thenkabail, P. S., Smith, R. B., & De Pauw, E. (2000). Hyperspectral vegetation indices and their relationships with agricultural crop characteristics. Remote Sensing of Environment, 71(2), 158–182.
Thieme, A., Yadav, S., Oddo, P. C., Fitz, J. M., McCartney, S., King, L. A., Keppler, J., McCarty, G. W., & Hively, W. D. (2020). Using NASA Earth observations and Google Earth Engine to map winter cover crop conservation performance in the Chesapeake Bay watershed. Remote Sensing of Environment, 248, 111943. https://doi.org/10.1016/j.rse.2020.111943
Tian, Y., Huang, H., Zhou, G., Zhang, Q., Tao, J., Zhang, Y., & Lin, J. (2021). Aboveground mangrove biomass estimation in Beibu Gulf using machine learning and UAV remote sensing. Science of the Total Environment, 781(12), 146816. https://doi.org/10.1016/j.scitotenv.2021.146816
Tmušić, G., Manfreda, S., Aasen, H., James, M. R., Gonçalves, G., Ben-Dor, E., Brook, A., Polinova, M., Arranz, J. J., Mészáros, J., Zhuang, R., Johansen, K., Malbeteau, Y., de Lima, I. P., Davids, C., Herban, S., & McCabe, M. F. (2020). Current practices in UAS-based environmental monitoring. Remote Sensing. https://doi.org/10.3390/rs12061001
Tucker, C. J., & Sellers, P. J. (1986). Satellite remote sensing of primary production. International Journal of Remote Sensing, 7(11), 1395–1416. https://doi.org/10.1080/01431168608948944
Vélez, S., Martínez-Peña, R., & Castrillo, D. (2023). Beyond vegetation: A review unveiling additional insights into agriculture and forestry through the application of vegetation indices. J, 6(3), 421–436. https://doi.org/10.3390/j6030028
Wang, F., Yang, M., Ma, L., Zhang, T., Qin, W., Li, W., Zhang, Y., Sun, Z., Wang, Z., Li, F., & Yu, K. (2022a). Estimation of above-ground biomass of winter wheat based on consumer-grade multi-spectral UAV. Remote Sensing. https://doi.org/10.3390/rs14051251
Wang, J., Liu, T., Zhang, J., Yuan, H., & Acquah, G. E. (2022b). Spectral variable selection for estimation of soil organic carbon content using mid-infrared spectroscopy. European Journal of Soil Science, 73(4), e13267. https://doi.org/10.1111/ejss.13267
Wang, X., Miao, Y., Guan, Y., Xia, T., Lu, J., & Mulla, D. J. (2016). An evaluation of two active canopy sensor systems for non-destructive estimation of spring maize biomass. In 2016 Fifth International Conference on Agro-Geoinformatics (Agro-Geoinformatics). https://doi.org/10.1109/Agro-Geoinformatics.2016.7577610
Woebbecke, D., Meyer, G., Von Bargen, K., & Mortensen, D. (1995). Color indices for weed identification under various soil, residue, and lighting conditions. Transactions of the ASAE, 38(1), 259–269.
Xu, L., Zhou, L., Meng, R., Zhao, F., Lv, Z., Xu, B., Zeng, L., Yu, X., & Peng, S. (2022). An improved approach to estimate ratoon rice aboveground biomass by integrating UAV-based spectral, textural and structural features. Precision Agriculture, 23(4), 1276–1301. https://doi.org/10.1007/s11119-022-09884-5
Xu, M., Lacey, C. G., & Armstrong, S. D. (2018). The feasibility of satellite remote sensing and spatial interpolation to estimate cover crop biomass and nitrogen uptake in a small watershed. Journal of Soil and Water Conservation, 73(6), 682–692. https://doi.org/10.2489/jswc.73.6.682
Ye, H., Huang, W., Huang, S., Cui, B., Dong, Y., Guo, A., Ren, Y., & Jin, Y. (2020). Recognition of Banana Fusarium Wilt based on UAV remote sensing. Remote Sensing. https://doi.org/10.3390/rs12060938
Yuan, M., Burjel, J. C., Isermann, J., Goeser, N. J., & Pittelkow, C. M. (2019). Unmanned aerial vehicle-based assessment of cover crop biomass and nitrogen uptake variability. Journal of Soil and Water Conservation, 74(4), 350–359. https://doi.org/10.2489/jswc.74.4.350
Yue, J., Feng, H., Li, Z., Zhou, C., & Xu, K. (2021). Mapping winter-wheat biomass and grain yield based on a crop model and UAV remote sensing. International Journal of Remote Sensing, 42(5), 1577–1601. https://doi.org/10.1080/01431161.2020.1823033
Zhang, Y., Xia, C., Zhang, X., Cheng, X., Feng, G., Wang, Y., & Gao, Q. (2021). Estimating the maize biomass by crop height and narrowband vegetation indices derived from UAV-based hyperspectral images. Ecological Indicators, 129, 107985. https://doi.org/10.1016/j.ecolind.2021.107985
Zhou, Y., Zhang, L., Xiao, J., Chen, S., Kato, T., & Zhou, G. (2014). A comparison of satellite-derived vegetation indices for approximating gross primary productivity of grasslands. Rangeland Ecology & Management, 67(1), 9–18. https://doi.org/10.2111/REM-D-13-00059.1
Zhu, W., Rezaei, E. E., Nouri, H., Sun, Z., Li, J., Yu, D., & Siebert, S. (2022). UAV-based indicators of crop growth are robust for distinct water and nutrient management but vary between crop development phases. Field Crops Research, 284, 108582. https://doi.org/10.1016/j.fcr.2022.108582
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x

Acknowledgements

We thank Boden Fisher, Brigitte Moneymaker, Abha Bhattarai, and Gaoshoutong Si for their contributions and support during field data collection.

Funding

This work was supported by funds from OSU L&L Grant (PG107271), SI Grant (PG107338), Ohio Soybean Council (GR123740), OSU Soybean Research Center, USDA-AFRI Grant (GR130726), and Hatch Project (NC1195).

Author information

Authors and Affiliations

Department of Food, Agricultural, and Biological Engineering, The Ohio State University, Columbus, OH, 43210, USA
Kushal KC & Sami Khanal
Ohio State University Extension, The Ohio State University, Fremont, OH, 43420, USA
Matthew Romanko
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA
Andrew Perrault

Contributions

Kushal KC: Conceptualization, formal analysis, methodology, software, validation, visualization, writing—original draft, writing—review and editing. Sami Khanal: Conceptualization, funding acquisition, investigation, methodology, project administration, supervision, writing—review and editing. Matthew Romanko: Data collection, writing—review and editing. Andrew Perrault: Writing—review and editing.

Corresponding author

Correspondence to Sami Khanal.

Ethics declarations

Conflict of interest

The authors declare no competing financial interests or personal relationships that could have influenced the work in the paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOC 291 kb)

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

KC, K., Romanko, M., Perrault, A. et al. On-farm cereal rye biomass estimation using machine learning on images from an unmanned aerial system. Precision Agric (2024). https://doi.org/10.1007/s11119-024-10162-9

Accepted: 20 June 2024
Published: 06 July 2024
DOI: https://doi.org/10.1007/s11119-024-10162-9
