Introduction

Faba bean (Vicia faba L.) is an important cool-season grain legume crop grown worldwide for its high seed protein content and its capacity for biological nitrogen fixation (Bangar & Kajla, 2022). The protein content of faba bean seeds is approximately 30% on a dry matter basis, making them highly valuable for both human consumption and animal feed (Crépon et al., 2010; Dhull et al., 2022; Warsame et al., 2018). Faba bean holds great promise for meeting the increasing global demand for plant-based protein. Furthermore, it integrates well into sustainable agricultural systems by enhancing soil nitrogen fertility and breaking the cycle of biotic stress in cereal-based cropping systems (Lepse et al., 2017; Li et al., 2023). The objective of faba bean breeding programs is to develop widely adaptable, disease-resistant, high-yielding, and genetically stable germplasm (Khazaei et al., 2021). This process requires monitoring the phenotypic traits of large sets of genotypes, and traditional methods for doing so are time-consuming, prone to human error, and sometimes destructive (Burud et al., 2017).

Unmanned aerial vehicles (UAVs) equipped with sensors such as RGB, multispectral, and hyperspectral cameras offer a practical solution for rapid and efficient high-throughput phenotyping (HTP). Traditional RGB sensors, which record three bands (red, green, and blue), are cost-effective, easy to process, and provide high spatial resolution, but they are limited in capturing the complex information present in the crop canopy spectrum (García-Martínez et al., 2020). RGB sensors are particularly suitable for assessing canopy height and lodging (Adak et al., 2021; Kim & Chung, 2021). Multispectral sensors, on the other hand, typically cover wavelengths between 400 and 900 nm, including blue, green, red, red edge, and near-infrared (NIR) bands (Bian et al., 2022). Such sensors are commonly used on UAVs because they capture vegetation information across multiple wavelengths simultaneously, allowing for the calculation of spectral reflectance and indices correlated with vegetation characteristics (Ganeva et al., 2022). Hyperspectral sensors, known for their numerous wavelength bands, rich spectral information, high spectral resolution, and excellent recognition capabilities (Feng et al., 2020), offer superior performance in accurately characterizing spectral responses. However, they are expensive and require complex data processing (Yoosefzadeh-Najafabadi et al., 2021).

The utilization of UAVs therefore presents opportunities for generating extensive georeferenced data. Managing such large datasets can be challenging, but this challenge also opens up possibilities for novel modeling techniques such as machine learning, which can enhance selection intensity, improve selection accuracy, and strengthen decision support systems (Ganeva et al., 2022). Machine learning, as a successor to traditional statistical regression, enables the analysis of hierarchical and non-linear relationships between predictor and response variables, often outperforming conventional linear regression methods (Bian et al., 2022). Two commonly used machine learning regression methods are support vector regression (SVR) and random forest (RF). SVR is the regression counterpart of the support vector machine (SVM) algorithm: instead of constructing a separating hyperplane between classes, it fits a function that keeps the training points within a tolerance margin around the predictions, and kernels allow it to capture non-linear relationships. RF, on the other hand, is an ensemble learning method that builds multiple decision trees and takes the average of their predictions as the final output (Li et al., 2022). The RF algorithm also provides a variable importance plot, which is particularly valuable for identifying the most influential input variables in the model. By measuring the importance of each variable, RF facilitates the selection of candidate predictors to enhance regression accuracy (Li et al., 2022).

In the context of machine learning or general data analysis, it is desirable to minimize dependence on a large number of features. This can be achieved through the application of feature selection methods, which involve selecting a relevant subset of features from a dataset during the development of a machine learning model. Feature selection serves multiple purposes, including reducing the impact of noisy data on the prediction model, decreasing computation time, improving model performance, and enhancing the understanding of the dataset (Badillo et al., 2020). In supervised learning, feature selection is commonly employed prior to model development. Two widely used feature selection methods are the Pearson correlation coefficient method and sequential feature selection. The Pearson correlation coefficient (PCC) measures the linear correlation between variables, providing insights into their relationship (Jebli et al., 2021). Sequential feature selection algorithms, on the other hand, are a family of greedy search algorithms that operate in either a forward (SFS) or backward (SBS) manner, progressively adding or removing features based on their contribution to the model's performance (Li et al., 2022; Shafiee et al., 2021).

UAVs and machine learning have been utilized to evaluate various phenotypic traits, including yield, plant height, and chlorophyll content. The ability to predict crop yields in small and medium-sized plots, particularly during the early stages of plant growth, can assist agricultural producers in identifying areas with low crop yields due to abnormal crop health and poor soil fertility. This enables them to implement early intervention measures (Ganeva et al., 2022). Yield prediction using vegetation indices has been assessed in studies on wheat (Ganeva et al., 2022; Goodwin et al., 2018; Naser et al., 2020a, 2020b), corn (García-Martínez et al., 2020) and durum wheat (Kyratzis et al., 2017). Determining optimized index combinations from single or multiple plant development stages can better reflect crop growth patterns and greatly improve crop yield prediction models. Plant height is another important indicator of crop growth, contributing to the assessment of crop productivity and to decision-making in crop management (Xie et al., 2021). Plant height is also correlated with susceptibility to lodging (Ji et al., 2022) and with yield at early to mid-developmental stages (Xie & Yang, 2020; Yin et al., 2011), making it valuable for plant breeding programs and management practices (Xie & Yang, 2020). However, the utility of early-season plant height in predicting end-of-season traits such as yield and final height varies across studies and requires further evaluation. Chlorophyll content can serve as an indicator of the physiological status of the crop vegetation (Zhang et al., 2021). The Soil Plant Analysis Development (SPAD) meter is a commonly used tool for rapid and non-destructive estimation of relative chlorophyll content (Guo et al., 2022). Spectral indices have been employed to predict SPAD values in crops such as wheat, maize, and potato, offering an alternative to labor-intensive and error-prone laboratory analysis (Liu et al., 2020, 2021; Sudu et al., 2022; Xiong et al., 2015; Yang et al., 2021; Yu et al., 2018). Although SPAD measurement is faster than laboratory analysis of chlorophyll content, it is still very time-consuming. Combining machine learning techniques with spectral vegetation indices may address the limitations of conventional methods for determining chlorophyll content in plants, offering a more efficient and accurate approach.

To the best of our knowledge, the utilization of UAVs for faba bean phenotyping has been explored in only three published studies, which focused on yield and plant height estimation using RGB and multispectral sensors (Cui et al., 2023; Ji et al., 2022, 2023). However, these papers primarily discussed different models and methods for UAV imagery, based on a limited number of field plots and few flight missions. They did not consider the physiological and phenological aspects of faba bean in the interpretation of the UAV data. Therefore, a large-scale field experiment with various cultivars, an increased number of replicates, and flight missions using both RGB and multispectral cameras from sowing to harvest can yield more informative phenological and physiological data, as well as improved prediction accuracy. This study aimed to:

  1. Investigate the correlation between plant height estimated using UAV technology and manually measured plant height. Additionally, assess the capability of UAV-based plant height to predict yield.

  2. Examine the effectiveness of spectral indices in identifying phenological stages in faba bean. Additionally, evaluate the potential of Support Vector Regression (SVR) and Random Forest (RF) algorithms in predicting SPAD values (an indicator of chlorophyll content).

  3. Optimize yield prediction models by utilizing Support Vector Regression (SVR) and Random Forest (RF) algorithms, integrating two feature selection methods: Pearson Correlation Coefficient (PCC) and Sequential Forward Selection (SFS).

Material and methods

The study was conducted at Vollebekk Research Farm at the Norwegian University of Life Sciences (NMBU), South-Eastern Norway (59° 39′ N 10° 45′ E) in 2022. A spring faba bean trial with 152 plots (38 cultivars with 4 replicates) was laid out in a randomized complete block design (the list of cultivars, their providers, country and maturation behavior is presented in Online Appendix-Table 1). The field trial plots were 1.5 m wide and 7 m long with 0.25 m alleys between plots. Border plots were planted at each end of the field to decrease border effects. The field was planted on April 22, and harvested on August 15 (early cultivars) and August 31 (late cultivars). Field orthomosaics from sowing to harvest are presented in Fig. 1, while the procedure followed for data collection, processing, and modeling is shown as a flowchart in Fig. 2.

Fig. 1

Faba bean field orthomosaics from sowing (April 22) to harvest (August 31) created from images taken with Phantom 4 multispectral camera

Fig. 2

Flowchart of data acquisition, processing and modeling in this study

Manual measurements

Plant development stages were recorded during the season using the BBCH scale (Weber & Bleiholder, 1990), and four stages were especially assessed in this study:

  • BBCH 50: Flower bud present

  • BBCH 60: First flower open

  • BBCH 70: First pod has reached the final length

  • BBCH 80: Beginning of ripening

SPAD values were collected at two stages, BBCH 50 and BBCH 70, using a chlorophyll content meter (Model CL-01, Hansatech Instruments Ltd, United Kingdom). SPAD values were recorded for 10 plants randomly distributed within each plot (excluding plants in the outer rows), and the values were averaged to represent each plot. Manual plant height was measured from the ground surface to the end of the stem on 10 randomly selected plants from the middle of each plot. These measurements were taken six times during the season before lodging, and again before harvest. Lodging was assessed at the plot level from the leaning of the plants and reported as a percentage. Each plot was harvested separately using a harvester, and faba bean yield was reported at 15% moisture content.

UAV image capturing and processing

RGB and multispectral aerial images were captured weekly from planting to harvest. RGB images were captured using the integrated camera of a DJI Phantom 4, and the multispectral images were acquired with a Phantom 4 multispectral camera. Flights were conducted under clear sky between 11:00 AM and 2:00 PM at 20 m above ground level, resulting in a ground sampling distance of approximately 1.3 cm/pixel. Mission planning was performed with the DJI GO 4 application (Everest Innovation Technology). The multispectral images were taken from a nadir view with 80% frontal and 75% side overlap in five channels: red, green, blue, red edge, and NIR. The RGB images were taken at a 15-degree angle from the nadir view with 85% frontal and 75% side overlap. RGB flights were made in two directions, perpendicular and parallel to the sowing rows, to obtain a denser point cloud. A total of sixteen missions were conducted throughout the season with both cameras.

Processing of the multispectral images, including geometric correction, image mosaicking, and radiometric calibration, was conducted in Pix4D software (Pix4D SA, Lausanne, Switzerland), and orthomosaics were generated for each band separately. Five ground control points, geo-located with real-time kinematic (RTK) survey precision, were used to georeference the orthomosaics, which allows orthomosaics obtained on different dates to be aligned. A calibrated reflectance panel (CRP) was used to calibrate the images for the light conditions on each flight day. To restrict the analysis of the multispectral bands to vegetation, soil pixels were excluded from the radiometrically calibrated images using a vegetation index, the Excess Green index (ExG), as defined by Eq. 1. This approach was adopted based on prior findings indicating that ExG is effective in assessing variations in canopy structure related to green crop biomass (Kim et al., 2018; Torres-Sánchez et al., 2014). After ExG was computed for each flight, a threshold value was applied to separate plant and soil pixels, thereby generating a mask. This mask was then used to extract only the plant values from all multispectral bands.

$$ \text{ExG} = 2 \times \frac{\text{green}}{\text{red} + \text{green} + \text{blue}} - \frac{\text{red}}{\text{red} + \text{green} + \text{blue}} - \frac{\text{blue}}{\text{red} + \text{green} + \text{blue}} $$
(1)

In Eq. 1, red, green, and blue represent the reflectance values of these bands in the calibrated images.
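The soil-masking step can be illustrated with a minimal NumPy sketch of Eq. 1 followed by thresholding. The function names, the tiny reflectance arrays, and the threshold of 0.1 are illustrative assumptions, not values or code from this study.

```python
import numpy as np

def excess_green(red, green, blue):
    """Excess Green index (Eq. 1) from per-pixel band reflectances."""
    red, green, blue = (np.asarray(b, dtype=float) for b in (red, green, blue))
    total = red + green + blue
    # Guard against division by zero where all three bands are zero (no-data pixels).
    safe_total = np.where(total > 0, total, np.nan)
    return (2 * green - red - blue) / safe_total

def vegetation_mask(exg, threshold):
    """True where ExG exceeds the threshold, i.e. pixels treated as plant."""
    return exg > threshold  # NaN compares as False, so no-data pixels are excluded

# Tiny 2 x 2 example: left column vegetation-like, right column soil-like.
red = np.array([[0.05, 0.30], [0.04, 0.28]])
green = np.array([[0.20, 0.28], [0.18, 0.27]])
blue = np.array([[0.03, 0.25], [0.04, 0.26]])
nir = np.array([[0.60, 0.35], [0.55, 0.33]])

exg = excess_green(red, green, blue)
mask = vegetation_mask(exg, threshold=0.1)  # threshold value is an assumption
nir_plants = np.where(mask, nir, np.nan)    # soil pixels removed from the band
mean_nir = np.nanmean(nir_plants)           # mean over plant pixels only
```

The same mask can be applied to every band of a flight, so that plot-level means are computed from plant pixels only.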

To extract spectral values for each plot in the field, a shapefile containing all plots was created in QGIS software (QGIS 3.4, Open Source Geospatial Foundation Project), and the mean spectral values were extracted for each plot using zonal statistics in QGIS. To avoid potential plot border effects, the outer edges of each plot were excluded from the calculations. Because this study focused on whole field trial plots with well-structured canopies, the impact of mixed pixels was limited and was considered of minor importance in the analysis.
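The study performed this extraction with zonal statistics in QGIS. The idea behind a zonal mean can be sketched in plain NumPy, assuming the plot polygons have already been rasterized into an integer label array (a hypothetical `plot_ids` raster in which 0 marks alleys and borders).

```python
import numpy as np

def zonal_mean(values, plot_ids):
    """Mean of `values` per plot ID, ignoring NaN (masked soil) pixels.

    `plot_ids` holds an integer plot label per pixel; 0 means outside any plot.
    """
    values = np.asarray(values, dtype=float)
    plot_ids = np.asarray(plot_ids)
    means = {}
    for pid in np.unique(plot_ids):
        if pid == 0:
            continue  # skip alleys and trimmed plot edges
        pixels = values[plot_ids == pid]
        pixels = pixels[~np.isnan(pixels)]
        means[int(pid)] = float(pixels.mean()) if pixels.size else float("nan")
    return means

# Two plots of 2 x 2 pixels separated by an unlabelled alley column.
plot_ids = np.array([[1, 1, 0, 2, 2],
                     [1, 1, 0, 2, 2]])
ndvi = np.array([[0.8, 0.6, 0.1, 0.4, np.nan],
                 [0.7, 0.9, 0.1, 0.5, 0.6]])
plot_means = zonal_mean(ndvi, plot_ids)
```

Shrinking each labelled region inward before averaging would reproduce the removal of plot edges described above.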

RGB images were used to create georeferenced 3D models, namely the digital terrain model (DTM) and the digital surface model (DSM), in Pix4D. The DTM represented the topography of the field when no plants were present on the ground. The DSM, on the other hand, encompassed both the topography and the field features, including the plants, and so represented the field once the crop had grown (Kim et al., 2018). The DTM was obtained from the initial flight of the DJI Phantom 4, conducted within 7 days after sowing when the plants had not yet emerged. Subsequently, DSMs were acquired for each flight of the DJI Phantom 4 on different dates. To determine plant height, which accounts only for the crop canopy, the DTM was subtracted from the DSM using Eq. 2.

$$ \text{Plant height} = \text{DSM} - \text{DTM} $$
(2)

Because the analysis used the mean UAV-based plant height of each plot, soil pixels could bias the plot-level mean, particularly at the beginning and end of the season. Therefore, soil values were excluded from all DSMs. To achieve this, a mask was generated for each flight to isolate plant height values exceeding 0.00 m (the value of the soil). This mask was subsequently used to extract the plant height information for that flight. A total of 2400 mean plot plant height values were extracted across sixteen time points.
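Eq. 2 and the soil mask can be combined in a few lines. This is a minimal sketch that assumes the DSM and DTM are already co-registered arrays clipped to one plot; the elevation values are made up for illustration.

```python
import numpy as np

def plot_plant_height(dsm, dtm, soil_height=0.0):
    """Mean canopy height (m) for one plot: DSM - DTM with soil pixels excluded."""
    height = np.asarray(dsm, dtype=float) - np.asarray(dtm, dtype=float)
    canopy = height[height > soil_height]  # keep only pixels above the soil value
    return float(canopy.mean()) if canopy.size else float("nan")

# Elevations in metres; one of the four pixels is bare soil (DSM equals DTM).
dtm = np.array([[100.00, 100.02], [100.01, 100.03]])
dsm = np.array([[100.80, 100.02], [100.91, 101.03]])
mean_height = plot_plant_height(dsm, dtm)
```

Without the mask, the bare-soil pixel would pull the plot mean down from about 0.90 m to about 0.68 m, which is the bias the masking step is meant to avoid.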

Data preprocessing and modeling

Data preprocessing techniques were applied to the extracted data to make them clean, noise-free, and consistent. Before modeling, all data were standardized so that differences in the order of magnitude of the variables would not influence the models or the exploratory statistical analyses of yield, SPAD value, and plant height. The strength of the linear relationship between UAV and manual plant height measurements was expressed as Pearson's correlation coefficient. The common approach in HTP is to use vegetation indices, rather than the spectral bands directly, for crop monitoring and for the estimation of yield and phenotypic traits (Qiao et al., 2022). Therefore, in this study, linear and non-linear combinations of the reflectance of the blue, green, red, red-edge, and NIR wavebands were used to calculate eight indices that have been widely used for yield and SPAD value prediction (Guo et al., 2022; Narmilan et al., 2022; Yang et al., 2022). Each index value was obtained by substituting the mean reflectance of each band of the plots into the corresponding formula in Table 1.
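As an illustration of how band means become index values, the sketch below computes four of the normalized-difference indices named in this study (NDVI, GNDVI, NDRE, NGRDI) and then standardizes one of them. The formulas are the commonly used definitions and are assumed here, since the formulas of Table 1 are not reproduced in this text; the reflectance values are made up.

```python
import numpy as np

def normalized_difference(a, b):
    """(a - b) / (a + b), the form shared by NDVI-type indices."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (a - b) / (a + b)

def spectral_indices(bands):
    """Four normalized-difference indices from per-plot mean band reflectance."""
    return {
        "NDVI": normalized_difference(bands["nir"], bands["red"]),
        "GNDVI": normalized_difference(bands["nir"], bands["green"]),
        "NDRE": normalized_difference(bands["nir"], bands["red_edge"]),
        "NGRDI": normalized_difference(bands["green"], bands["red"]),
    }

def standardize(x):
    """Z-score: zero mean and unit standard deviation across plots."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Mean reflectance of three hypothetical plots.
bands = {
    "red": np.array([0.05, 0.08, 0.04]),
    "green": np.array([0.10, 0.12, 0.09]),
    "red_edge": np.array([0.30, 0.28, 0.32]),
    "nir": np.array([0.50, 0.40, 0.60]),
}
indices = spectral_indices(bands)
ndvi_z = standardize(indices["NDVI"])
```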

Table 1 Selected spectral indices evaluated in this study

SVR and RF models with optimized hyperparameters were applied to predict yield and SPAD value (Ganeva et al., 2022). SVR has various hyperparameters, two of the most significant being C and the choice of kernel. The parameter C controls how strictly the model fits the training data: larger C-values reduce the tolerance for errors, while smaller C-values allow for a less strict model (Evgeniou & Pontil, 2001). The kernel, on the other hand, determines how the samples are mapped into the feature space in which the model is fitted. By utilizing a kernel, it becomes possible to transform data that is not linearly separable into linearly separable data (Shafiee et al., 2021). The most crucial hyperparameters in RF are n_estimators, max_depth, and max_features. The n_estimators parameter controls the number of trees in the forest, max_depth sets the maximum depth to which each tree can grow, and max_features determines the number of features considered when searching for the best split. The available options for max_features are "auto", "sqrt", and "log2" (Bian et al., 2022; Narmilan et al., 2022). In this study, the hyperparameters of these models were tuned to improve model accuracy using GridSearchCV, and the Random Forest feature importances were visualized, both with the scikit-learn library in Python 3.7.
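A grid search over the hyperparameters named above might look like the following sketch. The candidate values, the fivefold split, and the synthetic data are assumptions for illustration and are not the grids used in this study; the "auto" option for max_features is left out because newer scikit-learn releases no longer accept it.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))  # synthetic stand-in for standardized indices
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=60)

# SVR: search over the regularization strength C and the kernel.
svr_search = GridSearchCV(
    SVR(),
    param_grid={"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
    cv=5,
    scoring="r2",
)
svr_search.fit(X, y)

# RF: search over forest size, tree depth, and features tried per split.
rf_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={
        "n_estimators": [20, 50],
        "max_depth": [5, 10],
        "max_features": ["sqrt", "log2"],
    },
    cv=5,
    scoring="r2",
)
rf_search.fit(X, y)

best_svr_params = svr_search.best_params_
best_rf_params = rf_search.best_params_
# Impurity-based importances of the refitted best forest (they sum to 1).
rf_importances = rf_search.best_estimator_.feature_importances_
```

The `feature_importances_` array is what a feature importance plot would display, one bar per input variable.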

Two feature selection methods were employed. The Pearson correlation coefficient (PCC) method assesses a subset of features using a proxy measure and serves as a filter method that evaluates features indirectly, without fitting the regression model (Jebli et al., 2021). The sequential forward selection (SFS) method, on the other hand, starts with a single feature: each candidate is used to model the data with the given model, and the feature that yields the highest accuracy is selected. Features are then added one at a time in the same way until a number of features predetermined by the user is reached (Li et al., 2022; Shafiee et al., 2021).
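The two selection strategies can be contrasted in a short sketch. The PCC filter here ranks features by their absolute correlation with the target, which is one common reading of the method and an assumption about the exact procedure; SFS uses scikit-learn's `SequentialFeatureSelector`. Feature names and data are synthetic.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVR

def pcc_select(X, y, names, k):
    """Keep the k features with the largest absolute Pearson correlation to y."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    top = np.argsort(-np.abs(r))[:k]
    return [names[j] for j in sorted(top)]

rng = np.random.default_rng(1)
names = ["f0", "f1", "f2", "f3"]  # hypothetical feature names
X = rng.normal(size=(120, 4))
# Only f0 and f2 carry signal; f1 and f3 are pure noise.
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.1, size=120)

# Filter method: no model is fitted, only correlations are ranked.
pcc_features = pcc_select(X, y, names, k=2)

# Wrapper method: greedily add the feature that most improves cross-validated R2.
sfs = SequentialFeatureSelector(
    SVR(kernel="linear"), n_features_to_select=2, direction="forward", cv=5
)
sfs.fit(X, y)
sfs_features = [n for n, keep in zip(names, sfs.get_support()) if keep]
```

On data this clean both methods pick the same two features; they can diverge when predictors are strongly correlated with each other, because the filter scores each feature in isolation while SFS scores it given the features already chosen.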

Yield prediction was done by three groups of models:

  1. Mono-temporal models (indices models): Using spectral indices at separate development stages

    • BBCH 50

    • BBCH 60

    • BBCH 70

    • BBCH 80

  2. Multi-temporal models (stages models): Using various combinations of spectral indices of different development stages

    • BBCH 50 + 60

    • BBCH 60 + 70

    • BBCH 50 + 60 + 70

    • BBCH 50 + 60 + 70 + 80

  3. Mono-temporal models (bands models): Using five multispectral bands (red, green, blue, red edge, and NIR) and RGB plant height at separate development stages

    • BBCH 50

    • BBCH 60

    • BBCH 70

    • BBCH 80

In this part, 70% of the samples were randomly selected as the training data set and the remaining 30% were used as the test (validation) data set. The models were trained with tenfold cross-validation to avoid overfitting. The coefficient of determination (R2) and root mean square error (RMSE) were used to assess the performance of the models. R2 is a statistical measure that indicates how closely the model's predictions resemble the actual data points. Its value typically ranges between 0 and 1, with higher values indicating a better fit of the model to the data; on held-out data it can fall below 0 when the model predicts worse than the mean of the observations. RMSE, on the other hand, represents the sample standard deviation of the differences between the predicted values and the measured values. A lower RMSE signifies a more accurate prediction (Agüera-Vega et al., 2017).
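The evaluation protocol described above (70/30 split, tenfold cross-validation on the training set, R2 and RMSE on the test set) can be sketched as follows. The data are synthetic, and the linear SVR with C = 1 is a placeholder rather than one of the tuned models of this study.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.normal(size=(152, 3))  # 152 rows mirrors the number of plots; values are synthetic
y = X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.2, size=152)

# 70% training, 30% test (held out until the final evaluation).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = SVR(kernel="linear", C=1.0)

# Tenfold cross-validation on the training set only.
cv_r2 = cross_val_score(model, X_train, y_train, cv=10, scoring="r2")

# Refit on the full training set and score once on the test set.
model.fit(X_train, y_train)
pred = model.predict(X_test)
r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
```

Reporting both the cross-validated and the test scores makes it easier to spot overfitting: a large gap between them suggests the model does not generalize.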

Results

Plant height

According to the ANOVA and subsequent Tukey test results (Online Appendix-Table 3), UAV-based plant height differed significantly (p < 0.05) between time points (DAS; days after sowing). The UAV-based plant height trend for each cultivar is shown in Fig. 3. Generally, faba bean plant height continued to increase until DAS 96, when a sharp reduction in plant height appeared because of lodging, approximately coinciding with BBCH stage 70–80. The plant height increased again until DAS 101, and after this stage UAV-based plant height estimates started to decrease. The 38 cultivars showed different degrees of lodging, and the UAV-based estimates captured these differences well. Cultivar Allexia had the maximum lodging percentage (80%), and Louhi, Sampo, Vire, Merlin, Mistral, and GLA2001 showed the lowest lodging scores (1–2%). The UAV-based plant height estimates were lower than the manually measured values. The correlation between manual and UAV-based plant height during the season (before lodging) increased from 0.35 at DAS 47 to 0.93 at DAS 79 and DAS 87, and the pooled measurements exhibited a strong overall correlation of 0.97 with an RMSE of 6 cm (Fig. 4a). Pearson correlation coefficients were calculated to examine the relationship between UAV-based plant height at different development stages (BBCH 50, 60, 70, and 80) and manual plant height at harvest time, as well as yield. Among the development stages, manual plant height exhibited the maximum correlation (0.86) with UAV-based plant height at BBCH 70. Notably, the correlation coefficients at the earlier stages, BBCH 50 and BBCH 60, were 0.57 and 0.73, respectively (Fig. 4b).

Fig. 3

UAV-based plant height trends for 38 cultivars throughout the growing season (DAS: Days After Sowing)

Fig. 4

Pearson correlation between UAV-based and manual plant height measurements at six time points (DAS 47–87) (a), and correlation coefficient map between UAV plant height at four BBCH stages and final manual height (harvest time) and yield (b)

The correlation between the UAV-based and manually measured plant height decreased at BBCH 80 because lodging and the bending of plants during maturation reduced the UAV-derived plant height, whereas manual measurements of stem length were not affected by these factors. The photos in Fig. 5 illustrate the impact of physiological processes on faba bean plants, showing significant bending at the time of harvest. The highest correlation between UAV-based plant height and yield was observed at BBCH 60, with a value of 0.54, as depicted in Fig. 4b. Two machine learning models, SVR and RF, were employed to predict the yield of faba bean. The predictions used UAV plant height data at the different BBCH stages (mono-temporal) as well as the combination of all flights throughout the season (multi-temporal). Although each BBCH stage was represented by data from only a single flight, this provided a good comparison between mono-temporal and multi-temporal models. The results showed that SVR at stage 70 had the highest R2 (0.46) and lowest RMSE (0.63 tons/ha) for the test data set. RF, on the other hand, performed best at stage 50, with an R2 of 0.21 and an RMSE of 0.77 tons/ha (Fig. 6). Pooling the UAV height measurements of all time points improved model accuracy, with SVR achieving an R2 of 0.63 and an RMSE of 0.52 tons/ha, and RF yielding an R2 of 0.59 and an RMSE of 0.55 tons/ha. Detailed information on the coefficient of determination and RMSE values of the train and test datasets for the RF and SVR models is presented in Online Appendix-Table 4.

Fig. 5

Faba bean plants before lodging (a) and at harvest time (b)

Fig. 6

Coefficient of determination and RMSE values of the RF and SVR models for predicting faba bean yield using UAV-based plant height measurements, at four BBCH stages and all flights throughout the growing season. The mean value of each group is added to each bar

Chlorophyll content (indicated by SPAD value) and indices trends

SPAD values were recorded at two stages, BBCH 50 and BBCH 70, for all 152 plots. The SPAD value range at BBCH 50 was 5.87 to 15.21, while at BBCH 70, it was 8.53 to 29.8. It was observed that all cultivars had significantly higher SPAD values at the second stage (Fig. 7a). The Pearson correlation coefficients between spectral indices and SPAD values at the two stages are illustrated in Fig. 7b. At BBCH 50, SPAD values exhibited the highest correlation with the MTCI and NDRE indices, with R2 values of 0.49 and 0.47, respectively. On the other hand, at BBCH 70, the highest correlation was observed with EVI (R2: 0.53) and NDVI (R2: 0.52).

Fig. 7

SPAD values of 38 cultivars at development stages BBCH 50 and 70 (a), and correlation coefficients between SPAD values and each of the eight spectral indices at development stages BBCH 50 and 70 (b)

For predicting SPAD values using RF and SVR models, grid search was used to fine-tune the hyperparameters of both models. The selected kernel for SVR at both stages was the Radial Basis Function (RBF), with a C value of 1 at BBCH 50 and 10 at BBCH 70. In RF, max_features at both stages was 'sqrt', the number of estimators (n_estimators) at both stages was 100, and max_depth was 50 at BBCH 50 and 70 at BBCH 70. SFS and PCC were used for selecting features in both models. Feature importance plots at both BBCH stages for the RF model are presented in Fig. 8. At both stages, EVI showed the maximum importance compared to the other indices (0.16–0.2). The results of predicting SPAD values at both BBCH stages using SVR and RF models, along with feature selection using SFS and PCC, are presented in Fig. 9. PCC gave better predictions for the test set at BBCH 50 and BBCH 70 with the SVR model, as well as at BBCH 70 with the RF model (R2: 0.16–0.32, RMSE: 1.19–3.07). However, for the test set at BBCH 50 with the RF model, SFS yielded superior predictions (R2: 0.38, RMSE: 1.14). Comparing the development stages, both RF and SVR exhibited higher R2 values and lower RMSE values for predicting SPAD values at BBCH 50 (Fig. 9). Overall, both RF and SVR models showed limited capability in accurately predicting SPAD values for faba bean at the different growth stages. In terms of feature selection, EXG and EVI were common features selected by the models at BBCH 50, while EVI, TGI, NDRE, and NGRDI were common features at BBCH 70. The coefficient of determination (R2) and RMSE values for both the training and test datasets, as well as the selected features of the RF and SVR models, are presented in Online Appendix-Table 5.

Fig. 8

Feature importance plots provided for eight spectral indices by sequential forward selection (SFS) in Random Forest for predicting faba bean SPAD value at development stages BBCH 50 and 70

Fig. 9

Coefficient of determination and RMSE values for RF and SVR models in predicting faba bean SPAD values using spectral indices and feature selection methods (PCC and SFS) at development stages BBCH 50 and 70. The mean value of each group is added to each bar

The ANOVA results indicated that all indices differed significantly between the four development stages, except that NDVI, MTCI, NDRE, and NGRDI did not differ significantly between BBCH 60 and 70. These four indices had their highest values at these two stages, reaching 0.85, 0.73, 0.24, and 0.28, respectively (Online Appendix-Table 6). EVI and GNDVI, on the other hand, reached their significantly highest values, 0.88 and 0.75 respectively, at BBCH 70. TGI had its highest value at BBCH 50 (0.041), while EXG reached its peak at BBCH 60 (0.45) (Fig. 10).

Fig. 10

Boxplots of spectral indices at four development stages BBCH 50, 60, 70, and 80

Yield

The average yield of faba bean over all plots in this study was 6.6 tons/ha with a standard deviation of 1.2 tons/ha. The yield distribution was approximately normal, with a slight skew towards the left (Fig. 11). This skewness can be related to the presence of some early-maturing cultivars with significantly lower yields, as well as some late-maturing cultivars that were highly affected by chocolate spot fungal disease, resulting in lower yield. We explored faba bean yield prediction using different combinations of multispectral indices, bands, and UAV-based plant height measurements at four development stages (BBCH 50, 60, 70, and 80). The Pearson correlation coefficients between yield and the indices at these stages revealed clear patterns. For NDVI, NDRE, MTCI, GNDVI, and EVI, the correlation increased from BBCH 50 to 60 and then decreased towards the end of the season (Fig. 12). At BBCH 60 these indices exhibited their highest correlation values of 0.81, 0.81, 0.79, 0.83, and 0.77, respectively. TGI, NGRDI, and ExG, on the other hand, showed their highest correlation coefficients at BBCH 80, with values of 0.76, 0.77, and 0.78, respectively. Overall, the highest correlation with yield (0.83) was obtained for GNDVI at BBCH 60. To predict yield at each development stage using SVR and RF models, grid search was employed to tune the models' hyperparameters. For SVR, the selected kernel was linear at stages 50 and 80, while it was RBF at stages 60 and 70. The selected max_features of the RF model was sqrt at all stages, meaning that the square root of the total number of features was used as the maximum number of features considered at each split.

Fig. 11

Distribution of faba bean yield harvested from 152 plots used in the present study

Fig. 12

Correlation coefficients between yield and each of the eight spectral indices at four development stages BBCH 50, 60, 70, and 80

The yield prediction using indices at different development stages (indices models) showed that the R2 increased and RMSE decreased from stages 50 to 60, followed by a decrease in R2 towards the end of the season. At BBCH 50, SVR showed higher accuracy when using SFS as the feature selection method (R2: 0.62, RMSE: 0.64 tons/ha) while in RF and also at other development stages, PCC yielded better predictions (Fig. 13a). In all stages, RF showed higher R2 and lower RMSE in the train data. However, the performance varied in the test data across different development stages, showing diverse trends. For example, at BBCH 60, SVR with PCC had the best prediction (R2: 0.83, RMSE: 0.53 tons/ha), while at BBCH 70, RF with PCC was the best (R2: 0.76, RMSE: 0.62 tons/ha). The best prediction using spectral indices was achieved at BBCH 60 with SVR and PCC as feature selector (R2: 0.83, RMSE: 0.53 tons/ha). The common indices selected by PCC in both SVR and RF models were NDRE, GNDVI, and EVI, while GNDVI was the common index selected by SFS. The coefficient of determination and RMSE of the train and test datasets, as well as selected features of RF and SVR models, are presented in Online Appendix-Table 7.

Fig. 13

Coefficient of determination and RMSE values of RF and SVR models combined with two feature selection methods (PCC and SFS) for faba bean yield prediction using: spectral indices at each development stage (a), the combination of all multispectral bands and UAV-based plant height at each development stage (b), and spectral indices in different combinations of development stages (c). The mean value of each group is added to each bar

In the second part of yield prediction, all multispectral bands were combined with the plant height estimated from the RGB camera at each development stage (bands models). These models followed a pattern similar to the indices models, with the best prediction occurring at BBCH 60 for both RF and SVR. The R2 increased from stage 50 to 60 and decreased thereafter, while the RMSE decreased from 50 to 60 and then increased. The best prediction was achieved at BBCH 60 using SVR (R2: 0.70, RMSE: 0.47 tons/ha). Compared with the best indices model, the bands models did not improve the R2 for yield prediction, which was 15% lower, but they reduced the RMSE by 11% (Fig. 13b). There were no significant differences between the prediction accuracy of SFS and PCC, except at BBCH 50, where SFS performed better with RF (R2: 0.41, RMSE: 0.66 tons/ha). Green, red, and red edge were the bands commonly selected by all models (Online Appendix-Table 8). In the third part, indices from different combinations of development stages, namely 50 + 60, 60 + 70, 50 + 60 + 70, and 50 + 60 + 70 + 80, were tested (stages models). The best prediction was achieved with RF and PCC at 50 + 60 (R2: 0.61, RMSE: 0.53 tons/ha), followed by SVR and PCC (R2: 0.57, RMSE: 0.55 tons/ha) (Fig. 13c). The 60 + 70 combination ranked second, predicting yield with an R2 of 0.49 to 0.55 and an RMSE of 0.57 to 0.6 tons/ha. GNDVI was the index commonly selected by all models (Online Appendix-Table 9).
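
The comparison between the best bands model and the best indices model can be retraced with a short calculation. It assumes that the percentages are relative changes with respect to the best indices model (SVR with PCC at BBCH 60) and uses the rounded values reported in the text, so the results may differ slightly from the figures the authors obtained from unrounded values.

```latex
\[
\frac{R^2_{\text{indices}} - R^2_{\text{bands}}}{R^2_{\text{indices}}}
  = \frac{0.83 - 0.70}{0.83} \approx 0.157,
\qquad
\frac{\mathrm{RMSE}_{\text{indices}} - \mathrm{RMSE}_{\text{bands}}}{\mathrm{RMSE}_{\text{indices}}}
  = \frac{0.53 - 0.47}{0.53} \approx 0.113
\]
```

Under these assumptions, the drop in R2 is on the order of the roughly 15% stated above, and the reduction in RMSE matches the stated 11%.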

Discussion

Plant height

The primary objective of this study was to develop a reliable and cost-effective platform for accurately estimating the plant height of faba bean throughout the growth period, in order to improve our understanding of growth patterns. The correlation between UAV-based and manual plant height measurements varied with plant development. The lowest correlation (0.35) was observed at the earliest time point, and the correlation improved as development progressed. The weak early correlation may be attributed to uneven seed germination and the resulting uneven plant height within plots during the earlier stages. Moreover, manual measurements are typically obtained for a subset of plants within a plot, whereas UAV measurements represent the average plant height at the plot level, encompassing a larger number of plants. Additionally, the highest correlation for a single date was 0.93, while pooling data from all dates resulted in a correlation of 0.97. This difference is likely due to the relatively low level of height variation observed on a single date. A similar trend has been reported for maize (Tirado et al., 2020) and faba bean (Ji et al., 2022), where the correlation between manual and UAV-based plant height measurements increased as the plants developed. This study showed that the average plant height of each plot extracted from the UAV imagery was lower than the manually measured value, which is consistent with previous studies (Ji et al., 2022; Yu et al., 2020). Nevertheless, the manual measurements can still be used to assess the quality of the UAV height measurements, provided it is accepted that the agreement between the two cannot be improved beyond the inherent errors of the manual measurements and the UAV errors accumulated at the plot level.
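
The effect of pooling dates on the correlation can be illustrated with a small simulation. The sketch below is not a reanalysis of the study data: the mean heights per date, the within-date spread, the measurement noise, and the number of plots are all assumed values chosen for illustration. It shows that when the same measurement noise is applied to a narrow range of heights (one date) and to a wide range (all dates pooled), the pooled correlation is higher.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)  # fixed seed so the sketch is reproducible

# All values below are assumptions for illustration only.
date_means_cm = [20.0, 60.0, 100.0, 140.0]  # mean canopy height per flight date
plots_per_date = 150
within_date_sd_cm = 5.0   # true height variation between plots on one date
noise_sd_cm = 4.0         # measurement error of each method

single_date_r = []
all_manual, all_uav = [], []
for mean_height in date_means_cm:
    true_heights = [random.gauss(mean_height, within_date_sd_cm)
                    for _ in range(plots_per_date)]
    # Both methods observe the same true height with independent errors.
    manual = [h + random.gauss(0.0, noise_sd_cm) for h in true_heights]
    uav = [h + random.gauss(0.0, noise_sd_cm) for h in true_heights]
    single_date_r.append(pearson_r(manual, uav))
    all_manual.extend(manual)
    all_uav.extend(uav)

pooled_r = pearson_r(all_manual, all_uav)
print("single-date r:", [round(r, 2) for r in single_date_r])
print("pooled r:", round(pooled_r, 2))
```

Because the between-date differences in height are much larger than the measurement noise, the pooled coefficient is dominated by the seasonal growth signal, which is one plausible reading of the 0.93 versus 0.97 values reported above.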

This study developed a platform for assessing the growth of different cultivars throughout the season, and the cultivars differed significantly in UAV-based plant height. Two distinct reductions were observed in the cultivars' height profiles. The first occurred at DAS 96 and coincided with plant lodging. Lodging, in which plants lose their ability to remain upright, is influenced by factors such as cultivar type, pests, diseases, and field management practices (Alharbi & Adhikari, 2020; De Ron, 2015; Mínguez & Rubiales, 2021). Cultivars with elongated plants are susceptible to lodging, which complicates field management and increases field maintenance efforts (Duc et al., 2015). Therefore, accurate and comprehensive information about lodging, including its location and extent, is crucial for managing faba bean fields effectively. The significant differences between cultivars in manual lodging scores were also evident in the UAV-based height profiles, highlighting the potential of UAVs for tracking lodging in faba bean fields. The second reduction in UAV-based plant height was observed at DAS 107 (BBCH 80), which corresponded to the start of pod ripening and the weakening of stems during maturation (De Ron, 2015). These time points (DAS 96 and 107) were therefore identified as crucial stages in faba bean development that were effectively captured by the RGB camera.

Estimating final plant performance from earlier-stage information can be valuable for assessing end-of-season performance and for implementing management practices to enhance it. In this study, there was a strong correlation (R2 > 0.5) between UAV-based plant height and final plant height (measured manually) from DAS 55 (BBCH 50) onwards. The highest correlation was observed at DAS 87 (BBCH 70), before lodging occurred. Thus, the end of the flowering stage could be an optimal time for assessing cultivars' performance, contributing to improved selection efficiency. Plant height is also a reliable indicator of crop growth status and is closely linked to yield (Tilly et al., 2014). Previous studies have addressed the importance of plant height in yield estimation (Stanton et al., 2017; Yin et al., 2011), but most of them relied on plant height data from a single time point. In this study, faba bean yield was estimated using plant height data from both single and multiple time points. The results indicated that UAV-based plant height at a single time point, particularly at BBCH 70, could be used to estimate faba bean yield (R2: 0.46, RMSE: 0.63 tons/ha). However, estimating yield from multiple time points of plant height significantly improved the accuracy, resulting in a 17% increase in R2 and an 11% decrease in RMSE. When yield was predicted from UAV-based plant height, SVR gave better results than RF. These findings are consistent with other studies on faba bean (Ji et al., 2022) and maize (Guo et al., 2020), which have demonstrated the potential of SVR to enhance the accuracy of crop yield estimation, while RF is effective in handling large datasets and maintaining high accuracy (Hastie et al., 2009). However, RF is often prone to overfitting, which can lead to slightly inferior prediction outcomes (Bai et al., 2018).

Chlorophyll content (indicated by SPAD value) and indices trends

SPAD is widely used to measure leaf chlorophyll content (Xiong et al., 2015; Yang et al., 2012), and spectral indices combined with machine learning models have proved to be an efficient way to evaluate and predict SPAD values in several studies (Guo et al., 2022; Narmilan et al., 2022; Silva et al., 2022). In this study, chlorophyll content was assessed at BBCH 50 and 70, and significant differences in SPAD values and various spectral indices were observed between these two stages, with BBCH 70 exhibiting higher SPAD values and stronger correlations with the spectral indices. The results indicated that MTCI and NDRE at the earlier stage, and NDVI, EVI, and GNDVI at the later stage, could be used for indirect assessment of SPAD values, as their correlations with SPAD values approached 0.5. The SPAD meter passes light from two LED sources through the leaf in sequence, and the transmitted light in the red and infrared regions is measured by two silicon photodiode detectors (Zhang et al., 2022). The correlation between the selected indices and SPAD values at the different development stages can be attributed to the SPAD device measuring these two spectral regions. However, the spectral indices varied significantly across stages, indicating that the underlying mechanisms may differ between development stages (Guo et al., 2022). The prediction of SPAD values using RF and SVR showed that the best prediction was achieved at BBCH 50 with RF and SFS as the feature selection method (R2: 0.38, RMSE: 1.14), followed by SVR with PCC (R2: 0.32, RMSE: 1.19), suggesting that the early development stage provides more accurate data for SPAD value prediction. Moreover, EVI was consistently selected by all SPAD prediction models, as it can alleviate the early saturation of some other vegetation indices (Ma et al., 2018).
However, measuring the SPAD value in faba bean can be challenging due to its thick leaf vein and SPAD sensitivity to the light condition in the field which might affect the correlation value and modeling results. The accurate measurement of chlorophyll content using SPAD relies on the light transmittance through the leaf, which can be influenced by various environmental factors, including light intensity (Narmilan et al., 2022). Other studies have similarly reported that the assessment of chlorophyll by SPAD may produce inaccurate readings due to leaf structure, leaf pigment distribution, and water content (Liu et al., 2019). Moreover, the representativeness of sampled SPAD values for each plot may be limited, and future analyses should consider incorporating more readings to minimize uncertainties.

The result of this study revealed a significant difference in the values of eight selected indices throughout faba bean development stages. This finding supports previous studies that have highlighted the usefulness of aerial images for monitoring crops over their development stages (Bian et al., 2022; Ganeva et al., 2022; Silva et al., 2022; Tirado et al., 2020; Yang et al., 2022). However, it should be considered that faba bean development stages differ somewhat from other crops. This can be attributed to the progressive vegetative growth of faba bean, which leads to an overlap in vegetative and reproductive growth, resulting in three distinct growth types: indeterminate, semi-determinate, and determinate (Etemadi et al., 2018). In indeterminate and semi-determinate faba bean types, vegetative growth continues if conditions are suitable after the initiation of the flower, resulting in ongoing competition for assimilation between vegetative and reproductive growth (Mínguez & Rubiales, 2021). However, as the upper sections of the plants continue to produce leaves, the lower canopy experiences leaf aging and senescence (Alharbi & Adhikari, 2020). In this study, flowering started at BBCH 50, but vegetative growth continued till BBCH 80 which marked the beginning of pod ripening. Among different growth stages, BBCH 70 had commonly the maximum value of NDVI, EVI, ExG, and GNDVI indices. Plants in good physiological condition have high values of these indices, and they display also high reflectance values in the NIR band (Ganeva et al., 2022) which was observed also in our study (Online Appendix-Fig. 14). Therefore, it could be concluded that faba bean plants were in the best physiological condition at BBCH 70.

Yield

Timely and non-destructive assessment of crop yield is an essential part of plant phenotyping and precision agriculture. In this study, RGB and multispectral images, two machine learning models, two feature selection methods, and three groups of features at four development stages were used to predict faba bean yield. The models using PCC as the feature selection method achieved higher accuracy than those employing SFS. Feature selection eliminates redundant variables, reducing the model variance and yielding a more parsimonious model (Shafiee et al., 2021). PCC was chosen as the preferred method due to the assumption of feature independence made by SFS. In real-world datasets, features commonly exhibit varying degrees of correlation with each other. PCC measures the strength and direction of these correlations, providing valuable insight into the interrelationships among features, and it takes into account the multivariate relationships between features and the target variable (Géron, 2022; Guyon & Elisseeff, 2003). Across the three groups of models focusing on spectral indices, spectral bands, and plant height, the RF model showed the highest accuracy on the training data set. However, its performance on the test data set was not satisfactory in certain models. This discrepancy could be attributed to overfitting, where the RF model becomes too specialized to the training set, leading to reduced performance on unseen data (Bai et al., 2022; Xiang-yu et al., 2020). The SVR model demonstrated high accuracy on the test data set due to its robustness and its suitability for regression with small samples (Elbeltagi et al., 2021; Shafaei & Kisi, 2017; Suykens et al., 2002).
SVR is a sample learning method and can avoid the traditional process from induction to deduction, realizing the efficient “transduction reasoning” from training samples to prediction samples, simplifying the usual problems such as classification and regression (Guo et al., 2022; Schuldt et al., 2004; Wang & Hu, 2005; Yoosefzadeh-Najafabadi et al., 2021).
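
A correlation-based filter of the kind referred to as PCC above can be sketched as follows. This is one common form of the technique, in which features are ranked by the absolute value of their Pearson correlation with the target and the top k are kept. Whether the study used a fixed k or a threshold is not stated in this section, so k, the function names, and the feature values are assumptions for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_by_pcc(features, target, k):
    """Rank features by |r| with the target and keep the k highest."""
    scores = {name: abs(pearson_r(values, target))
              for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical plot-level values (NOT data from this study).
yield_t_ha = [5.0, 6.0, 7.0, 8.0]
features = {
    "GNDVI": [0.60, 0.65, 0.70, 0.75],
    "NDRE": [0.30, 0.36, 0.39, 0.46],
    "TGI": [4.0, 2.0, 5.0, 3.0],
}

selected, scores = select_by_pcc(features, yield_t_ha, k=2)
print("selected:", selected)
```

Each feature is scored on its own here, so two strongly inter-correlated indices can both be retained. A wrapper method such as SFS instead evaluates candidate subsets with the model itself, which is slower but accounts for redundancy between features.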

Faba bean yield prediction using both RF and SVR was best at BBCH 60, both with indices and with the combination of bands and plant height. Many studies have found that the physiological characteristics of crops vary with growth stage (Bendig et al., 2014; Bian et al., 2022; Lai et al., 2018). The index trends in this study showed that the eight spectral indices varied significantly over the four stages, which might lead to significantly different accuracies of yield prediction (Bian et al., 2022), and the highest correlation between spectral indices and yield was found at BBCH 60. According to the faba bean BBCH scale (Weber & Bleiholder, 1990), the first flowers open at BBCH 60, and it has also been reported for other crops such as maize (García-Martínez et al., 2020; L. Li et al., 2021a, 2021b; Yang et al., 2022), soybean (Bai et al., 2022), and rice (L. Li et al., 2021a, 2021b; Zhou et al., 2017) that the flowering stage can provide more reliable yield prediction from spectral indices. Moreover, the intensive growth period of faba bean occurs during the flowering stage, when the crop reaches its highest leaf area index and CO2 fixation rate (Etemadi et al., 2018). During the later stages of faba bean growth, particularly at BBCH 80, assimilates are gradually transferred to the pods. This results in a decrease in chlorophyll content, which weakens the relationship between canopy spectral reflectance and grain yield. Furthermore, pod development in faba bean occurs primarily along the main stem, and the upper leaves may obscure the traits captured by UAV-based remote sensing, which records only the top view. Consequently, the accuracy of yield prediction in faba bean at later stages may differ from that in crops such as wheat, where the reproductive organs are located at the top of the canopy (Li et al., 2021a, 2021b; Veverka et al., 2021). This study also constructed yield prediction models using the vegetation indices at multiple stages. 
When spectral indices of all four development stages were used as input variables, the accuracy of the models was lower compared to using a single stage. However, in some studies, the use of multi-temporal data demonstrated better prediction results in comparison to mono-temporal data (Shafiee et al., 2021; Wang et al., 2014; Zhou et al., 2017). In the current study, the inclusion of multi-temporal data led to issues such as multicollinearity among the input spectral indices (Online Appendix-Fig. 15) and a decrease in the correlation between certain spectral indices and yield. Consequently, these factors contributed to a reduction in prediction accuracy. This result suggests that the impact of multi-temporal data on prediction performance can vary depending on the specific dataset and the characteristics of the crop being studied.
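
The multicollinearity mentioned above can be screened for with a simple pairwise check. The sketch below flags pairs of input features whose absolute Pearson correlation exceeds a threshold. The threshold of 0.9, the feature names, and the values are assumptions for illustration and do not reproduce the analysis behind Online Appendix-Fig. 15.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(features.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical multi-temporal inputs (NOT data from this study): the same
# index measured at two stages tends to carry nearly the same information.
features = {
    "GNDVI_BBCH50": [0.60, 0.65, 0.70, 0.75, 0.72],
    "GNDVI_BBCH60": [0.66, 0.70, 0.76, 0.80, 0.78],
    "TGI_BBCH80": [4.0, 2.0, 5.0, 3.0, 3.5],
}

pairs = collinear_pairs(features)
for name_a, name_b, r in pairs:
    print(f"{name_a} and {name_b}: r = {r}")
```

When many such pairs are present, adding further stages contributes little new information while increasing the number of inputs, which is consistent with the lower accuracy of the multi-stage models described above.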

The results of this study revealed that the best spectral index for yield prediction was GNDVI. This finding was supported by the Random Forest feature importance plots, which consistently showed an increasing trend in the importance of GNDVI from BBCH 50 to BBCH 60, followed by a decline in importance during the later stages of the season because of gradual transfer of nutrients from canopy leaves and stems to the developing pods, coupled with a reduction in the chlorophyll content of the leaves (Online Appendix-Fig. 16). Some studies showed that the correlation between yield and spectral indices based on NIR bands diminishes in later development stages, as the reduced amount of green biomass leads to a decline in NIR reflectance (Bian et al., 2022; M. Naser et al., 2020a, 2020b; Veverka et al., 2021; Xue & Su, 2017). The NIR band is commonly present in indices such as NDRE and EVI, which were also selected by the majority of the models in this study. In this study, the variability in grain yield was observed among different cultivars and it became evident that GNDVI was more effective in capturing yield differences under these specific conditions, compared to NDVI as the selected index in some other studies (Shafiee et al., 2021, 2023; Silva et al., 2022). Other studies also reported the strength of NDRE and GNDVI in comparison with NDVI (Li et al., 2019; Ramos et al., 2020). However, the selection of a particular index to predict yield is dependent on various factors including the dataset, the growth stage of the plants, and the environmental conditions (Shafiee et al., 2021).
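
For reference, the NIR-based indices discussed above can be written as short functions. The formulas below are the definitions commonly given in the remote sensing literature, and the exact variants and coefficients used in this study are defined in its methods section, which may differ. The reflectance values are hypothetical and only illustrate how lower NIR reflectance in a senescing canopy lowers all four indices.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def gndvi(nir, green):
    """Green NDVI: the red band of NDVI is replaced by the green band."""
    return (nir - green) / (nir + green)


def ndre(nir, red_edge):
    """Normalized difference red edge index."""
    return (nir - red_edge) / (nir + red_edge)


def evi(nir, red, blue):
    """Enhanced vegetation index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def all_indices(bands):
    """Compute the four indices from a dict of band reflectances (0 to 1)."""
    return {
        "NDVI": ndvi(bands["nir"], bands["red"]),
        "GNDVI": gndvi(bands["nir"], bands["green"]),
        "NDRE": ndre(bands["nir"], bands["red_edge"]),
        "EVI": evi(bands["nir"], bands["red"], bands["blue"]),
    }


# Hypothetical canopy reflectances (NOT measurements from this study).
green_canopy = {"blue": 0.03, "green": 0.08, "red": 0.04,
                "red_edge": 0.25, "nir": 0.50}
senescing_canopy = {"blue": 0.05, "green": 0.10, "red": 0.10,
                    "red_edge": 0.20, "nir": 0.30}

for label, bands in [("green", green_canopy), ("senescing", senescing_canopy)]:
    values = {name: round(v, 2) for name, v in all_indices(bands).items()}
    print(label, values)
```

GNDVI and NDVI differ only in the visible band paired with NIR, so differences in their relationship with yield reflect how green and red reflectance respond to the canopy under the given conditions.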

Conclusion

This study demonstrated the potential of using UAVs equipped with RGB and multispectral cameras, along with machine learning algorithms, as an accurate and efficient approach for high throughput phenotyping in faba bean. The models developed in this study achieved good accuracy for estimating and predicting important agronomic traits like plant height, chlorophyll content, and yield. UAV-based plant height showed a very strong correlation with manual measurements and effectively captured growth patterns and lodging events. It was also found to be predictive of final plant height and yield when measured at key developmental stages like flowering. Spectral indices proved useful for yield and chlorophyll content prediction. GNDVI, EVI, and NDRE emerged as particularly relevant spectral indices. Overall, BBCH 60 (the first flower open) was found to be the optimal stage for yield prediction using spectral data. This presents an opportunity for implementing targeted management practices to maximize yield. With some customization, this methodology could also be applied to other grain legume crops that have similar growth habits as faba bean.

While promising results were achieved, there is scope for improving accuracy further by testing these methods across diverse environments. Moving forward, it is recommended to combine spectral features with other features (e.g., morphological, textural, and physiological) and use advanced machine learning models to enhance the accuracy of lodging detection using UAVs. Laboratory measurement of leaf chlorophyll content via stoichiometry and spectrophotometer is also advised for more precise monitoring and validation of multispectral indices-based predictions across faba bean developmental stages. Furthermore, this study specifically examined five spectral bands, but future research could explore the development of SPAD prediction models using hyperspectral imaging which captures a significantly larger number of narrow and contiguous spectral bands compared to multispectral cameras. Hyperspectral cameras can be applied to different growth stages of faba bean under varying environmental conditions.