Background

Iron deficiency chlorosis (IDC) is a major soil-borne stress in soybean [Glycine max (L.) Merr.] and causes significant yield reductions. In the United States, soybean IDC has been reported to result in yield losses totaling $260 million annually [1, 2]. Soybean IDC is caused by a lack of plant-available iron [Fe(II)] [3,4,5,6]. While iron is abundant in almost all soils, deficiencies arise from several soil chemical factors, and their interactions, that alter the solubility of iron in the soil [2, 7]. Soybean IDC is affected by soil pH, calcium carbonate content, moisture content, electrical conductivity, iron oxide concentration, and soluble salts [2, 8,9,10]. Deficiency symptoms include interveinal chlorosis and overall stunting of the plant [11, 12]. Soybean growers can mitigate the drastic yield penalties of IDC by growing tolerant soybean varieties, planting companion crops, reducing other forms of plant stress, and supplementing the soil with iron chelates [13, 14].

The preferred method to minimize yield losses caused by IDC is growing a tolerant variety, which is why soybean breeders maintain a continued interest in developing IDC-tolerant varieties [13, 15, 16]. To accomplish this breeding objective, thousands of soybean breeding lines need to be screened every year for IDC severity. Screening has traditionally been accomplished using a 1–5 visual severity scoring system, where a score of 1 is given to tolerant lines and a score of 5 to susceptible lines [6]. The visual rating system is labor intensive and is typically performed at only one time point in the growing season. In addition, intra-rater variability due to the subjectivity of the human eye can result in less accurate phenotypic measurements, and researchers are therefore investigating image-based methods for quantifying IDC severity [6, 17]. New automated rating systems hold potential for more objective and reliable phenotyping of IDC stress [17, 18]. To date, these phenotyping methods have been implemented from tripods [17] and push carts [19]; however, implementation using an unmanned aircraft system (UAS) has not been reported.

High throughput phenotyping (HTP) refers to the ability of researchers to collect detailed information during a plant's life cycle in a non-invasive way [20, 21]. This can be done in both controlled and field environments and with a wide array of platforms and sensors, including ground- and aerial-based systems [18, 20]. In recent years, there has been growing interest in the utilization of aerial HTP platforms, especially for germplasm assessment within breeding programs [22]. While many soybean traits are already being measured with these platforms, including plant maturity, canopy coverage, and yield estimation [23, 24], the main goal of this project was to use a UAS to improve plant assessments of soybean IDC. The objective of this research was to use images collected from a UAS to measure IDC severity and to determine the accuracy and precision of these predictions.

Methods

Plant material, location, and field design

A series of 36 trials consisting of breeding lines at different stages in the UMN breeding program, ranging from the advanced yield trial stage to the regional trial and commercial testing stage, was used in this study (Fig. 1). Breeding lines belonged to maturity groups ranging from 00 to II, and individual trials consisted of breeding lines of similar relative maturity. The number of entries in each trial ranged from 16 to 80. Each plot (experimental unit) was planted as a single row 91 cm in length, with 76.2 cm between rows. Plots were arranged in a randomized complete block design with two replications. All plots (3824 in total) were planted on June 1, 2017.

Fig. 1
figure 1

Soybean iron deficiency chlorosis testing field site location near Danvers, MN. The field used in this study is located in Six Grove Township, MN (45.274285, −95.718046) in Swift County. The yellow circles highlight the nine ground control points in the field for geometric calibration, and the red squares highlight the radiometric calibration panels. These panels were painted with four levels of grey for empirical line method calibration. Overlaid on the field orthomosaic is a vector file delineating the boundaries of 3386 soybean plots. Each plot boundary is colored based on the trial to which the plot belongs. A total of 36 trials were grown

The location for this study was a field site near Danvers, MN (45.274285, −95.718046) in Swift County. This field has a history of soybean IDC and has been rotated between corn and soybean for several years. Before planting, soil cores were taken, and the soil was confirmed to have a pH in the range of 7.5 to 8.2, a range known to induce iron deficiency in soybeans [2, 8].

A total of nine ground control points (40 cm × 20 cm cement paving blocks painted red) were placed in the field (Fig. 1). These were placed randomly throughout the field site in such a way as to cover the entire area of interest and remained in the field for the duration of the season. The ground control point coordinates were collected at one time point during the summer using a Trimble handheld GPS unit.

Reflectance calibration panels were also created for this project. A total of three calibration targets (2 ft × 2 ft matte boards) were made, with each target consisting of four levels of grey: 5%, 20%, 40%, and 55% reflectance, painted with 'black,' 'iron mountain,' 'flannel gray,' and 'silver bullet' (BEHR brand, Santa Ana, California), respectively. Each paint was mixed 50/50 by weight with barium sulfate to ensure a near-Lambertian surface. Three layers of paint were applied on top of one coat of primer. An ASD (Analytical Spectral Devices) HandHeld 2 VNIR spectroradiometer was used to measure the reflectance of the panels, with the built-in halogen bulb as the light source.

IDC ground-based phenotyping

Soybean IDC was rated on July 12 and August 1, 2017, herein referred to as date 1 and date 2, respectively. On date 1, soybeans ranged in growth stage from V3 to V6, and on date 2, from V6 to R2. Soybeans were rated on both dates with a 1–5 visual rating scale. On this scale, a rating of "1" indicates a plot that is 100% green (no yellowing); a score of "2" indicates slight yellowing, with some plants in the plot turning yellow; a score of "3" indicates moderate yellowing, with most plants in the plot turning yellow; a score of "4" indicates intense yellowing, where all plants are yellow and some are becoming stunted and necrotic; and a score of "5" indicates the most severe IDC symptoms, where the entire plot is damaged and dying or completely dead [6, 25]. Each plot was scored by an expert rater familiar with IDC stress symptoms. This ground-based phenotyping served as the reference data for training and validating models in this study. In addition, 252 plots were scored independently by two trained raters for the assessment of inter-rater variability. These plots were chosen because their placement in a part of the field with optimal IDC stress produced variation in severity suitable for detecting differences between varieties.

UAS platform, sensor, flight plan

In this study, a pipeline was created from image capture to image analysis. Table 1 highlights the major steps in this pipeline: (1) UAS image collection, (2) orthomosaic generation, and (3) image processing. In step 1, aerial data were collected on the same dates as ground-based data (July 12 and August 1) with a DJI Inspire 1 equipped with a modified dual camera system, the Sentera Double 4K agricultural sensor (Sentera Inc., Minneapolis, MN). One 12.3 megapixel (MP) camera captured red (650 nm center, 70 nm bandwidth), green (548 nm, 45 nm bandwidth), and blue (446 nm, 60 nm bandwidth) bands, and the second 12.3 MP camera captured red (650 nm, 70 nm bandwidth), green (548 nm, 45 nm bandwidth), and near infrared (NIR; 840 nm, 20 nm bandwidth) bands. Each camera has a 60° field of view. The UAS was flown using the autonomous flight mission planning of the Sentera AgVault software. All missions were conducted at an altitude of 60.96 m with a UAS speed of 5 m/s, and images were captured with 70% forward overlap and side overlap. The resulting images had a ground sampling distance of 1.6 cm. All flights were conducted within 2 h of solar noon to limit shadow effects.
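As a rough check on these flight parameters, the ground sampling distance can be approximated from the altitude and field of view. The sketch below assumes an image width of 4000 pixels for the 12.3 MP sensor; all other values come from the text, and the exact value depends on the true sensor geometry.

```python
import math

altitude_m = 60.96     # flight altitude from the text
fov_deg = 60.0         # per-camera field of view
image_width_px = 4000  # assumed width for a ~12.3 MP (4000 x 3072) sensor

# Ground swath covered by one image: 2 * h * tan(FOV / 2)
swath_m = 2 * altitude_m * math.tan(math.radians(fov_deg / 2))
gsd_cm = swath_m / image_width_px * 100

print(f"swath: {swath_m:.1f} m, GSD: {gsd_cm:.2f} cm/pixel")
# ~70.4 m swath and ~1.8 cm/pixel, close to the reported 1.6 cm
```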

Table 1 Pipeline for image capture and analysis for iron deficiency chlorosis assessment

Image data processing

After image collection, image orthomosaics were generated using Pix4D Desktop (Pix4D SA). The WGS 84 datum was used with a projected coordinate system of UTM zone 15N. Images captured by the Sentera dual camera system were uploaded into the same project and given the group names "RGB" and "NIR" for the two sets of images, respectively. For this study, only RGB data were used in image processing. The default processing options template, "Ag RGB," was used for generating geo-referenced orthomosaics. This template generates mosaics from overlapping nadir images, outputs a full-resolution GeoTIFF file, and merges tiles; it also offers faster processing speed and is compatible with RGB cameras.

Pix4D processing occurs in three major steps: initial processing; point cloud and mesh generation; and digital surface model, orthomosaic, and index generation. The nine ground control points were input into the Pix4D project directly following the initial processing step using the ray cloud editor.

In step 3, orthomosaics were processed and data were extracted from each plot. Orthomosaics were loaded into Erdas Imagine, and unsupervised classification using k-means clustering into five classes was performed on an index map of the normalized difference of the red and green bands, (R − G)/(R + G). After unsupervised classification, the five classes were manually grouped into a "plant" class and a "soil" class based on human observation of the five classes. The classes were recoded, and a mask was set based on the new classification.
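This step was performed in Erdas Imagine; the following is a minimal open-source sketch of the same idea, assuming an (H, W, 3) orthomosaic array and scikit-learn's k-means. The final grouping of clusters into plant and soil is shown with a hypothetical assignment, since in practice it was decided by human inspection of the five classes.

```python
import numpy as np
from sklearn.cluster import KMeans

def mask_soil(rgb):
    """Cluster an (R - G)/(R + G) index map into five classes and
    recode them into a boolean plant mask (True = plant pixel)."""
    r, g = rgb[..., 0].astype(float), rgb[..., 1].astype(float)
    index = (r - g) / (r + g + 1e-9)  # avoid division by zero on dark pixels

    labels = (KMeans(n_clusters=5, n_init=10, random_state=0)
              .fit_predict(index.reshape(-1, 1))
              .reshape(index.shape))

    # Hypothetical grouping: which clusters count as "plant" must be
    # decided by inspecting the five classes, as was done in the study.
    plant_clusters = [0, 1]
    return np.isin(labels, plant_clusters)
```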

Accuracy assessment of the plant and soil classification was conducted using Erdas Imagine software (Hexagon Geospatial, United States). The accuracy assessment toolkit was used to assign 1000 points across the field of interest. An equalized random sampling scheme was used to set 500 sampling points in soil-classified pixels and 500 in plant-classified pixels. Reference data were created by visually assigning each point as plant or soil based on human interpretation of the original, non-classified orthomosaic. Class values were hidden during reference data collection to ensure unbiased assignments. An error matrix was used to compute overall accuracy, producer accuracy, and user accuracy. Overall classification accuracy was computed as the sum of the major diagonal entries (correctly classified sample units) divided by the total number of sample units in the matrix. Producer accuracy was computed by dividing the number of correctly classified sample units in a category by the total number of reference sample units in that category. User accuracy was computed by dividing the number of correctly classified sample units in a category by the total number of sample units classified into that category.
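These three definitions reduce to a few lines of array arithmetic. Below is a small sketch, taking rows as the mapped (classified) classes and columns as the reference classes; the example cell counts are illustrative only, not the study's values.

```python
import numpy as np

def accuracy_metrics(error_matrix):
    """Overall, producer, and user accuracy from an error matrix with
    mapped classes in rows and reference classes in columns."""
    m = np.asarray(error_matrix, dtype=float)
    overall = np.trace(m) / m.sum()
    producer = np.diag(m) / m.sum(axis=0)  # per reference class (columns)
    user = np.diag(m) / m.sum(axis=1)      # per mapped class (rows)
    return overall, producer, user

# Illustrative 2 x 2 matrix for the equalized sample of 500 soil and
# 500 plant points (each row sums to 500).
overall, producer, user = accuracy_metrics([[470, 30],
                                            [20, 480]])
```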

The masked plant canopy was further classified into green, yellow, and brown plant pixels using an additional k-means clustering step. This was done to mimic how researchers typically rate IDC plots and was based on previous studies [17, 19]. These features were then extracted using QGIS software (QGIS Geographic Information System, Open Source Geospatial Foundation Project, http://qgis.osgeo.org). In QGIS, a polygon shapefile was created in which field plot polygons identified each of the 3824 plots. The zonal statistics plugin was used to extract the proportion of green, yellow, and brown pixels in each plot.
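The per-plot extraction was done with the QGIS zonal statistics plugin; an equivalent scripted sketch using the rasterstats package is shown below. The file names and the class coding (1 = green, 2 = yellow, 3 = brown) are assumptions for illustration.

```python
from rasterstats import zonal_stats

# Count pixels of each canopy class inside every plot polygon.
stats = zonal_stats("plot_boundaries.shp", "classified_canopy.tif",
                    categorical=True)

features = []
for counts in stats:  # one {class_value: pixel_count} dict per plot
    total = sum(counts.values()) or 1
    features.append({"pct_green": counts.get(1, 0) / total,
                     "pct_yellow": counts.get(2, 0) / total,
                     "pct_brown": counts.get(3, 0) / total})
```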

Two modeling algorithms, a neural network and a random forest, were used to relate these three features to the ground-based visual scores [26]. Both algorithms are available in the predictive analytics package within JMP Pro (JMP Pro 10, SAS Institute Inc., Cary, NC, 1989–2007). In both cases, the IDC visual score was treated as a categorical variable; 67% of the data, chosen at random, was used to train the models, and the remaining 33% was used for validation. The random forest model used 100 trees with 10 minimum splits per tree. The neural network was run using default settings in JMP, with a hidden layer structure of 3 TanH functions and a learning rate of 0.1. Results are reported in confusion matrices.
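The models were fit in JMP; for readers working in open-source tooling, a scikit-learn sketch of the random forest step under the same settings is given below. The feature matrix and scores are random placeholders standing in for the real per-plot data, and JMP's internal implementation may differ in detail.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Placeholder data purely for illustration; in practice X holds the
# per-plot %G, %Y, %B proportions and y the expert visual scores.
rng = np.random.default_rng(0)
X = rng.random((3386, 3))
y = rng.integers(1, 6, size=3386)

# Hold out one-third of plots for validation, as in the text.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.33, random_state=0)

# 100 trees and a minimum of 10 samples per split, mirroring the
# reported settings.
model = RandomForestClassifier(n_estimators=100, min_samples_split=10,
                               random_state=0)
model.fit(X_train, y_train)
print(confusion_matrix(y_val, model.predict(X_val)))
```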

Each of the 36 trials in the field was analyzed using a linear model with IDC score as the response variable, a fixed effect for entry, a fixed effect for block, and a random residual. Least significant differences (LSDs; P < 0.20) were calculated for all 36 trials on both ground-based and UAS image-based scores for both dates of data collection.
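For reference, the LSD for comparing two entry means in these randomized complete block trials takes the standard textbook form, here with r = 2 replications and alpha = 0.20, and with the mean square error (MSE) and its degrees of freedom taken from the residual of the linear model above:

```latex
\mathrm{LSD} = t_{1-\alpha/2,\,\mathrm{df}_e}\,\sqrt{\frac{2\,\mathrm{MSE}}{r}}
```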

Results and discussion

Ground-based phenotyping

A total of 3386 plots were assessed for IDC stress severity on a 1–5 scale on two separate dates (July 12 and August 1, 2017). The distribution of scores from each of the two dates, as well as their relationship to one another, can be seen in Fig. 2. On date 1, 15%, 45%, 30%, 9%, and 1% of plots were rated 1 through 5, respectively. On date 2, 30%, 41%, 18%, 8%, and 3% were rated 1 through 5, respectively. A much higher proportion of entries received a score of "1" on date 2 than on date 1; nevertheless, the overall average score in the field was 2.1 on both dates of visual scoring. The agreement between raters was also investigated on a subset of 252 plots, which two raters scored independently to provide an estimate of inter-rater variability in visual scores. The correlation between rater 1 and rater 2 was 0.93.

Fig. 2
figure 2

Relationships between two dates of iron deficiency chlorosis (IDC) severity (a) and two separate raters scoring plots (b). IDC severity was scored on a total of 3386 plots on both July 12 and August 1. The correlation of ratings between date 1 and date 2 was 0.80. A subset of 252 plots was scored by two independent raters on date 1; the correlation of ratings between raters was 0.93

UAS imagery for IDC phenotyping

In the first step of the image analysis pipeline (Table 1), plant canopy was masked from the soil using k-means clustering (Fig. 3), an unsupervised machine learning approach that has been used successfully in many agricultural applications [27, 28]. This approach resulted in an overall classification accuracy of greater than 95%. The user accuracy was 94% and 97% for soil and plant classification, respectively. Table 2 is a confusion matrix highlighting the results of the accuracy assessment of this classification.

Fig. 3
figure 3

Iron deficiency chlorosis classification. The orthomosaic (top) is first classified into plant and soil pixels (bottom left). The plant pixels are then classified in a second step into green pixels (%G), yellow pixels (%Y), and brown pixels (%B) (bottom right). These features are then related back to ground-based visual scores through random forest and neural network models to classify tolerant and susceptible plots

Table 2 Accuracy assessment of pixel-based classification method for plant and soil classification

Previous studies have utilized a variety of techniques to mask plant canopies. Yu et al. [23], for example, achieved plant and soil classification with a random forest model at an accuracy of 99.9%. We achieved an accuracy of 95.6%, in part because plants under chlorosis or necrosis stress had spectral properties very similar to those of the soil background. In addition, many misclassification errors occurred at the edges of the canopy, where soil and plant canopy pixels overlap. One advantage of our approach, however, is the ability to classify plant and soil in the absence of training data. The unsupervised k-means clustering groups pixels of similar spectral values together with no prior knowledge or input, and can often provide new insights into the data [18]. With supervised approaches such as random forest, a training data set must be generated by manually interpreting the classes to which pixels belong. Other studies have used thresholding approaches to differentiate plants from soil; in these, the image is thresholded based on its hue, saturation, value (HSV) representation or on indices or ratios of different color bands [17, 24, 29].
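For comparison with the clustering approach used here, an HSV threshold of the kind referenced above can be sketched in a few lines of OpenCV. The file name and hue bounds are illustrative assumptions and would need tuning for a given orthomosaic.

```python
import cv2
import numpy as np

# Read an orthomosaic tile (hypothetical file name) and keep
# green-to-yellow hues as "plant"; everything else becomes background.
bgr = cv2.imread("orthomosaic_tile.png")
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
plant_mask = cv2.inRange(hsv,
                         np.array([20, 40, 40], dtype=np.uint8),
                         np.array([90, 255, 255], dtype=np.uint8))
```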

Percent green, yellow, and brown pixels were also classified using an approach similar to the plant canopy classification (Fig. 3). These features were used in two different machine learning models (random forest and neural network) to predict IDC severity scores. The confusion matrices for the random forest models are shown in Tables 3 and 4 for date 1 and date 2, respectively. Overall classification accuracy was 68% for date 1 and 77% for date 2. The increase in classification accuracy between dates was likely the result of plant size, with more pixels representing plant material on date 2 than on date 1.

Table 3 Random forest confusion matrix for date 1 of data collection (July 12)
Table 4 Random forest confusion matrix for date 2 of data collection (August 1)

The average canopy coverage of each plot is related to the overall biomass and light interception of the plot. Canopy coverage was thus calculated as the ratio of plant-classified pixels to soil-plus-plant-classified pixels and compared across the five severity classes. The average canopy coverage for each IDC class was as follows: class 1 = 0.264, class 2 = 0.246, class 3 = 0.189, class 4 = 0.096, and class 5 = 0.039. All pairwise differences among classes were significant at the 0.05 probability level using Tukey's honest significant difference test, indicating that the IDC score explained variation in plot biomass.
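A sketch of this comparison using the Tukey HSD implementation in statsmodels is shown below; the coverage values and class labels are random placeholders standing in for the real per-plot data.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Placeholder per-plot canopy coverage ratios and 1-5 IDC classes.
rng = np.random.default_rng(1)
idc_class = rng.integers(1, 6, size=3386)
coverage = 0.30 - 0.05 * idc_class + rng.normal(0, 0.02, size=3386)

# All pairwise comparisons of class means at the 0.05 level.
result = pairwise_tukeyhsd(endog=coverage, groups=idc_class, alpha=0.05)
print(result.summary())
```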

UAS image-based scoring can detect differences among breeding lines

Another way to test the usefulness of UAS image-based scores for variety screening and development is to test whether the scores can differentiate breeding lines in the field. A common method for assessing differences is to calculate the LSD for comparing performances of breeding lines (or released varieties) in typical randomized field trials. For reporting of variety trials, a relaxed significance threshold (e.g., P < 0.20) is commonly used for declaring differences among varieties to increase power and reduce type II errors. A lower LSD indicates more precise data and a greater ability to determine differences among breeding lines. The LSD values from the ground-based data and the UAS data were compared among the 36 experimental trials, or sets of lines, for both dates of data collection. In 32 of the 36 trials on date 1 and 33 of 36 trials on date 2, the LSD was smaller when using the UAS image-based scores in the linear model than when using the visual scores (Fig. 4). There was an average 0.15- and 0.20-point reduction in LSD (ranging from a 0.35 increase to a 0.50 decrease) with UAS scores compared to visual scores for date 1 and date 2, respectively. This overall reduction in LSD indicates that the UAS image-based scores were more precise than visual scores and offers valuable information for use in a breeding program.

Fig. 4
figure 4

A total of 36 trials consisting of soybean breeding lines, each arranged in a randomized complete block design in the field, were analyzed using a linear model for both visual scores and unmanned aircraft system (UAS) predicted values on both dates (July 12 and August 1). Bars indicate the least significant difference (LSD) values for separating mean scores of breeding lines in each trial. In 31 of the 36 trials on date 1 (top) and 33 of 36 trials on date 2 (bottom), the LSD was smaller when using the UAS predicted IDC scores (black inside bars) in the linear model than when using the visual scores (dashed outside bars)

Many previous studies have shown great success and accuracy in collecting phenotypic traits from unmanned aircraft systems in soybean and other crops [22, 23, 30, 31]. In addition, previous studies have successfully used image-based methods for classifying and quantifying soybean IDC stress [17, 32]. To date, however, no study has combined high-throughput image collection from unmanned aircraft systems with image-based classification of IDC.

UAS imagery compared to other methods for IDC assessment and limitations

Previous studies have tested the use of image analysis procedures to quantify IDC stress [17,18,19]. These approaches achieved mean per-class accuracies of approximately 96% and 81%. While those studies showed relatively high accuracy, the throughput of taking photos from a tripod or push cart is much lower than that of a UAS. The spatial resolution they achieved, however, was much higher (6 pixels/mm from a tripod compared to 1.6 cm/pixel from the UAS). The lower spatial resolution of UAS images causes blending of pixels and thus provides less detail for depicting the subtle changes of individual leaves becoming chlorotic.

For this study, a flight path was chosen to cover a large area within one UAS battery charge and to limit variation in sun position and cloud shadows during flight, as these variables are known to present problems for image analysis applications. Future research should test whether an increase in spatial resolution, through improved camera sensors or lower flight altitudes, would improve predictive ability. Care should be taken to fly near solar noon to limit shadowing cast by soybean plots and to keep cloud cover consistent during flights; full cloud cover or no cloud cover is optimal. The system presented in this paper reduced flight time by flying at a higher altitude and allows researchers to achieve a much higher temporal assessment of IDC severity, as well as the ability to rate more plots in any given growing season.

Additionally, the confusion matrices show that most misclassified scores fall between adjacent classes. If a breeder simply wishes to select against scores of 4 or 5, for example, the UAS image-based scores would be adequate for breeding purposes. To test this, scores of 4 and 5 were binned together as "high stress" and scores of 1, 2, and 3 were binned together as "low stress." The overall accuracy of the random forest model in correctly placing entries into these two categories was 89%.
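This binned accuracy can be computed directly from a 5 x 5 confusion matrix; a minimal sketch, assuming rows hold the actual scores 1 through 5 and columns the predicted scores, follows.

```python
import numpy as np

def binned_accuracy(cm):
    """Collapse a 5x5 confusion matrix into low stress (scores 1-3)
    vs high stress (scores 4-5) and return overall accuracy."""
    cm = np.asarray(cm, dtype=float)
    low, high = slice(0, 3), slice(3, 5)
    binned = np.array([[cm[low, low].sum(), cm[low, high].sum()],
                       [cm[high, low].sum(), cm[high, high].sum()]])
    return np.trace(binned) / binned.sum()
```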

A final caution for researchers employing UAS imagery for IDC detection is that other soybean stresses or field variables may mimic the IDC symptoms detected by image analysis. One major biotic constraint on soybean production, for example, is soybean cyst nematode: research plots may experience stunting and chlorosis from nematode presence, which this system could rate as IDC susceptibility. In addition, healthy weeds in the field would be detected by this system as healthy vegetation and, thus, healthy plots. These cautions were addressed here by selecting a site with limited known off-target stressors and by proper weed control prior to all UAS flights, and they should be considered by researchers wishing to employ this technology.

Conclusions

In this study, we achieved high efficiency in collecting data with autonomous UAS flights, up to 77% accuracy in classifying plots on a 1–5 severity scale, and confidence in this system for IDC assessment through an average reduction in LSD values across a series of experimental trials. This method is high-throughput, objective, and more precise than traditional ground-based visual assessments. In addition, it allows researchers to collect information from more plots in a given year and at a much higher temporal frequency than before.