One of the great challenges for the future is the production of sufficient food for a growing population. From 1961 to 2012, the human population more than doubled from approximately 3 billion to 7 billion people and a further increase to 9.3 billion is projected for the year 2050 (FAOSTAT 2016). This means that crop production must double by 2050 to meet the predicted production demands of the global population. However, achieving this goal will be a significant challenge for agriculture since crop yields would have to increase at a rate of 2.4% per year, yet the average rate of increase is only 1.3%, with yields stagnating in up to 40% of land under cereal production (Araus and Cairns 2014). Further, climate change will exacerbate this challenge by intensifying field crop exposure to abiotic stress conditions, including rising temperature, drought, and increased CO2 concentration [CO2] (Christensen et al. 2007). This is a major issue because climatic factors since the end of the 1980s have counterbalanced the wheat genetic progress of recent decades in Europe (Oury et al. 2012). Indeed, as observed by Oury et al. (2012) and Gray and Brady (2016), the beneficial effects expected from the increase in atmospheric [CO2] in the World’s crop production during recent decades have been constrained by the effects of temperature increases and extended drought.

Grain legumes are the main source of proteins, minerals, and fibers for animals and humans (Meena et al. 2018). To achieve significant improvements in crop yield, breeding strategies aiming to increase biomass gains and crop productivity need to focus on radiation uptake, photosynthetic efficiency, and harvest index (HI) (Reynolds et al. 2012; Koester et al. 2014). However, to date, breeding for higher photosynthetic efficiency or for tolerance to different environmental stresses has only played a minor role in increasing crop productivity over past decades (Zhu et al. 2010). In a rational sense, plant physiology research should focus on improving photosynthesis due to its central part in plant productivity (Long et al. 2004). Recently, different studies have advanced how to optimize photosynthetic processes in different crops (Ort et al. 2015; Simkin et al. 2019).

One way to improve crop photosynthesis is to increase our knowledge of genomic control of photosynthesis under different environmental conditions. To achieve this, diverse crop populations representing hundreds of cultivars need to be screened (phenotyped) under different environments to associate traits of interest (i.e., photosynthetic parameters) with specific genomic regions. With the rise of genomic and bioinformatics technologies, phenotyping entire populations for traits of interest is the bottleneck that delays scientific advancement in genomics (Adachi et al. 2011; Yan et al. 2015; de Oliveira Silva et al. 2018; Oakley et al. 2018). Therefore, genomic approaches and breeding solutions need to implement new high-throughput phenotyping techniques that allow rapid measurement of photosynthetic traits for screening cultivars in the shortest amount of time (Araus and Cairns 2014; Araus et al. 2018). By improving techniques for measuring photosynthetic traits, more efficient cultivar selection will likely improve both yield potential and resilience to abiotic stresses.

Photosynthetic performance is frequently measured with an infrared gas analyzer that assesses plant CO2 assimilation rate. Photosynthetic parameters, such as leaf mid-day photosynthesis and leaf diurnal photosynthesis, can be used to assess in situ plant performance under different abiotic stresses (Sanz-Sáez et al. 2012, 2017). More detailed photosynthetic parameters, such as maximum rate of rubisco-catalyzed carboxylation (Vc,max) and maximum electron transport rate supporting RuBP regeneration (Jmax), have been identified as selection parameters for tolerance to abiotic stress, such as drought (Aranjuelo et al. 2009, 2013), elevated tropospheric ozone (Yendrek et al. 2017), or for improved performance under elevated atmospheric CO2 (Ainsworth et al. 2004; Soba et al. 2020). Depending on the parameter to be measured, sampling can take a few minutes each (e.g., mid-day photosynthesis) or 20–60 min per sample for photosynthetic parameters, such as Vc,max and Jmax, which are calculated using photosynthesis to intercellular CO2 curves or A–Ci curves (Farquhar et al. 1980; Long and Bernacchi 2003). In addition, Vc,max and Jmax are essential input parameters for the FvCB model (Farquhar et al. 1980) that relates photosynthetic biochemistry responses to known environmental conditions (Von Caemmerer 2013). This model has also been used in earth systems models for predicting ecosystem responses to environmental changes (Rogers 2014).

Reflectance spectra at leaf and canopy levels can facilitate assessment of plant’s structure, nutritional status, and certain stress parameters. This includes estimating contents of chlorophyll, xanthophylls, nitrogen, phosphorus, fiber, sucrose (Gamon et al. 1997; Peñuelas and Filella 1998; Petisco et al. 2006; Asner and Martin 2008; Colombo et al. 2008; Ainsworth et al. 2014; Serbin et al. 2014; Dechant et al. 2017; Yendrek et al. 2017), and plant secondary metabolites (Couture et al. 2016; Vergara-Diaz et al. 2020). In addition, leaf level spectral reflectance has been used to predict photosynthetic parameters, such as Vc,max and Jmax in soybean (Ainsworth et al. 2014), wheat (Silva-Perez et al. 2018), maize (Heckmann et al. 2017; Yendrek et al. 2017), and trees (Serbin et al. 2012) as well as dark respiration in wheat (Coast et al. 2019).

Although translating data acquired with a field spectrometer using a leaf clip to scalable imaging approaches using multispectral or hyperspectral cameras in drones or other aerial platforms (frequently limited to the 350–1000 nm spectral range) may be further complicated by the heterogeneous nature of canopies, such techniques could greatly expand the scope of applicability of these measurements. In the above-mentioned research, relationships between photosynthetic parameters and complex data arrays captured by leaf level spectrometers need to be analyzed using complex multivariate statistical models. Partial least squares regression (PLSR) is the most commonly used model (Serbin et al. 2012; Ainsworth et al. 2014; Heckmann et al. 2017; Silva-Pérez et al. 2017; Yendrek et al. 2017). However, Fu et al. (2020) recently reported that other machine learning algorithms such as Least Absolute Shrinkage and Selection Operator (LASSO) can estimate photosynthetic parameters as accurately or better than PLSR, since LASSO is more robust when comparing different environments or plant species (Tibshirani 1996). Therefore, to bypass PLSR performance problems, we propose to explore other powerful machine learning algorithms with appropriate feature extraction capacities, which include LASSO (Vergara-Diaz et al. 2020), Bayesian Ridge (BR; Neal 1996), and Automatic Relevance Determination Regression (ARDR; Tipping 2001).

For these multivariate models, utilized data must represent enough phenotypic variability to support proper model functioning. To achieve sufficient phenotypic variability, several researchers have applied a range of growth conditions, including different levels of abiotic stresses, such as drought (Silva-Perez et al. 2017), elevated tropospheric ozone (Ainsworth et al. 2014; Yendrek et al. 2017), or high temperature (Serbin et al. 2012). Another means for increasing phenotypic variation is by including several related species in the same model. For example, Doughty et al. (2011) used 149 tropical tree species to create a PLSR model to estimate mid-day photosynthesis using canopy hyperspectral imaging; and Serbin et al. (2012) combined hyperspectral data of two tree species to estimate Vc,max. However, to the best of our knowledge, no published study has combined multiple leguminous row crops species. In our research, we focused on soybean (Glycine max) and peanut (Arachys hypogea), which are leguminous crops often grown under high abiotic stress levels (drought and elevated temperature) in the southeastern United States. These legume crops are also important in rotation with corn and cotton.

The aims of this study were (i) to estimate photosynthetic capacity parameters, such as mid-day photosynthesis, leaf chlorophyll content (LCC), Vc,max, and Jmax of two legume crops (soybean and peanut) using full-range leaf level reflectance spectra (VIS–NIR–SWIR, 400–2500 nm) with PLSR, BR, ARDR and LASSO models and (ii) to simulate photosynthetic parameter model performance using four common types of sensors with more limited wavelength ranges: VIS–NIR (350–1000 nm), NIR–SWIR (1000–2500 nm), SWIR (1400–2500 nm), and an advanced multispectral sensor imitating the ESA Copernicus Sentinel 2 satellite with 12 spectral bands.

Materials and methods

Trial setup and design

Experiments were conducted in field trials and controlled conditions located at Auburn University (Alabama, USA). The study was carried out with two leguminous crops (soybean and peanut) that were exposed to different growth conditions. The first experiment involved two soybean (Glycine max. L) cultivars grown under ambient and elevated [CO2] at an Open Top Chamber Facility. The second experiment involved four soybean cultivars grown under high night temperature in growth chambers. The third experiment was performed with 6 peanut (Arachis hypogea L.) cultivars grown under well-watered and water-stress conditions in a greenhouse.

Experiment 1: soybean cultivar response to elevated [CO2]

Two soybean cultivars representing high (PI398223) and low (PI567201A) water use efficiencies (WUE) were chosen for the study based on previous screening by Dhanapal et al. (2015). The two cultivars were planted on 16 May 2019 in 20 L pots filled with commercial growth media (Pro-Mix, Premier Tech, Quebec, Canada) at the Open Top Chamber Facility located at the USDA-ARS National Soil Dynamics Laboratory, Auburn, AL, USA. Open top chambers (OTC) (Rogers et al. 1983), encompassing 7.3 m2 of ground surface area, were used to deliver target [CO2] of ~ 410 ppm (ambient) or ambient plus 200 ppm (elevated) [CO2] during light hours using a delivery and monitoring system described elsewhere (Mitchell et al. 1995). There were four replicate chambers of each CO2 level for a total of eight experimental plots. Each OTC held two pots of each cultivar to have two sub-replicates for each plot. The experiment was conducted as a split plot design with CO2 level being the main plot factor and cultivar being the split plot factor. Mid-day photosynthesis and A–Ci curves were performed when plants were at the beginning of pod development (R3, Fehr et al. 1971, 15 July) and at the beginning of seed filling (R5, 26 July) according to growth stages defined by Fehr et al. (1971). Relative chlorophyll content and leaf hyperspectral reflectance measurements were performed concurrently with photosynthetic parameter measurements. More detailed information on experimental design was previously reported by Soba et al. (2020).

Experiment 2: soybean cultivar response to high night temperatures

Four soybean cultivars (PI360846, DS25-1, PI458098, and Agx9) were planted in 3.8 L pots containing a peat-moss: perlite potting mixture (2:1) on 1 May 2019. Plants were grown at the Auburn University Plant Science Research Center greenhouse complex. Temperatures were maintained at 28/20 °C (day/night) until plants reached the first flowering stage (R1). To impose night temperature treatments, plants were then moved to two Conviron CMP 6010 growth chambers (Conviron, Manitoba, Winnipeg, Canada) maintained on a 12 h photoperiod (1200 µmol m−2 s−1 PAR) with 50/70% RH (day/night). Control plants were grown at 30/20 °C (day/night) and high night temperature plants were grown at 30/30 °C (day/night). Three replicates per cultivar and chamber were used and the whole experiment was repeated twice. Fourteen days after temperature treatments were imposed, mid-day photosynthesis, A–Ci curves, LCC, and leaf hyperspectral reflectance were performed as explained below.

Experiment 3: peanut cultivar response to drought

Six peanut cultivars (AUG16-28, AU17, 18H19-3738, G06-G, AU8-19, and AU18-21) were planted at the Auburn University Plant Science Research Center greenhouse complex on 21 April 2019. Plants were grown in 20 L pots containing a mixture of sand and sandy-loam field soil (1:1, w/w) collected from EV-Smith Research Center, Shorter, AL, USA. Plants were maintained under well-watered conditions (80% relative soil water content, RSWC) until 60 days old; at this time, the drought experiment was initiated. Weighing pots every 2–3 days initially and every day towards the end of the experiment allowed RWSC to be gravimetrically maintained. Well-watered plants were maintained at 80% RSWC while drought plants were maintained at a 30% RSWC. Four replicates per cultivar and stress treatment were used in this experiment. At 20 and 40 days after drought initiation (i.e., 80- and 100-day-old plants), mid-day photosynthesis, A–Ci curves, LCC, and leaf hyperspectral reflectance measurements were performed as explained below.

Physiological parameter assessments

In this study, mid-day photosynthesis, A–Ci curves, and SPAD measurements were taken from 3 different experiments and coupled with full-range (350–2500 nm), high-resolution (3–8 nm) spectral reflectance measurements taken with a Field Spec Hi-Res four field spectrometer (Analytical Spectral Devices, Boulder, CO, USA) to predict physiological parameters that characterize photosynthetic traits.

Mid-day photosynthesis measurements

Depending on experiment size, mid-day photosynthesis measurements were taken one day before A–Ci curves using two or three LI-6400 (Li-Cor Biosciences, Lincoln, NE, USA) systems. Measurements were performed on fully expanded young leaves corresponding with the third/forth leaf from the top in soybean, and second/third leaf from the top of the main stem in peanut. Prior to measurements, systems were set to match environmental growth conditions (light intensity and temperature) and maintained at a relative humidity of 60–70%. While photosynthesis measurements were in progress, relative chlorophyll content and spectral reflectance measurements were also performed on the same leaves using a SPAD meter (Minolta SPAD-502, Spectrum Technologies Inc., Plainfield, IL, USA) and the Field Spec Hi-Res 4 field spectrometer, respectively.

A–Ci curves

To calculate maximum rate of rubisco-catalyzed carboxylation (Vc,max) and maximum electron transport rate supporting RuBP regeneration (Jmax), A–Ci curves were performed at different developmental stages in each experiment. In general, the A–Ci curves were the same for peanut and soybean except for different light saturation points: 1750 μmol m−2 s−1 PAR for soybean (Ainsworth et al. 2004) and 2000 μmol m−2 s−1 PAR for peanuts (Ferreyra et al. 2000). Photosynthesis was initially induced at the growth [CO2] (410 ppm for ambient and 610 ppm for elevated CO2 treatments), and then [CO2] was reduced stepwise to the lowest concentration of 50 ppm. Afterwards, [CO2] was increased stepwise to the highest CO2 concentration of 1500 ppm. A total of 11 measurements per curve were recorded (Sanz-Sáez et al. 2017). During measurements, block temperature was set at 28 °C (i.e., mean mid-day temperature at Auburn, AL). The equations and spreadsheet developed by Sharkey et al. (2007) were used to calculate Vc,max and Jmax normalized at 25 °C as it has been demonstrated by (Khan et al. 2021) that different temperatures and the effect on reflectance does not affect prediction of these normalized parameters. While A–Ci curves were taken, concurrent spectral reflectance measurements were performed on the same leaves.

Relative chlorophyll content

Relative chlorophyll content was taken on the same mid-day photosynthesis leaves using a SPAD-502 chlorophyll meter (Konica Minolta, Tokyo, Japan). Five subsample measurements per leaf were collected and averaged.

Leaf spectral reflectance measurements

Leaf spectral reflectance was measured with a FieldSpec Hi-Res 4 concurrently on the same leaves used for photosynthetic measurements. This device has three sensors with a full spectro-radiometer range of 350–2500 nm, with a resolution of 3 nm in visible (VIS; 350–700) and near-infrared (NIR; 700–1000 nm) and 8 nm in shortwave-infrared (SWIR; 1000–2500 nm). Measurements were taken via a leaf clip coupled to a fiber-optic cable. The FieldSpec has a radiometrically calibrated internal light source, which was standardized for relative reflectance using white reference measurements every 15 min. For each leaf, 6 reflectance measurements were recorded on different regions of a single leaf per pot. We used the FieldSpectra package in R to average the six samples and align the VIS, NIR, SWIR sensors with a spectral splice correction (Serbin et al. 2014; Yendrek et al. 2017).

To accomplish the second research aim, we simulated if a more limited spectral range (corresponding to other remote sensing devices) would be able to estimate photosynthetic parameters with the same accuracy as the full-range spectra achieved with the Field Spec HiRes4. Simple spectral resampling of four different sensors was performed to simulate commercial spectrophotometer sensors, such as the UniSpec-DC VIS/NIR (310–1100 nm; PP Systems, Amesbury, MA, USA), the USB 2000 VIS/NIR (340–1014 nm; Ocean Optics, Dunedin, FL, USA), and the Liga SWIR spectrophotometer (850–1888 nm; STEAG Micro Parts, Dortmund, Germany). We also included a resampling simulation for the bands and bandwidths of the ESA Copernicus Sentinel-2 satellite, with 12 spectral bands (443, 494, 560, 665, 704, 740, 781, 834, 944, 1375, 1612, and 2194 nm) representing VIS, NIR, and SWIR (see more in Drusch et al. 2012; Segarra et al. 2020).

Statistical analysis of measured and estimate values

Statistical analyses were conducted using R Studio (RStudio Team 2020) and Python 3.7 (Python Software Foundation, via a Jupiter notebook (Wofford et al. 2019). Effects of abiotic stress treatments and differences between cultivars on studied variables were assessed using analysis of variance (ANOVA) in R Studio. We also analyzed correlations between photosynthetic parameters against each spectrum band by Pearson’s correlation using R Studio.

With respect to the different advance regression models, we used the SciPy module (Jones et al. 2001; Varoquaux et al. 2015) in Python 3.7 and the Scikit-Learn library for the estimation of different parameters to estimate determination (R2) and the root means squared error (RMSE). For cross-validation, we used the “train test split method” where, we split our data into training (60% of the data used to build the model) and testing (40% of the data used to test the model). This method quantifies the prediction error, the RMSE, which measures the average prediction error made by the model in predicting the outcome for an observation. That is, the average difference between the observed known outcome values and the values predicted by the model. Associations between photosynthetic parameters (response variables) and the leaf reflectance spectrum (explanatory) variables were analyzed using four advances models: (i) Partial Least Squares Regression (PLSR) is based on the dimension reduction method (Wold et al. 2001). For this model, we used between 5 and 11 components, choosing the number of components that gave the highest R2 and the lower RSME; (ii) Least Absolute Shrinkage and Selection Operator (LASSO) is a shrinkage method (Tibshirani 1996); (iii) Bayesian ridge (BR) and (iv) Automatic relevance determination regression (ARDR) are both high-dimensional methods (Neal 1996; Tipping 2001). Figures were prepared using the matplotlib (Hunt 2019) and Seaborn Python (Waskom et al. 2017) modules in Python 3.7.


Effect of abiotic stress and cultivar on photosynthetic parameters

Analyzing the effect of abiotic stress and cultivars can yield valuable insights into phenotypic range of variation within each experiment. In Experiment 1, the two soybean cultivars showed significant effects of [CO2] on mid-day photosynthesis and LCC, but not on Vc,max and Jmax (Table 1 and Fig. S2). We observed treatment effects for mid-day photosynthesis and LCC (Table 1a). In summary, phenotypic variation was noticeable with a range of 17.01–36.22 µmol m−2 s−1 for mid-day photosynthesis, 34.55–51.35 for LCC, 182.9–348.4 µmol m−2 s−1 for Vc,max, 174.7–263.7 µmol m−2 s−1 for Jmax, and 29.4–30.37 °C for leaf temperature. In Experiment 2, four soybean cultivars were grown under high night temperature (30/30 °C day/night) for comparison to controls (30/20 °C day/night). Cultivar effects with treatment showed a significant effect on mid-day photosynthesis, LCC, and Jmax (Table 1b). Overall, phenotypic variation was noticeable with a range of 11.52–32.68 µmol m−2 s−1 for mid-day photosynthesis, 34.01–53.95 for LCC, 48.01–135.2 µmol m−2 s−1 for Vc,max, 61.01–165.1 µmol m−2 s−1for Jmax, and 29.9–30.33 °C for leaf temperature. In Experiment 3, the effect of drought was significant for all measured peanut parameters except for Vc,max and Jmax (Table 2). Cultivars only showed significant effects for LCC and Jmax. The interaction effects of drought and cultivars was only slightly significant for Vc,max (P = 0.094). Phenotypic variation was perceptible since mid-day photosynthesis ranged from 5.051 to 26.41 µmol m−2 s−1, LCC varied from 42.30 to 52.45, Vc,max varied from 64.38 to 171.3 µmol m−2 s−1, Jmax ranged from 79.3 to 206.1 µmol m−2 s−1, and 28.6 to 30.5 °C for leaf temperature. When phenotypic variation of all three experiments was considered together, the range for mid-day photosynthesis was 5.051–36.22 µmol m−2 s−1, 34.55–53.95 for LCC, 48.01–348.4 µmol m−2 s−1 for Vc,max, 61.01–263.7 µmol m−2 s−1 for Jmax, and 26.33–31.55 °C for leaf temperature (Fig. S2, shows the Box plot for each experiment).

Table 1 Mean values of mid-day photosynthesis (µmol m−2 s−1), leaf chlorophyll content (LCC, arbitrary units), maximum rate of rubisco-catalyzed carboxylation (Vc,max, µmol m−2 s−1), maximum electron transport rate supporting RuBP regeneration (Jmax, µmol m−2 s−1), and leaf temperature (°C) per each treatment. (a) Experiment 1: two varieties of soybean grown at 410 ppm and 610 ppm of [CO2]; n = 32. (b) Experiment 2: four soybean varieties grown at low (20 °C) and high (30 °C) night temperature; n = 48
Table 2 Mean values of midday photosynthesis (µmol m−2 s−1), leaf chlorophyll content (LCC, arbitrary units), maximum rate of rubisco-catalyzed carboxylation (Vc,max, µmol m−2 s−1), maximum electron transport rate supporting RuBP regeneration (Jmax, µmol m−2 s−1), and leaf temperature (°C) in six varieties of peanut grown under well-watered (WW, 80% SWC) and water-stress (WS, 30% SWC) conditions

Relationships between spectral signatures and photosynthetic parameters

Figure 1 presents the sensitivity of leaf reflectance spectrum for different species and abiotic stresses. Under high night temperature, soybean reflectance spectrum shows higher variability than the control with a larger peak at ~ 550 nm and wider reflectance band between ~ 750–1400, 1550–1800 and 2000–2300 nm (Fig. 1a, b). Elevated CO2 in soybean tended to reduce variability of the reflectance spectrum between ~ 500–600 and 750–1400 while maintaining the variability in the reflectance spectrum between 1550–1800 and 2000–2300 nm (Fig. 1c, d). In peanut, drought increased variability at all wavelengths with the exception of the 500–600 nm range (Fig. 1e, f). When comparing reflectance of the two legume species, we noted that peanut added a lot of spectral variation in the range from 750 to 2300 nm, probably due to the drought treatment; meanwhile soybean added more variability in the 500–600 nm range (Fig. S1).

Fig. 1
figure 1

a Mean, ± standard deviation (n = 24), and minimum and maximum leaf reflectance for soybean at high night temperature grown in growth chambers. b Mean, ± standard deviation (n = 24), and minimum and maximum leaf reflectance for soybean at control temperature grown in growth chambers. c Mean, ± standard deviation (n = 18), and minimum and maximum leaf reflectance for soybean at 610 ppm grown at an Open Top Chamber Facility. d Mean, ± standard deviation (n = 18), and minimum and maximum leaf reflectance for soybean at 410 ppm grown at an Open Top Chamber Facility. e Mean, ± standard deviation (n = 24), and minimum and maximum leaf reflectance for peanut drought grown under greenhouse conditions. f Mean, ± standard deviation (n = 24), and minimum and maximum leaf reflectance for peanut irrigated grown under greenhouse conditions

Pearson’s correlations were performed to highlight which zones of spectral signatures presented negative or positive correlations with each measured parameter. Pearson’s correlations between the parameter and each wavelength were presented separately for soybean (Fig. 2a), peanut (Fig. 2b), and both species combined (Fig. 2c). Regarding soybean Vc,max and Jmax values, correlation against each band showed significant (P < 0.05) negative values (Pearson coefficient around − 0.6) in the VIS (400 nm) and in almost all SWIR (1400–2500 nm) bands (Fig. 2a). On the other hand, mid-day photosynthesis and LCC presented lower and no significant correlation coefficients against each band from the reflectance spectrum. In the case of peanut (Fig. 2b), photosynthesis values against each wavelength band showed significant correlation (r = − 0.6, P < 0.05) in VIS–NIR (400–1000 nm) bands. LCC and each wavelength showed strong correlation (r = − 0.7, P < 0.05) in the NIR (700 nm). For Vc,max and Jmax, the correlation against each wavelength was very low or non-significant (Fig. 2b). With increased variability from combining all experiments, we could observe that mid-day photosynthesis against each wavelength showed a significant correlation (r = − 0.5, P < 0.05) in the VIS (400 nm). Regarding the coefficient of correlation between Vc,max and Jmax, significance (r = 0.6, P < 0.05) in the VIS (400 nm) and most of the SWIR (1400–2500 nm) bands indicated an improvement relative to species analyzed separately. For this reason, we ran all advance models using combined phenotypic and spectral data from each species and environmental condition.

Fig. 2
figure 2

Pearson’s correlation coefficients (r) between photosynthetic parameters and each wavelength from the leaf reflectance spectrum for each species and both species combined. a Soybean varieties under two treatments, one at high [CO2] and the other at high temperature. b Peanut varieties at water stress. c Soybean and peanut data pooled together. Each graphic presents in the x-axis the wavelength spectrum between 350 and 2500 nm and in the y-axis the Pearson’s correlation coefficient from − 1 to 1. The discontinuous line in each graphic means the significance level P < 0.05 below the x-axis

Estimating photosynthetic parameters using field spectroscopy and advance regression models

To test how accurately a given model estimated different photosynthetic parameters, we presented the coefficient of determination (R2) and RMSE for each model and mean parameter, i.e., interpreted as the proportion of information in data that is explained by each model (Fig. 3). Since estimation of the Vc,max and Jmax parameters did not work well in the peanut experiment but worked well for the soybean (Table S1), and since the LCC estimation does not work with soybean, we decided to combine these three experiments and focus on the combination of the two crop species in this manuscript (Fig. 3). Mid-day photosynthesis showed a higher R2 (0.62) and low RMSE (4.79) using the PLSR model using 10 components, followed by BR (R2 = 0.41 and RMSE = 5.92) with the worst model being the ARDR (R2 = 0.28 and RMSE = 6.55) (Fig. 3a). LCC was better assessed by PLSR (R2 = 0.56 and RMSE 3.83) using 10 components, followed by ARDR (R2 = 0.34 and RMSE = 4.71) with the BR model showing the worst performance (R2 = 0.08 and RMSE = 5.55; Fig. 2b). The best Vc,max model was obtained by PLSR (R2 = 0.70 and RMSE = 42.80) using nine components followed by the other three models with similar values (R2 = 0.56–0.59; RMSE = 50.11–52.03). Regarding Jmax, the best model was PLSR (R2 = 0.50 and RMSE = 35.83) using nine components closely followed by Lasso (R2 = 0.46 and RMSE = 37.1) and BR (R2 = 0.45 and RMSE = 37.41), with ARDR (R2 = 0.40 and RMSE = 39.29) being the worst model.

Fig. 3
figure 3

Measured against estimated values correlation for different physiological parameters estimated with PLSR (blue), BR (green), ARDR (red), and LASSO (yellow) predictive models. The estimated physiological parameters are: mid-day photosynthesis (a), leaf chlorophyll content (b), maximum rate of Rubisco catalyzed carboxylation (Vc,max, c) and maximum electron transport rate supporting RuBP regeneration (Jmax, d) for soybean and peanut cultivars all pooled together. All the models were built using train and test data splitting them into 60 and 40%, respectively. In each graph, the R2, the RMSE of the train and test of the model are shown along with the size of the train and test population and number of model components (comp) used in each PLSR model. The gray dashed line shows the 1:1 line

For each of the four models, we calculated the coefficient of weight for each band and model (Fig. 4). These coefficients showed waveband contributions along the VIS–NIR–SWIR spectrum for photosynthetic parameter estimations using leaf reflectance spectrum of pooled species, cultivars, and growing conditions. The coefficient of weight for estimating mid-day photosynthesis using PLSR showed maximum values around 400, 750, and 1750 nm, while ARDR and LASSO showed high coefficient weights at 400 nm. On the other hand, BR did not show any remarkable coefficient weights for mid-day photosynthesis (Fig. 4a). With respect to LCC, PLSR showed maximum coefficients at 400, 750, and 1750 nm, while ARDR showed a peak around 400 nm (Fig. 4b). LASSO and BR showed very low coefficients at all wavelengths (Fig. 4b). In Fig. 4c, we can observe the different coefficients of each band for Vc,max, where the maximum peaks were at 400, 700 and around 2000 nm for PLSR, BR and LASSO, while for ARDR it was only at 400 and 750 nm. For estimates of Jmax, the highest coefficient weights for PLSR were located in SWIR (2200–2300), followed by NIR (900–1100). For the LASSO model, the strongest areas were at 400, 750, and 1750 nm (Fig. 4d), while the highest coefficients were found in the SWIR (1400–2500 nm) for BR and ARDR.

Fig. 4
figure 4

Spectral-specific coefficients for each prediction model (PLSR, BR, ARDR and LASSO) used to predict the following photosynthetic parameters of the two species pooled together. a Mid-day photosynthesis. b Leaf chlorophyll content (LCC). c Maximum rate of Rubisco carboxylation (Vc,max). d Maximum electron transport rate supporting RuBP regeneration (Jmax,). Continuous vertical lines delineate different regions of the spectrum: VIS = 450–700 nm, NIR = 700–1400, and SWIR = 1400–2500 nm

Scaling up estimations of photosynthetic parameters for potential hyperspectral aerial or satellite applications

To assess their ability to estimate photosynthetic parameters compared to full spectra captured by the Field Spec Hi-Res4 (VIS–NIR–SWIR, 350–2500 nm), we simulated other sensors with limited wavelength ranges, specifically VIS–NIR (350–1000 nm), NIR–SWIR (1000–2500 nm), SWIR (1400–2500 nm), and the 12 wavelength bands of Sentinel-2 satellites (Table S2). To test this, we used reflectance data acquired by the Field Spec Hi-Res4 and separated the reflectance data according to the wavelength range of each before mentioned sensor. We then performed photosynthetic estimations using the same 4 models (PLSR, BR, ARDR, and LASSO).

Table 3, Figs. 5, and 6 show estimations of photosynthetic parameters using pooled data from both species. For mid-day photosynthesis and LCC, simulations with different sensors with just the VIS–NIR (350–1000 nm), NIR–SWIR (1000–2500 nm), and SWIR (1400–2500 nm) spectrum regions were best performed using PLSR compared to BR, ARDR, and LASSO models (Table 3; Figs. 5, 6). However, LCC was estimated best by BR, ARDR, and LASSO using the simulated ESA Copernicus Sentinel-2 satellite multispectral bands (Table 3; Figs. 5, 6). Concerning estimation of Vc,max within the VIS–NIR range (350–1000 nm) and the ESA Copernicus Sentinel-2 satellite sensors, the best performing model was PLSR using 10 components (R2 = 0.63 and 0.53, respectively). For simulations of the NIR–SWIR (1000–2500 nm), SWIR (1400–2500 nm), BR was the best model for assessing Vc,max (R2 = 0.62 and 0.60, respectively). For estimating Jmax with the VIS–NIR (350–1000 nm) sensor, the best model was LASSO (R2 = 0.42). For the range NIR–SWIR (1000–2500 nm), SWIR (1400–2500 nm) ARDR estimated Jmax similarly (R2 = 0.51). PLSR, BR, and LASSO presented the same coefficient of determination (R2 = 0.41) when using ESA Copernicus Sentinel-2 satellite simulated wavebands to assess Jmax.

Table 3 Coefficient of determination (R2) and root mean squared error (RMSE) of mid-day photosynthesis (µmol m−2 s−1), leaf chlorophyll content (arbitrary units) of all species pooled together based on leaf reflectance spectra at different ranges [VIS–NIR (350–1000 nm), NIR-SWIR (1000-–2500 nm), SWIR (1400–2500 nm), and Sentinel-2 bands] through advance regression models: Partial Least Squares Regression (PLSR), Bayesian Ridge (BR), the Automatic Relevance Determination Regression (ARDR), and Least Absolute Shrinkage and Selection Operator (LASSO)
Fig. 5
figure 5

Measured (X-axis) against estimated (Y-axis) correlation of maximum rate of rubisco-catalyzed carboxylation (Vc,max) estimated with PLSR (blue), BR (green), ARDR (red), and LASSO (yellow) predictive models. These models were based on leaf reflectance spectra at different ranges [VIS–NIR (350–1000 nm), NIR–SWIR (1000–2500 nm), SWIR (1400–2500 nm), and Sentinel-2 bands] for soybean and peanut cultivars all pooled together. All the models were built using the training and test split method (60 and 40%, respectively). Each graph shows the train and test R2 and the RMSE values of for each model. For PLSR models, we used 10 components. Size of population is n = 158. The gray dashed line shows the 1:1 line

Fig. 6
figure 6

Measured (axis X) against estimated (axis Y) correlation of maximum electron transport rate supporting RuBP regeneration (Jmax) estimated with PLSR (blue), BR (green), ARDR (red), and LASSO (yellow) predictive models. These models were based on leaf reflectance spectra at different ranges [VIS–NIR (350–1000 nm), NIR–SWIR (1000–2500 nm), SWIR (1400–2500 nm), and Sentinel-2 bands] for soybean and peanut cultivars all pooled together. All the models were built using the training and test split method (60 and 40%, respectively). Each graph shows the train and test R2 and the RMSE values of for each model. For PLSR models, we used 10 components. Size of population is n = 158. The gray dashed line shows the 1:1 line

Regarding comparison of different sensors (VIS–NIR, NIR–SWIR, and SWIR) against original FieldSpec data (VIS–NIR–SWIR), we observed that estimation of mid-day photosynthesis by the different models was similar to that of simulated sensors (Figs. 3, 5, 6 and Table 3). With ESA Copernicus Sentinel-2 satellite, estimation of mid-day photosynthesis did not work. Estimation of LCC using ESA Copernicus Sentinel-2 satellite was lower than when the whole spectrum was used. Regarding the estimation of the Vc,max simulating ESA Copernicus Sentinel-2 satellite, the PLSR and LASSO presented an R2 (0.50) that was a little lower than the FieldSpec (R2 = 0.70). With respect to Jmax estimation, we observed that coefficients for the simulated NIR–SWIR and SWIR sensor ranges were very similar (but slightly lower) to the full-range FieldSpec (Figs. 3, 5, 6 and Table 3). The VIS–NIR and ESA Copernicus Sentinel-2 satellite simulations presented values that were lower than using the whole spectrum (Figs. 3, 5, 6 and Table 3).


Estimating photosynthetic parameters using field spectroscopy and advance regression models

The main objective of this research was to assess which advanced statistical model (PLSR, BR, ARDR, and LASSO) was the most successful in estimating different photosynthetic parameters using leaf reflectance spectra (VIS–NIR–SWIR, 350–2500 nm) from two legume species. The use of advance regression models to predict different physiological parameters needs ample phenotypic variation to be accurate (Kuhn and Johnson 2013). Since statistical effects of different treatments over some variables were not significant (Table 1), we combined findings from three experiments (two different species) to increase phenotypic range for better parameter estimation with all models rather than examining each species separately (Table S1). Similar approaches have been used recently to increase phenotypic variation and obtain a better prediction model by including different species and/or cultivars (Doughty et al. 2011; Serbin et al. 2012; Choquette et al. 2019), different abiotic stresses such as drought (Silva-Perez et al. 2018), or elevated atmospheric ozone concentrations (Ainsworth et al. 2014; Yendrek et al. 2017).

In our study, when data from both legumes were combined, almost all of the advanced models were able to estimate Vc,max and Jmax at greater than R2 > 0.50 (Fig. 3). Of the four models used to predict these two parameters, PLSR was the overall best model for Vc,max (R2 = 0.70 and RMSE 42.80) and Jmax (R2 = 0.50 and 35.83), followed by LASSO and BR for Vc,max (R2 = 0.59 with RMSE 50.11; 0.59 with a RMSE 50.15, respectively), and BR and LASSO for Jmax (R2 = 0.45 with a RMSE 37.11; 0.46 with a RMSE 37.41, respectively) (Fig. 3). This may be because the PLSR model does not estimate shrinkage when performing variable selection (spectral wavebands) as do BR, ARDR, and LASSO (Neal 1996; Tipping 2001; Wold et al. 2001). Others have also found that PLSR and LASSO had similar estimation capacities, showing that LASSO band block contribution was similar to the PLSR model (Fu et al. 2020). Specific reasons why PLSR was more efficient at estimating photosynthetic parameters assessed in this study are discussed in detail below.

Successful predictions of Vc,max (R2 = 0.89 with a RMSE 15.4) and Jmax (R2 = 0.93 with a RMSE 18.67) using PLSR have been previously obtained by combining two tree species (Serbin et al. 2012); this study showed statistically significant phenotypic variation due to temperature treatments as well as species. In our study, the lower R2 associated with Vc,max and Jmax estimates could be attributed to the lack of effect of some environmental treatments (temperature, elevated CO2, and drought) and cultivars over these parameters (Table 1). However, Ainsworth et al. (2014) showed a significant correlation between measured and estimated Vc,max (R2 = 0.88 with a RMSE 13.4) with the effect of treatments (elevated ozone) and cultivars not being significant. This demonstrated that good parameter estimation and significant treatment or cultivar effects are not mutually exclusive and that it is only necessary to have sufficient range in variation of phenotypic data. For example, Ainsworth et al. (2014) and Serbin et al. (2012) noted Vc,max variation (60–280 μmol m−2 s−1 and 40–170 μmol m−2 s−1, respectively) similar to the values obtained in this study when all three experiments were combined (48–348 μmol m−2 s−1 for the current experiment). Since the ranges in variation of Vc,max and Jmax data are similar but higher to those obtained in the above-mentioned research, why are R2 values in the current study for Vc,max (R2 = 0.70) and Jmax (R2 = 0.50) lower and RSME (42.80 and 35.83, respectively) higher than in those studies? Tibshirani (1996) has noted that PLSR models lose accuracy when estimating parameters across different environments. Research by Serbin et al. (2012) and Ainsworth et al. (2014) were each performed in one environment (greenhouse and field, respectively) for one growing season, while our study combined information from three experiments representing distinct environments (greenhouse, growth chambers, and open top chambers) with plants grown at very different environmental conditions. In an experiment with several corn breeding lines grown under ambient and elevated ozone repeated over three growing seasons, Yendrek et al. (2017) obtained Vc,max estimations (R2 = 0.55 with RMSE 6.61, and 0.65 with a RMSE 6.60) similar to those reported in our study but with a RMSE lower than ours. This was probably due to the effects of changing environments on PLSR performance (Serbin et al. 2012; Ainsworth et al. 2014). Regarding the lower RMSE obtained in the above-mentioned publications (Serbin et al. 2012; Ainsworth et al. 2014; Yendrek et al. 2017) in comparison with those obtained in our research, this could be due to the different cross-validation used in our approach. In our cross-validation, the test error rate can be highly variable, depending on which observations are included in the training set and which observations are included in the validation set. This may be the reason for the higher RMSE values observed in Vc,max and Jmax. Also the high RMSE values can be due to a higher phenotypic range as a result of including two crop species grown in three very different environments. This highlights the importance of performing calibration experiments under multiple environments. Other issue that can arise is the use of these models with completely new set of cultivars and experimental conditions as was tested in Yendrek et al. (2017). In such a case, it would be recommendable to test model precision by measuring spectral reflectance under new conditions and corroborating model estimates of extreme values for Vc,max with ground truth measurements of the photosynthetic parameter. Although this extra step will take more time, this procedure could serve to test model accuracy and help improve the model with new training data.

To solve this multiple environment/location problem, new approaches need to be developed and implemented. For example, Fu et al. (2020) increased prediction model accuracy by stacking different machine learning algorithms (i.e., R2 increases of 0.1–0.2 over single prediction models). Another alternative would be creation of a consortium of scientists interested in using hyperspectral reflectance technology to predict physiological traits. Their combined expertise would create strong standardized calibrations that could be used across multiple environments as has been done for assessing forage quality traits using NIRS technology (i.e., NIRS Consortium;

Estimation of mid-day photosynthesis using PLSR, BR, ARDR, and LASSO presented lower R2 values (≈ 0.29–0.62) than for Vc,max and Jmax (Fig. 3) since in situ photosynthetic measurements are likely more influenced by environment (Sanz-Sáez et al. 2017; Soba et al. 2020) than by leaf structure and biochemistry (Serbin et al. 2012; Ainsworth et al. 2014). Thus, a looser estimation was expected. Due to environmental variability, few reports have estimated mid-day photosynthesis. However, our PLSR estimation was better than the observations of Vitrack-Tamam et al. (2020) for cotton stomatal conductance (R2 = 0.23); this was likely due to the lower range spectral reflectance device used in their experiment (633–1659 nm). Similar estimations of net photosynthesis were accomplished using the scaled photochemical reflectance index and a FieldSpec Hi-Res Device (Kumari et al. 2012).

Regarding spectral wavelength specific coefficients for each estimation model for Vc,max and Jmax, the most frequent selection for the four models was the VIS waveband (Fig. 4) where chlorophyll and other pigments have strong absorption features (Peñuelas and Filella 1998). However, these models also used wavebands in the NIR and SWIR, similar to other studies (Hansen and Schjoerring 2003; Doughty et al. 2011; Serbin et al. 2012; Ainsworth et al. 2014; Yendrek et al. 2017). In addition, Rubisco has several relatively broad spectral absorption features in the NIR and SWIR (Elvidge 1990). These selections of spectral region combinations indicate that Vc,max and Jmax spectral signatures are not simply a function of chlorophyll content, which suggests that more information is needed beyond the VIS–NIR wavebands to estimate such complex processes. The inclusion of a broader range of wavebands, due in part to less penalizations, is likely why the PLSR model outperformed BR, ARDR, and LASSO by more effectively capturing the broader spectral absorption features of Rubisco. For example, the Vc,max LASSO model only selected specific coefficients at 540, 680, 720, 2000, and 2250 nm (Fig. 4c), while the PLSR model had significant coefficient ranges between 400–450, 700–800, and 1750–1900 (Fig. 4c). Photosynthesis and LCC also presented the highest selection of spectral peaks in the VIS, followed by NIR; this has been extensively documented through both vegetation indices that estimate chlorophyll pigment content and also by the Photochemical Reflectance Index (PRI) that predicts photosynthetic efficiency through a zeaxanthin absorption feature (Gamon et al. 1997; Gitelson et al. 2005; Schlemmera et al. 2013).

We also present a more in-depth comparison of the four models. As shown in Fig. 3, the R2 of models do not present significant differences between each other, although we can see that the models used different numbers of coefficients to estimate each parameter (Fig. 4). This was reflected in the algorithm differences in each model approach to parsimony, the simple explanation of an occurrence involving the fewest entities, assumptions, or changes. This means that a fewer number of weight coefficients were used to estimate the different parameters (Vandekerckhove and Matzke 2015). In our study, all PLSR models (blue line in Fig. 4) used VIS, NIR, and SWIR wavelengths, but potentially over-fitted by an over-inclusion of predictor variables (Geladi et al. 1986; Wold et al. 2001). This contrasts to the BR (in green), ARDR (in red), and LASSO (in yellow) models (Fig. 4), which used more specific and limited spectra than restricted models that penalize the lesser coefficients (Neal 1996; Tibshirani 1996; Tipping 2001).

Scaling up estimations of photosynthetic parameters for potential hyperspectral aerial or satellite applications

The second aim of this study was to simulate different sensors with more limited spectral coverage (VIS–NIR, NIR–SWIR, and SWIR), including the ESA Copernicus Sentinel-2 satellite13 bands. We found that estimation of Vc,max using three different sensor ranges (VIS–NIR–SWIR) with the four models performed (R2 = 0.50) surprisingly similar to the whole spectrum (Figs. 3 and 5). For Jmax, the highest estimation (R2 = 0.51) used NIR–SWIR and SWIR data in ARDR. This was quite similar to Meacham-Hensold et al. (2020) who used PLSR models and canopy-level spectra with three different spectral ranges (500–900, 500–1700, and 500–2400 nm) to achieve Vc,max estimations near R2 = 0.60 and Jmax estimations around R2 = 0.40.

We also resampled FieldSpec data to cover the 12 spectral bands of the ESA Copernicus Sentinel-2 satellite; these were quite similar to spectral ranges selected by the coefficients used by the different models to estimate photosynthetic parameters. Concerning the different photosynthetic parameters, only Vc,max was estimated at more than R2 = 0.50. This could be related to the carboxylation process (Vc,max) having several relatively broad spectral absorption features in NIR and SWIR centered at 1.5, 1.68, 1.74, 1.94, 2.05, 2.29 µm, etc. (Elvidge 1990), which are in close proximity to several Sentinel-2 wavelength bands (Table S2). Supplementary data (Table S3) and Serbin et al. (2012) showed that wavelengths (490, 610, 690, 710, 1680, 1940, 2200, 2400 nm) used to estimate Vc,max have some bands similar to Sentinel-2. Figure 5d also shows that the spectral regions used in PLSR models were similar to Sentinel bands (Yendrek et al. 2017). The limited success of single-leaf-level estimations of photosynthetic capacities using point-based spectral analysis (Serbin et al. 2015) found considerable promise in airborne and potential promise in space-borne imaging spectroscopy such as the NASA HyspIRI mission (Mariotto et al. 2013). In this regard, hyperspectral imagery through inversion of the Soil-Canopy Observation of Photosynthesis and Energy (SCOPE) model to estimate Vc,max also uses sensor resolutions available in airborne or even precision agriculture technologies (Camino et al. 2019). Recently, one plot-level study using sunlit vegetative reflectance pixels from a single visible near infra-red (VNIR; 400–900 nm) hyperspectral camera reported determination coefficients of R2 = 0.79 for Vc,max and R2 = 0.59 for Jmax (Meacham-Hensold et al. 2020). Thus, our simulation analyses and other recent literature suggest that the wide range of variability in VIS, NIR, and SWIR sensors and the Sentinel-2 multispectral sensor (to a more limited extent) could be employed to estimate photosynthetic parameters (including Vc,max and Jmax) with advanced regression models. However, more research needs to be done in this area as one of the limitations of this work was that we measured leaf reflectance with a leaf clip, while UAV and satellites measure canopy reflectance that can be different from single leaf reflectance. For the future, we suggest to test if canopy reflectance measurements at different precision levels can predict leaf level photosynthetic measurements or even canopy-level photosynthesis as has been done with models such as PROSAIL (Berger et al. 2018).

Conclusion and future directions

In this study, we estimated Vc,max and Jmax using leaf spectral reflectance data and different advanced regression models with determination coefficients higher than R2 = 0.50–0.70. The combination of different species and environmental conditions (elevated [CO2], high temperature, and drought) increased phenotypic variation and improved model estimations where treatment effects were not significant. To achieve higher coefficients of determination and model performance, this research demonstrated that it is more important to have a wider range of phenotypic variation than a significant effect of a treatment or cultivar. We suggest that estimating photosynthetic capacity from reflectance spectra may be considered sufficiently robust to be useful for several different plant physiological applications, such as abiotic stress detection, improved characterization of photosynthesis process-based crop models, and a prescreening tool in breeding programs. We demonstrated that PLSR was the best model for predicting photosynthetic parameters in comparison to other advanced regression models (BR, ARDR and LASSO). However, new advance regression approaches that combine different regression models may be employed to increase phenotype estimation using this technology. Based on simulation of four limited spectral range sensors (VIS–NIR, NIR–SWIR and SWIR) using a leaf level spectrophotometer, we demonstrated that it is possible to estimate Vc,max with similar precision compared to using the whole VIS–NIR–SWIR spectrum. This research should encourage future studies using different imaging sensors (hyperspectral and multispectral) at different scales for estimating Vc,max and Jmax.

Author contribution statement

MLB Experimentation, curation of the data, formal analysis, writing original draft. DS Experimentation, review and editing. TS Experimentation, review and editing. JL Experimentation, review and editing. IA Resource managing, review and editing. JLA Resource managing, review and editing. GBR Experimentation, review and editing. SAP Experimentation, review and editing, resource managing. SCK Conceptualization, data curation, resource managing, formal analysis, supervision, writing original draft. ASS Conceptualization, experimentation, data curation, resource managing, formal analysis, supervision, project administration, writing original draft.