1 Introduction

Chlorophyll is the core pigment for light absorption in plants. It not only affects the photosynthetic potential and primary productivity of plants but also reflects their physiological state, health, and nutritional deficiencies (Simkin et al. 2022). Acquiring and analyzing chlorophyll content therefore helps to understand the nutritional status and growth dynamics of plants and, in turn, to guide fertilization accurately. This is of great significance for improving fruit quality and yield, monitoring the growth status of plant groups and individuals, and detecting stress in subsequent production (Yang et al. 2021; Liu et al. 2022).

The traditional method of measuring chlorophyll content is destructive sampling in the field followed by chemical analysis in the laboratory, which makes rapid monitoring over large areas difficult (Gao et al. 2019). In recent years, non-destructive and efficient methods for estimating plant physiological and biochemical indicators based on hyperspectral technology have developed rapidly. These methods are simple, sensitive, and credible, and are suitable for large-scale monitoring. They provide a reliable means of data acquisition for nutrient and health assessment as well as sustainable management during plant growth (Shu et al. 2022). Plant hyperspectral reflectance is commonly observed at two scales at ground level: the leaf scale and the canopy scale (Mirzaei et al. 2019). Leaf-scale spectral characteristics are mainly controlled by the scattering and absorption properties of the leaf internal structure and biochemical components, and can be used to estimate leaf nutrition and health status indicators (Zhao et al. 2022a). Canopy-scale spectra contain the spectral contribution of an entire plant; they emphasize the ability of the plant to reflect total photosynthetically active radiation and primary productivity, and play an important role in the quantification of yield (Robles-Zazueta et al. 2022). Jiang et al. (2022) comprehensively compared estimation accuracy, sensitivity, anti-noise performance, and spatial visualization quality, and constructed a hyperspectral index sensitive to leaf SPAD. They concluded that band combinations in the red-edge region of leaf hyperspectral reflectance can effectively capture changes in leaf chlorophyll content, which provides early warning of mangrove pests and diseases. Peng et al. (2021) used the Grünwald-Letnikov fractional-order derivative (FOD) algorithm to extract red-edge parameters from canopy hyperspectral data of apple trees and established a FOD-based canopy nitrogen content estimation model, which provided an effective approach for real-time monitoring of apple tree nutrition status.

However, raw spectral reflectance is often extremely complex. Spectral reflectance curves not only reflect the composition and content of various components but also record non-target factors such as temperature, humidity, surface texture, and tissue structure parameters during observation. Together with a large amount of background noise, these factors cause spectral peaks to overlap and absorption intensity to decrease, which in turn affects the estimation accuracy and robustness of models (Yang et al. 2022). Thus, it is necessary to transform spectral data and enhance the spectral characteristics of different band ranges. Differential transformation is currently a common way to improve the response relationship between spectral data and target variables. Integer-order derivatives, typically the first and second orders, are used to eliminate background interference and baseline shift in spectral curves (Peng et al. 2018). Higher integer orders have also been used for spectral preprocessing. However, as the order increases, high-frequency noise in the spectra is further amplified and the signal-to-noise ratio decreases, so that useful information in the raw spectra is lost or becomes difficult to extract (Fu et al. 2019). Since the introduction of FOD into signal processing, it has been found that the optimal derivative transformation of hyperspectral data is not always of integer order and may lie at a fractional order between the integer ones (Hasan et al. 2023).

FOD transformation not only refines spectral spacing and amplifies weak spectral characteristics in a small range but also reflects the changes of spectral information to a certain extent, and can find a finer interpolation reflection spectrum between integer-order derivatives (Tian et al. 2018). Fu et al. (2019) discussed the spectral preprocessing effect based on the Grünwald-Letnikov FOD between 0th order and 2nd order. By studying the variation trends of correlation coefficients under different fractional orders, it was found that the fractional order can significantly increase the correlation coefficient and dig deeper into the potential information of the spectra. Hu et al. (2021) studied the non-destructive nitrogen content estimation methods of rubber tree leaves based on near-infrared spectrum fractional derivatives, and selected derivatives of 0.6th order, 1st order, 1.6th order, and 2nd order to establish the estimation model of nitrogen content in rubber tree leaves. It was found that 0.6th- and 1.6th-order derivatives had better model estimation performances than integer-order ones. Cheng et al. (2021) used FOD and band combinations for spectral preprocessing, and finally constructed aboveground vegetation organic carbon content estimation models using partial least square regression and support vector machine algorithms. Compared with first-order and second-order derivatives, FOD can capture more subtle spectral features and explore the application potential of coastal wetland vegetation canopy spectra in estimating aboveground vegetation organic carbon content.

Carya illinoensis is native to eastern North America and is a well-known deciduous dry-fruit and woody oil tree species (Zhang et al. 2022b). Its nuts are delicious and rich in nutrients such as fatty acids and amino acids required by the human body. The linolenic acid content of the nut oil is higher than that of olive oil and tea oil. It is an excellent fruit tree species with good economic and ecological benefits (Araújo et al. 2021). Grasping the nutrient demand and health status of Carya illinoensis in real time through hyperspectral technology is important for promoting high quality and yield. However, research on non-destructive hyperspectral testing in China and abroad has mainly focused on rice, corn, wheat, and other relatively homogeneous field crops. Studies on non-timber product trees such as Carya illinoensis and Camellia oleifera remain very limited.

Therefore, this study took the SPAD of Carya illinoensis leaves as the research object and obtained hyperspectral data of Carya illinoensis canopies and leaves at the fruit ripening stage. The specific objectives were as follows. The first objective was to perform FOD preprocessing on leaf and canopy hyperspectral reflectance data of Carya illinoensis and to analyze the effects of different fractional orders on spectral features. The second objective was to construct a two-band normalized spectral index (NDSI) combined with FOD, to explore whether FOD can mine spectral information in depth and effectively enhance the response relationship between spectral features and SPAD, and to identify the optimal derivative order for preprocessing. The third objective was to screen the bands and band combinations sensitive to SPAD as modeling variables by Pearson correlation analysis. On this basis, we used the XGBoost machine learning algorithm to construct a SPAD estimation model of Carya illinoensis, so as to obtain the most accurate spectral estimation model and to provide a scientific basis for nutrient monitoring of Carya illinoensis.

2 Material and Methods

2.1 Overview of the Study Area

The samples of this study were collected from the experimental demonstration base of Carya illinoensis (117°22′20′′–117°23′10′′E, 32°11′10′′–32°11′30′′N) in Bailong Town, Feidong, Anhui, China (Fig. 1). The region has a northern subtropical monsoon climate, with sufficient light, mild temperatures, and moderate precipitation. The average annual precipitation is about 879.9 mm, and the average temperature in the growing season is 15.5 °C. The area is suitable for the introduction and cultivation of Carya illinoensis, which has shown strong stress resistance in actual cultivation. In this study, four typical varieties of the Jiande (J5 and J35) and Changlin (C10 and C21) series with many excellent qualities (e.g., early fruiting, high yield, disease resistance) and significant economic benefits were selected in the base, and a total of 53 samples were investigated. Since the trees are fertilized twice a year, in March and October, the data were collected on September 6, 2022, when the Carya illinoensis trees were in their mature period. Nutrient monitoring in September can provide a scientific basis for fertilizer application in October. Except for necessary weeding, no other operational measures (e.g., watering, fertilizing, and dosing) were taken during the first 5 months of the survey period.

Fig. 1 Research location map

2.2 Data Acquisition

2.2.1 Collection of Spectral Data

A full-band spectrometer (FieldSpec 4 Wide-Res, Analytical Spectral Devices Inc.) was used to obtain spectral reflectance over the wavelength range of 350–2500 nm. The outdoor canopy spectrum experiment was carried out under sunny, windless, and cloudless conditions between 11 a.m. and 2 p.m. Beijing time. Because the trees were generally tall, the spectrometer was tilted to measure each plant from four directions (east, west, south, and north) to ensure accuracy. The spectrum in each direction was measured 10 times, giving a total of 40 spectra per plant. After removing abnormal spectral curves, the average values were used as the final raw canopy spectral characteristics of Carya illinoensis.

After the canopy spectra were collected, leaves of the sampled trees were picked and sent to the laboratory for leaf spectrum measurement. Thirty-six leaves were obtained from each tree, evenly distributed over eight directions (NE, E, SE, S, SW, W, NW, and N) in the middle of the canopy. The picked leaves were growing well and showed no mechanical damage, pests, or diseases. The armored optical fiber of the spectrometer was fitted with a leaf clip to measure the component spectrum, which reflects the internal composition and cell structure of the leaves. Measurements were taken on the central part of the adaxial surface of the blade, avoiding direct contact with the veins. Five measurements were made on each of the front and back sides of each leaf, giving a total of 10 spectra per leaf. After removing abnormal spectra, the average values were used as the final raw leaf spectral characteristics of Carya illinoensis.

2.2.2 Determination of Chlorophyll Content in Leaves

The SPAD-502 Plus chlorophyll meter (SPAD-502 Plus, Konica Minolta, Inc.) irradiates the leaf surface with two light sources of different wavelengths and obtains the relative chlorophyll value of the leaf by comparing the difference in optical density of the light passing through the leaf. Many studies have shown that SPAD value is a representative measure of chlorophyll content in plant leaves and a reliable non-destructive chlorophyll detection method (Terassi et al. 2023; Li et al. 2023; Wang et al. 2023). Studies of some cash crops have also shown that SPAD value correlates well with chlorophyll level (Ban et al. 2019; El-Jendoubi et al. 2012; Lu et al. 2021). Therefore, this study used SPAD value as an indirect substitute for measured chlorophyll content. After the leaf spectrum was measured, the SPAD-502 Plus meter was used to measure the SPAD value of each leaf three times in succession, and the average of the measured values was taken as the relative chlorophyll content of the leaf.

2.3 Data Processing and Analysis

2.3.1 Hyperspectral Data Preprocessing

FOD is an extension of integer-order derivative calculus. The calculation of FOD is similar to integer-order derivative, but its order is arbitrarily extended to fractions. Three different definitions of fractional derivatives are widely used: Grünwald-Letnikov derivative, Riemann–Liouville derivative, and Caputo derivative (Ortigueira et al. 2011; Li and Deng 2007; Lupulescu 2015). The definition of Grünwald-Letnikov derivative has been widely used in spectral data processing and information extraction (Li et al. 2021). In this study, Grünwald-Letnikov derivative was used for spectral data preprocessing. Before providing the definition of Grünwald-Letnikov FOD, we first observe the formula of n-th order derivative of \(f(t)\).

$$\frac{d^{n}}{dt^{n}}f\left( t \right) = \lim_{h \to 0} \frac{1}{h^{n}}\sum_{j = 0}^{n} \left( -1 \right)^{j} \binom{n}{j} f\left( t - jh \right)$$
(1)

where n is the order of the derivative. The weights in this sum are the coefficients of the binomial expansion

$$\left( 1 - z \right)^{n} = \sum_{j = 0}^{n} \left( -1 \right)^{j} \binom{n}{j} z^{j} = \sum_{j = 0}^{n} \frac{\left( -1 \right)^{j} n!}{j!\left( n - j \right)!} z^{j}$$
(2)

where z is a complex variable. The n-th order derivative formula above can therefore be extended directly to a non-integer order \(\alpha\). Unlike the integer-order case, the binomial expansion is then no longer a finite sum but an infinite series, that is,

$$\left( 1 - z \right)^{\alpha} = \sum_{j = 0}^{\infty} \left( -1 \right)^{j} \binom{\alpha}{j} z^{j} = \sum_{j = 0}^{\infty} w_{j} z^{j}$$
(3)

where the coefficients \(w_{j}\) are the generalized binomial coefficients

$$w_{j} = \left( -1 \right)^{j} \binom{\alpha}{j} = \frac{\left( -1 \right)^{j} \Gamma \left( \alpha + 1 \right)}{\Gamma \left( j + 1 \right)\Gamma \left( \alpha - j + 1 \right)}$$
(4)

Assuming that \(f(t)\) is zero when \(t \le t_{0}\), the infinite sum reduces to a finite one, which gives the Grünwald-Letnikov fractional derivative formula:

$${}_{t_{0}}^{GL} D_{t}^{\alpha} f\left( t \right) = \lim_{h \to 0} \frac{1}{h^{\alpha}}\sum_{j = 0}^{\left[ \left( t - t_{0} \right)/h \right]} \left( -1 \right)^{j} \binom{\alpha}{j} f\left( t - jh \right)$$
(5)

where \(\alpha\) is the order of the derivative, \(t_{0}\) and \(t\) are the lower and upper bounds of the derivative, \(h\) is the step length of the derivative (\(h = 1\) in this study), \(\Gamma\) is the gamma function, \(j\) is the summation index, and the upper summation limit \(\left[ (t - t_{0})/h \right]\) is the integer part of \((t - t_{0})/h\), which corresponds to the data length. If \(\alpha = 0\), then \({}_{t_{0}}^{GL} D_{t}^{\alpha} f(t) = f(t)\). If \(\alpha = 1\) or 2, the formula gives the 1st-order and 2nd-order derivative transformations of the original function, respectively.

The above formulations were computed with the FOTF toolbox written by Xue (2018) in MATLAB R2022a (MathWorks Corporation, USA) (https://ww2.mathworks.cn/matlabcentral/fileexchange/60874-fotf-toolbox?s_tid=srchtitle_FOTF_1).
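As an illustration of Eq. 5, the discrete Grünwald-Letnikov transform with \(h = 1\) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the FOTF toolbox routine used in this study: the function names `gl_weights` and `gl_derivative` are hypothetical, the spectrum is assumed to be zero before the first band (the counterpart of \(f(t) = 0\) for \(t \le t_{0}\)), and the weights are generated by the standard recurrence for \((-1)^{j}\binom{\alpha}{j}\) instead of evaluating gamma functions directly.

```python
def gl_weights(alpha, n_terms):
    """Return w_0..w_{n_terms-1}, where w_j = (-1)^j * C(alpha, j).

    Uses the recurrence w_0 = 1, w_j = w_{j-1} * (1 - (alpha + 1) / j),
    which avoids evaluating the gamma function.
    """
    weights = [1.0]
    for j in range(1, n_terms):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / j))
    return weights


def gl_derivative(reflectance, alpha, h=1.0):
    """Apply the finite Grünwald-Letnikov sum to a 1-D spectrum.

    Output element k combines samples k, k-1, ..., 0; the spectrum is
    treated as zero before its first band (an assumption of this sketch).
    """
    n = len(reflectance)
    weights = gl_weights(alpha, n)
    scale = h ** (-alpha)
    return [
        scale * sum(weights[j] * reflectance[k - j] for j in range(k + 1))
        for k in range(n)
    ]


if __name__ == "__main__":
    toy_spectrum = [1.0, 4.0, 9.0, 16.0]  # made-up values, not measured data
    for order in (0.0, 0.5, 1.0, 2.0):
        print(order, gl_derivative(toy_spectrum, order))
```

With `alpha = 0` the sketch returns the input unchanged, and with `alpha = 1` or `alpha = 2` it reduces to the first and second backward differences (apart from the first one or two elements, which are affected by the zero-padding assumption).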

2.3.2 Spectral Index

Due to the superposition of different nutrients and the influence of leaf and canopy structure parameters, the effect of single band reflectance is limited. The spectral index compresses the important information of the spectrum into a spectral index channel by constructing a ratio, linear or nonlinear combination of the spectral reflectance of two or more bands, thereby effectively reducing the background effect and enhancing the spectral characteristics to improve the sensitivity to target variables (Zhang et al. 2022a; Montero et al. 2023). The normalized spectral index (NDSI) of the two-band combination can comprehensively analyze the response relationship between spectral data and target variables, and has achieved good results in the estimation of plant chlorophyll content (Yao et al. 2014). In this study, a fractional order derivative \({NDSI}_{\alpha }\) was constructed based on the calculation method of NDSI.

$$NDSI_{\alpha } \left( {R_{i} ,R_{j} } \right) = \frac{{R_{\alpha } \left( i \right) - R_{\alpha } \left( j \right)}}{{R_{\alpha } \left( i \right) + R_{\alpha } \left( j \right)}}$$
(6)

where \(NDSI_{\alpha } (R_{i} ,R_{j} )\) represents the spectral index formed by the combination of two bands under the order of α, and \(R_{\alpha } (i)\) and \(R_{\alpha } (j)\) represent the spectral reflectance corresponding to the i-th and j-th bands after α-order pretreatment, respectively. When α = 0, \(R_{\alpha } (i)\) and \(R_{\alpha } (j)\) are the spectral reflectance corresponding to the i-th and j-th bands in raw spectrum in the range of 350–2500 nm. When α = 0.25, \(R_{0.25} (i)\) and \(R_{0.25} (j)\) are the spectral reflectance corresponding to the i-th and j-th bands in the range of 350–2500 nm after 0.25th-order derivative treatment.
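Eq. 6 and the band-pair screening it supports can be sketched as follows. This is an illustrative Python sketch, not the code used in this study: `ndsi`, `pearson_r`, and `best_band_pair` are hypothetical names, `spectra` is assumed to be a list of per-sample reflectance lists that have already been transformed at a given order α, and `spad` is the list of measured SPAD values in the same sample order.

```python
import math


def ndsi(r_i, r_j):
    """Normalized difference of two band values (Eq. 6); NaN if the sum is zero."""
    denom = r_i + r_j
    if denom == 0:
        return float("nan")
    return (r_i - r_j) / denom


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_band_pair(spectra, spad):
    """Search all band pairs (i < j) and return (|r|, i, j) with the largest |r|."""
    n_bands = len(spectra[0])
    best = (0.0, None, None)
    for i in range(n_bands):
        for j in range(i + 1, n_bands):
            index = [ndsi(sample[i], sample[j]) for sample in spectra]
            if any(math.isnan(v) for v in index):
                continue  # skip pairs with an undefined index
            r = pearson_r(index, spad)
            if abs(r) > best[0]:
                best = (abs(r), i, j)
    return best
```

Because derivative-transformed reflectance can be negative, the denominator of Eq. 6 can approach zero for some band pairs; the sketch only skips exact zeros, so very large index values may still need screening in practice.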

2.3.3 Construction and Evaluation of Machine Learning Models

XGBoost (eXtreme Gradient Boosting) was originally developed by Chen within the Distributed Machine Learning Community group as an improvement to the GBDT (gradient boosting decision tree) algorithm (Chen and Guestrin 2016). XGBoost uses decision trees as its base learners and controls model complexity by adding regularization terms, which improves the generalization ability of the model and prevents overfitting. XGBoost works differently from random forest: each newly generated tree learns the residual between the current predicted value and the real value, and the outputs of all trees are accumulated to give the final prediction. The loss function is expanded to the second order by Taylor expansion and its extreme value is solved by the Newton method, so that second-order derivative information accelerates the convergence of the model. Because the regularization term is added to the loss function, the objective function used in training consists of two parts: the loss term of the gradient boosting algorithm and the regularization term, which limits the complexity of each tree and thereby simplifies the weak learners. In this study, parameter tuning was mainly based on the maximum depth of the decision tree (maxdepth). The objective functions are

$$L\left( \phi \right) = \sum_{i = 1}^{n} l\left( y_{i}, \hat{y}_{i} \right) + \sum_{k = 1}^{t} \Omega \left( f_{k} \right)$$
(7)
$$\Omega \left( f_{k} \right) = \gamma T + \frac{1}{2}\lambda \left\| \omega \right\|^{2}$$
(8)

where n is the number of samples; \(l(y_{i}, \hat{y}_{i})\) is the loss function between the target value \(y_{i}\) and the predicted value \(\hat{y}_{i}\); \(f_{k}\) is the k-th tree and \(\Omega (f_{k})\) is its complexity; \(\gamma\) and \(\lambda\) are manually set regularization parameters; \(\omega\) is the vector of the values of all leaf nodes in the decision tree; T is the number of leaf nodes; and \(\sum\nolimits_{k = 1}^{t} \Omega (f_{k})\) is the complexity of all \(t\) trees. In this paper, the SPAD data set of Carya illinoensis leaves, with 53 samples, was randomly divided into training and test sets at a ratio of 7:3, and the XGBoost machine learning algorithm was used to construct the optimal estimation model of SPAD of Carya illinoensis leaves.
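The second-order Taylor step described in this section can be written out explicitly. The following is a sketch of the general XGBoost derivation of Chen and Guestrin (2016), not a result of this study; the symbols \(g_{i}\), \(h_{i}\), \(I_{j}\), \(G_{j}\), and \(H_{j}\) are introduced here for illustration only, \(x_{i}\) denotes the input features of sample i, and \(h_{i}\) is unrelated to the step length \(h\) in Eq. 5. When the t-th tree \(f_{t}\) is added, the objective is approximated by

$$L^{(t)} \simeq \sum_{i = 1}^{n} \left[ l\left( y_{i}, \hat{y}_{i}^{(t - 1)} \right) + g_{i} f_{t}\left( x_{i} \right) + \frac{1}{2} h_{i} f_{t}^{2}\left( x_{i} \right) \right] + \Omega \left( f_{t} \right)$$

where \(g_{i}\) and \(h_{i}\) are the first and second derivatives of the loss with respect to the previous prediction \(\hat{y}_{i}^{(t - 1)}\). For a fixed tree structure, let \(I_{j}\) be the set of samples falling in leaf j, \(G_{j} = \sum\nolimits_{i \in I_{j}} g_{i}\), and \(H_{j} = \sum\nolimits_{i \in I_{j}} h_{i}\). The optimal value of leaf j and the corresponding objective (up to a constant) are then

$$\omega_{j}^{*} = - \frac{G_{j}}{H_{j} + \lambda}, \qquad L^{(t)*} = - \frac{1}{2}\sum_{j = 1}^{T} \frac{G_{j}^{2}}{H_{j} + \lambda} + \gamma T$$

so larger values of \(\gamma\) and \(\lambda\) penalize trees with more leaves and larger leaf values, respectively.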

The coefficient of determination (\(R^{2}\)) and root mean square error (RMSE) were used to verify the performance of the model. The larger the \(R^{2}\) and the smaller the RMSE, the more accurate the model predictions.

$$R^{2} = 1 - \frac{\sum_{i = 1}^{n} \left( x_{i} - y_{i} \right)^{2}}{\sum_{i = 1}^{n} \left( x_{i} - \overline{x} \right)^{2}}$$
(9)
$$RMSE = \sqrt{\frac{\sum_{i = 1}^{n} \left( x_{i} - y_{i} \right)^{2}}{n}}$$
(10)

In the equations above, \(x_{i}\) and \(y_{i}\) are the observed and predicted values, respectively; \(\overline{x}\) is the average of the observed values; and \(n\) is the number of samples.
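Eqs. 9 and 10 translate directly into code. The following Python sketch is for illustration only; the function names `r_squared` and `rmse` are hypothetical, and the example values are made up.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination (Eq. 9)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((x - y) ** 2 for x, y in zip(observed, predicted))
    ss_tot = sum((x - mean_obs) ** 2 for x in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error (Eq. 10)."""
    n = len(observed)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(observed, predicted)) / n)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]   # made-up observed values
    pred = [2.0, 2.0, 3.0, 3.0]  # made-up predicted values
    print(r_squared(obs, pred), rmse(obs, pred))
```

Note that \(R^{2}\) defined in this way can be negative on a test set when the model predicts worse than the mean of the observed values.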

3 Results

3.1 Descriptive Analysis of Chlorophyll Content

The statistical results of the SPAD values of Carya illinoensis leaves are presented in Table 1. The SPAD values range from 43.97 to 54.42, with an average of 48.77. The standard deviation is 4.42, and the coefficient of variation is 9.06%.

Table 1 Statistical results of SPAD values in Carya illinoensis leaves

3.2 Spectral Characteristics of Carya illinoensis Canopies and Leaves

3.2.1 Raw (0th order) Spectral Characteristics of Carya illinoensis Canopies and Leaves

The raw spectral reflectance curves of Carya illinoensis canopies and leaves are shown in Fig. 2. The canopy spectral reflectance of the samples is lower than that of the leaves. Both canopy and leaf spectra exhibit two absorption valleys (blue light around 450 nm and red light around 660 nm) and a reflection peak (green light around 550 nm) within 400–700 nm due to the strong absorption of chlorophyll a and b. The intensity of the reflection peak is strongly correlated with the overall chlorophyll content of the plants (Kivimaenpa et al. 2022; Zhao et al. 2022b). The intensity of spectral reflectance in the range of 780–1350 nm mainly depends on the relative thickness of the mesophyll cells and the intercellular spaces. In this range, the shapes of the canopy spectral reflectance curves are less stable than those of the leaves (Xiao et al. 2016). Between 1350 and 2500 nm, factors such as water absorption and the upward evaporation of water vapor caused by solar radiation have a significant impact (Yao et al. 2018). The canopy spectra around 1400 nm, 1800 nm, and 2500 nm fluctuate markedly because they are highly susceptible to random noise, whereas the shapes of the leaf spectra remain relatively stable.

Fig. 2 a The raw canopy spectral reflectance curve and b the raw leaf spectral reflectance curve. Note: The pink areas represent the whole scope of the spectrum and the gray lines represent the mean spectrum

3.2.2 FOD Spectral Characteristics of Carya illinoensis Canopies and Leaves

The FOD spectral analysis results of the canopies (Fig. 3) showed that the overall spectral intensity weakened as the fractional order increased from 0 to 0.75. The reflection peaks in the raw (0th order) spectrum were transformed into multiple reflection peaks and absorption valleys. Additionally, the spectral reflectance increased rapidly within the wavelength ranges of 1450–1800 nm and 1950–2350 nm. At the order of 0.75, the slope of reflectance at 1450 nm and 1950 nm increased and reached its maximum. As the order increased from 1 to 1.5, the spectral reflectance showed a downward trend in the ranges of 1450–1800 nm and 1950–2350 nm. As the order increased from 1.5 to 2, the shape of the spectral curve changed minimally, and the reflectance values ranged between −0.01 and 0.01.

Fig. 3 Canopy fractional order derivative (FOD) spectrum (0 to 2, with an increment of 0.25 per step)

The FOD spectral analysis results of leaves (Fig. 4) showed that, as the fractional order increased from 0 to 0.75, the spectral reflectance curve exhibited a prominent peak around 780 nm, and the absorption valley caused by water absorption became more distinct. As the order increased from 0.75 to 1.5, the spectral reflectance curve showed numerous fluctuations within the range of 200–400 nm. Additionally, the reflection peak near 780 nm and the absorption valley near 1450 nm and 1900 nm diminished in size. As the order increased from 1.5 to 2, the difference between the spectral reflectance curves was reduced (almost close to zero).

Fig. 4 Leaf fractional order derivative (FOD) spectrum (0 to 2, with an increment of 0.25 per step)

In general, the low-order spectrum can retain similar characteristics to raw (0th order) spectrum. However, as the order increases, the spectral reflectance decreases, and the reflection intensity gradually stabilizes. The morphological characteristics of the spectral reflectance curve become less pronounced, and some spectral reflectance curves exhibit noticeable fluctuations.

3.3 Response Relationship of Carya illinoensis Leaf and Canopy FOD Spectrum with SPAD

3.3.1 Response Relationship of Leaf and Canopy FOD Single-Band Spectrum with SPAD

The correlation between the leaf FOD single-band spectrum and SPAD (Fig. 5) showed that they were negatively correlated in the range of 450–1350 nm, and the correlation was particularly significant in the range of 500–750 nm, which was associated with the high absorption of blue-violet light by chlorophyll. Further statistics (Table 2) showed that the maximum absolute correlation coefficient (r = 0.673) between the raw (0th order) leaf spectrum and SPAD was found at 704 nm. As the fractional order increased from 0.25 to 1.25, the negative correlation between the FOD spectrum of leaves and SPAD in the range of 500–750 nm changed to a positive correlation. Additionally, the number of positively correlated bands first increased and then decreased. These bands were predominantly located in the red edge area, which is an important indicator area for describing the state of plant pigments (Inoue et al. 2016). As the order increased from 1.25 to 2, the number of negatively correlated bands decreased continuously, and the absolute correlation coefficient between the two variables increased with the order. The maximum absolute correlation coefficient (r = 0.761) was observed at 731 nm at the 1.75th order, after which it gradually decreased (Table 2).

Fig. 5 Correlation between relative chlorophyll content (SPAD) and leaf fractional order derivative (FOD) single-band spectrum

Table 2 The maximum absolute correlation coefficients (MACC) between relative chlorophyll content (SPAD) and the canopy and leaf fractional order derivative (FOD) single-band spectrum

The correlation between canopy FOD single-band spectrum and SPAD (Fig. 6) showed that they were negatively correlated in the range of 350–1400 nm. The correlation between the raw (0th order) canopy spectrum and SPAD is very low due to canopy structure, background radiation, etc. The maximum absolute correlation coefficient (r = 0.314) was found at 727 nm (Table 2). As the fractional order increased from 0.25 to 1.25, the change in the correlation between the canopy FOD spectrum and SPAD in the range of 500–750 nm was consistent with that of the leaf spectrum. It shifted from a negative correlation to a positive correlation, and the bands were also observed in the red edge area. As the order increased from 1.25 to 2, the absolute correlation coefficient between the two variables increased with the order. The maximum absolute correlation coefficient (r = 0.580) was observed at 572 nm of 1.75th order, after which it gradually decreased (Table 2).

Fig. 6 Correlation between relative chlorophyll content (SPAD) and canopy fractional order derivative (FOD) single-band spectrum

At leaf scale, the number of spectral bands passing the 0.01 highly significant level test was 203 and 98 bands when the order was 1 and 2, respectively. However, the number of spectral bands passing the 0.01 highly significant level test was 213 and 214 bands when the order was 0.5 and 0.75, respectively. At canopy scale, the number of spectral bands passing the 0.01 highly significant level test was 91 and 28 bands when the order was 1 and 2, respectively. The number of spectral bands passing the 0.01 highly significant level test was 75 bands when the order was 0.75 and 1.25, respectively. This suggests that, in comparison to the first-order derivative and second-order derivative, the FOD can more effectively enhance the correlation between the single-band spectrum and the target variable.

3.3.2 Response Relationship of Canopy and Leaf FOD Two-Band Spectral Index (NDSI) with SPAD

The correlation between the leaf FOD two-band spectral index (NDSI) and SPAD (Fig. 7) showed that the correlation coefficient between the FOD-based leaf-scale NDSI and SPAD exhibited an overall “rise-fall-rise” trend as the order increased. When the order was 2, the correlation between the leaf spectral index and SPAD was the greatest (r = 0.839), an improvement over the 0th-order derivative combined with NDSI (r = 0.737).

Fig. 7 Correlation between relative chlorophyll content (SPAD) and leaf fractional order derivative (FOD) two-band spectral index

Further correlation analysis between the canopy FOD two-band spectral index and SPAD (Fig. 8) showed that, as the fractional order increased from 0 to 2, the correlation at canopy scale generally behaved differently from that at leaf scale and exhibited a “decline-increase-decline-increase” pattern. The correlation between the canopy spectral index and SPAD reached its maximum (r = 0.652) when the order was 1.5, which is significantly higher than that of the 0th-order derivative combined with NDSI (r = 0.551).

Fig. 8 Correlation between relative chlorophyll content (SPAD) and canopy fractional order derivative (FOD) two-band spectral index

It is worth noting that the correlation between the canopy and leaf FOD normalized spectral index (NDSI) and SPAD is higher than that of raw (0th order) canopy and leaf NDSI. The correlation between canopy NDSI and SPAD is better under fractional derivative treatment compared with integer derivative spectrum. However, the correlation between leaf NDSI and SPAD is lower under fractional derivative treatment compared with integer derivative spectrum. This study demonstrates that by utilizing the appropriate derivative transformation in conjunction with the two-band spectral index, it is possible to greatly enhance the correlation coefficient and effectively extract subtle information from spectrum.

3.4 Construction and Evaluation of SPAD Estimation Model of Carya illinoensis Leaves Based on Canopy and Leaf FOD Spectrum

Based on the results of the correlation analysis, the 10 bands and band combinations (NDSI) with the highest correlation coefficients were chosen as input variables (Table 3), and SPAD was the output variable. Seventy percent of the data were used for training and 30% for testing.

Table 3 The optimal variable combination subset results of fractional order derivative (FOD) spectral reflectance of relative chlorophyll content (SPAD)

The XGBoost machine learning algorithm was used to construct SPAD estimation models for Carya illinoensis (Table 4). Because the number of samples was very limited, this study used five-fold cross-validation with a fixed random seed to partition the validation sets, which were used to select the optimal model parameters and to prevent overfitting.
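The cross-validated parameter selection described above can be illustrated with the following sketch. To keep it dependency-free, a closed-form ridge regressor (`RidgeStandIn`, a hypothetical placeholder) stands in for XGBoost; the seed value, the parameter grid, and all function names are assumptions for illustration only.

```python
import numpy as np

class RidgeStandIn:
    """Closed-form ridge regressor; a placeholder for the real model, not XGBoost."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.x_mean_ = X.mean(axis=0)
        self.y_mean_ = y.mean()
        Xc = X - self.x_mean_
        A = Xc.T @ Xc + self.alpha * np.eye(X.shape[1])
        self.coef_ = np.linalg.solve(A, Xc.T @ (y - self.y_mean_))
        return self

    def predict(self, X):
        return (np.asarray(X, dtype=float) - self.x_mean_) @ self.coef_ + self.y_mean_

def five_fold_indices(n_samples, seed=42, n_folds=5):
    """Shuffle once with a fixed seed, then cut the order into validation folds."""
    order = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(order, n_folds)

def cv_rmse(make_model, params, X, y, folds):
    """Mean validation RMSE of one parameter setting across the folds."""
    scores = []
    for k, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        model = make_model(**params).fit(X[train_idx], y[train_idx])
        resid = y[val_idx] - model.predict(X[val_idx])
        scores.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(scores))

def select_params(make_model, grid, X, y, seed=42):
    """Pick the parameter dict in `grid` with the lowest cross-validated RMSE."""
    folds = five_fold_indices(len(y), seed)
    return min(grid, key=lambda p: cv_rmse(make_model, p, X, y, folds))
```

Fixing the seed makes every candidate parameter set see the same folds, so differences in validation error reflect the parameters rather than the partition. Any regressor whose `fit(X, y)` returns the fitted model and which exposes `predict(X)`, such as the `XGBRegressor` class of the xgboost package, could be passed as `make_model` in place of the stand-in.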

Table 4 Evaluation results of relative chlorophyll content (SPAD) estimation models based on XGBoost

For the canopy spectrum and two-band spectral indices after FOD processing, the calibration R² (R²C) was 0.443–0.732 and the prediction R² (R²P) was 0.382–0.670 for models nos. 1–9, while R²C was 0.606–0.729 and R²P was 0.562–0.721 for nos. 10–18. For the leaf spectrum and two-band spectral indices after FOD processing, R²C was 0.625–0.744 and R²P was 0.594–0.736 for nos. 19–27, while R²C was 0.663–0.788 and R²P was 0.685–0.766 for nos. 28–36. The combined analysis showed that the leaf spectrum yielded better estimates than the canopy spectrum, and that model estimation accuracy increased with the spectral band dimension. The canopy models based on the 1.5th-order spectrum and the 1.5th-order NDSI, and the leaf models based on the 0.5th-order spectrum and the 0.5th-order NDSI, had higher estimation accuracy and better overall estimation performance than the corresponding first-order and second-order models.

To further evaluate the agreement between the estimated and measured SPAD values, the optimal FOD canopy and leaf models are presented as scatter plots (Fig. 9). The 0.5th-order SPAD estimation model incorporating NDSI achieved a calibration R² (R²C) of 0.788 and RMSEC = 1.007 in the calibration set, and a prediction R² (R²P) of 0.766 and RMSEP = 0.842 in the prediction set (Fig. 9b), explaining 78.8% of the variability in the training samples and 76.6% of that in the unseen SPAD samples, respectively. This indicates that the model can accurately reflect changes in the SPAD of Carya illinoensis through the feature information of FOD spectral data. The estimated and measured values are scattered evenly around the 1:1 line.
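For reference, the two evaluation metrics quoted above can be computed as in the sketch below. It assumes that R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares) and that RMSE is the root of the mean squared residual; this section does not restate the exact formulas used in the study, and the function name is hypothetical.

```python
import math

def r2_and_rmse(measured, predicted):
    """Coefficient of determination and root mean square error.

    measured, predicted: equal-length sequences of SPAD values.
    """
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    mean = sum(measured) / n
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)
```

Under this definition, an R² of 0.766 on the prediction set means that the model's squared residuals amount to 23.4% of the variance of the measured values around their mean.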

Fig. 9 Scatter plots of XGBoost model in measured and predicted datasets

4 Discussion

After FOD transformation, the spectral reflectance provides higher resolution and a clearer spectral profile than the raw spectrum, and enhances the correlation between reflectance and plant attributes (Liu et al. 2021). In the correlation analysis of the raw and FOD spectra with SPAD, the correlation between the spectrum and the target variable was significantly improved under FOD treatment, consistent with the results of Hong et al. (2019). The results of this article show that the correlation and model accuracy of both the single-band spectrum and the two-band spectral index (NDSI) with the SPAD of Carya illinoensis tend overall to increase and then decrease as the differential order increases. The reason is that FOD extends integer-order differentiation and can extract asymptotic information that integer-order differentiation cannot characterize (Furati et al. 2021). However, as the differential order increases, background noise is gradually weakened while high-frequency noise is gradually amplified, which reduces the potentially sensitive information in the reflectance data and lowers the signal-to-noise ratio of the spectral information, which in turn affects the correlation and the model accuracy. Therefore, derivative spectra are generally used only to the first or second order, occasionally to the third order, and higher-order derivative spectra are rarely used (Feng et al. 2022b).
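How a fractional order interpolates between integer-order derivatives can be seen in a short sketch. It assumes the Grünwald-Letnikov definition, a common choice for spectral FOD; this section does not restate which definition was used, and the function names and unit band spacing are illustrative assumptions.

```python
import numpy as np

def gl_weights(alpha, n_terms):
    """Grünwald-Letnikov weights: w_0 = 1, w_k = w_{k-1} * (k - 1 - alpha) / k."""
    w = np.empty(n_terms)
    w[0] = 1.0
    for k in range(1, n_terms):
        w[k] = w[k - 1] * (k - 1 - alpha) / k
    return w

def gl_fractional_derivative(spectrum, alpha, step=1.0):
    """Order-alpha derivative of a uniformly sampled spectrum.

    out[i] = sum_{k=0..i} w_k * spectrum[i - k] / step**alpha, so only bands
    at or before position i contribute (backward differences).
    """
    y = np.asarray(spectrum, dtype=float)
    w = gl_weights(alpha, len(y))
    return np.convolve(y, w)[: len(y)] / step ** alpha
```

With alpha = 1 the weights reduce to (1, -1, 0, ...) and with alpha = 2 to (1, -2, 1, 0, ...), which are the ordinary first and second backward differences, while alpha = 0 returns the raw spectrum. For non-integer orders the weights decay slowly instead of terminating, so each output value also draws on more distant bands.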

It is worth noting that the leaf and canopy spectra, whether as single bands or as the FOD-based two-band normalized difference spectral index (NDSI), differed significantly in their response relationship with SPAD. The overall linear response between the spectrum and SPAD is relatively high at the leaf scale but not at the canopy scale. This difference can be attributed to various factors. Canopy spectral characteristics are determined not only by the internal structure and biochemical components of the plants but also by canopy structure parameters, which include leaf area index, canopy extinction coefficient, and leaf inclination angle distribution (Mirzaei et al. 2019). Additionally, canopy spectral reflectance is influenced by external factors such as the atmosphere, the underlying surface of the vegetation, solar altitude angle, observation angle, and orientation (Luo et al. 2022). As a result, the spectral features associated with chlorophyll are somewhat attenuated at the canopy scale.

Because the spectral responses of different nutrients in plants are superimposed, the role of single-band reflectance is limited. Spectral indices, constructed from the spectral reflectance of two or more bands through linear or nonlinear combination, compress the important information of the spectrum into one spectral index channel, which can effectively reduce the background effect and enhance the spectral features to improve the sensitivity to the target variables (Zhang et al. 2021; Liu et al. 2021). Compared with combining FOD with single bands, combining FOD with two-band spectral indices (NDSI) is more effective in improving the correlation with SPAD. The combination of FOD and spectral indices with different forms of algebraic operations effectively strengthens the linear relationship between the transformed spectral features and the SPAD of Carya illinoensis; an important reason may be that the methods complement each other and reduce the interference factors acting on the spectral reflectance (Bhadra et al. 2020; Chen et al. 2022).

According to the correlation analysis between the spectral characteristics and SPAD, the chlorophyll-sensitive regions at the canopy and leaf scales are mainly located in the green (490–570 nm) and red (620–780 nm, including the red edge at 670–780 nm) visible regions and in the near infrared (780–1000 nm) (Pu 2017). These three response regions correspond precisely to the strong absorption and reflection bands of chlorophyll (Yao et al. 2022). Normalized difference spectral indices of the 0th to 2nd orders were constructed by selecting bands within the full band range (350–2500 nm), and the selected bands were likewise located in the red (620–780 nm), including the red edge (670–780 nm), and near-infrared (780–1000 nm) regions. The strong absorption of red light by chlorophyll and the strong reflection of near-infrared wavelengths inside the leaf make the red-edge band (670–780 nm) the region where the reflectance of green plants increases fastest and the most important indicator band for the physiological characteristics of plant growth (Liu 2021). The red band contains spectral information that can map more than 80% of the physical and chemical parameters of plants (Sun et al. 2019). In this study, FOD was used to eliminate background noise while preserving the ability of the red-edge bands to characterize the physicochemical parameters of the plant. Therefore, the bands screened out under FOD processing as highly significantly correlated with SPAD were mostly distributed in the ranges of 490–570 nm, 670–780 nm, and 780–1000 nm, which is in line with the results of previous studies.

Machine learning is capable of effectively explaining nonlinear relationships. However, the accuracy of the model can be influenced by factors such as the number of samples available and the tuning of hyperparameters, and an ensemble learning algorithm can effectively mitigate the problems of modeling with small samples (Feng et al. 2022a). In the scatter plot of measured and predicted values, some high values are underestimated. Nevertheless, the model still demonstrates good accuracy and performance. At the same time, this study used the 10 wavelengths with the highest correlation coefficients as the sensitive bands. Despite the limited number of sensitive parameters used in modeling, the model yielded excellent results. As the research advances, increasing the sample size is expected to yield more accurate estimation.

Given the variations in growth regions, varieties, and phenological periods, the response relationship between the canopy and leaf spectra and chlorophyll content will also vary. This study has only established the SPAD estimation model of Carya illinoensis leaves during the fruit-maturing period. The applicability of the model to different varieties and growth stages still needs to be verified through further research. In addition, this study only used ground canopy and leaf-scale hyperspectral features to explore the feasibility of SPAD estimation for Carya illinoensis. With more sample data and comparisons across study areas, remote sensing monitoring of the nutrient and health status of Carya illinoensis forests over large areas will also become possible.

5 Conclusions

Compared with the raw spectrum, the fractional-order derivative improved the correlation of both the spectrum and the spectral index (NDSI) with SPAD. In this study, the correlation coefficients between the FOD spectrum and relative chlorophyll content (SPAD) were significantly elevated. The average correlation coefficient with SPAD increased by 0.055 for the leaf spectrum under 0.5th-order treatment and by 0.151 for the canopy spectrum under 1.5th-order treatment. Among the single bands, the 1.75th-order leaf spectrum at 731 nm and the 1.75th-order canopy spectrum at 572 nm showed the highest correlation, with correlation coefficients of 0.761 and 0.580, respectively. The average correlation coefficient with SPAD increased by 0.095 for the leaf spectral index under 0.5th-order treatment and by 0.086 for the canopy spectral index under 1.5th-order treatment. Among the indices, the 2nd-order leaf spectral index (581 nm and 716 nm) and the 1.5th-order canopy spectral index (670 nm and 746 nm) showed the highest correlation, with correlation coefficients of 0.839 and 0.652, respectively. The leaf spectrum could estimate the SPAD of Carya illinoensis leaves more accurately than the canopy spectrum. Compared with single bands, the FOD two-band spectral index (NDSI) was more effective in estimating the SPAD of Carya illinoensis. The optimal SPAD model is the leaf normalized difference spectral index model that combines the 0.5th-order derivative transformation with the two-band combination. Its calibration R² (R²C) is 0.788, and its prediction R² (R²P) is 0.766 with an RMSEP of 0.842 in the prediction set.