Introduction

Gross primary production (GPP), i.e. the total carbon (C) fixed by vegetation photosynthesis, is the largest component of the global C cycle (Beer et al. 2010). It is also tightly related to many ecosystem functions, including ecosystem and soil respiration (Baldocchi et al. 2018; Janssens et al. 2001), vegetation growth (Körner 2015), and water loss through transpiration (Baldocchi 2020; Zhou et al. 2014). Because of the complex responses of vegetation to climatic perturbations, the global C budget shows large inter-annual variation (IAV) (Ahlström et al. 2015). The IAV in GPP remains difficult to estimate accurately with current land surface models (Piao et al. 2019; Xia et al. 2020), which in turn hinders the prediction of future climate-C cycle feedbacks (Ryu et al. 2019).

Remote sensing vegetation indices (VIs, often referred to as greenness indices), such as the normalized difference vegetation index (NDVI) and the enhanced vegetation index (EVI), are widely used as direct proxies of GPP (Rahman et al. 2005; Wang et al. 2004; Zhou et al. 2001) or as key inputs to light-use efficiency models for simulating GPP (Running et al. 2004). A recently proposed index, the near-infrared reflectance of vegetation (NIRv, the product of NDVI and near-infrared reflectance), has been shown to be a good proxy of GPP at monthly to annual scales across FLUXNET sites (Badgley et al. 2019; Wang et al. 2021). Compared with NDVI, NIRv separates the vegetation signal from the confounding effects of background brightness and is less prone to saturation in dense canopies (Badgley et al. 2019; Badgley et al. 2017). However, capturing the IAV and long-term trends in GPP remains a challenge for satellite data. How well a VI can serve as a surrogate for annual GPP differs among VIs and even varies with the period over which the VI is integrated or averaged. For example, annual mean VIs explained only 9–50% of the variation in GPP for deciduous broadleaved forests, with NIRv performing slightly better than NDVI and EVI (Huang et al. 2019). When VIs were restricted to the growing season, integral NDVI was tightly correlated with GPP across vegetation types (R2 = 0.80) (Park et al. 2016). In contrast, growing-season mean NDVI and EVI explained <10% of the GPP variation for the deciduous broadleaved forests in the La Thuile dataset (Verma et al. 2014). Moreover, given the seasonal asynchrony between canopy greenness and C uptake, different growing-season definitions for VIs (for example, different thresholds of VI magnitude) may change the VI-GPP relationship. This possibility has not yet been tested. It is therefore essential to comprehensively evaluate the relationships between annual GPP and integrated or averaged VIs of different types, under various growing-season definitions.

Accurately modeling canopy GPP phenology from VI phenology is another important question, because GPP phenology is key to understanding changes in C sequestration and in surface energy and water balances (D’Odorico et al. 2015). The relationship between VI (NDVI or EVI) and GPP phenology diverges across deciduous forest ecosystems (D’Odorico et al. 2015). In the literature, the root mean square error of linear models between VI and GPP phenology across site-years ranges from 1 to 3 weeks (Balzarolo et al. 2019; Gonsamo et al. 2012; Peng et al. 2017a; Peng et al. 2017b). For deciduous forests, the relationship between VI and GPP phenology is generally weaker in autumn than in spring (D’Odorico et al. 2015; Gonsamo et al. 2012; Yin et al. 2020), and EVI may outperform NDVI (Yin et al. 2020). However, NIRv has rarely been used to capture GPP phenology (Yin et al. 2020). It is therefore necessary to compare how well various VIs capture GPP phenology. Furthermore, most studies use the same inflection point (i.e., the change of curvature) to define both VI and GPP phenology. However, the seasonal variations in VI and GPP are asynchronous, because VI responds differently to greenness, wetness and brightness dynamics (D’Odorico et al. 2015). For instance, NDVI phenology based on the midpoint method agrees better with GPP phenology based on the start-of-slope method than NDVI phenology based on the start-of-slope method does (D’Odorico et al. 2015). To date, little work has been devoted to adjusting the definition of VI phenology to match GPP phenology.

Whether the summer peak of VI can serve as a surrogate for the GPP peak also remains an open question, although the annual peak growth of vegetation is known to be critical in characterizing the capacity of ecosystem production (Huang et al. 2018). As a pivotal physiological metric, the GPP peak derived from eddy covariance (EC) measurements dominates the IAV in GPP across northern hemisphere ecosystems (Xia et al. 2015; Xu et al. 2019; Zhou et al. 2016; Zhou et al. 2017). In contrast, the NDVI peak plays a less important role than NDVI phenology in regulating the IAV in integral NDVI (a proxy of GPP) for broadleaf forests in northeastern China (Zhou 2020). Huang et al. (2018) reported that the global NDVI peak and the modeled GPP peak did not change consistently after 1998, which complicates the characterization of long-term trends in vegetation growth. These discrepancies imply that the NDVI peak may not be an efficient surrogate for the GPP peak. EVI and NIRv are more sensitive than NDVI to canopy variation in dense vegetation (Badgley et al. 2017; Huete et al. 2002), but the relationship between the EVI or NIRv peak and the GPP peak has rarely been explored.

Broadband VIs from near-surface remote sensing have a high temporal resolution and are little affected by atmospheric perturbations (Liu et al. 2019a; Richardson et al. 2013). In a Scots pine forest in Finland, seasonal broadband NDVI has been shown to be more closely related to GPP than Moderate Resolution Imaging Spectroradiometer (MODIS) NDVI is (Wang et al. 2004). However, broadband VI-GPP relationships at the interannual scale remain poorly evaluated.

In this study, NDVI, EVI, and NIRv obtained from tower-mounted radiometers and from MODIS were used to track the IAV in GPP measured with the EC method in a temperate deciduous forest in Northeast China. We aimed to address the following questions: (1) How do VI type and the definition of the growing season affect the VI-GPP (flux) relationship? (2) Does the match between VI and GPP phenology vary with VI type and with the definition of phenology (i.e., the threshold of VI magnitude)? (3) Can the VI peak track the IAV in the GPP peak?

Materials and Methods

Site description

The study was conducted at the Maoershan Forest Ecosystem Research Station of Northeast Forestry University, Northeast China (45°24′ N, 127°40′ E, 400 m a.s.l.), which has a continental monsoon climate. Across 2008−2018, the mean (± standard deviation) annual air temperature was 2.0 ± 0.8 °C and the mean annual precipitation was 676 ± 206 mm (Liu et al. 2021b). A 48-m-high tower was set up on the lower part of a valley sidewall (Fig. 1). The vegetation around the flux tower was a 70-year-old temperate deciduous broadleaved forest with a mean canopy height of ~20 m (Liu et al. 2021b). The maximum canopy leaf area index, estimated from litterfall collection, varied from 5.8 to 6.5 m2 m-2 during 2012−2018 (Liu et al. 2021a). The major tree species are Betula platyphylla, Ulmus japonica, and Fraxinus mandshurica.

Fig. 1

Map of the study area showing the location and a photograph of the eddy-flux tower. The four squares are the four MODIS pixels. The circle is the reflective footprint (90% of signal) of the radiometers installed on the tower. The vegetation map of China is adapted from Su et al. (2020)

Instrument configuration and flux calculation

An open-path EC system (LI-7500, Li-Cor Inc., Lincoln, NE, USA; CSAT3, Campbell Scientific Inc., USA) was installed at a height of 36 m to measure the vertical turbulent flux of CO2. Raw data were recorded at 10 Hz with a datalogger (CR3000, Campbell Scientific Inc., USA). An AP100 system (Campbell Scientific Inc., USA) measured an 8-level profile of CO2/H2O concentrations (0.5, 2.0, 4.0, 8.0, 16.0, 20.0, 28.0, and 36.0 m above the ground surface), which was used to calculate the storage flux of CO2 (Wang et al. 2016).
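The sketch below shows one way to turn the 8-level profile into a storage flux. It integrates the rate of change of CO2 molar density over height with the trapezoidal rule. It rests on three assumptions: the concentrations have already been converted from mole fraction to molar density (μmol m-3), the layer below the lowest inlet changes at the same rate as the lowest inlet, and the start and end profiles of each 30-min interval are given. The function name is hypothetical, and the sketch is not necessarily the exact procedure of Wang et al. (2016).

```python
from typing import Sequence

# Inlet heights of the 8-level profile (m), as listed in the text.
PROFILE_HEIGHTS_M = [0.5, 2.0, 4.0, 8.0, 16.0, 20.0, 28.0, 36.0]


def storage_flux(c_start: Sequence[float], c_end: Sequence[float],
                 heights: Sequence[float] = PROFILE_HEIGHTS_M,
                 dt_s: float = 1800.0) -> float:
    """Hypothetical helper: CO2 storage flux Fs (umol m-2 s-1).

    c_start, c_end: CO2 molar density (umol m-3) at each inlet at the start
    and end of the averaging interval (e.g. 2-min means).
    """
    if not (len(c_start) == len(c_end) == len(heights)):
        raise ValueError("profiles and heights must have the same length")
    # Rate of change of molar density at each inlet (umol m-3 s-1).
    rates = [(end - start) / dt_s for start, end in zip(c_start, c_end)]
    # Ground-to-lowest-inlet layer, assumed to change like the lowest inlet.
    fs = rates[0] * heights[0]
    # Trapezoidal integration between successive inlets.
    for i in range(1, len(heights)):
        fs += 0.5 * (rates[i] + rates[i - 1]) * (heights[i] - heights[i - 1])
    return fs
```

Each half-hourly NEE would then be Fc + Fs. A positive Fs means CO2 accumulated in the air column below the EC sensor.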

Half-hourly net ecosystem exchange of CO2 (NEE) was calculated as the sum of the eddy flux (Fc) and the storage flux (Fs). The Fc data were processed following standard flux measurement procedures (Aubinet et al. 2012):

- despiking;
- time-lag removal;
- planar-fit tilt correction;
- frequency response correction;
- density effect and surface heating corrections;
- quality control.

Fs was calculated from the 2-min mean 8-level profiles within each 30-min period to minimize underestimation of its magnitude (Wang et al. 2016). The early-evening maximum respiration method was used to filter nighttime NEE (van Gorsel et al. 2009). Instrument malfunction and other causes produced eight large gaps (>15 days):

- 2010: March 3 to April 21; July 13 to July 30;
- 2013: January 17 to March 20; April 22 to May 11; June 11 to June 25; August 11 to August 27; November 9 to December 11;
- 2017: August 27 to October 25.

Gaps in daytime NEE during the growing-season months (May−September) were filled with monthly Michaelis-Menten type light response curves (Falge et al. 2001). The light response curve for September 2017 could not be fitted. That gap was filled with the parameters for 2018, a year with an air temperature similar to that of September 2017. Ecosystem respiration was fitted with a moisture-modified empirical temperature-respiration model (Noormets et al. 2008) for the growing season and with the Lloyd-Taylor model (Lloyd and Taylor 1994) for the non-growing season. To estimate GPP, nighttime respiration was extrapolated into the daytime (Liu et al. 2021b; Reichstein et al. 2005). The daytime footprint of the CO2 flux extended 800−1200 m along the valley (90% of signal).
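The partitioning step can be sketched under simplifying assumptions:

- Only the Lloyd-Taylor model (used here for the non-growing season) is shown, with its commonly used constants Tref = 283.15 K and T0 = 227.13 K.
- R10 and E0 are estimated by ordinary least squares on log-transformed nighttime respiration instead of by nonlinear fitting.
- NEE follows the micrometeorological sign convention (negative for net uptake), so GPP = Reco − NEE.

The moisture-modified growing-season model and the Michaelis-Menten gap filling are not reproduced. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

T_REF_K = 283.15  # Lloyd-Taylor reference temperature (10 degC)
T0_K = 227.13     # temperature (K) at which modelled respiration tends to zero


def lloyd_taylor(temp_c: float, r10: float, e0: float) -> float:
    """Ecosystem respiration at temp_c (degC), in the units of r10."""
    t_k = temp_c + 273.15
    return r10 * math.exp(e0 * (1.0 / (T_REF_K - T0_K) - 1.0 / (t_k - T0_K)))


def fit_lloyd_taylor(temp_c: Sequence[float],
                     reco: Sequence[float]) -> Tuple[float, float]:
    """Estimate (r10, e0) from nighttime respiration (all values must be > 0).

    ln(R) = ln(r10) + e0 * x, with x = 1/(Tref - T0) - 1/(T - T0), so an
    ordinary least-squares line of ln(R) on x gives e0 (slope) and ln(r10).
    """
    xs = [1.0 / (T_REF_K - T0_K) - 1.0 / (t + 273.15 - T0_K) for t in temp_c]
    ys = [math.log(r) for r in reco]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    e0 = sxy / sxx
    r10 = math.exp(y_mean - e0 * x_mean)
    return r10, e0


def gpp_from_nee(nee: float, temp_c: float, r10: float, e0: float) -> float:
    """GPP = Reco - NEE, with daytime Reco extrapolated from the nighttime fit."""
    return lloyd_taylor(temp_c, r10, e0) - nee
```

At the reference temperature, modelled respiration equals R10. For example, a daytime half-hour at 10 °C with NEE = −10 μmol m-2 s-1 and a fitted R10 of 3 μmol m-2 s-1 gives GPP = 13 μmol m-2 s-1. The log transform weights residuals differently from a nonlinear fit, so its parameter estimates on noisy data can differ somewhat.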

Broadband vegetation index calculation

A net radiometer (CNR1 or CNR4, Kipp & Zonen, the Netherlands) was installed at a height of 48 m on the tower to measure incoming and outgoing radiation (W m-2), including solar (short-wave, 300–2800 nm) and long-wave (4.5–42 μm) radiation. The CNR1 was operated from 2008 to 2015 and the CNR4 from 2015 onward. The bias of the CNR1 relative to the CNR4 was removed with a linear model. Incident and reflected photosynthetically active radiation (PAR, 400–700 nm, μmol m-2 s-1) were measured with a pair of quantum sensors (PQS1 or PARLITE, Kipp & Zonen, the Netherlands). The footprint of the hemispherical radiometers had a radius of 176 m (90% of signal). The manufacturer recalibrated the radiometers for drift in 2015. All radiation data were sampled every 5 s, averaged every 30 min, and stored in a CR1000 datalogger (Campbell Scientific Inc., Logan, UT, USA). We calculated the broadband VIs, i.e., NDVI (NDVIB), EVI (EVIB) and NIRv (NIRvB), as:

$$ {\mathrm{NDVI}}_{\mathrm{B}}=\frac{{\mathrm{r}}_{\mathrm{NIR}}-{\mathrm{r}}_{\mathrm{PAR}}}{{\mathrm{r}}_{\mathrm{NIR}}+{\mathrm{r}}_{\mathrm{PAR}}} $$
(1)
$$ {\mathrm{EVI}}_{\mathrm{B}}=\frac{2.5\times \left({\mathrm{r}}_{\mathrm{NIR}}-{\mathrm{r}}_{\mathrm{PAR}}\right)}{{\mathrm{r}}_{\mathrm{NIR}}+2.4\times {\mathrm{r}}_{\mathrm{PAR}}+1} $$
(2)
$$ {\mathrm{NIR}\mathrm{v}}_{\mathrm{B}}={\mathrm{NDVI}}_{\mathrm{B}}\times {\mathrm{r}}_{\mathrm{NIR}} $$
(3)
$$ {\mathrm{r}}_{\mathrm{PAR}}=\frac{{\mathrm{PAR}}_{\mathrm{out}}}{{\mathrm{PAR}}_{\mathrm{in}}} $$
(4)
$$ {\mathrm{r}}_{\mathrm{NIR}}=\frac{{\mathrm{SOLR}}_{\mathrm{out}}-{\mathrm{PAR}}_{\mathrm{out}}}{{\mathrm{SOLR}}_{\mathrm{in}}-{\mathrm{PAR}}_{\mathrm{in}}} $$
(5)

where rNIR and rPAR are the albedos in the near-infrared and PAR wavebands, respectively. PARout and PARin are the reflected and incident PAR, and SOLRout and SOLRin are the reflected and incident solar radiation. The broadband radiometers cannot measure blue-band reflectance, so we used the two-band EVIB in place of the traditional three-band EVI (Rocha and Shaver 2009). The broadband VI time series were smoothed with a moving-window approach: within a 3-d window, the median (50th percentile) of the values around noon (10:00–14:00 local time) was assigned to the center day (Liu et al. 2019a; Sonnentag et al. 2012).
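The sketch below applies Eqs. (1)–(5) to one half-hourly record and then computes the 3-d moving-window median of near-noon values. It relies on one assumption that the text does not state. PAR is measured in μmol m-2 s-1 and solar radiation in W m-2, so PAR is converted to energy units before the subtraction in Eq. (5), here with an approximate factor of 4.57 μmol J-1 for daylight PAR. The function names are hypothetical.

```python
import statistics
from collections import defaultdict
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, Tuple

UMOL_PER_J_PAR = 4.57  # assumed energy-to-quantum conversion for daylight PAR


def broadband_vis(par_in: float, par_out: float,
                  solr_in: float, solr_out: float) -> Tuple[float, float, float]:
    """Return (NDVI_B, EVI_B, NIRv_B); PAR in umol m-2 s-1, SOLR in W m-2."""
    r_par = par_out / par_in                                   # Eq. (4)
    par_in_w, par_out_w = par_in / UMOL_PER_J_PAR, par_out / UMOL_PER_J_PAR
    r_nir = (solr_out - par_out_w) / (solr_in - par_in_w)      # Eq. (5)
    ndvi = (r_nir - r_par) / (r_nir + r_par)                   # Eq. (1)
    evi = 2.5 * (r_nir - r_par) / (r_nir + 2.4 * r_par + 1.0)  # Eq. (2)
    nirv = ndvi * r_nir                                        # Eq. (3)
    return ndvi, evi, nirv


def smooth_midday_median(
        records: Iterable[Tuple[datetime, float]]) -> Dict[date, float]:
    """Median of 10:00-14:00 values in a 3-d window, assigned to the centre day."""
    by_day = defaultdict(list)
    for ts, value in records:
        minute_of_day = ts.hour * 60 + ts.minute
        if 10 * 60 <= minute_of_day <= 14 * 60:  # keep near-noon values only
            by_day[ts.date()].append(value)
    smoothed = {}
    for day in sorted(by_day):
        window = []
        for offset in (-1, 0, 1):
            window.extend(by_day.get(day + timedelta(days=offset), []))
        smoothed[day] = statistics.median(window)
    return smoothed
```

Using the median instead of the mean keeps occasional cloud- or dew-affected near-noon values from pulling the daily value away from the typical canopy signal.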

MODIS vegetation index calculation

The MODIS 500-m 8-d surface reflectance product (MOD09A1) was obtained from the ORNL DAAC (ORNL DAAC 2018; Vermote 2015). Each 8-d composite is built by selecting observations with favorable viewing geometry and minimal cloud cover. The MODIS VIs, i.e., NDVI (NDVIM), EVI (EVIM) and NIRv (NIRvM), were calculated from band 1 (red, 620–670 nm), band 2 (NIR, 841–876 nm), and band 3 (blue, 459–479 nm):

$$ {\mathrm{NDVI}}_{\mathrm{M}}=\frac{{\mathrm{r}}_{\mathrm{NIR}}-{\mathrm{r}}_{\mathrm{RED}}}{{\mathrm{r}}_{\mathrm{NIR}}+{\mathrm{r}}_{\mathrm{RED}}} $$
(6)
$$ {\mathrm{EVI}}_{\mathrm{M}}=\frac{2.5\times \left({\mathrm{r}}_{\mathrm{NIR}}-{\mathrm{r}}_{\mathrm{RED}}\right)}{{\mathrm{r}}_{\mathrm{NIR}}+6\times {\mathrm{r}}_{\mathrm{RED}}-7.5\times {\mathrm{r}}_{\mathrm{BLUE}}+1} $$
(7)
$$ {\mathrm{NIR}\mathrm{v}}_{\mathrm{M}}={\mathrm{NDVI}}_{\mathrm{M}}\times {\mathrm{r}}_{\mathrm{NIR}} $$
(8)

where rNIR, rRED and rBLUE are the reflectances in the near-infrared, red and blue bands, respectively. For quality control, all MOD09A1 observations flagged as cloud, cloud shadow or cirrus were removed, and the view zenith angle was constrained to < 60°.
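Below is a minimal sketch of Eqs. (6)–(8) combined with the quality screening described above. It assumes that MOD09A1 reflectances arrive as scaled integers (scale factor 0.0001). It also assumes that the cloud, cloud-shadow and cirrus flags have already been decoded from the product's state QA layer into a single boolean. The record type and function name are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple

REFLECTANCE_SCALE = 0.0001  # MOD09A1 surface reflectance scale factor


class Mod09Record(NamedTuple):
    """Hypothetical container for one 8-d MOD09A1 pixel observation."""
    red_dn: int            # band 1 (620-670 nm), scaled integer
    nir_dn: int            # band 2 (841-876 nm), scaled integer
    blue_dn: int           # band 3 (459-479 nm), scaled integer
    cloud_flagged: bool    # True if cloud, cloud shadow or cirrus was flagged
    view_zenith_deg: float


def modis_vis(rec: Mod09Record) -> Optional[Tuple[float, float, float]]:
    """Return (NDVI_M, EVI_M, NIRv_M), or None if the record fails QC."""
    if rec.cloud_flagged or rec.view_zenith_deg >= 60.0:
        return None
    red = rec.red_dn * REFLECTANCE_SCALE
    nir = rec.nir_dn * REFLECTANCE_SCALE
    blue = rec.blue_dn * REFLECTANCE_SCALE
    ndvi = (nir - red) / (nir + red)                                # Eq. (6)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)  # Eq. (7)
    nirv = ndvi * nir                                               # Eq. (8)
    return ndvi, evi, nirv
```

Returning `None` for rejected composites leaves a gap in the 8-d series. The double-logistic fit can then bridge that gap rather than being biased by contaminated values.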

Phenology, summer peak and vegetation index “production” estimation

A double-logistic model (Eq. 9) was fitted to the time series of GPP and VI (Fig. 2) to define the start and end of the growing season (SOS and EOS). The SOS and EOS of GPP (SOSGPP and EOSGPP) were defined as the dates on which GPP reached 25% of the maximum daily GPP in spring and autumn, respectively, because this threshold showed the highest correlation with the IAV in GPP (Liu et al. 2021b). To account for the seasonal asynchrony between VI and GPP, the SOS and EOS of VI (SOSVI and EOSVI) were defined at thresholds from 10% to 50% of the amplitude in 5% steps and compared with SOSGPP and EOSGPP. The growing season length of GPP and of each VI was calculated as the number of days from SOS to EOS. Phenology was also extracted with the change-of-curvature (inflection point) method to test the relationship between VI- and GPP-derived phenology. Both the VI-GPP phenology relationships and the relationships between integral VI and annual GPP were much weaker with this method than with the threshold method (Figs. A1 and A2). We therefore focused only on the threshold method.

$$ \mathrm{y}\left(\mathrm{t}\right)=\mathrm{a}+\frac{\mathrm{b}}{1+\exp \left[\left(\mathrm{c}-\mathrm{t}\right)/\mathrm{d}\right]}-\frac{\mathrm{g}}{1+\exp \left[\left(\mathrm{e}-\mathrm{t}\right)/\mathrm{f}\right]} $$
(9)
Fig. 2

Seasonalities of gross primary production and vegetation indices during 2008–2018. NDVIB: broadband normalized difference vegetation index, EVIB: broadband enhanced vegetation index, NIRvB: broadband near-infrared albedo of vegetation, NDVIM: MODIS normalized difference vegetation index, EVIM: MODIS enhanced vegetation index, NIRvM: MODIS near-infrared reflectance of vegetation

where t is the day of year; a is the background GPP or VI; b and g are the spring and autumn amplitudes of GPP or VI, respectively; c and e are the spring and autumn midpoints (day of year), respectively; and d and f control the curvature of the spring and autumn transitions, respectively.
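The threshold method can be sketched as follows. Eq. (9) is evaluated on a daily grid. SOS is the first day on the spring limb on which the fitted curve reaches a given fraction of its amplitude, and EOS is the last such day on the autumn limb. Each limb's amplitude runs from that limb's minimum to the annual maximum. This is an assumption made here to mirror the separate spring (b) and autumn (g) amplitudes in Eq. (9). Parameter fitting is not shown, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def double_logistic(t: float, a: float, b: float, c: float, d: float,
                    e: float, f: float, g: float) -> float:
    """Eq. (9): background a, spring rise (b, c, d), autumn decline (g, e, f)."""
    return (a + b / (1.0 + math.exp((c - t) / d))
            - g / (1.0 + math.exp((e - t) / f)))


def threshold_phenology(series: List[float], frac: float,
                        first_doy: int = 1) -> Tuple[int, int]:
    """Return (SOS, EOS) as day of year for a daily fitted series.

    frac is the threshold as a fraction of amplitude (e.g. 0.10-0.50).
    """
    i_peak = max(range(len(series)), key=series.__getitem__)
    peak = series[i_peak]
    spring_base = min(series[:i_peak + 1])
    autumn_base = min(series[i_peak:])
    spring_thr = spring_base + frac * (peak - spring_base)
    autumn_thr = autumn_base + frac * (peak - autumn_base)
    # First day on the rising limb that reaches the spring threshold.
    sos = next(i for i in range(i_peak + 1) if series[i] >= spring_thr)
    # Last day on the falling limb that is still at or above the autumn threshold.
    eos = next(i for i in range(len(series) - 1, i_peak - 1, -1)
               if series[i] >= autumn_thr)
    return sos + first_doy, eos + first_doy
```

To reproduce the comparison in this study, evaluate Eq. (9) for days 1–365 and loop `frac` over 0.10, 0.15, …, 0.50 (the nine VI thresholds). Lower thresholds move SOS earlier and EOS later.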

The summer peak of GPP has been reported to dominate the IAV in GPP across northern hemisphere ecosystems (Xia et al. 2015; Zhou et al. 2016; Zhou et al. 2017), so we also compared the VI peaks with the GPP peak. The integral and mean VIs serve as proxies of “production” (Zhou 2020). They were calculated by summing and by averaging, respectively, the fitted VI values over the growing season defined by each threshold.
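Once SOS and EOS are known, the “production” proxies reduce to simple operations on the daily fitted curve. The sketch below makes three illustrative assumptions: the series is daily and indexed from day 1, the growing season for the integral and mean runs from SOS to EOS inclusive, and the summer peak is the maximum of the fitted curve. The function name is hypothetical.

```python
from typing import Dict, List


def production_metrics(series: List[float], sos: int, eos: int,
                       first_doy: int = 1) -> Dict[str, float]:
    """Growing-season integral and mean, season length, and summer peak."""
    if not first_doy <= sos <= eos < first_doy + len(series):
        raise ValueError("SOS/EOS must lie within the series, with SOS <= EOS")
    window = series[sos - first_doy: eos - first_doy + 1]
    integral = sum(window)  # daily values summed (1-d time step)
    return {
        "integral": integral,
        "mean": integral / len(window),
        "gsl_days": float(eos - sos),  # number of days from SOS to EOS
        "peak": max(series),
    }
```

The integral grows with both season length and peak height, whereas the mean is largely insensitive to season length. This is one reason the two proxies can track annual GPP differently.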

Testing the consistency between vegetation index and GPP

The coefficient of determination (R2) of linear regression was used to assess the consistency of long-term trends between VI and GPP. The mean bias and mean absolute deviation (MAD) were used to assess differences in absolute dates between VI and GPP phenology. A positive (negative) bias means that VI phenology is later (earlier) than GPP phenology.

$$ \mathrm{Bias}=\frac{1}{N}\sum \limits_{i=1}^{N}\left({\mathrm{VI\ phenology}}_{i}-{\mathrm{GPP\ phenology}}_{i}\right) $$
(10)
$$ \mathrm{MAD}=\frac{1}{N}\sum \limits_{i=1}^{N}\left|{\mathrm{VI\ phenology}}_{i}-{\mathrm{GPP\ phenology}}_{i}\right| $$
(11)

where N is the number of years (11 in this study). The relative importance of the summer peak and growing season length to the IAV of “production” (GPP or integral VI) was quantified by variance decomposition of a multiple linear regression (Grömping 2006). The resulting contributions were then compared between integral VI and GPP.
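Eqs. (10) and (11) and the variance decomposition can be sketched with the standard library alone. The relative-importance part assumes the LMG metric, which averages each predictor's sequential R2 gain over all predictor orderings. LMG is one of the metrics discussed by Grömping (2006), but the text does not state which metric was used, so this choice is an assumption. All function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def bias_and_mad(vi_doy: Sequence[float],
                 gpp_doy: Sequence[float]) -> Tuple[float, float]:
    """Eqs. (10)-(11): mean bias and mean absolute deviation (days)."""
    diffs = [v - g for v, g in zip(vi_doy, gpp_doy)]
    n = len(diffs)
    return sum(diffs) / n, sum(abs(x) for x in diffs) / n


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y: Sequence[float], xs: List[Sequence[float]]) -> float:
    """R2 of an ordinary least-squares regression of y on the columns in xs."""
    if not xs:
        return 0.0
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    xc = []
    for col in xs:
        col_mean = sum(col) / n
        xc.append([v - col_mean for v in col])
    # Normal equations on centred data (centring absorbs the intercept).
    a = [[sum(p * q for p, q in zip(ci, cj)) for cj in xc] for ci in xc]
    b = [sum(p * q for p, q in zip(ci, yc)) for ci in xc]
    beta = _solve(a, b)
    ss_res = sum((yc[i] - sum(beta[j] * xc[j][i] for j in range(len(xc)))) ** 2
                 for i in range(n))
    ss_tot = sum(v * v for v in yc)
    return 1.0 - ss_res / ss_tot


def lmg_importance(y: Sequence[float],
                   predictors: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """LMG shares of R2: mean sequential R2 gain of each predictor over all orderings."""
    orders = list(permutations(predictors))
    shares = {name: 0.0 for name in predictors}
    for order in orders:
        included: List[Sequence[float]] = []
        r2_before = 0.0
        for name in order:
            included.append(predictors[name])
            r2_after = r_squared(y, included)
            shares[name] += (r2_after - r2_before) / len(orders)
            r2_before = r2_after
    return shares
```

Call `lmg_importance` with annual GPP (or integral VI) as `y` and the summer peak and growing season length as the two predictors. The shares then sum to the full-model R2, which makes percentages such as those shown in Fig. 4 directly comparable between GPP and each VI.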

Results

Relationships between vegetation indices and annual GPP

The relationship between growing-season integral VI and annual GPP was markedly affected by VI type but not by the threshold of VI magnitude (Fig. 3). Among the six tested VIs, the integral EVIM performed best in tracking the IAV in GPP, with a narrow range of R2 (0.60–0.67), followed by the integral NIRvM (R2 = 0.52–0.63). The integral NDVIB defined by the nine thresholds explained 44%–60% of the IAV in GPP. Its relationships with annual GPP were highly stable for thresholds from 30% to 50% (R2 = 0.58–0.60). In contrast, the integral NDVIM explained only 23%–39% of the variation in annual GPP, and the integral EVIB and NIRvB could not track the IAV in GPP (P > 0.05). The growing-season mean VIs generally performed poorly compared with their integrals (Fig. 3b). Moreover, defining the growing season by combining different thresholds for spring and autumn phenology did not improve R2 (Fig. A3).

Fig. 3

Determination coefficient (R2) of the regression of annual gross primary production (GPP) against the integral vegetation indices (VIs) or mean VIs for the growing-season defined by different thresholds. NDVIB: broadband normalized difference vegetation index, EVIB: broadband enhanced vegetation index, NIRvB: broadband near-infrared albedo of vegetation, NDVIM: MODIS normalized difference vegetation index, EVIM: MODIS enhanced vegetation index, NIRvM: MODIS near-infrared reflectance of vegetation

Partitioning annual GPP into growing season length and GPP peak showed that the IAV in GPP was dominated by the GPP peak (69%), followed by growing season length (19%). We then used VIs integrated over the growing season defined by the optimal thresholds as proxies of GPP. Only NIRvM accurately reflected the contributions of the summer peak (69%) and growing season length (14%) to “production”. The other VIs overestimated or underestimated these contributions (Fig. 4).

Fig. 4

Relative importance of the growing season length (GSL) and summer peak to the interannual variation in gross primary production (GPP) or integral vegetation index (VI). NDVIB: broadband normalized difference vegetation index, EVIB: broadband enhanced vegetation index, NIRvB: broadband near-infrared albedo of vegetation, NDVIM: MODIS normalized difference vegetation index, EVIM: MODIS enhanced vegetation index, NIRvM: MODIS near-infrared reflectance of vegetation

Comparisons of phenological metrics estimated by vegetation indices and GPP

The relationships between VI and GPP phenology changed with VI type, threshold, and season (Fig. 5). None of the SOS or EOS series defined by GPP and VI showed an advancing or delaying trend during 2008–2018 (Figs. A4 and A5). In spring, the broadband VIs captured the IAV of SOSGPP better than the MODIS VIs did (R2 of 0.20–0.60 versus 0–0.35). For each broadband VI, the largest R2 was obtained at 45% of the NDVIB amplitude (R2 = 0.46, bias = 5 d), 35% of the EVIB amplitude (R2 = 0.60, bias = 4 d), and 50% of the NIRvB amplitude (R2 = 0.54, bias = 9 d). However, the SOS derived from the three MODIS VIs was poorly consistent with SOSGPP, regardless of the threshold.

Fig. 5

Determination coefficient (R2) of the regression, mean bias and mean absolute deviation (MAD) of the phenology estimated from vegetation indices (VI) against that from gross primary production (GPP). SOS: start of growing season, EOS: end of growing season. NDVIB: broadband normalized difference vegetation index, EVIB: broadband enhanced vegetation index, NIRvB: broadband near-infrared albedo of vegetation, NDVIM: MODIS normalized difference vegetation index, EVIM: MODIS enhanced vegetation index, NIRvM: MODIS near-infrared reflectance of vegetation

In autumn, the relationships between EOSVI and EOSGPP were relatively insensitive to the EOSVI threshold. The EOS of MODIS VIs generally had a higher R2 than that of broadband VIs (0.1–0.73 versus 0–0.76). Among the MODIS VIs, the 35% threshold of NDVIM amplitude had the tightest relationship with EOSGPP (R2 = 0.73, bias = 6 d). Among the broadband VIs, the 30% threshold of NDVIB amplitude did (R2 = 0.70, bias = 7 d). The 15% threshold of EVIB amplitude and the 10% threshold of NIRvB amplitude also gave high R2 (0.64 and 0.76), but their biases were very large (15 and 28 d).

Comparisons of summer peaks of vegetation indices and GPP

The peaks of GPP and NDVIM increased significantly, at rates of 0.30 g C m-2 yr-2 (2%) and 0.004 yr-1 (0.4%), respectively, whereas the peaks of the other VIs showed no significant trends (Fig. 6). Surprisingly, the VI peaks were only weakly related to the GPP peak (Fig. 7). Across the 11 years, the peaks of NIRvM and NDVIM were weakly consistent with the GPP peak (R2 = 0.39 and 0.31, respectively). The peaks of the other VIs could not track the IAV in the GPP peak (R2 < 0.01).

Fig. 6

Interannual variation in the summer peaks of gross primary production (GPP) and vegetation indices. NDVIB: broadband normalized difference vegetation index, EVIB: broadband enhanced vegetation index, NIRvB: broadband near-infrared albedo of vegetation, NDVIM: MODIS normalized difference vegetation index, EVIM: MODIS enhanced vegetation index, NIRvM: MODIS near-infrared reflectance of vegetation

Fig. 7

Relationships between the summer peaks of gross primary production (GPP) and vegetation indices. NDVIB: broadband normalized difference vegetation index, EVIB: broadband enhanced vegetation index, NIRvB: broadband near-infrared albedo of vegetation, NDVIM: MODIS normalized difference vegetation index, EVIM: MODIS enhanced vegetation index, NIRvM: MODIS near-infrared reflectance of vegetation

Discussion

Tracking the variation in annual GPP by vegetation index

Our results indicate that, among the six tested VIs, the integral NIRvM was a robust proxy of the IAV of GPP, in accordance with previous studies (Baldocchi et al. 2020; Wang et al. 2021). However, the only moderately tight relationship between NDVIM and GPP does not support using the integral NDVIM as a proxy of GPP (e.g. Verma et al. 2014; Wylie et al. 2003; Zhou 2020). At our site, changing the threshold used to define the VI growing season had little influence on the relationship between integral VI and GPP. This is because the IAV of GPP was dominated by the GPP peak (which the tested VIs reflected poorly) rather than by GPP phenology (Fig. 4; Xia et al. 2015; Zhou et al. 2016). At sites where GPP phenology contributes more than the GPP peak, however, the threshold used to define the growing season may matter more.

The discrepancies between the IAV of integral VI and that of GPP may be attributed to differences in their seasonal trajectories:

- Summer peak. The VIs characterized the summer peak poorly (Fig. 7), which weakened the relationship between integral VI and GPP (Shi et al. 2017).
- Species composition. Rapid changes in species composition may alter the VI-GPP relationship because optical properties are species-specific (Mbow et al. 2013; Musavi et al. 2017). Over the 11 years, the litterfall mass of the pioneer species B. platyphylla decreased by 17%. That of the two mid-successional species, U. japonica and F. mandshurica, increased by 28% and 24%, respectively (calculated from Sun et al. 2021).
- Environmental sensitivity. The photosynthetic rate is sensitive to environmental factors, so GPP is not always high when VI is high (Nagai et al. 2010).
- Flux processing. The methods of gap filling and flux partitioning (Falge et al. 2001; Lasslop et al. 2010; Reichstein et al. 2005; Tramontana et al. 2020) may introduce uncertainties into the VI-GPP relationship.
- Satellite processing. Improving cloud screening and accounting for the fraction of diffuse radiation are essential for accurately modeling GPP (Badgley et al. 2019; Huang et al. 2019).

Solar-induced chlorophyll fluorescence (SIF) can improve the accuracy of GPP retrievals (Li and Xiao 2020; Lu et al. 2018; Yang et al. 2017). However, SIF products are noisy and long-term SIF records are lacking, so VIs remain the most widely used indicators of GPP (Lu et al. 2018).

Differences in spatial scale can also explain the mismatch between integral VI and GPP. The field of view of the radiometers was fixed by their installation height, and the limited tower height resulted in a small viewing area (a radius of 176 m). In contrast, the EC footprint varies from 10 to 100 times the measurement height, depending on atmospheric stability, wind speed and other factors (Schmid 2002). Marcolla and Cescatti (2018) suggested mounting radiometers about 6–15 times higher than turbulent flux sensors, which is difficult to achieve. The four MODIS pixels matched the EC footprint better than the radiometer footprint did, although a complete match was impossible. Mixed signals within MODIS pixels are another potential source of the difference between MODIS VI and EC-derived GPP.

Characterizing the GPP phenology and summer peak by vegetation index

The six tested VIs differed in their ability to capture the IAV of SOSGPP and EOSGPP, consistent with previous studies (e.g. D’Odorico et al. 2015; Gonsamo et al. 2012; Zhao et al. 2020). In spring, the broadband VIs outperformed the MODIS VIs. This may reflect the high uncertainty of SOS estimated from MODIS VI time series, which have coarse temporal resolution and are affected by cloud contamination. The divergent phenology of the overstory and understory at our site may also contribute (Liu et al. 2019b). Among the three broadband VIs, NDVIB was sensitive to background influences and was thus less satisfactory than EVIB and NIRvB (Chang et al. 2019; Yin et al. 2020). Conversely, autumn has more sunny days, and during leaf senescence the narrow-band MODIS VIs were more sensitive than broadband VIs to changes in vegetation features (Elvidge and Chen 1995). Unlike in spring, NDVI performed better than EVI and NIRv in autumn for both broadband and MODIS VIs. The large solar zenith angle in autumn at our mid-latitude site may introduce uncertainties into EVI (Sesnie et al. 2012).

The choice of threshold had little effect on the general trend of phenology (Figs. A4 and A5; Keenan et al. 2014). It did, however, change both the relationships and the absolute differences between VI and GPP phenology over the 11 years. Physiologically, the mismatch between VI and GPP phenology is mainly attributed to the time lag between canopy greenness and C uptake: canopy greenness (VI) generally rises ahead of GPP in spring and persists after GPP ceases in autumn (Lu et al. 2018; Shen et al. 2014). Most previous studies found significant mismatches between VI and GPP phenology even where the two were strongly related across site-years. This is partly because those studies used the same inflection point to define both. For example, SOSEVIM defined by the change of curvature was 10 d earlier than SOSGPP across deciduous broadleaved forests at AmeriFlux sites (Gonsamo et al. 2012; Shen et al. 2014). Across 19 deciduous broadleaf forests, the NDVIM-, EVIM-, and NIRvM-derived EOS defined by a 20% threshold were later than EOSGPP by 23, 16, and 15 d, respectively (Yin et al. 2020). In our study, changing the threshold used to define VI phenology reduced the mismatch between VI and GPP phenology (Fig. 5). We therefore recommend using the threshold method to define VI and GPP phenology, for two reasons. First, the relationship between VI and GPP phenology was stronger with the threshold method than with the change-of-curvature method (Figs. 5 and A2), which is partly consistent with findings for NDVI (D’Odorico et al. 2015). Second, the curvature method may fail to extract SOS and EOS when the curvature of the fitted curve has no inflection point (Shen et al. 2014). Despite the hysteresis between the seasonal patterns of VI and GPP, an appropriate threshold of VI magnitude can effectively estimate GPP phenology.
Jointly considering R2, bias and MAD, we recommend using 25%–35% of the EVIB amplitude to track SOSGPP, and 50% of the NDVIB and NDVIM amplitudes to capture EOSGPP.

However, the peaks of the six tested VIs could not characterize the IAV of the GPP peak. The peaks of NDVIM and NIRvM were significantly correlated with the GPP peak, but they could not capture its long-term increasing trend. Several factors may explain why VI peaks failed to indicate the GPP peak. First, VIs (particularly NDVI) saturate in summer, so they detect only the sunlit leaves of the overstory and not the shaded leaves of the overstory and understory (D’Odorico et al. 2015). NIRvM remains sensitive at high leaf area index, but cloud contamination is still a problem for satellite data (Badgley et al. 2019), particularly at our site, which usually has wet summers (Liu et al. 2021b). Second, the photosynthetic capacity of vegetation may vary with environmental drivers even when its spectral properties are similar (Zhao et al. 2020). These results indicate that VI peaks should be used with caution as proxies of the IAV in peak ecosystem activity or GPP.

To our knowledge, this is the first comprehensive study of how well VIs capture the IAV of the phenology, summer peak and annual flux of GPP. These results may not be universal, but they have important implications for future remote sensing studies of GPP. The recommended thresholds could be tested with a larger dataset, such as FLUXNET (Baldocchi et al. 2001).

Conclusion

We comprehensively evaluated how well tower-based broadband and MODIS VIs track the IAV of GPP in a temperate deciduous forest. The growing-season integral NIRvM best represented the IAV in annual GPP, and its performance was insensitive to the threshold of VI magnitude. In contrast, both VI type and threshold were important for accurately modeling GPP phenology. The 35% threshold of EVIB amplitude performed best in spring, and the 50% threshold of NDVIM amplitude performed best in autumn. The peaks of the six tested VIs could not reflect the IAV in the GPP peak. We conclude that using an appropriate VI (and threshold of magnitude) can improve the monitoring of annual GPP and GPP phenology in deciduous forests.