Background

Observations and model simulations are two complementary tools used to describe ecosystem processes. The process-based biogeochemical model (i.e., Biome-BGC) is a diagnostic and predictive tool for quantifying carbon and water fluxes in forest ecosystems and for assessing the effects of changing atmospheric/climatic environments on terrestrial ecosystems (Running and Coughlan 1988; Thornton and Rosenbloom 2005; Barcza et al. 2009). The global terrestrial carbon stock in vegetation and soil amounts to 2500 Gt (Hairiah et al. 2011). As a major carbon reserve in terrestrial ecosystems, forest carbon includes living trees above and below the ground, standing dead trees, woody debris and litter, and soil (IPCC 2006; Pukkala 2018). Among them, soil accounts for 60% to 80% of the forest ecosystem carbon (Asseffa et al. 2013); therefore, a better understanding of soil carbon fluxes is essential for forest management and global climatic variation mitigation (Davidson and Janssens 2006).

As one of the most important modules in biogeochemical models, the soil module simulates the composition of dead plant material and soil organic matter (SOM) as well as nitrogen (N) mineralization and balance (Running and Gower 1991). As reported by Meentemeyer (1984) and Ise and Moorcroft (2006), soil-related processes are controlled mainly by soil temperature and moisture, which are key process variables in biogeochemical models and are linked through coupled carbon and water balances. Soil temperature is the most important factor regulating belowground respiration, which integrates both autotrophic and heterotrophic respiration (HR) processes (Hursh et al. 2017). The effects of soil moisture on soil respiration are more complicated, and are chiefly determined by root and litter mass as well as soil organic carbon, density, and porosity. Soil hydrological and carbon cycles of ecosystems are coupled owing to the interactions among soil temperature, soil moisture and soil respiration (Hidy et al. 2016a, 2016b). Therefore, accurate estimation of soil temperature and moisture is a crucial requirement in biogeochemical models.

However, most soil modules in biogeochemical models (e.g., Biome-BGC) are based on a one-layer bucket model that considers only plant uptake, canopy interception, snowmelt, outflow, and soil transpiration among the various soil hydrological processes. Several studies have indicated the need to improve the soil module. For example, Pietsch et al. (2003) extended the applicability of Biome-BGC to consider the effect of water infiltration from groundwater and seasonal flooding in forest areas. Wang et al. (2014) coupled Biome-BGC with a multi-layer hydrological model (SHAW) through the exchange of key variables (e.g., leaf area index (LAI), soil temperature, and moisture). Validation with eddy covariance (EC) flux measurements showed that integrating the two models improved the simulated carbon and water fluxes.

The Biome-BGC process-based biogeochemical model is applied in this study to estimate forest ecosystem carbon and water fluxes. Since Hidy et al. (2016a, 2016b) published their work on the newly developed modules for Biome-BGC and their ability to manage vegetation, various versions of Biome-BGC MuSo, v1.0 through v4.1, have been released in quick succession. These versions extended the model by adding multi-layer soil processes (e.g., percolation, diffusion, and groundwater), simulating management activities such as harvesting, plowing, and fertilizing, and adjusting other plant-related processes such as phenology (Hidy et al. 2016a, 2016b). In the multi-layer soil module, the soil carbon and hydrological processes of each layer depend on soil texture, although parameterization of these processes remains challenging, particularly over large areas with high resolution (Lu et al. 2017). This leads to inaccurate simulations of soil temperature and moisture content, thereby biasing the estimates of carbon and water fluxes.

Errors and uncertainties inevitably exist in biogeochemical modeling, depending on the inputs, model structure, and model parameters. Data assimilation is an effective method for integrating available multi-source observations with models because errors in both the observations and models are considered, which improves the estimates. The observations and model estimates provide various types of information over different time and spatial scales. The ensemble Kalman filter (EnKF), an extension of the Kalman filter, is a popular data fusion algorithm formulated by Evensen (1994, 2003). This method is based on Monte Carlo estimation and uses recursive data processing to track the model error statistics using an ensemble of model state variables. By periodically updating the state variables with observations, the EnKF can be used to improve the performance of the biogeochemical model without changing its original structure.

Recently, observed soil parameters have been successfully assimilated into terrestrial and hydrological models, either by updating the relevant variables in the model or by adjusting the initialization and parameterization of the model. Yu et al. (2014) improved the model’s performance by assimilating observed soil temperature and Moderate Resolution Imaging Spectroradiometer (MODIS) land surface temperature into the Common Land Model using the Ensemble Particle Filter. Zhu et al. (2017) investigated the influence of assimilating multi-scale soil moisture into a hydrological model and showed that coarse-scale soil moisture observations could also help to identify the parameters and states of the water flow model. Ines et al. (2013) assimilated soil moisture and LAI independently and simultaneously into a crop model using the EnKF to control model runs and to update model variables. The results demonstrated that crop yield prediction errors were reduced significantly after assimilation. However, common data assimilation schemes focus mainly on above-ground state variables such as LAI, land surface moisture, and temperature, and apply a one-layer soil bucket model. This poses a structural problem and introduces considerable uncertainty, as previously mentioned. Thus far, assimilation of multi-layer soil variables into Biome-BGC MuSo to improve the simulation of carbon and water fluxes has not been attempted.

In this study, we provide a strategy for simulating carbon and water fluxes using a process-based biogeochemical model and multi-source data to alleviate model uncertainties. Specifically, considering the drawbacks of the simple soil sub-model in Biome-BGC, the Biome-BGC MuSo model with a multi-layer soil module was used to estimate carbon and water fluxes at the Changbai Mountains forest flux site between 2003 and 2007. In order to improve the simulations, we assimilated daily multi-layer soil temperature and moisture data into Biome-BGC MuSo using EnKF. Finally, the simulated carbon fluxes based on ecosystem respiration (ER) and net ecosystem exchange (NEE) and the water fluxes based on evapotranspiration (ET) were evaluated using eddy covariance (EC) flux measurements. The credibility of this assimilation strategy was tested using three-dimensional analysis to evaluate the differences in root mean square error (RMSE) and related climatic and biophysical factors.

Study area

The studied forest flux site is located in the National Natural Conservation Park of Changbai Mountains of Jilin Province, China (42°24′9″N, 128°05′45″E), as shown in Fig. 1. The climate in the Changbai Mountains is temperate and continental, and is influenced by the monsoon. The annual average precipitation is approximately 713 mm, and precipitation occurs mainly from June to August. The annual average temperature is 3.6 °C. The terrain surrounding the forest flux site is flat, with an average elevation of 738 m. According to the average seasonal patterns shown in Fig. 2, the strong seasonal variability experienced at this site provides an opportunity to evaluate the performance of the proposed data assimilation under varying climatic and biophysical conditions. Thus, application of this method to other forest ecosystems can be considered. The forest land is covered predominantly with temperate broadleaf Korean pine forest consisting mainly of Pinus koraiensis, Tilia amurensis and Fraxinus mandshurica (Wang et al. 2005).

Fig. 1 Location of the forest flux site

Fig. 2 Average seasonal patterns of climatic and biophysical factors at the Changbai Mountains forest flux site. a Seasonal variations of Temp; b Seasonal variations of Precip and PAR; c Seasonal variations of LAI; d Seasonal variations of soil temperature and moisture

Data

An EC system was set up on a 62-m high tower, and sensors were fixed on a boom located at a height of 40 m that extended 3 m upwind of the tower to minimize flow distortion (Wang et al. 2005). One sensor was used to gauge the carbon dioxide (CO2) concentration profile and the other served to record routine meteorological observations (Wu et al. 2005). Meteorological data were measured continuously from 2003 to 2007 using an open-path EC system at the forest flux tower. The dataset included air temperature (Temp), relative humidity (RH) (Model HMP45C, Campbell Scientific Inc., Utah, USA), precipitation (Precip) (Model 52203, R. M. Young, Traverse City, Michigan, USA), and wind speed and direction. Photosynthetically active radiation (PAR) was measured with a quantum sensor (Model LI190SB, LI-COR Inc., Lincoln, Nebraska, USA). Other meteorological factors, including vapor pressure deficit (VPD), incoming shortwave radiation (Srad), and day length (from sunrise to sunset), were calculated using MTCLIM 4.3 based on measured daily maximum and minimum temperatures and precipitation (Running et al. 1987; Thornton and Running 1999).

The observed soil variables included three-layer temperature and moisture data at depths of 5, 20 and 40 cm. At the Changbai Mountains forest flux site, the top soil layer (5 cm) is dominated by litter fall, humic substances are distributed between depths of 5 and 20 cm, and an albic soil layer lies between depths of 20 and 40 cm. Upward and downward soil water movements occur mainly among the three active layers because of the poor penetration of the albic soil layer. The soil layer below 40 cm is dominated by loess, and the distribution of vegetation roots there is negligible over the Changbai Mountains (Wang and Pei 2002; Zhao et al. 2013). Therefore, three layers of soil moisture were collected using a Micrologger for data acquisition (CR23X-TD, Campbell Scientific Inc., Utah, USA) at a frequency of 30 min at the forest flux site. These 30-min data were then averaged on a daily basis.

Water vapor densities and CO2 were measured using an open-path system from 2003 onwards at the forest flux site. The open-path EC system contained a three-dimensional sonic anemometer (CSAT3, Campbell Scientific Inc., Logan, Utah, USA) and a fast-responding open-path infrared gas analyzer (LI-7500, LI-COR Inc., Lincoln, Nebraska, USA). The collection frequencies for raw flux data and climate data were 10 and 0.5 Hz, respectively. The 30-min averaged values of each variable were calculated. A series of preprocessing steps was conducted, including outlier removal, coordinate rotation, time lag analysis, frequency response calibration, and Webb–Pearman–Leuning (WPL) correction (Wang et al. 2005). Half-hourly net CO2 exchange and energy fluxes, including latent and sensible heat fluxes, were calculated using EdiRe software. Specifically, the night-time net CO2 exchange was regressed against the air or soil temperature using an exponential function. The fitted model was then used to calculate the ER. Then, the NEE and ER were summed to estimate the ecosystem gross primary productivity (GPP). The daily flux of ET can be expressed in equivalent units of both energy (W∙m−2) and water (kg∙m−2 or mm∙s−1). The conversion from latent energy flux (LE, W∙m−2) to ET (mm∙s−1) is calculated as ET = LE/λ, where λ is the latent heat of evaporation (Mu et al. 2007, 2009).
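The LE-to-ET conversion described above can be illustrated with a minimal sketch. The value of λ used here (2.45 × 10^6 J∙kg−1, a common approximation near 20 °C) and the function names are assumptions for illustration; the text does not state which value of λ was applied.

```python
# Latent heat of vaporization in J/kg. The value 2.45e6 is a common
# approximation near 20 degC and is an assumption, not taken from the text.
LAMBDA_J_PER_KG = 2.45e6
SECONDS_PER_DAY = 86400.0


def le_to_et_mm_per_s(le_w_m2, lam=LAMBDA_J_PER_KG):
    """ET = LE / lambda; W/m2 divided by J/kg gives kg/m2/s, i.e. mm/s of water."""
    return le_w_m2 / lam


def le_to_et_mm_per_day(le_w_m2, lam=LAMBDA_J_PER_KG):
    """Scale a daily-mean latent heat flux to a daily ET total in mm/d."""
    return le_to_et_mm_per_s(le_w_m2, lam) * SECONDS_PER_DAY


# Hypothetical daily-mean latent heat flux of 100 W/m2.
et_daily = le_to_et_mm_per_day(100.0)
```

Because 1 kg of water spread over 1 m² forms a layer 1 mm deep, kg∙m−2∙s−1 and mm∙s−1 are numerically identical, which is why no density term appears in the conversion.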

EC measurements of carbon and heat fluxes and observations of meteorological and soil data during 2003 to 2007 were thus collected at the Changbai Mountains forest flux site. Additionally, soil texture data were collected from the soil texture map of China with a spatial resolution of 1 km, which was downloaded from the Resource and Environment Data Cloud Platform website. The time series LAI products from 2003 to 2007, with a spatial resolution of 1 km, were provided by the Center for Global Change Data Processing and Analysis of Beijing Normal University. Digital elevation model (DEM) data were obtained from Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM). Latitude and topography were calculated using the DEM at the forest site.

Methods

First, we simulated the carbon and water fluxes using the calibrated Biome-BGC model at the Changbai Mountains forest flux site. Second, Biome-BGC MuSo with a multi-layer soil module was applied in the simulation. Third, the daily soil temperature and moisture were assimilated into Biome-BGC MuSo. The simulated carbon and water fluxes were then evaluated against EC measurements. Finally, three-dimensional relationships among ΔRMSE and climatic and biophysical factors were analyzed. Figure 3 presents the overall methodology of this study, the details of which are given in the subsequent sections.

Fig. 3 Overall technique flowchart

Biome-BGC MuSo model

The Biome-BGC Multi-layer Soil Module version 4.1 (Biome-BGC MuSo v4.1) was developed to improve its ability to simulate carbon and water cycles within terrestrial ecosystems. Biome-BGC MuSo v4.1 improved the multi-layer soil module, and introduced the management and phenological modules. These three modules are independent of each other in the model. In this study, the management module was deactivated during the spinup and normal simulation for the forest. Hence, the logical values of planting, thinning, mowing, grazing, harvesting, ploughing, fertilizing, and irrigation were set to 0 (flag = 0). The thicknesses of the layers from the surface to the bottom were 5, 15, and 20 cm. Thus, the first, second, and third layers were located at depths of 0–5, 5–20 and 20–40 cm, respectively.

This model runs with a daily time step and requires four input files for execution. The first file is the initialization file containing basic site-related information (e.g., elevation, soil texture, CO2 concentration, and N-decomposition data). The second file is the daily meteorological data file and includes daily maximum air temperature, minimum air temperature, precipitation, VPD, solar radiation and day length. The third file is the ecophysiological file and includes the ecophysiological parameters (e.g., ratio of leaf carbon to nitrogen, fine roots and coarse roots, fraction of leaf N in the Rubisco catalytic enzyme, and the maximum stomatal conductance). In this study, the ecophysiological parameter values in Biome-BGC MuSo were determined by the optimized results during the model run. The last input file is a special restart file, which is the output of the spinup and provides inputs for running the model under normal situations. The spinup phase was first performed using meteorological data covering the period 1981 to 2002 obtained from the Data Center of the Chinese Meteorological Bureau, and its output endpoint served as the input for the normal simulation covering the period 2003 to 2007.

In the carbon flux module of the Biome-BGC MuSo model, GPP is calculated using Farquhar’s photosynthesis routine and data on the catalytic enzyme Rubisco in relation to temperature (Farquhar et al. 1980). Photosynthesis is the only process whereby the model can provide carbon into all of the pools. Root maintenance respiration was calculated layer-by-layer using the soil water content (SWC) and soil temperature of each active layer (which differs from the averaged soil water status or soil temperature of the whole soil in the original Biome-BGC model). Growth respiration (GR) in the model was considered as the proportion of all new tissue growth, which was 30% (Larcher 2003).

The net primary productivity (NPP) was calculated using GPP, maintenance respiration (MR), and GR in the model. The carbon storage of the ecosystem originates from the balance between NPP and heterotrophic respiration (HR), which is regulated by decomposition activities. All litter and soil pools decompose through HR. NEE represents the difference between NPP and HR.

The soil flux module generally describes the decomposition of dead plant material, or litter, in addition to SOM, N mineralization, and N balance (Schwalm et al. 2015). Soil hydrology has significant effects on many soil processes (e.g., SOM, N mineralization, and soil evaporation), and thereby on the carbon and water cycles. Therefore, accurate description of soil hydrology is essential. In the original Biome-BGC model, the soil layer works as a “bucket”, and the soil water flux considers only canopy interception, snowmelt, outflow, and soil evaporation. Therefore, runoff, percolation, diffusion, pond water formation, and transpiration were added into Biome-BGC MuSo.

The movement of water within the soil occurs through percolation and diffusion. Biome-BGC MuSo implements two calculation methods for soil water movement. The first is based on Richards’ equation (Balsamo et al. 2009). The second, the so-called “tipping bucket method” (Ritchie 1998), is based on the semi-empirical estimation of percolation and diffusion fluxes and is generally used in crop modeling. In the first method, hydraulic conductivity (K) and hydraulic diffusivity (D) are used in the percolation and diffusion calculations, which follow the diffusion equation derived from Darcy’s law:

$$ \frac{\partial \theta }{\partial t}=\frac{\partial }{\partial z}\left[D\left(\theta \right)\bullet \frac{\partial \theta }{\partial z}\right]+\frac{\partial K}{\partial z}+S\left(\theta \right) $$
(1)

where θ is the volumetric soil water content, z is the soil depth, D is the hydraulic diffusivity (m²∙s−1), K is the hydraulic conductivity (m∙s−1) and S represents the sources and sinks of soil water such as precipitation, evaporation, transpiration, runoff, and deep percolation. The Clapp-Hornberger formulation (Clapp and Hornberger 1978) was used to calculate K and D. These variables change rapidly and significantly as the SWC changes. K and D were determined for each layer, and the layer-integrated daily-scale form was solved by the method of finite differences. The Richards equation was used to investigate soil water movements in this study.
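To illustrate how strongly K and D respond to changes in water content, the following is a minimal sketch of the Clapp-Hornberger power-law relations. The function name and the loam-like parameter values are assumptions chosen for illustration; they are not the site parameters, and this is not the Biome-BGC MuSo source code.

```python
import numpy as np


def clapp_hornberger(theta, theta_sat, k_sat, psi_sat, b):
    """Return (K, D) for volumetric water content theta.

    K = K_sat * s**(2b + 3)                              [m/s]
    D = b * K_sat * |psi_sat| / theta_sat * s**(b + 2)   [m2/s]
    where s = theta / theta_sat is the relative saturation.
    """
    s = np.clip(np.asarray(theta, dtype=float) / theta_sat, 1e-6, 1.0)
    k = k_sat * s ** (2.0 * b + 3.0)
    d = b * k_sat * abs(psi_sat) / theta_sat * s ** (b + 2.0)
    return k, d


# Illustrative loam-like parameters (assumed, not site values):
# saturated water content, saturated conductivity (m/s),
# saturated matric potential (m), and the Clapp-Hornberger exponent b.
theta_sat, k_sat, psi_sat, b = 0.45, 7.0e-6, -0.48, 5.4
theta = np.array([0.20, 0.30, 0.40])   # one value per soil layer
K, D = clapp_hornberger(theta, theta_sat, k_sat, psi_sat, b)
```

With these exponents, a change in θ from 0.20 to 0.40 raises K by several orders of magnitude, which is consistent with the rapid variation noted above and explains why K and D are evaluated separately for each layer.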

Surface runoff occurs when the rate of rainfall exceeds the rate at which water infiltrates the soil. Runoff simulation was conducted using the semi-empirical method (Williams 1991). Under conditions of intensive rainfall, when not all of the precipitation can infiltrate, pond water forms on the surface. In Biome-BGC MuSo, evaporation of pond water is assumed to be equal to potential soil evaporation.

The soil temperature of each active layer can be calculated using two methods. The first method involves logarithmic downward dampening of temperature fluctuations within the soil (Zheng et al. 1993). In this method, the soil surface temperature is determined by air temperature changes considering the insulating effect of snow cover and the shading effect of vegetation. The temperature of intermediate soil layers is calculated assuming a linear temperature change between soil depths of 0 cm and 3 m. The soil temperature below 3 m in the model is assumed to be the mean annual air temperature. The other method uses DSSAT/4M (Sándor and Fodor 2012) to empirically calculate the soil temperature. Because the former method is preferred (Zheng et al. 1993), we selected it in this study and compared the results with measurements obtained at the Changbai Mountains forest flux site.
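The interpolation between the surface and the 3-m depth described for the first method can be sketched as follows. This covers only the linear-profile step stated in the text; the calculation of the surface temperature itself (snow cover and shading effects) is not reproduced, and the function name, the use of layer mid-depths, and the surface temperature of 18 °C are illustrative assumptions.

```python
import numpy as np


def layer_temperatures(t_surface, t_mean_annual, depths_m, bottom_depth_m=3.0):
    """Interpolate linearly from the surface value at 0 m to the mean annual
    air temperature at bottom_depth_m; deeper points keep the annual mean."""
    return np.interp(depths_m, [0.0, bottom_depth_m], [t_surface, t_mean_annual])


# Mid-depths (m) of the three layers used in this study (0-5, 5-20, 20-40 cm).
# Evaluating at mid-depth is an assumption made for this sketch.
mid_depths = np.array([0.025, 0.125, 0.30])

# 3.6 degC is the site's annual average temperature; 18 degC is hypothetical.
t_layers = layer_temperatures(t_surface=18.0, t_mean_annual=3.6, depths_m=mid_depths)
```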

Ensemble Kalman filter

The EnKF algorithm, used mainly to forecast the error covariance of a model, is based on the Monte Carlo method (Evensen 2003) and can integrate multi-source observations sequentially in time. The algorithm assumes that the system and measurement noises are both white and Gaussian. An ensemble of N members is first generated from the background and observations at the initial time \( t_0 \) by adding noise directly to the state variables (Eq. 2). Independent model runs are then invoked for each member. Each time a new observation becomes available, the analysis and regeneration of the state variables are conducted at time t – 1, i.e., before the prediction of the state variables at time t. EnKF involves forecasting and measurement updates, and comprises five steps, as given below.

(1) Initialization of the ensemble

The N ensembles to be generated are first defined. The state variable x is calculated at time t0 as follows:

$$ {x}_{t_0,i}=\overline{x_{t_0,i}}+{p}_i $$
(2)
$$ {p}_i\sim N\left(0,\sigma \right) $$
(3)

where \( {x}_{t_0,i} \) is the initialized state vector of ensemble member i at time \( t_0 \); \( \overline{x_{t_0,i}} \) is the background expectation; and \( p_i \) represents the noise, which follows a Gaussian distribution with a mean of 0 and a variance of σ.

(2) Forecasting

The state variables are predicted at time t using input data (time t – 1) and the model operator (Biome-BGC MuSo model):

$$ {x}_{i,t}^f={F}_t{x}_{i,t-1}^a+{B}_t{\mu}_i $$
(4)

where \( {x}_{i,t}^f \) is the forecasted state vector at time t, with superscript f referring to the forecasted value; \( F_t \) denotes the model operator; \( {x}_{i,t-1}^a \) is the analyzed state value at time t – 1, with superscript a representing the analyzed value; \( B_t \) is the control matrix, which applies the effect of each control input parameter in vector \( \mu_i \) on the state vector; and \( \mu_i \) represents the model error, which follows a Gaussian distribution.

Uncertainties of noise in EnKF are reflected by the covariance matrix, with consideration of the error propagation at any time (Moradkhani et al. 2005). The covariance matrix is calculated during the entire forecasting process according to its properties as

$$ {P}_t^f={F}_t{P}_{t-1}^a{F}_t^T+{Q}_t $$
(5)

where \( {P}_t^f \) is the forecast error covariance matrix at time t, \( {P}_{t-1}^a \) is the analysis error covariance matrix at time t – 1, and \( Q_t \) is the model error covariance.

(3) Calculation of the Kalman gain matrix

The core of data assimilation lies in the Kalman filter system, and it is assumed that observations are related to the true state. Therefore, the following expression applies for adding observations to the model at time t:

$$ {Z}_t={H}_t{x}_{i,t}^f+{v}_t $$
(6)

where \( Z_t \) is the observation vector at time t, and \( H_t \) is the operator that maps the model variable space to the observation space. \( v_t \) is a Gaussian random error vector with mean zero and observation error covariance \( R_t \).

The Kalman gain matrix is defined as

$$ {K}_t={P}_t^f{H}_t^T{\left({H}_t{P}_t^f{H}_t^T+{R}_t\right)}^{-1} $$
(7)

The EnKF forecast and analysis error covariance are acquired directly from the ensemble of model simulation as

$$ {P}_t^f=E\left[\left({x}_{i,t}^f-{\overline{x}}_t^f\right){\left({x}_{i,t}^f-{\overline{x}}_t^f\right)}^T\right]=\frac{1}{N-1}{\sum}_{i=1}^N\left({x}_{i,t}^f-{\overline{x}}_t^f\right){\left({x}_{i,t}^f-{\overline{x}}_t^f\right)}^T $$
(8)
$$ {H}_t{P}_t^f{H}_t^T=\frac{1}{N-1}{\sum}_{i=1}^N\left[{H}_t\left({x}_{i,t}^f\right)-{H}_t\left({\overline{x}}_t^f\right)\right]{\left[{H}_t\left({x}_{i,t}^f\right)-{H}_t\left({\overline{x}}_t^f\right)\right]}^T $$
(9)

The variance is based on the uncertainty of the data. The Kalman gain at time t (\( K_t \)) is expressed in Eq. 7, and \( R_t \) is the covariance of \( Z_t \).

(4) Analysis and update

Under the above assumptions, the estimated state and error covariance using the Kalman gain are updated as

$$ {x}_{i,t}^a={x}_{i,t}^f+{K}_t\left({Z}_t-{H}_t{x}_{i,t}^f\right) $$
(10)
$$ {P}_t^a=\left(I-{K}_t{H}_t\right){P}_t^f $$
(11)
(5) Repetition of steps (2), (3), and (4)

Iterations are established when running the algorithm from steps (1) to (5).
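The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this study: `toy_model` is a hypothetical linear stand-in for the Biome-BGC MuSo operator, the observation operator is taken as the identity, all noise levels are invented, and the analysis uses the perturbed-observation variant of the EnKF, which the text does not specify.

```python
import numpy as np


def enkf_analysis(ens_f, obs, h_mat, r_cov, rng):
    """One EnKF analysis step with perturbed observations.

    ens_f : (n_state, n_ens) forecast ensemble
    obs   : (n_obs,) observation vector Z_t
    h_mat : (n_obs, n_state) linear observation operator H_t
    r_cov : (n_obs, n_obs) observation error covariance R_t
    """
    n_ens = ens_f.shape[1]
    anomalies = ens_f - ens_f.mean(axis=1, keepdims=True)
    p_f = anomalies @ anomalies.T / (n_ens - 1)            # Eq. 8
    s_mat = h_mat @ p_f @ h_mat.T + r_cov
    gain = p_f @ h_mat.T @ np.linalg.inv(s_mat)            # Eq. 7
    # Each member sees its own noisy copy of the observation
    # (assumed variant; keeps the analysis spread from collapsing).
    noise = rng.multivariate_normal(np.zeros(len(obs)), r_cov, size=n_ens).T
    obs_pert = obs[:, None] + noise
    return ens_f + gain @ (obs_pert - h_mat @ ens_f)       # Eq. 10


def toy_model(x):
    """Hypothetical stand-in for the model operator F_t (not Biome-BGC MuSo)."""
    return 0.95 * x + 0.5


rng = np.random.default_rng(0)
n_ens = 200
background = np.array([10.0, 8.0, 6.0])     # e.g. three layer temperatures
# Step 1: initialize the ensemble by adding Gaussian noise (Eqs. 2-3).
ens = background[:, None] + rng.normal(0.0, 1.0, (3, n_ens))
h_mat = np.eye(3)
r_cov = 0.25 * np.eye(3)
truth = np.array([12.0, 9.0, 7.0])          # synthetic "true" state

for _ in range(10):                          # Step 5: repeat steps 2-4
    truth = toy_model(truth)
    # Step 2: forecast each member and add model error (Eq. 4).
    ens = toy_model(ens) + rng.normal(0.0, 0.3, ens.shape)
    # Synthetic observation of the true state (Eq. 6).
    obs = truth + rng.multivariate_normal(np.zeros(3), r_cov)
    # Steps 3-4: Kalman gain and analysis update.
    ens = enkf_analysis(ens, obs, h_mat, r_cov, rng)
```

The sample covariance in `enkf_analysis` replaces the explicit propagation of Eq. 5, which is the main practical advantage of the ensemble approach: the model is treated as a black box and never needs to be linearized.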

Data assimilation scheme

In this study, the assimilation of soil temperature and moisture was implemented using Eq. 10, with H equal to \( {\left(1\ 1\ 1\ 1\right)}^T \). Once the daily soil temperature and moisture data were available, the model run was interrupted, EnKF updated the Biome-BGC MuSo state variables, and the simulation was re-initialized with the updated states and re-run until the next update was available. All the simulations were conducted from 2003 to 2007. An uncertainty of 10% was assumed for the model parameters, which were perturbed according to a Gaussian distribution (White et al. 2000). Sequential assimilation of observed data can be used to correct some of the uncertainty in the model parameters (Das et al. 2008). The ensemble members were generated by randomly sampling model parameter combinations from the perturbed arrays (Ines et al. 2013). Two hundred ensemble members were selected to balance the accuracy and computational time of the EnKF framework. Errors of the soil observations were obtained from the literature (Wang and Pei 2002).
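The generation of ensemble members from perturbed parameters can be sketched as follows. The sketch interprets the 10% uncertainty as a standard deviation equal to 10% of each nominal value, which is an assumption; the nominal parameter values and the function name are hypothetical.

```python
import numpy as np


def perturb_parameters(nominal, rel_sd=0.10, n_members=200, seed=0):
    """Draw n_members parameter sets from N(nominal, (rel_sd * |nominal|)**2)."""
    rng = np.random.default_rng(seed)
    nominal = np.asarray(nominal, dtype=float)
    return rng.normal(loc=nominal, scale=rel_sd * np.abs(nominal),
                      size=(n_members, nominal.size))


# Hypothetical nominal values for three ecophysiological parameters.
nominal = np.array([24.0, 42.0, 0.005])
members = perturb_parameters(nominal)   # one row per ensemble member
```

Each row of `members` would then drive one independent model run, so that the spread among runs reflects the assumed parameter uncertainty.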

We assimilated daily soil temperature and moisture to increase the number of observations, and we updated the modeled soil respiration and transpiration. In Biome-BGC MuSo, soil temperature (\( T_{\mathrm{soil}} \)) is a key parameter for calculating root respiration. Thus,

$$ \mathrm{MR}=\sum \limits_1^{n_r}\left({N}_{\mathrm{root}}\bullet {M}_{\mathrm{layer}}\bullet \mathrm{mrpern}\bullet {Q}_{10}^{\frac{T_{\mathrm{soil}\left(\mathrm{layer}\right)}-20}{10}}\right) $$
(12)

where \( n_r \) is the number of soil layers, \( N_{\mathrm{root}} \) is the total N content of the soil, \( M_{\mathrm{layer}} \) is the proportion of the total root mass in the given layer, mrpern is an adjustable ecophysiological parameter, \( Q_{10} \) is the fractional change in respiration with a temperature change of 10 °C, and \( T_{\mathrm{soil}\left(\mathrm{layer}\right)} \) is the soil temperature of the given layer. The assimilated daily soil temperature updated the root respiration through Eq. 10, and the updated variable was used to calculate ER for the next step.
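Eq. 12 can be evaluated directly, which shows how a corrected soil temperature propagates to MR. The following is a minimal sketch; all input values (N content, root fractions, temperatures, mrpern, and Q10) are illustrative assumptions and do not come from this study.

```python
import numpy as np


def root_maintenance_respiration(n_root, m_layer, t_soil, mrpern, q10):
    """Eq. 12: sum over layers of N_root * M_layer * mrpern * Q10**((T - 20) / 10)."""
    m_layer = np.asarray(m_layer, dtype=float)
    t_soil = np.asarray(t_soil, dtype=float)
    return float(np.sum(n_root * m_layer * mrpern * q10 ** ((t_soil - 20.0) / 10.0)))


# Illustrative inputs only; none of these numbers come from the study.
m_layer = [0.5, 0.3, 0.2]      # share of root mass in each of three layers
# MR with hypothetical modeled layer temperatures (degC).
mr_model = root_maintenance_respiration(0.01, m_layer, [15.0, 12.0, 9.0], 0.218, 2.0)
# MR after replacing them with hypothetical, warmer assimilated temperatures.
mr_assim = root_maintenance_respiration(0.01, m_layer, [17.0, 13.0, 9.5], 0.218, 2.0)
```

Because the temperature enters through an exponent, a bias of a few degrees in a layer that holds most of the root mass changes MR disproportionately, which is the motivation for correcting each layer separately.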

Soil moisture was calculated using the volumetric water content (VWC), soil layer thickness, and water density in Biome-BGC MuSo. The daily SWC assimilated during the spinup is converted into the VWC array, which in turn provides reliable SWC during the model simulation phase through the restart file.
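The relationship between VWC and SWC stated above amounts to a product of three terms per layer. A minimal sketch, using the layer thicknesses given for this study (5, 15, and 20 cm) and hypothetical VWC values; the function names are assumptions:

```python
import numpy as np

WATER_DENSITY = 1000.0                            # kg/m3
LAYER_THICKNESS = np.array([0.05, 0.15, 0.20])    # m; layers 0-5, 5-20, 20-40 cm


def vwc_to_swc(vwc):
    """Soil water content per layer in kg/m2 (numerically equal to mm of water)."""
    return np.asarray(vwc, dtype=float) * LAYER_THICKNESS * WATER_DENSITY


def swc_to_vwc(swc):
    """Inverse conversion, e.g. to write assimilated values back as VWC."""
    return np.asarray(swc, dtype=float) / (LAYER_THICKNESS * WATER_DENSITY)


# Hypothetical volumetric water contents (m3/m3) for the three layers.
swc = vwc_to_swc([0.30, 0.28, 0.25])
```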

Once the daily observations were assimilated into the model, the initialization processes were implemented, and the soil variables were corrected on a daily basis throughout model runtime. This study compared normal simulations using calibrated Biome-BGC and Biome-BGC MuSo and simulations that assimilated soil temperature and moisture. All simulations were conducted for the period 2003–2007.

Evaluation and analysis of modeled estimates

To evaluate the simulated carbon and water fluxes, we used the results derived from EC measurements as ground truth observations, and we calculated R² (Eq. 13), RMSE (Eq. 14), and relative error (RE; Eq. 15) to evaluate the accuracy of each model simulation. Additionally, a significance test (p-value) was conducted to assess whether the observed agreement could be attributed to chance.

$$ {R}^2=1-\frac{\sum_{i=1}^t{\left({X}_{\mathrm{obs}}-{X}_{\mathrm{mod}}\right)}_i^2}{\sum_{i=1}^t{\left({X}_{\mathrm{obs}}-\overline{X_{\mathrm{obs}}}\right)}_i^2} $$
(13)
$$ \mathrm{RMSE}=\sqrt{\frac{\sum_{i=1}^t{\left|{X}_{\mathrm{obs}}-{X}_{\mathrm{mod}}\right|}_i^2}{t}} $$
(14)
$$ \mathrm{RE}=\frac{\left|{X}_{\mathrm{mod}}-{X}_{\mathrm{obs}}\right|}{{X}_{\mathrm{obs}}} $$
(15)

In these equations, \( X_{\mathrm{obs}} \) is the observation made at the forest flux site, \( X_{\mathrm{mod}} \) is the simulated carbon or water flux, and i is the day of the year. t refers to the total number of days or day windows within one year.
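The three statistics can be computed as in the following minimal sketch. R² is written here in its conventional form, with the total sum of squares taken about the observed mean, which is an assumption about how Eq. 13 is meant to be read; the sample arrays are synthetic.

```python
import numpy as np


def r_squared(obs, mod):
    """1 - SS_res / SS_tot, with SS_tot taken about the observed mean."""
    obs, mod = np.asarray(obs, float), np.asarray(mod, float)
    return 1.0 - np.sum((obs - mod) ** 2) / np.sum((obs - obs.mean()) ** 2)


def rmse(obs, mod):
    """Root mean square error over all days (Eq. 14)."""
    obs, mod = np.asarray(obs, float), np.asarray(mod, float)
    return float(np.sqrt(np.mean((obs - mod) ** 2)))


def relative_error(obs, mod):
    """Element-wise |mod - obs| / obs (Eq. 15); undefined where obs is zero."""
    obs, mod = np.asarray(obs, float), np.asarray(mod, float)
    return np.abs(mod - obs) / obs


# Synthetic example: the model is biased high by 0.5 on every day.
obs = np.array([1.0, 2.0, 3.0, 4.0])
mod = np.array([1.5, 2.5, 3.5, 4.5])
```

For this synthetic case the RMSE equals the constant bias (0.5), whereas RE shrinks as the observed value grows, which illustrates why the two measures can rank seasons differently.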

We also analyzed the data assimilation performance by comparing the difference (ΔRMSE) between \( \mathrm{RMSE}_{\mathrm{DA}} \) and \( \mathrm{RMSE}_{\mathrm{MuSo}} \). A moving window of 15 days was used here. A positive ΔRMSE indicates that the accuracy of the model simulation was improved by our proposed data assimilation strategy, and vice versa. We examined the relationships of ΔRMSE with three climatic forcings (Temp, Precip, and PAR) and three biophysical factors (soil temperature, soil moisture, and LAI). This analysis therefore identified the conditions showing the most significant improvements after assimilating soil temperature and moisture, thereby providing insights into the application of the proposed method to other forest ecosystems.
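The windowed ΔRMSE can be sketched as follows. Two points are assumptions: the window is treated as a sliding (overlapping) 15-day window, and ΔRMSE is taken as RMSE of Biome-BGC MuSo minus RMSE of the assimilated run, so that a positive value corresponds to an improvement. The daily series are synthetic.

```python
import numpy as np


def moving_rmse(obs, mod, window=15):
    """RMSE inside every full sliding window of `window` consecutive days."""
    sq_err = (np.asarray(obs, float) - np.asarray(mod, float)) ** 2
    kernel = np.ones(window) / window
    return np.sqrt(np.convolve(sq_err, kernel, mode="valid"))


def delta_rmse(obs, mod_muso, mod_da, window=15):
    """Assumed sign convention: RMSE_MuSo - RMSE_DA, so positive = DA improved."""
    return moving_rmse(obs, mod_muso, window) - moving_rmse(obs, mod_da, window)


# Synthetic daily series for illustration only.
days = np.arange(365)
obs = 2.0 + np.sin(2 * np.pi * days / 365)
# Hypothetical runs: MuSo biased by 0.8, the assimilated run by 0.3.
d_rmse = delta_rmse(obs, obs + 0.8, obs + 0.3)
```

Each element of `d_rmse` can then be paired with the window-mean of a climatic or biophysical factor to examine under which conditions the assimilation helps most.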

Results

The daily observed soil temperature and moisture from 2003 to 2007 were assimilated into the Biome-BGC MuSo model using the EnKF with an assimilation window of one day. When the size of the ensemble was larger than 200, R² and the RMSEs between the predicted carbon and water fluxes and the EC measurements reached approximately stable values. The uncertainties in the observed soil temperature and moisture were determined according to Wang et al. (2002). The variances of soil temperature (8.03, 6.75 and 5.58 °C) and soil moisture (0.112, 0.116, and 0.049), corresponding to the 5-, 20- and 40-cm soil depth layers, respectively, were calculated using 30-min observations and were applied to the EnKF algorithm. The model error was estimated simultaneously to be −0.32 to 0.44, with a variance of 0.616 in the EnKF.

Evaluating modeled ET with EC measurements

Overall, the original Biome-BGC model underestimated forest ET, as shown in Fig. 4a, and its annual average value of 313.04 mm·yr−1 is clearly lower than that of ET_EC, at 448.52 mm·yr−1. The forest ecosystem never experienced soil saturation according to Biome-BGC; this is incompatible with the actual conditions in winter and early spring, when deep soil usually converts to permafrost in the Changbai Mountains. According to the coefficient analysis and t-test (Fig. 4b, c, d), ET in Biome-BGC MuSo was improved, particularly in the growing seasons, with R² = 0.72, RMSE = 0.90 mm·d−1, and p < 0.01, compared with the Biome-BGC values of R² = 0.68, RMSE = 1.15 mm·d−1, and p < 0.01. Heavy precipitation usually occurs in the Changbai Mountains during summer, and the resulting stomatal closure under anoxic conditions was not considered in the original model. The Biome-BGC MuSo model characterized this aspect. After the optimal soil moisture content was attained, the soil stress index decreased owing to saturation soil stress, which is a characteristic of anoxic soil. Furthermore, with the assimilation of observed soil moisture, the simulation of soil transpiration improved, which brought the simulated ET closer to the EC measurements. In this case, the statistics were R² = 0.81, RMSE = 0.70 mm·d−1, and p < 0.01, and the annual average ET, 450.48 mm·yr−1, was close to ET_EC (Table 1).

Fig. 4 ET results from various models. a Seasonal variations of ET obtained from EC measurements, calibrated Biome-BGC, Biome-BGC MuSo and assimilated Biome-BGC MuSo; b Comparison and validation of ET values from EC measurements and the calibrated Biome-BGC model; c Comparison and validation of ET values from EC measurements and the Biome-BGC MuSo model; d Comparison and validation of ET values from EC measurements and the assimilated Biome-BGC MuSo model

Table 1 Annual and seasonal ET derived from EC and each model during 2003 to 2007

Evaluating modeled carbon fluxes with EC flux

The daily EC measurements obtained during 2003–2007 were used to evaluate the simulated fluxes at the Changbai Mountains forest flux site. Compared with the daily EC measurements, the calibrated Biome-BGC (ER_Cali) significantly overestimated the ER (Fig. 5); the annual average ER was 1868.55 gC·m−2·yr−1, which is significantly higher than ER_EC, at 1035.55 gC·m−2·yr−1 (Table 2). Furthermore, the overestimation was particularly prominent in summer, and the average value of ER_Cali, at 1004.88 gC·m−2·yr−1, was nearly twice that of ER_EC, at 578.43 gC·m−2·yr−1. In the original model, SOM decomposition was affected by soil temperature, moisture, soil carbon and N content, whereas root maintenance respiration was influenced by soil temperature as well as the carbon and N content. In Biome-BGC MuSo, the soil temperature and moisture affect HR and are calculated layer-by-layer using soil temperature and the SWC of each active layer. Accordingly, the estimate for ecosystem respiration (ER_MuSo), at R² = 0.81, RMSE = 2.50 gC·m−2·d−1, and p < 0.01, showed improvement over ER_Cali, at R² = 0.78, RMSE = 3.24 gC·m−2·d−1, and p < 0.01. With the inputs of observed daily soil temperature and moisture, the variations in ER_DA were constrained at both seasonal and annual scales. In particular, the summer value was 850.30 gC·m−2·yr−1, and the annual value was 1467.05 gC·m−2·yr−1. This led to improvements in the respiration estimates over ER_MuSo, at R² = 0.85, RMSE = 1.97 gC·m−2·d−1, and p < 0.01.

Fig. 5

ER results from various models. a Seasonal variations of ER obtained from EC measurements, calibrated Biome-BGC, Biome-BGC MuSo and assimilated Biome-BGC MuSo; b Comparison and validation of ER values from EC measurements and the calibrated Biome-BGC model; c Comparison and validation of ER values from EC measurements and the Biome-BGC MuSo model; d Comparison and validation of ER values from EC measurements and the assimilated Biome-BGC MuSo model

Table 2 Annual and seasonal ER derived from EC and each model during 2003 to 2007

According to the EC measurements, the forest site served as a carbon sink in 2003 and 2004, with average total NEE values in winter of 9.76 gC·m−2·yr−1 and 2.13 gC·m−2·yr−1, respectively. This result indicates that photosynthesis exceeded vegetation respiration under low-temperature conditions. The three models captured the daily patterns in NEE indicated by the EC measurements. The simulated carbon exchanges with the atmosphere derived from the calibrated Biome-BGC (NEE_Cali), Biome-BGC MuSo (NEE_MuSo), and assimilated Biome-BGC MuSo (NEE_DA) models were evaluated against EC measurements (Fig. 6). In general, NEE_Cali, NEE_MuSo and NEE_DA captured the same seasonal pattern of carbon sink and source at this forest site (Fig. 6a). According to the R2 and t-test results shown in Fig. 6b–d, NEE_DA agreed best with the EC flux measurements, with R2 = 0.70, RMSE = 1.16 gC·m−2·d−1, and p < 0.05, followed by NEE_MuSo and NEE_Cali, with R2 = 0.67 and 0.64, RMSE = 1.23 gC·m−2·d−1 and 3.34 gC·m−2·d−1, and p < 0.05 and < 0.01, respectively.
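The two agreement statistics reported for each model can be reproduced with a few lines of code. The following is a minimal sketch rather than the authors' implementation: it assumes that R2 is the squared Pearson correlation between measured and simulated daily values, and the arrays `nee_ec` and `nee_sim` are hypothetical values used only for illustration.

```python
import numpy as np

def rmse(obs, sim):
    """Root-mean-square error between observed and simulated series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def r_squared(obs, sim):
    """Squared Pearson correlation (assumed definition of R2)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    return float(r ** 2)

# Hypothetical daily NEE values in gC·m−2·d−1 (not data from the study).
nee_ec = np.array([-1.2, -0.4, 0.3, 0.8, -2.1, -3.0])
nee_sim = np.array([-1.0, -0.6, 0.5, 0.6, -1.8, -2.7])

print(f"R2 = {r_squared(nee_ec, nee_sim):.2f}, RMSE = {rmse(nee_ec, nee_sim):.2f}")
```

A high R2 alone does not rule out a systematic bias, which is why the RMSE is reported alongside it.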

Fig. 6

NEE results from various models. a Seasonal variations of NEE obtained from EC measurements, calibrated Biome-BGC, Biome-BGC MuSo and assimilated Biome-BGC MuSo; b Comparison and validation of NEE values from EC measurements and the calibrated Biome-BGC model; c Comparison and validation of NEE values from EC measurements and the Biome-BGC MuSo model; d Comparison and validation of NEE values from EC measurements and the assimilated Biome-BGC MuSo model

The annual and seasonal average NEE during 2003–2007 obtained from EC measurements and the three models are shown in Table 3. Additionally, REs were calculated between the simulated and measured NEEs; the annual average NEE from NEE_DA, with RE = 14.9%, outperformed those from NEE_MuSo and NEE_Cali, with RE = 15.2% and 23.6%, respectively. NEE_Cali was significantly underestimated, particularly in summer and winter, with RMSE = 1.85 gC·m−2·d−1 and 0.54 gC·m−2·d−1, respectively. However, the underestimation of NEE was mitigated in NEE_MuSo and NEE_DA, with RMSE = 0.52 gC·m−2·d−1 and 0.48 gC·m−2·d−1, respectively.
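The RE comparison can be sketched as follows. This assumes that RE denotes the relative error, computed as the absolute difference between simulated and measured values divided by the absolute measured value; the exact formula used in the study may differ. Because the Table 3 values are not reproduced in the text, the example reuses the annual ER averages quoted earlier (ER_EC, ER_Cali and ER_DA); the function name `relative_error` is hypothetical.

```python
def relative_error(simulated, measured):
    """Relative error in percent (assumed definition).

    Absolute values are used so that the result stays meaningful
    for fluxes such as NEE, which can be negative.
    """
    if measured == 0:
        raise ValueError("relative error is undefined for a zero measurement")
    return abs(simulated - measured) / abs(measured) * 100.0

# Annual average ER (gC·m−2·yr−1) as quoted in the text.
er_ec = 1035.55
annual_er = {"ER_Cali": 1868.55, "ER_DA": 1467.05}

re_by_model = {name: relative_error(value, er_ec) for name, value in annual_er.items()}
for name, value in re_by_model.items():
    print(f"{name}: RE = {value:.1f}%")
```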

Table 3 Annual and seasonal NEE derived from EC and each model during 2003 to 2007

The improved estimates in NEE_MuSo are attributed to the multi-layer simulation, and those in NEE_DA resulted from the improvement in soil respiration, which was optimized using the soil temperature and water content of each soil layer. The assimilation of daily multi-layer soil temperature and moisture data into Biome-BGC MuSo corrected the model in real time as it ran at a daily time step.

Analysis of climatic and biophysical factors

Figures 7 and 8 provide three-dimensional graphs of ΔRMSE against the climatic and biophysical factors, averaged with a window length (WL) of 15 d. Most of the ΔRMSE_ET and ΔRMSE_ER values were positive, indicating that the assimilation improved the ER and ET simulations even under extreme climate conditions of low air temperature, low PAR and little precipitation. However, even under suitable climatic conditions, negative values of ΔRMSE_NEE occurred frequently, which indicates that the NEE simulation was affected jointly by aboveground and belowground ecological processes.
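The windowed comparison can be sketched in code. The sketch assumes that ΔRMSE is the RMSE before assimilation minus the RMSE after assimilation within each non-overlapping 15-d window, so that positive values indicate an improvement; this sign convention and the use of non-overlapping windows are assumptions. All series below are synthetic, and the function names are hypothetical.

```python
import numpy as np

def window_rmse(obs, sim, window=15):
    """RMSE within consecutive non-overlapping windows."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    n_win = len(obs) // window  # an incomplete trailing window is dropped
    sq_err = (sim[:n_win * window] - obs[:n_win * window]) ** 2
    return np.sqrt(sq_err.reshape(n_win, window).mean(axis=1))

def delta_rmse(obs, sim_before, sim_after, window=15):
    """Positive values mean the assimilated run has the lower error."""
    return window_rmse(obs, sim_before, window) - window_rmse(obs, sim_after, window)

def window_mean(values, window=15):
    """Average a driving factor (e.g. soil temperature) over the same windows."""
    values = np.asarray(values, dtype=float)
    n_win = len(values) // window
    return values[:n_win * window].reshape(n_win, window).mean(axis=1)

# Synthetic 90-day example (not data from the study).
rng = np.random.default_rng(0)
days = np.arange(90)
obs = 2.0 + np.sin(2 * np.pi * days / 90)
sim_muso = obs + rng.normal(0.0, 0.8, days.size)  # stand-in for the run without assimilation
sim_da = obs + rng.normal(0.0, 0.5, days.size)    # stand-in for the assimilated run
soil_temp = 10.0 + 8.0 * np.sin(2 * np.pi * days / 90)

for t, d in zip(window_mean(soil_temp), delta_rmse(obs, sim_muso, sim_da)):
    print(f"mean soil temperature = {t:5.1f}, delta RMSE = {d:+.2f}")
```

Pairing each window's ΔRMSE with the window-averaged factors gives the points that a three-dimensional plot of this kind would display.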

Fig. 7

Three-dimensional representation of ΔRMSE and climatic factors: a ET, b ER, c NEE

Fig. 8

Three-dimensional representation of ΔRMSE and biophysical factors: a ET, b ER, c NEE

As shown in Fig. 8, high values of ΔRMSE are associated with suitable soil temperature and sufficient water. This finding also supports a direct relationship between soil temperature and ER and between soil moisture and ET. LAI is an important biophysical parameter; high LAI values were associated with greater improvement from the assimilation scheme. The assimilation strategy therefore appeared to be more suitable for densely forested areas. However, the effects of soil temperature, soil moisture, and LAI on ΔRMSE_NEE were not significant.

Discussion

The carbon and water fluxes were quantified in this study by integrating the observations with Biome-BGC MuSo, in which the structural features of the original model have been improved. For example, acclimation of autotrophic respiration was introduced, which makes simulations related to climate change more realistic. Notably, the soil flux module was improved by the addition of the multi-layer soil module, and the observed soil parameters were assimilated into the model using EnKF to correct errors. The improvements in the simulated ER, NEE, and ET were significant because the accurate soil temperature and moisture data directly improved the simulated soil respiration and transpiration. However, underestimations of the carbon and water fluxes remained in winter, indicating that parameter uncertainty in Biome-BGC MuSo requires further investigation. The calibration of Biome-BGC parameters in a previous study could serve as a reference for Biome-BGC MuSo (Yan et al. 2016).
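The correction step of an ensemble Kalman filter can be illustrated with a short sketch. This is a generic stochastic (perturbed-observation) EnKF analysis step, not the configuration used in this study: the two-variable state (soil temperature and soil moisture of a single layer), the ensemble size, the error standard deviations and the function name `enkf_update` are all assumptions made for illustration.

```python
import numpy as np

def enkf_update(ensemble, obs, obs_err_std, H, rng):
    """One stochastic EnKF analysis step.

    ensemble: (n_state, n_members) forecast states
    obs: (n_obs,) observations; obs_err_std: (n_obs,) observation error std
    H: (n_obs, n_state) linear observation operator
    """
    obs = np.asarray(obs, dtype=float)
    obs_err_std = np.asarray(obs_err_std, dtype=float)
    n_members = ensemble.shape[1]

    # Forecast error covariance estimated from the ensemble spread.
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    P = anomalies @ anomalies.T / (n_members - 1)
    R = np.diag(obs_err_std ** 2)

    # Kalman gain weighs forecast uncertainty against observation uncertainty.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)

    # Each member is updated with its own perturbed copy of the observations.
    noise = rng.normal(0.0, 1.0, size=(obs.size, n_members)) * obs_err_std[:, None]
    perturbed_obs = obs[:, None] + noise
    return ensemble + K @ (perturbed_obs - H @ ensemble)

# Hypothetical example: state = [soil temperature (°C), soil moisture (m3/m3)].
rng = np.random.default_rng(42)
forecast = np.vstack([
    rng.normal(12.0, 2.0, 50),   # simulated soil temperature, 50 members
    rng.normal(0.30, 0.05, 50),  # simulated soil moisture, 50 members
])
obs = np.array([10.0, 0.35])      # assumed in situ observations
obs_err_std = np.array([0.5, 0.02])
H = np.eye(2)                     # both state variables are observed directly

analysis = enkf_update(forecast, obs, obs_err_std, H, rng)
print("forecast mean:", forecast.mean(axis=1))
print("analysis mean:", analysis.mean(axis=1))
```

In this sketch the analysis mean moves toward the observations and the ensemble spread narrows, which is the sense in which assimilated observations constrain the simulated soil state.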

Given the realities of global climate change, climate warming may accelerate the decomposition of soil carbon, and warming-induced carbon losses from soil may offset enhanced carbon absorption by vegetation (Yang et al. 2010). In addition to drought, anoxic stress is also considered in Biome-BGC MuSo. Because the study area has a temperate and moist climate, anoxic conditions in the Changbai Mountains caused by sufficiently high precipitation can influence soil processes such as SOM decomposition. Soil temperature is also the main determinant of ecosystem fluxes in the Changbai Mountains, although its effects usually occur within the top two layers (including litterfall and humic substances) (Wang et al. 2016).

The integration of observed soil parameters and models is a possible strategy for improving the simulation of carbon and water fluxes. Several remote sensing soil products have emerged in recent years that provide possible data sources for data assimilation schemes at local and regional scales. For example, passive and active satellite microwave soil moisture products such as AMSR-E, SMOS, and SMAP are available online.

We suggest that different forest flux sites under varied climatic and biophysical conditions should be tested to evaluate the credibility of the assimilation scheme proposed in this study. However, the difficulties in stratified soil data acquisition and in building flux monitoring stations limit the expansion of these experiments. By using this assimilation strategy, analyses of climatic and biophysical conditions at the Changbai Mountains forest flux site can facilitate estimation of the carbon and water fluxes at arid or cold sites.

Thus far, we have assimilated daily remotely sensed surface soil temperature and moisture products with a spatial resolution of 1 km, supported by the National Basic Research Program of China (973 Program), into the calibrated Biome-BGC model in an attempt to improve carbon flux estimates over the Greater Khingan range in Northeast China. The annual NPPs simulated by the assimilation scheme for 2003 to 2015 were then evaluated against regional dendrochronological measurements collected during the comprehensive field experiments of 2013 and 2016. This strategy highlights the possibility of regional simulation of forest carbon and water fluxes using soil parameter assimilation.

Conclusions

This study designed a data assimilation scheme using EnKF to improve simulations of carbon and water fluxes and to reduce errors by integrating observations of multi-layer soil temperature and moisture. This method assimilated two data streams, from the observations and the model, to ensure that the output behavior is consistent with the observations. Our results confirmed that soil temperature and moisture are crucial drivers of soil respiration and transpiration, which are closely related to carbon and water fluxes. After the assimilation, the simulated seasonal patterns matched the flux measurements more closely, and the overall performance improved significantly compared with that of Biome-BGC and Biome-BGC MuSo.

The climatic and biophysical analyses demonstrated that the assimilation scheme is appropriate for application to various forest ecosystems, although it is more effective in densely forested areas. Although the assimilation scheme helped to improve ET and ER, it had a marginal effect on NEE.